vmstorage
Overview#
VictoriaMetrics storage (vmstorage) is a highly optimized time series database implementation, positioned as a drop-in replacement for the Prometheus TSDB with better performance. This document is a study guide to the vmstorage implementation.
Architecture Overview#
TBD
Data Model#
VictoriaMetrics storage consists of several key components working together:
| Concept | Purpose |
|---|---|
| MetricName | The true identifier of a series, e.g. cpu{host="x"}. Global in scope; every other ID below is scoped to a single storage node |
| MetricID | uint64, the unique ID of the metric (time series) |
| TSID | struct that also uniquely identifies a time series, with extra fields used for sorting |
| indexDB | Inverted index: labels → MetricIDs |
| table | Container for monthly partitions, LSM implementation |
| partition | One month of data with 3-tier LSM |
| part | Immutable unit of a partition, produced by a flush or a merge; contains multiple blocks |
| Block | Up to 8192 points for one TSID, compressed |
Data Structure to Disk Mapping#
This section shows how the table, partition, part, and block hierarchy maps to the on-disk layout.
┌────────────────────────────────────────────────────────────────────────────┐
│ Data Structure → Disk Mapping │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ Storage vmstorage-data/ │
│ ├── idbCurr (indexDB) ───► ├── indexdb/<generation>/ │
│ │ └── tb (mergeset.Table) │ ├── parts.json │
│ │ └── fileParts[] │ └── <part>/ (items.bin, etc.) │
│ │ │
│ └── tb (table) ───► └── data/ │
│ └── ptws[] (partitions) ├── small/ │
│ └── partition "2024_07" │ └── 2024_07/ │
│ ├── smallParts[] ───► │ ├── parts.json │
│ │ └── part │ └── <part>/ │
│ │ ├── metaindex │ ├── metaindex.bin │
│ │ ├── indexFile │ ├── index.bin │
│ │ ├── timestamps │ ├── timestamps.bin │
│ │ └── values │ └── values.bin │
│ │ │
│ └── bigParts[] ───► └── big/ │
│ └── part └── 2024_07/ │
│ └── ... └── <part>/ │
│ │
└────────────────────────────────────────────────────────────────────────────┘
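The mapping from a sample's timestamp to its partition directory can be sketched in a few lines of Go. This is a minimal illustration that assumes partitions are named `YYYY_MM` in UTC, as in the diagram above; `partitionName` and `partitionDirs` are hypothetical helpers, not functions from the VictoriaMetrics code base.

```go
package main

import (
	"fmt"
	"path"
	"time"
)

// partitionName returns the "YYYY_MM" directory name for a timestamp given
// in milliseconds since the Unix epoch. UTC is assumed in this sketch.
func partitionName(tsMillis int64) string {
	return time.UnixMilli(tsMillis).UTC().Format("2006_01")
}

// partitionDirs returns the small and big part directories of the monthly
// partition that covers tsMillis, relative to the storage root.
func partitionDirs(root string, tsMillis int64) (small, big string) {
	name := partitionName(tsMillis)
	return path.Join(root, "data", "small", name), path.Join(root, "data", "big", name)
}

func main() {
	ts := time.Date(2024, time.July, 15, 12, 0, 0, 0, time.UTC).UnixMilli()
	small, big := partitionDirs("vmstorage-data", ts)
	fmt.Println(small)
	fmt.Println(big)
}
```

Both directories belong to the same logical partition; only the tier of the parts stored inside them differs.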
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ data/small/2024_07/<part_id>/ │
│ ├── metadata.json ──────► { "RowsCount": 94057, "BlocksCount": 767, "MinTimestamp": ..., "MaxTimestamp": ... } │
│ ├── metaindex.bin ──────► Index of index.bin (small, loaded in memory) │
│ ├── index.bin ──────────► Contains ALL blockHeaders (767 headers in this example) │
│ │ ┌─────────────┬─────────────┬─────────────┬─────────────┬─────┐ │
│ │ │ blockHdr[0] │ blockHdr[1] │ blockHdr[2] │ blockHdr[3] │ ... │ (767 total) │
│ │ └──────┬──────┴──────┬──────┴──────┬──────┴──────┬──────┴─────┘ │
│ │ │ │ │ │ │
│ │ ▼ ▼ ▼ ▼ │
│ ├── timestamps.bin ─────► ┌───────────┬───────────┬───────────┬───────────┐ │
│ │ │ ts[0] │ ts[1] │ ts[2] │ ts[3] │ ... (compressed timestamp data) │
│ │ │ ≤8192 ts │ ≤8192 ts │ ≤8192 ts │ ≤8192 ts │ │
│ │ └───────────┴───────────┴───────────┴───────────┘ │
│ │ │ │ │ │
│ │ ▼ ▼ ▼ │
│ └── values.bin ─────────► ┌───────────┬───────────┬───────────┬───────────┐ │
│ │ val[0] │ val[1] │ val[2] │ val[3] │ ... (compressed value data) │
│ │ ≤8192 val │ ≤8192 val │ ≤8192 val │ ≤8192 val │ │
│ └───────────┴───────────┴───────────┴───────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ BLOCK 0 │ │ BLOCK 1 │ │ BLOCK 2 │ │ BLOCK 3 │ ... (767 blocks total) │
│ └───────────┘ └───────────┘ └───────────┘ └───────────┘ │
│ ════════════════════════════════════════════════════════════════════════════════════════════════════════════ │
│ KEY INSIGHT: │
│ • One BLOCK = blockHeader + timestamps chunk + values chunk (all for ONE TSID, up to 8192 data points) │
│ • One PART = Many BLOCKs stored across 3 files (index.bin, timestamps.bin, values.bin) │
│ • Blocks are sorted by TSID within a part │
│ • Same TSID can have multiple blocks (if more than 8192 points) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
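The last bullet of the key insight (one TSID can span several blocks) is easy to see in code. The sketch below splits the points of a single series into chunks of at most 8192 points; the `point` type, the `maxPointsPerBlock` constant, and `splitIntoBlocks` are illustrative names chosen here, and the real implementation also compresses each block and writes a header for it.

```go
package main

import "fmt"

// maxPointsPerBlock mirrors the 8192-point limit shown in the diagram.
const maxPointsPerBlock = 8192

// point is a simplified (timestamp, value) pair used only in this sketch.
type point struct {
	Timestamp int64
	Value     float64
}

// splitIntoBlocks cuts the points of ONE series into consecutive chunks of
// at most maxPointsPerBlock points each. The input order is preserved.
func splitIntoBlocks(points []point) [][]point {
	var blocks [][]point
	for len(points) > maxPointsPerBlock {
		blocks = append(blocks, points[:maxPointsPerBlock])
		points = points[maxPointsPerBlock:]
	}
	if len(points) > 0 {
		blocks = append(blocks, points)
	}
	return blocks
}

func main() {
	points := make([]point, 20000)
	for i := range points {
		points[i] = point{Timestamp: int64(i) * 1000, Value: float64(i)}
	}
	for i, b := range splitIntoBlocks(points) {
		fmt.Printf("block %d: %d points\n", i, len(b))
	}
}
```

With 20000 points the series needs three blocks: two full ones and a final block holding the remainder.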
LSM Tree#
There are two kinds of LSM tree in VictoriaMetrics:
- indexDB uses mergeset.Table (LSM tree for the inverted index)
  - Location: indexdb/<generation>/
  - Tiers: inmemoryParts → fileParts
- Each partition has its own LSM tree (for time series data)
  - Location: data/{small,big}/YYYY_MM/
  - Tiers: inmemoryParts → smallParts → bigParts
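Moving data from one tier to the next in an LSM tree comes down to merging sorted parts into a larger sorted part. The following is a minimal sketch of that merge step, not the VictoriaMetrics merger: the `row` type is hypothetical, rows are ordered by `MetricID` and then by timestamp as a simplification of the TSID ordering, and compression and deduplication are ignored.

```go
package main

import "fmt"

// row is a simplified sample used only in this sketch.
type row struct {
	MetricID  uint64
	Timestamp int64
	Value     float64
}

// rowLess orders rows by series first, then by time.
func rowLess(a, b row) bool {
	if a.MetricID != b.MetricID {
		return a.MetricID < b.MetricID
	}
	return a.Timestamp < b.Timestamp
}

// mergeParts merges two already-sorted parts into one sorted part.
// Neither input is modified.
func mergeParts(a, b []row) []row {
	out := make([]row, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if rowLess(b[j], a[i]) {
			out = append(out, b[j])
			j++
		} else {
			out = append(out, a[i])
			i++
		}
	}
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func main() {
	small1 := []row{{MetricID: 1, Timestamp: 10}, {MetricID: 2, Timestamp: 5}}
	small2 := []row{{MetricID: 1, Timestamp: 20}, {MetricID: 3, Timestamp: 1}}
	for _, r := range mergeParts(small1, small2) {
		fmt.Printf("metricID=%d ts=%d\n", r.MetricID, r.Timestamp)
	}
}
```

Because every input part is already sorted, the merge is a single linear pass, which is what makes background merging of parts cheap relative to re-sorting.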
Key Data Structures#
TSID (Time Series ID)#
The TSID provides hierarchical identification of time series, enabling efficient grouping and compression.
TSID vs MetricID#
- MetricID is for identification: it is the unique identifier of a time series (MetricID = uint64(time.Now().UnixNano()) + 1).
- TSID (which includes the MetricID) is for sorting and grouping, so it has more fields.

type TSID struct {
    MetricGroupID uint64 // ID of metric group (e.g., "memory_usage")
    JobID         uint32 // ID of job/service
    InstanceID    uint32 // ID of instance/process
    MetricID      uint64 // Unique ID of the metric
}
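To make the "sorting and grouping" role concrete, here is a hedged sketch of an ordering over TSIDs that compares fields from the coarsest group to the finest. The struct repeats the fields listed above; `tsidLess` and `sortTSIDs` are names made up for this example, and the field order used for comparison is an assumption based on the struct layout.

```go
package main

import (
	"fmt"
	"sort"
)

// TSID repeats the fields listed above.
type TSID struct {
	MetricGroupID uint64
	JobID         uint32
	InstanceID    uint32
	MetricID      uint64
}

// tsidLess reports whether a sorts before b. Fields are compared from the
// coarsest grouping (metric group) down to the unique MetricID.
func tsidLess(a, b TSID) bool {
	if a.MetricGroupID != b.MetricGroupID {
		return a.MetricGroupID < b.MetricGroupID
	}
	if a.JobID != b.JobID {
		return a.JobID < b.JobID
	}
	if a.InstanceID != b.InstanceID {
		return a.InstanceID < b.InstanceID
	}
	return a.MetricID < b.MetricID
}

// sortTSIDs sorts the slice in place using tsidLess.
func sortTSIDs(tsids []TSID) {
	sort.Slice(tsids, func(i, j int) bool { return tsidLess(tsids[i], tsids[j]) })
}

func main() {
	tsids := []TSID{
		{MetricGroupID: 2, JobID: 1, InstanceID: 1, MetricID: 30},
		{MetricGroupID: 1, JobID: 2, InstanceID: 1, MetricID: 20},
		{MetricGroupID: 1, JobID: 1, InstanceID: 1, MetricID: 10},
	}
	sortTSIDs(tsids)
	for _, t := range tsids {
		fmt.Printf("%+v\n", t)
	}
}
```

Sorting this way places series of the same metric group, job, and instance next to each other, which is the grouping effect the section describes.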
IndexDB#
- The indexDB provides inverted index functionality for time series metadata. It enables fast lookups from metric names and labels to time series IDs (TSIDs).
- indexDB uses mergeset.Table (an LSM tree for the inverted index), as mentioned above.
- See the createGlobalIndexes function for details.
type indexDB struct {
    name string
    tb   *mergeset.Table
    s    *Storage

    // Cache for fast TagFilters -> MetricIDs lookup.
    tagFiltersToMetricIDsCache *lrucache.Cache

    // Cache for (date, tagFilter) -> loopsCount, which is used for reducing
    // the amount of work when matching a set of filters.
    loopsPerDateTagFilterCache *lrucache.Cache

    // A cache that stores metricIDs that have been added to the index.
    // The cache is not populated on startup nor does it store a complete set of
    // metricIDs. A metricID is added to the cache either when a new entry is
    // added to the global index or when the global index is searched for
    // existing metricID (see is.createGlobalIndexes() and is.hasMetricID()).
    //
    // The cache is used solely for creating new index entries during the data
    // ingestion (see Storage.RegisterMetricNames() and Storage.add()).
    metricIDCache *metricIDCache

    // ...
}
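The comment on metricIDCache describes a lazily populated "already indexed" set. A rough sketch of that pattern follows; `metricIDSet`, `fakeIndex`, and `ensureIndexed` are all hypothetical stand-ins invented for this illustration and do not reproduce the actual VictoriaMetrics types or control flow.

```go
package main

import (
	"fmt"
	"sync"
)

// metricIDSet is a hypothetical stand-in for metricIDCache: an incomplete,
// lazily filled set of metricIDs known to be present in the global index.
type metricIDSet struct {
	mu  sync.Mutex
	ids map[uint64]struct{}
}

func newMetricIDSet() *metricIDSet {
	return &metricIDSet{ids: make(map[uint64]struct{})}
}

func (s *metricIDSet) Has(id uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.ids[id]
	return ok
}

func (s *metricIDSet) Add(id uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.ids[id] = struct{}{}
}

// fakeIndex is a hypothetical in-memory stand-in for the global index.
// It counts lookups and creations so the effect of the cache is visible.
type fakeIndex struct {
	entries          map[uint64]struct{}
	lookups, creates int
}

func (ix *fakeIndex) has(id uint64) bool {
	ix.lookups++
	_, ok := ix.entries[id]
	return ok
}

func (ix *fakeIndex) create(id uint64) {
	ix.creates++
	ix.entries[id] = struct{}{}
}

// ensureIndexed makes sure id has index entries. The cache is filled both
// when an existing entry is found and when a new one is created.
func ensureIndexed(cache *metricIDSet, ix *fakeIndex, id uint64) {
	if cache.Has(id) {
		return // fast path: no index access at all
	}
	if !ix.has(id) {
		ix.create(id)
	}
	cache.Add(id)
}

func main() {
	cache := newMetricIDSet()
	ix := &fakeIndex{entries: map[uint64]struct{}{7: {}}}
	for _, id := range []uint64{7, 7, 8, 8} {
		ensureIndexed(cache, ix, id)
	}
	fmt.Printf("lookups=%d creates=%d\n", ix.lookups, ix.creates)
}
```

The point of the pattern is that repeated ingestion of the same series touches the index only once per process lifetime, even though the cache starts empty.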
| Name | Purpose |
|---|---|
| MetricName → TSID | Global metric name lookup (disabled by default) |
| Tag → MetricIDs | Global inverted index for tag filters |
| MetricID → TSID | Lookup TSID by MetricID |
| MetricID → MetricName | Lookup full metric name by MetricID |
| DeletedMetricID | Track deleted metrics |
| Date → MetricID | Per-day metric existence tracking |
| (Date,Tag) → MetricIDs | Per-day inverted index (main query path) |
| (Date,MetricName) → TSID | Per-day metric name to TSID lookup |
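The (Date,Tag) → MetricIDs entry is the one the table marks as the main query path, so it is worth a toy model. The sketch below keeps an in-memory map from (day, tag) to a set of MetricIDs and answers a query by intersecting the sets of all requested tags. The `dayIndex` type and its methods are illustrative assumptions; the real index stores these entries as sorted items in mergeset.Table rather than in a Go map.

```go
package main

import (
	"fmt"
	"sort"
)

// dateTag is the key of the per-day inverted index in this sketch.
type dateTag struct {
	Date uint64 // days since the Unix epoch
	Tag  string // "name=value"
}

// dayIndex maps (date, tag) to the set of MetricIDs carrying that tag on
// that day.
type dayIndex map[dateTag]map[uint64]struct{}

// add registers metricID under every given tag for the given day.
func (idx dayIndex) add(date, metricID uint64, tags ...string) {
	for _, tag := range tags {
		k := dateTag{Date: date, Tag: tag}
		if idx[k] == nil {
			idx[k] = make(map[uint64]struct{})
		}
		idx[k][metricID] = struct{}{}
	}
}

// search returns the sorted MetricIDs that carry ALL given tags on the day.
func (idx dayIndex) search(date uint64, tags ...string) []uint64 {
	if len(tags) == 0 {
		return nil
	}
	var result []uint64
	for id := range idx[dateTag{Date: date, Tag: tags[0]}] {
		matched := true
		for _, tag := range tags[1:] {
			if _, ok := idx[dateTag{Date: date, Tag: tag}][id]; !ok {
				matched = false
				break
			}
		}
		if matched {
			result = append(result, id)
		}
	}
	sort.Slice(result, func(i, j int) bool { return result[i] < result[j] })
	return result
}

func main() {
	idx := dayIndex{}
	const day = 19919 // 2024-07-15 as days since the Unix epoch
	idx.add(day, 1, "job=api", "host=x")
	idx.add(day, 2, "job=api", "host=y")
	fmt.Println(idx.search(day, "job=api"))
	fmt.Println(idx.search(day, "job=api", "host=x"))
}
```

Keying by day keeps each lookup limited to series that actually existed on the queried dates, which is one plausible reason the per-day index serves as the main query path.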
Write Path#
Read Path#