High-Cardinality Metrics and Dimensionality

Building on the concepts of distributed tracing and context propagation, this section examines the challenges that high-cardinality metrics pose in modern monitoring systems. Cardinality is the number of unique label-value combinations for a metric. Each combination is stored as its own time series, so high cardinality means an explosion in the number of series. This is particularly common in environments with high churn, such as ephemeral Kubernetes pods, where new time series are created and old ones disappear rapidly.
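To make the definition concrete, here is a minimal sketch that counts series for a single metric. The label names and values are hypothetical, and the two helper functions are illustrative rather than part of any monitoring library.

```python
def max_series(label_values):
    """Upper bound on series for one metric: the product of the number
    of distinct values each label can take."""
    total = 1
    for values in label_values.values():
        total *= len(set(values))
    return total


def observed_series(label_sets):
    """Number of distinct label-value combinations actually seen."""
    return len({frozenset(labels.items()) for labels in label_sets})


# Hypothetical labels on a request counter.
label_values = {
    "method": ["GET", "POST", "PUT"],
    "status": ["200", "404", "500"],
    "pod_name": [f"api-{i}" for i in range(50)],
}
upper_bound = max_series(label_values)  # 3 * 3 * 50

scraped = [
    {"method": "GET", "status": "200", "pod_name": "api-0"},
    {"method": "GET", "status": "200", "pod_name": "api-1"},
    {"method": "GET", "status": "200", "pod_name": "api-0"},  # same series as the first
]
seen = observed_series(scraped)

print(upper_bound, seen)  # 450 2
```

The upper bound grows multiplicatively with every label added, which is why a single unbounded label such as pod_name tends to dominate the total.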

Understanding Time Series Churn

Time series churn is a critical factor in the design of metrics systems, especially when dealing with high-cardinality data. Every pod restart or rollout produces a new pod_name value, so a label such as pod_name in Prometheus can increase cardinality by 10x-100x in environments with high churn, even when the number of pods running at any moment stays constant. This affects storage costs and also degrades query performance, because queries over a time range must touch every series that existed in that range. To mitigate these effects, it’s essential to understand the sources of high cardinality and implement strategies to manage it effectively.
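The effect of churn can be sketched with simple arithmetic. The function below is a hypothetical model, not a measurement: it assumes every rollout replaces all pods, and that each pod exposes the same number of series.

```python
def series_with_churn(replicas, series_per_pod, rollouts):
    """Return (active, total) series counts.

    active: series being written at any one moment.
    total:  distinct series accumulated over the retention window,
            assuming each rollout replaces every pod_name value.
    """
    active = replicas * series_per_pod
    total = active * (rollouts + 1)  # initial generation plus one per rollout
    return active, total


# Hypothetical workload: 20 replicas, 500 series each, 30 rollouts in the window.
active, total = series_with_churn(replicas=20, series_per_pod=500, rollouts=30)
print(active, total, total // active)  # 10000 310000 31
```

In this model the active series count never changes, yet the store has to index 31 times as many series, which is within the 10x-100x range mentioned above.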

Architectural Patterns for Scaling

Both M3DB and TiDB (TiKV) offer architectural patterns designed to scale with high-cardinality metrics. M3DB, for instance, uses a distributed M3Coordinator for global query routing and write sharding, allowing horizontal scaling by adding nodes and rebalancing shards across the cluster topology [1]. TiDB, on the other hand, handles metrics via TiKV, which uses a layered LSM-tree (log-structured merge-tree) for high write throughput. Understanding these architectural patterns is crucial for designing metrics systems that can handle explosive cardinality.
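The write-sharding idea can be illustrated with a minimal sketch: series IDs hash to a fixed set of virtual shards, and shards are assigned to nodes. Everything here is an assumption made for illustration. The shard count, node names, CRC32 hash, and round-robin placement are stand-ins and do not reproduce M3DB's actual hashing or placement algorithm.

```python
import zlib

NUM_SHARDS = 64  # hypothetical, fixed number of virtual shards


def shard_for(series_id, num_shards=NUM_SHARDS):
    """Map a series ID to a shard with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(series_id.encode("utf-8")) % num_shards


def assign_shards(nodes, num_shards=NUM_SHARDS):
    """Naive round-robin placement of shards onto nodes."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


series_id = 'http_requests_total{pod_name="api-0",status="200"}'
shard = shard_for(series_id)

before = assign_shards(["node-a", "node-b", "node-c"])
after = assign_shards(["node-a", "node-b", "node-c", "node-d"])

# The series keeps its shard when a node is added; only shard ownership changes.
moved = sum(1 for s in range(NUM_SHARDS) if before[s] != after[s])
print(shard, before[shard], after[shard], moved)
```

Because a series always maps to the same shard, adding capacity only requires moving whole shards between nodes. A production placement algorithm would try to move far fewer shards than this round-robin sketch does.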

Storage Cost Calculation

The storage cost for time series data can be roughly estimated with the formula: Cost (in bytes) = (Series Count * Samples per Second * Retention Period in Seconds * Bytes per Sample) / Compression Ratio, where Bytes per Sample is the uncompressed size. If an already-compressed per-sample size is used instead, the compression ratio must be set to 1 so that compression is not counted twice. A typical Prometheus sample takes about 1.3 to 2 bytes of disk space after compression, thanks to delta-of-delta encoding of timestamps and XOR encoding of values. M3TSZ, the M3DB compression algorithm based on Gorilla compression, reduces storage costs in a similar way [2]. The choice of compression algorithm and the configuration of retention periods and resolutions can greatly impact the overall cost of storing high-cardinality metrics.
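As a worked example of the formula, the sketch below estimates storage for a hypothetical workload. The series count, scrape interval, retention, the 16-byte raw sample size (an 8-byte timestamp plus an 8-byte float value), and the 10x compression ratio are all assumptions chosen for illustration.

```python
def storage_bytes(series_count, samples_per_second, retention_seconds,
                  bytes_per_sample, compression_ratio=1.0):
    """Rough storage estimate in bytes, following the formula above.

    Pass either the raw sample size together with a compression ratio,
    or an already-compressed sample size with compression_ratio=1.0.
    """
    raw = series_count * samples_per_second * retention_seconds * bytes_per_sample
    return raw / compression_ratio


SERIES = 1_000_000                  # hypothetical active series
SAMPLES_PER_SECOND = 1 / 15         # one sample every 15 seconds
RETENTION_SECONDS = 30 * 24 * 3600  # 30 days

# Variant 1: raw 16-byte samples and an assumed 10x compression ratio.
from_raw = storage_bytes(SERIES, SAMPLES_PER_SECOND, RETENTION_SECONDS,
                         bytes_per_sample=16, compression_ratio=10)

# Variant 2: an already-compressed size of 1.6 bytes per sample, no ratio.
from_compressed = storage_bytes(SERIES, SAMPLES_PER_SECOND, RETENTION_SECONDS,
                                bytes_per_sample=1.6)

print(f"{from_raw / 1e9:.2f} GB, {from_compressed / 1e9:.2f} GB")
```

Both variants come to roughly 276 GB. Applying the 10x ratio on top of the 1.6-byte figure would understate the cost by a factor of ten, which is the double-counting mistake to avoid.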

Managing High Cardinality

Managing high cardinality involves a combination of strategies, including aggregating away labels that nobody queries and optimizing storage configurations. For example, M3DB’s namespace configuration allows multiple retention periods and resolutions for the same data stream, so recent data can be kept at full resolution while older data is stored more coarsely. Additionally, Grafana Cloud offers Adaptive Metrics, a feature that identifies unused labels and can reduce stored time series by 20-50% [1].
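The label-dropping idea can be sketched as a simple aggregation step. This is an illustrative model only, not how Grafana Cloud or M3DB implement it. The samples are hypothetical, and summing is assumed to be a valid aggregation, which holds for counters but not for every metric type.

```python
from collections import defaultdict


def drop_label(samples, label):
    """Aggregate samples after removing one label, summing values that
    collapse onto the same remaining label set."""
    aggregated = defaultdict(float)
    for labels, value in samples:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k != label))
        aggregated[kept] += value
    return dict(aggregated)


# Hypothetical counter samples: 3 pods x 2 status codes = 6 series.
samples = [
    ({"pod_name": "api-0", "status": "200"}, 10.0),
    ({"pod_name": "api-1", "status": "200"}, 20.0),
    ({"pod_name": "api-2", "status": "200"}, 30.0),
    ({"pod_name": "api-0", "status": "500"}, 1.0),
    ({"pod_name": "api-1", "status": "500"}, 2.0),
    ({"pod_name": "api-2", "status": "500"}, 3.0),
]

reduced = drop_label(samples, "pod_name")
print(len(samples), "->", len(reduced))  # 6 -> 2
```

The trade-off is that per-pod breakdowns are no longer available after aggregation, so this only makes sense for labels that dashboards and alerts do not use.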

Sources

[1] https://last9.io/blog/how-to-manage-high-cardinality-metrics-in-prometheus/
[2] https://grafana.com/blog/2022/10/20/how-to-manage-high-cardinality-metrics-in-prometheus-and-kubernetes/