The 6 Questions to Ask Before Adding a High-Cardinality Label

These articles are AI-generated summaries. Please check the original sources for full details.

The Checklist

A team recently added a pod_id label to debug a networking issue, resulting in a surge of 150,000 new time series per hour in Prometheus. This rapid increase in cardinality caused a production incident and highlighted the importance of carefully considering label additions.

Why This Matters

Monitoring systems are designed on the assumption that cardinality stays bounded, but real-world systems often carry high-cardinality labels that can overwhelm their resources. Uncontrolled cardinality growth leads to performance degradation, higher storage costs, and even outages, as the team's 3-hour production incident demonstrated.

Key Insights

  • 150,000 series/hour: Rate of new time series created by the pod_id label addition.
  • Prometheus vs. ClickHouse: Prometheus pays cardinality costs at write time (memory), while ClickHouse incurs costs at query time (aggregation).
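  • metric_relabel_configs: Prometheus scrape-config setting that rewrites or drops labels (labeldrop action) and whole series (drop action) after a scrape but before ingestion, which makes it the usual tool for containing a cardinality explosion.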
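
To make the effect of dropping a label concrete, here is a minimal Python sketch. It does not reproduce Prometheus's relabeling code; it only models a series as a set of label pairs and shows what happens to the series count when pod_id is removed. The metric name, endpoints, and pod count are hypothetical.

```python
from collections import Counter

def drop_label(series, label):
    """Return each series' label set with `label` removed, as a sorted tuple."""
    return [
        tuple(sorted((k, v) for k, v in s.items() if k != label))
        for s in series
    ]

# Hypothetical data: one metric, 2 endpoints, 1,000 pods.
series = [
    {"__name__": "http_requests_total", "endpoint": ep, "pod_id": f"pod-{i}"}
    for ep in ("/api", "/health")
    for i in range(1000)
]

collapsed = Counter(drop_label(series, "pod_id"))
before = len(series)    # distinct series with pod_id
after = len(collapsed)  # distinct series once pod_id is gone
# Label sets that more than one original series now maps onto.
collisions = sum(1 for n in collapsed.values() if n > 1)

print(before, after, collisions)
```

The collision count matters: the Prometheus documentation warns that after labeldrop the remaining labels must still identify each series uniquely. If several series within one scrape collapse onto the same label set, aggregating at the source is likely a better fix than dropping the label.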

Working Example

# Metrics to monitor for cardinality issues in Prometheus
prometheus_tsdb_head_series # Active series currently held in the head block
prometheus_tsdb_head_chunks_created_total # Cumulative chunks created (counter); wrap in rate() to see the creation rate
prometheus_tsdb_symbol_table_size_bytes # Size of the in-memory symbol table for loaded blocks
process_resident_memory_bytes # Resident memory of the Prometheus process
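
As a rough illustration of how the first metric might be used, the sketch below turns two readings of prometheus_tsdb_head_series into a per-hour growth figure and compares it with a threshold. The sample values, timestamps, and threshold are assumptions chosen for illustration, not figures from the incident.

```python
def series_growth_per_hour(samples):
    """Net change in active series per hour.

    samples: list of (unix_seconds, head_series_value), oldest first.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) * 3600.0 / (t1 - t0)

# Hypothetical readings of prometheus_tsdb_head_series taken 30 minutes apart.
samples = [(1_700_000_000, 400_000), (1_700_001_800, 475_000)]

GROWTH_ALERT_THRESHOLD = 100_000  # assumed threshold, tune per deployment

growth = series_growth_per_hour(samples)
should_alert = growth > GROWTH_ALERT_THRESHOLD
print(f"{growth:,.0f} series/hour, alert={should_alert}")
```

Because prometheus_tsdb_head_series is a gauge that can fall when stale series are garbage-collected, net growth can understate the true rate of series churn.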

Practical Applications

  • Stripe: Likely uses careful label selection and cardinality management for billing metrics across millions of customers.
  • Pitfall: Adding request IDs as labels without considering the request rate can quickly overwhelm a Prometheus instance.
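
The request-ID pitfall comes down to simple arithmetic: if every request carries a unique ID, each request creates at least one new series. A minimal sketch, assuming one labeled metric per request and all other labels held constant (the request rates are hypothetical):

```python
def new_series_per_hour(requests_per_second, metrics_per_request=1):
    """New series created per hour when every request has a unique ID label."""
    return requests_per_second * 3600 * metrics_per_request

POD_ID_INCIDENT_RATE = 150_000  # series/hour reported for the pod_id incident

for rps in (10, 100, 1000):
    rate = new_series_per_hour(rps)
    ratio = rate / POD_ID_INCIDENT_RATE
    print(f"{rps} req/s -> {rate:,} new series/hour ({ratio:.1f}x the pod_id incident)")
```

Under these assumptions, even a modest 100 requests per second exceeds the series growth that caused the pod_id incident.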
