The 6 Questions to Ask Before Adding a High-Cardinality Label
These articles are AI-generated summaries. Please check the original sources for full details.
The Checklist
A team recently added a pod_id label to debug a networking issue, resulting in a surge of 150,000 new time series per hour in Prometheus. This rapid increase in cardinality caused a production incident and highlighted the importance of carefully considering label additions.
Why This Matters
Monitoring systems such as Prometheus work best when label cardinality is bounded, but real-world systems often produce labels with a very large or ever-growing set of values, which can overwhelm memory and storage. Uncontrolled cardinality growth can lead to performance degradation, increased storage costs, and even system outages, as demonstrated by the team’s 3-hour production incident.
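To make the idea of bounded versus unbounded cardinality concrete, here is a minimal sketch of the usual worst-case estimate: the number of series for one metric is at most the product of the distinct values of each label. The label names and counts are hypothetical and chosen only for illustration.

```python
from math import prod

def worst_case_series(label_value_counts: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_value_counts.values())

# Hypothetical label sets (assumed numbers, not taken from the incident).
bounded = {"method": 5, "status_code": 8, "service": 20}
with_pod_id = {**bounded, "pod_id": 500}

print(worst_case_series(bounded))      # 800
print(worst_case_series(with_pod_id))  # 400000
```

The bound is static. A label such as pod_id is worse than the product suggests if its values keep changing as pods are replaced, because the set of distinct values then grows over time instead of staying fixed.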
Key Insights
- 150,000 series/hour: Rate of new time series created by the pod_id label addition.
- Prometheus vs. ClickHouse: Prometheus pays cardinality costs at write time (memory), while ClickHouse incurs costs at query time (aggregation).
- metric_relabel_configs: Prometheus feature for dropping labels to mitigate cardinality explosions.
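The effect of dropping a label can be illustrated with a small simulation. This is a sketch of the counting only, not of Prometheus itself: the metric name, endpoints, and pod count are hypothetical, and `drop_label` is an illustrative helper rather than a Prometheus API.

```python
def drop_label(series: list[dict[str, str]], label: str) -> set[tuple]:
    """Return the distinct label sets left after removing one label."""
    remaining = set()
    for labels in series:
        kept = {k: v for k, v in labels.items() if k != label}
        remaining.add(tuple(sorted(kept.items())))
    return remaining

# Hypothetical series: 3 endpoints x 1000 pods.
series = [
    {"__name__": "http_requests_total", "endpoint": ep, "pod_id": f"pod-{i}"}
    for ep in ("/a", "/b", "/c")
    for i in range(1000)
]

before = len(series)
after = len(drop_label(series, "pod_id"))
print(before, after)  # 3000 3
```

One caveat: dropping a label in Prometheus does not aggregate values. The Prometheus documentation warns that metrics must still be uniquely labeled once labels are removed, so a drop rule is only safe when the remaining labels still distinguish every sample in a scrape.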
Working Example
# Metrics to monitor for cardinality issues in Prometheus
prometheus_tsdb_head_series # Active series count
prometheus_tsdb_head_chunks_created_total # Chunks created (counter; apply rate() for new chunks per second)
prometheus_tsdb_symbol_table_size_bytes # Memory for interned strings
process_resident_memory_bytes # Resident memory of the Prometheus process
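The metrics above describe the server as a whole. To find which metric is responsible for most of the series, one option is to count sample lines per metric name in a target's exposition output. The sketch below assumes the plain Prometheus text format and uses a made-up sample; `series_per_metric` is a hypothetical helper, and the parsing is deliberately simple (it does not handle every edge case of the format).

```python
from collections import Counter

def series_per_metric(exposition_text: str) -> Counter:
    """Count sample lines per metric name in Prometheus text exposition format."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts

# Made-up exposition output for illustration.
sample = """\
# HELP http_requests_total Total requests.
# TYPE http_requests_total counter
http_requests_total{pod_id="a1",code="200"} 10
http_requests_total{pod_id="b2",code="200"} 7
http_requests_total{pod_id="c3",code="500"} 1
process_resident_memory_bytes 1.5e+07
"""

top = series_per_metric(sample).most_common(1)
print(top)  # [('http_requests_total', 3)]
```

A metric whose count grows with the number of pods or requests is a likely candidate for the label that needs dropping.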
Practical Applications
- Stripe: Likely uses careful label selection and cardinality management for billing metrics across millions of customers.
- Pitfall: Adding request IDs as labels without considering the request rate can quickly overwhelm a Prometheus instance.
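The request ID pitfall comes down to simple arithmetic: every request carries a fresh ID, so every request creates a new series. The sketch below assumes one new series per request per labeled metric; the 100 requests/second figure is hypothetical, while 150,000 series/hour is the rate reported for the pod_id incident.

```python
SECONDS_PER_HOUR = 3600

def new_series_per_hour(requests_per_second: float, metrics_with_label: int = 1) -> float:
    """Each unique request ID yields a new series on every metric carrying the label."""
    return requests_per_second * SECONDS_PER_HOUR * metrics_with_label

def rate_matching(series_per_hour: float) -> float:
    """Request rate that would create the given number of new series per hour."""
    return series_per_hour / SECONDS_PER_HOUR

print(new_series_per_hour(100))          # 360000
print(round(rate_matching(150_000), 1))  # 41.7
```

Under these assumptions, a service handling only about 42 requests per second would create new series as fast as the pod_id label did.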