Applying ESR to the Telemetry Platform's Query Patterns
Applying ESR to the Telemetry Platform’s Query Patterns
The Symptom
The telemetry platform has 14 indexes on the readings collection. Several queries still show SORT stages in their explain plans. The index overhead is significant: 48 GB of index storage for a 68 GB collection. Some indexes overlap, and the most critical queries are not using the optimal index.
The Cause
Indexes were added reactively over 18 months. When a query was slow, a developer added an index. Nobody audited existing indexes for redundancy or checked whether the field order matched the query pattern.
Current indexes:
{ sensorId: 1 } // Used by 3 queries
{ sensorId: 1, ts: 1 } // Used by 2 queries
{ sensorId: 1, ts: -1 } // Used by 1 query, redundant with above
{ ts: 1 } // Used by 1 query
{ sensorId: 1, temperature: 1 } // Used by 0 queries (orphaned)
{ sensorId: 1, temperature: 1, ts: -1 } // Used by 1 query, wrong ESR order
{ building: 1, floor: 1, ts: -1 } // Used by 2 queries
// ... 7 more
The Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 3, time = 5)
@Measurement(iterations = 5, time = 10)
@Fork(1)
@State(Scope.Benchmark)
public class ESRIndexBenchmark {
private MongoCollection<Document> collection;
@Setup
public void setup() {
MongoClient client = MongoClients.create("mongodb://localhost:27017");
collection = client.getDatabase("telemetry").getCollection("readings");
}
@Benchmark
public List<Document> wrongOrder_RangeBeforeSort() {
// Uses index {sensorId: 1, temperature: 1, ts: -1}
return collection.find(Filters.and(
Filters.eq("sensorId", "sensor-00042"),
Filters.gt("temperature", 30.0)
)).sort(Sorts.descending("ts"))
.limit(50)
.hint(new Document("sensorId", 1).append("temperature", 1).append("ts", -1))
.into(new ArrayList<>());
}
@Benchmark
public List<Document> correctOrder_ESR() {
// Uses index {sensorId: 1, ts: -1, temperature: 1}
return collection.find(Filters.and(
Filters.eq("sensorId", "sensor-00042"),
Filters.gt("temperature", 30.0)
)).sort(Sorts.descending("ts"))
.limit(50)
.hint(new Document("sensorId", 1).append("ts", -1).append("temperature", 1))
.into(new ArrayList<>());
}
@Benchmark
public List<Document> sensorByLocation() {
// Query: readings from Building HQ, Floor 3, last hour, sorted by time
return collection.find(Filters.and(
Filters.eq("sensor.building", "HQ"),
Filters.eq("sensor.floor", 3),
Filters.gte("ts", Date.from(Instant.now().minus(1, ChronoUnit.HOURS)))
)).sort(Sorts.descending("ts"))
.limit(100)
.into(new ArrayList<>());
}
}
Results:
Benchmark Mode Cnt Score Error Units
ESRIndexBenchmark.wrongOrder_RangeBeforeSort avgt 5 45000.000 ± 3000.000 us/op
ESRIndexBenchmark.correctOrder_ESR avgt 5 1800.000 ± 200.000 us/op
ESRIndexBenchmark.sensorByLocation avgt 5 3200.000 ± 400.000 us/op
The ESR-ordered index is 25x faster for the temperature range query.
The Fix
Consolidate the 14 indexes into 5 that cover all query patterns with ESR ordering:
// FAST: Consolidated ESR-ordered indexes
collection.createIndex(
new Document("sensorId", 1).append("ts", -1).append("temperature", 1),
new IndexOptions().name("idx_sensor_time_temp")
);
collection.createIndex(
new Document("sensor.building", 1).append("sensor.floor", 1)
.append("ts", -1),
new IndexOptions().name("idx_location_time")
);
collection.createIndex(
new Document("ts", -1),
new IndexOptions().name("idx_time")
);
collection.createIndex(
new Document("sensorId", 1).append("sensor.type", 1).append("ts", -1),
new IndexOptions().name("idx_sensor_type_time")
);
// Partial index for active alerts only
collection.createIndex(
new Document("alertStatus", 1).append("ts", -1),
new IndexOptions()
.name("idx_active_alerts")
.partialFilterExpression(
Filters.eq("alertStatus", "active")
)
);
Index consolidation strategy:
{sensorId: 1}is a prefix of{sensorId: 1, ts: -1, temperature: 1}and is no longer needed as a separate index.{sensorId: 1, ts: 1}and{sensorId: 1, ts: -1}are merged. For queries that sort ascending, the B-tree can be traversed in reverse, though with a 10-15% overhead. Keep the descending version since that is the dominant query pattern.{sensorId: 1, temperature: 1}has zero usage. Drop it.{sensorId: 1, temperature: 1, ts: -1}is replaced by the ESR-ordered{sensorId: 1, ts: -1, temperature: 1}.
The Proof
| Metric | Before (14 indexes) | After (5 indexes) |
|---|---|---|
| Total index size | 48 GB | 22 GB |
| Sensor + time + range query | 45ms | 1.8ms |
| Location + time query | 12ms | 3.2ms |
| Write latency (insert) | 4.5ms | 2.1ms |
| Index build time | N/A | 12 min (initial build) |
Write latency improved by 53% because each insert updates 5 indexes instead of 14.
The Trade-off
Fewer indexes means some secondary query patterns may not have a perfectly matching index. A query on {temperature: 1} alone cannot use the compound {sensorId: 1, ts: -1, temperature: 1} index (temperature is not a prefix). If that query pattern is important, it needs its own index. The consolidation prioritizes the high-frequency queries (sensor dashboards, location-based views, time-range analytics) at the expense of rare ad-hoc queries. For rare queries, a collection scan with a moderate-sized working set is acceptable, or they can be routed to an analytics replica.