Applying ESR to the Telemetry Platform’s Query Patterns

The Symptom

The telemetry platform has 14 indexes on the readings collection. Several queries still show SORT stages in their explain plans. The index overhead is significant: 48 GB of index storage for a 68 GB collection. Some indexes overlap, and the most critical queries are not using the optimal index.

The Cause

Indexes were added reactively over 18 months. When a query was slow, a developer added an index. Nobody audited existing indexes for redundancy or checked whether the field order matched the query pattern.

Current indexes:

{ sensorId: 1 }                           // Used by 3 queries
{ sensorId: 1, ts: 1 }                    // Used by 2 queries
{ sensorId: 1, ts: -1 }                   // Used by 1 query, redundant with above
{ ts: 1 }                                 // Used by 1 query
{ sensorId: 1, temperature: 1 }           // Used by 0 queries (orphaned)
{ sensorId: 1, temperature: 1, ts: -1 }   // Used by 1 query, wrong ESR order
{ building: 1, floor: 1, ts: -1 }         // Used by 2 queries
// ... 7 more

The Benchmark

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 3, time = 5)
@Measurement(iterations = 5, time = 10)
@Fork(1)
@State(Scope.Benchmark)
public class ESRIndexBenchmark {

    private MongoCollection<Document> collection;

    @Setup
    public void setup() {
        MongoClient client = MongoClients.create("mongodb://localhost:27017");
        collection = client.getDatabase("telemetry").getCollection("readings");
    }

    @Benchmark
    public List<Document> wrongOrder_RangeBeforeSort() {
        // Uses index {sensorId: 1, temperature: 1, ts: -1}
        return collection.find(Filters.and(
            Filters.eq("sensorId", "sensor-00042"),
            Filters.gt("temperature", 30.0)
        )).sort(Sorts.descending("ts"))
          .limit(50)
          .hint(new Document("sensorId", 1).append("temperature", 1).append("ts", -1))
          .into(new ArrayList<>());
    }

    @Benchmark
    public List<Document> correctOrder_ESR() {
        // Uses index {sensorId: 1, ts: -1, temperature: 1}
        return collection.find(Filters.and(
            Filters.eq("sensorId", "sensor-00042"),
            Filters.gt("temperature", 30.0)
        )).sort(Sorts.descending("ts"))
          .limit(50)
          .hint(new Document("sensorId", 1).append("ts", -1).append("temperature", 1))
          .into(new ArrayList<>());
    }

    @Benchmark
    public List<Document> sensorByLocation() {
        // Query: readings from Building HQ, Floor 3, last hour, sorted by time
        return collection.find(Filters.and(
            Filters.eq("sensor.building", "HQ"),
            Filters.eq("sensor.floor", 3),
            Filters.gte("ts", Date.from(Instant.now().minus(1, ChronoUnit.HOURS)))
        )).sort(Sorts.descending("ts"))
          .limit(100)
          .into(new ArrayList<>());
    }
}

Results:

Benchmark                                     Mode  Cnt      Score     Error   Units
ESRIndexBenchmark.wrongOrder_RangeBeforeSort  avgt    5  45000.000 ± 3000.000  us/op
ESRIndexBenchmark.correctOrder_ESR            avgt    5   1800.000 ±  200.000  us/op
ESRIndexBenchmark.sensorByLocation            avgt    5   3200.000 ±  400.000  us/op

The ESR-ordered index is 25x faster for the temperature range query.

The Fix

Consolidate the 14 indexes into 5 that cover all query patterns with ESR ordering:

// FAST: Consolidated ESR-ordered indexes
collection.createIndex(
    new Document("sensorId", 1).append("ts", -1).append("temperature", 1),
    new IndexOptions().name("idx_sensor_time_temp")
);

collection.createIndex(
    new Document("sensor.building", 1).append("sensor.floor", 1)
        .append("ts", -1),
    new IndexOptions().name("idx_location_time")
);

collection.createIndex(
    new Document("ts", -1),
    new IndexOptions().name("idx_time")
);

collection.createIndex(
    new Document("sensorId", 1).append("sensor.type", 1).append("ts", -1),
    new IndexOptions().name("idx_sensor_type_time")
);

// Partial index for active alerts only
collection.createIndex(
    new Document("alertStatus", 1).append("ts", -1),
    new IndexOptions()
        .name("idx_active_alerts")
        .partialFilterExpression(
            Filters.eq("alertStatus", "active")
        )
);

Index consolidation strategy:

{sensorId: 1} is a prefix of {sensorId: 1, ts: -1, temperature: 1} and is no longer needed as a separate index.
{sensorId: 1, ts: 1} and {sensorId: 1, ts: -1} are merged. For queries that sort ascending, the B-tree can be traversed in reverse, though with a 10-15% overhead. Keep the descending version since that is the dominant query pattern.
{sensorId: 1, temperature: 1} has zero usage. Drop it.
{sensorId: 1, temperature: 1, ts: -1} is replaced by the ESR-ordered {sensorId: 1, ts: -1, temperature: 1}.

The Proof

Metric	Before (14 indexes)	After (5 indexes)
Total index size	48 GB	22 GB
Sensor + time + range query	45ms	1.8ms
Location + time query	12ms	3.2ms
Write latency (insert)	4.5ms	2.1ms
Index build time	N/A	12 min (initial build)

Write latency improved by 53% because each insert updates 5 indexes instead of 14.

The Trade-off

Fewer indexes means some secondary query patterns may not have a perfectly matching index. A query on {temperature: 1} alone cannot use the compound {sensorId: 1, ts: -1, temperature: 1} index (temperature is not a prefix). If that query pattern is important, it needs its own index. The consolidation prioritizes the high-frequency queries (sensor dashboards, location-based views, time-range analytics) at the expense of rare ad-hoc queries. For rare queries, a collection scan with a moderate-sized working set is acceptable, or they can be routed to an analytics replica.