Skip to main content
unbound mongodb at scale

Multi-Key Indexes and the $in Selectivity Trap

4 min read Chapter 30 of 72

Multi-Key Indexes and the $in Selectivity Trap

The Symptom

The alerting system queries readings for a set of sensors in a specific temperature range:

List<Document> results = collection.find(Filters.and(
    Filters.in("sensorId", sensorIds),     // 50-500 sensor IDs
    Filters.gt("temperature", threshold),
    Filters.gte("ts", Date.from(cutoff))
)).sort(Sorts.descending("ts"))
 .limit(100)
 .into(new ArrayList<>());

With the ESR-ordered index {sensorId: 1, ts: -1, temperature: 1}, this query takes 2ms when sensorIds has 5 entries but 850ms when it has 500 entries. The explain plan shows 500 separate index scans being merged.

The Cause

MongoDB treats $in as multiple equality matches. For a compound index {sensorId: 1, ts: -1, temperature: 1} and $in with 500 values, MongoDB performs 500 index scans (one per sensorId value), then merges the results. Each scan traverses the ts portion of the index to maintain sort order. With 500 scans, the merge operation becomes the bottleneck.

// explain() output for $in with 500 values
{
  "winningPlan": {
    "stage": "FETCH",
    "inputStage": {
      "stage": "SORT_MERGE",      // Merging 500 index scans
      "inputStages": [
        // 500 IXSCAN stages, one per sensorId
        { "stage": "IXSCAN", "indexBounds": { "sensorId": ["[\"sensor-00001\", ...]"] } },
        // ... 499 more
      ]
    }
  },
  "executionStats": {
    "totalKeysExamined": 125000,
    "totalDocsExamined": 100,
    "executionTimeMillis": 850
  }
}

125,000 keys examined for 100 returned documents.

The Benchmark

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 3, time = 5)
@Measurement(iterations = 5, time = 10)
@Fork(1)
@State(Scope.Benchmark)
public class InQueryBenchmark {

    private MongoCollection<Document> collection;

    @Param({"5", "50", "100", "500"})
    private int inListSize;

    @Setup
    public void setup() {
        MongoClient client = MongoClients.create("mongodb://localhost:27017");
        collection = client.getDatabase("telemetry").getCollection("readings");
    }

    @Benchmark
    public List<Document> inQueryWithSort() {
        List<String> sensorIds = IntStream.range(0, inListSize)
            .mapToObj(i -> String.format("sensor-%05d", i))
            .toList();

        return collection.find(Filters.and(
            Filters.in("sensorId", sensorIds),
            Filters.gt("temperature", 30.0),
            Filters.gte("ts", Date.from(Instant.now().minus(1, ChronoUnit.HOURS)))
        )).sort(Sorts.descending("ts"))
          .limit(100)
          .into(new ArrayList<>());
    }

    @Benchmark
    public List<Document> inQueryWithoutSort() {
        List<String> sensorIds = IntStream.range(0, inListSize)
            .mapToObj(i -> String.format("sensor-%05d", i))
            .toList();

        return collection.find(Filters.and(
            Filters.in("sensorId", sensorIds),
            Filters.gt("temperature", 30.0),
            Filters.gte("ts", Date.from(Instant.now().minus(1, ChronoUnit.HOURS)))
        )).limit(100)
          .into(new ArrayList<>());
    }
}

Results:

Benchmark                                (inListSize)  Mode  Cnt    Score    Error  Units
InQueryBenchmark.inQueryWithSort                    5  avgt    5    2.000 ±  0.300  ms/op
InQueryBenchmark.inQueryWithSort                   50  avgt    5   45.000 ±  5.000  ms/op
InQueryBenchmark.inQueryWithSort                  100  avgt    5  180.000 ± 15.000  ms/op
InQueryBenchmark.inQueryWithSort                  500  avgt    5  850.000 ± 60.000  ms/op
InQueryBenchmark.inQueryWithoutSort                 5  avgt    5    1.500 ±  0.200  ms/op
InQueryBenchmark.inQueryWithoutSort                50  avgt    5    8.000 ±  1.000  ms/op
InQueryBenchmark.inQueryWithoutSort               100  avgt    5   15.000 ±  2.000  ms/op
InQueryBenchmark.inQueryWithoutSort               500  avgt    5   65.000 ±  8.000  ms/op

Without sort, the $in with 500 values takes 65ms (acceptable). With sort, it takes 850ms because the SORT_MERGE of 500 index scans dominates.

The Fix

Two approaches for large $in lists.

Approach 1: Batch the $in list and sort client-side.

// FAST: Batch $in queries and sort in application
public List<Document> getAlertReadings(List<String> sensorIds, double threshold,
                                        Instant cutoff, int limit) {
    int batchSize = 25;
    List<Document> allResults = new ArrayList<>();

    for (int i = 0; i < sensorIds.size(); i += batchSize) {
        List<String> batch = sensorIds.subList(i, Math.min(i + batchSize, sensorIds.size()));

        List<Document> batchResults = collection.find(Filters.and(
            Filters.in("sensorId", batch),
            Filters.gt("temperature", threshold),
            Filters.gte("ts", Date.from(cutoff))
        )).sort(Sorts.descending("ts"))
          .limit(limit)
          .into(new ArrayList<>());

        allResults.addAll(batchResults);
    }

    // Client-side sort and limit
    allResults.sort((a, b) -> b.getDate("ts").compareTo(a.getDate("ts")));
    return allResults.subList(0, Math.min(limit, allResults.size()));
}

With batches of 25, each query performs a 25-way SORT_MERGE, which takes approximately 10ms. For 500 sensors, 20 batches execute in 200ms total instead of a single 850ms query.

Approach 2: Use an aggregation pipeline with $group to avoid SORT_MERGE.

// FAST: Aggregation avoids SORT_MERGE for large $in
List<Document> results = collection.aggregate(List.of(
    Aggregates.match(Filters.and(
        Filters.in("sensorId", sensorIds),
        Filters.gt("temperature", threshold),
        Filters.gte("ts", Date.from(cutoff))
    )),
    Aggregates.sort(Sorts.descending("ts")),
    Aggregates.limit(limit)
)).into(new ArrayList<>());

The aggregation framework may choose a different execution strategy (block sort instead of SORT_MERGE) that performs better for large $in lists.

The Proof

Query pattern$in sizeBeforeAfter (batched)
With sort5045ms20ms (2 batches)
With sort100180ms40ms (4 batches)
With sort500850ms200ms (20 batches)

The Trade-off

Batching adds application complexity and increases the total number of queries. For 500 sensors in batches of 25, the application sends 20 queries instead of 1. Each query is fast, but the serial execution adds round trip overhead. In a cross-region deployment with 30ms round trip time, 20 queries add 600ms of network latency. Parallelize the batches with CompletableFuture to reduce the wall-clock time:

List<CompletableFuture<List<Document>>> futures = new ArrayList<>();
for (int i = 0; i < sensorIds.size(); i += batchSize) {
    List<String> batch = sensorIds.subList(i, Math.min(i + batchSize, sensorIds.size()));
    futures.add(CompletableFuture.supplyAsync(() ->
        collection.find(Filters.and(
            Filters.in("sensorId", batch),
            Filters.gt("temperature", threshold),
            Filters.gte("ts", Date.from(cutoff))
        )).sort(Sorts.descending("ts")).limit(limit).into(new ArrayList<>()),
        executor
    ));
}

This trades connection pool capacity for latency. Each parallel batch consumes a connection from the pool (sized in CH4).