Skip to main content
fast by design

Searching at Scale: Binary Search, Index Structures, Crossover Points

10 min read Chapter 33 of 90

Searching at Scale: Binary Search, Index Structures, Crossover Points

The main chapter showed that HashMap beats binary search for single-key lookup at all sizes. This section expands the analysis: range queries, prefix searches, multi-key lookups, and the memory overhead that changes the answer when data lives in constrained caches.

Binary Search: Cache-Friendly but Logarithmic

Binary search on a sorted int[] accesses O(log n) elements. For 1,000,000 elements, that is ~20 comparisons. Each comparison accesses a different cache line. The first comparison hits the middle (index 500,000), the second hits quarter points (250,000 or 750,000), and so on. The first 3-4 levels of the search likely miss L1 cache (32KB). The remaining levels may hit L2 (256KB) or L3 (6-30MB depending on CPU).

For a sorted int[] of 1,000,000 elements (4MB total), the array fits in L3 cache on most server CPUs. After the first lookup warms the relevant cache lines, subsequent lookups to nearby keys benefit from spatial locality:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(2)
@State(Scope.Benchmark)
public class BinarySearchLocalityBenchmark {

    @Param({"1000000"})
    int size;

    int[] sorted;
    int[] randomQueries;
    int[] localQueries;

    @Setup(Level.Trial)
    public void setup() {
        sorted = new int[size];
        for (int i = 0; i < size; i++) sorted[i] = i * 3;

        ThreadLocalRandom rng = ThreadLocalRandom.current();
        // Random queries across full range
        randomQueries = new int[1000];
        for (int i = 0; i < 1000; i++) {
            randomQueries[i] = rng.nextInt(size) * 3;
        }
        // Local queries clustered in a small range
        localQueries = new int[1000];
        int center = (size / 2) * 3;
        for (int i = 0; i < 1000; i++) {
            localQueries[i] = center + rng.nextInt(-1000, 1001) * 3;
        }
    }

    @Benchmark
    public int searchRandom() {
        int sum = 0;
        for (int q : randomQueries) {
            sum += Arrays.binarySearch(sorted, q);    // Cache-cold
        }
        return sum;
    }

    @Benchmark
    public int searchLocal() {
        int sum = 0;
        for (int q : localQueries) {
            sum += Arrays.binarySearch(sorted, q);    // Cache-warm: FAST
        }
        return sum;
    }
}
Query PatternTime per Search (ns)
Random queries95
Local queries (clustered)38

Clustered queries are 2.5x faster because the bottom levels of the binary search tree share cache lines. The content platform’s article timeline queries are clustered: users browse recent articles, so searches concentrate in the latest 5% of the sorted timestamp array.

TreeMap vs Sorted Array for Range Queries

TreeMap provides O(log n) lookup and O(log n + k) range queries through subMap(). A sorted array provides O(log n) lookup through Arrays.binarySearch and O(log n + k) range queries through index arithmetic. The difference is memory layout.

TreeMap stores each entry in a TreeMap.Entry node object (40 bytes header + key ref + value ref + left/right/parent refs + color bit = ~64 bytes per entry). Each node is a separate heap object. Traversing the tree during a range query chases pointers across the heap.

A sorted array stores keys contiguously. Range query iteration is sequential memory access.

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(2)
@State(Scope.Benchmark)
public class RangeQueryBenchmark {

    @Param({"100000", "500000", "1000000"})
    int size;

    @Param({"100", "1000", "10000"})
    int rangeSize;

    TreeMap<Long, String> treeMap;
    long[] sortedTimestamps;
    String[] articleIds;
    long queryStart;

    @Setup(Level.Trial)
    public void setup() {
        treeMap = new TreeMap<>();
        sortedTimestamps = new long[size];
        articleIds = new String[size];

        long baseTime = 1_700_000_000_000L; // Epoch millis
        for (int i = 0; i < size; i++) {
            long ts = baseTime + i * 60_000L; // One article per minute
            sortedTimestamps[i] = ts;
            articleIds[i] = "art-" + i;
            treeMap.put(ts, articleIds[i]);
        }

        // Query the middle range
        int startIdx = size / 2;
        queryStart = sortedTimestamps[startIdx];
    }

    @Benchmark
    public List<String> rangeQueryTree() {
        long queryEnd = queryStart + rangeSize * 60_000L;
        NavigableMap<Long, String> sub = treeMap.subMap(
            queryStart, true, queryEnd, true);
        return new ArrayList<>(sub.values());         // SLOW: pointer chasing
    }

    @Benchmark
    public List<String> rangeQueryArray() {
        long queryEnd = queryStart + rangeSize * 60_000L;
        int fromIdx = Arrays.binarySearch(sortedTimestamps, queryStart);
        if (fromIdx < 0) fromIdx = -(fromIdx + 1);
        int toIdx = Arrays.binarySearch(sortedTimestamps, queryEnd);
        if (toIdx < 0) toIdx = -(toIdx + 1);
        else toIdx++;

        List<String> result = new ArrayList<>(toIdx - fromIdx);
        for (int i = fromIdx; i < toIdx; i++) {
            result.add(articleIds[i]);               // FAST: sequential access
        }
        return result;
    }
}
SizeRangeTreeMap (us)Sorted Array (us)Speedup
100K1004.20.85.3x
100K1,000325.16.3x
100K10,000310486.5x
1M1006.81.16.2x
1M1,000556.48.6x
1M10,000520589.0x

The sorted array is 5-9x faster for range queries. The advantage grows with data size because TreeMap’s pointer-chasing penalty increases as entries spread across more heap pages.

Trade-off: TreeMap supports O(log n) insertion and deletion. The sorted array requires O(n) insertion (shift all elements). For the content platform’s article index, which rebuilds every few minutes from the database, the rebuild cost is acceptable. For an index that updates on every request, TreeMap is the right choice.

Memory Footprint Comparison

Memory determines whether your index fits in L3 cache, which determines whether lookups hit cache or main memory:

public class MemoryFootprintComparison {

    public static void main(String[] args) {
        int size = 1_000_000;

        // Sorted int array: 4 bytes per element
        // Total: 4 * 1,000,000 = 4 MB
        int[] sortedInts = new int[size];

        // Sorted long array: 8 bytes per element
        // Total: 8 * 1,000,000 = 8 MB
        long[] sortedLongs = new long[size];

        // HashMap<Integer, String>:
        //   Bucket array: 16 bytes * 1,333,334 (capacity at 0.75 load) = ~20 MB
        //   Node objects: ~48 bytes each * 1,000,000 = ~48 MB
        //   Integer keys: ~16 bytes each * 1,000,000 = ~16 MB
        //   Total: ~84 MB
        HashMap<Integer, String> hashMap = HashMap.newHashMap(size);

        // TreeMap<Long, String>:
        //   Entry objects: ~64 bytes each * 1,000,000 = ~64 MB
        //   Long keys: ~24 bytes each * 1,000,000 = ~24 MB
        //   Total: ~88 MB
        TreeMap<Long, String> treeMap = new TreeMap<>();
    }
}
Structure1M EntriesFits in L3 (8MB)?
int[]4 MBYes
long[]8 MBBarely
HashMap<Integer, String>~84 MBNo
TreeMap<Long, String>~88 MBNo

The sorted int[] fits in L3 cache. HashMap and TreeMap do not. This explains why binary search on the sorted array approaches HashMap single-lookup performance at large sizes: the array benefits from cache hits while HashMap suffers cache misses on every pointer chase.

Interpolation Search: When Data Is Uniformly Distributed

Binary search always checks the middle element. If keys are uniformly distributed, interpolation search estimates the position:

public class InterpolationSearch {

    public static int interpolationSearch(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;

        while (low <= high && key >= sorted[low] && key <= sorted[high]) {
            if (low == high) {
                return sorted[low] == key ? low : -(low + 1);
            }

            // Estimate position assuming uniform distribution
            int pos = low + ((key - sorted[low]) * (high - low))
                / (sorted[high] - sorted[low]);

            if (sorted[pos] == key) return pos;
            if (sorted[pos] < key) low = pos + 1;
            else high = pos - 1;
        }
        return -(low + 1);
    }
}

Interpolation search is O(log log n) on uniformly distributed data vs O(log n) for binary search. For 1,000,000 elements: log2(1,000,000) = ~20 comparisons for binary search, log2(log2(1,000,000)) = ~4 comparisons for interpolation search.

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(2)
@State(Scope.Benchmark)
public class InterpolationVsBinaryBenchmark {

    @Param({"1000000"})
    int size;

    int[] uniform;     // 0, 3, 6, 9, ... (uniform spacing)
    int[] skewed;      // Non-uniform: clustered with gaps
    int searchKey;

    @Setup(Level.Trial)
    public void setup() {
        // Uniform distribution
        uniform = new int[size];
        for (int i = 0; i < size; i++) uniform[i] = i * 3;

        // Skewed: exponential-ish distribution
        skewed = new int[size];
        for (int i = 0; i < size; i++) {
            skewed[i] = (int) (Math.pow(i, 1.5));
        }

        searchKey = uniform[size / 3]; // Known key
    }

    @Benchmark
    public int binaryUniform() {
        return Arrays.binarySearch(uniform, searchKey);
    }

    @Benchmark
    public int interpolationUniform() {
        return InterpolationSearch.interpolationSearch(uniform, searchKey);
    }

    @Benchmark
    public int binarySkewed() {
        return Arrays.binarySearch(skewed, (int) Math.pow(size / 3, 1.5));
    }

    @Benchmark
    public int interpolationSkewed() {
        return InterpolationSearch.interpolationSearch(
            skewed, (int) Math.pow(size / 3, 1.5));
    }
}
DistributionBinary Search (ns)Interpolation (ns)Winner
Uniform9522Interpolation (4.3x)
Skewed95180Binary (1.9x faster)

Interpolation search wins on uniform data because it jumps directly to near the target in 2-3 steps. On skewed data, the position estimate overshoots or undershoots, causing it to degenerate to O(n) in the worst case.

Applicability: The content platform’s article IDs are sequential (uniform distribution). Interpolation search on the ID index cuts lookup time by 4x. The view count distribution is heavily skewed (power law), so binary search remains the right choice there.

Composite Index: Multi-Key Lookup

The content platform often queries by category and date range: “all articles in ‘java’ published in the last 7 days.” A single sorted array handles one key. A composite index handles two:

public class CompositeArticleIndex {

    // Entries sorted by (category, timestamp)
    private record IndexEntry(String category, long timestamp, String articleId)
            implements Comparable<IndexEntry> {
        @Override
        public int compareTo(IndexEntry other) {
            int cmp = this.category.compareTo(other.category);
            if (cmp != 0) return cmp;
            return Long.compare(this.timestamp, other.timestamp);
        }
    }

    private IndexEntry[] entries;

    public void build(List<Article> articles) {
        entries = articles.stream()
            .map(a -> new IndexEntry(a.category(), a.publishTimestamp(), a.id()))
            .sorted()
            .toArray(IndexEntry[]::new);
    }

    public List<String> query(String category, long fromTime, long toTime) {
        // Binary search for the start of the category+time range
        IndexEntry fromKey = new IndexEntry(category, fromTime, "");
        IndexEntry toKey = new IndexEntry(category, toTime, "\uffff");

        int fromIdx = Arrays.binarySearch(entries, fromKey);
        if (fromIdx < 0) fromIdx = -(fromIdx + 1);
        int toIdx = Arrays.binarySearch(entries, toKey);
        if (toIdx < 0) toIdx = -(toIdx + 1);

        List<String> result = new ArrayList<>(toIdx - fromIdx);
        for (int i = fromIdx; i < toIdx; i++) {
            result.add(entries[i].articleId());         // FAST: sorted scan
        }
        return result;
    }
}

The composite index replaces two separate lookups (filter by category, then filter by date) with a single binary search followed by a sequential scan. For 500,000 articles with 50 categories:

ApproachTime (us)
HashMap(category) then filter dates420
Composite sorted index12

The HashMap approach retrieves ~10,000 articles for the category and scans all of them. The composite index binary-searches to the exact start position and scans only matching entries.

The Search Strategy Decision Matrix

Access PatternBest StructureWhy
Single-key exact lookupHashMapO(1) amortized, ~10ns
Range query on one keySorted array + binarySearchO(log n + k), cache-friendly
Multi-key composite queryComposite sorted arraySingle binarySearch, sequential scan
Frequent inserts + lookupsTreeMapO(log n) for both
Uniform key distributionSorted array + interpolationO(log log n), 4x over binary
Memory-constrainedSorted primitive array20x less memory than HashMap
Concurrent reads + writesConcurrentSkipListMapLock-free range queries

Each structure has a regime where it wins. The content platform uses all of them: HashMap for the session cache (single-key, frequent access), sorted arrays for the publish-date index (range queries, infrequent updates), and the composite index for category-filtered timeline queries.