Query Planning and Why Your Index Was Ignored
Query Planning and Why Your Index Was Ignored
The Black Box
The logistics platform’s delivery assignment query has an index on (warehouse_id, status). The query filters by both columns. The index should be used. EXPLAIN shows a sequential scan. The developer adds the index again, thinking it was not created. It already exists. The problem is not the index. The problem is the planner’s cost estimate.
The Mechanism
PostgreSQL’s query planner generates multiple candidate execution plans and estimates the cost of each. The cheapest plan wins. Cost is measured in abstract “cost units” that represent I/O and CPU work.
The two critical cost parameters:
seq_page_cost(default: 1.0): cost to read one page sequentiallyrandom_page_cost(default: 4.0): cost to read one page randomly (index lookup)
An index scan performs random I/O: jump to an index page, follow a pointer to a heap page, repeat. A sequential scan reads pages in order. The planner compares:
$$\text{Index cost} = \text{matching rows} \times \text{random_page_cost}$$ $$\text{Seq scan cost} = \text{total pages} \times \text{seq_page_cost}$$
When the filter matches a large fraction of the table, the sequential scan is cheaper because the total pages times 1.0 is less than the matching rows times 4.0.
-- Concept: EXPLAIN ANALYZE with BUFFERS for full visibility
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT package_id, status, assigned_driver
FROM delivery_assignments
WHERE warehouse_id = 'WH-042'
AND status = 'PENDING';
-- Plan A: Index Scan (planner's estimated cost: 12,432)
-- Index Scan using idx_assignments_warehouse_status on delivery_assignments
-- Index Cond: (warehouse_id = 'WH-042' AND status = 'PENDING')
-- Buffers: shared hit=142 read=18
-- Actual time: 0.8..2.1 ms
-- Rows: 342
-- Plan B: Sequential Scan (planner's estimated cost: 8,291) -- CHOSEN
-- Seq Scan on delivery_assignments
-- Filter: (warehouse_id = 'WH-042' AND status = 'PENDING')
-- Rows Removed by Filter: 892000
-- Buffers: shared hit=6200 read=1800
-- Actual time: 0.1..890.4 ms
-- Rows: 342
The planner chose the sequential scan (cost 8,291) over the index scan (cost 12,432). The sequential scan took 890ms. The index scan would have taken 2ms. The planner was wrong. Why?
Stale Statistics
The planner’s cost estimate depends on pg_statistic, which contains column statistics gathered by ANALYZE. If statistics are stale, the planner’s row count estimates are wrong, and it chooses the wrong plan.
-- Concept: stale statistics causing bad plan choices
-- Check when statistics were last gathered
SELECT
relname,
last_analyze,
last_autoanalyze,
n_live_tup,
n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'delivery_assignments';
-- relname | last_autoanalyze | n_live_tup | n_dead_tup
-- delivery_assignments | 2024-11-10 03:00:00 | 500000 | 120000
-- Autoanalyze ran 5 days ago. Since then, 400,000 new rows were inserted.
-- The planner thinks the table has 500,000 rows. It actually has 900,000.
-- The planner overestimates the fraction of rows matching status='PENDING'
-- because the statistics reflect the old distribution.
-- Fix: run ANALYZE manually
ANALYZE delivery_assignments;
-- After ANALYZE, re-run the query. The planner now has accurate statistics
-- and chooses the index scan.
Composite Index Column Order
A composite index on (warehouse_id, status) is a B-Tree sorted first by warehouse_id, then by status within each warehouse_id. The index can efficiently answer:
WHERE warehouse_id = 'WH-042'(leading column match)WHERE warehouse_id = 'WH-042' AND status = 'PENDING'(both columns match)
It cannot efficiently answer:
WHERE status = 'PENDING'(non-leading column only)
-- Concept: composite index column order determines usability
EXPLAIN ANALYZE
SELECT * FROM delivery_assignments WHERE status = 'PENDING';
-- The index on (warehouse_id, status) cannot be used.
-- The B-Tree is sorted by warehouse_id first. To find all 'PENDING' entries,
-- it would need to scan every warehouse_id subtree.
-- The planner correctly chooses a sequential scan.
-- To support queries filtering only by status, create a separate index:
CREATE INDEX idx_assignments_status ON delivery_assignments (status);
The Observable Consequence
Join Strategies
The planner chooses between three join algorithms, each with different performance characteristics:
| Strategy | When chosen | Cost |
|---|---|---|
| Nested Loop | One side has very few rows (<100) | $O(n \times m)$ but fast for small $n$ |
| Hash Join | No useful index, medium-sized tables | $O(n + m)$, builds hash table in memory |
| Merge Join | Both sides are sorted (index or pre-sorted) | $O(n + m)$ with no hash table |
-- Concept: join strategy visibility in EXPLAIN
EXPLAIN ANALYZE
SELECT da.package_id, da.assigned_driver, p.weight
FROM delivery_assignments da
JOIN packages p ON da.package_id = p.package_id
WHERE da.warehouse_id = 'WH-042';
-- Nested Loop (actual time=0.5..3.2 ms rows=342)
-- -> Index Scan using idx_assignments_warehouse on delivery_assignments da
-- Index Cond: (warehouse_id = 'WH-042')
-- Rows: 342
-- -> Index Scan using packages_pkey on packages p
-- Index Cond: (package_id = da.package_id)
-- Rows: 1 (per loop)
-- Total: 342 loops of 1 index lookup each. Fast because the outer side is small.
The Decision Rule
When a query uses a sequential scan and you have an index that should apply: run ANALYZE on the table first. If the plan does not change, check random_page_cost (set to 1.1 for SSDs). If the plan still does not change, check the selectivity of your filter. An index scan on a column where 60% of rows match is genuinely slower than a sequential scan.
For composite indexes, put the most selective column first if your queries always filter on both columns. If queries sometimes filter on only one column, you need a separate index on that column.
Never create an index and assume the planner will use it. Run EXPLAIN ANALYZE to verify. The planner’s decision is rational given its cost model and statistics. If the decision is wrong, the fix is usually stale statistics or a misconfigured cost parameter, not a missing index.