Skip to main content
fast by design

Query Performance: Execution Plans, Index Selection, and the Query the ORM Wrote

16 min read Chapter 49 of 90

Query Performance: Execution Plans, Index Selection, and the Query the ORM Wrote

The content platform serves 12,000 article search requests per second. Each search query joins articles, categories, and tags, filters by publication status, orders by relevance score, and returns 20 results. The JPA repository method that generates this query produces SQL the developer never reads. The database executes a plan the developer never sees. The query takes 45 ms. It should take 3 ms.

The gap between 45 ms and 3 ms is not hardware. It is not configuration. It is the query itself: what it asks for, how it asks for it, and the execution plan the database chooses. Understanding EXPLAIN ANALYZE output is the single most valuable skill for backend performance. Every other optimization (caching, pooling, sharding) is a workaround for queries that should have been fast in the first place.

This chapter covers execution plan reading, the specific queries the content platform’s ORM generates, and the tools for finding the queries that need attention.

EXPLAIN ANALYZE: What It Tells You

EXPLAIN shows the plan PostgreSQL would use. EXPLAIN ANALYZE runs the query and shows the plan with actual timing. Always use EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) for full diagnostic information:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT a.id, a.title, a.slug, a.published_at
FROM articles a
WHERE a.status = 'published'
  AND a.category_id = 5
ORDER BY a.published_at DESC
LIMIT 20;

Output:

Limit  (cost=0.43..12.87 rows=20 width=68) (actual time=0.052..0.091 rows=20 loops=1)
  ->  Index Scan Backward using idx_articles_published_at on articles a
        (cost=0.43..4521.33 rows=7284 width=68) (actual time=0.051..0.088 rows=20 loops=1)
        Filter: ((status = 'published') AND (category_id = 5))
        Rows Removed by Filter: 142
        Buffers: shared hit=12
Planning Time: 0.134 ms
Execution Time: 0.112 ms

Every line in this output conveys information:

  • cost=0.43..12.87: Estimated startup cost (0.43) and total cost (12.87) in arbitrary units. These are planner estimates, not wall-clock time.
  • rows=20: Estimated row count. Compare with actual ... rows=20 to check estimate accuracy.
  • width=68: Average row width in bytes. Wider rows mean more I/O.
  • actual time=0.052..0.091: Wall-clock time for first row (0.052 ms) and last row (0.091 ms).
  • loops=1: How many times this node executed. Nested loops can have loops=1000+.
  • Buffers: shared hit=12: 12 pages read from the buffer cache. shared read=N would mean N pages fetched from disk.
  • Rows Removed by Filter: 142: The scan read 162 rows but only 20 passed the filter. This indicates the index does not cover the full WHERE clause.
  • Planning Time: Time spent choosing the plan. High values suggest excessive table statistics or complex joins.
  • Execution Time: Total wall-clock execution time.

The Five Scan Types That Matter

PostgreSQL chooses a scan type for each table access. Understanding why it chooses each one tells you whether the right indexes exist.

Seq Scan (Sequential Scan)

Reads every row in the table, page by page:

Seq Scan on articles  (cost=0.00..18924.00 rows=500000 width=120)
  (actual time=0.012..89.341 rows=500000 loops=1)
  Buffers: shared hit=8924

A Seq Scan reads all 8,924 pages of the articles table. This is correct when you need most of the rows (a reporting query that aggregates all articles). It is wrong when you need 20 rows out of 500,000. PostgreSQL chooses Seq Scan when no suitable index exists, when the query returns a large fraction of the table, or when the planner’s cost estimate says sequential I/O is cheaper than random I/O for the expected row count.

Index Scan

Uses a B-tree index to find matching rows, then fetches each row from the heap (table) for columns not in the index:

Index Scan using idx_articles_slug on articles a
  (cost=0.42..8.44 rows=1 width=120) (actual time=0.031..0.032 rows=1 loops=1)
  Index Cond: (slug = 'performance-engineering-guide')
  Buffers: shared hit=4

Four buffer hits: the index traversal (root page, internal page, leaf page) plus one heap page fetch. Index Scan is optimal for highly selective queries (returning few rows relative to table size).

Bitmap Index Scan + Bitmap Heap Scan

A two-phase approach. First, scan the index and build a bitmap of matching pages. Second, fetch those pages from the heap in physical order:

Bitmap Heap Scan on articles a  (cost=52.18..4318.22 rows=2500 width=120)
  (actual time=0.891..5.432 rows=2487 loops=1)
  Recheck Cond: (category_id = 5)
  Filter: (status = 'published')
  Rows Removed by Filter: 13
  Heap Blocks: exact=1842
  Buffers: shared hit=1856
  ->  Bitmap Index Scan on idx_articles_category_id  (cost=0.00..51.56 rows=2500 width=0)
        (actual time=0.612..0.612 rows=2500 loops=1)
        Buffers: shared hit=14

The Bitmap Index Scan finds 2,500 matching index entries and builds a bitmap. The Bitmap Heap Scan reads 1,842 heap pages in physical order instead of random order. This is faster than 2,500 random Index Scan lookups but slower than a single Index Scan returning one row. PostgreSQL uses Bitmap scans for medium-selectivity queries.

Hash Join

Builds a hash table from the smaller relation, then probes it with each row from the larger relation:

Hash Join  (cost=295.00..12847.00 rows=50000 width=156)
  (actual time=2.891..34.221 rows=49872 loops=1)
  Hash Cond: (a.category_id = c.id)
  Buffers: shared hit=9042
  ->  Seq Scan on articles a  (cost=0.00..10924.00 rows=500000 width=120)
        (actual time=0.008..21.112 rows=500000 loops=1)
        Buffers: shared hit=8924
  ->  Hash  (cost=170.00..170.00 rows=10000 width=36)
        (actual time=2.841..2.841 rows=10000 loops=1)
        Buckets: 16384  Batches: 1  Memory Usage: 597kB
        Buffers: shared hit=118
        ->  Seq Scan on categories c  (cost=0.00..170.00 rows=10000 width=36)
              (actual time=0.005..0.921 rows=10000 loops=1)
              Buffers: shared hit=118

The hash table uses 597 KB of memory. If it exceeded work_mem, PostgreSQL would spill to disk in multiple batches (Batches: N where N > 1). Hash Join is efficient for equi-joins (equality conditions) when one side fits in memory.

Nested Loop

For each row from the outer relation, scans the inner relation:

Nested Loop  (cost=0.43..1247.33 rows=20 width=156)
  (actual time=0.061..0.298 rows=20 loops=1)
  Buffers: shared hit=82
  ->  Index Scan Backward using idx_articles_published_at on articles a
        (cost=0.43..12.87 rows=20 width=120) (actual time=0.041..0.072 rows=20 loops=1)
        Filter: (status = 'published')
        Rows Removed by Filter: 3
        Buffers: shared hit=4
  ->  Index Scan using categories_pkey on categories c
        (cost=0.29..0.31 rows=1 width=36) (actual time=0.009..0.009 rows=1 loops=20)
        Index Cond: (id = a.category_id)
        Buffers: shared hit=60

Notice loops=20 on the inner scan. The categories primary key lookup executes 20 times, once per article row. Nested Loop is efficient when the outer side returns few rows and the inner side has an index. It is catastrophic when the outer side returns many rows: 10,000 outer rows with an inner index scan means 10,000 index lookups.

The Content Platform’s Article Search: ORM vs Hand-Tuned

The content platform has a search endpoint that finds articles by category, filtered by tags, ordered by a combination of recency and view count:

// JPA Repository method
@Query("SELECT a FROM Article a " +
       "JOIN a.categories c " +
       "JOIN a.tags t " +
       "WHERE c.slug = :categorySlug " +
       "AND t.name IN :tagNames " +
       "AND a.status = 'PUBLISHED' " +
       "ORDER BY (a.viewCount * 0.3 + EXTRACT(EPOCH FROM a.publishedAt) * 0.7) DESC")
Page<Article> findByCategoryAndTags(
    @Param("categorySlug") String categorySlug,
    @Param("tagNames") List<String> tagNames,
    Pageable pageable);

Hibernate generates:

-- ORM-generated SQL (simplified)
SELECT a.id, a.title, a.slug, a.content, a.status, a.view_count,
       a.published_at, a.created_at, a.updated_at, a.author_id,
       a.category_id, a.excerpt, a.featured_image, a.meta_description
FROM articles a
INNER JOIN article_categories ac ON a.id = ac.article_id
INNER JOIN categories c ON ac.category_id = c.id
INNER JOIN article_tags at ON a.id = at.article_id
INNER JOIN tags t ON at.tag_id = t.id
WHERE c.slug = 'performance'
  AND t.name IN ('java', 'jvm', 'gc')
  AND a.status = 'PUBLISHED'
ORDER BY (a.view_count * 0.3 + EXTRACT(EPOCH FROM a.published_at) * 0.7) DESC
LIMIT 20 OFFSET 0;

Three problems.

**Problem 1: SELECT ***. The ORM selects every column including content (average 15 KB per article). For a listing page that shows title, slug, excerpt, and published date, this fetches 300 KB of content data per page that the application discards.

Problem 2: Cartesian explosion. An article in 2 categories with 3 tags produces 6 rows in the join result. The database sorts 6 duplicate rows per article, the ORM deduplicates them in Java. With 500 matching articles, the intermediate result set is 3,000 rows instead of 500.

Problem 3: Computed ORDER BY. The expression (view_count * 0.3 + EXTRACT(EPOCH FROM published_at) * 0.7) cannot use any index. PostgreSQL must compute this for every matching row, then sort.

EXPLAIN ANALYZE for the ORM query:

Sort  (cost=28471.33..28478.83 rows=3000 width=15420)
  (actual time=42.112..42.198 rows=3000 loops=1)
  Sort Key: ((a.view_count * 0.3 + ...) DESC
  Sort Method: external merge  Disk: 45072kB
  Buffers: shared hit=12842, temp read=5634 temp written=5634
  ->  Hash Join  (cost=842.18..21247.33 rows=3000 width=15420)
        (actual time=4.221..28.441 rows=3000 loops=1)
        Hash Cond: (at.tag_id = t.id)
        Buffers: shared hit=12842
        ->  Hash Join  (cost=295.00..18924.00 rows=12000 width=15420)
              (actual time=2.891..22.112 rows=12000 loops=1)
              Hash Cond: (ac.category_id = c.id)
              ...

The sort spills to disk (external merge, Disk: 45072kB) because the 3,000 rows with their 15 KB content column exceed work_mem. Execution time: 42 ms.

The hand-tuned version:

-- FAST: Select only needed columns, eliminate duplicates before sorting
SELECT DISTINCT a.id, a.title, a.slug, a.excerpt, a.published_at, a.view_count
FROM articles a
WHERE a.status = 'published'
  AND EXISTS (
    SELECT 1 FROM article_categories ac
    JOIN categories c ON ac.category_id = c.id
    WHERE ac.article_id = a.id AND c.slug = 'performance'
  )
  AND EXISTS (
    SELECT 1 FROM article_tags at
    JOIN tags t ON at.tag_id = t.id
    WHERE at.article_id = a.id AND t.name IN ('java', 'jvm', 'gc')
  )
ORDER BY a.view_count DESC, a.published_at DESC
LIMIT 20;
Limit  (cost=0.87..52.33 rows=20 width=148)
  (actual time=0.312..0.891 rows=20 loops=1)
  Buffers: shared hit=124
  ->  Nested Loop Semi Join  (cost=0.87..8247.33 rows=3200 width=148)
        (actual time=0.311..0.878 rows=20 loops=1)
        Buffers: shared hit=124
        ->  Nested Loop Semi Join  (cost=0.43..4123.67 rows=6400 width=148)
              (actual time=0.198..0.612 rows=42 loops=1)
              Buffers: shared hit=62
              ->  Index Scan using idx_articles_status_viewcount on articles a
                    (cost=0.42..3124.00 rows=250000 width=148)
                    (actual time=0.041..0.112 rows=45 loops=1)
                    Index Cond: (status = 'published')
                    Buffers: shared hit=8
              ->  Index Scan using idx_article_categories_article_id on article_categories ac
                    (cost=0.29..0.35 rows=1 width=8)
                    (actual time=0.008..0.009 rows=1 loops=45)
                    ...
Planning Time: 0.412 ms
Execution Time: 0.934 ms

From 42 ms to 0.93 ms. The key changes:

  1. Projection: 148 bytes per row instead of 15,420. The sort stays in memory.
  2. EXISTS subqueries: Eliminate the cartesian product. The semi-join stops at the first match.
  3. Simplified ORDER BY: view_count DESC, published_at DESC uses a composite index. The computed expression could not.
  4. Early LIMIT pushdown: The Nested Loop Semi Join stops after 20 qualifying rows.

Parameter Sniffing and Plan Caching

PostgreSQL caches execution plans for prepared statements. The plan is optimized for the first set of parameter values. When subsequent calls use different parameter values that would benefit from a different plan, the cached plan may be suboptimal.

Consider the content platform’s article listing. The status column has three values:

statuscountpercentage
published487,00097.4%
draft12,0002.4%
archived1,0000.2%

A prepared statement for WHERE status = $1:

// This becomes a prepared statement after 5 executions
List<Article> articles = articleRepository.findByStatus(status);

If the first execution uses status = 'draft', PostgreSQL sees that draft matches 2.4% of rows and creates a plan using the index on status. This plan is cached. When the next call uses status = 'published' (97.4% of rows), the cached index scan plan is worse than a sequential scan because reading 97.4% of the table via random index lookups is slower than reading it sequentially.

PostgreSQL mitigates this with “custom” vs “generic” plans. After 5 executions with custom (parameter-specific) plans, PostgreSQL compares the custom plan cost with the generic plan cost and switches to generic if it is cheaper on average. But this heuristic fails when parameter distributions are bimodal (rare value vs common value).

Detect this with pg_stat_statements:

SELECT query,
       calls,
       mean_exec_time,
       stddev_exec_time,
       min_exec_time,
       max_exec_time
FROM pg_stat_statements
WHERE query LIKE '%articles%status%'
ORDER BY stddev_exec_time DESC;
                    query                     | calls |  mean  | stddev |  min  |  max
----------------------------------------------+-------+--------+--------+-------+-------
 SELECT ... FROM articles WHERE status = $1   |  8420 |  12.41 |  28.73 |  0.82 | 89.12

A standard deviation (28.73) larger than the mean (12.41) is a red flag. The same query sometimes takes 0.82 ms and sometimes 89 ms depending on which plan was cached.

The fix: use separate queries for each status value, or use CASE logic in the application layer:

// FAST: Separate queries force separate plans
public List<Article> findByStatus(String status) {
    if ("published".equals(status)) {
        return entityManager.createNativeQuery(
            "SELECT * FROM articles WHERE status = 'published' " +
            "ORDER BY published_at DESC LIMIT 100", Article.class)
            .getResultList();
    }
    return entityManager.createNativeQuery(
        "SELECT * FROM articles WHERE status = :status " +
        "ORDER BY published_at DESC LIMIT 100", Article.class)
        .setParameter("status", status)
        .getResultList();
}

pg_stat_statements: Finding the Queries That Matter

Most applications have thousands of distinct queries. Optimizing them all is impossible. pg_stat_statements identifies the queries that consume the most total time:

-- Top 10 queries by total execution time
SELECT
    substring(query, 1, 80) AS query_preview,
    calls,
    round(total_exec_time::numeric, 2) AS total_ms,
    round(mean_exec_time::numeric, 2) AS mean_ms,
    round(stddev_exec_time::numeric, 2) AS stddev_ms,
    rows,
    round((shared_blks_hit * 100.0 /
           NULLIF(shared_blks_hit + shared_blks_read, 0))::numeric, 1) AS cache_hit_pct
FROM pg_stat_statements
WHERE dbname = 'content_platform'
ORDER BY total_exec_time DESC
LIMIT 10;
            query_preview             | calls  | total_ms   | mean_ms | stddev_ms | rows    | cache_hit_pct
--------------------------------------+--------+------------+---------+-----------+---------+--------------
 SELECT a.id, a.title ... JOIN categ  | 842000 | 35364000.0 |   42.00 |     28.73 | 16840000|         87.2
 SELECT a.* FROM articles WHERE statu | 124000 |  1537600.0 |   12.40 |      4.21 | 12400000|         99.1
 UPDATE articles SET view_count = vie |  98000 |   980000.0 |   10.00 |      2.12 |    98000|         98.4
 INSERT INTO view_events (article_id, |  98000 |   490000.0 |    5.00 |      1.84 |    98000|         95.2
 SELECT t.name, COUNT(*) FROM tags t  |  12400 |   372000.0 |   30.00 |      8.42 |   124000|         78.4

The article search join query (row 1) dominates: 842,000 calls at 42 ms each = 35,364 seconds of total execution time. Optimizing this single query from 42 ms to 0.93 ms would save 34,554 seconds of database time per measurement period. No other optimization comes close.

Key columns to analyze:

  • total_exec_time: Calls x mean. A 1 ms query called 10 million times uses more total time than a 100 ms query called 1,000 times. Optimize by total time, not by mean time.
  • cache_hit_pct: Below 95% means the query’s working set exceeds the buffer pool. The tag aggregation query at 78.4% is doing significant disk I/O.
  • stddev_exec_time: High standard deviation relative to mean indicates plan instability or lock contention.
  • rows vs calls: Rows/calls gives average result set size. The article search returns 20 rows per call (16,840,000 / 842,000 = 20), confirming it is properly paginated.

The JPA N+1 Problem: EXPLAIN ANALYZE Proof

The N+1 problem is the most common ORM performance issue. It occurs when loading a collection relationship triggers one query per parent entity.

The content platform’s article listing page loads articles with their tags:

// Entity mapping
@Entity
public class Article {
    @Id
    private Long id;
    private String title;

    @ManyToMany(fetch = FetchType.LAZY)
    @JoinTable(name = "article_tags",
        joinColumns = @JoinColumn(name = "article_id"),
        inverseJoinColumns = @JoinColumn(name = "tag_id"))
    private Set<Tag> tags;
}

// Controller
@GetMapping("/articles")
public List<ArticleDto> listArticles() {
    List<Article> articles = articleRepository.findTop20ByStatusOrderByPublishedAtDesc("published");
    return articles.stream()
        .map(a -> new ArticleDto(a.getTitle(), a.getTags().stream()  // Triggers N queries
            .map(Tag::getName).toList()))
        .toList();
}

This generates 21 queries. The first query loads 20 articles:

-- Query 1: Load articles
SELECT a.id, a.title, a.slug, a.status, a.published_at, a.view_count
FROM articles a
WHERE a.status = 'published'
ORDER BY a.published_at DESC
LIMIT 20;
Limit  (cost=0.43..4.87 rows=20 width=120) (actual time=0.041..0.089 rows=20 loops=1)
  ->  Index Scan Backward using idx_articles_published_at on articles a
        (cost=0.43..35124.00 rows=487000 width=120) (actual time=0.039..0.084 rows=20 loops=1)
        Filter: (status = 'published')
        Buffers: shared hit=4
Planning Time: 0.089 ms
Execution Time: 0.112 ms

Then 20 individual tag queries:

-- Queries 2-21: Load tags for each article (repeated 20 times)
SELECT t.id, t.name
FROM tags t
INNER JOIN article_tags at ON t.id = at.tag_id
WHERE at.article_id = ?;  -- Different article_id each time
Nested Loop  (cost=0.58..16.63 rows=3 width=36) (actual time=0.022..0.031 rows=3 loops=1)
  Buffers: shared hit=6
  ->  Index Only Scan using article_tags_pkey on article_tags at
        (cost=0.29..8.31 rows=3 width=8) (actual time=0.012..0.015 rows=3 loops=1)
        Index Cond: (article_id = 42)
        Heap Fetches: 0
        Buffers: shared hit=3
  ->  Index Scan using tags_pkey on tags t
        (cost=0.29..0.31 rows=1 width=36) (actual time=0.004..0.004 rows=1 loops=3)
        Index Cond: (id = at.tag_id)
        Buffers: shared hit=3
Planning Time: 0.082 ms
Execution Time: 0.048 ms

Each individual tag query is fast (0.048 ms), but 20 of them add up: 20 x 0.048 = 0.96 ms for tag queries, plus 0.112 ms for the article query, plus 20 planning phases (20 x 0.082 = 1.64 ms). Total: 2.71 ms. This seems acceptable until you consider that each query is a separate round-trip to PostgreSQL. Under load, each round-trip includes network latency, connection acquisition, and potentially connection pool queuing.

At 1,000 concurrent requests, the 21 queries per request means 21,000 queries per second instead of 1,000. The connection pool saturates. The real cost is not the individual query execution time but the connection hold time and round-trip overhead.

The fix uses JOIN FETCH:

// FAST: Single query with JOIN FETCH
@Query("SELECT DISTINCT a FROM Article a " +
       "LEFT JOIN FETCH a.tags " +
       "WHERE a.status = 'PUBLISHED' " +
       "ORDER BY a.publishedAt DESC")
List<Article> findPublishedWithTags(Pageable pageable);

This generates one query:

SELECT DISTINCT a.id, a.title, a.slug, a.status, a.published_at,
       a.view_count, t.id, t.name
FROM articles a
LEFT JOIN article_tags at ON a.id = at.article_id
LEFT JOIN tags t ON at.tag_id = t.id
WHERE a.status = 'published'
ORDER BY a.published_at DESC;
Unique  (cost=0.87..52847.33 rows=20 width=156)
  (actual time=0.098..0.412 rows=20 loops=1)
  Buffers: shared hit=68
  ->  Nested Loop Left Join  (cost=0.87..52841.33 rows=60 width=156)
        (actual time=0.094..0.389 rows=60 loops=1)
        Buffers: shared hit=68
        ->  Index Scan Backward using idx_articles_published_at on articles a
              (cost=0.43..35124.00 rows=20 width=120)
              (actual time=0.042..0.089 rows=20 loops=1)
              Filter: (status = 'published')
              Buffers: shared hit=4
        ->  Index Scan using article_tags_pkey on article_tags at
              (cost=0.29..0.35 rows=3 width=16)
              (actual time=0.008..0.012 rows=3 loops=20)
              Index Cond: (article_id = a.id)
              Buffers: shared hit=64
Planning Time: 0.234 ms
Execution Time: 0.456 ms

One query, one round-trip, one connection acquisition. 0.456 ms instead of 2.71 ms. Under load, the improvement is larger because the connection pool pressure drops by 95% (1 query vs 21).

Trade-offs

Reading execution plans is diagnostic, not prescriptive. The plan that is optimal today may be wrong tomorrow as data distributions change. Maintaining hand-tuned native queries means losing Hibernate’s type safety and schema migration compatibility. pg_stat_statements adds approximately 5% overhead to query execution for tracking.

The biggest trade-off: rewriting ORM queries as native SQL increases maintenance burden. When the schema changes, native queries fail at runtime instead of compile time. The compromise is to use JPA for simple CRUD and native queries only for the small number of hot-path queries identified by pg_stat_statements.

PostgreSQL Execution Plan Tree