Autovacuum Tuning and Bloat Prevention
Autovacuum Tuning and Bloat Prevention
Section 1 explained why dead tuples accumulate. This section focuses on preventing that accumulation from becoming a problem: tuning autovacuum to keep pace with dead tuple generation, configuring fillfactor for HOT updates, and monitoring bloat in production.
Autovacuum Trigger Mechanics
Autovacuum is controlled by a launcher process that wakes every autovacuum_naptime (default: 1 minute) and checks each table:
Should vacuum table T?
dead_tuples(T) > autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor * live_tuples(T))
Should analyze table T?
modified_tuples(T) > autovacuum_analyze_threshold + (autovacuum_analyze_scale_factor * live_tuples(T))
Default values:
SHOW autovacuum_vacuum_threshold; -- 50
SHOW autovacuum_vacuum_scale_factor; -- 0.2
SHOW autovacuum_analyze_threshold; -- 50
SHOW autovacuum_analyze_scale_factor; -- 0.1
SHOW autovacuum_naptime; -- 1min
SHOW autovacuum_max_workers; -- 3
For the content platform’s tables:
Table | live_tuples | Vacuum Trigger (default) | Dead Tuples at Trigger
articles | 1,000,000 | 50 + 0.2 * 1M = 200,050 | 200,050 (too late!)
article_view_counts | 100,000 | 50 + 0.2 * 100K = 20,050 | 20,050
view_events | 10,000,000 | 50 + 0.2 * 10M = 2M | 2,000,050 (way too late!)
categories | 20 | 50 + 0.2 * 20 = 54 | 54 (fine, rarely updated)
The articles table tolerates 200,050 dead tuples before vacuum runs. At 10k articles/hour insertion rate and occasional updates, this is acceptable. But view_events with 10M rows waits until 2M dead tuples accumulate. That is catastrophic.
Per-Table Autovacuum Settings
Override global settings for tables with different update patterns:
-- High-update counter table: aggressive vacuum
ALTER TABLE article_view_counts SET (
autovacuum_vacuum_scale_factor = 0.01, -- 1% dead triggers vacuum
autovacuum_vacuum_threshold = 100, -- at least 100 dead
autovacuum_analyze_scale_factor = 0.05, -- reanalyze at 5% change
autovacuum_analyze_threshold = 100,
autovacuum_vacuum_cost_delay = 2, -- 2ms throttle delay (default: 20ms)
autovacuum_vacuum_cost_limit = 1000 -- higher I/O budget
);
-- Large event table: lower scale factor, still aggressive
ALTER TABLE view_events SET (
autovacuum_vacuum_scale_factor = 0.005, -- 0.5% = 50,000 dead tuples
autovacuum_vacuum_threshold = 1000,
autovacuum_analyze_scale_factor = 0.02,
autovacuum_vacuum_cost_delay = 5,
autovacuum_vacuum_cost_limit = 800
);
-- Articles table: moderate, mostly inserts
ALTER TABLE articles SET (
autovacuum_vacuum_scale_factor = 0.05, -- 5% = 50,000 dead tuples
autovacuum_vacuum_threshold = 500,
autovacuum_analyze_scale_factor = 0.05
);
New trigger calculations:
Table | Trigger After Dead Tuples | At 1000 updates/sec: Time to Trigger
article_view_counts | 100 + 0.01 * 100K = 1100 | 1.1 seconds
view_events | 1000 + 0.005 * 10M = 51K | 51 seconds
articles | 500 + 0.05 * 1M = 50,500 | rarely hit (low update rate)
Cost-Based Vacuum Throttling
Vacuum uses a cost-based I/O throttling system to prevent it from overwhelming the disk:
SHOW vacuum_cost_page_hit; -- 1 (cost for reading a page from shared_buffers)
SHOW vacuum_cost_page_miss; -- 2 (cost for reading a page from OS cache)
SHOW vacuum_cost_page_dirty; -- 20 (cost for dirtying a page)
SHOW vacuum_cost_limit; -- 200 (budget per cycle)
SHOW vacuum_cost_delay; -- 0 (manual vacuum: no delay)
SHOW autovacuum_vacuum_cost_delay; -- 2ms (autovacuum delay between cycles)
SHOW autovacuum_vacuum_cost_limit; -- -1 (use vacuum_cost_limit: 200)
The throttling loop:
while (table has dead tuples):
process pages until accumulated_cost >= cost_limit (200)
sleep cost_delay (2ms for autovacuum)
reset accumulated_cost
With default settings (cost_limit=200, cost_delay=2ms):
- Each cycle processes approximately 200 pages from shared_buffers (200/1 = 200) or 100 pages from disk (200/2 = 100)
- 500 cycles per second (1000ms / 2ms delay)
- Throughput: approximately 100,000 pages/sec from cache, 50,000 pages/sec from disk
For the counter table (1,284 pages): vacuum completes in approximately 13ms. Fast enough to keep pace with 1000 updates/sec.
For the view_events table (125,000 pages): vacuum takes approximately 2.5 seconds with aggressive settings. At 1000 new dead tuples/sec, 2,500 dead tuples accumulate during vacuum. The table reaches steady state at approximately 53,500 dead tuples (trigger at 51K + accumulation during vacuum).
Increasing Autovacuum Workers
With 3 default workers and many tables needing vacuum:
-- If autovacuum can't keep up with multiple busy tables
ALTER SYSTEM SET autovacuum_max_workers = 5;
SELECT pg_reload_conf();
Each worker can vacuum one table at a time. With 3 workers, if article_view_counts, view_events, and articles all need vacuum simultaneously, they each get a worker. If a fourth table needs vacuum, it waits.
The cost_limit is shared across workers:
-- Default: 200 shared across all workers
-- Each worker gets: 200 / autovacuum_max_workers = 40 per worker (with 5 workers)
-- Override per table to ensure hot tables get sufficient budget:
ALTER TABLE article_view_counts SET (autovacuum_vacuum_cost_limit = 500);
-- This table's worker gets its own 500 budget, not shared
Fillfactor Optimization
Fillfactor determines the percentage of each page filled during INSERT:
-- Default fillfactor = 100 (fill pages completely)
-- Reduced fillfactor leaves space for HOT updates
-- Calculate optimal fillfactor:
-- Average tuple size after UPDATE should fit in remaining space
-- Check current tuple size
SELECT avg(pg_column_size(t.*)) AS avg_tuple_bytes
FROM article_view_counts t
LIMIT 10000;
avg_tuple_bytes
-----------------
72
Each page is 8192 bytes. Page overhead is 24 bytes. Usable space: 8168 bytes. At 72 bytes per tuple:
Tuples per full page (fillfactor=100): 8168 / 72 = 113 tuples
Tuples per page (fillfactor=70): 8168 * 0.70 / 72 = 79 tuples
Free space per page (fillfactor=70): 8168 * 0.30 = 2450 bytes
Updates before page full: 2450 / 72 = 34 HOT updates before needing new page
With vacuum running every 1.1 seconds and each page receiving approximately 1 update/sec at most (100,000 rows, 1000 updates/sec distributed), each page accumulates at most 1-2 dead tuples before vacuum runs. Fillfactor 70 provides far more headroom than needed. A more conservative fillfactor 85 would suffice:
-- Fillfactor 85: balance between HOT updates and space efficiency
ALTER TABLE article_view_counts SET (fillfactor = 85);
Fillfactor | Table Size | HOT Ratio | Headroom (updates before page full)
100 | 98 MB | 55.6% | 0 (no free space)
90 | 109 MB | 87.2% | 11 updates
85 | 115 MB | 92.4% | 17 updates
70 | 140 MB | 94.9% | 34 updates
50 | 196 MB | 96.1% | 56 updates
The sweet spot for the content platform is fillfactor 85: 92.4% HOT ratio with only 17% more storage than fillfactor 100. Below 70, the diminishing HOT improvement does not justify the storage increase.
Monitoring Autovacuum Effectiveness
Dashboard Query: Are We Keeping Up?
-- Primary monitoring query: run every 5 minutes, alert if thresholds exceeded
SELECT
schemaname || '.' || relname AS table_name,
n_live_tup,
n_dead_tup,
round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
last_autovacuum,
last_autoanalyze,
autovacuum_count,
autoanalyze_count,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
CASE
WHEN n_dead_tup > 0.1 * n_live_tup THEN 'WARNING: >10% dead'
WHEN n_dead_tup > 0.05 * n_live_tup THEN 'ELEVATED: >5% dead'
ELSE 'OK'
END AS status
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_dead_tup DESC;
table_name | n_live_tup | n_dead_tup | dead_pct | last_autovacuum | status
-----------------------+------------+------------+----------+---------------------+--------
public.view_events | 10000000 | 48291 | 0.48 | 2026-05-29 09:42:01 | OK
public.article_view_counts | 100000 | 1847 | 1.81 | 2026-05-29 09:42:12 | OK
public.articles | 1000000 | 2184 | 0.22 | 2026-05-29 09:41:47 | OK
Vacuum Duration Tracking
-- Track how long each autovacuum takes
-- Enable verbose autovacuum logging for tables > 1000 pages
ALTER SYSTEM SET log_autovacuum_min_duration = 100; -- log if vacuum takes > 100ms
SELECT pg_reload_conf();
Log output:
LOG: automatic vacuum of table "content.public.article_view_counts":
index scans: 1
pages: 0 removed, 1284 remain, 0 skipped due to pins, 1142 skipped frozen
tuples: 1847 removed, 100000 remain, 0 are dead but not yet removable
removable cutoff: 184729847
frozen: 0 pages from table (0 total) had 0 tuples frozen
index scan needed: 42 pages from table (3.27% of total) had 1847 dead item identifiers removed
I/O timings: read=0.00 write=2.14
avg read rate: 0.000 MB/s, avg write rate: 8.472 MB/s
buffer usage: 184 hits, 0 misses, 42 dirtied
WAL usage: 84 records, 42 full page images, 348160 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.04 s
Key metrics:
elapsed: 0.04 s: Vacuum completed in 40ms (well within tolerance)1142 skipped frozen: Visibility map allowed skipping 89% of pages42 pages modified: Only 3.27% of pages contained dead tuples
Bloat Estimation with pgstattuple
-- Install extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- Detailed bloat analysis (WARNING: full table scan, slow on large tables)
SELECT * FROM pgstattuple('article_view_counts');
table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
------------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
148897792 | 100000 | 7200000 | 4.83 | 1847 | 133384 | 0.09 | 140547408 | 94.39
free_percent = 94.39%: This looks alarming but is expected with fillfactor=85. The “free space” includes the intentionally-empty portion of each page reserved for HOT updates. True bloat is measured by:
-- True bloat: actual size vs minimum possible size
SELECT
pg_size_pretty(pg_relation_size('article_view_counts')) AS actual_size,
pg_size_pretty((100000 * 72 + 24 * (100000 / 113))::bigint) AS minimum_size,
round(pg_relation_size('article_view_counts')::numeric /
((100000 * 72 + 24 * (100000 / 113))::numeric), 2) AS actual_vs_minimum
FROM pg_class WHERE relname = 'article_view_counts';
actual_size | minimum_size | actual_vs_minimum
-------------+--------------+-------------------
142 MB | 7.4 MB | 19.19
Wait: 19x the minimum? This includes page overhead, alignment padding, tuple headers (23 bytes each), and fillfactor space. The more meaningful comparison:
-- Compare to expected size with fillfactor=85
-- Expected: 100000 tuples * 72 bytes / (8168 * 0.85) = 1037 pages = 8.1 MB (data only)
-- Plus: page headers, item pointers, alignment = approximately 12 MB
-- Plus: fillfactor overhead (15%): approximately 14 MB
-- Actual: 142 MB... something is wrong.
-- The table was created before fillfactor change. VACUUM does not shrink files.
-- Need VACUUM FULL or pg_repack to reclaim space.
Emergency Bloat Remediation
When a table has grown far beyond its working size:
Option 1: VACUUM FULL (Locking)
-- WARNING: Exclusive lock on table for duration (minutes for large tables)
VACUUM FULL article_view_counts;
Before: 892 MB (9.1x bloated)
After: 115 MB (fillfactor 85 applied)
Lock duration: 4.2 seconds
VACUUM FULL rewrites the entire table, applying the current fillfactor. All indexes are rebuilt. The exclusive lock blocks all reads and writes.
Option 2: pg_repack (Online)
-- Install extension
CREATE EXTENSION pg_repack;
-- Repack without exclusive lock (takes share lock briefly at start and end)
-- Run from command line:
-- pg_repack --table article_view_counts --no-kill-backend content
pg_repack creates a shadow table, copies live data with new fillfactor, swaps the tables atomically. The original table remains readable during the copy.
Before: 892 MB
During repack: 892 MB + 115 MB (temporary copy)
After: 115 MB
Lock duration: < 100ms (swap only)
Total time: 12 seconds
Option 3: CREATE TABLE AS (Planned Maintenance)
-- Create new table with desired structure
BEGIN;
CREATE TABLE article_view_counts_new (LIKE article_view_counts INCLUDING ALL)
WITH (fillfactor = 85,
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_threshold = 100);
INSERT INTO article_view_counts_new SELECT * FROM article_view_counts;
-- Swap tables
ALTER TABLE article_view_counts RENAME TO article_view_counts_old;
ALTER TABLE article_view_counts_new RENAME TO article_view_counts;
DROP TABLE article_view_counts_old;
COMMIT;
This approach requires a brief window where the table name does not exist (between RENAME operations). Use within a transaction to make it atomic.
Autovacuum Tuning Methodology
Step-by-step process for tuning a table:
Step 1: Characterize the Workload
-- Measure update rate over 1 hour
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'article_view_counts';
-- Wait 1 hour, measure again
-- Delta gives updates/hour
Step 2: Calculate Acceptable Dead Tuple Level
Target: dead tuples never exceed 5% of live tuples
Live tuples: 100,000
Max acceptable dead: 5,000
Update rate: 1000/sec
Time to accumulate 5000 dead: 5 seconds
Vacuum must complete within 5 seconds of triggering
Step 3: Set Scale Factor and Threshold
-- Trigger vacuum well before 5% dead
-- Target: trigger at 1% dead = 1000 dead tuples
-- scale_factor * live_tuples + threshold = 1000
-- 0.01 * 100000 + 0 = 1000
-- But add threshold for safety:
ALTER TABLE article_view_counts SET (
autovacuum_vacuum_scale_factor = 0.005, -- 500 from scale factor
autovacuum_vacuum_threshold = 500 -- plus 500 fixed
);
-- Total trigger: 500 + 500 = 1000 dead tuples
Step 4: Tune Vacuum Speed
-- Ensure vacuum can process entire table within deadline
-- Table: 1284 pages
-- Default: 200 cost_limit / 2ms delay = 100 pages/cycle, 500 cycles/sec = 50,000 pages/sec
-- Time to vacuum 1284 pages: 1284/50000 = 26ms
-- Well under 5-second deadline
-- But if table grows to 100,000 pages:
-- Time: 100000/50000 = 2 seconds
-- Still acceptable. Keep defaults unless table grows larger.
Step 5: Configure Fillfactor for HOT
-- Check HOT ratio before tuning
SELECT n_tup_hot_upd::float / nullif(n_tup_upd, 0) AS hot_ratio
FROM pg_stat_user_tables
WHERE relname = 'article_view_counts';
-- If hot_ratio < 0.9: consider reducing fillfactor
-- Current tuple size: 72 bytes
-- Page free space needed for 1 in-place update: 72 bytes
-- Fillfactor 85 gives: 8168 * 0.15 = 1225 bytes free = 17 updates worth
-- More than enough for typical vacuum cycle
ALTER TABLE article_view_counts SET (fillfactor = 85);
Step 6: Validate After Tuning
-- Wait 1 hour, then check effectiveness
SELECT
relname,
n_dead_tup,
n_tup_hot_upd::float / nullif(n_tup_upd, 0) AS hot_ratio,
autovacuum_count,
last_autovacuum,
pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM pg_stat_user_tables s
JOIN pg_class c ON c.relname = s.relname
WHERE s.relname = 'article_view_counts';
Expected after tuning:
relname | n_dead_tup | hot_ratio | autovacuum_count | size
----------------------+------------+-----------+------------------+--------
article_view_counts | 847 | 0.92 | 3247 | 115 MB
847 dead tuples (< 1% of live), 92% HOT ratio, vacuum running frequently (3247 times in the measurement period), stable table size.
The Content Platform Autovacuum Configuration
Complete configuration applied in production:
-- Global settings (postgresql.conf)
ALTER SYSTEM SET autovacuum_naptime = '30s'; -- check every 30s (default 1min)
ALTER SYSTEM SET autovacuum_max_workers = 4; -- 4 parallel workers (default 3)
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 2; -- 2ms (default 2ms in PG16+)
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 400; -- higher budget shared
ALTER SYSTEM SET log_autovacuum_min_duration = 100; -- log vacuums > 100ms
SELECT pg_reload_conf();
-- Per-table: high-update counter table
ALTER TABLE article_view_counts SET (
fillfactor = 85,
autovacuum_vacuum_scale_factor = 0.005,
autovacuum_vacuum_threshold = 500,
autovacuum_vacuum_cost_delay = 2,
autovacuum_vacuum_cost_limit = 500,
autovacuum_analyze_scale_factor = 0.02,
autovacuum_analyze_threshold = 500
);
-- Per-table: large event table (append-mostly, occasional deletes)
ALTER TABLE view_events SET (
autovacuum_vacuum_scale_factor = 0.005,
autovacuum_vacuum_threshold = 5000,
autovacuum_vacuum_cost_delay = 5,
autovacuum_vacuum_cost_limit = 800
);
-- Per-table: articles (mostly inserts, rare updates)
-- Default settings are fine for this table
Monitoring Automation
Alerting query to run every 5 minutes:
-- Alert conditions:
-- 1. Any table with > 10% dead tuples
-- 2. Any table where autovacuum hasn't run in > 1 hour (and has > 1000 dead tuples)
-- 3. Any table approaching freeze threshold
SELECT
'BLOAT_WARNING' AS alert_type,
relname,
n_dead_tup,
round(100.0 * n_dead_tup / nullif(n_live_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 0.1 * n_live_tup
AND n_live_tup > 1000
UNION ALL
SELECT
'VACUUM_STALLED' AS alert_type,
relname,
n_dead_tup,
extract(epoch FROM now() - last_autovacuum) / 3600 AS hours_since_vacuum
FROM pg_stat_user_tables
WHERE last_autovacuum < now() - interval '1 hour'
AND n_dead_tup > 1000
UNION ALL
SELECT
'FREEZE_WARNING' AS alert_type,
c.relname,
age(c.relfrozenxid) AS xid_age,
round(age(c.relfrozenxid)::numeric / 200000000 * 100, 1) AS freeze_pct
FROM pg_class c
WHERE c.relkind = 'r'
AND age(c.relfrozenxid) > 150000000; -- 75% of force-vacuum threshold
With this monitoring in place, the content platform catches bloat issues before they impact query performance. The combination of per-table autovacuum tuning, fillfactor for HOT updates, and proactive monitoring keeps all tables within 5% of their ideal size during steady-state operation.