PostgreSQL Vacuum Part 3 — Production Monitoring, Bloat Removal, and Operations
Understanding Vacuum is different from actually handling it in production. Diagnose Bloat by the numbers with pgstattuple and pgstatindex, catch dead tuples, XID age, long-running transactions, and replication slot lag with a unified monitoring query set, compare VACUUM FULL, pg_repack, pg_squeeze, and REINDEX CONCURRENTLY by lock level and disk requirements, and prevent Bloat with fillfactor and HOT Updates — the final installment of the PostgreSQL Vacuum series.
Series overview
- Part 1 — Why VACUUM Exists: MVCC and Dead Tuples
- Part 2 — Autovacuum Tuning and XID Wraparound Disaster Prevention
- Part 3 — Production Monitoring, Bloat Removal, and Operations (current · series finale)
Table of Contents
- Precise Bloat Diagnosis: Using pgstattuple
- Production Monitoring Query Collection
- pg_stat_progress_vacuum — Real-Time Vacuum Tracking
- Bloat Removal Tool Comparison: VACUUM FULL vs pg_repack vs pg_squeeze
- OLTP / OLAP Mixed Environment Strategy
- Preventing Bloat with fillfactor
- FAQ
- Operations Checklist — Daily / Weekly / Monthly
- Series Final Summary
1. Precise Bloat Diagnosis: Using pgstattuple
n_dead_tup in pg_stat_user_tables works for a quick check. Accurate Bloat measurement requires the pgstattuple extension.
-- Install the extension (superuser required)
CREATE EXTENSION IF NOT EXISTS pgstattuple;
Measuring Table Bloat
-- Basic pgstattuple analysis
SELECT * FROM pgstattuple('public.orders');
Column interpretation:
| Column | Meaning | Decision threshold |
|---|---|---|
| table_len | Total file size of the table (bytes) | — |
| tuple_count | Number of live rows | — |
| tuple_len | Size of valid data | — |
| tuple_percent | Ratio of valid data | Lower = more Bloat |
| dead_tuple_count | Dead Tuple count | — |
| dead_tuple_percent | Dead Tuple ratio | Above 20% → VACUUM needed |
| free_space | Reusable space not yet returned to OS | — |
| free_percent | Reusable space ratio | Above 50% → consider VACUUM FULL / pg_repack |
-- Full-schema Bloat summary (run on a read-only replica for large databases)
SELECT *
FROM (
SELECT
schemaname,
tablename,
(pgstattuple(schemaname || '.' || tablename)).*
FROM pg_tables
WHERE schemaname = 'public'
) t
ORDER BY dead_tuple_percent DESC
LIMIT 20;
Warning:
pgstattupleperforms a full table scan and carries a significant I/O cost on large tables. In production, run it against specific tables only, or execute on a read-only replica.
Measuring Index Bloat
-- Index Bloat analysis
SELECT * FROM pgstatindex('public.orders_pkey');
| Column | Meaning | Decision threshold |
|---|---|---|
| avg_leaf_density | Data density of leaf pages | Below 20% → consider REINDEX |
| leaf_fragmentation | Leaf page fragmentation | Higher = worse performance |
-- Full-schema index Bloat summary
SELECT
c.relname AS index_name,
i.indrelid::regclass AS table_name,
pg_size_pretty(pg_relation_size(c.oid)) AS index_size,
(pgstatindex(c.oid)).avg_leaf_density,
(pgstatindex(c.oid)).leaf_fragmentation
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
AND c.relkind = 'i'
ORDER BY pg_relation_size(c.oid) DESC
LIMIT 20;
2. Production Monitoring Query Collection
A set of queries ready to drop into a dashboard or cron-based alert script.
Comprehensive Vacuum Health Check (Dashboard Main Query)
WITH vacuum_stats AS (
SELECT
s.schemaname || '.' || s.relname AS table_name,
s.n_live_tup AS live_rows,
s.n_dead_tup AS dead_rows,
ROUND(100.0 * s.n_dead_tup
/ NULLIF(s.n_live_tup + s.n_dead_tup, 0), 1) AS dead_pct,
age(c.relfrozenxid) AS xid_age,
pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
s.last_vacuum,
s.last_autovacuum,
s.vacuum_count,
s.autovacuum_count
FROM pg_stat_user_tables s
JOIN pg_class c ON c.relname = s.relname
AND c.relnamespace = (
SELECT oid FROM pg_namespace WHERE nspname = s.schemaname
)
)
SELECT
table_name,
live_rows,
dead_rows,
dead_pct,
xid_age,
total_size,
COALESCE(last_autovacuum::text, last_vacuum::text, 'NEVER') AS last_vacuumed,
CASE
WHEN xid_age > 1500000000 THEN 'XID CRITICAL'
WHEN dead_pct > 30 THEN 'BLOAT CRITICAL'
WHEN xid_age > 1000000000 THEN 'XID WARNING'
WHEN dead_pct > 15 THEN 'BLOAT WARNING'
WHEN last_autovacuum IS NULL
AND last_vacuum IS NULL THEN 'NEVER VACUUMED'
ELSE 'OK'
END AS health
FROM vacuum_stats
WHERE dead_rows > 100 OR xid_age > 100000000
ORDER BY
CASE WHEN xid_age > 1500000000 OR dead_pct > 30 THEN 0 ELSE 1 END,
dead_rows DESC;
Finding What Is Blocking Vacuum
-- ① Long-running transactions (Vacuum's biggest enemy)
SELECT
pid,
usename,
application_name,
state,
ROUND(EXTRACT(EPOCH FROM (now() - xact_start)) / 60, 1) AS tx_min,
LEFT(query, 100) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
AND state NOT IN ('idle')
ORDER BY xact_start
LIMIT 10;
-- ② Inactive Replication Slots (cause Vacuum lag)
SELECT
slot_name,
active,
catalog_xmin,
age(catalog_xmin) AS catalog_age,
xmin,
age(xmin) AS xmin_age,
pg_size_pretty(
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
) AS wal_retained
FROM pg_replication_slots
ORDER BY age(xmin) DESC NULLS LAST;
-- ③ Vacuum Horizon holders by source
SELECT
'long_transaction' AS source,
pid::text AS identifier,
age(backend_xmin) AS horizon_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
AND state != 'idle'
UNION ALL
SELECT
'replication_slot' AS source,
slot_name AS identifier,
age(xmin) AS horizon_age
FROM pg_replication_slots
WHERE xmin IS NOT NULL
ORDER BY horizon_age DESC
LIMIT 10;
Enabling Autovacuum Activity Logging
-- Log Autovacuum runs that take longer than 1 second
ALTER SYSTEM SET log_autovacuum_min_duration = '1s';
SELECT pg_reload_conf();
-- Then check pg_log for entries like:
-- automatic vacuum of table "mydb.public.orders": ...
-- automatic analyze of table "mydb.public.orders": ...
3. pg_stat_progress_vacuum — Real-Time Vacuum Tracking
The pg_stat_progress_vacuum view, introduced in PostgreSQL 9.6, provides real-time details on any Vacuum currently in progress.
-- Current Vacuum progress (PostgreSQL 17)
SELECT
p.pid,
p.datname,
p.relid::regclass AS table_name,
p.phase,
p.heap_blks_total,
p.heap_blks_scanned,
ROUND(100.0 * p.heap_blks_scanned
/ NULLIF(p.heap_blks_total, 0), 1) AS heap_scan_pct,
p.heap_blks_vacuumed,
p.index_vacuum_count,
p.num_dead_item_ids,
p.indexes_total,
p.indexes_processed,
ROUND(EXTRACT(EPOCH FROM now() - a.query_start) / 60, 1) AS running_min
FROM pg_stat_progress_vacuum p
JOIN pg_stat_activity a USING (pid);
Vacuum Phase Interpretation
| Phase | Meaning | Notes |
|---|---|---|
| initializing | Initialization | — |
| scanning heap | Scanning heap pages and collecting Dead Tuples | Usually the longest phase |
| vacuuming indexes | Removing Dead Pointers from indexes | Repeats once per index |
| vacuuming heap | Physically removing Dead Tuples from heap | — |
| cleaning up indexes | Finalizing index cleanup | — |
| truncating heap | Removing empty trailing pages | Space is returned here |
| performing final cleanup | Statistics update and wrap-up | — |
Tip: If
heap_blks_scannedstalls for a long time duringscanning heap, suspect lock contention or an I/O bottleneck.
4. Bloat Removal Tool Comparison: VACUUM FULL vs pg_repack vs pg_squeeze
When Dead Tuple removal is not enough and physical space reclamation is needed, three tools are available.
Side-by-Side Comparison
| Aspect | VACUUM FULL | pg_repack | pg_squeeze |
|---|---|---|---|
| Lock level | AccessExclusive (full table lock) | Short lock at start and end only | Short lock at start and end only |
| Online use | Not possible | Possible | Possible |
| Extra disk needed | ~1× table size | ~2× table size | ~2× table size |
| Implementation | PostgreSQL built-in | Trigger-based change capture | Logical decoding (server-side) |
| Automation | Manual | Manual (CLI tool) | Automatable (Background Worker) |
| PK/UK required | Not required | Required | Required |
| Index ordering | None | Yes (--order-by) | Yes (clustering_index) |
| Best for | Maintenance window, small tables | Medium/large tables, online Bloat removal | Continuous automated Bloat management |
Using pg_repack
# Install
apt install postgresql-17-repack
# or
CREATE EXTENSION pg_repack;
# Repack a specific table
pg_repack -h localhost -U postgres -d mydb -t public.orders
# Repack indexes only (data is fine, only indexes are bloated)
pg_repack -h localhost -U postgres -d mydb --only-indexes -t public.orders
# Repack the entire database (large maintenance operation)
pg_repack -h localhost -U postgres -d mydb --no-superuser-check
Warning:
pg_repackrequires approximately twice the table's disk size as working space. A 1 TB table needs up to 2 TB of additional free space during the operation. Verify available disk space before running.
Using pg_squeeze
-- Install
CREATE EXTENSION pg_squeeze;
-- Register a table for automatic Bloat monitoring
SELECT squeeze.add_squeeze_job(
tabschema := 'public',
tabname := 'orders',
threshold := 0.3, -- Auto-squeeze when Dead Tuple ratio exceeds 30%
max_retry := 3
);
-- Manual immediate squeeze
SELECT squeeze.squeeze_table('public', 'orders', NULL, NULL, NULL);
-- Check registered jobs
SELECT * FROM squeeze.tables;
pg_squeeze runs as a server-side Background Worker, requiring no external CLI. The table must have a PRIMARY KEY or UNIQUE constraint.
Removing Index Bloat: REINDEX CONCURRENTLY
-- Non-blocking index rebuild (PG 12+)
REINDEX INDEX CONCURRENTLY public.orders_pkey;
-- Rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY public.orders;
Warning:
REINDEX CONCURRENTLYholds an xmin horizon for the duration of the run, which can indirectly affect Autovacuum on other tables. Run it outside peak hours.
5. OLTP / OLAP Mixed Environment Strategy
Running both OLTP (short, frequent transactions) and OLAP (long analytical queries) on the same PostgreSQL instance creates serious Vacuum problems.
The Core Conflict
[How a long OLAP query affects Vacuum]
OLAP query starts -----> 2-hour runtime -----> query ends
|
v Transaction snapshot created (Horizon frozen)
XID at snapshot creation time = "Transaction Horizon"
- What Vacuum can clean: Dead Tuples older than the Horizon
- What Vacuum cannot touch: Dead Tuples created after the Horizon started
Result: the OLTP table accumulates 2 hours worth of Dead Tuples -> Bloat
Three Mitigation Strategies
Strategy 1: Offload OLAP to a Read Replica (Recommended)
With OLAP queries removed from the Primary, Autovacuum can clean Dead Tuples without interference.
Strategy 2: Set idle_in_transaction_session_timeout
-- Apply timeouts to a dedicated analyst role
ALTER ROLE analyst SET statement_timeout = '30min';
ALTER ROLE analyst SET idle_in_transaction_session_timeout = '10min';
-- Apply database-wide
ALTER DATABASE mydb SET idle_in_transaction_session_timeout = '30min';
Strategy 3: Partitioning to Eliminate Vacuum Scope
Partition large history tables by date and manage old partitions with TRUNCATE or DROP. No Dead Tuples are generated, so Vacuum becomes unnecessary for those ranges.
-- Monthly partition table example
CREATE TABLE events (
id BIGSERIAL,
event_time TIMESTAMPTZ NOT NULL,
payload JSONB
) PARTITION BY RANGE (event_time);
CREATE TABLE events_2026_01 PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
-- Drop partitions older than 3 months (no VACUUM needed)
DROP TABLE events_2025_12;
| Approach | Advantage | Disadvantage |
|---|---|---|
| Read replica separation | Strongest isolation | Higher infrastructure cost |
| Timeout settings | Immediate to apply | Risk of interrupting analysis queries |
| Partitioning | Eliminates Vacuum entirely | Schema change required |
6. Preventing Bloat with fillfactor
Prevention is far cheaper than treatment. fillfactor instructs PostgreSQL to reserve free space on data pages ahead of time.
How HOT (Heap-Only Tuple) Update Works
When an UPDATE can write the new row version to the same page, PostgreSQL performs a HOT Update — modifying only the heap page without touching any indexes. This suppresses index Bloat and reduces Vacuum load.
[Page structure with fillfactor=90]
+-----------------------------------------------------+
| [row1][row2][row3]...[rowN] [10% free] |
+-----------------------------------------------------+
^
New version written here on UPDATE
-> HOT Update fires
-> No index modification
-- Apply fillfactor to a frequently updated table
CREATE TABLE sessions (
session_id UUID PRIMARY KEY,
user_id INT NOT NULL,
last_seen TIMESTAMPTZ,
data JSONB
) WITH (fillfactor = 80); -- 80% fill, 20% headroom
-- Apply to an existing table (takes effect after VACUUM FULL or pg_repack)
ALTER TABLE sessions SET (fillfactor = 80);
VACUUM FULL sessions; -- or use pg_repack for zero-downtime application
Recommended fillfactor Values
| Table type | Recommended fillfactor | Reason |
|---|---|---|
| Mostly-read reference tables | 100 (default) | No headroom needed |
| General OLTP tables | 90 | Moderate HOT Update opportunity |
| High-frequency update tables (sessions, counters) | 70–80 | Aggressive HOT Update |
| Append-Only tables (logs, events) | 100 | No UPDATEs |
7. FAQ
Q1. "Should I ever use VACUUM FULL?"
A: Not never — but almost never in production. VACUUM FULL takes an AccessExclusive Lock, blocking the entire table for the duration. It is appropriate when a maintenance window is available, the table is small (under a few tens of GB), and space reclamation is urgent. In almost every other case, pg_repack or pg_squeeze is the right choice.
Q2. "What happens if I turn off autovacuum?"
A: Never turn it off. Disabling autovacuum allows Dead Tuples to accumulate without limit and disables XID Wraparound protection, eventually causing the database to refuse all writes. If autovacuum feels like a problem, the answer is tuning — not disabling.
Q3. "Why doesn't the table shrink after VACUUM?"
A: Standard VACUUM removes Dead Tuples but does not return disk space to the OS. The reclaimed space is marked reusable and is filled first when new data arrives. To actually reduce the file size, VACUUM FULL or pg_repack is required. That said, if Bloat is low, a stable table size poses no performance problem.
Q4. "How do I tell if HOT Updates are happening?"
-- Check HOT Update ratio
SELECT
relname,
n_tup_upd AS total_updates,
n_tup_hot_upd AS hot_updates,
ROUND(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC
LIMIT 20;
-- Higher hot_update_pct is better (minimizes index Bloat)
Q5. "VACUUM finished but Dead Tuples are still there — why?"
A: Vacuum cannot clean Dead Tuples that are newer than any open transaction's snapshot at the time Vacuum ran. If a long-running transaction was open when VACUUM executed, any Dead Tuples created after that transaction started are off-limits. Check pg_stat_activity for old transactions and clean them up first.
Q6. "How much does Bloat actually hurt performance?"
A: Bloat itself is not inherently bad — a moderate amount provides the headroom for HOT Updates. The problem is excessive Bloat. Once dead_pct climbs past 30–50%, sequential scan costs grow linearly and the extra heap lookups from index scans add latency. In practice, treating dead_pct > 20% as a warning threshold is common.
8. Operations Checklist — Daily / Weekly / Monthly
Daily Automated Monitoring (Set Up Cron/Alerts)
-- Alert 1: XID Age exceeds 1 billion
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
WHERE age(datfrozenxid) > 1000000000;
-- Alert 2: Dead Tuple ratio exceeds 30%
SELECT relname, n_dead_tup, n_live_tup,
ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
AND (100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0)) > 30;
-- Alert 3: Transaction open longer than 30 minutes
SELECT pid, usename, state,
ROUND(EXTRACT(EPOCH FROM (now() - xact_start)) / 60, 1) AS tx_min
FROM pg_stat_activity
WHERE xact_start < now() - interval '30 minutes'
AND state NOT IN ('idle');
-- Alert 4: Inactive Replication Slot accumulating WAL
SELECT slot_name, age(xmin) AS xmin_age,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_lag
FROM pg_replication_slots
WHERE NOT active AND xmin IS NOT NULL;
Weekly Review Items
[ ] Check pg_stat_user_tables for tables with last_autovacuum older than one week
[ ] Review dead_pct trend per table (upward trend → tune autovacuum)
[ ] Inspect log_autovacuum_min_duration entries for long-running Autovacuum passes
[ ] Check pg_stat_progress_vacuum for any currently running Vacuum operations
[ ] Review Replication Slot health and accumulated WAL volume
Monthly Review Items
[ ] Run pgstattuple on the top 10 largest tables for precise Bloat measurement
[ ] Check pgstatindex avg_leaf_density per index (below 20% -> consider REINDEX)
[ ] Review XID Age trend graph per database
[ ] Run pg_repack or pg_squeeze on tables with excessive Bloat
[ ] Review autovacuum parameters: add per-table overrides for any newly large tables
[ ] Check for PostgreSQL minor version updates and plan for application
9. Series Final Summary
Over three parts, we have covered everything about PostgreSQL VACUUM. The complete picture in one diagram:
10 Core Principles for the Entire Series
- Never disable autovacuum — the moment you do, you are on the road to a Wraparound disaster
- Always override autovacuum parameters per table for large tables — global defaults are tuned for small tables
- On NVMe, set cost_delay=0 — the default throttle is a relic of the HDD era
- Alert immediately when XID Age exceeds 1 billion — left alone, the database will stop
- Long-running transactions are Vacuum's enemy — set idle_in_transaction_session_timeout
- VACUUM FULL is for maintenance windows only — use pg_repack or pg_squeeze in production
- Moderate Bloat is acceptable — dead_pct below 20% is within normal range
- Run OLAP queries on a read replica — long queries on the Primary block Vacuum
- Use fillfactor to encourage HOT Updates — prevention is far cheaper than remediation
- Measure regularly with pgstattuple — make decisions on data, not intuition
References
- PostgreSQL Official Documentation — Routine Vacuuming
- Microsoft Tech Blog — Managing bloat with pgstattuple
- Crunchy Data — Checking for PostgreSQL Bloat
- CYBERTEC — pg_squeeze
- AWS RDS — pg_repack Guide
- postgres.ai — How to deal with bloat