Thursday, June 4, 2026
All posts
Lv.3 IntermediatePostgreSQL
28 min readLv.3 Intermediate
SeriesPostgreSQL Vacuum Complete Guide · Part 3View series hub

PostgreSQL Vacuum Part 3 — Production Monitoring, Bloat Removal, and Operations

PostgreSQL Vacuum Part 3 — Production Monitoring, Bloat Removal, and Operations

Understanding Vacuum is different from actually handling it in production. Diagnose Bloat by the numbers with pgstattuple and pgstatindex, catch dead tuples, XID age, long-running transactions, and replication slot lag with a unified monitoring query set, compare VACUUM FULL, pg_repack, pg_squeeze, and REINDEX CONCURRENTLY by lock level and disk requirements, and prevent Bloat with fillfactor and HOT Updates — the final installment of the PostgreSQL Vacuum series.

Series overview

  • Part 1 — Why VACUUM Exists: MVCC and Dead Tuples
  • Part 2 — Autovacuum Tuning and XID Wraparound Disaster Prevention
  • Part 3 — Production Monitoring, Bloat Removal, and Operations (current · series finale)

Table of Contents

  1. Precise Bloat Diagnosis: Using pgstattuple
  2. Production Monitoring Query Collection
  3. pg_stat_progress_vacuum — Real-Time Vacuum Tracking
  4. Bloat Removal Tool Comparison: VACUUM FULL vs pg_repack vs pg_squeeze
  5. OLTP / OLAP Mixed Environment Strategy
  6. Preventing Bloat with fillfactor
  7. FAQ
  8. Operations Checklist — Daily / Weekly / Monthly
  9. Series Final Summary

1. Precise Bloat Diagnosis: Using pgstattuple

n_dead_tup in pg_stat_user_tables works for a quick check. Accurate Bloat measurement requires the pgstattuple extension.

-- Install the extension (superuser required)
CREATE EXTENSION IF NOT EXISTS pgstattuple;

Measuring Table Bloat

-- Basic pgstattuple analysis
SELECT * FROM pgstattuple('public.orders');

Column interpretation:

ColumnMeaningDecision threshold
table_lenTotal file size of the table (bytes)
tuple_countNumber of live rows
tuple_lenSize of valid data
tuple_percentRatio of valid dataLower = more Bloat
dead_tuple_countDead Tuple count
dead_tuple_percentDead Tuple ratioAbove 20% → VACUUM needed
free_spaceReusable space not yet returned to OS
free_percentReusable space ratioAbove 50% → consider VACUUM FULL / pg_repack
-- Full-schema Bloat summary (run on a read-only replica for large databases)
SELECT *
FROM (
    SELECT
        schemaname,
        tablename,
        (pgstattuple(schemaname || '.' || tablename)).*
    FROM pg_tables
    WHERE schemaname = 'public'
) t
ORDER BY dead_tuple_percent DESC
LIMIT 20;

Warning: pgstattuple performs a full table scan and carries a significant I/O cost on large tables. In production, run it against specific tables only, or execute on a read-only replica.

Measuring Index Bloat

-- Index Bloat analysis
SELECT * FROM pgstatindex('public.orders_pkey');
ColumnMeaningDecision threshold
avg_leaf_densityData density of leaf pagesBelow 20% → consider REINDEX
leaf_fragmentationLeaf page fragmentationHigher = worse performance
-- Full-schema index Bloat summary
SELECT
    c.relname                                        AS index_name,
    i.indrelid::regclass                             AS table_name,
    pg_size_pretty(pg_relation_size(c.oid))          AS index_size,
    (pgstatindex(c.oid)).avg_leaf_density,
    (pgstatindex(c.oid)).leaf_fragmentation
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'i'
ORDER BY pg_relation_size(c.oid) DESC
LIMIT 20;

2. Production Monitoring Query Collection

A set of queries ready to drop into a dashboard or cron-based alert script.

Comprehensive Vacuum Health Check (Dashboard Main Query)

WITH vacuum_stats AS (
    SELECT
        s.schemaname || '.' || s.relname                        AS table_name,
        s.n_live_tup                                            AS live_rows,
        s.n_dead_tup                                            AS dead_rows,
        ROUND(100.0 * s.n_dead_tup
            / NULLIF(s.n_live_tup + s.n_dead_tup, 0), 1)       AS dead_pct,
        age(c.relfrozenxid)                                     AS xid_age,
        pg_size_pretty(pg_total_relation_size(c.oid))           AS total_size,
        s.last_vacuum,
        s.last_autovacuum,
        s.vacuum_count,
        s.autovacuum_count
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.relname = s.relname
        AND c.relnamespace = (
            SELECT oid FROM pg_namespace WHERE nspname = s.schemaname
        )
)
SELECT
    table_name,
    live_rows,
    dead_rows,
    dead_pct,
    xid_age,
    total_size,
    COALESCE(last_autovacuum::text, last_vacuum::text, 'NEVER') AS last_vacuumed,
    CASE
        WHEN xid_age  > 1500000000 THEN 'XID CRITICAL'
        WHEN dead_pct > 30         THEN 'BLOAT CRITICAL'
        WHEN xid_age  > 1000000000 THEN 'XID WARNING'
        WHEN dead_pct > 15         THEN 'BLOAT WARNING'
        WHEN last_autovacuum IS NULL
         AND last_vacuum IS NULL   THEN 'NEVER VACUUMED'
        ELSE 'OK'
    END AS health
FROM vacuum_stats
WHERE dead_rows > 100 OR xid_age > 100000000
ORDER BY
    CASE WHEN xid_age > 1500000000 OR dead_pct > 30 THEN 0 ELSE 1 END,
    dead_rows DESC;

Finding What Is Blocking Vacuum

-- ① Long-running transactions (Vacuum's biggest enemy)
SELECT
    pid,
    usename,
    application_name,
    state,
    ROUND(EXTRACT(EPOCH FROM (now() - xact_start)) / 60, 1) AS tx_min,
    LEFT(query, 100) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND state NOT IN ('idle')
ORDER BY xact_start
LIMIT 10;

-- ② Inactive Replication Slots (cause Vacuum lag)
SELECT
    slot_name,
    active,
    catalog_xmin,
    age(catalog_xmin)  AS catalog_age,
    xmin,
    age(xmin)          AS xmin_age,
    pg_size_pretty(
        pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
    )                  AS wal_retained
FROM pg_replication_slots
ORDER BY age(xmin) DESC NULLS LAST;

-- ③ Vacuum Horizon holders by source
SELECT
    'long_transaction'  AS source,
    pid::text           AS identifier,
    age(backend_xmin)   AS horizon_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
  AND state != 'idle'
UNION ALL
SELECT
    'replication_slot'  AS source,
    slot_name           AS identifier,
    age(xmin)           AS horizon_age
FROM pg_replication_slots
WHERE xmin IS NOT NULL
ORDER BY horizon_age DESC
LIMIT 10;

Enabling Autovacuum Activity Logging

-- Log Autovacuum runs that take longer than 1 second
ALTER SYSTEM SET log_autovacuum_min_duration = '1s';
SELECT pg_reload_conf();

-- Then check pg_log for entries like:
-- automatic vacuum of table "mydb.public.orders": ...
-- automatic analyze of table "mydb.public.orders": ...

3. pg_stat_progress_vacuum — Real-Time Vacuum Tracking

The pg_stat_progress_vacuum view, introduced in PostgreSQL 9.6, provides real-time details on any Vacuum currently in progress.

-- Current Vacuum progress (PostgreSQL 17)
SELECT
    p.pid,
    p.datname,
    p.relid::regclass                                             AS table_name,
    p.phase,
    p.heap_blks_total,
    p.heap_blks_scanned,
    ROUND(100.0 * p.heap_blks_scanned
        / NULLIF(p.heap_blks_total, 0), 1)                       AS heap_scan_pct,
    p.heap_blks_vacuumed,
    p.index_vacuum_count,
    p.num_dead_item_ids,
    p.indexes_total,
    p.indexes_processed,
    ROUND(EXTRACT(EPOCH FROM now() - a.query_start) / 60, 1)     AS running_min
FROM pg_stat_progress_vacuum p
JOIN pg_stat_activity a USING (pid);

Vacuum Phase Interpretation

PhaseMeaningNotes
initializingInitialization
scanning heapScanning heap pages and collecting Dead TuplesUsually the longest phase
vacuuming indexesRemoving Dead Pointers from indexesRepeats once per index
vacuuming heapPhysically removing Dead Tuples from heap
cleaning up indexesFinalizing index cleanup
truncating heapRemoving empty trailing pagesSpace is returned here
performing final cleanupStatistics update and wrap-up

Tip: If heap_blks_scanned stalls for a long time during scanning heap, suspect lock contention or an I/O bottleneck.


4. Bloat Removal Tool Comparison: VACUUM FULL vs pg_repack vs pg_squeeze

When Dead Tuple removal is not enough and physical space reclamation is needed, three tools are available.

Side-by-Side Comparison

AspectVACUUM FULLpg_repackpg_squeeze
Lock levelAccessExclusive (full table lock)Short lock at start and end onlyShort lock at start and end only
Online useNot possiblePossiblePossible
Extra disk needed~1× table size~2× table size~2× table size
ImplementationPostgreSQL built-inTrigger-based change captureLogical decoding (server-side)
AutomationManualManual (CLI tool)Automatable (Background Worker)
PK/UK requiredNot requiredRequiredRequired
Index orderingNoneYes (--order-by)Yes (clustering_index)
Best forMaintenance window, small tablesMedium/large tables, online Bloat removalContinuous automated Bloat management

Using pg_repack

# Install
apt install postgresql-17-repack
# or
CREATE EXTENSION pg_repack;

# Repack a specific table
pg_repack -h localhost -U postgres -d mydb -t public.orders

# Repack indexes only (data is fine, only indexes are bloated)
pg_repack -h localhost -U postgres -d mydb --only-indexes -t public.orders

# Repack the entire database (large maintenance operation)
pg_repack -h localhost -U postgres -d mydb --no-superuser-check

Warning: pg_repack requires approximately twice the table's disk size as working space. A 1 TB table needs up to 2 TB of additional free space during the operation. Verify available disk space before running.

Using pg_squeeze

-- Install
CREATE EXTENSION pg_squeeze;

-- Register a table for automatic Bloat monitoring
SELECT squeeze.add_squeeze_job(
    tabschema  := 'public',
    tabname    := 'orders',
    threshold  := 0.3,       -- Auto-squeeze when Dead Tuple ratio exceeds 30%
    max_retry  := 3
);

-- Manual immediate squeeze
SELECT squeeze.squeeze_table('public', 'orders', NULL, NULL, NULL);

-- Check registered jobs
SELECT * FROM squeeze.tables;

pg_squeeze runs as a server-side Background Worker, requiring no external CLI. The table must have a PRIMARY KEY or UNIQUE constraint.

Removing Index Bloat: REINDEX CONCURRENTLY

-- Non-blocking index rebuild (PG 12+)
REINDEX INDEX CONCURRENTLY public.orders_pkey;

-- Rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY public.orders;

Warning: REINDEX CONCURRENTLY holds an xmin horizon for the duration of the run, which can indirectly affect Autovacuum on other tables. Run it outside peak hours.


5. OLTP / OLAP Mixed Environment Strategy

Running both OLTP (short, frequent transactions) and OLAP (long analytical queries) on the same PostgreSQL instance creates serious Vacuum problems.

The Core Conflict

[How a long OLAP query affects Vacuum]

OLAP query starts -----> 2-hour runtime -----> query ends
         |
         v  Transaction snapshot created (Horizon frozen)

XID at snapshot creation time = "Transaction Horizon"

- What Vacuum can clean: Dead Tuples older than the Horizon
- What Vacuum cannot touch: Dead Tuples created after the Horizon started

Result: the OLTP table accumulates 2 hours worth of Dead Tuples -> Bloat

Three Mitigation Strategies

Strategy 1: Offload OLAP to a Read Replica (Recommended)

With OLAP queries removed from the Primary, Autovacuum can clean Dead Tuples without interference.

Strategy 2: Set idle_in_transaction_session_timeout

-- Apply timeouts to a dedicated analyst role
ALTER ROLE analyst SET statement_timeout = '30min';
ALTER ROLE analyst SET idle_in_transaction_session_timeout = '10min';

-- Apply database-wide
ALTER DATABASE mydb SET idle_in_transaction_session_timeout = '30min';

Strategy 3: Partitioning to Eliminate Vacuum Scope

Partition large history tables by date and manage old partitions with TRUNCATE or DROP. No Dead Tuples are generated, so Vacuum becomes unnecessary for those ranges.

-- Monthly partition table example
CREATE TABLE events (
    id         BIGSERIAL,
    event_time TIMESTAMPTZ NOT NULL,
    payload    JSONB
) PARTITION BY RANGE (event_time);

CREATE TABLE events_2026_01 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

-- Drop partitions older than 3 months (no VACUUM needed)
DROP TABLE events_2025_12;
ApproachAdvantageDisadvantage
Read replica separationStrongest isolationHigher infrastructure cost
Timeout settingsImmediate to applyRisk of interrupting analysis queries
PartitioningEliminates Vacuum entirelySchema change required

6. Preventing Bloat with fillfactor

Prevention is far cheaper than treatment. fillfactor instructs PostgreSQL to reserve free space on data pages ahead of time.

How HOT (Heap-Only Tuple) Update Works

When an UPDATE can write the new row version to the same page, PostgreSQL performs a HOT Update — modifying only the heap page without touching any indexes. This suppresses index Bloat and reduces Vacuum load.

[Page structure with fillfactor=90]

+-----------------------------------------------------+
| [row1][row2][row3]...[rowN]          [10% free]      |
+-----------------------------------------------------+
                                           ^
                                     New version written here on UPDATE
                                     -> HOT Update fires
                                     -> No index modification
-- Apply fillfactor to a frequently updated table
CREATE TABLE sessions (
    session_id UUID PRIMARY KEY,
    user_id    INT NOT NULL,
    last_seen  TIMESTAMPTZ,
    data       JSONB
) WITH (fillfactor = 80);  -- 80% fill, 20% headroom

-- Apply to an existing table (takes effect after VACUUM FULL or pg_repack)
ALTER TABLE sessions SET (fillfactor = 80);
VACUUM FULL sessions;  -- or use pg_repack for zero-downtime application

Recommended fillfactor Values

Table typeRecommended fillfactorReason
Mostly-read reference tables100 (default)No headroom needed
General OLTP tables90Moderate HOT Update opportunity
High-frequency update tables (sessions, counters)70–80Aggressive HOT Update
Append-Only tables (logs, events)100No UPDATEs

7. FAQ

Q1. "Should I ever use VACUUM FULL?"

A: Not never — but almost never in production. VACUUM FULL takes an AccessExclusive Lock, blocking the entire table for the duration. It is appropriate when a maintenance window is available, the table is small (under a few tens of GB), and space reclamation is urgent. In almost every other case, pg_repack or pg_squeeze is the right choice.

Q2. "What happens if I turn off autovacuum?"

A: Never turn it off. Disabling autovacuum allows Dead Tuples to accumulate without limit and disables XID Wraparound protection, eventually causing the database to refuse all writes. If autovacuum feels like a problem, the answer is tuning — not disabling.

Q3. "Why doesn't the table shrink after VACUUM?"

A: Standard VACUUM removes Dead Tuples but does not return disk space to the OS. The reclaimed space is marked reusable and is filled first when new data arrives. To actually reduce the file size, VACUUM FULL or pg_repack is required. That said, if Bloat is low, a stable table size poses no performance problem.

Q4. "How do I tell if HOT Updates are happening?"

-- Check HOT Update ratio
SELECT
    relname,
    n_tup_upd                                                    AS total_updates,
    n_tup_hot_upd                                                AS hot_updates,
    ROUND(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1)     AS hot_update_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC
LIMIT 20;
-- Higher hot_update_pct is better (minimizes index Bloat)

Q5. "VACUUM finished but Dead Tuples are still there — why?"

A: Vacuum cannot clean Dead Tuples that are newer than any open transaction's snapshot at the time Vacuum ran. If a long-running transaction was open when VACUUM executed, any Dead Tuples created after that transaction started are off-limits. Check pg_stat_activity for old transactions and clean them up first.

Q6. "How much does Bloat actually hurt performance?"

A: Bloat itself is not inherently bad — a moderate amount provides the headroom for HOT Updates. The problem is excessive Bloat. Once dead_pct climbs past 30–50%, sequential scan costs grow linearly and the extra heap lookups from index scans add latency. In practice, treating dead_pct > 20% as a warning threshold is common.


8. Operations Checklist — Daily / Weekly / Monthly

Daily Automated Monitoring (Set Up Cron/Alerts)

-- Alert 1: XID Age exceeds 1 billion
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
WHERE age(datfrozenxid) > 1000000000;

-- Alert 2: Dead Tuple ratio exceeds 30%
SELECT relname, n_dead_tup, n_live_tup,
       ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
  AND (100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0)) > 30;

-- Alert 3: Transaction open longer than 30 minutes
SELECT pid, usename, state,
       ROUND(EXTRACT(EPOCH FROM (now() - xact_start)) / 60, 1) AS tx_min
FROM pg_stat_activity
WHERE xact_start < now() - interval '30 minutes'
  AND state NOT IN ('idle');

-- Alert 4: Inactive Replication Slot accumulating WAL
SELECT slot_name, age(xmin) AS xmin_age,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_lag
FROM pg_replication_slots
WHERE NOT active AND xmin IS NOT NULL;

Weekly Review Items

[ ] Check pg_stat_user_tables for tables with last_autovacuum older than one week
[ ] Review dead_pct trend per table (upward trend → tune autovacuum)
[ ] Inspect log_autovacuum_min_duration entries for long-running Autovacuum passes
[ ] Check pg_stat_progress_vacuum for any currently running Vacuum operations
[ ] Review Replication Slot health and accumulated WAL volume

Monthly Review Items

[ ] Run pgstattuple on the top 10 largest tables for precise Bloat measurement
[ ] Check pgstatindex avg_leaf_density per index (below 20% -> consider REINDEX)
[ ] Review XID Age trend graph per database
[ ] Run pg_repack or pg_squeeze on tables with excessive Bloat
[ ] Review autovacuum parameters: add per-table overrides for any newly large tables
[ ] Check for PostgreSQL minor version updates and plan for application

9. Series Final Summary

Over three parts, we have covered everything about PostgreSQL VACUUM. The complete picture in one diagram:

10 Core Principles for the Entire Series

  1. Never disable autovacuum — the moment you do, you are on the road to a Wraparound disaster
  2. Always override autovacuum parameters per table for large tables — global defaults are tuned for small tables
  3. On NVMe, set cost_delay=0 — the default throttle is a relic of the HDD era
  4. Alert immediately when XID Age exceeds 1 billion — left alone, the database will stop
  5. Long-running transactions are Vacuum's enemy — set idle_in_transaction_session_timeout
  6. VACUUM FULL is for maintenance windows only — use pg_repack or pg_squeeze in production
  7. Moderate Bloat is acceptable — dead_pct below 20% is within normal range
  8. Run OLAP queries on a read replica — long queries on the Primary block Vacuum
  9. Use fillfactor to encourage HOT Updates — prevention is far cheaper than remediation
  10. Measure regularly with pgstattuple — make decisions on data, not intuition

References

  • PostgreSQL Official Documentation — Routine Vacuuming
  • Microsoft Tech Blog — Managing bloat with pgstattuple
  • Crunchy Data — Checking for PostgreSQL Bloat
  • CYBERTEC — pg_squeeze
  • AWS RDS — pg_repack Guide
  • postgres.ai — How to deal with bloat

Share This Article

Series Navigation

PostgreSQL Vacuum Complete Guide

Current part 3 · 3 published

Explore this topic·Start with featured series

한국어

Follow new posts via RSS

Use RSS to get new posts and series updates directly.

Open RSS Guide