PostgreSQL Backup & Recovery Part 2 — pg_dump and pg_dumpall in Practice
pg_dump is the most familiar backup tool, but it complements production disaster recovery rather than replacing it. It excels at selective restore, version migration, and dev environment cloning. Output format choice (plain/custom/directory) determines recovery flexibility and performance, and pg_restore's parallel and selective restore capabilities are the key to cutting real-world incident response time. pg_dumpall captures cluster-wide objects (roles, tablespaces) but is plain-SQL only, making a hybrid strategy essential for large-scale recovery.
Series overview
- Part 1 — Backup Fundamentals and Strategy
- Part 2 — Logical Backup — pg_dump & pg_dumpall in Practice (you are here)
- Part 3 — Physical Backup — pg_basebackup and WAL Archiving (coming soon)
- Part 4 — PITR (Point-in-Time Recovery) Implementation Guide (coming soon)
- Part 5 — Enterprise Backup Tools Compared — pgBackRest vs Barman vs WAL-G (coming soon)
- Part 6 — Backup Automation & Monitoring, and Recovery Drills (coming soon)
Table of Contents
- Introduction — Where logical backup fits
- pg_dump vs pg_dumpall — What's the difference?
- Output formats explained
- pg_dump recipes
- pg_restore recipes
- pg_dumpall recipes
- Password security — Using .pgpass
- Backup automation scripts
- Performance optimization for large databases
- Limits of logical backup and when physical backup is required
- Closing
1. Introduction — Where logical backup fits
Part 1 established PostgreSQL's three backup approaches: logical backup, physical backup, and WAL archiving. Each plays a distinct role, and combining them is what makes a production-grade strategy.
Logical backup is usually the first method you encounter. pg_dump is PostgreSQL's standard bundled tool, exporting a database as SQL or an archive format. Its key advantage: it produces a consistent snapshot without blocking other users' reads or writes.
The official PostgreSQL documentation is explicit about its scope.
"Except for simple situations, pg_dump is not suitable as the regular backup tool for a production database."
— postgresql.org/docs/current/app-pgdump.html
pg_dump is powerful for selective restore, major-version migration, and dev/test environment cloning. For production disaster recovery that requires PITR, physical backup is mandatory. Logical backup is meant to complement physical backup, not replace it.
2. pg_dump vs pg_dumpall — What's the difference?
The two tools differ in scope from the start.
| Item | pg_dump | pg_dumpall |
|---|---|---|
| Scope | Single database | Entire cluster |
| Roles included | No | Yes |
| Tablespaces included | No | Yes |
| Output formats | plain, custom, directory, tar | plain (SQL) only |
| Parallel dump | Directory format only | Not supported |
| Selective restore | Supported | Not supported |
| Recovery tool | psql or pg_restore | psql |
Internally, pg_dumpall calls pg_dump for each database and additionally exports cluster-wide objects (roles, tablespaces) as SQL. Note that each database is dumped with its own snapshot taken at a different moment, so the dumps are not mutually consistent across databases.
In practice, a hybrid strategy is common: use pg_dumpall --globals-only for global objects and pg_dump with parallel options for individual databases.
3. Output formats explained
Format choice has a direct impact on recovery flexibility and performance. pg_dump supports four formats.
| Format | Flag | Compression | Parallel dump | Parallel restore | Selective restore | Best for |
|---|---|---|---|---|---|---|
| Plain (SQL) | -Fp | None | No | No | No | Small DBs, human-readable |
| Custom | -Fc | Built-in | No | Yes | Yes | Most production workloads |
| Directory | -Fd | Built-in | Yes | Yes | Yes | Large DBs, speed-critical |
| Tar | -Ft | None | No | No | Yes | Compatibility |
Plain
A human-readable SQL file restored with psql. No selective restore or parallel processing. File size grows rapidly with large databases.
Custom
The default choice for most production environments. Built-in compression, pg_restore-based recovery with table/schema-level selection, and parallel restore (pg_restore -j) in a single file.
Directory
The only format that supports parallel dump (-j). Each table gets its own file, which increases file count and complicates transfer and management. Use when dump speed is the priority on large databases.
Tar
Essentially the directory format's contents packed into a single tar archive, with no compression and no parallel restore. No compelling reason to prefer it over Custom unless compatibility demands it.
4. pg_dump recipes
4.1 Basic backup
# Basic custom-format backup
pg_dump -Fc -U postgres -h localhost -d mydb \
-f /backups/mydb_$(date +%Y%m%d_%H%M%S).dump
# Remote server backup
pg_dump -Fc -U postgres -h db.example.com -p 5432 -d mydb \
-f /backups/mydb_remote.dump
4.2 Compression tuning
# No compression — prioritize restore speed
pg_dump -Fc -Z 0 -U postgres -d mydb -f mydb_nocompress.dump
# Maximum compression — minimize storage cost
pg_dump -Fc -Z 9 -U postgres -d mydb -f mydb_maxcompress.dump
# External compression — combine plain format with pigz
pg_dump -Fp -U postgres -d mydb | pigz -p 4 > mydb.sql.gz
Using -Z 0 to disable compression can speed up the overall dump when CPU is not the bottleneck. The effect varies by storage I/O profile — measure before committing.
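One way to measure is to time a throwaway dump at each setting. The commands below are a sketch: the database name and levels are placeholders, and -f /dev/null discards the archive so only dump cost is compared. Run against a copy, not production.
# Rough comparison of compression settings (sketch)
time pg_dump -Fc -Z 0 -U postgres -d mydb -f /dev/null
time pg_dump -Fc -Z 6 -U postgres -d mydb -f /dev/null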
4.3 Selective backup — tables and schemas
# Specific tables only
pg_dump -Fc -U postgres -d mydb \
-t users -t orders \
-f critical_tables.dump
# Wildcard pattern
pg_dump -Fc -U postgres -d mydb \
-t 'public.user_*' \
-f user_tables.dump
# Entire schema
pg_dump -Fc -U postgres -d mydb \
-n analytics \
-f analytics_schema.dump
# Exclude specific tables
pg_dump -Fc -U postgres -d mydb \
-T audit_logs -T temp_sessions \
-f mydb_no_logs.dump
When dumping specific tables, the foreign-key targets of those tables are not included, so the restore may fail with dependency errors. Include the related tables in the dump, or fall back to a data-only restore with --disable-triggers.
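If you do need to load such a dump anyway, one option is a data-only restore with trigger checks disabled. This sketch assumes the table definitions already exist in the target; it requires superuser rights and can leave references that point at missing parent rows.
# Data-only restore with FK/trigger checks disabled during the load (superuser required)
pg_restore -U postgres -d mydb -a --disable-triggers /backups/critical_tables.dump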
4.4 Schema-only, data-only, and section dumps
# Schema (DDL) only — migration, schema version control
pg_dump -Fp -s -U postgres -d mydb -f schema_only.sql
# Data only — load into an environment where schema already exists
pg_dump -Fc -a -U postgres -d mydb -f data_only.dump
# Section-based splitting
pg_dump -Fc --section=pre-data -U postgres -d mydb -f pre_data.dump
pg_dump -Fc --section=data -U postgres -d mydb -f data.dump
pg_dump -Fc --section=post-data -U postgres -d mydb -f post_data.dump
Section splitting shines in large-scale restores: load the schema (pre-data) and data first, and build indexes and constraints (post-data) last, which significantly improves restore speed.
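A sketch of the corresponding restore order, reusing the three section dumps created above (the worker count is illustrative):
# Restore schema first, then data, then indexes and constraints; the last two can run in parallel
pg_restore -U postgres -d mydb pre_data.dump
pg_restore -U postgres -d mydb -j 4 data.dump
pg_restore -U postgres -d mydb -j 4 post_data.dump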
4.5 Handling very large files
# Work around file size limits with split
pg_dump -Fp -U postgres -d mydb | gzip | split -b 2G - /backups/mydb_part_
# Restore from split files
cat /backups/mydb_part_* | gunzip | psql -U postgres -d mydb
# Backup over SSH tunnel — stream directly to backup server
pg_dump -Fc -U postgres -d mydb | ssh user@backup-server "cat > /backups/mydb.dump"
4.6 PostgreSQL 18 new options
PostgreSQL 18 added finer dump control: --sequence-data includes sequence data that was previously omitted by default; --statistics-only, --no-statistics, --no-data, and --no-schema let you control exactly what is exported; --no-policies excludes Row Level Security policies, useful when migrating to environments with different policy sets.
# Include sequence data (PostgreSQL 18+)
pg_dump -Fc --sequence-data -U postgres -d mydb -f mydb_with_seq.dump
# Statistics only (PostgreSQL 18+)
pg_dump --statistics-only -U postgres -d mydb -f stats_only.dump
# Exclude RLS policies (PostgreSQL 18+)
pg_dump -Fc --no-policies -U postgres -d mydb -f mydb_no_rls.dump
Always verify compatibility with your running PostgreSQL version against the official documentation before using these options.
5. pg_restore recipes
pg_restore restores custom, directory, and tar format archives. Plain SQL format is restored with psql.
5.1 Basic restore
# Full DB restore (database must already exist)
pg_restore -U postgres -d mydb /backups/mydb.dump
# Create database automatically during restore
pg_restore -U postgres -C -d postgres /backups/mydb.dump
# Stop immediately on error
pg_restore -U postgres -d mydb --exit-on-error /backups/mydb.dump
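When restoring over an existing database, another common pattern is to let pg_restore drop objects before recreating them. A sketch; this is destructive, so double-check the target first.
# Drop existing objects before recreating them (destructive; verify the target DB first)
pg_restore -U postgres -d mydb --clean --if-exists /backups/mydb.dump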
5.2 Parallel restore
Parallel restore works with both custom and directory format archives, regardless of whether parallel dump was used to create them.
# Restore with 4 parallel workers
pg_restore -U postgres -d mydb -j 4 /backups/mydb.dump
# Directory format parallel restore
pg_restore -U postgres -d mydb -j 4 -Fd /backups/mydb_dir/
Temporarily adjusting server parameters before restore can further improve speed.
-- Temporary settings before restore (must revert after completion)
-- Note: shared_buffers only changes after a server restart; pg_reload_conf() applies the rest
ALTER SYSTEM SET shared_buffers = '4GB';
ALTER SYSTEM SET work_mem = '32MB';
ALTER SYSTEM SET max_wal_size = '32GB';
ALTER SYSTEM SET checkpoint_timeout = '60min';
ALTER SYSTEM SET maintenance_work_mem = '2GB';
ALTER SYSTEM SET autovacuum = 'off';
SELECT pg_reload_conf();
Leaving autovacuum off risks table bloat and, eventually, transaction ID wraparound, so re-enable it immediately after the restore completes. Tune the values above to match your server specifications.
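A sketch of reverting those overrides once the restore finishes; shared_buffers only takes effect again after a restart:
-- Revert the temporary overrides after the restore completes
ALTER SYSTEM RESET shared_buffers;
ALTER SYSTEM RESET work_mem;
ALTER SYSTEM RESET max_wal_size;
ALTER SYSTEM RESET checkpoint_timeout;
ALTER SYSTEM RESET maintenance_work_mem;
ALTER SYSTEM RESET autovacuum;
SELECT pg_reload_conf();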
5.3 Selective restore
# Restore a specific table
pg_restore -U postgres -d mydb -t users /backups/mydb.dump
# Restore a specific schema
pg_restore -U postgres -d mydb -n analytics /backups/mydb.dump
# List archive contents without restoring
pg_restore --list /backups/mydb.dump
# Restore a specific section — rebuild indexes and constraints only
pg_restore -U postgres -d mydb --section=post-data /backups/mydb.dump
5.4 Plain SQL restore
# Basic restore
psql -U postgres -d mydb < backup.sql
# Stop immediately on error (recommended)
psql -U postgres -d mydb \
-v ON_ERROR_STOP=1 \
-X \
< backup.sql
# Restore from compressed file
gunzip -c backup.sql.gz | psql -U postgres -d mydb
ON_ERROR_STOP=1 makes psql halt at the first error instead of silently continuing; anything loaded before the failure still remains unless you also use -1 (--single-transaction) for an all-or-nothing load. Always set it for production restores.
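A sketch of an all-or-nothing plain-SQL load combining both flags (not usable if the script contains statements that cannot run inside a transaction block):
# Wrap the whole load in one transaction and stop at the first error
psql -U postgres -d mydb -1 -v ON_ERROR_STOP=1 -X -f backup.sql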
6. pg_dumpall recipes
pg_dumpall backs up the entire cluster, including roles and tablespaces. It is essential for server migrations or DR environments where pg_dump alone cannot recover global objects.
6.1 Basic usage
# Full cluster backup
pg_dumpall -U postgres -f /backups/cluster_$(date +%Y%m%d).sql
# Global objects only (roles + tablespaces, no DB content)
pg_dumpall -U postgres --globals-only -f /backups/globals_$(date +%Y%m%d).sql
# Roles only
pg_dumpall -U postgres --roles-only -f /backups/roles.sql
# Tablespaces only
pg_dumpall -U postgres --tablespaces-only -f /backups/tablespaces.sql
6.2 Restore
# Full cluster restore (requires superuser)
psql -U postgres -f /backups/cluster_20260414.sql
# Recommended hybrid: restore global objects first, then individual DBs with pg_restore
psql -U postgres -f /backups/globals_20260414.sql
pg_restore -U postgres -C -d postgres /backups/mydb.dump
pg_dumpall outputs plain SQL only, so parallel restore is not possible. For large clusters, the recommended approach is to save global objects separately with pg_dumpall --globals-only and dump individual databases in parallel with pg_dump -Fd -j N.
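A sketch of that hybrid flow for a cluster with several databases; the database names and worker count below are placeholders:
# Globals once, then each database as a parallel directory-format dump
pg_dumpall -U postgres --globals-only -f /backups/globals_$(date +%Y%m%d).sql
for db in sales analytics reporting; do
    pg_dump -Fd -j 4 -U postgres -d "$db" -f "/backups/${db}_$(date +%Y%m%d)_dir/"
done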
7. Password security — Using .pgpass
Hard-coding passwords in backup scripts is a security risk. The .pgpass file keeps credentials out of source code.
# ~/.pgpass format: hostname:port:database:username:password
echo "localhost:5432:mydb:backup_user:securepassword" >> ~/.pgpass
chmod 600 ~/.pgpass
# Now runs without a password prompt
pg_dump -U backup_user -d mydb -Fc -f mydb.dump
Environment variables are also an option — but always unset after use.
export PGPASSWORD="securepassword"
pg_dump -U backup_user -d mydb -Fc -f mydb.dump
unset PGPASSWORD
On some operating systems other users can read a process's environment, so PGPASSWORD is unsuitable as the default for long-running automation scripts. Prefer .pgpass or a connection service file (pg_service.conf).
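A minimal connection service file sketch, since pg_service.conf is mentioned above; the service name and values are placeholders, and the password can still live in .pgpass:
# ~/.pg_service.conf — named connection parameters
[mydb_backup]
host=localhost
port=5432
dbname=mydb
user=backup_user
# Pass the service as a connection string instead of spelling out connection flags
pg_dump "service=mydb_backup" -Fc -f mydb.dump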
8. Backup automation scripts
8.1 Daily backup script
#!/bin/bash
set -euo pipefail
DB_HOST="localhost"
DB_USER="backup_user"
DB_NAME="mydb"
BACKUP_DIR="/backups/postgresql"
RETENTION_DAYS=30
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${TIMESTAMP}.dump"
LOG_FILE="/var/log/pg_backup.log"
mkdir -p "${BACKUP_DIR}"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "${LOG_FILE}"
}
log "Backup started: ${DB_NAME}"
if pg_dump -Fc \
-h "${DB_HOST}" \
-U "${DB_USER}" \
-d "${DB_NAME}" \
-f "${BACKUP_FILE}"; then
FILE_SIZE=$(du -sh "${BACKUP_FILE}" | cut -f1)
log "Backup complete: ${BACKUP_FILE} (${FILE_SIZE})"
else
log "ERROR: Backup failed!"
exit 1
fi
DELETED=$(find "${BACKUP_DIR}" -name "${DB_NAME}_*.dump" \
-mtime +${RETENTION_DAYS} -delete -print | wc -l)
log "Removed ${DELETED} expired backups (retention: ${RETENTION_DAYS} days)"
8.2 Cron registration
# Daily at 2 AM
0 2 * * * /usr/local/bin/pg_backup_daily.sh >> /var/log/pg_backup.log 2>&1
# Global objects weekly (Sunday at 1 AM)
0 1 * * 0 pg_dumpall -U postgres --globals-only \
-f /backups/globals_$(date +\%Y\%m\%d).sql
Automation scripts must include success/failure logging, a retention policy, and a recovery test plan. Verifying that a backup file was created is not enough.
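As a first-line check, the daily script above could at least confirm that the archive's table of contents is readable; a sketch to append after the dump step (a real drill still means restoring into a scratch database):
# Sanity check: a corrupt or truncated archive fails to list its contents
if pg_restore --list "${BACKUP_FILE}" > /dev/null; then
    log "Archive verification passed: ${BACKUP_FILE}"
else
    log "ERROR: Archive verification failed!"
    exit 1
fi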
9. Performance optimization for large databases
Each -j N parallel worker opens its own database connection (N + 1 total including the coordinator). Confirm that max_connections has sufficient headroom and monitor CPU, memory, and IOPS as you tune the worker count.
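A quick way to check that headroom before picking a worker count (a sketch; superuser or pg_monitor sees all sessions):
-- Connections still available relative to max_connections
SELECT current_setting('max_connections')::int - count(*) AS available_connections
FROM pg_stat_activity;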
# Optimized large-DB dump on an 8-core server
pg_dump \
-Fd \
-j 6 \
-Z 0 \
-U postgres \
-d large_db \
-f /backups/large_db_dir/
# Parallel restore
pg_restore \
-Fd \
-j 6 \
-U postgres \
-d large_db \
/backups/large_db_dir/
More parallelism does not always mean faster. When a database has few tables and one giant table dominates, parallel gains are limited. Compression tuning (-Z 0 vs default) depends on whether the bottleneck is CPU or I/O — measure in your environment before committing.
10. Limits of logical backup and when physical backup is required
Knowing when logical backup is the right tool — and when it is not — is as important as knowing how to use it.
Where logical backup excels
- Selective restore of specific tables or schemas
- PostgreSQL major-version upgrades (lower to higher version)
- Moving data to a different OS or hardware architecture
- Cloning data into dev or test environments
- Long-term portable snapshots
Where physical backup is required
- PITR (Point-in-Time Recovery)
- Fast full restore of large databases (disaster recovery, RTO targets)
- Initializing a Streaming Replication standby
- Environments with sub-minute RPO requirements
Relying on pg_dump alone for production backup coverage is risky. Treat logical backup as a complement to physical backup — not a substitute.
11. Closing — A backup that has never been tested is not a backup
pg_dump and pg_restore are easy to learn, but each option makes a meaningful difference when a real failure hits. Decide in advance on format, parallelism, and selective restore scope — then run actual recovery tests on a regular schedule.
"A backup you have never tested is not a backup."
The next part covers physical backup — pg_basebackup and WAL archiving — the foundation for production PITR.