PostgreSQL Backup & Recovery Part 1 — Backup Fundamentals and Strategy
A cron job running and files piling up does not mean your backup works — it means you haven't tested recovery yet. PostgreSQL offers three fundamentally different backup approaches: logical backup (pg_dump, pg_dumpall) for portability and selective restore, physical backup (pg_basebackup) for full-cluster disaster recovery, and WAL archiving to push RPO down to seconds. The right choice is not about picking a tool — it is about working backwards from your RPO and RTO. Part 1 maps the full strategy: the 3-2-1 principle, monitoring, and retention policy included.
Series overview
- Part 1 — Backup Fundamentals and Strategy (you are here)
- Part 2 — Logical Backup — pg_dump & pg_dumpall in Practice (coming soon)
- Part 3 — Physical Backup — pg_basebackup and WAL Archiving (coming soon)
- Part 4 — PITR (Point-in-Time Recovery) Implementation Guide (coming soon)
- Part 5 — Enterprise Backup Tools Compared — pgBackRest vs Barman vs WAL-G (coming soon)
- Part 6 — Backup Automation & Monitoring, and Recovery Drills (coming soon)
Table of Contents
- Introduction — The "we have a backup" illusion
- The three PostgreSQL backup approaches
- RPO & RTO — Where backup strategy begins
- The 3-2-1 backup rule
- The PostgreSQL backup ecosystem in 2026
- The five most common backup mistakes
- Choosing the right backup approach
- Closing — Recoverability is the backup
1. Introduction — The "we have a backup" illusion
Many teams share a familiar experience.
"The cron job is running, files are showing up, the dashboard is green. We have a backup."
Yet when a real failure strikes, the difference between teams that recover successfully and teams that do not is not whether a backup file exists — it is whether the recovery procedure has been tested.
The PostgreSQL documentation makes this explicit.
Backups should be performed on a regular basis. Although the procedure is essentially simple, it is important to have a clear understanding of the underlying techniques and assumptions. — postgresql.org/docs/current/backup
In 2026, PostgreSQL underpins SaaS platforms, analytics pipelines, and high-availability systems at scale. The cost of a backup failure has grown accordingly. Part 1 reframes backup not as file creation but as verified recoverability, and lays out a complete strategic map.
2. The three PostgreSQL backup approaches
PostgreSQL offers three fundamentally different backup methods. Each solves a different problem, and no single method covers every scenario.
2.1 Logical Backup
Logical backup extracts data as SQL statements. The primary tools are pg_dump and pg_dumpall.
Key characteristics
- Selective backup and restore at the table or schema level
- Excellent portability across PostgreSQL versions (lower → higher version migration)
- Slow on large databases
- Not suitable for full-cluster disaster recovery (roles, tablespaces not included in
pg_dump)
# Single database backup (custom format — supports parallel restore)
pg_dump -U postgres -d mydb -Fc -f mydb.dump
# Full cluster (includes roles and tablespaces)
pg_dumpall -U postgres > full_cluster.sql
2.2 Physical Backup
Physical backup copies the entire PostgreSQL data directory at the file-system level. pg_basebackup is the built-in tool.
Key characteristics
- Fast even for large databases
- Optimal for full-cluster recovery (disaster recovery scenarios)
- The required foundation for PITR (Point-in-Time Recovery)
- Cannot restore individual tables or objects
- Only compatible within the same PostgreSQL major version
# Physical backup with WAL streaming and recovery config auto-generated
pg_basebackup -h localhost -U replicator \
-D /backups/basebackup \
-Fp -Xs -P -R
2.3 Continuous Archiving (WAL Archiving)
Continuous archiving copies WAL (Write-Ahead Log) files as they are generated, enabling recovery to any arbitrary point in time. It is the core mechanism behind PITR.
Key characteristics
- Recovery to any point in time regardless of the backup schedule
- Can bring RPO down to seconds
- Must be used in combination with a physical base backup
- High operational complexity
# postgresql.conf — enable WAL archiving
archive_mode = on
archive_command = 'cp %p /mnt/wal-archive/%f'
# For cloud environments:
# archive_command = 'wal-g wal-push %p'
Approach comparison
| Approach | Primary tool | Strengths | Weaknesses |
|---|---|---|---|
| Logical backup | pg_dump, pg_dumpall | Portability, selective restore | Slow on large DBs, not for full-cluster recovery |
| Physical backup | pg_basebackup | Fast full-cluster recovery, PITR foundation | Same major version only, no object-level restore |
| WAL archiving | archive_command | RPO in seconds, core PITR mechanism | High operational complexity |
3. RPO & RTO — Where backup strategy begins
Before choosing a tool, two concepts must be defined.
| Concept | Definition | Example |
|---|---|---|
| RPO (Recovery Point Objective) | Maximum tolerable data loss, expressed as time | RPO = 1 hour → backups must run at least hourly |
| RTO (Recovery Time Objective) | Maximum tolerable time to restore service after a failure | RTO = 4 hours → service must be back within 4 hours |
These two targets determine the backup method, frequency, and tooling. The correct sequence is to derive the approach from RPO/RTO first, not to pick a tool and work backwards.
RPO ≥ 24 hours? → daily pg_dump is sufficient
RPO ≤ 1 hour? → incremental backups or WAL archiving required
RPO = seconds? → continuous WAL archiving (PITR) is mandatory
Replicas are not backups. A replica improves availability, but if data is accidentally deleted on the primary, that deletion replicates to the replica immediately. From an RPO perspective, a replica cannot substitute for a backup.
4. The 3-2-1 backup rule
The 3-2-1 rule is the industry-standard backup retention framework for production PostgreSQL environments.
| Number | Meaning |
|---|---|
| 3 | Maintain 3 copies of data (including the original) |
| 2 | Store on 2 different media or storage types |
| 1 | Keep 1 copy off-site |
Real-world application
- Production database (original)
- Local NAS or a separate server (physical separation)
- AWS S3 / GCS / Azure Blob (geographic separation)
This principle covers hardware failure, accidental deletion, and physical disasters such as fire or flooding. A backup configuration that does not meet all three conditions leaves specific failure scenarios uncovered.
5. The PostgreSQL backup ecosystem in 2026
As of PostgreSQL 18, with version 17 still widely used in production estates, the backup tooling ecosystem is mature.
Built-in tools
| Tool | Type | Primary use |
|---|---|---|
| pg_dump | Logical | Single-database selective backup |
| pg_dumpall | Logical | Full cluster including roles |
| pg_basebackup | Physical | Full-cluster physical backup, PITR foundation |
| pg_restore | Recovery | Restore custom/directory format dumps |
Key open-source third-party tools
| Tool | Notable features | Recommended environment |
|---|---|---|
| pgBackRest | Parallel processing, incremental backup, encryption, cloud integration | Large-scale production, on-premises or cloud |
| Barman | Centralized multi-server management, Python-based | Enterprise, managing many PostgreSQL instances |
| WAL-G | Cloud-native, direct archiving to S3/GCS/Azure | Cloud-first environments |
| Databasus | Web UI, Docker, scheduling automation | DevOps teams including non-technical operators |
There is no universally "best" tool. The right choice depends on data scale, RPO/RTO targets, cloud vs. on-premises environment, and available operational expertise.
6. The five most common backup mistakes
❌ Mistake 1: Recovery has never been tested
A backup file that fails during restore is equivalent to having no backup at all. Periodic recovery testing must be built into the backup process itself — not deferred to "when something breaks." Schedule a full recovery drill at least once per quarter.
❌ Mistake 2: Treating a replica as a backup
A replica improves availability but does not replace a backup. If data is accidentally deleted on the primary, that deletion propagates to the replica immediately. Replicas and backups serve different roles.
❌ Mistake 3: Assuming pg_dump is enough
pg_dump captures only a logical snapshot of a single database. Cluster-level configuration such as roles and tablespaces is not included. Disaster recovery scenarios that require full-cluster restore need pg_dumpall or a physical backup.
❌ Mistake 4: No backup monitoring
Without alerts, you cannot know whether a cron job failed or whether WAL archiving is falling behind. Every backup process must include success/failure monitoring. The most common pattern is discovering that backups have been silently failing for days — at the exact moment of a failure.
❌ Mistake 5: No retention policy
Retaining old backups indefinitely causes storage costs to spiral; too short a retention window means a logical error discovered late cannot be recovered. Define explicit retention periods (e.g., 30 days daily, 3 months weekly, 1 year monthly) as policy.
7. Choosing the right backup approach
The selection flow below is driven by RPO/RTO targets and deployment environment.
In practice, combining logical and physical backup is the recommended approach rather than relying on a single tool.
- Physical backup → fast full-cluster recovery (disaster recovery)
- Logical backup → portability and selective restore (table-level, cross-version migration)
The two methods solve different problems, so running both widens coverage.
8. Closing — Recoverability is the backup
"The purpose of a backup is not to create a copy. It is to return your data to the state you need, at the moment you need it, within a predictable window of time."
A backup strategy begins with defining recovery scenarios, not selecting tools. Define RPO and RTO first, design a 3-2-1 retention structure, sustain it with monitoring and a retention policy, and validate the whole chain with regular recovery drills. Only when all four are in place can you genuinely say "we have a backup."
Part 2 covers pg_dump and pg_dumpall in depth — from format selection to automation.