Operating Patroni H/A Across Multiple Regions — Part 3: Async Replication + Standby Cluster Setup
Implement Part 1's Pattern B in practice. Configure a DC1 Primary Cluster and a DC2 Standby Cluster with independent etcd rings connected only by async WAL Streaming. Covers the full DR lifecycle: setting up the Replication Slot, writing patroni.yml for the Standby Cluster, verifying replication lag, manually promoting DC2 when DC1 fails, and reversing the roles after DC1 recovers. Includes Patroni 4.1's promote-cluster and demote-cluster commands.
Series — Operating Patroni H/A Across Multiple Regions
- Part 1 — Fundamentals and Architecture Design Principles
- Part 2 — Synchronous Multi-DC Setup in Practice
- Part 3 — Async Replication + Standby Cluster Setup (this post)
- Part 4 — Split-Brain Prevention (STONITH, Watchdog, Quorum)
- Part 5 — Failover Runbook and DR Drills
- Part 6 — Monitoring, Operational Automation, and Best Practices
Table of Contents
- What Is a Standby Cluster?
- Lab Environment and Architecture Overview
- Step 1 — Prepare a Replication Slot on DC1
- Step 2 — Configure DC2's Independent etcd Cluster
- Step 3 — Write patroni.yml for the DC2 Standby Cluster
- Step 4 — Start the Standby Cluster and Verify Replication
- Step 5 — Manual Promotion When DC1 Fails
- Step 6 — Return DC1 to Standby After Recovery
- Patroni 4.1 Commands: promote-cluster / demote-cluster
- Common Troubleshooting
- References
1. What Is a Standby Cluster?
Patroni's Standby Cluster feature runs cascading replication to a remote data center. Unlike a regular Replica node, a Standby Cluster maintains its own full HA structure inside DC2 (independent etcd + Patroni leader election) while keeping all its data synchronized with the WAL stream from DC1's Primary.
DC2 contains a special Standby Leader node. Inside DC2, the Standby Leader behaves like a regular Leader (holds the DCS lock, manages Cascade Replicas), but its data comes entirely from DC1's Primary via streaming replication.
Why Choose Standby Cluster?
- RPO > 0 is acceptable: inter-region latency is high and the write slowdown that synchronous replication would add is not acceptable
- Cost reduction: 2-DC async setup instead of 3-DC synchronous, minimizing infrastructure cost
- Regional read service: use the DC2 Standby Leader as a read-only endpoint to reduce latency
- DR test isolation: run failover simulations in DC2 independently, with no impact on DC1
- Cloud migration: use as a Zero-Downtime Migration path when moving regions or from on-premises to cloud
⚠️ The Standby Cluster and Primary Cluster must never share the same DCS scope. Always use independent etcd clusters or different namespaces.
2. Lab Environment and Architecture Overview
Node Layout
| Node | Region | IP (example) | Role |
|---|---|---|---|
| pg-seoul-1 | ap-northeast-2 (Seoul) | 10.1.0.10 | PostgreSQL Primary + etcd |
| pg-seoul-2 | ap-northeast-2 (Seoul) | 10.1.0.11 | PostgreSQL Replica + etcd |
| pg-seoul-3 | ap-northeast-2 (Seoul, AZ spread) | 10.1.0.12 | PostgreSQL Replica + etcd |
| pg-busan-1 | on-premise (Busan DR) | 10.2.0.10 | Standby Leader + etcd |
| pg-busan-2 | on-premise (Busan DR) | 10.2.0.11 | Cascade Replica + etcd |
| pg-busan-3 | on-premise (Busan DR) | 10.2.0.12 | Cascade Replica + etcd |
Each DC runs an independent 3-node etcd cluster. The two etcd clusters do not communicate with each other. Replication flows exclusively through PostgreSQL WAL Streaming.
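Full Architecture
DC1 (Seoul): Primary pg-seoul-1 with Replicas pg-seoul-2 and pg-seoul-3, plus its own 3-node etcd ring.
DC2 (Busan): Standby Leader pg-busan-1 with Cascade Replicas pg-busan-2 and pg-busan-3, plus a separate 3-node etcd ring.
The only link between the two DCs is async WAL streaming from the DC1 Primary to the DC2 Standby Leader.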
3. Step 1 — Prepare a Replication Slot on DC1
For the Standby Cluster to connect to DC1 and receive WAL without gaps, DC1 must hold a Replication Slot so the WAL segments are not discarded. Patroni's Permanent Replication Slot feature registers the slot in the DCS, keeping it alive even after a failover.
Register the Permanent Slot on DC1
# Add a Permanent Slot to DC1's dynamic config
patronictl -c /etc/patroni/patroni.yml edit-config
In the editor, add the slots section:
# DC1 Dynamic Configuration (stored in DCS)
slots:
  standby_cluster_busan:
    type: physical
    cluster_type: primary   # Slot is kept only while this cluster runs as a primary cluster
Verify the slot was created:
# Run on DC1 Primary
psql -U postgres -c "
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name = 'standby_cluster_busan';
"
# Expected output:
# slot_name | slot_type | active | restart_lsn
# -----------------------+-----------+--------+-------------
# standby_cluster_busan | physical | f | 0/3000000
active: f means DC2 has not connected yet. After starting DC2, confirm it changes to active: t.
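Until DC2 connects, the inactive slot keeps every WAL segment from restart_lsn onward on DC1's disk, so it helps to know how much is being held back. The sketch below is a minimal, assumption-laden helper: `lsn_to_bytes` and `lsn_diff` are hypothetical names, and the LSN values in the usage comment are illustrative. It converts LSNs of the form `high/low` (two hex numbers) into byte positions and subtracts them.

```shell
#!/bin/sh
# Convert a PostgreSQL LSN such as "0/3000000" into an absolute byte position.
# An LSN is written as two hex numbers: the high 32 bits and the low 32 bits.
lsn_to_bytes() {
    hi=${1%%/*}
    lo=${1##*/}
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Bytes of WAL between two LSNs (newer one first),
# for example the current WAL position versus the slot's restart_lsn.
lsn_diff() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Usage (illustrative values):
#   lsn_diff 0/5000000 0/3000000
```

On a live server the same figure is available directly in SQL with `pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)`; the shell version is mainly useful when comparing LSNs that were recorded earlier, such as in an incident log.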
Update DC1's pg_hba for DC2 Replication
All DC2 nodes must be allowed to open replication connections to DC1.
# DC1 patroni.yml — add pg_hba entries
postgresql:
  pg_hba:
    - host replication replicator 10.2.0.10/32 md5 # pg-busan-1
    - host replication replicator 10.2.0.11/32 md5 # pg-busan-2
    - host replication replicator 10.2.0.12/32 md5 # pg-busan-3
# Apply pg_hba changes without a restart
patronictl -c /etc/patroni/patroni.yml reload pg-seoul-cluster
4. Step 2 — Configure DC2's Independent etcd Cluster
DC2's etcd cluster operates completely independently of DC1's etcd. The setup mirrors Part 2's Step 2 — only the IPs and node names change to match DC2.
# /etc/etcd/etcd.conf.yml (Busan node 1)
name: etcd-busan-1
data-dir: /var/lib/etcd/data
listen-client-urls: https://10.2.0.10:2379,https://127.0.0.1:2379
advertise-client-urls: https://10.2.0.10:2379
listen-peer-urls: https://10.2.0.10:2380
initial-advertise-peer-urls: https://10.2.0.10:2380
# DC2 nodes only, completely unrelated to DC1's etcd
# Kept on one line: a folded YAML scalar would insert spaces after the commas
initial-cluster: etcd-busan-1=https://10.2.0.10:2380,etcd-busan-2=https://10.2.0.11:2380,etcd-busan-3=https://10.2.0.12:2380
initial-cluster-state: new
initial-cluster-token: pg-busan-standby-cluster-v1 # Different token from DC1
client-transport-security:
  cert-file: /etc/etcd/ssl/etcd-busan-1.pem
  key-file: /etc/etcd/ssl/etcd-busan-1-key.pem
  trusted-ca-file: /etc/etcd/ssl/ca.pem
  client-cert-auth: true
peer-transport-security:
  cert-file: /etc/etcd/ssl/etcd-busan-1.pem
  key-file: /etc/etcd/ssl/etcd-busan-1-key.pem
  trusted-ca-file: /etc/etcd/ssl/ca.pem
  # In the config file the key under peer-transport-security is client-cert-auth
  # (peer-client-cert-auth is the command-line flag name)
  client-cert-auth: true
# Intra-DC traffic: default timeouts are fine
heartbeat-interval: 100
election-timeout: 1000
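Before pointing Patroni at the new ring, it is worth confirming that all three Busan members are healthy. The following is a sketch only: `etcd_endpoints` and `check_dc2_etcd` are hypothetical helper names, and the certificate paths are assumed to match the config above. It relies on the standard etcdctl v3 flags `--endpoints`, `--cacert`, `--cert`, and `--key`.

```shell
#!/bin/sh
# Build a comma-separated https endpoint list for etcdctl from node IPs.
etcd_endpoints() {
    out=""
    for ip in "$@"; do
        out="${out:+$out,}https://$ip:2379"
    done
    echo "$out"
}

# Ask every DC2 etcd member for its health status.
# Not called automatically; run it on a Busan node once etcd is up.
check_dc2_etcd() {
    etcdctl --endpoints="$(etcd_endpoints 10.2.0.10 10.2.0.11 10.2.0.12)" \
        --cacert=/etc/etcd/ssl/ca.pem \
        --cert=/etc/etcd/ssl/etcd-busan-1.pem \
        --key=/etc/etcd/ssl/etcd-busan-1-key.pem \
        endpoint health
}
```

Running `etcdctl member list` with the same flags should show only the three etcd-busan members; any Seoul member in that list would mean the two rings are not actually independent.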
5. Step 3 — Write patroni.yml for the DC2 Standby Cluster
The critical part of a Standby Cluster config is the bootstrap.dcs.standby_cluster section. This is where DC1's connection details and the Replication Slot name are declared. These values apply only at initial bootstrap; subsequent changes must go through the DCS via patronictl edit-config.
# /etc/patroni/patroni.yml (pg-busan-1, Standby Leader candidate)
scope: pg-busan-standby # Must differ from DC1's scope name
namespace: /db/
name: pg-busan-1 # Must not match any DC1 member name
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.2.0.10:8008
  certfile: /etc/patroni/ssl/patroni.pem
  keyfile: /etc/patroni/ssl/patroni-key.pem
  cafile: /etc/patroni/ssl/ca.pem
# Connect to DC2's independent etcd
etcd3:
  hosts:
    - 10.2.0.10:2379
    - 10.2.0.11:2379
    - 10.2.0.12:2379
  protocol: https
  cacert: /etc/etcd/ssl/ca.pem
  cert: /etc/etcd/ssl/etcd-busan-1.pem
  key: /etc/etcd/ssl/etcd-busan-1-key.pem
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 104857600 # 100 MB: generous allowance for inter-region lag
    # Core: Standby Cluster config
    standby_cluster:
      # List all DC1 Primary Cluster nodes (ensures connection survives DC1 internal failover)
      host: 10.1.0.10,10.1.0.11,10.1.0.12
      port: 5432
      # Must exactly match the Permanent Slot name created on DC1
      primary_slot_name: standby_cluster_busan
      create_replica_methods:
        - basebackup
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
      pg_hba:
        - local all all trust
        - host all all 127.0.0.1/32 md5
        - host replication replicator 10.2.0.0/24 md5
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.2.0.10:5432
  data_dir: /var/lib/postgresql/17/main
  bin_dir: /usr/lib/postgresql/17/bin
  config_dir: /etc/postgresql/17/main
  authentication:
    replication:
      username: replicator
      password: "SecureRepPass123!"
    superuser:
      username: postgres
      password: "SecureSuperPass123!"
    rewind:
      username: rewind_user
      password: "SecureRewindPass123!"
tags:
  nofailover: false
  noloadbalance: false
  dc: busan
Node naming rule: Every DC2 node name (pg-busan-1, pg-busan-2, pg-busan-3) must be unique across both clusters, with no overlap with any DC1 member name. Patroni uses the member name as the replication application_name, so an overlapping name can fail silently: DC1 may mistake a DC2 node for one of its own synchronous standbys, which creates a data loss risk.
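The naming rule is easy to check mechanically before DC2 is ever started. Here is a minimal sketch, assuming the member names are supplied by hand as space-separated lists (in practice they could be pulled from `patronictl list` on each side); `overlapping_names` is a hypothetical helper, not part of Patroni.

```shell
#!/bin/sh
# Print every member name that appears in both space-separated lists.
overlapping_names() {
    for a in $1; do
        for b in $2; do
            if [ "$a" = "$b" ]; then
                echo "$a"
            fi
        done
    done
}

# Member names taken from the node layout table in this post.
dc1_names="pg-seoul-1 pg-seoul-2 pg-seoul-3"
dc2_names="pg-busan-1 pg-busan-2 pg-busan-3"

dupes=$(overlapping_names "$dc1_names" "$dc2_names")
if [ -n "$dupes" ]; then
    echo "name collision: $dupes"
else
    echo "member names are unique across both clusters"
fi
```

A check like this fits well in whatever pipeline renders patroni.yml, since the collision is otherwise only visible once replication is already misbehaving.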
6. Step 4 — Start the Standby Cluster and Verify Replication
Start the Standby Cluster
# DC2 — start pg-busan-1 first (bootstraps as Standby Leader)
systemctl enable --now patroni
# Monitor bootstrap progress
journalctl -fu patroni
# Expected log flow:
# INFO: trying to bootstrap a standby leader
# INFO: trying to use basebackup from 10.1.0.10:5432
# INFO: replica has been created using basebackup
# INFO: bootstrapped as a standby leader
# Once pg-busan-1 is confirmed as Standby Leader, start the remaining nodes
ssh pg-busan-2 "systemctl enable --now patroni"
ssh pg-busan-3 "systemctl enable --now patroni"
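Rather than watching the journal by eye, the "start pg-busan-1 first, then the rest" ordering can be gated on Patroni's REST API, where `GET /standby-leader` answers 200 only on the leader of a standby cluster. This is a sketch under assumptions: `wait_for_standby_leader` and the `PROBE` hook are hypothetical names, the retry counts are arbitrary, and the CA path is taken from the restapi section above.

```shell
#!/bin/sh
# Return the HTTP status code of a Patroni REST endpoint ("000" if unreachable).
http_code() {
    curl -s -o /dev/null -w '%{http_code}' \
        --cacert /etc/patroni/ssl/ca.pem "$1"
}

# PROBE is a hypothetical hook so the HTTP call can be replaced in tests.
PROBE=${PROBE:-http_code}

# Poll until the endpoint answers 200, or give up after <tries> attempts.
wait_for_standby_leader() {
    url=$1
    tries=${2:-30}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if [ "$($PROBE "$url")" = "200" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
#   wait_for_standby_leader https://10.2.0.10:8008/standby-leader \
#     && ssh pg-busan-2 "systemctl enable --now patroni"
```

The initial basebackup across regions can take far longer than the default 30 attempts at 10 seconds each, so the limits should be sized to the database, not copied as-is.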
Verify Replication State
# Check Standby Cluster topology from DC2
patronictl -c /etc/patroni/patroni.yml topology
# Expected output:
# + Cluster: pg-busan-standby (8901234567890123456) +----------------+-----------+
# | Member | Host | Role | State | TL | Lag in MB |
# +-------------+-----------------+----------------+---------+----+-----------+
# | pg-busan-1 | 10.2.0.10:5432 | Standby Leader | running | 3 | 0.0 |
# | pg-busan-2 | 10.2.0.11:5432 | Replica | running | 3 | 0.0 |
# | pg-busan-3 | 10.2.0.12:5432 | Replica | running | 3 | 0.0 |
# +-------------+-----------------+----------------+---------+----+-----------+
# Confirm replication connection from DC1 Primary
psql -U postgres -c "
SELECT application_name, client_addr, state, sync_state,
write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
"
# pg-busan-1 should appear with sync_state = async
Monitor Replication Lag
# Check replay lag from DC2 Standby Leader
# Note: this value keeps growing while DC1 is idle, because no new transactions arrive to replay
psql -U postgres -c "
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;
"
# Compare WAL receive position vs replay position
psql -U postgres -c "
SELECT pg_is_in_recovery(),
pg_last_wal_receive_lsn(),
pg_last_wal_replay_lsn(),
pg_last_wal_receive_lsn() - pg_last_wal_replay_lsn() AS lag_bytes;
"
# Confirm Replication Slot is active on DC1
psql -U postgres -h 10.1.0.10 -c "
SELECT slot_name, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name = 'standby_cluster_busan';
"
# active must be: t
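The queries above are one-off checks; for routine monitoring the byte lag usually gets compared against thresholds. The sketch below assumes thresholds of 16 MB (warning) and 100 MB (critical, matching maximum_lag_on_failover in the DC2 config) purely for illustration; `lag_status` and `dc2_lag_bytes` are hypothetical helper names. The lag is measured on DC1 as the distance between the current WAL position and what pg-busan-1 has replayed.

```shell
#!/bin/sh
# Classify a replication lag in bytes against warning and critical thresholds.
# An empty or non-numeric value means the standby was not found.
lag_status() {
    lag=$1
    warn=$2
    crit=$3
    case "$lag" in
        ''|*[!0-9]*) echo UNKNOWN; return 0 ;;
    esac
    if [ "$lag" -ge "$crit" ]; then
        echo CRITICAL
    elif [ "$lag" -ge "$warn" ]; then
        echo WARNING
    else
        echo OK
    fi
}

# Bytes DC2's Standby Leader is behind, as seen from the DC1 Primary.
# Prints nothing if pg-busan-1 is not connected.
dc2_lag_bytes() {
    psql -U postgres -h 10.1.0.10 -At -c "
        SELECT COALESCE(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn), 0)::bigint
        FROM pg_stat_replication
        WHERE application_name = 'pg-busan-1';"
}

# Usage (thresholds are illustrative):
#   lag_status "$(dc2_lag_bytes)" 16777216 104857600
```

Measuring from the DC1 side reflects the data that would be lost if DC1 disappeared right now, which is the number that matters for RPO; the receive-versus-replay gap on DC2 only shows how far local replay trails what has already arrived.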
7. Step 5 — Manual Promotion When DC1 Fails
In an async 2-DC setup, DC2 cannot independently determine DC1's state, so automatic failover is not safe. The operator must confirm DC1 is fully down before manually promoting DC2.
Pre-Promotion Checklist
[ ] 1. Are all PostgreSQL nodes on DC1 fully stopped?
[ ] 2. Has DC1's Patroni released its Leader Lock? (confirm etcd TTL expiry)
[ ] 3. Are all application connections to DC1 blocked?
[ ] 4. Has the current replication lag on DC2's Standby Leader been recorded?
[ ] 5. Is there any chance DC1 is partially alive? (network partition vs full outage)
⚠️ Promoting DC2 while DC1 is still alive causes Split-Brain — data conflicts and permanent data loss. You must confirm DC1 is fully stopped, or perform STONITH, before promoting.
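Part of the checklist can be scripted as a guard that refuses to continue while any DC1 node still answers on the PostgreSQL port. This is only a sketch: `all_unreachable` and the `PG_PROBE` hook are hypothetical names, and it relies on `pg_isready` returning exit code 2 when a server does not respond. Passing this check is necessary but not sufficient, because a network partition between the DCs looks exactly the same from DC2; out-of-band confirmation or STONITH is still required.

```shell
#!/bin/sh
# pg_isready exit codes: 0 accepting, 1 rejecting, 2 no response, 3 no attempt made.
probe_pg() {
    pg_isready -h "$1" -p 5432 -t 3 >/dev/null 2>&1
}

# PG_PROBE is a hypothetical hook so the probe can be replaced in tests.
PG_PROBE=${PG_PROBE:-probe_pg}

# Succeeds only if every given host gives no response on the PostgreSQL port.
all_unreachable() {
    for host in "$@"; do
        rc=0
        $PG_PROBE "$host" || rc=$?
        if [ "$rc" -ne 2 ]; then
            echo "$host is not confirmed down (pg_isready exit code $rc)" >&2
            return 1
        fi
    done
    return 0
}

# Usage:
#   all_unreachable 10.1.0.10 10.1.0.11 10.1.0.12 \
#     && echo "no DC1 node responds from here; continue with the manual checks"
```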
Option A — patronictl promote-cluster (Patroni 4.1+ recommended)
Introduced in Patroni 4.1, promote-cluster removes the standby_cluster section and verifies the result in a single command.
# Run from a DC2 node
# After STONITH and confirming DC1 is fully stopped:
patronictl -c /etc/patroni/patroni.yml promote-cluster pg-busan-standby
# Expected output:
# + Cluster: pg-busan-standby (8901234567890123456) +---------+-----------+
# | Member | Host | Role | State | TL | Lag in MB |
# +-------------+-----------------+---------+---------+----+-----------+
# | pg-busan-1 | 10.2.0.10:5432 | Leader | running | 4 | |
# | pg-busan-2 | 10.2.0.11:5432 | Replica | running | 4 | 0.0 |
# | pg-busan-3 | 10.2.0.12:5432 | Replica | running | 4 | 0.0 |
# +-------------+-----------------+---------+---------+----+-----------+
# Success: cluster has been promoted
Option B — patronictl edit-config (backward compatible)
# Remove the standby_cluster section by setting it to null
patronictl -c /etc/patroni/patroni.yml edit-config \
--set standby_cluster=null \
--force
# Verify promotion
patronictl -c /etc/patroni/patroni.yml list
# Confirm PostgreSQL is actually running as Primary
psql -U postgres -h 10.2.0.10 -c "SELECT pg_is_in_recovery();"
# Result: f (false) -> successfully promoted to Primary
After Promotion — Switch Application Traffic
# Redirect HAProxy or DNS to the DC2 endpoint
systemctl reload haproxy
# Verify connection
psql -h haproxy-busan -p 5000 -U appuser -c \
"SELECT inet_server_addr(), pg_is_in_recovery();"
# Result: 10.2.0.1x | f -> connected to DC2 Primary
8. Step 6 — Return DC1 to Standby After Recovery
Once DC1 infrastructure is restored, reconfigure DC1 as a new Standby Cluster to receive WAL from DC2 (now the Primary).
Register a Permanent Slot on DC2
# DC2 is now Primary — register a slot for DC1
patronictl -c /etc/patroni/patroni.yml edit-config
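# Add to DC2 Dynamic Configuration
slots:
  standby_cluster_seoul:
    type: physical
    cluster_type: primary
# Also allow replication connections from the DC1 nodes (10.1.0.10-12) in DC2's pg_hba,
# mirroring the pg_hba change made on DC1 in Step 1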
Reconfigure DC1 as a Standby Cluster
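# DC1 patroni.yml: add standby_cluster section
# Note: bootstrap.dcs is read only when a cluster is first initialized. If DC1's etcd
# still holds the old pg-seoul-cluster state, apply the same standby_cluster keys with
# patronictl edit-config instead (or remove the old cluster state from the DCS first).
bootstrap:
  dcs:
    standby_cluster:
      host: 10.2.0.10,10.2.0.11,10.2.0.12
      port: 5432
      primary_slot_name: standby_cluster_seoul
      create_replica_methods:
        - basebackup
# Make sure Patroni is stopped on every DC1 node before touching the data directory
systemctl stop patroni
# Re-initialize DC1's data directory (full re-sync from DC2)
# Recommended: full wipe since DC1 data may be inconsistent
rm -rf /var/lib/postgresql/17/main/*
# Start Patroni: it will basebackup from DC2 automatically
systemctl start patroni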
# Check replication state
patronictl -c /etc/patroni/patroni.yml topology
Using demote-cluster (Patroni 4.1+)
With Patroni 4.1, demote-cluster converts an existing Primary Cluster to a Standby Cluster. Run it against DC1's scope to reconfigure DC1 as a standby pointing to DC2.
# Use DC1's patronictl config to demote DC1's scope
patronictl -c /etc/patroni/patroni-seoul.yml demote-cluster pg-seoul-cluster \
--standby-config host=10.2.0.10,10.2.0.11,10.2.0.12 \
--standby-config port=5432 \
--standby-config primary_slot_name=standby_cluster_seoul
9. Patroni 4.1 Commands: promote-cluster / demote-cluster
Patroni 4.1 introduced dedicated commands for Standby Cluster lifecycle management. Both are safer and more explicit than manually editing standby_cluster in edit-config.
| Command | Purpose | What it does |
|---|---|---|
| patronictl promote-cluster | Standby → Primary | Removes standby_cluster section + verifies result |
| patronictl demote-cluster | Primary → Standby | Inserts standby_cluster section + ensures clean demotion |
# promote-cluster example
patronictl -c /etc/patroni/patroni.yml promote-cluster pg-busan-standby
# demote-cluster example (DC1 back to Standby)
patronictl -c /etc/patroni/patroni-seoul.yml demote-cluster pg-seoul-cluster \
--standby-config host=10.2.0.10,10.2.0.11,10.2.0.12 \
--standby-config port=5432 \
--standby-config primary_slot_name=standby_cluster_seoul
Both commands are Patroni 4.1+ only. They include hardened logic to prevent mid-transition re-promotions. Use the latest available version when possible.
10. Common Troubleshooting
Issue 1: Standby Leader Cannot Connect to DC1
WARNING: master_start_timeout: Failed to connect to 10.1.0.10:5432
ERROR: standby_cluster: no primary found
# Test a direct replication connection from DC2 to each DC1 node.
# replication=1 opens a walsender connection, so this exercises the "replication"
# entries in pg_hba; IDENTIFY_SYSTEM is a replication-protocol command.
for h in 10.1.0.10 10.1.0.11 10.1.0.12; do
  psql "host=$h port=5432 user=replicator password=SecureRepPass123! replication=1 sslmode=require" \
    -c "IDENTIFY_SYSTEM;"
done
# Re-check whether DC2 IPs are allowed in pg_hba.conf
psql -U postgres -h 10.1.0.10 -c "SELECT * FROM pg_hba_file_rules();"
Issue 2: "postgresql.conf not found" Error During Bootstrap
FATAL: Patroni expects to find postgresql.conf in PGDATA of the remote primary
This occurs on Debian/Ubuntu package installs where postgresql.conf lives in /etc/postgresql/17/main/ while PGDATA is /var/lib/postgresql/17/main/.
# Create a symlink in PGDATA on DC1
ln -s /etc/postgresql/17/main/postgresql.conf \
/var/lib/postgresql/17/main/postgresql.conf
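Because the symlink has to exist on every DC1 node that DC2 might take a basebackup from, an idempotent version is convenient for configuration management. A minimal sketch, with `ensure_conf_symlink` as a hypothetical helper name; it leaves any existing file or link untouched instead of overwriting it.

```shell
#!/bin/sh
# Create the postgresql.conf symlink in PGDATA only if nothing is there yet.
ensure_conf_symlink() {
    src=$1
    dst=$2
    if [ -e "$dst" ] || [ -L "$dst" ]; then
        echo "exists: $dst"
    else
        ln -s "$src" "$dst" && echo "linked: $dst -> $src"
    fi
}

# Usage (paths from the Debian/Ubuntu layout described above):
#   ensure_conf_symlink /etc/postgresql/17/main/postgresql.conf \
#                       /var/lib/postgresql/17/main/postgresql.conf
```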
Issue 3: DC1 Comes Back Alive After Promotion — Split-Brain
DC1 recovers earlier than expected and starts accepting writes, creating a Split-Brain where both DCs serve writes simultaneously.
# 1. Immediately force-stop all PostgreSQL on DC1
ssh pg-seoul-1 "systemctl stop patroni && systemctl stop postgresql"
ssh pg-seoul-2 "systemctl stop patroni && systemctl stop postgresql"
ssh pg-seoul-3 "systemctl stop patroni && systemctl stop postgresql"
# 2. Assess DC2's current state and the scope of data loss
psql -h 10.2.0.10 -U postgres -c "
SELECT now(), pg_current_wal_lsn(), timeline_id
FROM pg_control_checkpoint();
"
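# 3. Inspect DC1's diverged WAL records before reinitializing (if possible)
# pg_waldump needs a starting point, not just a directory: pass the timeline and the
# LSN where DC1 diverged (placeholders below), or name a starting segment file.
pg_waldump -p /var/lib/postgresql/17/main/pg_wal -t <dc1_timeline> -s <divergence_lsn> -n 1000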
# 4. Re-initialize DC1 as a Standby Cluster
rm -rf /var/lib/postgresql/17/main/*
systemctl start patroni
Issue 4: pg_rewind Fails — Cannot Rejoin as Standby
pg_rewind: error: could not find previous WAL record at 0/3000000
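pg_rewind requires either data checksums enabled at initdb time or wal_log_hints = on. It also needs the WAL back to the last checkpoint before the divergence point to still be available on the node being rewound; the error above usually points to that WAL being missing.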
# Check whether data-checksums is enabled
pg_controldata /var/lib/postgresql/17/main | grep "Data page checksum"
# Check wal_log_hints
psql -U postgres -c "SHOW wal_log_hints;"
# Force re-initialize without pg_rewind
patronictl -c /etc/patroni/patroni.yml reinit pg-busan-standby pg-busan-1 --force
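The two prerequisite checks above can be folded into a single yes/no answer, which is handy when deciding between waiting for pg_rewind and going straight to reinit. This is a sketch only: `checksum_version` and `rewind_prereq_met` are hypothetical helper names, and the parsing assumes pg_controldata prints its English "Data page checksum version:" line, where 0 means checksums are disabled.

```shell
#!/bin/sh
# Read pg_controldata output on stdin and print the data checksum version.
checksum_version() {
    sed -n 's/^Data page checksum version:[[:space:]]*//p'
}

# pg_rewind's prerequisite is met when wal_log_hints is on
# or data checksums are enabled (non-zero version).
rewind_prereq_met() {
    version=$1
    hints=$2
    if [ "$hints" = "on" ]; then
        echo yes
    elif [ -n "$version" ] && [ "$version" != "0" ]; then
        echo yes
    else
        echo no
    fi
}

# Usage:
#   v=$(pg_controldata /var/lib/postgresql/17/main | checksum_version)
#   h=$(psql -U postgres -At -c "SHOW wal_log_hints;")
#   rewind_prereq_met "$v" "$h"
```

A "yes" here only covers the configuration prerequisite; pg_rewind can still fail if the WAL it needs has already been removed, in which case reinit remains the fallback.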
References
- Patroni Official Documentation — Standby Cluster
- Patroni Official Documentation — Multi-Datacenter HA Configuration
- Patroni Official Documentation — patronictl
- CYBERTEC — Patroni: Cascading Replication with Standby Cluster
- Percona — Performing Standby Datacentre Promotions of a Patroni Cluster
- Patroni Release Notes 4.1.2 — promote-cluster / demote-cluster