Thursday, June 4, 2026

Operating Patroni H/A Across Multiple Regions — Part 3: Async Replication + Standby Cluster Setup

Implement Part 1's Pattern B in practice. Configure a DC1 Primary Cluster and a DC2 Standby Cluster with independent etcd rings connected only by async WAL Streaming. Covers the full DR lifecycle: setting up the Replication Slot, writing patroni.yml for the Standby Cluster, verifying replication lag, manually promoting DC2 when DC1 fails, and reversing the roles after DC1 recovers. Includes Patroni 4.1's promote-cluster and demote-cluster commands.

Series — Operating Patroni H/A Across Multiple Regions

  • Part 1 — Fundamentals and Architecture Design Principles
  • Part 2 — Synchronous Multi-DC Setup in Practice
  • Part 3 — Async Replication + Standby Cluster Setup (this post)
  • Part 4 — Split-Brain Prevention (STONITH, Watchdog, Quorum)
  • Part 5 — Failover Runbook and DR Drills
  • Part 6 — Monitoring, Operational Automation, and Best Practices

Table of Contents

  1. What Is a Standby Cluster?
  2. Lab Environment and Architecture Overview
  3. Step 1 — Prepare a Replication Slot on DC1
  4. Step 2 — Configure DC2's Independent etcd Cluster
  5. Step 3 — Write patroni.yml for the DC2 Standby Cluster
  6. Step 4 — Start the Standby Cluster and Verify Replication
  7. Step 5 — Manual Promotion When DC1 Fails
  8. Step 6 — Return DC1 to Standby After Recovery
  9. Patroni 4.1 Commands: promote-cluster / demote-cluster
  10. Common Troubleshooting
  11. References

1. What Is a Standby Cluster?

Patroni's Standby Cluster feature runs cascading replication to a remote data center. Unlike a regular Replica node, a Standby Cluster maintains its own full HA structure inside DC2 (independent etcd + Patroni leader election) while keeping all its data synchronized with the WAL stream from DC1's Primary.

DC2 contains a special Standby Leader node. Inside DC2, the Standby Leader behaves like a regular Leader (holds the DCS lock, manages Cascade Replicas), but its data comes entirely from DC1's Primary via streaming replication.

Why Choose Standby Cluster?

  • RPO > 0 is acceptable: inter-region latency is high and the write slowdown caused by synchronous replication is unacceptable
  • Cost reduction: 2-DC async setup instead of 3-DC synchronous, minimizing infrastructure cost
  • Regional read service: use the DC2 Standby Leader as a read-only endpoint to reduce latency
  • DR test isolation: run failover simulations in DC2 independently, with no impact on DC1
  • Cloud migration: use as a Zero-Downtime Migration path when moving regions or from on-premises to cloud

⚠️ The Standby Cluster and Primary Cluster must never share the same DCS scope. Always use independent etcd clusters or different namespaces.


2. Lab Environment and Architecture Overview

Node Layout

Node       | Region                            | IP (example) | Role
-----------+-----------------------------------+--------------+---------------------------
pg-seoul-1 | ap-northeast-2 (Seoul)            | 10.1.0.10    | PostgreSQL Primary + etcd
pg-seoul-2 | ap-northeast-2 (Seoul)            | 10.1.0.11    | PostgreSQL Replica + etcd
pg-seoul-3 | ap-northeast-2 (Seoul, AZ spread) | 10.1.0.12    | PostgreSQL Replica + etcd
pg-busan-1 | on-premise (Busan DR)             | 10.2.0.10    | Standby Leader + etcd
pg-busan-2 | on-premise (Busan DR)             | 10.2.0.11    | Cascade Replica + etcd
pg-busan-3 | on-premise (Busan DR)             | 10.2.0.12    | Cascade Replica + etcd

Each DC runs an independent 3-node etcd cluster. The two etcd clusters do not communicate with each other. Replication flows exclusively through PostgreSQL WAL Streaming.
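
The choice of three etcd members per DC follows from majority-quorum arithmetic. The sketch below is only an illustration of that arithmetic; the function names are made up for this example and are not part of etcd or Patroni.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority still exists."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} members: quorum={quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

With three members, each ring keeps quorum after losing one node. Because the two rings are independent, a complete DC1 outage has no effect on DC2's quorum.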

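Full Architecture

DC1 (Seoul) and DC2 (Busan) each run their own Patroni cluster and etcd ring. The only link between them is the async WAL stream from DC1's Primary to pg-busan-1 (the Standby Leader), which in turn feeds pg-busan-2 and pg-busan-3 as Cascade Replicas.
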

3. Step 1 — Prepare a Replication Slot on DC1

For the Standby Cluster to connect to DC1 and receive WAL without gaps, DC1 must hold a Replication Slot so the WAL segments are not discarded. Patroni's Permanent Replication Slot feature registers the slot in the DCS, keeping it alive even after a failover.

Register the Permanent Slot on DC1

# Add a Permanent Slot to DC1's dynamic config
patronictl -c /etc/patroni/patroni.yml edit-config

In the editor, add the slots section:

# DC1 Dynamic Configuration (stored in DCS)
slots:
  standby_cluster_busan:
    type: physical
    cluster_type: primary    # Slot exists only while this cluster runs as a primary cluster (not as a standby cluster)

Verify the slot was created:

# Run on DC1 Primary
psql -U postgres -c "
  SELECT slot_name, slot_type, active, restart_lsn
  FROM pg_replication_slots
  WHERE slot_name = 'standby_cluster_busan';
"

# Expected output:
#       slot_name        | slot_type | active | restart_lsn
# -----------------------+-----------+--------+-------------
#  standby_cluster_busan | physical  | f      | 0/3000000

active: f means DC2 has not connected yet. After starting DC2, confirm it changes to active: t.

Update DC1's pg_hba for DC2 Replication

All DC2 nodes must be allowed to open replication connections to DC1.

# DC1 patroni.yml — add pg_hba entries
postgresql:
  pg_hba:
    - host replication replicator 10.2.0.10/32 md5   # pg-busan-1
    - host replication replicator 10.2.0.11/32 md5   # pg-busan-2
    - host replication replicator 10.2.0.12/32 md5   # pg-busan-3

# Apply pg_hba changes without a restart
patronictl -c /etc/patroni/patroni.yml reload pg-seoul-cluster
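
When the DR site has more than a few nodes, the pg_hba entries can be generated rather than typed by hand. This is a minimal sketch, assuming the same `replicator` user and `md5` method used above; `replication_hba_lines` is a hypothetical helper, not a Patroni feature.

```python
import ipaddress


def replication_hba_lines(hosts: dict[str, str],
                          user: str = "replicator",
                          method: str = "md5") -> list[str]:
    """Build one single-host pg_hba replication rule per node.

    `hosts` maps a node name to its IP address. Malformed addresses
    raise ValueError instead of ending up in pg_hba.conf.
    """
    lines = []
    for _name, ip in hosts.items():
        addr = ipaddress.ip_address(ip)
        prefix = 32 if addr.version == 4 else 128  # exact-host mask
        lines.append(f"host replication {user} {addr}/{prefix} {method}")
    return lines


if __name__ == "__main__":
    busan = {
        "pg-busan-1": "10.2.0.10",
        "pg-busan-2": "10.2.0.11",
        "pg-busan-3": "10.2.0.12",
    }
    for line in replication_hba_lines(busan):
        print(f"    - {line}")
```

The printed lines can be pasted under postgresql.pg_hba in DC1's patroni.yml.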

4. Step 2 — Configure DC2's Independent etcd Cluster

DC2's etcd cluster operates completely independently of DC1's etcd. The setup mirrors Part 2's Step 2 — only the IPs and node names change to match DC2.

# /etc/etcd/etcd.conf.yml (Busan node 1)
name: etcd-busan-1
data-dir: /var/lib/etcd/data

listen-client-urls: https://10.2.0.10:2379,https://127.0.0.1:2379
advertise-client-urls: https://10.2.0.10:2379

listen-peer-urls: https://10.2.0.10:2380
initial-advertise-peer-urls: https://10.2.0.10:2380

# DC2 nodes only — completely unrelated to DC1's etcd
# Single line, no spaces: etcd splits this value on commas
initial-cluster: etcd-busan-1=https://10.2.0.10:2380,etcd-busan-2=https://10.2.0.11:2380,etcd-busan-3=https://10.2.0.12:2380
initial-cluster-state: new
initial-cluster-token: pg-busan-standby-cluster-v1  # Different token from DC1

client-transport-security:
  cert-file: /etc/etcd/ssl/etcd-busan-1.pem
  key-file: /etc/etcd/ssl/etcd-busan-1-key.pem
  trusted-ca-file: /etc/etcd/ssl/ca.pem
  client-cert-auth: true

peer-transport-security:
  cert-file: /etc/etcd/ssl/etcd-busan-1.pem
  key-file: /etc/etcd/ssl/etcd-busan-1-key.pem
  trusted-ca-file: /etc/etcd/ssl/ca.pem
  client-cert-auth: true   # Inside peer-transport-security the key is client-cert-auth

# Intra-DC traffic — default timeouts are fine
heartbeat-interval: 100
election-timeout: 1000
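
The initial-cluster value is a comma-separated list of name=peer-URL pairs, and it must be identical on all three Busan nodes. The sketch below builds it from a single node map so the three config files cannot drift apart. It assumes stray whitespace around entries is not tolerated, and `initial_cluster` is a hypothetical helper name.

```python
def initial_cluster(peers: dict[str, str], port: int = 2380) -> str:
    """Join etcd members into a single-line initial-cluster value.

    `peers` maps the etcd member name to its peer IP address.
    """
    return ",".join(f"{name}=https://{ip}:{port}" for name, ip in peers.items())


if __name__ == "__main__":
    busan_peers = {
        "etcd-busan-1": "10.2.0.10",
        "etcd-busan-2": "10.2.0.11",
        "etcd-busan-3": "10.2.0.12",
    }
    print(f"initial-cluster: {initial_cluster(busan_peers)}")
```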

5. Step 3 — Write patroni.yml for the DC2 Standby Cluster

The critical part of a Standby Cluster config is the bootstrap.dcs.standby_cluster section. This is where DC1's connection details and the Replication Slot name are declared. These values apply only at initial bootstrap; subsequent changes must go through the DCS via patronictl edit-config.

# /etc/patroni/patroni.yml (pg-busan-1 — Standby Leader candidate)

scope: pg-busan-standby        # Must differ from DC1's scope name
namespace: /db/
name: pg-busan-1               # Must not match any DC1 member name

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.2.0.10:8008
  certfile: /etc/patroni/ssl/patroni.pem
  keyfile: /etc/patroni/ssl/patroni-key.pem
  cafile: /etc/patroni/ssl/ca.pem

# Connect to DC2's independent etcd
etcd3:
  hosts:
    - 10.2.0.10:2379
    - 10.2.0.11:2379
    - 10.2.0.12:2379
  protocol: https
  cacert: /etc/etcd/ssl/ca.pem
  cert: /etc/etcd/ssl/etcd-busan-1.pem
  key: /etc/etcd/ssl/etcd-busan-1-key.pem

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 104857600   # 100 MB: generous allowance for inter-region lag

    # Core: Standby Cluster config
    standby_cluster:
      # List all DC1 Primary Cluster nodes (ensures connection survives DC1 internal failover)
      host: 10.1.0.10,10.1.0.11,10.1.0.12
      port: 5432
      # Must exactly match the Permanent Slot name created on DC1
      primary_slot_name: standby_cluster_busan
      create_replica_methods:
        - basebackup

    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"

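postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.2.0.10:5432
  data_dir: /var/lib/postgresql/17/main
  bin_dir: /usr/lib/postgresql/17/bin
  config_dir: /etc/postgresql/17/main

  # Declared under postgresql, not bootstrap: bootstrap.pg_hba is applied only
  # after initdb, which never runs when the node is created via basebackup
  pg_hba:
    - local   all             all                         trust
    - host    all             all         127.0.0.1/32    md5
    - host    replication     replicator  10.2.0.0/24     md5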

  authentication:
    replication:
      username: replicator
      password: "SecureRepPass123!"
    superuser:
      username: postgres
      password: "SecureSuperPass123!"
    rewind:
      username: rewind_user
      password: "SecureRewindPass123!"

tags:
  nofailover: false
  noloadbalance: false
  dc: busan

Node naming rule: Every DC2 node name (pg-busan-1, pg-busan-2, pg-busan-3) must be unique across both clusters — no overlap with any DC1 member name. Overlapping names cause a Silent Failure where DC1 misidentifies DC2 nodes as Synchronous Standbys, creating data loss risk.
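
The naming rule is easy to check mechanically before DC2 is started for the first time. This is a minimal sketch using the scope and member names from this post; `check_cluster_identity` is a hypothetical helper and does not talk to Patroni or etcd.

```python
def check_cluster_identity(primary_scope: str, primary_members: list[str],
                           standby_scope: str, standby_members: list[str]) -> list[str]:
    """Return a list of naming problems; an empty list means the rule holds."""
    problems = []
    if primary_scope == standby_scope:
        problems.append(f"both clusters use scope '{primary_scope}'")
    overlap = sorted(set(primary_members) & set(standby_members))
    if overlap:
        problems.append("member names used in both clusters: " + ", ".join(overlap))
    return problems


if __name__ == "__main__":
    issues = check_cluster_identity(
        "pg-seoul-cluster", ["pg-seoul-1", "pg-seoul-2", "pg-seoul-3"],
        "pg-busan-standby", ["pg-busan-1", "pg-busan-2", "pg-busan-3"],
    )
    print("naming OK" if not issues else issues)
```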


6. Step 4 — Start the Standby Cluster and Verify Replication

Start the Standby Cluster

# DC2 — start pg-busan-1 first (bootstraps as Standby Leader)
systemctl enable --now patroni

# Monitor bootstrap progress
journalctl -fu patroni

# Expected log flow:
# INFO: trying to bootstrap a standby leader
# INFO: trying to use basebackup from 10.1.0.10:5432
# INFO: replica has been created using basebackup
# INFO: bootstrapped as a standby leader

# Once pg-busan-1 is confirmed as Standby Leader, start the remaining nodes
ssh pg-busan-2 "systemctl enable --now patroni"
ssh pg-busan-3 "systemctl enable --now patroni"

Verify Replication State

# Check Standby Cluster topology from DC2
patronictl -c /etc/patroni/patroni.yml topology

# Expected output:
# + Cluster: pg-busan-standby (8901234567890123456) +----------------+-----------+
# | Member      | Host            | Role           | State   | TL | Lag in MB |
# +-------------+-----------------+----------------+---------+----+-----------+
# | pg-busan-1  | 10.2.0.10:5432  | Standby Leader | running |  3 |       0.0 |
# | pg-busan-2  | 10.2.0.11:5432  | Replica        | running |  3 |       0.0 |
# | pg-busan-3  | 10.2.0.12:5432  | Replica        | running |  3 |       0.0 |
# +-------------+-----------------+----------------+---------+----+-----------+

# Confirm replication connection from DC1 Primary
psql -U postgres -c "
  SELECT application_name, client_addr, state, sync_state,
         write_lag, flush_lag, replay_lag
  FROM pg_stat_replication;
"
# pg-busan-1 should appear with sync_state = async

Monitor Replication Lag

# Check replay lag from DC2 Standby Leader
psql -U postgres -c "
  SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;
"

# Compare WAL receive position vs replay position
psql -U postgres -c "
  SELECT pg_is_in_recovery(),
         pg_last_wal_receive_lsn(),
         pg_last_wal_replay_lsn(),
         pg_last_wal_receive_lsn() - pg_last_wal_replay_lsn() AS lag_bytes;
"

# Confirm Replication Slot is active on DC1
psql -U postgres -h 10.1.0.10 -c "
  SELECT slot_name, active, restart_lsn
  FROM pg_replication_slots
  WHERE slot_name = 'standby_cluster_busan';
"
# active must be: t
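
The LSN values returned by these queries are 64-bit positions written as two hexadecimal halves (high/low), so lag in bytes is a plain subtraction once they are converted. The sketch below shows that conversion and compares the result with the 100 MB maximum_lag_on_failover value from the DC2 config. The helper names and the sample LSNs are illustrative only.

```python
MAX_LAG_ON_FAILOVER = 104857600  # 100 MB, same value as in patroni.yml


def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000000' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(ahead: str, behind: str) -> int:
    """Byte distance between two LSNs (positive when `ahead` is newer)."""
    return lsn_to_int(ahead) - lsn_to_int(behind)


if __name__ == "__main__":
    # Sample values: DC1's current LSN vs. DC2's replay LSN
    lag = lag_bytes("0/3000A00", "0/3000000")
    print(f"lag: {lag} bytes, within threshold: {lag <= MAX_LAG_ON_FAILOVER}")
```

Recording this number just before a promotion gives a concrete upper bound on how much WAL DC2 may be missing.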

7. Step 5 — Manual Promotion When DC1 Fails

In an async 2-DC setup, DC2 cannot independently determine DC1's state, so automatic failover is not safe. The operator must confirm DC1 is fully down before manually promoting DC2.

Pre-Promotion Checklist

[ ] 1. Are all PostgreSQL nodes on DC1 fully stopped?
[ ] 2. Has DC1's Patroni released its Leader Lock? (confirm etcd TTL expiry)
[ ] 3. Are all application connections to DC1 blocked?
[ ] 4. Has the current replication lag on DC2's Standby Leader been recorded?
[ ] 5. Is there any chance DC1 is partially alive? (network partition vs full outage)

⚠️ Promoting DC2 while DC1 is still alive causes Split-Brain — data conflicts and permanent data loss. You must confirm DC1 is fully stopped, or perform STONITH, before promoting.
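
The checklist can also be encoded as a hard gate in a runbook script, so that promotion is refused while any item is unconfirmed. This is a minimal sketch in which the operator supplies the answers. The class and field names are hypothetical, and nothing here probes DC1 automatically.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PrePromotionChecks:
    """One boolean per checklist item, filled in by the operator."""
    dc1_postgres_stopped: bool
    dc1_leader_lock_expired: bool
    dc1_app_connections_blocked: bool
    dc2_lag_recorded: bool
    dc1_partial_outage_ruled_out: bool


def blocking_items(checks: PrePromotionChecks) -> list[str]:
    """Return the names of checklist items that are not yet confirmed."""
    return [name for name, confirmed in asdict(checks).items() if not confirmed]


if __name__ == "__main__":
    checks = PrePromotionChecks(True, True, True, True, False)
    pending = blocking_items(checks)
    if pending:
        print(f"do NOT promote, pending: {pending}")
    else:
        print("all items confirmed, promotion may proceed")
```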

Option A — patronictl promote-cluster (Patroni 4.1+ recommended)

Introduced in Patroni 4.1, promote-cluster removes the standby_cluster section and verifies the result in a single command.

# Run from a DC2 node
# After STONITH and confirming DC1 is fully stopped:
patronictl -c /etc/patroni/patroni.yml promote-cluster pg-busan-standby

# Expected output:
# + Cluster: pg-busan-standby (8901234567890123456) +---------+-----------+
# | Member      | Host            | Role    | State   | TL | Lag in MB |
# +-------------+-----------------+---------+---------+----+-----------+
# | pg-busan-1  | 10.2.0.10:5432  | Leader  | running |  4 |           |
# | pg-busan-2  | 10.2.0.11:5432  | Replica | running |  4 |       0.0 |
# | pg-busan-3  | 10.2.0.12:5432  | Replica | running |  4 |       0.0 |
# +-------------+-----------------+---------+---------+----+-----------+
# Success: cluster has been promoted

Option B — patronictl edit-config (backward compatible)

# Remove the standby_cluster section by setting it to null
patronictl -c /etc/patroni/patroni.yml edit-config \
  --set standby_cluster=null \
  --force

# Verify promotion
patronictl -c /etc/patroni/patroni.yml list

# Confirm PostgreSQL is actually running as Primary
psql -U postgres -h 10.2.0.10 -c "SELECT pg_is_in_recovery();"
# Result: f (false) -> successfully promoted to Primary

After Promotion — Switch Application Traffic

# Redirect HAProxy or DNS to the DC2 endpoint
systemctl reload haproxy

# Verify connection
psql -h haproxy-busan -p 5000 -U appuser -c \
  "SELECT inet_server_addr(), pg_is_in_recovery();"
# Result: 10.2.0.1x | f -> connected to DC2 Primary

8. Step 6 — Return DC1 to Standby After Recovery

Once DC1 infrastructure is restored, reconfigure DC1 as a new Standby Cluster to receive WAL from DC2 (now the Primary).

Register a Permanent Slot on DC2

# DC2 is now Primary — register a slot for DC1
patronictl -c /etc/patroni/patroni.yml edit-config

# Add to DC2 Dynamic Configuration
# Also allow DC1's node IPs (10.1.0.10-12) for replication in DC2's pg_hba, mirroring Step 1
slots:
  standby_cluster_seoul:
    type: physical
    cluster_type: primary

Reconfigure DC1 as a Standby Cluster

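# DC1 patroni.yml: add standby_cluster section
# bootstrap.dcs is applied only when the scope is first initialized in the DCS.
# If DC1's etcd still holds the old pg-seoul-cluster state, clear it first
# (patronictl remove pg-seoul-cluster); otherwise this section is ignored.
bootstrap:
  dcs:
    standby_cluster:
      host: 10.2.0.10,10.2.0.11,10.2.0.12
      port: 5432
      primary_slot_name: standby_cluster_seoul
      create_replica_methods:
        - basebackup

# Re-initialize DC1's data directory (full re-sync from DC2)
# Recommended: full wipe since DC1 data may be inconsistent
# Run on each DC1 node while Patroni and PostgreSQL are stopped
rm -rf /var/lib/postgresql/17/main/*

# Start Patroni; it will basebackup from DC2 automatically
systemctl start patroni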

# Check replication state
patronictl -c /etc/patroni/patroni.yml topology

Using demote-cluster (Patroni 4.1+)

With Patroni 4.1, demote-cluster converts an existing Primary Cluster to a Standby Cluster. Run it against DC1's scope to reconfigure DC1 as a standby pointing to DC2.

# Use DC1's patronictl config to demote DC1's scope
patronictl -c /etc/patroni/patroni-seoul.yml demote-cluster pg-seoul-cluster \
  --standby-config host=10.2.0.10,10.2.0.11,10.2.0.12 \
  --standby-config port=5432 \
  --standby-config primary_slot_name=standby_cluster_seoul

9. Patroni 4.1 Commands: promote-cluster / demote-cluster

Patroni 4.1 introduced dedicated commands for Standby Cluster lifecycle management. Both are safer and more explicit than manually editing standby_cluster in edit-config.

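Command                    | Purpose           | What it does
---------------------------+-------------------+----------------------------------------------------------
patronictl promote-cluster | Standby → Primary | Removes standby_cluster section + verifies result
patronictl demote-cluster  | Primary → Standby | Inserts standby_cluster section + ensures clean demotion

# promote-cluster example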
patronictl -c /etc/patroni/patroni.yml promote-cluster pg-busan-standby

# demote-cluster example (DC1 back to Standby)
patronictl -c /etc/patroni/patroni-seoul.yml demote-cluster pg-seoul-cluster \
  --standby-config host=10.2.0.10,10.2.0.11,10.2.0.12 \
  --standby-config port=5432 \
  --standby-config primary_slot_name=standby_cluster_seoul

Both commands are Patroni 4.1+ only. They include hardened logic to prevent mid-transition re-promotions. Use the latest available version when possible.


10. Common Troubleshooting

Issue 1: Standby Leader Cannot Connect to DC1

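WARNING: master_start_timeout: Failed to connect to 10.1.0.10:5432
ERROR: standby_cluster: no primary found

# Test a direct replication connection from DC2 to DC1.
# replication=1 opens a physical replication connection, which is what the
# "host replication replicator ..." pg_hba rules match. A plain SQL session
# would be checked against a database rule instead and be rejected.
psql "host=10.1.0.10,10.1.0.11,10.1.0.12 \
      port=5432 \
      user=replicator \
      password=SecureRepPass123! \
      replication=1 \
      target_session_attrs=read-write \
      sslmode=require" \
  -c "IDENTIFY_SYSTEM;"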

# Re-check whether DC2 IPs are allowed in pg_hba.conf
psql -U postgres -h 10.1.0.10 -c "SELECT * FROM pg_hba_file_rules();"

Issue 2: "postgresql.conf not found" Error During Bootstrap

FATAL: Patroni expects to find postgresql.conf in PGDATA of the remote primary

This occurs on Debian/Ubuntu package installs where postgresql.conf lives in /etc/postgresql/17/main/ while PGDATA is /var/lib/postgresql/17/main/.

# Create a symlink in PGDATA on DC1
ln -s /etc/postgresql/17/main/postgresql.conf \
  /var/lib/postgresql/17/main/postgresql.conf

Issue 3: DC1 Comes Back Alive After Promotion — Split-Brain

DC1 recovers earlier than expected and starts accepting writes, creating a Split-Brain where both DCs serve writes simultaneously.

# 1. Immediately force-stop all PostgreSQL on DC1
ssh pg-seoul-1 "systemctl stop patroni && systemctl stop postgresql"
ssh pg-seoul-2 "systemctl stop patroni && systemctl stop postgresql"
ssh pg-seoul-3 "systemctl stop patroni && systemctl stop postgresql"

# 2. Assess DC2's current state and the scope of data loss
psql -h 10.2.0.10 -U postgres -c "
  SELECT now(), pg_current_wal_lsn(), timeline_id
  FROM pg_control_checkpoint();
"

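# 3. Inspect DC1's diverged WAL records before reinitializing (if possible)
#    pg_waldump takes a WAL segment file, not a directory; replace SEGMENT_FILE
#    with the segment that contains the divergence point
pg_waldump -n 1000 /var/lib/postgresql/17/main/pg_wal/SEGMENT_FILE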

# 4. Re-initialize DC1 as a Standby Cluster
rm -rf /var/lib/postgresql/17/main/*
systemctl start patroni

Issue 4: pg_rewind Fails — Cannot Rejoin as Standby

pg_rewind: error: could not find previous WAL record at 0/3000000

pg_rewind requires either data checksums (enabled at initdb time, or later with pg_checksums on a cleanly stopped cluster) or wal_log_hints = on.

# Check whether data-checksums is enabled
pg_controldata /var/lib/postgresql/17/main | grep "Data page checksum"

# Check wal_log_hints
psql -U postgres -c "SHOW wal_log_hints;"

# Force re-initialize without pg_rewind
patronictl -c /etc/patroni/patroni.yml reinit pg-busan-standby pg-busan-1 --force

11. References

  • Patroni Official Documentation — Standby Cluster
  • Patroni Official Documentation — Multi-Datacenter HA Configuration
  • Patroni Official Documentation — patronictl
  • CYBERTEC — Patroni: Cascading Replication with Standby Cluster
  • Percona — Performing Standby Datacentre Promotions of a Patroni Cluster
  • Patroni Release Notes 4.1.2 — promote-cluster / demote-cluster
