Operating Patroni H/A Across Multiple Regions — Part 2: Synchronous Multi-DC Setup in Practice
Put Part 1's Pattern A into production. Deploy PostgreSQL 17, Patroni 4.1.x, and etcd 3.5.x across Seoul, Tokyo, and Singapore, then walk step by step through TLS certificate issuance, etcd cluster setup, synchronous replication config, HAProxy routing, and automated failover verification.
Series — Operating Patroni H/A Across Multiple Regions
- Part 1 — Fundamentals and Architecture Design Principles
- Part 2 — Synchronous Multi-DC Setup in Practice (this post)
- Part 3 — Asynchronous Replication + Standby Cluster Setup
- Part 4 — Split-Brain Prevention (STONITH, Watchdog, Quorum)
- Part 5 — Failover Runbook and DR Drills
- Part 6 — Monitoring, Operational Automation, and Best Practices
Table of Contents
- Lab Environment and Prerequisites
- Step 1 — Issue TLS Certificates
- Step 2 — Configure the etcd Cluster
- Step 3 — Initialize PostgreSQL and Install Patroni
- Step 4 — Write patroni.yml
- Step 5 — HAProxy Connection Routing
- Step 6 — Cluster Verification and Failover Testing
- Common Troubleshooting
- References
1. Lab Environment and Prerequisites
This part implements Part 1's Pattern A (3 DCs, synchronous replication, automatic failover) as a real deployment. The steps below are based on the environment shown, but apply equally to AWS/GCP/Azure multi-region VMs or on-premises setups.
Node Layout
| Node | Region | IP (example) | Role |
|---|---|---|---|
| pg-seoul-1 | ap-northeast-2 (Seoul) | 10.1.0.10 | PostgreSQL Primary + etcd |
| pg-tokyo-1 | ap-northeast-1 (Tokyo) | 10.2.0.10 | PostgreSQL Replica + etcd |
| pg-singapore-1 | ap-southeast-1 (Singapore) | 10.3.0.10 | PostgreSQL Replica + etcd |
| haproxy-1 | ap-northeast-2 (Seoul, example) | 10.1.0.20 | HAProxy (application endpoint) |
- Recommended specs (per node): 4 vCPU, 8 GB RAM, NVMe SSD
- OS: Ubuntu 22.04 LTS (or RHEL 9)
- Software: PostgreSQL 17, Patroni 4.1.x, etcd 3.5.x, HAProxy 2.8+
Inter-Region Port Allowlist
Before configuring the multi-region setup, open these ports in your Security Groups or firewall.
| Port | Purpose | Direction |
|---|---|---|
| 5432 | PostgreSQL | node ↔ node, app → node |
| 8008 | Patroni REST API | node ↔ node, HAProxy → node |
| 2379 | etcd client | node ↔ etcd |
| 2380 | etcd peer | etcd ↔ etcd |
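The allowlist can be spot-checked from each node once the services are listening. The following is a minimal sketch, assuming netcat (nc) is installed and using the example IPs from the node table; the function name check_ports and the PROBE override are illustrative, not part of any tool.

```shell
# Peers and ports taken from the tables above (example addresses).
NODES="10.1.0.10 10.2.0.10 10.3.0.10"
PORTS="5432 8008 2379 2380"

# PROBE is the command used to test one host/port pair.
# Default: netcat in scan mode (-z) with a 3 second timeout.
PROBE="${PROBE:-nc -z -w 3}"

# Print one line per host/port pair: "open" or "BLOCKED".
check_ports() {
    for host in $NODES; do
        for port in $PORTS; do
            if $PROBE "$host" "$port" >/dev/null 2>&1; then
                echo "open    $host:$port"
            else
                echo "BLOCKED $host:$port"
            fi
        done
    done
}
```

Run check_ports on every node after etcd and Patroni are up. A port also shows as BLOCKED when nothing is listening on it yet, so a failure before the services start does not necessarily indicate a firewall problem.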
2. Step 1 — Issue TLS Certificates
Inter-region traffic may traverse the public internet, so TLS is a hard requirement for both etcd and the Patroni REST API — not optional. This guide uses cfssl to set up a self-signed CA. For production, HashiCorp Vault or AWS ACM Private CA is preferable.
Subject Alternative Names (SANs) must include the node's IP address. If the IP is missing from the SAN, etcd TLS verification will fail and the setup will break at the very first step.
Generate the CA and Node Certificates
# Install cfssl (only on the host where certificates are issued; the CA key should live in one place)
apt-get install -y golang-cfssl
# Create working directory
mkdir -p /etc/etcd/ssl && cd /etc/etcd/ssl
# Create CA config
cat > ca-config.json <<'EOF'
{
"signing": {
"default": { "expiry": "87600h" },
"profiles": {
"etcd": {
"expiry": "87600h",
"usages": ["signing", "key encipherment", "server auth", "client auth"]
}
}
}
}
EOF
# Create CA CSR
cat > ca-csr.json <<'EOF'
{
"CN": "etcd-ca",
"key": { "algo": "rsa", "size": 2048 },
"names": [{ "O": "MyOrg", "OU": "DBA Team" }]
}
EOF
# Issue CA certificate
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
# Issue a certificate for the Seoul etcd node
cat > etcd-seoul-csr.json <<'EOF'
{
"CN": "etcd-seoul",
"hosts": ["10.1.0.10", "pg-seoul-1", "127.0.0.1"],
"key": { "algo": "rsa", "size": 2048 }
}
EOF
cfssl gencert \
-ca=ca.pem \
-ca-key=ca-key.pem \
-config=ca-config.json \
-profile=etcd \
etcd-seoul-csr.json | cfssljson -bare etcd-seoul
# Repeat for Tokyo and Singapore (change only the IP and hostname in "hosts")
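Distribute the resulting ca.pem, etcd-{region}.pem, and etcd-{region}-key.pem to each node's /etc/etcd/ssl/ directory, and keep ca-key.pem off the database nodes. Each private key must be readable by the service account that uses it (etcd for etcd, postgres for Patroni). The Patroni and HAProxy configurations later in this guide also reference /etc/patroni/ssl/patroni.pem, /etc/patroni/ssl/patroni-key.pem, /etc/patroni/ssl/ca.pem, and /etc/haproxy/ssl/ca.pem; issue the Patroni certificate from the same CA in the same way and copy ca.pem to those paths.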
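The "repeat for Tokyo and Singapore" step can be scripted so that no node ends up with a missing IP in its SAN list. This is a sketch only: make_csr and write_all_csrs are hypothetical helper names, and the IPs and hostnames are the example values from the node table.

```shell
# Emit a cfssl CSR document for one node.
# $1 = region label, $2 = node IP, $3 = hostname
make_csr() {
    cat <<EOF
{
  "CN": "etcd-$1",
  "hosts": ["$2", "$3", "127.0.0.1"],
  "key": { "algo": "rsa", "size": 2048 }
}
EOF
}

# Write the three CSR files into the current directory.
write_all_csrs() {
    make_csr seoul     10.1.0.10 pg-seoul-1     > etcd-seoul-csr.json
    make_csr tokyo     10.2.0.10 pg-tokyo-1     > etcd-tokyo-csr.json
    make_csr singapore 10.3.0.10 pg-singapore-1 > etcd-singapore-csr.json
}
```

After calling write_all_csrs in /etc/etcd/ssl, run the same cfssl gencert command shown above once per region, changing only the CSR file name and the cfssljson -bare output name.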
3. Step 2 — Configure the etcd Cluster
Place one etcd node in each of the three DCs. etcd uses the Raft consensus algorithm and remains operational as long as at least 2 of 3 nodes are alive.
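The "2 of 3" figure follows from Raft's majority rule: a cluster of n voting members needs floor(n/2) + 1 of them to commit a write. A short sketch of that arithmetic (the function names are illustrative):

```shell
# Raft needs a strict majority of voting members to commit writes.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while the cluster keeps quorum.
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
    echo "members=$n quorum=$(quorum_size "$n") tolerates=$(fault_tolerance "$n")"
done
```

Three members tolerate one failure and so do four, which is why an even member count adds cross-region traffic without adding resilience.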
Install etcd
# Run on all nodes
ETCD_VER=v3.5.17
curl -LO https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzf etcd-${ETCD_VER}-linux-amd64.tar.gz
cp etcd-${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/
etcd Config — Seoul Node (pg-seoul-1)
All three nodes must appear in initial-cluster. For the Tokyo and Singapore nodes, change name, listen-client-urls, listen-peer-urls, the advertise-* values, and the cert-file/key-file paths to that node's own values. Keep initial-cluster and initial-cluster-token identical across all nodes.
# /etc/etcd/etcd.conf.yml (Seoul node)
name: etcd-seoul
data-dir: /var/lib/etcd/data
# Client traffic (Patroni -> etcd)
listen-client-urls: https://10.1.0.10:2379,https://127.0.0.1:2379
advertise-client-urls: https://10.1.0.10:2379
# Peer traffic (etcd <-> etcd, across regions)
listen-peer-urls: https://10.1.0.10:2380
initial-advertise-peer-urls: https://10.1.0.10:2380
# Initial cluster membership (list all 3 nodes on a single line with no whitespace,
# so each parsed member name matches that node's `name` exactly)
initial-cluster: etcd-seoul=https://10.1.0.10:2380,etcd-tokyo=https://10.2.0.10:2380,etcd-singapore=https://10.3.0.10:2380
initial-cluster-state: new
initial-cluster-token: pg-multi-region-cluster-v1
# TLS for client <-> etcd traffic
client-transport-security:
  cert-file: /etc/etcd/ssl/etcd-seoul.pem
  key-file: /etc/etcd/ssl/etcd-seoul-key.pem
  trusted-ca-file: /etc/etcd/ssl/ca.pem
  client-cert-auth: true
# TLS for etcd <-> etcd peer traffic
# (in the config file the key is client-cert-auth; peer-client-cert-auth is the CLI flag name)
peer-transport-security:
  cert-file: /etc/etcd/ssl/etcd-seoul.pem
  key-file: /etc/etcd/ssl/etcd-seoul-key.pem
  trusted-ca-file: /etc/etcd/ssl/ca.pem
  client-cert-auth: true
# Adjust heartbeat/election timeouts for multi-region RTT
# The defaults (100ms/1000ms) can become unstable when inter-region RTT is high
heartbeat-interval: 500
election-timeout: 5000
# Automatic compaction (disk usage management)
auto-compaction-mode: revision
auto-compaction-retention: "1000"
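Because initial-cluster has to be byte-for-byte identical on every node, it can help to generate the value from one list instead of typing it three times. A minimal sketch, assuming peer port 2380 and HTTPS as in the config above; initial_cluster is a hypothetical helper name.

```shell
# Build the initial-cluster value from "name=ip" pairs given as arguments.
# The result is a single comma-separated line with no whitespace.
initial_cluster() {
    out=""
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first "="
        ip=${pair#*=}      # text after the first "="
        out="${out:+$out,}${name}=https://${ip}:2380"
    done
    echo "$out"
}

initial_cluster etcd-seoul=10.1.0.10 etcd-tokyo=10.2.0.10 etcd-singapore=10.3.0.10
```

The printed line can be pasted into the initial-cluster key of each node's etcd.conf.yml.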
Register and Start the etcd Service
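# The tarball install does not create a service account or data directory.
# The etcd user must also be able to read the key file under /etc/etcd/ssl.
useradd --system --home-dir /var/lib/etcd --shell /usr/sbin/nologin etcd
mkdir -p /var/lib/etcd/data
chown -R etcd:etcd /var/lib/etcd
chmod 700 /var/lib/etcd/data
cat > /etc/systemd/system/etcd.service <<'EOF'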
[Unit]
Description=etcd key-value store
After=network.target
[Service]
Type=notify
User=etcd
ExecStart=/usr/local/bin/etcd --config-file=/etc/etcd/etcd.conf.yml
Restart=always
RestartSec=5s
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now etcd
Verify the etcd Cluster
# List cluster members
etcdctl \
--endpoints=https://10.1.0.10:2379,https://10.2.0.10:2379,https://10.3.0.10:2379 \
--cacert=/etc/etcd/ssl/ca.pem \
--cert=/etc/etcd/ssl/etcd-seoul.pem \
--key=/etc/etcd/ssl/etcd-seoul-key.pem \
member list -w table
# Expected output:
# (member IDs will differ; etcd 3.5 also prints an IS LEARNER column)
# +------------------+---------+----------------+------------------------+------------------------+------------+
# | ID               | STATUS  | NAME           | PEER ADDRS             | CLIENT ADDRS           | IS LEARNER |
# +------------------+---------+----------------+------------------------+------------------------+------------+
# | 8e9e05c52164694d | started | etcd-seoul     | https://10.1.0.10:2380 | https://10.1.0.10:2379 | false      |
# | 9b8e05c12164694a | started | etcd-tokyo     | https://10.2.0.10:2380 | https://10.2.0.10:2379 | false      |
# | ab8e05c12164694c | started | etcd-singapore | https://10.3.0.10:2380 | https://10.3.0.10:2379 | false      |
# +------------------+---------+----------------+------------------------+------------------------+------------+
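For scripted checks, the member list is easier to parse without the table formatting. The sketch below assumes etcdctl's default (simple) output, where fields are separated by ", " and the second field is the status; count_started and all_members_started are illustrative names.

```shell
# Count members whose status field is "started".
# stdin: output of `etcdctl member list` in its default (simple) format.
count_started() {
    awk -F', ' '$2 == "started" { n++ } END { print n + 0 }'
}

# Succeeds when the number of started members equals the expected count.
# $1 = expected member count (3 in this guide); stdin as above.
all_members_started() {
    [ "$(count_started)" -eq "$1" ]
}
```

Typical use is to pipe the same etcdctl command as above, without -w table, into all_members_started 3 and act on its exit status.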
4. Step 3 — Initialize PostgreSQL and Install Patroni
Install PostgreSQL
# Run on all PG nodes (Ubuntu 22.04)
apt-get install -y postgresql-common
/usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
apt-get install -y postgresql-17
# Patroni handles cluster initialization, so remove the default cluster
pg_dropcluster --stop 17 main
Install Patroni
# Recommended: use a Python virtual environment
apt-get install -y python3-pip python3-venv
python3 -m venv /opt/patroni
# Quote the extras so the shell does not treat the brackets as a glob pattern
/opt/patroni/bin/pip install 'patroni[etcd3]' 'psycopg[binary]'
# Create symlinks
ln -s /opt/patroni/bin/patroni /usr/local/bin/patroni
ln -s /opt/patroni/bin/patronictl /usr/local/bin/patronictl
5. Step 4 — Write patroni.yml
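The key to synchronous multi-DC replication is synchronous_mode: true. With this option enabled, Patroni manages synchronous_standby_names itself, and the Primary acknowledges a commit only after at least one Synchronous Replica has confirmed the WAL. Note that the bootstrap.dcs section is applied only when the cluster is first initialized; later changes to these settings go through patronictl edit-config.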
patroni.yml — Seoul Node (Primary candidate)
# /etc/patroni/patroni.yml (pg-seoul-1)
scope: pg-multiregion # Cluster name (same on all nodes)
namespace: /db/ # etcd key namespace
name: pg-seoul-1 # Unique node name (different per node)
# -- REST API -------------------------------------------------------
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.1.0.10:8008
  # TLS so HAProxy can perform HTTPS health checks
  certfile: /etc/patroni/ssl/patroni.pem
  keyfile: /etc/patroni/ssl/patroni-key.pem
  cafile: /etc/patroni/ssl/ca.pem
# -- DCS: etcd3 ----------------------------------------------------
etcd3:
  hosts:
    - 10.1.0.10:2379   # Seoul etcd
    - 10.2.0.10:2379   # Tokyo etcd
    - 10.3.0.10:2379   # Singapore etcd
  protocol: https
  cacert: /etc/etcd/ssl/ca.pem
  cert: /etc/etcd/ssl/etcd-seoul.pem
  key: /etc/etcd/ssl/etcd-seoul-key.pem
# -- Bootstrap (used only for initial cluster initialization) ------
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576   # Replicas more than 1 MB behind are excluded from failover candidates
    # Core: enable synchronous replication
    synchronous_mode: true
    synchronous_mode_strict: false     # If true, writes block when no sync replica is available
    synchronous_node_count: 1          # Maintain at least 1 sync replica
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"            # Required for pg_rewind
        archive_mode: "on"
        archive_command: 'pgbackrest --stanza=main archive-push %p'
  initdb:
    - encoding: UTF8
    - data-checksums                   # Improves pg_rewind reliability
  pg_hba:
    - local all all trust
    - host all all 127.0.0.1/32 md5
    - host replication replicator 10.0.0.0/8 md5
    - hostssl all all 10.0.0.0/8 md5
# -- PostgreSQL ----------------------------------------------------
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.1.0.10:5432
  data_dir: /var/lib/postgresql/17/main
  bin_dir: /usr/lib/postgresql/17/bin
  config_dir: /etc/postgresql/17/main
  authentication:
    replication:
      username: replicator
      password: "SecureRepPass123!"
    superuser:
      username: postgres
      password: "SecureSuperPass123!"
    rewind:
      username: rewind_user
      password: "SecureRewindPass123!"
  parameters:
    unix_socket_directories: '/var/run/postgresql'
# -- Tags ----------------------------------------------------------
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
  dc: seoul   # Custom tag for HAProxy region-based routing
For Tokyo and Singapore nodes, update name, restapi.connect_address, postgresql.connect_address, tags.dc, and etcd3.cert/key (use each region's certificate).
archive_command requires pgbackrest to be installed and configured first. If WAL archiving is not needed yet, set archive_mode: "off" and enable it later.
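When tuning ttl, loop_wait, and retry_timeout for cross-region latency, the three values should continue to satisfy the rule given in the Patroni documentation: loop_wait + 2 * retry_timeout <= ttl. The sketch below checks that rule for a candidate set of values; check_patroni_timing is a hypothetical helper, not a Patroni command.

```shell
# Check Patroni's timing rule: loop_wait + 2 * retry_timeout <= ttl
# $1 = ttl, $2 = loop_wait, $3 = retry_timeout (all in seconds)
check_patroni_timing() {
    need=$(( $2 + 2 * $3 ))
    if [ "$need" -le "$1" ]; then
        echo "ok: loop_wait + 2*retry_timeout = $need <= ttl = $1"
    else
        echo "invalid: loop_wait + 2*retry_timeout = $need > ttl = $1"
        return 1
    fi
}

# Values from the bootstrap.dcs section above
check_patroni_timing 30 10 10
```

The values in this guide sit exactly on the limit (10 + 2 * 10 = 30), so raising retry_timeout to absorb slow inter-region etcd calls means raising ttl as well.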
Register and Start the Patroni Service
cat > /etc/systemd/system/patroni.service <<'EOF'
[Unit]
Description=Patroni PostgreSQL HA
After=syslog.target network.target etcd.service
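# Wants (not Requires): stopping the local etcd must not also stop Patroni,
# which can still reach the etcd members in the other two regions
Wants=etcd.service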
[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
Restart=on-failure
RestartSec=5s
KillMode=process
TimeoutStopSec=30
LimitNOFILE=65536
Environment="MALLOC_ARENA_MAX=1"
[Install]
WantedBy=multi-user.target
EOF
# Start Seoul first (it becomes the initial Primary)
systemctl enable --now patroni
# Once Seoul is confirmed as Primary, start Tokyo then Singapore
ssh pg-tokyo-1 "systemctl enable --now patroni"
ssh pg-singapore-1 "systemctl enable --now patroni"
6. Step 5 — HAProxy Connection Routing
Patroni's REST API exposes role-aware health-check endpoints, so HAProxy can determine each node's status over HTTPS instead of connecting to PostgreSQL directly.
- /primary: returns HTTP 200 only when the node holds the leader lock
- /replica: returns 200 when the node is a healthy, running replica
- /synchronous: returns 200 only on a Synchronous Replica
In a synchronous replication setup, routing reads to the /synchronous backend limits them to a replica that has already received every acknowledged commit. This bounds staleness but does not by itself guarantee read-your-writes: unless synchronous_commit is set to remote_apply, a commit can be acknowledged before the replica has replayed it.
haproxy.cfg
# /etc/haproxy/haproxy.cfg
global
maxconn 200
log /dev/log local0
defaults
log global
mode tcp
retries 2
timeout client 30m
timeout connect 4s
timeout server 30m
timeout check 5s
# -- Stats dashboard -----------------------------------------------
listen stats
bind *:7000
mode http
stats enable
stats uri /
stats refresh 10s
# -- Write traffic -> Primary only (port 5000) ---------------------
listen pg_primary
bind *:5000
option httpchk OPTIONS /primary
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
# check-ssl encrypts only the health check; a bare "ssl" keyword would also wrap the PostgreSQL traffic in TLS
server pg-seoul-1 10.1.0.10:5432 check port 8008 check-ssl ca-file /etc/haproxy/ssl/ca.pem
server pg-tokyo-1 10.2.0.10:5432 check port 8008 check-ssl ca-file /etc/haproxy/ssl/ca.pem
server pg-singapore-1 10.3.0.10:5432 check port 8008 check-ssl ca-file /etc/haproxy/ssl/ca.pem
# -- Read traffic -> Replica round-robin (port 5001) ---------------
listen pg_replicas
bind *:5001
balance roundrobin
option httpchk OPTIONS /replica
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server pg-seoul-1 10.1.0.10:5432 check port 8008 check-ssl ca-file /etc/haproxy/ssl/ca.pem
server pg-tokyo-1 10.2.0.10:5432 check port 8008 check-ssl ca-file /etc/haproxy/ssl/ca.pem
server pg-singapore-1 10.3.0.10:5432 check port 8008 check-ssl ca-file /etc/haproxy/ssl/ca.pem
# -- Sync Replica reads only (port 5002, read durability) ---------
listen pg_sync_replicas
bind *:5002
balance roundrobin
option httpchk OPTIONS /synchronous
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server pg-seoul-1 10.1.0.10:5432 check port 8008 check-ssl ca-file /etc/haproxy/ssl/ca.pem
server pg-tokyo-1 10.2.0.10:5432 check port 8008 check-ssl ca-file /etc/haproxy/ssl/ca.pem
server pg-singapore-1 10.3.0.10:5432 check port 8008 check-ssl ca-file /etc/haproxy/ssl/ca.pem
HAProxy marks any node returning an error on /primary as down and routes all writes to the single node returning 200 (the Primary). In the /replica and /synchronous backend groups, the Primary returns 503, so it is naturally excluded from read traffic.
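The routing decision described above can be reproduced by hand when debugging a backend that HAProxy reports as down. This is a sketch under stated assumptions: classify_node and probe are hypothetical helpers, curl is assumed to be installed, and the CA path is the one used in haproxy.cfg above.

```shell
# Map the HTTP status codes returned by /primary, /replica and /synchronous
# to the HAProxy backend(s) that would accept the node.
# $1 = /primary code, $2 = /replica code, $3 = /synchronous code
classify_node() {
    if [ "$1" = 200 ]; then
        echo "primary (pg_primary)"
    elif [ "$3" = 200 ]; then
        echo "sync replica (pg_replicas, pg_sync_replicas)"
    elif [ "$2" = 200 ]; then
        echo "async replica (pg_replicas)"
    else
        echo "unavailable"
    fi
}

# Fetch only the HTTP status code of one Patroni endpoint.
# $1 = node IP, $2 = endpoint path
probe() {
    curl -s -o /dev/null -w '%{http_code}' \
        --cacert /etc/haproxy/ssl/ca.pem "https://$1:8008$2"
}
```

For example, classify_node "$(probe 10.2.0.10 /primary)" "$(probe 10.2.0.10 /replica)" "$(probe 10.2.0.10 /synchronous)" shows which listeners Tokyo currently belongs to.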
7. Step 6 — Cluster Verification and Failover Testing
Check Cluster Status
# View cluster members and their roles
patronictl -c /etc/patroni/patroni.yml list
# Illustrative output (exact columns and state names vary by Patroni version;
# recent releases report replicas as "streaming"):
# + Cluster: pg-multiregion (7891234567890123456) -+---------+----+-----------+
# | Member         | Host           | Role         | State   | TL | Lag in MB |
# +----------------+----------------+--------------+---------+----+-----------+
# | pg-seoul-1     | 10.1.0.10:5432 | Leader       | running |  1 |           |
# | pg-tokyo-1     | 10.2.0.10:5432 | Sync Standby | running |  1 |       0.0 |
# | pg-singapore-1 | 10.3.0.10:5432 | Replica      | running |  1 |       0.0 |
# +----------------+----------------+--------------+---------+----+-----------+
# Confirm synchronous replication state (run on Primary)
psql -U postgres -c "
SELECT application_name, sync_state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
"
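The node with sync_state = sync is the Synchronous Replica. With synchronous_node_count: 1, the other replica is expected to show async. If no node shows sync, check that synchronous_mode and synchronous_node_count were actually applied to the DCS configuration (patronictl show-config).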
Simulate Automatic Failover
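# -- Test 1: stop Patroni on the Primary (Seoul) ------------------
# On the Seoul node (this is a clean stop: Patroni shuts PostgreSQL down and releases the leader key):
systemctl stop patroni
# Monitor failover detection from another node
watch -n1 "patronictl -c /etc/patroni/patroni.yml list"
# The Synchronous Replica promotes to Leader, typically within ~10–30 seconds.
# In synchronous mode only a Synchronous Replica is eligible for promotion,
# so Singapore takes over only if it held the sync role when Seoul went down.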
# -- Test 2: planned switchover via patronictl --------------------
# Run while pg-seoul-1 is the current Leader; --leader replaces the older --master flag
patronictl -c /etc/patroni/patroni.yml switchover \
--leader pg-seoul-1 \
--candidate pg-tokyo-1 \
--scheduled now \
--force
# -- Test 3: single etcd node failure (verify quorum holds) -------
# Stop Singapore's etcd node
ssh pg-singapore-1 "systemctl stop etcd"
# Confirm the etcd cluster operates normally with 2/3 quorum
etcdctl \
--endpoints=https://10.1.0.10:2379,https://10.2.0.10:2379 \
--cacert=/etc/etcd/ssl/ca.pem \
--cert=/etc/etcd/ssl/etcd-seoul.pem \
--key=/etc/etcd/ssl/etcd-seoul-key.pem \
endpoint health
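To put a number on the failover window instead of watching the table, a polling loop can time how long it takes for another node to start answering 200 on /primary. A minimal sketch, assuming curl is installed and the CA file path from patroni.yml; wait_until and is_primary are illustrative names.

```shell
# Poll a probe command until it succeeds and report the elapsed seconds.
# $1 = timeout in seconds; remaining arguments = probe command to run.
wait_until() {
    limit=$1; shift
    start=$(date +%s)
    while ! "$@" >/dev/null 2>&1; do
        now=$(date +%s)
        if [ $(( now - start )) -ge "$limit" ]; then
            echo "timeout after ${limit}s"
            return 1
        fi
        sleep 1
    done
    echo "ready after $(( $(date +%s) - start ))s"
}

# Probe: succeeds when the given node answers 200 on /primary.
# -f makes curl exit non-zero on HTTP errors such as 503.
is_primary() {
    curl -sf -o /dev/null --cacert /etc/patroni/ssl/ca.pem "https://$1:8008/primary"
}
```

Right after stopping Patroni on Seoul in Test 1, running wait_until 60 is_primary 10.2.0.10 from another host reports roughly how long Tokyo took to become Leader, with one-second resolution.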
Restore Seoul After Failover
# Restart Patroni on Seoul — it rejoins as a Replica
systemctl start patroni
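# With use_pg_rewind: true, Patroni runs pg_rewind if the old Primary's timeline diverged,
# then streams from the new Leader; a full pg_basebackup is only needed for a reinit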
# Check progress:
patronictl -c /etc/patroni/patroni.yml list
# If automatic rejoin fails, reinitialize manually
patronictl -c /etc/patroni/patroni.yml reinit pg-multiregion pg-seoul-1
8. Common Troubleshooting
Issue 1: etcd TLS Connection Failure
CRITICAL: get_cluster
etcd3 exception: <class 'etcd3.exceptions.ConnectionFailedError'>
The most common cause is a missing IP address in the certificate SAN.
# Check the certificate's Subject Alternative Names
openssl x509 -in /etc/etcd/ssl/etcd-seoul.pem -noout -text | grep -A2 "Subject Alternative"
# Test a direct etcd client connection
etcdctl --endpoints=https://10.1.0.10:2379 \
--cacert=/etc/etcd/ssl/ca.pem \
--cert=/etc/etcd/ssl/etcd-seoul.pem \
--key=/etc/etcd/ssl/etcd-seoul-key.pem \
get / --prefix --keys-only
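The SAN check can also be turned into a pass/fail test for a specific IP, which is handy when validating all three certificates in a loop. This sketch parses the text form printed by openssl; san_has_ip is a hypothetical helper name.

```shell
# Succeeds when the certificate text on stdin lists the given IP as a SAN.
# stdin: output of `openssl x509 -noout -text`; $1 = IP address to look for.
san_has_ip() {
    # Put each SAN entry on its own line, trim whitespace, then match the entry exactly
    tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -Fxq "IP Address:$1"
}
```

For example, openssl x509 -in /etc/etcd/ssl/etcd-seoul.pem -noout -text | san_has_ip 10.1.0.10 exits with status 0 only if that exact address is present, so 10.1.0.1 does not match 10.1.0.10.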
Issue 2: HAProxy REST API TLS Health Check Failure
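In a server line, the address must come directly after the server name, and every option (check, port, check-ssl, ca-file) must follow it. If an option appears before the address, HAProxy rejects the configuration at startup rather than reporting a TLS error. Use check-ssl rather than the bare ssl keyword: ssl would also wrap the forwarded PostgreSQL traffic on port 5432 in TLS, which breaks client connections because PostgreSQL negotiates TLS inside its own protocol.
# Wrong (options before the host address)
server pg-seoul-1 check-ssl ca-file /etc/haproxy/ssl/ca.pem 10.1.0.10:5432 check port 8008
# Correct (host address first, then the check options)
server pg-seoul-1 10.1.0.10:5432 check port 8008 check-ssl ca-file /etc/haproxy/ssl/ca.pem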
Issue 3: Writes Blocked in synchronous_mode
When synchronous_mode_strict: true is set and no Synchronous Replica is available, commits on the Primary block until one returns; the Primary does not fall back to asynchronous replication. In an emergency, temporarily disable strict mode.
# Check current dynamic config
patronictl -c /etc/patroni/patroni.yml show-config
# Disable strict mode temporarily
patronictl -c /etc/patroni/patroni.yml edit-config
# In the editor, change synchronous_mode_strict: false and save
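During an incident an interactive editor is easy to get wrong, so the same change can be applied non-interactively with the --set and --force options of patronictl edit-config. The wrapper below is a sketch: set_strict_sync and the PATRONICTL override are illustrative, and the config path is the one used throughout this guide.

```shell
# PATRONICTL can be overridden (for example with "echo") to preview the command.
PATRONICTL="${PATRONICTL:-patronictl}"
PATRONI_CONFIG="${PATRONI_CONFIG:-/etc/patroni/patroni.yml}"

# Set synchronous_mode_strict in the DCS without opening an editor.
# $1 = true or false
set_strict_sync() {
    $PATRONICTL -c "$PATRONI_CONFIG" edit-config \
        --set "synchronous_mode_strict=$1" --force
}
```

Calling set_strict_sync false relaxes strict mode, and set_strict_sync true restores it once a Synchronous Replica is back. Running with PATRONICTL=echo prints the command instead of executing it.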
Issue 4: Unstable etcd Leader Election Due to Inter-Region RTT
# Check etcd logs for election/timeout warnings
journalctl -u etcd | grep -i "election\|timeout\|heartbeat"
# Tuning principle:
# heartbeat-interval = 2–3x RTT, election-timeout = 10x heartbeat-interval
# Example for Seoul–Singapore RTT ~70ms:
# heartbeat-interval: 200
# election-timeout: 2000
Measure the actual RTT first (ping -c 20 <remote-ip>) before setting timeout values. The defaults (heartbeat 100ms, election timeout 1000ms) can become unstable when inter-region RTT exceeds 50ms.
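The tuning rule from this section can be turned into a small calculator. The sketch below takes the upper end of the rule (heartbeat = 3x RTT, rounded to the nearest 50 ms, election timeout = 10x heartbeat); the rounding step and the 100 ms floor are assumptions made here for convenience, and etcd_timeouts is a hypothetical helper name.

```shell
# Derive etcd timeouts from a measured RTT using the rule of thumb above.
# $1 = average RTT in whole milliseconds
etcd_timeouts() {
    # 3 x RTT, rounded to the nearest multiple of 50 ms
    hb=$(( ($1 * 3 + 25) / 50 * 50 ))
    # Never go below etcd's default heartbeat interval
    if [ "$hb" -lt 100 ]; then
        hb=100
    fi
    echo "heartbeat-interval: $hb"
    echo "election-timeout: $(( hb * 10 ))"
}

# Seoul-Singapore example from the text (RTT ~70 ms)
etcd_timeouts 70
```

For an RTT of 70 ms this prints heartbeat-interval: 200 and election-timeout: 2000, matching the example above. On Linux, the average RTT can usually be taken from the summary line of ping, for instance with ping -c 20 10.3.0.10 | awk -F/ 'END { print $5 }', then rounded to whole milliseconds before passing it in.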
References
- Patroni Official Documentation — Multi-Datacenter HA Configuration
- Patroni Official Documentation — REST API Health Check Endpoints
- Percona — HAProxy with Patroni Health Check Endpoints and Debugging
- pgEdge — Using Patroni to Build a Highly Available Postgres Cluster: HAProxy
- CYBERTEC — Patroni etcd Clusters: Introduction and How-To