Thursday, June 4, 2026

Operating Patroni H/A Across Multiple Regions — Part 2: Synchronous Multi-DC Setup in Practice

Put Part 1's Pattern A into production. Deploy PostgreSQL 17, Patroni 4.1.x, and etcd 3.5.x across Seoul, Tokyo, and Singapore, then walk step by step through TLS certificate issuance, etcd cluster setup, synchronous replication config, HAProxy routing, and automated failover verification.

Series — Operating Patroni H/A Across Multiple Regions

  • Part 1 — Fundamentals and Architecture Design Principles
  • Part 2 — Synchronous Multi-DC Setup in Practice (this post)
  • Part 3 — Asynchronous Replication + Standby Cluster Setup
  • Part 4 — Split-Brain Prevention (STONITH, Watchdog, Quorum)
  • Part 5 — Failover Runbook and DR Drills
  • Part 6 — Monitoring, Operational Automation, and Best Practices

Table of Contents

  1. Lab Environment and Prerequisites
  2. Step 1 — Issue TLS Certificates
  3. Step 2 — Configure the etcd Cluster
  4. Step 3 — Initialize PostgreSQL and Install Patroni
  5. Step 4 — Write patroni.yml
  6. Step 5 — HAProxy Connection Routing
  7. Step 6 — Cluster Verification and Failover Testing
  8. Common Troubleshooting
  9. References

1. Lab Environment and Prerequisites

This part implements Part 1's Pattern A (3 DCs, synchronous replication, automatic failover) as a real deployment. The steps below are based on the environment shown, but apply equally to AWS/GCP/Azure multi-region VMs or on-premises setups.

Node Layout

Node           | Region                          | IP (example) | Role
pg-seoul-1     | ap-northeast-2 (Seoul)          | 10.1.0.10    | PostgreSQL Primary + etcd
pg-tokyo-1     | ap-northeast-1 (Tokyo)          | 10.2.0.10    | PostgreSQL Replica + etcd
pg-singapore-1 | ap-southeast-1 (Singapore)      | 10.3.0.10    | PostgreSQL Replica + etcd
haproxy-1      | ap-northeast-2 (Seoul, example) | 10.1.0.20    | HAProxy (application endpoint)

Recommended specs (per node): 4 vCPU, 8 GB RAM, NVMe SSD
OS: Ubuntu 22.04 LTS (or RHEL 9)
Software: PostgreSQL 17, Patroni 4.1.x, etcd 3.5.x, HAProxy 2.8+

Inter-Region Port Allowlist

Before configuring the multi-region setup, open these ports in your Security Groups or firewall.

Port | Purpose          | Direction
5432 | PostgreSQL       | node ↔ node, app → node
8008 | Patroni REST API | node ↔ node, HAProxy → node
2379 | etcd client      | node ↔ etcd
2380 | etcd peer        | etcd ↔ etcd
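
Before moving on, it can help to confirm that the allowlist actually took effect. The following is a minimal sketch, assuming plain TCP reachability is a good enough signal; the helper names port_open and check_node are hypothetical, and a port only reports as open once something is listening on it, so run it again after the services are up.

```python
import socket

# Ports taken from the allowlist table above.
REQUIRED_PORTS = {
    5432: "PostgreSQL",
    8008: "Patroni REST API",
    2379: "etcd client",
    2380: "etcd peer",
}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_node(host: str, timeout: float = 3.0) -> dict:
    """Map each required port to True (reachable) or False for one node."""
    return {port: port_open(host, port, timeout) for port in REQUIRED_PORTS}
```

Calling check_node("10.2.0.10") from the Seoul node returns a dict such as {5432: True, 8008: False, ...}; any False entry points at a Security Group rule, a firewall, or a service that is not running yet.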

2. Step 1 — Issue TLS Certificates

Inter-region traffic may traverse the public internet, so TLS is a hard requirement for both etcd and the Patroni REST API — not optional. This guide uses cfssl to set up a self-signed CA. For production, HashiCorp Vault or AWS ACM Private CA is preferable.

Subject Alternative Names (SANs) must include the node's IP address. If the IP is missing from the SAN, etcd TLS verification will fail and the setup will break at the very first step.

Generate the CA and Node Certificates

# Install cfssl (only on the host where you issue certificates; all nodes must share one CA)
apt-get install -y golang-cfssl

# Create working directory
mkdir -p /etc/etcd/ssl && cd /etc/etcd/ssl

# Create CA config
cat > ca-config.json <<'EOF'
{
  "signing": {
    "default": { "expiry": "87600h" },
    "profiles": {
      "etcd": {
        "expiry": "87600h",
        "usages": ["signing", "key encipherment", "server auth", "client auth"]
      }
    }
  }
}
EOF

# Create CA CSR
cat > ca-csr.json <<'EOF'
{
  "CN": "etcd-ca",
  "key": { "algo": "rsa", "size": 2048 },
  "names": [{ "O": "MyOrg", "OU": "DBA Team" }]
}
EOF

# Issue CA certificate
cfssl gencert -initca ca-csr.json | cfssljson -bare ca

# Issue a certificate for the Seoul etcd node
cat > etcd-seoul-csr.json <<'EOF'
{
  "CN": "etcd-seoul",
  "hosts": ["10.1.0.10", "pg-seoul-1", "127.0.0.1"],
  "key": { "algo": "rsa", "size": 2048 }
}
EOF

cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -profile=etcd \
  etcd-seoul-csr.json | cfssljson -bare etcd-seoul

# Repeat for Tokyo and Singapore (change only the IP and hostname in "hosts")

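Distribute ca.pem together with each node's own etcd-{region}.pem and etcd-{region}-key.pem to that node's /etc/etcd/ssl/ directory. Keep ca-key.pem on the issuing host only. Later steps also reference /etc/patroni/ssl/ (patroni.pem, patroni-key.pem, ca.pem) for the Patroni REST API and /etc/haproxy/ssl/ca.pem on the HAProxy host; issue and copy those files the same way. The etcd service user and the postgres user (Patroni reads the etcd client key) both need read access to the key files they use.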
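
Rather than editing the CSR by hand three times, the per-node files can be generated. This is a sketch under the assumption that the node names and IPs match the Node Layout table; build_csr and write_csrs are hypothetical helper names, and the output mirrors the etcd-seoul-csr.json shown above.

```python
import ipaddress
import json

# region -> (hostname, IP), taken from the Node Layout table.
NODES = {
    "seoul": ("pg-seoul-1", "10.1.0.10"),
    "tokyo": ("pg-tokyo-1", "10.2.0.10"),
    "singapore": ("pg-singapore-1", "10.3.0.10"),
}


def build_csr(region: str, hostname: str, ip: str) -> dict:
    """Build a cfssl CSR whose SAN list ("hosts") includes the node IP."""
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    return {
        "CN": f"etcd-{region}",
        "hosts": [ip, hostname, "127.0.0.1"],
        "key": {"algo": "rsa", "size": 2048},
    }


def write_csrs(directory: str = ".") -> list:
    """Write etcd-<region>-csr.json for every node and return the paths."""
    paths = []
    for region, (hostname, ip) in NODES.items():
        path = f"{directory}/etcd-{region}-csr.json"
        with open(path, "w") as fh:
            json.dump(build_csr(region, hostname, ip), fh, indent=2)
        paths.append(path)
    return paths
```

Each generated file can then be passed to the same cfssl gencert command used for the Seoul node.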


3. Step 2 — Configure the etcd Cluster

Place one etcd node in each of the three DCs. etcd uses the Raft consensus algorithm and remains operational as long as at least 2 of 3 nodes are alive.
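
The "2 of 3" figure comes from Raft's majority rule. A short sketch of the arithmetic (the function names are illustrative only):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given member count."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority still exists."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Three members tolerate one failure, and a fourth member adds no extra tolerance, which is why one etcd node per DC across three DCs is the smallest layout that survives the loss of a whole region.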

Install etcd

# Run on all nodes
ETCD_VER=v3.5.17
curl -LO https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzf etcd-${ETCD_VER}-linux-amd64.tar.gz
cp etcd-${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/

etcd Config — Seoul Node (pg-seoul-1)

All three nodes must appear in initial-cluster. For the Tokyo and Singapore nodes, change the name, listen-client-urls, listen-peer-urls, the advertise-* values, and the cert-file/key-file paths to that node's own IP and certificate. Keep initial-cluster identical across all nodes.

# /etc/etcd/etcd.conf.yml (Seoul node)
name: etcd-seoul
data-dir: /var/lib/etcd/data

# Client traffic (Patroni -> etcd)
listen-client-urls: https://10.1.0.10:2379,https://127.0.0.1:2379
advertise-client-urls: https://10.1.0.10:2379

# Peer traffic (etcd <-> etcd, across regions)
listen-peer-urls: https://10.1.0.10:2380
initial-advertise-peer-urls: https://10.1.0.10:2380

# Initial cluster membership (all 3 nodes on one line; a folded YAML block
# would insert spaces after the commas)
initial-cluster: etcd-seoul=https://10.1.0.10:2380,etcd-tokyo=https://10.2.0.10:2380,etcd-singapore=https://10.3.0.10:2380
initial-cluster-state: new
initial-cluster-token: pg-multi-region-cluster-v1

# TLS for client <-> etcd traffic
client-transport-security:
  cert-file: /etc/etcd/ssl/etcd-seoul.pem
  key-file: /etc/etcd/ssl/etcd-seoul-key.pem
  trusted-ca-file: /etc/etcd/ssl/ca.pem
  client-cert-auth: true

# TLS for etcd <-> etcd peer traffic
# (in the YAML file the key is client-cert-auth; peer-client-cert-auth is the CLI flag name)
peer-transport-security:
  cert-file: /etc/etcd/ssl/etcd-seoul.pem
  key-file: /etc/etcd/ssl/etcd-seoul-key.pem
  trusted-ca-file: /etc/etcd/ssl/ca.pem
  client-cert-auth: true

# Adjust heartbeat/election timeouts for multi-region RTT
# The defaults (100ms/1000ms) can become unstable when inter-region RTT is high
heartbeat-interval: 500
election-timeout: 5000

# Automatic compaction (disk usage management)
auto-compaction-mode: revision
auto-compaction-retention: "1000"
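
To keep the three config files consistent, the values that differ per node can be derived from one node list. This is only a sketch: node_settings and initial_cluster are hypothetical helpers, and the certificate naming assumes the etcd-{region}.pem convention from Step 1.

```python
# (etcd member name, IP) for every member, in a fixed order.
NODES = [
    ("etcd-seoul", "10.1.0.10"),
    ("etcd-tokyo", "10.2.0.10"),
    ("etcd-singapore", "10.3.0.10"),
]


def initial_cluster(nodes=NODES) -> str:
    """Comma-separated member list; must be identical on all nodes."""
    return ",".join(f"{name}=https://{ip}:2380" for name, ip in nodes)


def node_settings(name: str, ip: str, nodes=NODES) -> dict:
    """Return the etcd.conf.yml values that change from node to node."""
    return {
        "name": name,
        "listen-client-urls": f"https://{ip}:2379,https://127.0.0.1:2379",
        "advertise-client-urls": f"https://{ip}:2379",
        "listen-peer-urls": f"https://{ip}:2380",
        "initial-advertise-peer-urls": f"https://{ip}:2380",
        "initial-cluster": initial_cluster(nodes),
        "cert-file": f"/etc/etcd/ssl/{name}.pem",
        "key-file": f"/etc/etcd/ssl/{name}-key.pem",
    }
```

Comparing node_settings(...)["initial-cluster"] across the three nodes is a quick way to catch a copy-paste mismatch before the first start.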

Register and Start the etcd Service

# The unit below runs as User=etcd, so create that account and its data directory first
useradd --system --no-create-home --shell /usr/sbin/nologin etcd
mkdir -p /var/lib/etcd/data
chown -R etcd:etcd /var/lib/etcd
chmod 700 /var/lib/etcd/data

cat > /etc/systemd/system/etcd.service <<'EOF'
[Unit]
Description=etcd key-value store
After=network.target

[Service]
Type=notify
User=etcd
ExecStart=/usr/local/bin/etcd --config-file=/etc/etcd/etcd.conf.yml
Restart=always
RestartSec=5s
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now etcd

Verify the etcd Cluster

# List cluster members
etcdctl \
  --endpoints=https://10.1.0.10:2379,https://10.2.0.10:2379,https://10.3.0.10:2379 \
  --cacert=/etc/etcd/ssl/ca.pem \
  --cert=/etc/etcd/ssl/etcd-seoul.pem \
  --key=/etc/etcd/ssl/etcd-seoul-key.pem \
  member list -w table

# Expected output:
# +------------------+---------+----------------+------------------------+------------------------+
# |        ID        | STATUS  |      NAME      |       PEER ADDRS       |      CLIENT ADDRS      |
# +------------------+---------+----------------+------------------------+------------------------+
# | 8e9e05c52164694d | started | etcd-seoul     | https://10.1.0.10:2380 | https://10.1.0.10:2379 |
# | 9b8e05c12164694a | started | etcd-tokyo     | https://10.2.0.10:2380 | https://10.2.0.10:2379 |
# | ab8e05c12164694c | started | etcd-singapore | https://10.3.0.10:2380 | https://10.3.0.10:2379 |
# +------------------+---------+----------------+------------------------+------------------------+

4. Step 3 — Initialize PostgreSQL and Install Patroni

Install PostgreSQL

# Run on all PG nodes (Ubuntu 22.04)
apt-get install -y postgresql-common
/usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
apt-get install -y postgresql-17

# Patroni handles cluster initialization, so remove the default cluster
pg_dropcluster --stop 17 main

Install Patroni

# Recommended: use a Python virtual environment
apt-get install -y python3-pip python3-venv
python3 -m venv /opt/patroni
/opt/patroni/bin/pip install 'patroni[etcd3]' 'psycopg[binary]'   # quotes stop the shell from globbing the brackets

# Create symlinks
ln -s /opt/patroni/bin/patroni /usr/local/bin/patroni
ln -s /opt/patroni/bin/patronictl /usr/local/bin/patronictl

5. Step 4 — Write patroni.yml

The key to synchronous multi-DC replication is synchronous_mode: true. With this option enabled, the Primary waits until at least one Synchronous Replica has confirmed the WAL before acknowledging a transaction commit, so every commit pays at least one inter-region round trip.

patroni.yml — Seoul Node (Primary candidate)

# /etc/patroni/patroni.yml (pg-seoul-1)

scope: pg-multiregion          # Cluster name (same on all nodes)
namespace: /db/                # etcd key namespace
name: pg-seoul-1               # Unique node name (different per node)

# -- REST API -------------------------------------------------------
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.1.0.10:8008
  # TLS so HAProxy can perform HTTPS health checks
  certfile: /etc/patroni/ssl/patroni.pem
  keyfile: /etc/patroni/ssl/patroni-key.pem
  cafile: /etc/patroni/ssl/ca.pem

# -- DCS: etcd3 ----------------------------------------------------
etcd3:
  hosts:
    - 10.1.0.10:2379    # Seoul etcd
    - 10.2.0.10:2379    # Tokyo etcd
    - 10.3.0.10:2379    # Singapore etcd
  protocol: https
  cacert: /etc/etcd/ssl/ca.pem
  cert: /etc/etcd/ssl/etcd-seoul.pem
  key: /etc/etcd/ssl/etcd-seoul-key.pem

# -- Bootstrap (used only for initial cluster initialization) ------
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576     # Replicas more than 1 MB behind are excluded from failover candidates

    # Core: enable synchronous replication
    synchronous_mode: true
    synchronous_mode_strict: false       # If true, writes block when no sync replica is available
    synchronous_node_count: 1            # Maintain at least 1 sync replica

    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"             # Required for pg_rewind
        archive_mode: "on"
        archive_command: 'pgbackrest --stanza=main archive-push %p'

  initdb:
    - encoding: UTF8
    - data-checksums                    # Detects page corruption; also satisfies pg_rewind's prerequisite

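  pg_hba:
    - local   all             all                           trust
    - host    all             all         127.0.0.1/32      md5
    - host    replication     replicator  10.0.0.0/8        md5
    # hostssl matches TLS connections only, so it needs ssl: "on" (plus server certificate settings) in the PostgreSQL parameters
    - hostssl all             all         10.0.0.0/8        md5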

# -- PostgreSQL ----------------------------------------------------
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.1.0.10:5432
  data_dir: /var/lib/postgresql/17/main
  bin_dir: /usr/lib/postgresql/17/bin
  config_dir: /etc/postgresql/17/main

  authentication:
    replication:
      username: replicator
      password: "SecureRepPass123!"
    superuser:
      username: postgres
      password: "SecureSuperPass123!"
    rewind:
      username: rewind_user
      password: "SecureRewindPass123!"

  parameters:
    unix_socket_directories: '/var/run/postgresql'

# -- Tags ----------------------------------------------------------
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
  dc: seoul                  # Custom tag for HAProxy region-based routing

For Tokyo and Singapore nodes, update name, restapi.connect_address, postgresql.connect_address, tags.dc, and etcd3.cert/key (use each region's certificate).

archive_command requires pgbackrest to be installed and configured first. If WAL archiving is not needed yet, set archive_mode: "off" and enable it later.
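
The ttl, loop_wait, and retry_timeout values in bootstrap.dcs are not independent: Patroni's documentation asks for loop_wait + 2 * retry_timeout <= ttl. A small sketch of that check, useful when stretching the timings for high-latency links (the function name is made up for illustration):

```python
def dcs_timing_ok(ttl: int, loop_wait: int, retry_timeout: int) -> bool:
    """Check the rule loop_wait + 2 * retry_timeout <= ttl."""
    return loop_wait + 2 * retry_timeout <= ttl


# Values from the patroni.yml above: 10 + 2 * 10 = 30, which just fits ttl = 30.
print(dcs_timing_ok(ttl=30, loop_wait=10, retry_timeout=10))

# Raising retry_timeout alone breaks the rule; ttl has to grow with it.
print(dcs_timing_ok(ttl=30, loop_wait=10, retry_timeout=15))
```

Since ttl is also the upper bound on how long a crashed Primary keeps the leader key, raising it to satisfy the rule lengthens worst-case failover time.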

Register and Start the Patroni Service

cat > /etc/systemd/system/patroni.service <<'EOF'
[Unit]
Description=Patroni PostgreSQL HA
After=syslog.target network.target etcd.service
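# Wants= rather than Requires=: stopping the local etcd must not also stop Patroni,
# which can still reach the other two etcd members
Wants=etcd.service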

[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni/patroni.yml
Restart=on-failure
RestartSec=5s
KillMode=process
TimeoutStopSec=30
LimitNOFILE=65536
Environment="MALLOC_ARENA_MAX=1"

[Install]
WantedBy=multi-user.target
EOF

# Start Seoul first (it becomes the initial Primary)
systemctl enable --now patroni

# Once Seoul is confirmed as Primary, start Tokyo then Singapore
ssh pg-tokyo-1 "systemctl enable --now patroni"
ssh pg-singapore-1 "systemctl enable --now patroni"

6. Step 5 — HAProxy Connection Routing

Patroni's REST API is purpose-built for HAProxy health checks. HAProxy determines PostgreSQL node status through the Patroni REST API — it does not connect directly to PostgreSQL to check state.

  • /primary: returns HTTP 200 only when the node holds the Leader Lock
  • /replica: returns 200 when the node is a healthy running Replica
  • /synchronous: returns 200 only for a Synchronous Replica node

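In a synchronous replication setup, routing reads to /synchronous backends restricts them to replicas that have confirmed the WAL of every committed transaction. A read can still lag briefly behind the Primary, because that confirmation covers receiving the WAL, not replaying it; read-your-writes behavior requires synchronous_commit = remote_apply.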
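
The three endpoints above can also be probed by hand when a health check misbehaves. The sketch below is an assumption-laden illustration: probe and classify are hypothetical helpers, the REST API is assumed to serve HTTPS with a certificate whose SAN covers the node IP, and no client certificate is assumed to be required.

```python
import ssl
import urllib.error
import urllib.request

ENDPOINTS = ("/primary", "/replica", "/synchronous")


def classify(codes: dict) -> str:
    """Turn per-endpoint HTTP status codes into a role label."""
    if codes.get("/primary") == 200:
        return "primary"
    # A sync replica also answers 200 on /replica, so test /synchronous first.
    if codes.get("/synchronous") == 200:
        return "sync replica"
    if codes.get("/replica") == 200:
        return "async replica"
    return "unavailable"


def probe(base_url: str, cafile: str, timeout: float = 5.0) -> dict:
    """Request each endpoint and record the status code (None on failure)."""
    ctx = ssl.create_default_context(cafile=cafile)
    codes = {}
    for path in ENDPOINTS:
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout, context=ctx) as resp:
                codes[path] = resp.status
        except urllib.error.HTTPError as err:
            codes[path] = err.code  # e.g. 503 when the role does not match
        except (urllib.error.URLError, OSError):
            codes[path] = None  # TLS failure, timeout, or connection refused
    return codes
```

For example, classify(probe("https://10.2.0.10:8008", "/etc/patroni/ssl/ca.pem")) run from a cluster node should report "sync replica" for Tokyo in the layout used here.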

haproxy.cfg

# /etc/haproxy/haproxy.cfg

global
  maxconn 200
  log /dev/log local0

defaults
  log global
  mode tcp
  retries 2
  timeout client  30m
  timeout connect 4s
  timeout server  30m
  timeout check   5s

# -- Stats dashboard -----------------------------------------------
listen stats
  bind *:7000
  mode http
  stats enable
  stats uri /
  stats refresh 10s

# -- Write traffic -> Primary only (port 5000) ---------------------
# check-ssl encrypts only the health check against the Patroni REST API (port 8008).
# The "ssl" keyword is left out on purpose: it would wrap the forwarded PostgreSQL
# traffic on port 5432 in TLS, which PostgreSQL clients do not expect from a TCP proxy.
listen pg_primary
  bind *:5000
  option httpchk OPTIONS /primary
  http-check expect status 200
  default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
  server pg-seoul-1     10.1.0.10:5432 check port 8008 check-ssl verify required ca-file /etc/haproxy/ssl/ca.pem
  server pg-tokyo-1     10.2.0.10:5432 check port 8008 check-ssl verify required ca-file /etc/haproxy/ssl/ca.pem
  server pg-singapore-1 10.3.0.10:5432 check port 8008 check-ssl verify required ca-file /etc/haproxy/ssl/ca.pem

# -- Read traffic -> Replica round-robin (port 5001) ---------------
listen pg_replicas
  bind *:5001
  balance roundrobin
  option httpchk OPTIONS /replica
  http-check expect status 200
  default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
  server pg-seoul-1     10.1.0.10:5432 check port 8008 check-ssl verify required ca-file /etc/haproxy/ssl/ca.pem
  server pg-tokyo-1     10.2.0.10:5432 check port 8008 check-ssl verify required ca-file /etc/haproxy/ssl/ca.pem
  server pg-singapore-1 10.3.0.10:5432 check port 8008 check-ssl verify required ca-file /etc/haproxy/ssl/ca.pem

# -- Sync Replica reads only (port 5002) ---------------------------
listen pg_sync_replicas
  bind *:5002
  balance roundrobin
  option httpchk OPTIONS /synchronous
  http-check expect status 200
  default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
  server pg-seoul-1     10.1.0.10:5432 check port 8008 check-ssl verify required ca-file /etc/haproxy/ssl/ca.pem
  server pg-tokyo-1     10.2.0.10:5432 check port 8008 check-ssl verify required ca-file /etc/haproxy/ssl/ca.pem
  server pg-singapore-1 10.3.0.10:5432 check port 8008 check-ssl verify required ca-file /etc/haproxy/ssl/ca.pem

HAProxy marks any node returning an error on /primary as down and routes all writes to the single node returning 200 (the Primary). In the /replica and /synchronous backend groups, the Primary returns 503, so it is naturally excluded from read traffic.
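
How quickly a node is marked down or up follows from inter 3s fall 3 rise 2 in the default-server line. The sketch below is a simplified model of that counting logic, not HAProxy's actual implementation; simulate_checks and detection_seconds are names invented for this example.

```python
def simulate_checks(results, fall=3, rise=2, initially_up=True):
    """Return the server state (True = up) after each health-check result."""
    up = initially_up
    streak = 0  # consecutive results that contradict the current state
    states = []
    for ok in results:
        if ok != up:
            streak += 1
            if (up and streak >= fall) or (not up and streak >= rise):
                up = not up
                streak = 0
        else:
            streak = 0
        states.append(up)
    return states


def detection_seconds(inter=3, fall=3):
    """Approximate time for a failed node to be marked down."""
    return inter * fall


# Three failed checks take the server down; two good checks bring it back.
print(simulate_checks([False, False, False, True, True]))
print(detection_seconds())
```

With the values used in this guide, a dead Primary stays in rotation for roughly 9 seconds on the HAProxy side, on top of the time Patroni needs to promote a new Leader.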


7. Step 6 — Cluster Verification and Failover Testing

Check Cluster Status

# View cluster members and their roles
patronictl -c /etc/patroni/patroni.yml list

# Expected output (approximate; exact columns vary by Patroni version):
# + Cluster: pg-multiregion (7891234567890123456) ---+-----------+----+-----------+
# | Member          | Host            | Role         | State     | TL | Lag in MB |
# +-----------------+-----------------+--------------+-----------+----+-----------+
# | pg-seoul-1      | 10.1.0.10:5432  | Leader       | running   |  1 |           |
# | pg-tokyo-1      | 10.2.0.10:5432  | Sync Standby | streaming |  1 |       0.0 |
# | pg-singapore-1  | 10.3.0.10:5432  | Replica      | streaming |  1 |       0.0 |
# +-----------------+-----------------+--------------+-----------+----+-----------+

# Confirm synchronous replication state (run on Primary)
psql -U postgres -c "
  SELECT application_name, sync_state, write_lag, flush_lag, replay_lag
  FROM pg_stat_replication;
"

A node with sync_state = sync is the Synchronous Replica. With synchronous_node_count: 1, the other replica is expected to show async. If no node shows sync, verify that synchronous_mode and synchronous_node_count in the DCS config are being applied as intended.
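
The same check can be scripted against patronictl's JSON output (patronictl list -f json). The sketch below assumes the JSON rows carry "Member" and "Role" keys matching the table columns; treat that shape, the SAMPLE_OUTPUT data, and the helper names as assumptions to verify against your Patroni version.

```python
import json

# Hypothetical sample of `patronictl list -f json` output for this cluster.
SAMPLE_OUTPUT = """
[
  {"Cluster": "pg-multiregion", "Member": "pg-seoul-1", "Host": "10.1.0.10",
   "Role": "Leader", "State": "running", "TL": 1},
  {"Cluster": "pg-multiregion", "Member": "pg-tokyo-1", "Host": "10.2.0.10",
   "Role": "Sync Standby", "State": "streaming", "TL": 1, "Lag in MB": 0},
  {"Cluster": "pg-multiregion", "Member": "pg-singapore-1", "Host": "10.3.0.10",
   "Role": "Replica", "State": "streaming", "TL": 1, "Lag in MB": 0}
]
"""


def members_by_role(rows):
    """Group member names by normalized role ('leader', 'sync standby', ...)."""
    roles = {}
    for row in rows:
        role = row["Role"].lower().replace("_", " ")
        roles.setdefault(role, []).append(row["Member"])
    return roles


def sync_topology_ok(rows, expected_sync=1):
    """True when there is exactly one leader and enough sync standbys."""
    roles = members_by_role(rows)
    return (len(roles.get("leader", [])) == 1
            and len(roles.get("sync standby", [])) >= expected_sync)


print(members_by_role(json.loads(SAMPLE_OUTPUT)))
```

A monitoring job could call sync_topology_ok on the live output and alert when it returns False, which is the scripted equivalent of no row showing sync in pg_stat_replication.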

Simulate Automatic Failover

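# -- Test 1: stop Patroni on the Primary (Seoul) ------------------
# On the Seoul node. This is a graceful stop: Patroni shuts PostgreSQL down and
# releases the leader key, so a new Leader usually appears within seconds.
# To simulate a crash instead, power off the node or cut its network; promotion
# then waits for the leader key to expire (up to ttl, 30 seconds here).
systemctl stop patroni

# Monitor failover detection from another node
watch -n1 "patronictl -c /etc/patroni/patroni.yml list"

# In synchronous mode only the Synchronous Replica (Tokyo in the layout above)
# is eligible for automatic promotion.

# -- Test 2: planned switchover via patronictl --------------------
# Run this while pg-seoul-1 is the Leader (restore Seoul first if Test 1 moved the Leader).
# Recent Patroni releases use --leader (or --primary) instead of the deprecated --master.
patronictl -c /etc/patroni/patroni.yml switchover \
  --leader pg-seoul-1 \
  --candidate pg-tokyo-1 \
  --scheduled now \
  --force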

# -- Test 3: single etcd node failure (verify quorum holds) -------
# Stop Singapore's etcd node
ssh pg-singapore-1 "systemctl stop etcd"

# Confirm the etcd cluster operates normally with 2/3 quorum
etcdctl \
  --endpoints=https://10.1.0.10:2379,https://10.2.0.10:2379 \
  --cacert=/etc/etcd/ssl/ca.pem \
  --cert=/etc/etcd/ssl/etcd-seoul.pem \
  --key=/etc/etcd/ssl/etcd-seoul-key.pem \
  endpoint health
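
To put a number on a failover test, record who the Leader is at regular intervals and measure the gap. The sketch below assumes you already have such samples, for example collected once per second from patronictl list; failover_window is a hypothetical helper and the sample data is made up.

```python
def failover_window(samples):
    """Given (timestamp_seconds, leader_or_None) samples in time order,
    return (old_leader, new_leader, seconds) for the first leader change,
    or None if the leader never changed.

    The duration runs from the last sample that still showed the old
    leader to the first sample showing the new one, so it is an upper
    bound limited by the sampling interval.
    """
    if not samples:
        return None
    old_leader = samples[0][1]
    last_seen_old = samples[0][0]
    for ts, leader in samples:
        if leader == old_leader:
            last_seen_old = ts
        elif leader is not None:
            return old_leader, leader, ts - last_seen_old
    return None


# Made-up observations: Seoul disappears after t=1, Tokyo leads from t=9.
observed = [(0, "pg-seoul-1"), (1, "pg-seoul-1"), (2, None), (3, None), (9, "pg-tokyo-1")]
print(failover_window(observed))
```

Comparing this number between a graceful stop and a simulated crash shows how much of the failover time is spent waiting for the leader key to expire.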

Restore Seoul After Failover

# Restart Patroni on Seoul — it rejoins as a Replica
systemctl start patroni

# Patroni resumes streaming from the current Primary, running pg_rewind first
# if Seoul's timeline diverged (use_pg_rewind: true); a full copy is only needed on reinit
# Check progress:
patronictl -c /etc/patroni/patroni.yml list

# If automatic rejoin fails, reinitialize manually
patronictl -c /etc/patroni/patroni.yml reinit pg-multiregion pg-seoul-1

8. Common Troubleshooting

Issue 1: etcd TLS Connection Failure

CRITICAL: get_cluster
etcd3 exception: <class 'etcd3.exceptions.ConnectionFailedError'>

The most common cause is a missing IP address in the certificate SAN.

# Check the certificate's Subject Alternative Names
openssl x509 -in /etc/etcd/ssl/etcd-seoul.pem -noout -text | grep -A2 "Subject Alternative"

# Test a direct etcd client connection
etcdctl --endpoints=https://10.1.0.10:2379 \
  --cacert=/etc/etcd/ssl/ca.pem \
  --cert=/etc/etcd/ssl/etcd-seoul.pem \
  --key=/etc/etcd/ssl/etcd-seoul-key.pem \
  get / --prefix --keys-only

Issue 2: HAProxy REST API TLS Health Check Failure

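In a server line, the address must immediately follow the server name, and check options such as check-ssl come after it. If the address is misplaced, HAProxy rejects the configuration at parse time, before any health check runs; validate with haproxy -c -f /etc/haproxy/haproxy.cfg. Also use check-ssl without the ssl keyword: ssl applies TLS to the proxied PostgreSQL connection itself, which breaks client sessions.

# Wrong (options before the host address)
server pg-seoul-1 check-ssl ca-file /etc/haproxy/ssl/ca.pem 10.1.0.10:5432 check port 8008

# Correct (host address first, then the check options)
server pg-seoul-1 10.1.0.10:5432 check port 8008 check-ssl verify required ca-file /etc/haproxy/ssl/ca.pem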

Issue 3: Writes Blocked in synchronous_mode

When synchronous_mode_strict: true is set and no Synchronous Replica is available, commits on the Primary hang until one returns; they are not rejected with an error. In an emergency, temporarily disable strict mode.

# Check current dynamic config
patronictl -c /etc/patroni/patroni.yml show-config

# Disable strict mode temporarily
patronictl -c /etc/patroni/patroni.yml edit-config
# In the editor, set synchronous_mode_strict: false and save

# Non-interactive alternative
patronictl -c /etc/patroni/patroni.yml edit-config --set synchronous_mode_strict=false --force

Issue 4: Unstable etcd Leader Election Due to Inter-Region RTT

# Check etcd logs for election/timeout warnings
journalctl -u etcd | grep -i "election\|timeout\|heartbeat"

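# Tuning rule of thumb used in this guide:
# heartbeat-interval = 2–3x RTT, election-timeout = 10x heartbeat-interval
# (etcd's own tuning guide is less conservative: heartbeat around 0.5–1.5x RTT,
#  election-timeout at least 10x RTT. Use identical values on every member.)
# Example for Seoul–Singapore RTT ~70ms:
# heartbeat-interval: 200
# election-timeout: 2000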

Measure the actual RTT first (ping -c 20 <remote-ip>) before setting timeout values. The defaults (heartbeat 100ms, election timeout 1000ms) can become unstable when inter-region RTT exceeds 50ms.
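
Once the RTT is measured, the rule of thumb from this section can be turned into concrete values. This is a sketch of that rule only, with assumptions made explicit: suggest_timeouts is a hypothetical helper, the 2.5x factor is the midpoint of the 2–3x range, results are rounded up to 50 ms steps, and the heartbeat never drops below the 100 ms default.

```python
import math


def suggest_timeouts(rtt_ms, heartbeat_factor=2.5, election_factor=10, step_ms=50):
    """Return (heartbeat-interval, election-timeout) in milliseconds."""
    # Scale the RTT, then round up to the next multiple of step_ms.
    heartbeat = math.ceil(rtt_ms * heartbeat_factor / step_ms) * step_ms
    heartbeat = max(100, heartbeat)  # do not go below etcd's default
    election = heartbeat * election_factor
    return heartbeat, election


for rtt in (10, 70, 180):
    hb, el = suggest_timeouts(rtt)
    print(f"RTT {rtt} ms -> heartbeat-interval: {hb}, election-timeout: {el}")
```

For the 70 ms Seoul–Singapore example this yields 200 and 2000, matching the values above. Size the timeouts for the slowest link in the cluster, since every member should run with the same settings.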


9. References

  • Patroni Official Documentation — Multi-Datacenter HA Configuration
  • Patroni Official Documentation — REST API Health Check Endpoints
  • Percona — HAProxy with Patroni Health Check Endpoints and Debugging
  • pgEdge — Using Patroni to Build a Highly Available Postgres Cluster: HAProxy
  • CYBERTEC — Patroni etcd Clusters: Introduction and How-To
