Tuesday, April 14, 2026
Volume 1.3

Why PostgreSQL? Part 2 — Why Big Tech Chose PostgreSQL

For years, a common story said you eventually “graduate” from relational databases when you scale. Yet Instagram, Reddit, and Zalando publicly describe a different pattern—scaling PostgreSQL itself, complementing it with Aurora or Kubernetes operators, and using distributed stores where eventual consistency is acceptable. This post treats well-known engineering write-ups as source material. It separates throughput, latency, and consistency before comparing headline numbers; distinguishes self-managed PostgreSQL from Aurora-style managed layers; and ties in hidden operational cost and team skill—the full context behind why the same engine keeps appearing next to Cassandra and Kafka. It is written for readers who want defensible framing, not a single-vendor slogan.

Series outline

  • Part 1 — PostgreSQL in the numbers
  • Part 2 — Why big tech chose PostgreSQL (this post)
  • Part 3 — Startups and PostgreSQL: speed vs. cost
  • Part 4 — Escaping MongoDB/Oracle: real migration stories
  • Part 5 — The ecosystem: pgvector, PostGIS, TimescaleDB

Table of contents

  1. Introduction: “Don’t you have to drop the RDBMS when you grow?”
  2. Instagram — PostgreSQL at a billion-user scale
  3. Spotify — PostgreSQL in a microservices world
  4. Reddit — Consolidating media metadata on Aurora PostgreSQL
  5. Zalando — Building postgres-operator and open-sourcing it
  6. Coinbase — Integrity for financial-style transactions
  7. What these companies have in common
  8. Closing

1. Introduction: “Don’t you have to drop the RDBMS when you grow?”

For a long time, a familiar slogan circulated in the database world.

“Start on PostgreSQL or MySQL, then move to NoSQL or a bespoke system when you get big.”

That story lined up with the growth years of MongoDB, Cassandra, and DynamoDB. But when you read how large tech companies actually chose and operated databases, a different pattern appears.

When you read the cases below, start from a simple premise: the same headline number is not automatically the same workload or success criterion.

  • Throughput: how many operations per second you can sustain.
  • Latency: how tight your p90/p99 tails stay.
  • Consistency: whether you need ACID, linearizable reads, or eventual consistency.
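Because these axes get conflated, it helps to see how an average hides a latency tail. A minimal sketch in pure Python, using synthetic numbers and a naive nearest-rank percentile (both are illustrative assumptions, not anyone's production metrics):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample >= p percent of the data."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# Nine fast requests at 2 ms plus one slow request at 200 ms.
latencies_ms = [2] * 9 + [200]

mean = sum(latencies_ms) / len(latencies_ms)
assert mean == 21.8                         # the average looks mildly elevated
assert percentile(latencies_ms, 50) == 2    # the median hides the outlier
assert percentile(latencies_ms, 99) == 200  # the tail tells the real story
```

One slow request barely moves the mean and leaves the median untouched, yet it dominates p99—which is why comparing two systems by headline averages says little about tail behavior.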

Instagram’s “likes per second” and Reddit’s “metadata RPS” come from different workloads and measurement conditions. In what follows, we separate the PostgreSQL engine’s strengths from what a managed AWS Aurora-style layer adds on top of storage, replication, and failover.

Instagram served more than a billion users on PostgreSQL; Reddit’s 2024 public write-up describes an Aurora PostgreSQL–backed store handling large metadata traffic; Zalando open-sourced tooling that automates PostgreSQL on Kubernetes. This part unpacks whether these companies “just happened” to use PostgreSQL—or kept choosing it on purpose.

2. Instagram — PostgreSQL at a billion-user scale

From “90 likes/sec” to “10,000+ likes/sec”

Instagram adopted PostgreSQL early. A tiny engineering team needed to ship quickly, and PostgreSQL was a good fit for fast validation. The hard part came later.

According to Instagram’s engineering blog, a system that handled 90 likes per second grew in a few years to 10,000+ likes per second. Those figures are examples cited at a point in time in public posts; production numbers continued to evolve. Many teams would have swapped the database for “something more scalable” at that inflection point. Instagram did not.

Public posts evolve year by year. What follows summarizes design principles from those articles; it is not a claim that today’s production matches line-by-line. The consistent signal is that they did not abandon SQL to scale.

Core architecture: sharding and connection pooling

The approach was not “replace PostgreSQL,” but change how PostgreSQL is operated in a distributed environment.

They horizontally partitioned data by mapping many logical shards onto fewer physical PostgreSQL shards, placed PgBouncer as a pooler between app servers and Postgres to cut connection overhead, and implemented time-sortable IDs inside shards—41-bit timestamp + 13-bit shard id + 10-bit sequence (a 64-bit integer in total).
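The 41/13/10 bit layout comes from Instagram’s public write-up; the custom epoch and function names below are illustrative assumptions, not their values. A sketch of composing and decomposing such an ID:

```python
# Layout from Instagram's public post: 41-bit ms timestamp, 13-bit shard id,
# 10-bit per-shard sequence, packed into one 64-bit integer.
# The custom epoch is a hypothetical value, not Instagram's actual one.
CUSTOM_EPOCH_MS = 1_293_840_000_000  # 2011-01-01 UTC (illustrative)

def make_id(now_ms: int, shard_id: int, seq: int) -> int:
    ts = now_ms - CUSTOM_EPOCH_MS
    return (ts << 23) | ((shard_id & 0x1FFF) << 10) | (seq & 0x3FF)

def split_id(snowflake: int) -> tuple:
    ts = snowflake >> 23
    shard_id = (snowflake >> 10) & 0x1FFF
    seq = snowflake & 0x3FF
    return ts + CUSTOM_EPOCH_MS, shard_id, seq

sid = make_id(1_700_000_000_000, shard_id=42, seq=7)
assert split_id(sid) == (1_700_000_000_000, 42, 7)  # round-trips cleanly
# IDs minted later sort after earlier ones, because the timestamp
# occupies the most significant bits.
assert make_id(1_700_000_000_001, 42, 0) > sid
```

Putting the timestamp in the high bits is the design point: ordinary integer comparison (and therefore a plain B-tree index) yields time-sorted results without a separate timestamp column.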

Public material describes multi-AZ clusters with multiple replicas: writes concentrate on the primary; reads fan out to replicas. The important idea is that application logic, sharding, pooling, and replication were engineered as one system—not “install Postgres and it magically scales.”
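The logical-to-physical split can be sketched in a few lines. Server names and shard counts here are invented for illustration; the idea is that the logical shard of a row is fixed forever, while the assignment of logical shards to physical servers can change:

```python
N_LOGICAL = 4096  # fixed forever; a row never changes its logical shard
PHYSICAL_SERVERS = ["pg-a", "pg-b", "pg-c", "pg-d"]  # fleet can grow later

def logical_shard(user_id: int) -> int:
    return user_id % N_LOGICAL

def physical_server(user_id: int) -> str:
    # Static assignment: contiguous blocks of logical shards per server.
    # Growing the fleet means moving whole logical shards between servers,
    # not rehashing every row in the system.
    block = N_LOGICAL // len(PHYSICAL_SERVERS)
    return PHYSICAL_SERVERS[logical_shard(user_id) // block]

assert logical_shard(5000) == 904
assert physical_server(0) == "pg-a"
assert physical_server(4095) == "pg-d"  # last logical shard, last server
```

This indirection is what lets a small team add hardware without a full resharding project: rebalancing is data movement plus a mapping update, not an application rewrite.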

Why not move everything to NoSQL?

Instagram did adopt Cassandra—for feeds, activity logs, and analytics where eventual consistency can be acceptable. In public blog explanations, structured data that needs correctness—profiles, comments, follow relationships, photo metadata—lives in the PostgreSQL tier (exact splits shift with product and era).

The lesson Instagram telegraphs is closer to “poorly modeled or poorly operated SQL hits scaling limits,” not “SQL cannot scale.”

3. Spotify — PostgreSQL in a microservices world

Positioning PostgreSQL among hundreds of services

Spotify runs an architecture with hundreds of microservices. A core principle is each service owns its database.

That makes database choice partly about team autonomy. Spotify separates databases by role: Cassandra and BigTable for large-scale distributed reads, and relational databases—including PostgreSQL in some services—for domains such as billing, subscriptions, and accounts that need integrity. Public talks repeat that pattern; service-to-stack mapping changes over time.

Why put relational databases on ACID-heavy paths?

For money-adjacent state—wallets, subscriptions, entitlements—a “half-applied” update is catastrophic. Eventual consistency in many NoSQL designs trades away exactly the guarantee you want here. ACID transactions are why relational engines keep showing up.

Spotify’s public material describes keeping relational stores on high-integrity paths while using Cassandra and friends for personalization, playlist metadata, and streaming events. Do not compress this to “payments always equal PostgreSQL from one canonical URL”—treat it as a pattern.

A common implementation is a single transaction that moves balances together:

BEGIN;
  -- Debit and credit must land together: if either UPDATE fails,
  -- the whole transaction rolls back and neither balance changes.
  UPDATE wallets SET balance = balance - 100 WHERE user_id = 'u1';
  UPDATE wallets SET balance = balance + 100 WHERE user_id = 'u2';
COMMIT;

Real systems add isolation levels, retries, idempotency keys, balance checks, and row-count guards—but the requirement that either everything commits or nothing does is where engines like PostgreSQL shine.
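One of those guards, the row-count check, fits in a short self-contained toy. This sketch uses Python’s sqlite3 purely so it runs without a server; against PostgreSQL a driver such as psycopg follows the same commit-or-rollback shape (table and user names are invented):

```python
import sqlite3

# In-memory database standing in for PostgreSQL in this demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wallets (user_id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO wallets VALUES (?, ?)", [("u1", 150), ("u2", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # transaction: commits on success, rolls back on exception
            cur = conn.execute(
                "UPDATE wallets SET balance = balance - ? "
                "WHERE user_id = ? AND balance >= ?", (amount, src, amount))
            if cur.rowcount != 1:       # row-count guard: insufficient funds
                raise ValueError("debit failed")
            conn.execute("UPDATE wallets SET balance = balance + ? "
                         "WHERE user_id = ?", (amount, dst))
        return True
    except ValueError:
        return False

assert transfer(conn, "u1", "u2", 100) is True
assert transfer(conn, "u1", "u2", 100) is False  # only 50 left: nothing applied
balances = dict(conn.execute("SELECT user_id, balance FROM wallets"))
assert balances == {"u1": 50, "u2": 100}
```

The second call is the point: the debit condition fails, the exception triggers a rollback, and the credit never happens—either both balances change or neither does.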

4. Reddit — Consolidating media metadata on Aurora PostgreSQL

100k+ RPS on Aurora PostgreSQL (2024 public case)

Reddit hosts billions of posts and diverse media. Upload-heavy usage keeps pushing metadata complexity.

A 2024 engineering post describes consolidating scattered media metadata—from multiple systems, including paths where metadata had to be fetched via object stores such as S3—into a unified store on AWS Aurora PostgreSQL.

Public numbers cite 100k+ requests per second with p90 latency under ~5 ms. That matters as evidence for that workload (metadata reads/writes), not as a universal benchmark of “vanilla single-instance Postgres.”

The useful distinction is between PostgreSQL’s SQL and extension model and Aurora’s managed storage, replication, and recovery—a partial hand-off of the toil of self-hosted Postgres.

Migration strategy: dual writes and phased cutover

Rather than a risky big bang, Reddit described a phased path: dual writes to old and new systems, dual reads with comparison, then gradual read traffic shift with monitoring. Public articles and summaries describe CDC / change-event streaming pipelines to catch drift between sources and the new store (some sources explicitly mention Apache Kafka consumers).
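The dual-write and shadow-read steps can be sketched as a toy, with in-memory dicts standing in for the legacy store and the new Aurora-backed store (all names here are invented for illustration, not Reddit’s code):

```python
old_store, new_store = {}, {}  # stand-ins for legacy and new metadata stores
mismatches = []                # drift log a verification/CDC job would consume

def write(key, value):
    old_store[key] = value     # legacy store stays the source of truth
    new_store[key] = value     # dual write: copy into the new store too

def read(key):
    primary = old_store.get(key)
    shadow = new_store.get(key)  # shadow read: compare, never serve (yet)
    if shadow != primary:
        mismatches.append((key, primary, shadow))
    return primary             # callers still see only the legacy answer

write("post:1", {"mime": "image/png"})
old_store["post:2"] = {"mime": "image/gif"}   # simulate drift: a write that
new_store["post:2"] = {"mime": "video/mp4"}   # diverged between the stores

assert read("post:1") == {"mime": "image/png"} and not mismatches
assert read("post:2") == {"mime": "image/gif"}
assert mismatches == [("post:2", {"mime": "image/gif"}, {"mime": "video/mp4"})]
```

Only once the mismatch log stays empty for long enough does read traffic shift to the new store—which is exactly why this pattern consumes team time and monitoring budget rather than being free.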

That story is as much how to migrate safely to a Postgres-compatible store as why Postgres. Dual writes and verification consume team time and operational budget.

Reddit media metadata migration — phased rollout (concept)

5. Zalando — Building postgres-operator and open-sourcing it

How a major retailer bet on PostgreSQL

Zalando’s engineering team released postgres-operator, a Kubernetes controller that provisions and manages PostgreSQL clusters. Public conference talks and blogs cite hundreds of clusters in production on that operator—numbers vary by era and definition.

What the operator solves

Running Postgres on Kubernetes is hard: stateful workloads differ from stateless ones. The operator abstracts that complexity.

Documented features include Patroni-based failover, declarative clusters via CRDs, storage resize, restore/clone in cloud environments, and more. Extensions such as pgvector, PostGIS, or TimescaleDB depend on image, version, and cluster settings—check the official repository when you need specifics.
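To make “declarative clusters via CRDs” concrete, here is a minimal manifest in the shape the operator’s examples use. Field values (team, sizes, names) are illustrative; check the official repository for the exact schema of your operator version:

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "acid"
  numberOfInstances: 2       # one primary plus one Patroni-managed replica
  volume:
    size: 10Gi
  users:
    app_user: []             # operator creates the role and its secret
  databases:
    app_db: app_user         # database owned by app_user
  postgresql:
    version: "16"
```

Applying this resource is the whole interface: the operator watches for `postgresql` objects and converges pods, volumes, roles, and failover configuration toward the declared state.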

It is not just an internal tool: Google Cloud’s GKE documentation cites Zalando’s operator as an example path for PostgreSQL.

“Build what you run, then share it”

Open-sourcing the operator is not pure charity—it strengthens the ecosystem Zalando also depends on. That mirrors Instagram blogging about Postgres operations: these teams treat Postgres as shared infrastructure, not a disposable vendor SKU.

Adoption still requires an operational model—Kubernetes, Patroni, backups, upgrades—not just installing a chart.

6. Coinbase — Integrity for financial-style transactions

What ACID means on a crypto exchange

Coinbase is a cryptocurrency exchange, and a top database selection criterion there is transactional correctness.

A trade that debits one BTC wallet and credits another must be atomic. A half-successful pair is not a minor bug—it is a financial incident.

PostgreSQL’s ACID guarantees—especially atomicity and isolation—are among the proven ways to meet that bar. Public engineering posts describe core account and trading data on PostgreSQL-class RDBMS as part of the stack; it is not safe to assume the entire company runs only one engine everywhere.

Scale without giving up correctness

Even at very large trading volumes, staying on PostgreSQL can be read as “correctness does not have to be traded away” when paired with the right hardware, topology, and operations.

Big-tech pattern — PostgreSQL accuracy layer vs auxiliary stores (concept)

7. What these companies have in common

Five recurring patterns:

First, PostgreSQL sits where correctness is expensive. Profiles, payments, financial transactions, some metadata—the cost of being wrong is high. In such domains NoSQL as a primary store is a poor fit, while other domains happily center on NoSQL; this article is scoped to the cases surveyed here.

Second, they did not delete PostgreSQL—they changed how it was operated. Instagram used sharding and pooling; Reddit used Aurora; Zalando used Kubernetes operators. The bottleneck is often how you partition responsibility, not “SQL itself.”

Third, they gave experience back to the ecosystem. Engineering blogs, postgres-operator, migration write-ups—these raise the waterline for everyone.

Fourth, in multi-database strategies PostgreSQL remained a major pillar. Spotify pairs Cassandra and BigTable with relational stores on integrity-heavy paths (including PostgreSQL where applicable). Instagram added Cassandra but kept structured cores on Postgres.

Fifth, “why Postgres?” comes with “who can operate it?” Sharding, dual writes, and operator upgrades are hidden operational costs; the same stack is not equally easy for every team.

8. Closing

Across these companies, one sentence captures the pattern:

In the big-tech cases we surveyed, NoSQL and distributed databases are less a wholesale replacement for PostgreSQL than a complement for workloads PostgreSQL is not trying to own.

Surviving at scale is not luck—it is decades of ACID discipline, community momentum, extensible architecture, and engineers who operationalize all of it.

Part 3 shifts the lens to startups: why PostgreSQL wins on speed and cost under constraints, and how early architecture choices shape growth.


Next — Part 3: Startups — speed vs. cost

From Supabase-backed teams to pgvector cost hacks and “we deleted the warehouse and kept Postgres” stories—why tighter budgets can make PostgreSQL an even stronger default.


Written April 2026 · Figures and topologies reflect public posts at the time; production systems evolve. Verify primary sources when citing.
