Tuesday, April 14, 2026
Volume 1.3

Why PostgreSQL? Part 2 — Why Big Tech Chose PostgreSQL

For years, a common story said you eventually “graduate” from relational databases when you scale. Yet Instagram, Reddit, and Zalando publicly describe a different pattern—scaling PostgreSQL itself, complementing it with Aurora or Kubernetes operators, and using distributed stores where eventual consistency is acceptable. This post treats well-known engineering write-ups as source material. It separates throughput, latency, and consistency before comparing headline numbers; distinguishes self-managed PostgreSQL from Aurora-style managed layers; and ties in hidden operational cost and team skill—the full context behind why the same engine keeps appearing next to Cassandra and Kafka. It is written for readers who want defensible framing, not a single-vendor slogan.

Series outline

  • Part 1 — PostgreSQL in the numbers
  • Part 2 — Why big tech chose PostgreSQL (this post)
  • Part 3 — Startups and PostgreSQL: speed vs. cost
  • Part 4 — Escaping MongoDB/Oracle: real migration stories
  • Part 5 — The ecosystem: pgvector, PostGIS, TimescaleDB

Table of contents

  1. Introduction: “Don’t you have to drop the RDBMS when you grow?”
  2. Instagram — PostgreSQL at a billion-user scale
  3. Spotify — PostgreSQL in a microservices world
  4. Reddit — Consolidating media metadata on Aurora PostgreSQL
  5. Zalando — Building postgres-operator and open-sourcing it
  6. Coinbase — Integrity for financial-style transactions
  7. What these companies have in common
  8. Closing

1. Introduction: “Don’t you have to drop the RDBMS when you grow?”

For a long time, a familiar slogan circulated in the database world.

“Start on PostgreSQL or MySQL, then move to NoSQL or a bespoke system when you get big.”

That story lined up with the growth years of MongoDB, Cassandra, and DynamoDB. But when you read how large tech companies actually chose and operated databases, a different pattern appears.

When you read the cases below, start from a simple premise: the same headline number is not automatically the same workload or success criterion.

  • Throughput: how many operations per second you can sustain.
  • Latency: how tight your p90/p99 tails stay.
  • Consistency: whether you need ACID, linearizable reads, or eventual consistency.
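Because these axes get conflated, it helps to see how an average hides a latency tail. A minimal sketch in pure Python, using synthetic numbers and a naive nearest-rank percentile (both are illustrative assumptions, not anyone's production metrics):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample >= p percent of the data."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# Nine fast requests at 2 ms plus one slow request at 200 ms.
latencies_ms = [2] * 9 + [200]

mean = sum(latencies_ms) / len(latencies_ms)
assert mean == 21.8                         # the average looks mildly elevated
assert percentile(latencies_ms, 50) == 2    # the median hides the outlier
assert percentile(latencies_ms, 99) == 200  # the tail tells the real story
```

One slow request barely moves the mean and leaves the median untouched, yet it dominates p99—which is why comparing two systems by headline averages says little about tail behavior.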

Instagram’s “likes per second” and Reddit’s “metadata RPS” come from different workloads and measurement conditions. In what follows, we separate the PostgreSQL engine’s strengths from what a managed AWS Aurora-style layer adds on top of storage, replication, and failover.

Instagram served more than a billion users on PostgreSQL; Reddit’s 2024 public write-up describes an Aurora PostgreSQL–backed store handling large metadata traffic; Zalando open-sourced tooling that automates PostgreSQL on Kubernetes. This part unpacks whether these companies “just happened” to use PostgreSQL—or kept choosing it on purpose.

2. Instagram — PostgreSQL at a billion-user scale

From “90 likes/sec” to “10,000+ likes/sec”

Instagram adopted PostgreSQL early. A tiny engineering team needed to ship quickly, and PostgreSQL was a good fit for fast validation. The hard part came later.

According to Instagram’s engineering blog, a system that handled 90 likes per second grew in a few years to 10,000+ likes per second. Those figures are examples cited at a point in time in public posts; production numbers continued to evolve. Many teams would have swapped the database for “something more scalable” at that inflection point. Instagram did not.

Public posts evolve year by year. What follows summarizes design principles from those articles; it is not a claim that today’s production matches line-by-line. The consistent signal is that they did not abandon SQL to scale.

Core architecture: sharding and connection pooling

The approach was not “replace PostgreSQL,” but change how PostgreSQL is operated in a distributed environment.

They horizontally partitioned data by mapping many logical shards onto fewer physical PostgreSQL shards, placed PgBouncer as a pooler between app servers and Postgres to cut connection overhead, and implemented time-sortable IDs inside shards—41-bit timestamp + 13-bit shard id + 10-bit sequence (a 64-bit integer in total).
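The 41/13/10 bit layout comes from Instagram’s public write-up; the custom epoch and function names below are illustrative assumptions, not their values. A sketch of composing and decomposing such an ID:

```python
# Layout from Instagram's public post: 41-bit ms timestamp, 13-bit shard id,
# 10-bit per-shard sequence, packed into one 64-bit integer.
# The custom epoch is a hypothetical value, not Instagram's actual one.
CUSTOM_EPOCH_MS = 1_293_840_000_000  # 2011-01-01 UTC (illustrative)

def make_id(now_ms: int, shard_id: int, seq: int) -> int:
    ts = now_ms - CUSTOM_EPOCH_MS
    return (ts << 23) | ((shard_id & 0x1FFF) << 10) | (seq & 0x3FF)

def split_id(snowflake: int) -> tuple:
    ts = snowflake >> 23
    shard_id = (snowflake >> 10) & 0x1FFF
    seq = snowflake & 0x3FF
    return ts + CUSTOM_EPOCH_MS, shard_id, seq

sid = make_id(1_700_000_000_000, shard_id=42, seq=7)
assert split_id(sid) == (1_700_000_000_000, 42, 7)  # round-trips cleanly
# IDs minted later sort after earlier ones, because the timestamp
# occupies the most significant bits.
assert make_id(1_700_000_000_001, 42, 0) > sid
```

Putting the timestamp in the high bits is the design point: ordinary integer comparison (and therefore a plain B-tree index) yields time-sorted results without a separate timestamp column.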

Public material describes multi-AZ clusters with multiple replicas: writes concentrate on the primary; reads fan out to replicas. The important idea is that application logic, sharding, pooling, and replication were engineered as one system—not “install Postgres and it magically scales.”
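The logical-to-physical split can be sketched in a few lines. Server names and shard counts here are invented for illustration; the idea is that the logical shard of a row is fixed forever, while the assignment of logical shards to physical servers can change:

```python
N_LOGICAL = 4096  # fixed forever; a row never changes its logical shard
PHYSICAL_SERVERS = ["pg-a", "pg-b", "pg-c", "pg-d"]  # fleet can grow later

def logical_shard(user_id: int) -> int:
    return user_id % N_LOGICAL

def physical_server(user_id: int) -> str:
    # Static assignment: contiguous blocks of logical shards per server.
    # Growing the fleet means moving whole logical shards between servers,
    # not rehashing every row in the system.
    block = N_LOGICAL // len(PHYSICAL_SERVERS)
    return PHYSICAL_SERVERS[logical_shard(user_id) // block]

assert logical_shard(5000) == 904
assert physical_server(0) == "pg-a"
assert physical_server(4095) == "pg-d"  # last logical shard, last server
```

This indirection is what lets a small team add hardware without a full resharding project: rebalancing is data movement plus a mapping update, not an application rewrite.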

Why not move everything to NoSQL?

Instagram did adopt Cassandra—for feeds, activity logs, and analytics where eventual consistency can be acceptable. In public blog explanations, structured data that needs correctness—profiles, comments, follow relationships, photo metadata—lives in the PostgreSQL tier (exact splits shift with product and era).

The lesson Instagram telegraphs is closer to “poorly modeled or poorly operated SQL hits scaling limits,” not “SQL cannot scale.”

3. Spotify — PostgreSQL in a microservices world

Positioning PostgreSQL among hundreds of services

Spotify runs an architecture with hundreds of microservices. A core principle is each service owns its database.

That makes database choice partly about team autonomy. Spotify separates databases by role: Cassandra and BigTable for large-scale distributed reads, and relational databases—including PostgreSQL in some services—for domains such as billing, subscriptions, and accounts that need integrity. Public talks repeat that pattern; service-to-stack mapping changes over time.

Why put relational databases on ACID-heavy paths?

For money-adjacent state—wallets, subscriptions, entitlements—a “half-applied” update is catastrophic. Eventual consistency in many NoSQL designs trades away exactly the guarantee you want here. ACID transactions are why relational engines keep showing up.

Spotify’s public material describes keeping relational stores on high-integrity paths while using Cassandra and friends for personalization, playlist metadata, and streaming events. Do not compress this to “payments always equal PostgreSQL from one canonical URL”—treat it as a pattern.

A common implementation is a single transaction that moves balances together:

BEGIN;
  -- Debit and credit must land together: if either UPDATE fails,
  -- the whole transaction rolls back and neither balance changes.
  UPDATE wallets SET balance = balance - 100 WHERE user_id = 'u1';
  UPDATE wallets SET balance = balance + 100 WHERE user_id = 'u2';
COMMIT;

Real systems add isolation levels, retries, idempotency keys, balance checks, and row-count guards—but the requirement that either everything commits or nothing does is where engines like PostgreSQL shine.
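One of those guards, the row-count check, fits in a short self-contained toy. This sketch uses Python’s sqlite3 purely so it runs without a server; against PostgreSQL a driver such as psycopg follows the same commit-or-rollback shape (table and user names are invented):

```python
import sqlite3

# In-memory database standing in for PostgreSQL in this demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wallets (user_id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO wallets VALUES (?, ?)", [("u1", 150), ("u2", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # transaction: commits on success, rolls back on exception
            cur = conn.execute(
                "UPDATE wallets SET balance = balance - ? "
                "WHERE user_id = ? AND balance >= ?", (amount, src, amount))
            if cur.rowcount != 1:       # row-count guard: insufficient funds
                raise ValueError("debit failed")
            conn.execute("UPDATE wallets SET balance = balance + ? "
                         "WHERE user_id = ?", (amount, dst))
        return True
    except ValueError:
        return False

assert transfer(conn, "u1", "u2", 100) is True
assert transfer(conn, "u1", "u2", 100) is False  # only 50 left: nothing applied
balances = dict(conn.execute("SELECT user_id, balance FROM wallets"))
assert balances == {"u1": 50, "u2": 100}
```

The second call is the point: the debit condition fails, the exception triggers a rollback, and the credit never happens—either both balances change or neither does.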

4. Reddit — Consolidating media metadata on Aurora PostgreSQL

100k+ RPS on Aurora PostgreSQL (2024 public case)

Reddit hosts billions of posts and diverse media. Upload-heavy usage keeps pushing metadata complexity.

A 2024 engineering post describes consolidating scattered media metadata—from multiple systems, including paths where metadata had to be fetched via object stores such as S3—into a unified store on AWS Aurora PostgreSQL.

Public numbers cite 100k+ requests per second with p90 latency under ~5 ms. That matters as evidence for that workload (metadata reads/writes), not as a universal benchmark of “vanilla single-instance Postgres.”

The useful distinction is between PostgreSQL’s SQL and extension model and Aurora’s managed storage, replication, and recovery—a partial hand-off of the toil of self-hosted Postgres.

Migration strategy: dual writes and phased cutover

Rather than a risky big bang, Reddit described a phased path: dual writes to old and new systems, dual reads with comparison, then gradual read traffic shift with monitoring. Public articles and summaries describe CDC / change-event streaming pipelines to catch drift between sources and the new store (some sources explicitly mention Apache Kafka consumers).
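The dual-write and shadow-read steps can be sketched as a toy, with in-memory dicts standing in for the legacy store and the new Aurora-backed store (all names here are invented for illustration, not Reddit’s code):

```python
old_store, new_store = {}, {}  # stand-ins for legacy and new metadata stores
mismatches = []                # drift log a verification/CDC job would consume

def write(key, value):
    old_store[key] = value     # legacy store stays the source of truth
    new_store[key] = value     # dual write: copy into the new store too

def read(key):
    primary = old_store.get(key)
    shadow = new_store.get(key)  # shadow read: compare, never serve (yet)
    if shadow != primary:
        mismatches.append((key, primary, shadow))
    return primary             # callers still see only the legacy answer

write("post:1", {"mime": "image/png"})
old_store["post:2"] = {"mime": "image/gif"}   # simulate drift: a write that
new_store["post:2"] = {"mime": "video/mp4"}   # diverged between the stores

assert read("post:1") == {"mime": "image/png"} and not mismatches
assert read("post:2") == {"mime": "image/gif"}
assert mismatches == [("post:2", {"mime": "image/gif"}, {"mime": "video/mp4"})]
```

Only once the mismatch log stays empty for long enough does read traffic shift to the new store—which is exactly why this pattern consumes team time and monitoring budget rather than being free.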

That story is as much how to migrate safely to a Postgres-compatible store as why Postgres. Dual writes and verification consume team time and operational budget.

Reddit media metadata migration — phased rollout (concept)

5. Zalando — Building postgres-operator and open-sourcing it

How a major retailer bet on PostgreSQL

Zalando’s engineering team released postgres-operator, a Kubernetes controller that provisions and manages PostgreSQL clusters. Public conference talks and blogs cite hundreds of clusters in production on that operator—numbers vary by era and definition.

What the operator solves

Running Postgres on Kubernetes is hard: stateful workloads differ from stateless ones. The operator abstracts that complexity.

Documented features include Patroni-based failover, declarative clusters via CRDs, storage resize, restore/clone in cloud environments, and more. Extensions such as pgvector, PostGIS, or TimescaleDB depend on image, version, and cluster settings—check the official repository when you need specifics.
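To make “declarative clusters via CRDs” concrete, here is a minimal manifest in the shape the operator’s examples use. Field values (team, sizes, names) are illustrative; check the official repository for the exact schema of your operator version:

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "acid"
  numberOfInstances: 2       # one primary plus one Patroni-managed replica
  volume:
    size: 10Gi
  users:
    app_user: []             # operator creates the role and its secret
  databases:
    app_db: app_user         # database owned by app_user
  postgresql:
    version: "16"
```

Applying this resource is the whole interface: the operator watches for `postgresql` objects and converges pods, volumes, roles, and failover configuration toward the declared state.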

It is not just an internal tool: Google Cloud’s GKE documentation cites Zalando’s operator as an example path for PostgreSQL.

“Build what you run, then share it”

Open-sourcing the operator is not pure charity—it strengthens the ecosystem Zalando also depends on. That mirrors Instagram blogging about Postgres operations: these teams treat Postgres as shared infrastructure, not a disposable vendor SKU.

Adoption still requires an operational model—Kubernetes, Patroni, backups, upgrades—not just installing a chart.

6. Coinbase — Integrity for financial-style transactions

What ACID means on a crypto exchange

Coinbase is a cryptocurrency exchange, and a top database selection criterion there is transactional correctness.

A trade that debits one BTC wallet and credits another must be atomic. A half-successful pair is not a minor bug—it is a financial incident.

PostgreSQL’s ACID guarantees—especially atomicity and isolation—are among the proven ways to meet that bar. Public engineering posts describe core account and trading data on PostgreSQL-class RDBMS as part of the stack; it is not safe to assume the entire company runs only one engine everywhere.

Scale without giving up correctness

Even at very large trading volumes, staying on PostgreSQL can be read as “correctness does not have to be traded away” when paired with the right hardware, topology, and operations.

Big-tech pattern — PostgreSQL accuracy layer vs auxiliary stores (concept)

7. What these companies have in common

Five recurring patterns:

First, PostgreSQL sits where correctness is expensive. Profiles, payments, financial transactions, some metadata—the cost of being wrong is high. In such domains NoSQL as a primary store is a poor fit, while other domains happily center on NoSQL; this article is scoped to the cases surveyed here.

Second, they did not delete PostgreSQL—they changed how it was operated. Instagram used sharding and pooling; Reddit used Aurora; Zalando used Kubernetes operators. The bottleneck is often how you partition responsibility, not “SQL itself.”

Third, they gave experience back to the ecosystem. Engineering blogs, postgres-operator, migration write-ups—these raise the waterline for everyone.

Fourth, in multi-database strategies PostgreSQL remained a major pillar. Spotify pairs Cassandra and BigTable with relational stores on integrity-heavy paths (including PostgreSQL where applicable). Instagram added Cassandra but kept structured cores on Postgres.

Fifth, “why Postgres?” comes with “who can operate it?” Sharding, dual writes, and operator upgrades are hidden operational costs; the same stack is not equally easy for every team.

8. Closing

Across these companies, one sentence captures the pattern:

In the big-tech cases we surveyed, NoSQL and distributed databases are less a wholesale replacement for PostgreSQL than a complement for workloads PostgreSQL is not trying to own.

Surviving at scale is not luck—it is decades of ACID discipline, community momentum, extensible architecture, and engineers who operationalize all of it.

Part 3 shifts the lens to startups: why PostgreSQL wins on speed and cost under constraints, and how early architecture choices shape growth.


Next — Part 3: Startups — speed vs. cost

From Supabase-backed teams to pgvector cost hacks and “we deleted the warehouse and kept Postgres” stories—why tighter budgets can make PostgreSQL an even stronger default.


Written April 2026 · Figures and topologies reflect public posts at the time; production systems evolve. Verify primary sources when citing.
