Tuesday, April 14, 2026
Volume 1.3

Why PostgreSQL? Part 5 — The ecosystem: pgvector, PostGIS, TimescaleDB

Add the right PostgreSQL extensions and you can credibly run vector search, geospatial queries, time-series analytics, and BM25-style full-text search on the same engine. This series finale walks through pgvector, PostGIS, TimescaleDB, and ParadeDB (pg_search): what public benchmarks and vendor write-ups claim, where "replace the specialist" is conditionally true, and how to read latency and cost numbers when managed services, self-hosting, and tuning assumptions differ. It closes the five-part arc on why PostgreSQL is often the lowest-regret default (reliability, extensibility, ecosystem depth) and when a separate system still earns its place. Use the decision inputs at the end alongside the comparison table: growth rate, staffing, failure tolerance, compliance, and how you define TCO.

Table of contents

  1. Introduction: the “one database for everything” idea
  2. pgvector — how PostgreSQL shook the vector database conversation
  3. PostGIS — why a 25‑year‑old extension still defines geospatial Postgres
  4. TimescaleDB — why Cloudflare picked it over ClickHouse for some analytics work
  5. ParadeDB — full‑text search inside PostgreSQL without Elasticsearch
  6. The extension landscape: frequently cited extensions in 2024–2025 surveys
  7. PostgreSQL maximalism: light and shadow
  8. Closing the series: a final answer to “why PostgreSQL?”

1. Introduction: the “one database for everything” idea

People new to PostgreSQL’s extension ecosystem sometimes say:

“If Postgres can do all of this, why do other databases exist?”

It is not a silly question—and it is not universally true, either.

Add pgvector and you may not need a separate vector store. Add PostGIS and you can handle many spatial queries without a dedicated GIS server. Add TimescaleDB and dedicated time-series products look less mandatory. Add ParadeDB’s pg_search and running Elasticsearch for every app can feel heavy. At the same time, workload shape, scale, and operations still make specialized systems the better tool in plenty of cases.

This article walks through four pillars using public materials from around 2025, and it keeps its conclusions explicitly conditional: not "Postgres replaces everything," but where extensions are usually enough, and where they are not.

How to read benchmarks and case studies

The latency, throughput, and cost numbers below are useful signals, but they often come from vendor blogs, benchmark posts, and marketing-adjacent write-ups. When managed vs self-hosted, what’s included in “cost,” and hardware/tuning assumptions differ, the same headline number can mean different things. Read directionally and verify the original test conditions.

Large-name stories (Cloudflare, Redfin, IGN, …) also carry different stacks, teams, and SLAs than yours. Treat them as existence proofs, not a sizing shortcut—pair them with the decision inputs later in the article.


2. pgvector — how PostgreSQL shook the vector database conversation

Part 3 covered the product and cost angles; this section focuses on commonly cited public benchmarks.

Comparisons you will see in 2024–2025 write-ups

Write-ups comparing the pgvector + pgvectorscale stack (Timescale's extension) against Pinecone, Qdrant, and others often use roughly 50 million 768-dimensional Cohere embeddings as the headline dataset.

| Metric | pgvector + pgvectorscale | Pinecone (p2) | Qdrant |
| --- | --- | --- | --- |
| p95 latency | Some posts report ~1.4× lower than Pinecone | baseline | reported in a similar band |
| Throughput (QPS) | 471 QPS @ 99% recall (example figures) | similar bands in some tests | 41 QPS @ 99% recall (example figures from public materials) |
| Cost (often EC2 self-host framing) | ~79% lower than Pinecone in some vendor comparisons | baseline | — |

In Qdrant comparisons, both systems are sometimes reported near sub‑100ms query latency at ~50M vectors—but Qdrant is also described as having tighter tail-latency behavior in some analyses, which matters when latency consistency is the SLA. Treat those comparisons as public benchmark and community discussions; read them next to Qdrant’s official benchmarks and anything you can reproduce in your own environment.

Around May 2025, AWS published materials on pgvector 0.8.0 on Aurora PostgreSQL, including up to ~5.7× speedups for certain query patterns versus earlier versions, and improvements around filtered vector search via features like iterative_scan.

How to read the table: it summarizes public posts. Pinecone is typically managed; pgvector stacks are often self-hosted (or RDS/Aurora). If you ignore operations staffing, uptime, and networking, “cheaper” can mislead. Rankings also move with tuning, indexes, and workload—use the numbers as directional, not universal.
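The iterative_scan behavior mentioned above can be sketched as follows. This is illustrative, not a tuning recommendation: it assumes pgvector 0.8.0+, and the table and filter values are hypothetical.

```sql
-- Assumes pgvector >= 0.8.0; table and values are illustrative.
-- With a selective relational filter, a plain HNSW scan can return
-- fewer than LIMIT rows; iterative scanning keeps searching the
-- graph until enough rows pass the filter.
SET hnsw.iterative_scan = relaxed_order;   -- or strict_order / off
SET hnsw.max_scan_tuples = 20000;          -- cap the extra index work

SELECT item_id
FROM items
WHERE tenant_id = 42             -- selective relational predicate
ORDER BY embedding <-> $1        -- $1: query embedding parameter
LIMIT 10;
```

`relaxed_order` trades strict distance ordering for better recall under filters; check the pgvector docs for the semantics in your version.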

Practical pgvector patterns

A major strength is combining vector search with relational predicates in one SQL statement:

-- Semantic similarity + relational filters in one query
-- (<-> is distance; smaller means closer)
SELECT
  doc_id,
  title,
  created_at,
  embedding <-> $1 AS distance_score
FROM documents
WHERE
  user_id = $2
  AND language = 'ko'
  AND created_at > NOW() - INTERVAL '30 days'
ORDER BY distance_score ASC
LIMIT 10;

Many vector-only stores force you to fan out to multiple systems and stitch results in application code. PostgreSQL can keep the path unified with the planner and indexes you already operate.
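For that query to stay fast, an approximate-nearest-neighbor index has to back the `<->` ordering. A minimal sketch, assuming pgvector 0.5+ (HNSW support) and the hypothetical documents schema implied by the query above:

```sql
-- Hypothetical schema; pgvector >= 0.5 for HNSW support
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  doc_id     bigserial PRIMARY KEY,
  user_id    bigint NOT NULL,
  language   text NOT NULL,
  title      text,
  created_at timestamptz NOT NULL DEFAULT now(),
  embedding  vector(768)  -- matches the 768-dimensional embeddings above
);

-- HNSW index for approximate nearest-neighbor search (L2 distance,
-- matching the <-> operator in the query)
CREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);

-- B-tree index so the relational predicates stay cheap too
CREATE INDEX ON documents (user_id, created_at);
```

If your embeddings are normalized and you rank by cosine similarity instead, the operator class would be `vector_cosine_ops` and the query operator `<=>`.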

Honest limits

For millions—often tens of millions—of vectors, RAG pipelines, and multi-tenant SaaS, pgvector is frequently a strong default. But at hundreds of millions+ scale, GPU acceleration, or single-digit millisecond SLAs baked into contracts, dedicated engines (Pinecone, Milvus, …) can still win.

“Start on Postgres; move when you truly hit a wall” remains common advice—and many teams hit product and organizational bottlenecks before they hit pgvector’s ceiling.


3. PostGIS — why a 25‑year‑old extension still defines geospatial Postgres

PostGIS first appeared in 2001. In 2020s surveys of extension usage, it still shows up near the top.

The problem PostGIS solves

If you store latitude/longitude as plain numbers, “find stores within 2km” collapses into painful scans. PostGIS mitigates that with spatial indexes (GiST, R‑tree family, …).

You get geometry types, spatial functions like ST_Distance, ST_Intersects, ST_Buffer, and coordinate handling close to OGC expectations—inside the database you already run.
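As a sketch of the "stores within 2 km" query from above, assuming a geography column (on geography, ST_DWithin takes meters and can use a GiST index); the schema and coordinates are illustrative:

```sql
-- Illustrative schema; assumes the postgis extension is installed
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE stores (
  store_id bigserial PRIMARY KEY,
  name     text NOT NULL,
  location geography(Point, 4326)  -- WGS84 lon/lat
);

-- Spatial index so the radius query avoids a full scan
CREATE INDEX ON stores USING gist (location);

-- "Stores within 2 km of a point"; ST_DWithin on geography is in meters
SELECT store_id, name
FROM stores
WHERE ST_DWithin(
        location,
        ST_MakePoint(127.0276, 37.4979)::geography,  -- lon, lat (illustrative)
        2000);
```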

Field stories you will hear

Redfin — the US real-estate platform reported performance and stability improvements after moving from MySQL to PostgreSQL + PostGIS for large spatial query workloads.

IGN (France) — frequently cited for managing high-resolution terrain data in PostGIS, emphasizing transactional consistency when many editors work concurrently.

Delivery / mobility — “couriers within N km of the pickup point” is the bread-and-butter query where spatial indexes make a night-and-day operational difference.

Telecom / infrastructure — cell sites, cable routes, coverage polygons joined with business tables in one database.

PostGIS spans urban planning, logistics, environmental monitoring—anywhere spatial data meets business data. A recurring theme: one transactional system for both.

How to read the stories: traffic, schemas, and regulatory requirements differ. Before copying an architecture, sanity-check query frequency, indexing strategy, and coordinate reference consistency.


4. TimescaleDB — why Cloudflare picked it over ClickHouse for some analytics work

Dedicated time-series stacks include InfluxDB, ClickHouse, Apache Druid, and more. In the Postgres world, TimescaleDB is a frequent name for time-series and analytics-style workloads.

Cloudflare’s choice

Cloudflare is widely known for PostgreSQL on transactional paths and ClickHouse on analytics. In mid‑2025 engineering posts, they described choosing TimescaleDB instead of ClickHouse for some new analytics surfaces.

Reasons cited include: (1) TimescaleDB is a PostgreSQL extension, so hypertables can live next to ordinary tables on existing infra; (2) continuous aggregates reduce reliance on bespoke cron/batch pipelines for near-real-time rollups. Public posts mention ~5–35× latency improvements and ~33× storage reductions for measured workloads—always verify the measurement window and query set in the original article.

What Timescale adds

Hypertables partition data into time-based chunks, so “recent window” queries stay fast even as the table grows.

Continuous aggregates are maintained incrementally, unlike a naive materialized view that recomputes everything on refresh; that incremental model is closer to how teams want live dashboards to behave.

Columnar compression options target time-series storage costs; “~90% compression” style claims depend heavily on data shape and redundancy.
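The pieces above can be sketched together. This assumes the timescaledb extension is available; the table, column names, and policy intervals are illustrative:

```sql
-- Illustrative schema; assumes the timescaledb extension is installed
CREATE TABLE metrics (
  time      timestamptz NOT NULL,
  device_id bigint NOT NULL,
  cpu       double precision
);

-- Turn the table into a hypertable, chunked by time
SELECT create_hypertable('metrics', 'time');

-- Continuous aggregate: incrementally maintained hourly rollup
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(cpu) AS avg_cpu
FROM metrics
GROUP BY bucket, device_id;

-- Refresh policy: keep the trailing day's buckets up to date
SELECT add_continuous_aggregate_policy('metrics_hourly',
  start_offset      => INTERVAL '1 day',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '30 minutes');
```

Compression and retention policies layer on top of the same hypertable; their payoff depends on the data-shape caveats noted above.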

Typical domains

IoT sensors, market ticks, app metrics, server monitoring—anywhere append-heavy timestamps stream in. Major clouds offer paths to run Timescale alongside managed Postgres or as a dedicated service.


5. ParadeDB — full‑text search inside PostgreSQL without Elasticsearch

Elasticsearch is powerful—and operationally heavy: separate cluster, sync, monitoring, duplicated infra.

ParadeDB’s pg_search brings BM25-style ranking into PostgreSQL so you can pursue relevance-ranked search without standing up a second search system.

What pg_search enables

Example using the @@@ operator:

SELECT title, rating, description
FROM products
WHERE description @@@ 'comfortable running shoes'
  AND rating >= 4.0
ORDER BY paradedb.score(id) DESC
LIMIT 10;

Classic Postgres FTS (tsvector / tsquery) is closer to keyword matching; BM25 factors term frequency and document length. ParadeDB also documents hybrid workflows with pgvector (keywords + vectors) in one query path.
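For completeness, a hedged sketch of the index DDL behind a query like the one above. ParadeDB's DDL has changed across versions, so treat this as the shape seen in recent docs rather than canonical syntax, and check the docs for your pg_search release:

```sql
-- Sketch only; verify against the pg_search docs for your version
CREATE EXTENSION IF NOT EXISTS pg_search;

CREATE TABLE products (
  id          serial PRIMARY KEY,
  title       text,
  description text,
  rating      numeric
);

-- BM25 index over the searchable columns; key_field names the column
-- used to join relevance scores back to rows (paradedb.score(id))
CREATE INDEX products_search_idx ON products
USING bm25 (id, title, description)
WITH (key_field = 'id');
```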

Maturity and operations

Community write-ups about replacing Elasticsearch exist, but production mileage, version compatibility, and incident playbooks vary widely by product. Before betting a flagship search path on pg_search, validate docs for your Postgres major, support policy, and run load/failure drills in staging. “Fewer moving parts” is real—but so is the question of whether your team is ready to own search behavior without a dedicated cluster.


6. The extension landscape: frequently cited extensions in 2024–2025 surveys

The 2024 State of PostgreSQL survey (run by Timescale, now Tiger Data) highlights extensions such as:

| Rank | Extension | Typical use | Notes |
| --- | --- | --- | --- |
| 1 | PostGIS | GIS / spatial | Frequently #1 in recent years |
| 2 | pg_stat_statements | Query performance insights | “Almost built-in” ops extension |
| 3 | TimescaleDB | Time series / analytics | Rises with TS workloads |
| — | pgvector | AI / vectors | Rapid growth since ~2023 |
| — | PgBouncer | Connection pooling | Common ops layer |

Frequently mentioned additions include pg_cron, pg_partman, pgaudit, and others.


7. PostgreSQL maximalism: light and shadow

Some people call the “one Postgres for everything” posture PostgreSQL maximalism. It has real benefits—and real failure modes.

Upside: simpler stacks

Fewer systems can mean fewer on-call paths—no separate search cluster, fewer sync jobs, fewer siloed metrics stores—when your workload actually fits.

Downside: not best-in-class for every niche

If you need hundreds of millions of vectors at sub‑ms SLAs, pgvector alone may not be enough. If you need multi‑million rows/sec ingest with exotic analytics simultaneously, ClickHouse/Druid-class systems may fit better. Great extensions ≠ universal #1.

When extensions are “enough,” and when to split systems

| Need | Often fine on Postgres extensions | Signals to consider a separate system |
| --- | --- | --- |
| Vectors | Up to tens of millions of rows, RAG, multi-tenant SaaS | Hundreds of millions+, GPUs, extreme latency SLAs |
| Geospatial | Typical LBS / spatial queries | Specialized real-time navigation optimizers |
| Time series | IoT, monitoring, analytics dashboards | Massive ingest + separate real-time stream processing |
| Full-text | In-app search, catalogs | Billion-document corpora, exotic real-time indexing |

Decision inputs worth writing down

Tables alone are not enough—capture:

  • Growth rate and a 12‑month size estimate
  • Ops staffing (full-time DBA? on-call rotation?)
  • Failure tolerance (allowed downtime, RTO/RPO targets)
  • Regulation / audit (retention, access control, certification constraints)
  • TCO definition (license, storage, labor, training—what’s in the box?)

8. Closing the series: a final answer to “why PostgreSQL?”

Across five parts, the through-line is compact:

PostgreSQL wins by default not because it is always the fastest or the easiest tool in every niche. It wins because proven operations, extensibility, a deep ecosystem, and conservative defaults combine into a low‑regret choice for many teams.

From Instagram and Coinbase to YC-heavy stacks, from migration stories to Cloudflare’s Timescale choice and AI startups standardizing on pgvector—the pattern is the same: does this stack do the job well enough, for long enough, with a team we can actually run?

Roughly four decades from a Berkeley research project to a database developers keep picking in surveys is not an accident—it reflects shipping discipline, community, and governance.

The shortest answer to “why PostgreSQL?”:

If you don’t have a sharp reason to pick something else, Postgres belongs on the short list.

Series recap

| Part | Core message |
| --- | --- |
| Part 1 | Postgres near the top of developer surveys; every major cloud invests |
| Part 2 | Big tech keeps Postgres—not accidentally |
| Part 3 | Startups default to Supabase/Neon/pgvector-class stacks |
| Part 4 | Real MongoDB/Oracle exit stories and TCO narratives |
| Part 5 | pgvector, PostGIS, TimescaleDB, ParadeDB—where extensions simplify the stack—and where they don’t |

Written: April 2026 · Figures and product roadmaps change; verify primary sources when citing.
