Why PostgreSQL? Part 5 — The ecosystem: pgvector, PostGIS, TimescaleDB
One more PostgreSQL extension—and you can seriously discuss vector search, geospatial queries, time-series analytics, and BM25-style full-text search on the same engine. This series finale walks through pgvector, PostGIS, TimescaleDB, and ParadeDB (pg_search): what public benchmarks and vendor write-ups claim, where “replace the specialist” is conditionally true, and how to read latency/cost numbers when managed services, self-hosting, and tuning assumptions differ. It closes the five-part arc on why PostgreSQL is often the lowest-regret default—reliability, extensibility, ecosystem depth—and when a separate system still earns its place. Use the decision inputs at the end alongside the comparison table: growth rate, staffing, failure tolerance, compliance, and how you define TCO.
Series outline
- Part 1 — PostgreSQL in the numbers
- Part 2 — Why big tech chose PostgreSQL
- Part 3 — Startups: speed vs. cost
- Part 4 — MongoDB & Oracle: real migration stories
- Part 5 — The ecosystem: pgvector, PostGIS, TimescaleDB (this post · series finale)
Table of contents
- Introduction: the “one database for everything” idea
- pgvector — how PostgreSQL shook the vector database conversation
- PostGIS — why a 25‑year‑old extension still defines geospatial Postgres
- TimescaleDB — why Cloudflare picked it over ClickHouse for some analytics work
- ParadeDB — full‑text search inside PostgreSQL without Elasticsearch
- The extension landscape: frequently cited extensions in 2024–2025 surveys
- PostgreSQL maximalism: light and shadow
- Closing the series: a final answer to “why PostgreSQL?”
1. Introduction: the “one database for everything” idea
People new to PostgreSQL’s extension ecosystem sometimes say:
“If Postgres can do all of this, why do other databases exist?”
It is not a silly question—and it is not universally true, either.
Add pgvector and you may not need a separate vector store. Add PostGIS and you can handle many spatial queries without a dedicated GIS server. Add TimescaleDB and dedicated time-series products look less mandatory. Add ParadeDB’s pg_search and running Elasticsearch for every app can feel heavy. At the same time, workload shape, scale, and operations still make specialized systems the better tool in plenty of cases.
This article walks through four pillars using public materials from around 2025, and it keeps conditional conclusions explicit: not "Postgres replaces everything," but where extensions are usually enough, and where they are not.
How to read benchmarks and case studies
The latency, throughput, and cost numbers below are useful signals, but they often come from vendor blogs, benchmark posts, and marketing-adjacent write-ups. When managed vs self-hosted, what’s included in “cost,” and hardware/tuning assumptions differ, the same headline number can mean different things. Read directionally and verify the original test conditions.
Large-name stories (Cloudflare, Redfin, IGN, …) also carry different stacks, teams, and SLAs than yours. Treat them as existence proofs, not a sizing shortcut—pair them with the decision inputs later in the article.
2. pgvector — how PostgreSQL shook the vector database conversation
If Part 3 covered product and cost angles, this section focuses on commonly cited public benchmarks.
Comparisons you will see in 2024–2025 write-ups
Write-ups from 2024–2025 comparing Timescale's pgvector + pgvectorscale stack against Pinecone, Qdrant, and others often headline runs on the order of 50 million Cohere embeddings (768-dimensional).
| Metric | pgvector + pgvectorscale | Pinecone (p2) | Qdrant |
|---|---|---|---|
| p95 latency | ~1.4× lower than Pinecone in some posts | baseline | reported in a similar band |
| Throughput (QPS) | 471 QPS @ 99% recall (example figures) | similar bands in some tests | 41 QPS @ 99% recall (example figures from public materials) |
| Cost (often EC2 self-host framing) | ~79% lower than Pinecone in some vendor comparisons | baseline | — |
In Qdrant comparisons, both systems are sometimes reported near sub-100ms query latency at ~50M vectors. Qdrant is also described as having tighter tail-latency behavior in some analyses, which matters when latency consistency is the SLA. Treat these as public benchmark and community discussions; read them alongside Qdrant's official benchmarks and, ideally, a reproduction in your own environment.
Around May 2025, AWS published materials on pgvector 0.8.0 on Aurora PostgreSQL, including up to ~5.7× speedups for certain query patterns versus earlier versions, and improvements around filtered vector search via features like iterative_scan.
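The filtered-search improvement is exposed as a session setting. A minimal sketch, assuming pgvector 0.8.0+ with an HNSW index (the table and filter are illustrative; the GUC names are from the pgvector docs):

```sql
-- Let the index scan keep iterating until enough rows survive the
-- WHERE filter, instead of returning too few results.
SET hnsw.iterative_scan = relaxed_order;  -- alternatives: strict_order, off
SET hnsw.max_scan_tuples = 20000;         -- cap on extra work per query

SELECT doc_id
FROM documents
WHERE user_id = $2                        -- selective filter that used to starve results
ORDER BY embedding <-> $1                 -- $1 is the query vector
LIMIT 10;
```

`relaxed_order` allows slightly out-of-order results in exchange for speed; `strict_order` preserves exact distance ordering.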
How to read the table: it summarizes public posts. Pinecone is typically managed; pgvector stacks are often self-hosted (or RDS/Aurora). If you ignore operations staffing, uptime, and networking, “cheaper” can mislead. Rankings also move with tuning, indexes, and workload—use the numbers as directional, not universal.
Practical pgvector patterns
A major strength is combining vector search with relational predicates in one SQL statement:
-- Semantic similarity + relational filters in one query
-- (<-> is distance; smaller means closer)
SELECT
doc_id,
title,
created_at,
embedding <-> $1 AS distance_score
FROM documents
WHERE
user_id = $2
AND language = 'ko'
AND created_at > NOW() - INTERVAL '30 days'
ORDER BY distance_score ASC
LIMIT 10;
Many vector-only stores force you to fan out to multiple systems and stitch results in application code. PostgreSQL can keep the path unified with the planner and indexes you already operate.
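The query above benefits from an approximate index. A minimal sketch, assuming an `embedding vector(768)` column on `documents` (table and column names follow the example above; the opclass matches the `<->` L2 operator):

```sql
-- HNSW approximate-nearest-neighbor index (pgvector 0.5+)
-- vector_l2_ops pairs with the <-> (L2 distance) operator used above
CREATE INDEX documents_embedding_idx
  ON documents
  USING hnsw (embedding vector_l2_ops);
```

For cosine or inner-product distance you would pick `vector_cosine_ops` or `vector_ip_ops` and the matching operator (`<=>` or `<#>`) instead.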
Honest limits
For millions, often tens of millions, of vectors, RAG pipelines, and multi-tenant SaaS, pgvector is frequently a strong default. But at hundreds of millions of vectors, with GPU acceleration requirements, or with single-digit-millisecond SLAs baked into contracts, dedicated engines (Pinecone, Milvus, …) can still win.
“Start on Postgres; move when you truly hit a wall” remains common advice—and many teams hit product and organizational bottlenecks before they hit pgvector’s ceiling.
3. PostGIS — why a 25‑year‑old extension still defines geospatial Postgres
PostGIS first appeared in 2001. In 2020s surveys of extension usage, it still shows up near the top.
The problem PostGIS solves
If you store latitude/longitude as plain numeric columns, "find stores within 2km" collapses into painful full scans. PostGIS mitigates that with spatial indexing (R-tree-style indexes built on GiST, plus SP-GiST and BRIN options).
You get geometry types, spatial functions like ST_Distance, ST_Intersects, ST_Buffer, and coordinate handling close to OGC expectations—inside the database you already run.
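The "stores within 2km" query above looks like this in practice. A sketch assuming a hypothetical `stores` table with a `geom geography(Point, 4326)` column (the coordinates are an arbitrary example point):

```sql
-- GiST spatial index so ST_DWithin can avoid a full scan
CREATE INDEX stores_geom_idx ON stores USING GIST (geom);

-- Stores within 2 km of a point; geography means distances are in meters
SELECT name
FROM stores
WHERE ST_DWithin(
  geom,
  ST_SetSRID(ST_MakePoint(126.9780, 37.5665), 4326)::geography,  -- lon, lat
  2000                                                            -- meters
);
```

Using `geography` rather than `geometry` keeps the distance argument in meters without manual reprojection, at some CPU cost per comparison.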
Field stories you will hear
Redfin — the US real-estate platform reported performance and stability improvements after moving from MySQL to PostgreSQL + PostGIS for large spatial query workloads.
IGN (France) — frequently cited for managing high-resolution terrain data in PostGIS, emphasizing transactional consistency when many editors work concurrently.
Delivery / mobility — "couriers within N km of the pickup point" is the bread-and-butter query where spatial indexes make a night-and-day difference to operations.
Telecom / infrastructure — cell sites, cable routes, coverage polygons joined with business tables in one database.
PostGIS spans urban planning, logistics, environmental monitoring—anywhere spatial data meets business data. A recurring theme: one transactional system for both.
How to read the stories: traffic, schemas, and regulatory requirements differ. Before copying an architecture, sanity-check query frequency, indexing strategy, and coordinate reference consistency.
4. TimescaleDB — why Cloudflare picked it over ClickHouse for some analytics work
Dedicated time-series stacks include InfluxDB, ClickHouse, Apache Druid, and more. In the Postgres world, TimescaleDB is a frequent name for time-series and analytics-style workloads.
Cloudflare’s choice
Cloudflare is widely known for PostgreSQL on transactional paths and ClickHouse on analytics. In mid‑2025 engineering posts, they described choosing TimescaleDB instead of ClickHouse for some new analytics surfaces.
Reasons cited include: (1) TimescaleDB is a PostgreSQL extension, so hypertables can live next to ordinary tables on existing infra; (2) continuous aggregates reduce reliance on bespoke cron/batch pipelines for near-real-time rollups. Public posts mention ~5–35× latency improvements and ~33× storage reductions for measured workloads—always verify the measurement window and query set in the original article.
What Timescale adds
Hypertables chunk by time so large tables still favor “recent window” queries.
Continuous aggregates are maintained incrementally, unlike a naive materialized view that recomputes everything on each refresh, which is much closer to how teams want live dashboards to behave.
Columnar compression options target time-series storage costs; “~90% compression” style claims depend heavily on data shape and redundancy.
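The three features above compose in a few statements. A minimal sketch with a hypothetical `metrics` table (names are illustrative; the function signatures follow the TimescaleDB docs):

```sql
-- Ordinary table, then converted to a hypertable chunked by time
CREATE TABLE metrics (
  time      TIMESTAMPTZ NOT NULL,
  device_id INT,
  value     DOUBLE PRECISION
);
SELECT create_hypertable('metrics', 'time');

-- Continuous aggregate: hourly averages, maintained incrementally
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(value) AS avg_value
FROM metrics
GROUP BY bucket, device_id;

-- Background policy keeps the aggregate fresh without cron jobs
SELECT add_continuous_aggregate_policy('metrics_hourly',
  start_offset      => INTERVAL '3 hours',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '30 minutes');
```

The refresh policy is what replaces the bespoke batch pipelines mentioned in the Cloudflare discussion: rollups stay near-real-time without application-side scheduling.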
Typical domains
IoT sensors, market ticks, app metrics, server monitoring—anywhere append-heavy timestamps stream in. Major clouds offer paths to run Timescale alongside managed Postgres or as a dedicated service.
5. ParadeDB — full‑text search inside PostgreSQL without Elasticsearch
Elasticsearch is powerful—and operationally heavy: separate cluster, sync, monitoring, duplicated infra.
ParadeDB’s pg_search brings BM25-style ranking into PostgreSQL so you can pursue relevance-ranked search without standing up a second search system.
What pg_search enables
Example using the @@@ operator:
SELECT title, rating, description
FROM products
WHERE description @@@ 'comfortable running shoes'
AND rating >= 4.0
ORDER BY paradedb.score(id) DESC
LIMIT 10;
Classic Postgres FTS (tsvector / tsquery) is closer to keyword matching; BM25 factors term frequency and document length. ParadeDB also documents hybrid workflows with pgvector (keywords + vectors) in one query path.
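One way such a hybrid query can be sketched is reciprocal rank fusion in plain SQL. This is an illustration only, assuming both pg_search and pgvector are installed and `products` has an `embedding` column; `$1` is the query vector and the constant 60 is the conventional RRF damping factor:

```sql
-- Hedged sketch: fuse BM25 and vector rankings with reciprocal rank fusion
WITH bm25 AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY paradedb.score(id) DESC) AS r
  FROM products
  WHERE description @@@ 'running shoes'
  ORDER BY paradedb.score(id) DESC
  LIMIT 50
),
vec AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <-> $1) AS r
  FROM products
  ORDER BY embedding <-> $1
  LIMIT 50
)
SELECT id,
       COALESCE(1.0 / (60 + bm25.r), 0.0)
     + COALESCE(1.0 / (60 + vec.r), 0.0) AS rrf_score
FROM bm25 FULL OUTER JOIN vec USING (id)
ORDER BY rrf_score DESC
LIMIT 10;
```

Documents ranked highly by either signal surface near the top; ParadeDB's own docs describe similar hybrid patterns, so check them for the current recommended form.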
Maturity and operations
Community write-ups about replacing Elasticsearch exist, but production mileage, version compatibility, and incident playbooks vary widely by product. Before betting a flagship search path on pg_search, validate docs for your Postgres major, support policy, and run load/failure drills in staging. “Fewer moving parts” is real—but so is the question of whether your team is ready to own search behavior without a dedicated cluster.
6. The extension landscape: frequently cited extensions in 2024–2025 surveys
The 2024 State of PostgreSQL survey (run by Timescale, now Tiger Data) reports frequently used extensions like:
| Rank | Extension | Typical use | Notes |
|---|---|---|---|
| 1 | PostGIS | GIS / spatial | Frequently #1 in recent years |
| 2 | pg_stat_statements | Query performance insights | “Almost built-in” ops extension |
| 3 | TimescaleDB | Time series / analytics | Rises with TS workloads |
| — | pgvector | AI / vectors | Rapid growth since ~2023 |
| — | PgBouncer | Connection pooling | Common ops layer |
Frequently mentioned additions include pg_cron, pg_partman, pgaudit, and others.
7. PostgreSQL maximalism: light and shadow
Some people call the “one Postgres for everything” posture PostgreSQL maximalism. It has real benefits—and real failure modes.
Upside: simpler stacks
Fewer systems can mean fewer on-call paths—no separate search cluster, fewer sync jobs, fewer siloed metrics stores—when your workload actually fits.
Downside: not best-in-class for every niche
If you need hundreds of millions of vectors at sub‑ms SLAs, pgvector alone may not be enough. If you need multi‑million rows/sec ingest with exotic analytics simultaneously, ClickHouse/Druid-class systems may fit better. Great extensions ≠ universal #1.
When extensions are “enough,” and when to split systems
| Need | Often fine on Postgres extensions | Signals to consider a separate system |
|---|---|---|
| Vectors | Up to tens of millions of rows, RAG, multi-tenant SaaS | Hundreds of millions+, GPUs, extreme latency SLAs |
| Geospatial | Typical LBS / spatial queries | Specialized real-time navigation optimizers |
| Time series | IoT, monitoring, analytics dashboards | Massive ingest + separate real-time stream processing |
| Full-text | In-app search, catalogs | Billion-document corpora, exotic real-time indexing |
Decision inputs worth writing down
Tables alone are not enough—capture:
- Growth rate and a 12‑month size estimate
- Ops staffing (full-time DBA? on-call rotation?)
- Failure tolerance (allowed downtime, RTO/RPO targets)
- Regulation / audit (retention, access control, certification constraints)
- TCO definition (license, storage, labor, training—what’s in the box?)
8. Closing the series: a final answer to “why PostgreSQL?”
Across five parts, the through-line is compact:
PostgreSQL wins by default not because it is always the fastest or the easiest tool in every niche. It wins because proven operations, extensibility, a deep ecosystem, and conservative defaults combine into a low‑regret choice for many teams.
From Instagram and Coinbase to YC-heavy stacks, from migration stories to Cloudflare’s Timescale choice and AI startups standardizing on pgvector—the pattern is the same: does this stack do the job well enough, for long enough, with a team we can actually run?
Thirty‑five years from a research project to a database developers keep picking in surveys is not an accident—it reflects shipping discipline, community, and governance.
The shortest answer to “why PostgreSQL?”:
If you don’t have a sharp reason to pick something else, Postgres belongs on the short list.
Series recap
| Part | Core message |
|---|---|
| Part 1 | Postgres near the top of developer surveys; every major cloud invests |
| Part 2 | Big tech keeps Postgres—not accidentally |
| Part 3 | Startups default to Supabase/Neon/pgvector-class stacks |
| Part 4 | Real MongoDB/Oracle exit stories and TCO narratives |
| Part 5 | pgvector, PostGIS, TimescaleDB, ParadeDB—where extensions simplify the stack—and where they don’t |
References
- Timescale — pgvector vs Pinecone · State of PostgreSQL 2024
- Qdrant — Benchmarks
- AWS — Database Blog · pgvector releases
- PostGIS — Documentation
- Cloudflare — Engineering Blog
- ParadeDB — GitHub
Written: April 2026 · Figures and product roadmaps change; verify primary sources when citing.