Why PostgreSQL? Part 5 — The ecosystem: pgvector, PostGIS, TimescaleDB
One more PostgreSQL extension—and you can seriously discuss vector search, geospatial queries, time-series analytics, and BM25-style full-text search on the same engine. This series finale walks through pgvector, PostGIS, TimescaleDB, and ParadeDB (pg_search): what public benchmarks and vendor write-ups claim, where “replace the specialist” is conditionally true, and how to read latency/cost numbers when managed services, self-hosting, and tuning assumptions differ. It closes the five-part arc on why PostgreSQL is often the lowest-regret default—reliability, extensibility, ecosystem depth—and when a separate system still earns its place. Use the decision inputs at the end alongside the comparison table: growth rate, staffing, failure tolerance, compliance, and how you define TCO.
Series outline
- Part 1 — PostgreSQL in the numbers
- Part 2 — Why big tech chose PostgreSQL
- Part 3 — Startups: speed vs. cost
- Part 4 — MongoDB & Oracle: real migration stories
- Part 5 — The ecosystem: pgvector, PostGIS, TimescaleDB (this post · series finale)
Table of contents
- Introduction: the “one database for everything” idea
- pgvector — how PostgreSQL shook the vector database conversation
- PostGIS — why a 25‑year‑old extension still defines geospatial Postgres
- TimescaleDB — why Cloudflare picked it over ClickHouse for some analytics work
- ParadeDB — full‑text search inside PostgreSQL without Elasticsearch
- The extension landscape: frequently cited extensions in 2024–2025 surveys
- PostgreSQL maximalism: light and shadow
- Closing the series: a final answer to “why PostgreSQL?”
1. Introduction: the “one database for everything” idea
People new to PostgreSQL’s extension ecosystem sometimes say:
“If Postgres can do all of this, why do other databases exist?”
It is not a silly question—and it is not universally true, either.
Add pgvector and you may not need a separate vector store. Add PostGIS and you can handle many spatial queries without a dedicated GIS server. Add TimescaleDB and dedicated time-series products look less mandatory. Add ParadeDB’s pg_search and running Elasticsearch for every app can feel heavy. At the same time, workload shape, scale, and operations still make specialized systems the better tool in plenty of cases.
This article walks through four pillars using public materials from around 2025, and it keeps conditional conclusions explicit: not "Postgres replaces everything," but where extensions are usually enough, and where they are not.
How to read benchmarks and case studies
The latency, throughput, and cost numbers below are useful signals, but they often come from vendor blogs, benchmark posts, and marketing-adjacent write-ups. When managed vs self-hosted, what’s included in “cost,” and hardware/tuning assumptions differ, the same headline number can mean different things. Read directionally and verify the original test conditions.
Large-name stories (Cloudflare, Redfin, IGN, …) also carry different stacks, teams, and SLAs than yours. Treat them as existence proofs, not a sizing shortcut—pair them with the decision inputs later in the article.
2. pgvector — how PostgreSQL shook the vector database conversation
If Part 3 covered product and cost angles, this section focuses on commonly cited public benchmarks.
Comparisons you will see in 2024–2025 write-ups
Write-ups from 2024–2025 comparing Timescale's pgvector + pgvectorscale stack against Pinecone, Qdrant, and others often headline runs on the order of 50 million Cohere embeddings (768-dimensional).
| Metric | pgvector + pgvectorscale | Pinecone (p2) | Qdrant |
|---|---|---|---|
| p95 latency | ~1.4× lower than Pinecone in some posts | baseline | reported in a similar band |
| Throughput (QPS) | 471 QPS @ 99% recall (example figures) | similar bands in some tests | 41 QPS @ 99% recall (example figures from public materials) |
| Cost (often EC2 self-host framing) | ~79% lower than Pinecone in some vendor comparisons | baseline | — |
In Qdrant comparisons, both systems are sometimes reported near sub-100ms query latency at ~50M vectors. Qdrant is also described as having tighter tail-latency behavior in some analyses, which matters when latency consistency is the SLA. Treat these as public benchmark and community discussions; read them alongside Qdrant's official benchmarks and, ideally, a reproduction in your own environment.
Around May 2025, AWS published materials on pgvector 0.8.0 on Aurora PostgreSQL, including up to ~5.7× speedups for certain query patterns versus earlier versions, and improvements around filtered vector search via features like iterative_scan.
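The filtered-search improvement is exposed as a session setting. A minimal sketch, assuming pgvector 0.8.0+ with an HNSW index (the table and filter are illustrative; the GUC names are from the pgvector docs):

```sql
-- Let the index scan keep iterating until enough rows survive the
-- WHERE filter, instead of returning too few results.
SET hnsw.iterative_scan = relaxed_order;  -- alternatives: strict_order, off
SET hnsw.max_scan_tuples = 20000;         -- cap on extra work per query

SELECT doc_id
FROM documents
WHERE user_id = $2                        -- selective filter that used to starve results
ORDER BY embedding <-> $1                 -- $1 is the query vector
LIMIT 10;
```

`relaxed_order` allows slightly out-of-order results in exchange for speed; `strict_order` preserves exact distance ordering.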
How to read the table: it summarizes public posts. Pinecone is typically managed; pgvector stacks are often self-hosted (or RDS/Aurora). If you ignore operations staffing, uptime, and networking, “cheaper” can mislead. Rankings also move with tuning, indexes, and workload—use the numbers as directional, not universal.
Practical pgvector patterns
A major strength is combining vector search with relational predicates in one SQL statement:
-- Semantic similarity + relational filters in one query
-- (<-> is distance; smaller means closer)
SELECT
doc_id,
title,
created_at,
embedding <-> $1 AS distance_score
FROM documents
WHERE
user_id = $2
AND language = 'ko'
AND created_at > NOW() - INTERVAL '30 days'
ORDER BY distance_score ASC
LIMIT 10;
Many vector-only stores force you to fan out to multiple systems and stitch results in application code. PostgreSQL can keep the path unified with the planner and indexes you already operate.
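The query above benefits from an approximate index. A minimal sketch, assuming an `embedding vector(768)` column on `documents` (table and column names follow the example above; the opclass matches the `<->` L2 operator):

```sql
-- HNSW approximate-nearest-neighbor index (pgvector 0.5+)
-- vector_l2_ops pairs with the <-> (L2 distance) operator used above
CREATE INDEX documents_embedding_idx
  ON documents
  USING hnsw (embedding vector_l2_ops);
```

For cosine or inner-product distance you would pick `vector_cosine_ops` or `vector_ip_ops` and the matching operator (`<=>` or `<#>`) instead.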
Honest limits
For millions, often tens of millions, of vectors, RAG pipelines, and multi-tenant SaaS, pgvector is frequently a strong default. But at hundreds of millions of vectors, with GPU acceleration requirements, or with single-digit-millisecond SLAs baked into contracts, dedicated engines (Pinecone, Milvus, …) can still win.
“Start on Postgres; move when you truly hit a wall” remains common advice—and many teams hit product and organizational bottlenecks before they hit pgvector’s ceiling.
3. PostGIS — why a 25‑year‑old extension still defines geospatial Postgres
PostGIS first appeared in 2001. In 2020s surveys of extension usage, it still shows up near the top.
The problem PostGIS solves
If you store latitude/longitude as plain numeric columns, "find stores within 2km" collapses into painful full scans. PostGIS mitigates that with spatial indexing (R-tree-style indexes built on GiST, plus SP-GiST and BRIN options).
You get geometry types, spatial functions like ST_Distance, ST_Intersects, ST_Buffer, and coordinate handling close to OGC expectations—inside the database you already run.
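The "stores within 2km" query above looks like this in practice. A sketch assuming a hypothetical `stores` table with a `geom geography(Point, 4326)` column (the coordinates are an arbitrary example point):

```sql
-- GiST spatial index so ST_DWithin can avoid a full scan
CREATE INDEX stores_geom_idx ON stores USING GIST (geom);

-- Stores within 2 km of a point; geography means distances are in meters
SELECT name
FROM stores
WHERE ST_DWithin(
  geom,
  ST_SetSRID(ST_MakePoint(126.9780, 37.5665), 4326)::geography,  -- lon, lat
  2000                                                            -- meters
);
```

Using `geography` rather than `geometry` keeps the distance argument in meters without manual reprojection, at some CPU cost per comparison.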
Field stories you will hear
Redfin — the US real-estate platform reported performance and stability improvements after moving from MySQL to PostgreSQL + PostGIS for large spatial query workloads.
IGN (France) — frequently cited for managing high-resolution terrain data in PostGIS, emphasizing transactional consistency when many editors work concurrently.
Delivery / mobility — "couriers within N km of the pickup point" is the bread-and-butter query where spatial indexes make a night-and-day difference to operations.
Telecom / infrastructure — cell sites, cable routes, coverage polygons joined with business tables in one database.
PostGIS spans urban planning, logistics, environmental monitoring—anywhere spatial data meets business data. A recurring theme: one transactional system for both.
How to read the stories: traffic, schemas, and regulatory requirements differ. Before copying an architecture, sanity-check query frequency, indexing strategy, and coordinate reference consistency.
4. TimescaleDB — why Cloudflare picked it over ClickHouse for some analytics work
Dedicated time-series stacks include InfluxDB, ClickHouse, Apache Druid, and more. In the Postgres world, TimescaleDB is a frequent name for time-series and analytics-style workloads.
Cloudflare’s choice
Cloudflare is widely known for PostgreSQL on transactional paths and ClickHouse on analytics. In mid‑2025 engineering posts, they described choosing TimescaleDB instead of ClickHouse for some new analytics surfaces.
Reasons cited include: (1) TimescaleDB is a PostgreSQL extension, so hypertables can live next to ordinary tables on existing infra; (2) continuous aggregates reduce reliance on bespoke cron/batch pipelines for near-real-time rollups. Public posts mention ~5–35× latency improvements and ~33× storage reductions for measured workloads—always verify the measurement window and query set in the original article.
What Timescale adds
Hypertables chunk by time so large tables still favor “recent window” queries.
Continuous aggregates are maintained incrementally, unlike a naive materialized view that recomputes everything on each refresh, which is much closer to how teams want live dashboards to behave.
Columnar compression options target time-series storage costs; “~90% compression” style claims depend heavily on data shape and redundancy.
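The three features above compose in a few statements. A minimal sketch with a hypothetical `metrics` table (names are illustrative; the function signatures follow the TimescaleDB docs):

```sql
-- Ordinary table, then converted to a hypertable chunked by time
CREATE TABLE metrics (
  time      TIMESTAMPTZ NOT NULL,
  device_id INT,
  value     DOUBLE PRECISION
);
SELECT create_hypertable('metrics', 'time');

-- Continuous aggregate: hourly averages, maintained incrementally
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(value) AS avg_value
FROM metrics
GROUP BY bucket, device_id;

-- Background policy keeps the aggregate fresh without cron jobs
SELECT add_continuous_aggregate_policy('metrics_hourly',
  start_offset      => INTERVAL '3 hours',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '30 minutes');
```

The refresh policy is what replaces the bespoke batch pipelines mentioned in the Cloudflare discussion: rollups stay near-real-time without application-side scheduling.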
Typical domains
IoT sensors, market ticks, app metrics, server monitoring—anywhere append-heavy timestamps stream in. Major clouds offer paths to run Timescale alongside managed Postgres or as a dedicated service.
5. ParadeDB — full‑text search inside PostgreSQL without Elasticsearch
Elasticsearch is powerful—and operationally heavy: separate cluster, sync, monitoring, duplicated infra.
ParadeDB’s pg_search brings BM25-style ranking into PostgreSQL so you can pursue relevance-ranked search without standing up a second search system.
What pg_search enables
Example using the @@@ operator:
SELECT title, rating, description
FROM products
WHERE description @@@ 'comfortable running shoes'
AND rating >= 4.0
ORDER BY paradedb.score(id) DESC
LIMIT 10;
Classic Postgres FTS (tsvector / tsquery) is closer to keyword matching; BM25 factors term frequency and document length. ParadeDB also documents hybrid workflows with pgvector (keywords + vectors) in one query path.
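One way such a hybrid query can be sketched is reciprocal rank fusion in plain SQL. This is an illustration only, assuming both pg_search and pgvector are installed and `products` has an `embedding` column; `$1` is the query vector and the constant 60 is the conventional RRF damping factor:

```sql
-- Hedged sketch: fuse BM25 and vector rankings with reciprocal rank fusion
WITH bm25 AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY paradedb.score(id) DESC) AS r
  FROM products
  WHERE description @@@ 'running shoes'
  ORDER BY paradedb.score(id) DESC
  LIMIT 50
),
vec AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <-> $1) AS r
  FROM products
  ORDER BY embedding <-> $1
  LIMIT 50
)
SELECT id,
       COALESCE(1.0 / (60 + bm25.r), 0.0)
     + COALESCE(1.0 / (60 + vec.r), 0.0) AS rrf_score
FROM bm25 FULL OUTER JOIN vec USING (id)
ORDER BY rrf_score DESC
LIMIT 10;
```

Documents ranked highly by either signal surface near the top; ParadeDB's own docs describe similar hybrid patterns, so check them for the current recommended form.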
Maturity and operations
Community write-ups about replacing Elasticsearch exist, but production mileage, version compatibility, and incident playbooks vary widely by product. Before betting a flagship search path on pg_search, validate docs for your Postgres major, support policy, and run load/failure drills in staging. “Fewer moving parts” is real—but so is the question of whether your team is ready to own search behavior without a dedicated cluster.
6. The extension landscape: frequently cited extensions in 2024–2025 surveys
The 2024 State of PostgreSQL survey (run by Timescale, now Tiger Data) reports frequently used extensions like:
| Rank | Extension | Typical use | Notes |
|---|---|---|---|
| 1 | PostGIS | GIS / spatial | Frequently #1 in recent years |
| 2 | pg_stat_statements | Query performance insights | “Almost built-in” ops extension |
| 3 | TimescaleDB | Time series / analytics | Rises with TS workloads |
| — | pgvector | AI / vectors | Rapid growth since ~2023 |
| — | PgBouncer | Connection pooling | Common ops layer |
Frequently mentioned additions include pg_cron, pg_partman, pgaudit, and others.
7. PostgreSQL maximalism: light and shadow
Some people call the “one Postgres for everything” posture PostgreSQL maximalism. It has real benefits—and real failure modes.
Upside: simpler stacks
Fewer systems can mean fewer on-call paths—no separate search cluster, fewer sync jobs, fewer siloed metrics stores—when your workload actually fits.
Downside: not best-in-class for every niche
If you need hundreds of millions of vectors at sub‑ms SLAs, pgvector alone may not be enough. If you need multi‑million rows/sec ingest with exotic analytics simultaneously, ClickHouse/Druid-class systems may fit better. Great extensions ≠ universal #1.
When extensions are “enough,” and when to split systems
| Need | Often fine on Postgres extensions | Signals to consider a separate system |
|---|---|---|
| Vectors | Up to tens of millions of rows, RAG, multi-tenant SaaS | Hundreds of millions+, GPUs, extreme latency SLAs |
| Geospatial | Typical LBS / spatial queries | Specialized real-time navigation optimizers |
| Time series | IoT, monitoring, analytics dashboards | Massive ingest + separate real-time stream processing |
| Full-text | In-app search, catalogs | Billion-document corpora, exotic real-time indexing |
Decision inputs worth writing down
Tables alone are not enough—capture:
- Growth rate and a 12‑month size estimate
- Ops staffing (full-time DBA? on-call rotation?)
- Failure tolerance (allowed downtime, RTO/RPO targets)
- Regulation / audit (retention, access control, certification constraints)
- TCO definition (license, storage, labor, training—what’s in the box?)
8. Closing the series: a final answer to “why PostgreSQL?”
Across five parts, the through-line is compact:
PostgreSQL wins by default not because it is always the fastest or the easiest tool in every niche. It wins because proven operations, extensibility, a deep ecosystem, and conservative defaults combine into a low‑regret choice for many teams.
From Instagram and Coinbase to YC-heavy stacks, from migration stories to Cloudflare’s Timescale choice and AI startups standardizing on pgvector—the pattern is the same: does this stack do the job well enough, for long enough, with a team we can actually run?
Thirty‑five years from a research project to a database developers keep picking in surveys is not an accident—it reflects shipping discipline, community, and governance.
The shortest answer to “why PostgreSQL?”:
If you don’t have a sharp reason to pick something else, Postgres belongs on the short list.
Series recap
| Part | Core message |
|---|---|
| Part 1 | Postgres near the top of developer surveys; every major cloud invests |
| Part 2 | Big tech keeps Postgres—not accidentally |
| Part 3 | Startups default to Supabase/Neon/pgvector-class stacks |
| Part 4 | Real MongoDB/Oracle exit stories and TCO narratives |
| Part 5 | pgvector, PostGIS, TimescaleDB, ParadeDB—where extensions simplify the stack—and where they don’t |
References
- Timescale — pgvector vs Pinecone · State of PostgreSQL 2024
- Qdrant — Benchmarks
- AWS — Database Blog · pgvector releases
- PostGIS — Documentation
- Cloudflare — Engineering Blog
- ParadeDB — GitHub
Written: April 2026 · Figures and product roadmaps change; verify primary sources when citing.