# When pgvector is enough and when you actually need Pinecone

Source: https://www.techinterview.org/post/3233475450/when-pgvector-is-enough/
Updated: 2026-07-02 · techinterview.org

Most teams that spin up Pinecone for a new feature could have shipped the same thing on the Postgres database they already run. The decision usually goes backwards. Someone reads that vector search needs a vector database, picks the one with the best landing page, and only later asks how many embeddings they actually have. For a lot of products the honest answer is a few hundred thousand, and at that size the choice barely moves performance. It moves your bill and your on-call rotation a great deal.

So before comparing engines, get specific about four things: how many vectors you'll store this year, whether they already sit next to relational data you join against, whether you need filtered or hybrid search, and what recall you can live with. Those answers settle the question better than any benchmark.

## pgvector, and why it's the default until it isn't

pgvector is a Postgres extension. Your embeddings become a column, similarity search becomes an `ORDER BY embedding <=> query LIMIT k`, and you index it with HNSW (added in 0.5.0) for approximate nearest-neighbor lookups. Because you're inside Postgres, one query can join your products table, filter by tenant, respect a transaction, and come back in a single round trip. That property, search and your real data in one place, is the reason pgvector wins more often than the benchmark crowd admits.
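To make the "one round trip" point concrete, here is a minimal sketch of the SQL involved, wrapped in Python so the parameters are explicit. The `items` and `products` tables, their columns, and the helper names are all assumptions for illustration. `<=>` is pgvector's cosine-distance operator, and `vector_cosine_ops` is the matching HNSW operator class.

```python
from typing import Sequence

# Hypothetical schema: items(id, product_id, tenant_id, embedding vector(N))
# and products(id, name). Adjust names to your own tables.
CREATE_INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS items_embedding_hnsw "
    "ON items USING hnsw (embedding vector_cosine_ops)"
)

# One statement: tenant filter, relational join, nearest-neighbor ordering.
# The %s placeholders follow the psycopg parameter style.
SEARCH_SQL = """
SELECT i.id, p.name
FROM items AS i
JOIN products AS p ON p.id = i.product_id
WHERE i.tenant_id = %s
ORDER BY i.embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def search_params(tenant_id: int, query_embedding: Sequence[float], k: int = 10) -> tuple:
    """Build the parameter tuple in the order SEARCH_SQL expects."""
    return (tenant_id, to_vector_literal(query_embedding), k)
```

With a psycopg cursor this would run as `cur.execute(SEARCH_SQL, search_params(7, query_embedding, 5))`. The vector is passed as text and cast server-side, which avoids needing a driver-side type adapter for a quick test.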

At a million vectors with an HNSW index that fits in memory, single-query latency typically sits in the low single-digit milliseconds, and the database is rarely your bottleneck. The pain shows up in two spots: building an HNSW index over tens of millions of rows is slow and memory-hungry, and HNSW wants to live in RAM, so the working set has to fit. Timescale's pgvectorscale extension goes after both with a disk-backed index called StreamingDiskANN, which keeps part of the structure on SSD and cuts the memory you need to hold a large set. Timescale's published numbers claim sizable latency and throughput gains over Pinecone's storage-optimized tier at high recall. Treat any vendor benchmark as a ceiling rather than a promise, but the architecture is sound and people run it in production.

Where pgvector gets awkward: you now own index tuning, autovacuum behavior on a huge table, and the write-throughput ceiling of a single primary. Push past roughly ten million vectors with heavy writes, or want horizontal scale without sharding Postgres by hand, and the calculus shifts.

## Pinecone when you want to run nothing

Pinecone's pitch is that you never think about an index again. The serverless model splits storage from compute, scales to zero, and handles replication and recall tuning behind an API. For a team with no appetite for database operations, that is worth real money.

Pricing is usage-based rather than a flat instance cost: stored data per gigabyte-month plus read and write units, with plan minimums that depend on tier. The trap is that read units scale with query volume and with how much each query has to scan, so a chatty product with broad filters can run up a bill you couldn't have predicted from a sticker price. Pull Pinecone's current pricing page before you model anything, because the unit definitions and minimums have changed more than once.
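A rough model shows why the bill is hard to guess from a sticker price. This is a sketch of the pricing shape described above, not Pinecone's actual formula. Every rate below is a placeholder invented for the example, and `read_units_per_query` stands in for "how much each query has to scan".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRates:
    """Placeholder rates for illustration only, NOT real Pinecone prices."""
    storage_per_gb_month: float
    per_million_read_units: float
    per_million_write_units: float
    monthly_minimum: float


def monthly_bill(rates: UsageRates, stored_gb: float, queries: int,
                 read_units_per_query: float, write_units: int) -> float:
    """Storage plus metered reads and writes, floored at the plan minimum."""
    storage = stored_gb * rates.storage_per_gb_month
    reads = queries * read_units_per_query / 1_000_000 * rates.per_million_read_units
    writes = write_units / 1_000_000 * rates.per_million_write_units
    return max(rates.monthly_minimum, storage + reads + writes)


# Made-up rates: same data, same query count, different scan cost per query.
rates = UsageRates(storage_per_gb_month=0.25, per_million_read_units=16.0,
                   per_million_write_units=4.0, monthly_minimum=50.0)
narrow = monthly_bill(rates, stored_gb=10, queries=5_000_000,
                      read_units_per_query=2, write_units=1_000_000)
broad = monthly_bill(rates, stored_gb=10, queries=5_000_000,
                     read_units_per_query=10, write_units=1_000_000)
```

With these invented numbers the narrow-filter workload comes to 166.50 and the broad-filter one to 806.50, with identical storage and query counts. The multiplier on reads is the term to measure before you commit.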

Two operational details bite people. Scale-to-zero means a cold index can add anywhere from a couple hundred milliseconds to a couple of seconds on the first query after an idle stretch, so anything with a tight SLA ends up paying for always-on capacity instead. And metadata filtering, the thing you'll want for multi-tenant search, can add latency depending on how selective the filter is. None of this is disqualifying. It's the stuff the marketing page leaves out.

## Weaviate when hybrid search is the point

Weaviate is open source with a managed cloud, which gives you an exit if the hosted bill stops making sense. Its strongest argument is built-in hybrid search: it fuses keyword (BM25) scoring with vector similarity natively, so you don't bolt a separate text index onto the stack to handle queries where exact terms matter. For search over documents full of names, SKUs, or jargon that embeddings smear together, that's a real edge.
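To see what "fusing" two result lists means, here is a generic reciprocal rank fusion sketch. It illustrates the general technique and is not a copy of Weaviate's implementation, whose fusion options and scoring may differ. The document IDs and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results: the keyword search nails the exact SKU,
# the vector search prefers a semantically similar document.
keyword_hits = ["sku-123", "doc-a", "doc-b"]
vector_hits = ["doc-a", "doc-c", "sku-123"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists rise to the top, so `doc-a` and `sku-123` outrank the two that only one retriever found. If you're on pgvector and want this behavior, this merge step is the part you'd be writing yourself.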

Weaviate Cloud shifted to pricing based on the volume of vector dimensions stored (vector count times dimensionality), with tiers that start low and climb with scale and support level. That model rewards smaller embeddings, which lines up with a decision you should make anyway: a 384- or 768-dimension model is cheaper to store and search than a 1536-dimension one, and is often good enough for retrieval. As ever, confirm the current tier structure on their pricing page instead of trusting a figure from a blog post, this one included.
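The arithmetic behind that is short enough to write down. This sketch assumes uncompressed 32-bit floats and ignores index structures and per-row overhead, so treat the sizes as a floor rather than a forecast. The function names are made up for the example.

```python
def stored_dimensions(vector_count: int, dims: int) -> int:
    """Billing-style quantity: vector count times dimensionality."""
    return vector_count * dims


def raw_float32_gib(vector_count: int, dims: int) -> float:
    """Raw vector payload in GiB at 4 bytes per dimension, with no index overhead."""
    return vector_count * dims * 4 / 2**30


# One million vectors at three common embedding sizes.
for dims in (384, 768, 1536):
    total = stored_dimensions(1_000_000, dims)
    size = raw_float32_gib(1_000_000, dims)
    print(f"{dims:>4} dims: {total:>13,} stored dimensions, {size:.2f} GiB raw")
```

Going from 1536 to 384 dimensions cuts both the stored-dimension count and the raw payload by a factor of four, before any quantization or compression the engine may apply.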

## How they stack up

|   | pgvector | Pinecone | Weaviate |
| --- | --- | --- | --- |
| What it is | Postgres extension | Managed serverless API | Open source plus managed cloud |
| Sweet spot | Under a few million vectors, data already in Postgres | Spiky traffic, zero ops, any scale | Search-first apps needing hybrid relevance |
| Hybrid keyword + vector | Manual, via Postgres full-text | Possible via sparse-dense vectors, more assembly required | Built in (BM25 fused with vector) |
| Ops burden | You run Postgres | None | None on cloud, or self-host it |
| Pricing shape | Your instance cost | Storage + read/write units | Per vector-dimensions stored |
| Cold start | None | Up to ~2s after idle on serverless | None on always-on tiers |
| Joins with relational data | Native, in one query | Separate store, you sync | Separate store, you sync |

Read the table as rough guidance, not gospel. Latency depends on dimensions, recall target, hardware, and filter selectivity, and every vendor's pricing drifts over time. The shape of the tradeoff holds even when the digits move.

## Where the cost curve actually crosses

At small scale, a hosted service and a self-hosted setup both cost next to nothing, so optimize for whatever saves engineering time. Somewhere in the low millions of vectors, the math starts favoring your own infrastructure, because a managed service's per-query and per-gigabyte charges keep growing while the cost of a Postgres instance or a self-hosted Weaviate node stays mostly flat. By a hundred million vectors with steady traffic, a fully managed bill can reach four or five figures a month, which is plenty to justify an engineer babysitting infrastructure. The crossover depends on query volume far more than vector count, so measure your reads, not only your storage.
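You can find your own crossover with a few lines of arithmetic. This sketch assumes a flat monthly cost for self-hosting (instance plus whatever engineering time you attribute to it) and a managed bill that grows linearly with queries. All numbers are placeholders, and real pricing is rarely this linear.

```python
def managed_monthly_cost(queries: int, cost_per_million_queries: float,
                         storage_cost: float) -> float:
    """Usage-based bill: a storage charge plus a per-query charge."""
    return storage_cost + queries / 1_000_000 * cost_per_million_queries


def breakeven_queries(self_hosted_monthly: float, cost_per_million_queries: float,
                      storage_cost: float) -> float:
    """Monthly query volume above which the managed bill exceeds the flat cost."""
    if self_hosted_monthly <= storage_cost:
        return 0.0  # managed is already more expensive before a single query
    return (self_hosted_monthly - storage_cost) / cost_per_million_queries * 1_000_000


# Placeholder inputs: 800/month all-in for self-hosting,
# 50/month managed storage, 25 per million managed queries.
crossover = breakeven_queries(800.0, 25.0, 50.0)
```

With these made-up inputs the lines cross at 30 million queries a month. Note that vector count only enters through the storage term, which is why read volume dominates the answer.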

There's a cost that never shows on an invoice: the operational tax of a second datastore. If your vectors live in Pinecone and your source-of-truth rows live in Postgres, you now own the sync between them, the reindex job when an embedding model changes, and the consistency bugs when the two drift apart. Keeping everything in Postgres with pgvector deletes that whole class of problem, which is part of why it's the right default for products that aren't search-first.

## What I'd actually pick

Already running Postgres and expecting under a few million vectors? Use pgvector, and add pgvectorscale when memory gets tight. You'll move faster and your data stays consistent. If search is the product, you need keyword-plus-vector relevance, and you want the option to self-host later, start on Weaviate. If traffic is spiky, nobody wants to run a database, and the budget cares more about engineer-hours than cloud line items, Pinecone is the path of least resistance, and the serverless model suits bursty workloads well.
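Those rules of thumb fit in a small function, which is a useful way to check whether you can answer the inputs at all. This is a sketch of the reasoning above, not a formal decision procedure. The `few_million` threshold of 3,000,000 is an assumed stand-in for "a few million", and returning `None` when nothing matches is a choice made for the example.

```python
from typing import Optional


def pick_engine(*, runs_postgres: bool, expected_vectors: int,
                search_is_product: bool, needs_hybrid: bool,
                wants_zero_ops: bool, spiky_traffic: bool,
                few_million: int = 3_000_000) -> Optional[str]:
    """Apply the three rules in the order the text gives them."""
    if runs_postgres and expected_vectors < few_million:
        return "pgvector"
    if search_is_product and needs_hybrid:
        return "weaviate"
    if wants_zero_ops and spiky_traffic:
        return "pinecone"
    return None  # no rule fits: go back and size the problem


choice = pick_engine(runs_postgres=True, expected_vectors=400_000,
                     search_is_product=False, needs_hybrid=False,
                     wants_zero_ops=False, spiky_traffic=False)
```

If you can't fill in `expected_vectors` with a real number, that's the finding.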

The mistake isn't picking the wrong engine. All three return good results at the scales most teams operate at. The mistake is reaching for a specialized, separately billed, separately operated system to solve a problem your existing database would have handled, then discovering the cost six months in when the bill and the sync bugs arrive together. Size the problem first. The database falls out of that.
