A vector database stores high-dimensional embedding vectors and enables fast approximate nearest neighbor (ANN) search — finding the top-K vectors most similar to a query vector. It is the backbone of semantic search, RAG (retrieval-augmented generation), recommendation systems, and multimodal search.
Embedding Storage
Embeddings are fixed-dimension float32 arrays. Common dimensions: 384 (MiniLM), 768 (BERT), 1536 (text-embedding-3-small), 3072 (text-embedding-3-large). A single float32 vector of 1536 dimensions occupies 6 KB of raw storage.
Each vector record stores:
- vector: raw float32 bytes (or quantized int8 for compression).
- entity_id: foreign key linking back to the original document, image, or product.
- namespace: logical partition — separate index per use case (documents, images, users).
- model_version: which embedding model produced this vector.
- metadata: filterable fields (category, language, user_id, date) stored as JSONB alongside the vector.
Scalar quantization: converting float32 to int8 reduces storage by 4x with a small recall penalty (~1–2%). Binary quantization (1 bit per dimension) reduces by 32x at larger recall cost, suitable for first-pass retrieval followed by float32 re-ranking.
HNSW Index: Structure and Parameters
HNSW builds a layered proximity graph:
- Layer 0: contains all nodes with short-range edges.
- Layers 1..L: exponentially fewer nodes; each node has long-range connections for fast global navigation.
Key parameters:
- M: number of bidirectional connections per node per layer. Higher M → better recall, more memory (memory ≈ M * 2 * num_vectors * 8 bytes for pointers). Typical M = 16–64.
- ef_construction: search breadth during index build. Higher ef_construction → better graph quality → better recall at query time, but slower build. Typical ef_construction = 200–400.
- ef_search (ef at query time): candidate list size during search. Higher ef_search → better recall, slower query. Tune ef_search to hit a target recall (e.g., 0.95) at acceptable latency.
Search: enter at the top layer at a fixed entry point, greedily move to the neighbor closest to the query, repeat per layer, descend to layer 0, collect ef_search candidates, return top-K by distance.
IVF Index: Cluster Centroids and Posting Lists
IVF runs k-means clustering on the training set to produce K cluster centroids. Each vector is assigned to its nearest centroid. At query time:
- Compute distance from query vector to all K centroids (fast, K is small — typically 1024–65536).
- Select the nprobe nearest centroids.
- Scan all vectors in those nprobe posting lists, compute distances, return top-K.
IVF excels for large static datasets (billions of vectors) where the training data is representative. Recall depends on nprobe: higher nprobe approaches brute-force recall but increases latency. IVF+PQ (product quantization) further compresses vectors within each posting list for memory efficiency.
Hybrid Search: Pre-filter vs Post-filter
Pre-filter strategy: apply metadata filter first (e.g., WHERE category = 'electronics' AND language = 'en'), then run ANN on the resulting entity_id set. For small result sets, brute-force L2 scan is faster than traversing the HNSW graph on a sparse subset. Guarantees recall within the filter but is slow for large filtered sets.
Post-filter strategy: run ANN on the full index, retrieve top-K * multiplier candidates, then apply metadata filter. Fast but may yield fewer than K results. Mitigated by over-fetching (retrieve 10x K and filter down).
Hybrid approach: use the filter's selectivity to decide strategy at query time. If estimated filtered set size is under a threshold (e.g., 10,000 vectors), use brute-force pre-filter. Otherwise, use post-filter with over-fetching.
Distance Metrics
- Cosine similarity: angle between vectors; magnitude-invariant. Standard for L2-normalized text embeddings.
- Dot product: equivalent to cosine when vectors are unit-normalized; encodes magnitude when not. Used in recommendation where magnitude encodes popularity.
- L2 (Euclidean) distance: absolute spatial distance. Used when embedding magnitude is semantically meaningful.
Most embedding APIs (OpenAI, Cohere) return L2-normalized vectors. Using cosine or dot product on these is equivalent, but dot product is computationally cheaper (no normalization step at query time).
SQL DDL
-- Vector records with metadata for hybrid search
CREATE TABLE VectorRecord (
id BIGSERIAL PRIMARY KEY,
namespace VARCHAR(128) NOT NULL,
entity_id BIGINT NOT NULL,
vector BYTEA NOT NULL, -- float32 array serialized
metadata JSONB NOT NULL DEFAULT '{}',
model_version VARCHAR(64) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (namespace, entity_id, model_version)
);
CREATE INDEX idx_vector_namespace ON VectorRecord (namespace, model_version);
CREATE INDEX idx_vector_metadata ON VectorRecord USING GIN (metadata);
-- HNSW index configuration metadata
CREATE TABLE HNSWIndex (
id BIGSERIAL PRIMARY KEY,
namespace VARCHAR(128) NOT NULL,
m INTEGER NOT NULL,
ef_construction INTEGER NOT NULL,
dimension INTEGER NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (namespace)
);
Python: Core Operations
import numpy as np
import struct
from typing import Any
try:
import hnswlib
_HAS_HNSW = True
except ImportError:
_HAS_HNSW = False
# In-memory index registry: namespace -> hnswlib.Index
_indexes: dict[str, Any] = {}
def _get_or_create_index(namespace: str, dim: int, max_elements: int = 1_000_000) -> Any:
if namespace not in _indexes and _HAS_HNSW:
idx = hnswlib.Index(space='cosine', dim=dim)
idx.init_index(max_elements=max_elements, ef_construction=200, M=16)
idx.set_ef(50)
_indexes[namespace] = idx
return _indexes.get(namespace)
def upsert_vector(namespace: str, entity_id: int, vector: list[float], metadata: dict) -> None:
"""Insert or update a vector in the given namespace."""
vec = np.array(vector, dtype=np.float32)
vec = vec / (np.linalg.norm(vec) + 1e-9) # L2 normalize
idx = _get_or_create_index(namespace, len(vector))
if idx:
idx.add_items(vec.reshape(1, -1), [entity_id])
def search(namespace: str, query_vector: list[float], top_k: int,
filter_fn=None) -> list[dict]:
"""ANN search with optional post-filter function."""
q = np.array(query_vector, dtype=np.float32)
q = q / (np.linalg.norm(q) + 1e-9)
idx = _indexes.get(namespace)
if not idx:
return []
fetch_k = top_k * 10 if filter_fn else top_k
labels, distances = idx.knn_query(q.reshape(1, -1), k=min(fetch_k, idx.element_count))
results = []
for label, dist in zip(labels[0], distances[0]):
record = {'entity_id': int(label), 'score': float(1 - dist)}
if filter_fn is None or filter_fn(record):
results.append(record)
if len(results) >= top_k:
break
return results
def rebuild_index(namespace: str) -> None:
"""Drop and rebuild the HNSW index for a namespace from stored vectors."""
# In production: load all VectorRecord rows for namespace, re-add to fresh index
_indexes.pop(namespace, None)
print(f"Index {namespace} dropped; re-build from VectorRecord table required.")
Design Considerations Summary
- HNSW vs IVF: HNSW for dynamic datasets and low-latency queries; IVF for large static datasets with memory constraints.
- Pre-filter vs post-filter: choose based on filter selectivity; brute-force pre-filter for highly selective filters.
- Quantization: int8 scalar quantization gives 4x storage reduction with minimal recall loss — use for cost reduction at scale.
- Namespaces: isolate indexes per use case to avoid cross-contamination and enable independent model versioning.
- Re-indexing: model upgrades require full re-embed and re-index; run dual indexes during transition to avoid downtime.
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering
See also: Anthropic Interview Guide 2026: Process, Questions, and AI Safety
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Databricks Interview Guide 2026: Spark Internals, Delta Lake, and Lakehouse Architecture