Time-Series Databases for Quant: KDB, ClickHouse, InfluxDB, and What Quant Firms Actually Use
Quant firms generate and consume staggering volumes of time-series data: trade prints, quote updates, order book snapshots, position histories, signal scores, factor exposures, P&L attributions. A typical hedge fund’s research database holds billions of rows of tick data; HFT firms generate terabytes per day. Standard relational databases (PostgreSQL, MySQL) handle modest data volumes adequately but collapse at quant scale. Specialized time-series databases — kdb+, ClickHouse, InfluxDB, TimescaleDB, QuestDB, Parquet-based stacks — dominate quant infrastructure. For SWE and quant-developer interview candidates, understanding what these systems do and when each is appropriate is genuine domain knowledge that’s hard to acquire outside the industry.
What Makes Time-Series Different
Time-series workloads have characteristics that general-purpose databases don’t optimize for:
- Append-heavy: data is inserted in roughly time order; updates and deletes are rare.
- Time-keyed queries: “give me data from time T1 to T2” is the most common query.
- Aggregations are dominant: “average price per minute,” “total volume per day,” “VWAP per session.”
- Massive scale: billions of rows; terabytes per year per asset class.
- High-cardinality joins: joining trades to quotes by timestamp; aligning multiple instruments.
Optimizing for these workloads can yield 10–100x speedups over general-purpose databases on typical quant queries.
kdb+ / q
The dominant time-series database in finance for over two decades. Developed by KX Systems. Used by most major banks, hedge funds, and HFT firms. Programming language: q (a descendant of APL, via K).
Strengths
- Extreme performance: in-memory tables, columnar storage, vectorized operations.
- Compact code: q is famously concise; complex queries fit on a few lines.
- Native time-series operations: time-window joins, as-of joins, rolling aggregations.
- Industry standard: established tooling, large community, abundant talent (relatively).
Weaknesses
- License cost: kdb+ is commercial and expensive (six figures per year for production deployments at scale).
- q has a steep learning curve: terse syntax, idioms unlike most programming languages.
- Operational complexity: tuning, capacity planning, distributed setups require expertise.
When to use
The industry standard for tick database storage and replay at large hedge funds, banks, and HFT firms. Most commonly: a kdb+ tick database storing years of trade and quote data, queried by research and risk teams.
ClickHouse
Open-source columnar database originally built by Yandex. Increasingly popular in quant for analytics workloads.
Strengths
- Open source, free.
- SQL interface (familiar to most developers, unlike q).
- Excellent compression and query speed.
- Scales horizontally with sharding.
- Active development, growing community.
Weaknesses
- Less specialized for time-series than kdb+: it does offer ASOF JOIN, but the broader time-series toolkit is less terse than q’s.
- Newer in finance; tooling and community knowledge less mature.
- Some operational rough edges compared to mature commercial systems.
When to use
Cost-conscious teams or shops without legacy kdb+ investment; analytical workloads beyond pure tick storage (clickstream-like data, event analytics).
InfluxDB / TimescaleDB / QuestDB
Other open-source time-series databases. Each has its niche.
- InfluxDB: popular for IoT and DevOps monitoring. Has growing finance use but doesn’t match kdb+ or ClickHouse for high-volume tick data.
- TimescaleDB: PostgreSQL extension for time-series. Familiar SQL; integrates with existing Postgres deployments. Performance is good for moderate scale; not optimal for HFT-scale tick data.
- QuestDB: newer entrant designed for high-throughput finance use cases. Open source. Smaller community but growing.
When to use
For specific niches: InfluxDB for monitoring infrastructure, TimescaleDB for moderate-volume time-series alongside relational data, QuestDB as a kdb+ alternative for cost-conscious teams.
Parquet + Object Storage Stacks
Modern data engineering pattern: store time-series in Parquet (columnar file format) on object storage (S3, GCS, Azure Blob). Query with SQL engines (Spark, Trino / Presto, DuckDB) or DataFrame libraries (Pandas, Polars).
Strengths
- Cost-effective: object storage is cheap; you pay for compute only when querying.
- Open: Parquet is widely supported.
- Decouples storage from compute: query with whatever engine fits the use case.
- Plays well with modern ML stacks.
Weaknesses
- Latency is higher than purpose-built time-series databases (network access, query planning overhead).
- Not suitable for low-latency operational queries; better for batch analytics and research.
- Operational complexity of distributed query engines.
When to use
Research and analytics workloads where latency on the order of seconds is acceptable. Many systematic hedge funds use this stack for research; pair with kdb+ or ClickHouse for low-latency operational queries.
Common Interview Questions
Choose a database
“You need to store 5 years of US equity tick data (~100M rows per day). What database do you use?” Discuss kdb+ as the industry standard if budget allows; ClickHouse as a cost-effective open-source alternative; Parquet + DuckDB for a research-only setup. Strong candidates discuss latency requirements, query patterns, team familiarity, and cost.
Design a tick database schema
“Design the schema for storing trades and quotes.” Columns for timestamp, symbol, side, price, quantity (for trades) plus bid, ask, bid_size, ask_size (for quotes). Discuss compression-friendly column ordering (group similar data types together). Discuss partitioning by date for query efficiency. Discuss handling timestamp precision (microseconds vs nanoseconds).
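A minimal sketch of such a schema, using Python's built-in sqlite3 purely to make the DDL concrete — real tick stores are columnar (kdb+, ClickHouse) and partition by date rather than relying on B-tree indexes; the column names and nanosecond-integer timestamp convention here are illustrative assumptions, not any firm's actual schema:

```python
import sqlite3

# Illustrative schema only: the column layout for trades and quotes.
# Timestamps stored as integer nanoseconds since epoch, in UTC.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trades (
    ts_ns    INTEGER NOT NULL,   -- event timestamp, ns since epoch (UTC)
    symbol   TEXT    NOT NULL,
    side     TEXT,               -- 'B' / 'S' where the feed provides it
    price    REAL    NOT NULL,
    quantity INTEGER NOT NULL
);
CREATE TABLE quotes (
    ts_ns    INTEGER NOT NULL,
    symbol   TEXT    NOT NULL,
    bid      REAL,
    ask      REAL,
    bid_size INTEGER,
    ask_size INTEGER
);
-- Time-keyed queries lean on a (symbol, timestamp) ordering.
CREATE INDEX idx_trades_sym_ts ON trades (symbol, ts_ns);
CREATE INDEX idx_quotes_sym_ts ON quotes (symbol, ts_ns);
""")
conn.execute("INSERT INTO trades VALUES (?, ?, ?, ?, ?)",
             (1_700_000_000_000_000_000, "AAPL", "B", 189.25, 100))
row = conn.execute("SELECT symbol, price FROM trades").fetchone()
```

In a columnar store the same logical schema would be physically laid out column-by-column, which is what makes the compression techniques discussed below effective.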
Discuss as-of joins
“Join trades to quotes such that each trade gets the prevailing quote at the time of the trade.” Standard SQL is awkward; kdb+ has native as-of join (aj). ClickHouse has ASOF JOIN. Explain the algorithm: for each trade, binary search for the latest quote with timestamp ≤ trade time. Strong candidates discuss why this is a fundamental operation in finance.
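The binary-search algorithm described above can be sketched in a few lines of plain Python with `bisect`; the toy quote and trade data are made up for illustration, and both lists are assumed sorted by timestamp, as tick data naturally is:

```python
from bisect import bisect_right

# Hypothetical toy data: (timestamp, payload) pairs, sorted by timestamp.
quotes = [(1, {"bid": 99.0, "ask": 100.0}),
          (5, {"bid": 99.5, "ask": 100.5}),
          (9, {"bid": 99.0, "ask": 99.8})]
trades = [(4, {"price": 100.0, "qty": 50}),
          (9, {"price": 99.8, "qty": 20})]

def asof_join(trades, quotes):
    """For each trade, attach the latest quote with quote_ts <= trade_ts."""
    quote_ts = [ts for ts, _ in quotes]
    joined = []
    for ts, trade in trades:
        # bisect_right finds the insertion point; the element just before
        # it is the last quote at or before the trade timestamp.
        i = bisect_right(quote_ts, ts) - 1
        prevailing = quotes[i][1] if i >= 0 else None
        joined.append((ts, trade, prevailing))
    return joined

result = asof_join(trades, quotes)
# The trade at t=4 picks up the t=1 quote; the trade at t=9 the t=9 quote.
```

Production systems do the same thing vectorized over columnar arrays (kdb+ `aj`, ClickHouse `ASOF JOIN`, pandas `merge_asof`), but the per-trade logic is exactly this backward search.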
Compute VWAP
“Compute volume-weighted average price by symbol per day.” Aggregation: sum(price * volume) / sum(volume) per group. Trivial in any SQL-like language; discuss scaling considerations (do you compute VWAP per minute and aggregate, or aggregate raw trades?).
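The aggregation is mechanical; a minimal pure-Python sketch over made-up trades makes the formula concrete:

```python
from collections import defaultdict

# Hypothetical trades: (symbol, price, volume) tuples, one day's worth.
trades = [("AAPL", 100.0, 10), ("AAPL", 102.0, 30), ("MSFT", 50.0, 5)]

def vwap_by_symbol(trades):
    notional = defaultdict(float)   # sum(price * volume) per symbol
    volume = defaultdict(float)     # sum(volume) per symbol
    for sym, price, vol in trades:
        notional[sym] += price * vol
        volume[sym] += vol
    return {sym: notional[sym] / volume[sym] for sym in notional}

vwaps = vwap_by_symbol(trades)
# AAPL: (100*10 + 102*30) / 40 = 101.5; MSFT: 50.0
```

Note that because VWAP is a ratio of two sums, per-minute partial sums (notional and volume) can be aggregated up to daily VWAP exactly; pre-computed per-minute VWAPs cannot be averaged without re-weighting by volume.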
Discuss compression
“How does columnar compression work and why does it help time-series?” Same data type per column (run-length encoding, dictionary encoding); time-series often has correlated values (delta encoding); compression ratios of 5–20x are normal. Result: less data to read from disk, faster queries.
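A toy sketch of why delta plus run-length encoding works so well on a timestamp column — evenly spaced timestamps (assumed here at a steady 1ms cadence for illustration) delta-encode to a constant stream, which then collapses to a single run:

```python
def delta_encode(values):
    """Store the first value plus successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def run_length_encode(values):
    """Collapse runs of identical values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# 10,000 timestamps at a steady 1000ns spacing (a simplifying assumption;
# real tick timestamps are only approximately regular).
timestamps = [1_000_000 + i * 1000 for i in range(10_000)]
deltas = delta_encode(timestamps)
rle = run_length_encode(deltas[1:])
# 10,000 values reduce to one base timestamp plus a single [1000, 9999] run.
```

Real columns are less regular, so real ratios are lower — hence the quoted 5–20x rather than the extreme compression this toy case achieves — but the mechanism is the same: same-typed, correlated values per column.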
Practical Patterns
Hot vs cold storage
Recent data (today, this week) in fast in-memory storage (kdb+ in-memory, ClickHouse on SSDs). Older data in cheaper storage (kdb+ on disk, S3 / Parquet). Tiering policies move data automatically.
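The routing rule at the heart of a tiering policy can be stated in a few lines; the 7-day hot window below is an arbitrary illustrative cutoff, and a real federation layer would also split queries that straddle the boundary:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tiering rule: data newer than the cutoff is served from the
# hot store (in-memory / SSD); older data from the cold store (disk / S3).
HOT_WINDOW = timedelta(days=7)

def pick_tier(query_start, now=None):
    now = now or datetime.now(timezone.utc)
    return "hot" if now - query_start <= HOT_WINDOW else "cold"

now = datetime(2024, 1, 15, tzinfo=timezone.utc)
pick_tier(datetime(2024, 1, 14, tzinfo=timezone.utc), now)  # "hot"
pick_tier(datetime(2023, 6, 1, tzinfo=timezone.utc), now)   # "cold"
```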
Real-time + historical
Real-time tick stream into a hot database; periodic batches written to historical store. Queries that span both use a federated query that hits both systems and merges.
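Since each store returns rows sorted by timestamp, the federation layer's merge step is a streaming k-way merge; a sketch with `heapq.merge` and made-up rows:

```python
from heapq import merge

# Hypothetical federated read: the historical store holds older ticks, the
# hot store today's. Each returns (timestamp, source) rows sorted by time.
historical = [(1, "hist"), (3, "hist"), (5, "hist")]
realtime = [(6, "rt"), (7, "rt")]

# heapq.merge streams a single time-ordered result without materializing
# either input fully in memory.
combined = list(merge(historical, realtime, key=lambda row: row[0]))
# [(1, 'hist'), (3, 'hist'), (5, 'hist'), (6, 'rt'), (7, 'rt')]
```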
Snapshotting
For order book data, storing every update is expensive. Common pattern: store snapshots (full book state) periodically plus deltas between snapshots. Reconstruct intermediate states by applying deltas to the most recent snapshot.
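The reconstruction step can be sketched with a toy book representation — `{price: size}` per side, with a delta of size 0 meaning the level was removed; the data and delta format here are illustrative assumptions, not any venue's actual feed encoding:

```python
# Toy snapshot: full book state at the snapshot timestamp.
snapshot = {"bid": {99.0: 500, 98.9: 200}, "ask": {99.1: 300}}
# Deltas since the snapshot: (side, price, new_size); size 0 removes a level.
deltas = [("bid", 99.0, 450),   # partial fill at 99.0
          ("ask", 99.1, 0),     # level removed
          ("ask", 99.2, 100)]   # new level

def reconstruct(snapshot, deltas):
    """Apply deltas to the most recent snapshot to rebuild book state."""
    book = {side: dict(levels) for side, levels in snapshot.items()}
    for side, price, size in deltas:
        if size == 0:
            book[side].pop(price, None)
        else:
            book[side][price] = size
    return book

book = reconstruct(snapshot, deltas)
```

The trade-off is snapshot frequency: sparser snapshots save storage but mean replaying more deltas to reach an arbitrary intermediate timestamp.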
Time-zone handling
Time-zone bugs are endemic in finance data. Standard practice: store UTC timestamps; convert at query time if needed. Beware of daylight saving time transitions; some venues operate in local time, requiring careful conversion.
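The store-UTC, convert-on-query convention in Python's stdlib `zoneinfo`, showing why it matters: the same UTC clock time maps to different New York local times on either side of a DST transition (dates below chosen arbitrarily to straddle one):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Timestamps stored in UTC; conversion to venue-local time happens only at
# query time. New York is UTC-5 in winter (EST) but UTC-4 in summer (EDT).
ny = ZoneInfo("America/New_York")

winter = datetime(2024, 1, 10, 14, 30, tzinfo=timezone.utc).astimezone(ny)
summer = datetime(2024, 7, 10, 14, 30, tzinfo=timezone.utc).astimezone(ny)
# winter is 09:30 EST; summer is 10:30 EDT — same UTC time of day.
```

Code that stored "14:30" as a local exchange time without an explicit zone would silently shift sessions by an hour twice a year.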
Frequently Asked Questions
Do I need to learn kdb+ / q before interviewing?
Helpful but not required. Most quant firms don’t expect candidates to know kdb+ before joining; they’ll train you. But familiarity is a meaningful advantage at firms where kdb+ is dominant (most banks, many large hedge funds). For interview prep, knowing kdb+ exists, what it’s good at, and roughly how its data model works (tables as columnar arrays, q as the query language) is sufficient. Going deeper signals serious interest in finance infrastructure.
What’s the relationship between time-series databases and quant research workflows?
Tight. Quant researchers spend significant time querying time-series data: pulling history for a strategy, joining trades to quotes, computing rolling statistics, aggregating across symbols. The database performance directly affects researcher productivity. Slow queries (minutes to hours for routine pulls) drag on research velocity; fast queries enable exploration that wouldn’t be feasible otherwise. Quant firms invest heavily in time-series infrastructure for exactly this reason.
How big are quant time-series datasets in practice?
For a major hedge fund with US equity coverage: tick data alone is ~100GB per day raw, ~10GB per day compressed. Five years of history is 5–10TB. Add options (much higher message rates), futures (multiple exchanges), FX (24/7 trading), credit, and the totals climb to dozens or hundreds of TB. Major HFT firms often have petabyte-scale tick archives. Storage and access costs are real engineering concerns.
How does this differ from time-series in non-finance contexts (IoT, monitoring)?
Finance time-series has higher message rates, smaller events (a trade is a few bytes), higher emphasis on exact timestamp precision (microseconds matter), and richer query patterns (as-of joins, multi-symbol aggregations, intraday seasonality). IoT and monitoring time-series tend to have lower message rates per source but more sources; queries tend to be simpler aggregations over time windows. Different optimization targets; different DB choices.
Is the time-series database choice a strategic decision or an implementation detail?
Strategic. Switching from kdb+ to ClickHouse (or vice versa) is a multi-year project for established firms; the system is core infrastructure. For new firms, the choice shapes the team’s skill profile (kdb+ vs SQL-fluent), the cost structure (license vs commodity), and the operational model. Senior engineering candidates at quant firms should be ready to discuss these trade-offs; junior candidates should at least know that the choice exists and matters.
See also: Order Book Dynamics for Quant Interviews • Time-Series Analysis for Quant Interviews • Algorithmic Trading System Architecture