Question 1

When should you use full refresh vs incremental refresh for a materialized view?

Accepted Answer

Use full refresh when the source dataset is small enough that a full re-scan is affordable, or when the view query is too complex for deltas to be applied reliably. Use incremental refresh when the source table is large and only a small fraction of rows change between refreshes. Incremental refresh requires a reliable change-tracking mechanism such as an updated_at column or CDC. Note that polling an updated_at column alone cannot see hard-deleted rows, so deletes need soft-delete markers or CDC.
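The rule of thumb above can be sketched as a small decision function. This is only an illustration: the function name, its parameters, and both default thresholds (100,000 rows and a 5% changed fraction) are hypothetical and would need tuning per workload.

```python
def choose_refresh_mode(
    total_rows: int,
    changed_rows: int,
    has_change_tracking: bool,
    query_supports_delta: bool = True,
    small_table_rows: int = 100_000,      # hypothetical cutoff for "small"
    max_changed_fraction: float = 0.05,   # hypothetical cutoff for "few changes"
) -> str:
    """Return "full" or "incremental" for the next refresh."""
    # Without change tracking, or with a query that cannot be maintained
    # from deltas, a full recomputation is the only safe option.
    if not has_change_tracking or not query_supports_delta:
        return "full"
    # Small sources are cheap to re-scan, so keep the simpler strategy.
    if total_rows <= small_table_rows:
        return "full"
    # Large source: incremental pays off only if few rows changed.
    changed_fraction = changed_rows / total_rows
    return "incremental" if changed_fraction <= max_changed_fraction else "full"


print(choose_refresh_mode(10_000_000, 1_000, has_change_tracking=True))
```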

Question 2

How does CDC integrate with a materialized view?

Accepted Answer

CDC emits a change event (INSERT, UPDATE, DELETE) for every row modification in the source table. The materialized view service consumes these events from a stream (e.g., Kafka) and applies each delta to the view table. For a view that mirrors source rows, INSERTs add new rows, UPDATEs modify existing rows, and DELETEs remove rows; aggregated views instead adjust the affected group's precomputed values. This enables near-real-time materialized views whose staleness can be sub-second, depending on pipeline latency and consumer lag.
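A minimal sketch of applying such events to a row-mirroring view, using an in-memory dict in place of the view table. The event shape (`op`, `key`, `row`) is an assumption made for illustration, not a real CDC format, and each event is assumed to carry the full row image. INSERT and UPDATE are both treated as upserts here, a design choice that tolerates redelivered events.

```python
from typing import Any

# Stand-in for the view table: primary key -> row.
View = dict[int, dict[str, Any]]


def apply_change_event(view: View, event: dict[str, Any]) -> None:
    """Apply one row-level change event to the view in place."""
    op = event["op"]
    key = event["key"]
    if op in ("INSERT", "UPDATE"):
        # Upsert the full row image carried by the event.
        view[key] = dict(event["row"])
    elif op == "DELETE":
        # Deleting a missing key is a no-op, so replays are harmless.
        view.pop(key, None)
    else:
        raise ValueError(f"unknown op: {op!r}")


view: View = {}
events = [
    {"op": "INSERT", "key": 1, "row": {"id": 1, "total": 10}},
    {"op": "INSERT", "key": 2, "row": {"id": 2, "total": 5}},
    {"op": "UPDATE", "key": 2, "row": {"id": 2, "total": 7}},
    {"op": "DELETE", "key": 1},
]
for event in events:
    apply_change_event(view, event)
print(view)
```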

Question 3

How do you prevent concurrent refresh jobs on the same materialized view?

Accepted Answer

Use a refresh flag in the metadata table, set atomically with a conditional UPDATE ... WHERE is_refreshing = FALSE. If zero rows are updated, a refresh is already in progress and the new job should skip. Alternatively, use a database advisory lock keyed to the view name. In either case, release the flag or lock when the refresh completes or fails (for example, in a finally block).
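The flag approach can be sketched with the standard-library sqlite3 module. The table name `mv_metadata`, the `view_name` column, and the helper names are assumptions for illustration; only the `is_refreshing` flag and the conditional UPDATE come from the answer above. SQLite has no boolean type, so the flag is stored as 0 or 1.

```python
import sqlite3
from typing import Callable


def try_acquire(conn: sqlite3.Connection, view_name: str) -> bool:
    """Set the flag only if it is currently clear; True means we own the refresh."""
    cur = conn.execute(
        "UPDATE mv_metadata SET is_refreshing = 1 "
        "WHERE view_name = ? AND is_refreshing = 0",
        (view_name,),
    )
    conn.commit()
    # Zero updated rows means another job already holds the flag.
    return cur.rowcount == 1


def release(conn: sqlite3.Connection, view_name: str) -> None:
    conn.execute(
        "UPDATE mv_metadata SET is_refreshing = 0 WHERE view_name = ?",
        (view_name,),
    )
    conn.commit()


def refresh_once(conn: sqlite3.Connection, view_name: str,
                 do_refresh: Callable[[], None]) -> bool:
    """Run do_refresh unless a refresh is in progress; return whether it ran."""
    if not try_acquire(conn, view_name):
        return False
    try:
        do_refresh()
        return True
    finally:
        # Clear the flag on success and on error alike.
        release(conn, view_name)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mv_metadata ("
    "view_name TEXT PRIMARY KEY, is_refreshing INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO mv_metadata (view_name) VALUES ('daily_sales')")
conn.commit()
print(refresh_once(conn, "daily_sales", lambda: None))
```

One caveat with a plain flag: a process that is killed never reaches the finally block, so the flag can stay set. A timeout or heartbeat column is a common mitigation, and session-level advisory locks in PostgreSQL avoid the problem because they are released when the session ends.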

Question 4

How should a staleness threshold be chosen for a materialized view?

Accepted Answer

The staleness threshold should reflect the acceptable data age for the consumers of the view. For real-time dashboards, a threshold of seconds to minutes may be required, which drives CDC-based refresh. For daily reporting views, hours may be acceptable. The threshold should also account for the cost of falling back to base tables when the view is too stale: if base table queries are very expensive, a more aggressive refresh strategy is warranted so that fallbacks stay rare.

Question 5

How is a materialized view incrementally refreshed?

Accepted Answer

Incremental refresh applies only the delta of changes (inserts, updates, deletes) from the base tables since the last refresh, rather than recomputing the entire view from scratch. The system maintains auxiliary state such as row counts or partial aggregates per group key so that adding or removing rows updates the precomputed result in O(delta) rather than O(total).
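As an illustration of the auxiliary state described above, the sketch below maintains a per-group average from a running count and sum, so each changed row costs O(1) work regardless of how many rows the group holds. The class and method names are hypothetical. An update is modelled as a delete of the old row followed by an insert of the new one.

```python
class GroupedAverage:
    """Average per group key, maintained from deltas via count and sum."""

    def __init__(self) -> None:
        self.count: dict[str, int] = {}
        self.total: dict[str, float] = {}

    def apply(self, group: str, amount: float, sign: int) -> None:
        """sign is +1 for an inserted row and -1 for a deleted row."""
        new_count = self.count.get(group, 0) + sign
        if new_count < 0:
            raise ValueError(f"delete without matching insert for {group!r}")
        if new_count == 0:
            # Last row of the group was removed: drop the group entirely.
            self.count.pop(group, None)
            self.total.pop(group, None)
            return
        self.count[group] = new_count
        self.total[group] = self.total.get(group, 0.0) + sign * amount

    def average(self, group: str) -> float:
        return self.total[group] / self.count[group]


agg = GroupedAverage()
agg.apply("eu", 10.0, +1)   # insert
agg.apply("eu", 20.0, +1)   # insert
agg.apply("us", 5.0, +1)    # insert
agg.apply("eu", 10.0, -1)   # delete
agg.apply("us", 5.0, -1)    # update, part 1: remove the old value
agg.apply("us", 9.0, +1)    # update, part 2: add the new value
print(agg.average("eu"), agg.average("us"))
```

Count and sum work because both are invertible under deletion. Aggregates such as MIN or MAX generally need more state, since removing the current extreme requires knowing the next one.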

Question 6

How does CDC trigger view updates?

Accepted Answer

Change Data Capture (CDC) reads the database replication log (e.g., PostgreSQL WAL or MySQL binlog) and emits a stream of row-level change events that a view-maintenance service consumes to apply incremental updates to the materialized view. This decouples the refresh from the write path and allows the view to lag behind the source by a bounded amount that can be monitored via consumer lag metrics.
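The decoupling and the lag metric can be sketched with an in-memory list standing in for the replication log or event stream. `ChangeLog` and `ViewMaintainer` are hypothetical names, and real consumers would read from the log via a CDC tool rather than a Python list. Writers only append; the maintainer polls at its own pace, and lag is the number of events not yet applied.

```python
from typing import Any


class ChangeLog:
    """In-memory stand-in for a replication log or event stream."""

    def __init__(self) -> None:
        self.events: list[tuple[str, Any]] = []

    def append(self, key: str, value: Any) -> None:
        # The write path only appends; it never touches the view.
        self.events.append((key, value))


class ViewMaintainer:
    """Consumes the log independently of writers and tracks its position."""

    def __init__(self, log: ChangeLog) -> None:
        self.log = log
        self.offset = 0                  # number of events applied so far
        self.view: dict[str, Any] = {}

    def poll(self, max_events: int) -> int:
        """Apply up to max_events pending events; return how many were applied."""
        batch = self.log.events[self.offset:self.offset + max_events]
        for key, value in batch:
            self.view[key] = value
        self.offset += len(batch)
        return len(batch)

    def lag(self) -> int:
        """Events written to the log but not yet reflected in the view."""
        return len(self.log.events) - self.offset


log = ChangeLog()
for i in range(5):
    log.append(f"k{i}", i)

maintainer = ViewMaintainer(log)
maintainer.poll(max_events=3)
print(maintainer.lag())
```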

Question 7

How are query routing decisions made between the base tables and the materialized view?

Accepted Answer

A query router inspects the query's predicate and aggregation pattern and routes to the materialized view when the query can be answered entirely from precomputed data and the acceptable staleness bound is not violated. Queries that require freshness beyond the view's lag guarantee, or that filter on dimensions not covered by the view, fall through to the base tables.
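A minimal sketch of that routing rule, assuming the router has already extracted the columns a query filters or groups on and the measures it aggregates. The `Query` and `ViewInfo` types and their fields are hypothetical; a real router would derive them from the parsed query and the view's metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewInfo:
    dimensions: frozenset[str]   # columns the view is grouped by
    measures: frozenset[str]     # aggregates the view precomputes
    lag_seconds: float           # current staleness of the view


@dataclass(frozen=True)
class Query:
    dimensions: frozenset[str]   # columns the query filters or groups on
    measures: frozenset[str]     # aggregates the query asks for
    max_staleness_seconds: float


def route(query: Query, view: ViewInfo) -> str:
    """Return "view" if the view can answer the query, else "base"."""
    covered = (query.dimensions <= view.dimensions
               and query.measures <= view.measures)
    fresh_enough = view.lag_seconds <= query.max_staleness_seconds
    return "view" if covered and fresh_enough else "base"


sales_view = ViewInfo(
    dimensions=frozenset({"region", "day"}),
    measures=frozenset({"sum_amount", "order_count"}),
    lag_seconds=30.0,
)
dashboard = Query(frozenset({"region"}), frozenset({"sum_amount"}), 60.0)
print(route(dashboard, sales_view))
```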

Question 8

How is view staleness bounded?

Accepted Answer

Staleness is bounded by publishing a watermark alongside the view data: the highest source log sequence number that is fully reflected in the view. The query router rejects the view when the watermark lag exceeds the configured SLA. Operators set alerting thresholds on consumer lag and can apply backpressure or trigger a full refresh when incremental lag grows beyond acceptable limits.
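The watermark check can be sketched as two small functions. For simplicity this assumes lag is measured in log positions (source head LSN minus watermark LSN); an SLA expressed in seconds would compare commit timestamps instead. The function names and threshold parameters are hypothetical.

```python
def watermark_lag(watermark_lsn: int, source_head_lsn: int) -> int:
    """Log positions written at the source but not yet reflected in the view."""
    return max(0, source_head_lsn - watermark_lsn)


def view_usable(watermark_lsn: int, source_head_lsn: int, max_lag: int) -> bool:
    """Router-side check: accept the view only while lag is within the SLA."""
    return watermark_lag(watermark_lsn, source_head_lsn) <= max_lag


def lag_action(lag: int, alert_at: int, full_refresh_at: int) -> str:
    """Operator-side policy: escalate as incremental lag grows."""
    if lag >= full_refresh_at:
        return "full_refresh"
    if lag >= alert_at:
        return "alert"
    return "ok"


lag = watermark_lag(watermark_lsn=9_400, source_head_lsn=10_000)
print(lag, view_usable(9_400, 10_000, max_lag=500),
      lag_action(lag, alert_at=250, full_refresh_at=5_000))
```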

Materialized View Low-Level Design: Incremental Refresh, Change Data Capture, and Query Routing

What Is a Materialized View?

Full Refresh

Incremental Refresh

CDC-Based Real-Time Updates

Staleness Tracking

Concurrent Refresh Prevention

Query Routing

SQL Schema

Python Implementation Sketch