Question 1

How does Debezium read database changes without impacting the source database?

Accepted Answer

Debezium connects to PostgreSQL as a replication client using the PostgreSQL streaming replication protocol. PostgreSQL writes every change to its Write-Ahead Log (WAL) before applying it to the data files, and it does so whether or not any replication client is connected. Debezium reads the WAL through a logical replication slot, where a logical decoding output plugin (such as pgoutput) turns raw WAL records into row-level change events, and then publishes those events to Kafka. The WAL is written anyway for crash recovery and streaming replication; Debezium simply reads it. The impact on the source database is small but not zero: wal_level must be set to logical (which adds some extra information to the WAL), decoding costs a little CPU, and WAL must be retained until Debezium's slot has consumed it (risk: the disk fills if Debezium falls behind). There are no triggers, no polling queries, and no extra writes to application tables.
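To make the setup concrete, here is a minimal sketch of the JSON body you might send to Kafka Connect's POST /connectors endpoint to register a Debezium PostgreSQL connector. The property keys are standard Debezium connector options (topic.prefix is the Debezium 2.x name). The host, credentials, prefix, slot name, and table names are hypothetical placeholders, and build_connector_config is a helper invented for this example.

```python
import json


def build_connector_config(name, host, dbname, user, password, tables):
    """Build the request body for Kafka Connect's POST /connectors endpoint.

    All values passed in are illustrative; adjust them to your environment.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            # pgoutput is PostgreSQL's built-in logical decoding output plugin.
            "plugin.name": "pgoutput",
            "database.hostname": host,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            # Hypothetical prefix; topics become <prefix>.<schema>.<table>.
            "topic.prefix": "shop",
            # Hypothetical name of the logical replication slot Debezium uses.
            "slot.name": "debezium_orders",
            # Only these tables are captured.
            "table.include.list": ",".join(tables),
        },
    }


payload = build_connector_config(
    name="orders-connector",
    host="db.internal.example",
    dbname="shop",
    user="debezium",
    password="change-me",
    tables=["public.orders", "public.customers"],
)
print(json.dumps(payload, indent=2))
```

The database user needs replication privileges, and the slot named here is the one whose lag you would later monitor.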

Question 2

What is the difference between CDC and the outbox pattern for event publishing?

Accepted Answer

The outbox pattern writes events to an Outbox table in the same DB transaction as the business data; a relay then reads the Outbox table and publishes to Kafka. The application explicitly controls which events are published and in what format. CDC reads every change to every configured table from the WAL: you get a stream of all inserts, updates, and deletes, including changes made by migrations, admin scripts, or other services. CDC is more complete (it captures everything) but noisier (consumers must filter for relevant events, and the events mirror the internal table schema). Outbox is more targeted (only the events you explicitly write). Combined approach (recommended): use the outbox pattern to write intent events, then use Debezium's outbox event router to publish them from the WAL. You get explicit event control with log-based delivery that needs no polling relay.
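The key property of the outbox pattern is that the business row and the event row commit or roll back together. The sketch below illustrates that with Python's built-in sqlite3 module standing in for PostgreSQL. The table layout, the place_order helper, and the OrderPlaced event type are hypothetical; the outbox column names follow what are, to my understanding, the Debezium outbox event router's defaults, so treat them as an assumption.

```python
import json
import sqlite3
import uuid


def create_schema(conn):
    """Create a business table and an outbox table (illustrative layout)."""
    conn.execute(
        "CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE outbox ("
        "id TEXT PRIMARY KEY, aggregatetype TEXT NOT NULL, "
        "aggregateid TEXT NOT NULL, type TEXT NOT NULL, payload TEXT NOT NULL)"
    )


def place_order(conn, order_id, total_cents, event_id=None):
    """Insert the order and its event in one transaction."""
    event_id = event_id or str(uuid.uuid4())
    payload = json.dumps({"orderId": order_id, "totalCents": total_cents})
    # 'with conn' commits if the block succeeds and rolls back if it raises,
    # so a failed outbox insert also undoes the order insert.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
            (order_id, total_cents),
        )
        conn.execute(
            "INSERT INTO outbox (id, aggregatetype, aggregateid, type, payload) "
            "VALUES (?, ?, ?, ?, ?)",
            (event_id, "order", order_id, "OrderPlaced", payload),
        )


demo = sqlite3.connect(":memory:")
create_schema(demo)
place_order(demo, "o-100", 2599)
print(demo.execute("SELECT aggregateid, type FROM outbox").fetchall())
```

With Debezium capturing the outbox table, the application never talks to Kafka directly; the row written here is the event.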

Question 3

How do you handle schema changes in CDC event streams?

Accepted Answer

When you add or rename a column in the source table, the CDC event schema changes, and downstream consumers that expect the old schema may fail. Solution: use Confluent Schema Registry with Avro serialization. Every serialized event carries a schema ID; the consumer fetches the schema by ID and deserializes accordingly. Schema Registry enforces compatibility rules: BACKWARD means consumers using the new schema can read events written with the previous schema (for example, after adding a field that has a default); FORWARD means consumers using the old schema can read events written with the new schema. Use BACKWARD_TRANSITIVE for most cases; it checks a new schema against all earlier versions rather than only the latest. For breaking changes (removing or renaming a column that consumers depend on, changing a type), version the topic (orders.v2) and migrate consumers before decommissioning the old topic. Never make breaking schema changes without a migration plan for all consumers.
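The schema ID travels in a small header at the front of each message. In the Confluent wire format this is one magic byte (0) followed by a 4-byte big-endian schema ID, then the Avro-encoded body. The sketch below only splits that header off; in practice the Confluent deserializer does this and the registry lookup for you. The function name split_confluent_message is invented for this example.

```python
import struct

MAGIC_BYTE = 0      # first byte of every Confluent-framed message
HEADER_SIZE = 5     # 1 magic byte + 4-byte schema ID


def split_confluent_message(raw: bytes):
    """Return (schema_id, avro_body) for a Confluent-framed message."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("message too short to contain a schema ID header")
    # '>' = big-endian with no padding, 'b' = 1 signed byte, 'I' = 4-byte unsigned int
    magic, schema_id = struct.unpack(">bI", raw[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, raw[HEADER_SIZE:]


# Build a fake message with schema ID 42 and a two-byte placeholder body.
example = struct.pack(">bI", MAGIC_BYTE, 42) + b"\x02\x06"
schema_id, body = split_confluent_message(example)
print(schema_id, body)
```

A consumer would use the returned schema ID to fetch (and cache) the writer schema from the registry before decoding the body.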

Question 4

How do you prevent WAL accumulation from causing disk exhaustion?

Accepted Answer

PostgreSQL retains WAL segments until every replication slot has consumed them. If Debezium falls behind (slow connector, Kafka backpressure, Debezium downtime), WAL grows without bound. Prevention: (1) Monitor replication slot lag: SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag FROM pg_replication_slots. Alert if lag > 1GB. (2) Treat wal_keep_size (PostgreSQL 13+) as a minimum WAL retention floor that is separate from slot retention; it does not cap what a slot can hold back. (3) Set max_slot_wal_keep_size (PostgreSQL 13+; the default of -1 means unlimited). If a slot falls further behind than this limit, PostgreSQL removes the WAL the slot still needs and invalidates the slot (its wal_status becomes lost), which prevents disk exhaustion at the cost of recreating the slot and taking a full resnapshot. (4) Monitor Debezium's own streaming lag metrics and the lag of downstream Kafka consumer groups.
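If you collect LSNs in a monitoring script rather than computing the difference in SQL, the same lag figure can be derived client-side. An LSN is printed as two hexadecimal numbers separated by a slash (the high and low 32 bits of a 64-bit byte position), so subtracting two of them gives the lag in bytes. This is a minimal sketch; the function names and the 1 GiB threshold constant are choices made for this example, mirroring the alert rule above.

```python
ALERT_THRESHOLD_BYTES = 1024 ** 3  # 1 GiB, matching the alert rule in the text


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/16B3748' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_lag_bytes(current_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL written since the slot's confirmed flush position."""
    return lsn_to_int(current_lsn) - lsn_to_int(confirmed_flush_lsn)


def should_alert(current_lsn: str, confirmed_flush_lsn: str) -> bool:
    """True when the slot is more than the threshold behind."""
    return slot_lag_bytes(current_lsn, confirmed_flush_lsn) > ALERT_THRESHOLD_BYTES


# Hypothetical values as they might come back from pg_replication_slots.
print(slot_lag_bytes("1/1", "0/C0000000"), should_alert("1/1", "0/C0000000"))
```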

Question 5

How does the initial snapshot work when first enabling CDC?

Accepted Answer

When Debezium first starts for a table, it cannot just read the WAL from the beginning, because the WAL does not retain full history. Instead it performs a consistent snapshot: it opens a transaction with REPEATABLE READ isolation, reads all existing rows and emits them as CDC events with "op": "r" (read), records the WAL LSN that corresponds to the snapshot point, then switches to streaming live changes from that LSN forward. This ensures no gap between the snapshot and live changes. For large tables (100M+ rows), the snapshot can take hours. During the snapshot the table is not locked against writes (Debezium uses REPEATABLE READ, not an exclusive lock), but the replication slot already exists, so WAL accumulates until streaming begins. Schedule initial snapshots during low-traffic windows.
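From a consumer's point of view, snapshot rows and live changes arrive on the same topic and differ only in the op field. The sketch below applies a stream of simplified events to an in-memory replica. It assumes each event is a dict with Debezium's before, after, and op fields, that the table has a primary key column named id, and that only r, c, u, and d operations matter; the apply_event helper and the sample rows are hypothetical.

```python
def apply_event(replica: dict, event: dict) -> None:
    """Apply one simplified change event to an in-memory copy of the table."""
    op = event["op"]
    if op in ("r", "c", "u"):
        # Snapshot read, insert, and update all carry the new row in 'after'.
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":
        # Deletes carry the old row in 'before'; ignore rows we never saw.
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


events = [
    # Emitted during the initial snapshot.
    {"op": "r", "before": None, "after": {"id": 1, "status": "NEW"}},
    {"op": "r", "before": None, "after": {"id": 2, "status": "NEW"}},
    # Emitted afterwards from the WAL stream.
    {"op": "u", "before": {"id": 1, "status": "NEW"}, "after": {"id": 1, "status": "PAID"}},
    {"op": "c", "before": None, "after": {"id": 3, "status": "NEW"}},
    {"op": "d", "before": {"id": 2, "status": "NEW"}, "after": None},
]

replica = {}
for event in events:
    apply_event(replica, event)
print(replica)
```

Because r and c are handled as upserts, replaying the snapshot after a restart leaves the replica in the same state.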

Change Data Capture (CDC) Low-Level Design: Real-Time Database Streaming

How CDC Works at the Database Level

Debezium CDC Event Schema

Common CDC Use Cases

Handling Ordering and Exactly-Once Semantics

Key Interview Points