Question 1

Why do message queues deliver events at-least-once instead of exactly-once?

Accepted Answer

Exactly-once delivery would require the message broker and every consumer to agree, atomically, that a message was both received and processed. That needs two-phase commit or an equivalent coordination protocol, which is expensive and fragile under failures. Most brokers, including Kafka and SQS standard queues in their default configuration, therefore guarantee at-least-once: the message will be delivered, but it may be delivered more than once, for example when a consumer crashes after processing but before acknowledging. Kafka transactions and SQS FIFO queues narrow the gap, but only inside the broker's own boundary; side effects in external systems still need protection. Designing consumers to be idempotent is far simpler and more scalable than requiring exactly-once delivery from the broker.
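The crash-before-acknowledge case can be illustrated with a toy simulation. This is only a sketch: `Broker` and `run_consumer` are hypothetical in-memory stand-ins, not the API of Kafka, SQS, or any real client library.

```python
class Broker:
    """Toy in-memory queue: a message stays queued until it is acknowledged."""

    def __init__(self, messages):
        self.pending = list(messages)

    def poll(self):
        # Redeliver the oldest unacknowledged message, or None when drained.
        return self.pending[0] if self.pending else None

    def ack(self, message):
        self.pending.remove(message)


def run_consumer(broker, handled, crash_before_ack_once):
    """Process messages; optionally 'crash' once after processing but before ack."""
    crashed = False
    while (msg := broker.poll()) is not None:
        handled.append(msg)  # the side effect happens here
        if crash_before_ack_once and not crashed:
            crashed = True
            continue  # simulated crash: the ack is never sent
        broker.ack(msg)


handled = []
run_consumer(Broker(["evt-1", "evt-2"]), handled, crash_before_ack_once=True)
print(handled)  # ['evt-1', 'evt-1', 'evt-2']
```

The first event is handled twice because the broker cannot tell "processed but not acknowledged" apart from "never processed", which is why the consumer has to tolerate duplicates.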

Question 2

How does INSERT ON CONFLICT DO NOTHING prevent duplicate event processing?

Accepted Answer

Store a ProcessedEvent record keyed by (event_id, processor_id). Before processing, attempt INSERT INTO ProcessedEvent ... ON CONFLICT DO NOTHING. If the INSERT affects one row (rowcount=1), this is the first time this processor has seen the event: proceed with processing. If the INSERT is a no-op (rowcount=0), the event was already processed: skip it. The check is atomic: two concurrent workers handling the same event will both attempt the INSERT, but only one will succeed, because the database PRIMARY KEY constraint enforces mutual exclusion at the storage layer. Run the INSERT and the event's side effects in the same database transaction; otherwise a crash between the INSERT and the processing leaves the event marked as processed even though its work never happened, and the redelivered copy is skipped.
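A minimal sketch of this check, using Python's built-in `sqlite3` so it runs without a server (SQLite 3.24 or later accepts `ON CONFLICT DO NOTHING`). The `processed_event` and `account` tables, the `handle_once` function, and the credit side effect are illustrative assumptions, not a schema from the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE processed_event (
        event_id     TEXT NOT NULL,
        processor_id TEXT NOT NULL,
        PRIMARY KEY (event_id, processor_id)
    )
""")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('acct-1', 0)")
conn.commit()


def handle_once(conn, event_id, processor_id, amount):
    """Apply a credit at most once per (event_id, processor_id).

    Returns True if the credit was applied, False for a duplicate delivery.
    """
    with conn:  # one transaction: commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO processed_event (event_id, processor_id) VALUES (?, ?) "
            "ON CONFLICT DO NOTHING",
            (event_id, processor_id),
        )
        if cur.rowcount == 0:
            return False  # marker row already exists: skip the side effect
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = 'acct-1'",
            (amount,),
        )
        return True


print(handle_once(conn, "evt-1", "billing", 50))  # True
print(handle_once(conn, "evt-1", "billing", 50))  # False
```

Because the marker row and the balance update commit together, a failure partway through rolls back both, and the redelivered event is processed normally.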

Question 3

When should you use Redis SET NX for deduplication versus a database table?

Accepted Answer

Redis SET with the NX option (set only if the key does not exist) is atomic and typically much faster than a durable database write (roughly 10x is a common rule of thumb, though the real gap depends on hardware and configuration). It works well for high-throughput deduplication where occasional duplicates are acceptable. Use Redis when: event rate is very high (>10K/sec), strict durability is not required, and TTL-based cleanup is acceptable. Use a database ProcessedEvent table when: events are financially sensitive (payments, inventory), durability is required (Redis can lose data on restart without AOF/RDB persistence), or you need a queryable audit log of processed events. For payments: DB. For analytics event dedup: Redis.
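The Redis variant might look like the sketch below. `first_seen` only relies on a client exposing `set(name, value, nx=True, ex=seconds)`, which is the shape of redis-py's `Redis.set`. `FakeRedis` is a hypothetical in-memory stand-in so the example runs without a server, and the `dedup:` key prefix and one-hour TTL are assumed values.

```python
import time


class FakeRedis:
    """In-memory stand-in (hypothetical) mimicking only set(name, value, nx=, ex=)."""

    def __init__(self):
        self._store = {}  # name -> (value, expiry as a monotonic timestamp)

    def set(self, name, value, nx=False, ex=None):
        now = time.monotonic()
        entry = self._store.get(name)
        if entry is not None and entry[1] <= now:
            del self._store[name]  # lazily expire the old key
            entry = None
        if nx and entry is not None:
            return None  # redis-py also returns None when NX blocks the write
        expires_at = now + ex if ex is not None else float("inf")
        self._store[name] = (value, expires_at)
        return True


def first_seen(client, event_id, ttl_seconds=3600):
    """Return True only for the first caller to claim this event ID within the TTL."""
    return bool(client.set(f"dedup:{event_id}", "1", nx=True, ex=ttl_seconds))


client = FakeRedis()
print(first_seen(client, "evt-1"))  # True
print(first_seen(client, "evt-1"))  # False
```

Setting the value and the TTL in a single command matters: a separate expire call after the set could be lost in a crash, leaving a key that never expires.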

Question 4

How do you clean up old deduplication records efficiently?

Accepted Answer

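Use table partitioning by time: CREATE TABLE processed_events_2024_01 PARTITION OF ProcessedEvent FOR VALUES FROM ('2024-01-01') TO ('2024-02-01'). At the start of each month, DROP the oldest partition. Dropping a partition is largely a metadata operation: it typically completes in milliseconds regardless of row count. Contrast with DELETE WHERE processed_at < '2024-01-01': on a table with 100M rows, this DELETE can run for hours and generates massive WAL writes, impacting production query performance. Prefer partitioning for append-only tables with time-based retention. One caveat in PostgreSQL, whose syntax is used here: a primary key on a partitioned table must include the partition key column, so the key becomes (event_id, processor_id, processed_at) and the partition column needs a value that is stable across redeliveries (for example the producer's event timestamp) for the uniqueness check to keep catching duplicates.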
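A small helper can generate the monthly statements for a scheduled job. This sketch only builds SQL strings and does not connect to a database; the function names and the `processed_events` name prefix are assumptions modeled on the example above.

```python
from datetime import date


def month_bounds(year, month):
    """Return (first day of the month, first day of the next month)."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def partition_name(year, month, prefix="processed_events"):
    return f"{prefix}_{year:04d}_{month:02d}"


def create_partition_sql(year, month, parent="ProcessedEvent"):
    """DDL for one monthly range partition; the upper bound is exclusive."""
    start, end = month_bounds(year, month)
    return (
        f"CREATE TABLE {partition_name(year, month)} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start}') TO ('{end}');"
    )


def drop_partition_sql(year, month):
    """DDL that discards a whole month of dedup records at once."""
    return f"DROP TABLE IF EXISTS {partition_name(year, month)};"


print(create_partition_sql(2024, 1))
print(drop_partition_sql(2023, 12))
```

Creating next month's partition ahead of time avoids insert failures at the month boundary, since a row with no matching partition is rejected unless a default partition exists.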

Question 5

How do you generate stable event IDs for deduplication?

Accepted Answer

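The producer must assign event IDs before publishing; the consumer should never generate the deduplication key, because it cannot tell a redelivery from a genuinely new event. Best practices: (1) UUID v4 for individual user actions (a button click generates one unique event), generated once and reused on every retry. (2) Deterministic ID for computed events: SHA-256(entity_id + event_type + timestamp_bucket), with a delimiter between the fields so different inputs cannot concatenate to the same string, ensures the same logical event always produces the same ID. (3) Outbox pattern: INSERT the event into an Outbox table within the same transaction as the business logic change. The outbox row's primary key becomes the event ID. This guarantees the event is recorded if and only if the business change commits. The relay that publishes outbox rows still delivers at-least-once, but because the ID is stable, replays are idempotent.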
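The three approaches could be sketched as follows. The function names, the 60-second bucket, the unit-separator delimiter, and the `orders`/`outbox` tables are illustrative assumptions; `sqlite3` stands in for the production database.

```python
import hashlib
import json
import sqlite3
import uuid


def user_action_event_id():
    """(1) Random ID for a one-off user action; generate once, reuse on retries."""
    return str(uuid.uuid4())


def computed_event_id(entity_id, event_type, timestamp, bucket_seconds=60):
    """(2) Deterministic ID: same entity, type, and time bucket give the same hash."""
    bucket = int(timestamp) // bucket_seconds
    # The delimiter keeps ("ab", "c") and ("a", "bc") from hashing identically.
    material = "\x1f".join([str(entity_id), event_type, str(bucket)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("CREATE TABLE outbox (event_id TEXT PRIMARY KEY, payload TEXT NOT NULL)")


def place_order(conn, order_id):
    """(3) Outbox: the business row and the event row commit or roll back together."""
    event_id = user_action_event_id()
    payload = json.dumps({"type": "order_placed", "order_id": order_id})
    with conn:  # single transaction
        conn.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        conn.execute("INSERT INTO outbox VALUES (?, ?)", (event_id, payload))
    return event_id


a = computed_event_id("order-42", "total_recomputed", 1700000005)
b = computed_event_id("order-42", "total_recomputed", 1700000030)
print(a == b)  # True: both timestamps fall in the same 60-second bucket
```

Bucketing trades precision for stability: two recomputations just either side of a bucket boundary get different IDs, so the bucket size should be chosen to match how the computed event is actually triggered.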

Event Deduplication System Low-Level Design

The Problem: At-Least-Once Delivery

Core Data Model

Basic Deduplication with INSERT ON CONFLICT

Deduplication with Result Caching

Redis-Based Deduplication (High Throughput)

Window-Based Deduplication

Key Interview Points