Event Deduplication System — Low-Level Design
An event deduplication system ensures that events redelivered by at-least-once queues are applied only once, giving effectively-once processing on top of at-least-once delivery. It is critical for payment processing, inventory updates, and any operation that must not be applied twice. The design applies to Kafka, SQS, and message queues in general, and comes up in system design interviews at companies like Amazon, Stripe, and Uber.
The Problem: At-Least-Once Delivery
Most message queues guarantee at-least-once delivery:
- Kafka: if a consumer crashes before committing its offset, messages are replayed on restart
- SQS: if a consumer doesn't delete a message within the visibility timeout, SQS redelivers it
- Webhooks: the provider retries on timeout or any non-2xx response
Result: consumer code may receive the same event_id multiple times. Any non-idempotent operation (increment counter, debit account, send email) applied twice causes bugs.
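To make the failure concrete, here is a toy sketch (the handler, account, and event fields are illustrative, not from any real system) of a non-idempotent debit handler receiving the same event twice:

```python
# Toy illustration: a non-idempotent handler double-applies a redelivered event.
balance = {"acct-1": 100}

def debit(event):
    # Non-idempotent: every call mutates state, duplicates included.
    balance[event["account"]] -= event["amount"]

event = {"event_id": "evt-42", "account": "acct-1", "amount": 30}
debit(event)   # first delivery
debit(event)   # redelivery after a missed ack
print(balance["acct-1"])  # 40 -- the customer was debited twice
```

Deduplicating on event_id is what turns the second delivery into a no-op.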
Core Data Model
ProcessedEvent

CREATE TABLE ProcessedEvent (
    event_id     TEXT NOT NULL,   -- producer-assigned unique identifier
    event_type   TEXT NOT NULL,   -- 'order.created', 'payment.charged'
    processor_id TEXT NOT NULL,   -- which consumer group processed it
    processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    result       JSONB,           -- optional: store result for replay
    PRIMARY KEY (event_id, processor_id)
);

-- Optional: declare the table PARTITION BY RANGE (processed_at) and create
-- monthly partitions to enable fast cleanup of old records. Caveat: Postgres
-- then requires processed_at in the primary key, which widens the dedup key.
CREATE TABLE processed_events_2024_01
    PARTITION OF ProcessedEvent
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
Basic Deduplication with INSERT ON CONFLICT
def process_event_idempotent(event):
    # Attempt to mark the event as processed
    rows_inserted = db.execute("""
        INSERT INTO ProcessedEvent (event_id, event_type, processor_id, processed_at)
        VALUES (%(eid)s, %(etype)s, %(pid)s, NOW())
        ON CONFLICT (event_id, processor_id) DO NOTHING
    """, {'eid': event.id, 'etype': event.type, 'pid': PROCESSOR_ID}).rowcount
    if rows_inserted == 0:
        # Already processed; skip
        log.info(f'Duplicate event {event.id}, skipping')
        return
    # Process the event. Caveat: if the worker crashes here (after the INSERT
    # but before handle_event completes), the event is marked processed yet its
    # side effects are lost. When handle_event writes to the same database,
    # run the INSERT and the handler in a single transaction.
    handle_event(event)
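The pattern can be exercised end to end with SQLite as a stand-in for Postgres (a sketch; the table name and IDs are illustrative, and SQLite 3.24+ is assumed for ON CONFLICT support):

```python
import sqlite3

# Self-contained sketch of INSERT ... ON CONFLICT DO NOTHING deduplication,
# using SQLite as a stand-in for Postgres.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE processed_event (
        event_id     TEXT NOT NULL,
        processor_id TEXT NOT NULL,
        processed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (event_id, processor_id)
    )
""")

def try_claim(event_id, processor_id):
    # Returns True exactly once per (event_id, processor_id): the INSERT is
    # atomic, so two concurrent workers cannot both win the claim.
    cur = db.execute(
        "INSERT INTO processed_event (event_id, processor_id) VALUES (?, ?) "
        "ON CONFLICT (event_id, processor_id) DO NOTHING",
        (event_id, processor_id),
    )
    db.commit()
    return cur.rowcount == 1

print(try_claim("evt-1", "billing"))  # True  -- first delivery, process it
print(try_claim("evt-1", "billing"))  # False -- duplicate, skip
print(try_claim("evt-1", "emails"))   # True  -- different processor, process
```

Note that the dedup key is the pair (event_id, processor_id): two different consumer groups each process the event once.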
Deduplication with Result Caching
# Some consumers need to return the same result on duplicate delivery
# (e.g., payment charge: second call should return the original charge ID)
def process_with_result(event):
    # Check if already processed and return cached result
    existing = db.execute("""
        SELECT result FROM ProcessedEvent
        WHERE event_id=%(eid)s AND processor_id=%(pid)s
    """, {'eid': event.id, 'pid': PROCESSOR_ID}).first()
    if existing:
        return existing.result  # Replay the original result
    # Execute the operation. Caveat: this is check-then-act, so two concurrent
    # workers can both pass the SELECT and both call handle_event. That is safe
    # only if handle_event is itself idempotent; otherwise claim the event with
    # INSERT ... ON CONFLICT before executing, as in the basic pattern above.
    result = handle_event(event)
    # Store the result; ON CONFLICT keeps the first writer's row
    db.execute("""
        INSERT INTO ProcessedEvent (event_id, event_type, processor_id, processed_at, result)
        VALUES (%(eid)s, %(etype)s, %(pid)s, NOW(), %(result)s)
        ON CONFLICT (event_id, processor_id) DO NOTHING
    """, {'eid': event.id, 'etype': event.type, 'pid': PROCESSOR_ID,
          'result': json.dumps(result)})
    return result
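The replay semantics can be sketched with an in-memory cache standing in for the ProcessedEvent table (the charge function, event fields, and charge_id format are all illustrative):

```python
# In-memory sketch of result replay: the second delivery returns the original
# charge rather than charging again.
processed = {}      # (event_id, processor_id) -> cached result
charge_count = 0    # counts how many real charges were executed

def charge(event):
    global charge_count
    charge_count += 1
    return {"charge_id": f"ch_{event['event_id']}"}

def process_with_result(event, processor_id="payments"):
    key = (event["event_id"], processor_id)
    if key in processed:
        return processed[key]      # replay the original result
    result = charge(event)
    processed[key] = result        # in the DB version this is the atomic INSERT
    return result

evt = {"event_id": "evt-7", "amount": 500}
first = process_with_result(evt)
second = process_with_result(evt)
print(first == second, charge_count)  # True 1 -- one charge, same ID replayed
```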
Redis-Based Deduplication (High Throughput)
# For very high event rates, a DB write per event is expensive.
# Redis SET NX (set if not exists) is atomic and ~10x faster.

def deduplicate_with_redis(event_id, processor_id, ttl_seconds=86400):
    """
    Returns True if this is the first time we've seen this event.
    Returns False if it's a duplicate.
    """
    key = f'processed:{processor_id}:{event_id}'
    # SET key value NX EX ttl is atomic; redis-py returns True on the first
    # set and None if the key already existed
    is_new = redis.set(key, '1', nx=True, ex=ttl_seconds)
    return is_new is not None

def process_event(event):
    if not deduplicate_with_redis(event.id, PROCESSOR_ID):
        return  # Duplicate
    handle_event(event)
    # For critical events: also write to the ProcessedEvent DB table for
    # durability, since a Redis key may be evicted under memory pressure
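A minimal self-contained sketch of the same flow, with a tiny in-memory stand-in for redis-py's set(nx=True, ex=...) semantics (FakeRedis is illustrative, not a real library):

```python
import time

class FakeRedis:
    # Minimal stand-in for redis-py's set(key, value, nx=True, ex=ttl):
    # returns True on the first set, None if a live key already exists.
    def __init__(self):
        self._expiry = {}  # key -> expiry deadline (monotonic seconds)

    def set(self, key, value, nx=False, ex=None):
        now = time.monotonic()
        live = key in self._expiry and self._expiry[key] > now
        if nx and live:
            return None
        self._expiry[key] = now + (ex if ex is not None else float("inf"))
        return True

redis = FakeRedis()

def deduplicate_with_redis(event_id, processor_id, ttl_seconds=86400):
    key = f"processed:{processor_id}:{event_id}"
    return redis.set(key, "1", nx=True, ex=ttl_seconds) is not None

print(deduplicate_with_redis("evt-9", "analytics"))  # True  -- first sighting
print(deduplicate_with_redis("evt-9", "analytics"))  # False -- duplicate in TTL
```

The TTL doubles as cleanup: once the key expires, a late redelivery would slip through, so the TTL must exceed the queue's maximum redelivery window.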
Window-Based Deduplication
# For events whose event_id is derived from content (not a UUID),
# deduplicate within a time window to catch near-duplicate events
import hashlib

def deduplicate_by_fingerprint(event):
    # Fingerprint: hash of event content + time bucket
    time_bucket = int(event.timestamp // 300) * 300  # 5-minute buckets
    fingerprint = hashlib.sha256(
        f'{event.user_id}:{event.action}:{time_bucket}'.encode()
    ).hexdigest()
    key = f'dedup:fp:{fingerprint}'
    is_new = redis.set(key, event.id, nx=True, ex=600)
    if not is_new:
        log.info(f'Deduped event: fingerprint {fingerprint} seen in this window')
        return False
    return True
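A quick check of the bucketing math (the user/action fields and timestamps are illustrative): two sightings inside the same 5-minute bucket share a fingerprint, while a sighting in a later bucket does not:

```python
import hashlib

def fingerprint(user_id, action, timestamp, bucket_seconds=300):
    # Same content within one time bucket -> same fingerprint.
    bucket = int(timestamp // bucket_seconds) * bucket_seconds
    return hashlib.sha256(f"{user_id}:{action}:{bucket}".encode()).hexdigest()

a = fingerprint("u1", "click", 1_700_000_100)
b = fingerprint("u1", "click", 1_700_000_200)  # 100s later, same bucket
c = fingerprint("u1", "click", 1_700_000_700)  # falls in a later bucket

print(a == b, a == c)  # True False
```

One caveat of fixed buckets: two events that straddle a bucket boundary (e.g., at t=299 and t=301) hash to different fingerprints even though they are seconds apart.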
Key Interview Points
- INSERT ON CONFLICT DO NOTHING is the canonical pattern: Atomic at the DB level. No race condition between two concurrent workers processing the same event — one wins the INSERT, the other gets rowcount=0.
- Event ID must come from the producer: The consumer should not generate the deduplication key. The producer assigns a stable event_id before publishing so retries use the same ID.
- TTL for cleanup: ProcessedEvent records can be dropped after N days (events older than N days will never be redelivered). Use table partitioning by month and DROP PARTITION for instant cleanup without slow DELETE operations.
- Redis is fast but not durable: Redis keys can be evicted under memory pressure. For financial events, use DB-backed deduplication. For high-throughput analytics events where duplicates are acceptable, Redis alone is sufficient.
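The producer-side ID strategies above can be sketched as follows (function names and the one-hour bucket are illustrative): a fresh UUID for unique user actions, and a deterministic hash for computed events so retries and recomputations carry the same event_id:

```python
import hashlib
import uuid

def user_action_event_id():
    # Unique user actions: a fresh UUID, assigned once before publishing,
    # so every retry of the publish carries the same ID.
    return str(uuid.uuid4())

def computed_event_id(entity_id, event_type, timestamp, bucket_seconds=3600):
    # Derived/computed events: a deterministic ID so recomputing the same
    # logical event always yields the same event_id.
    bucket = int(timestamp // bucket_seconds) * bucket_seconds
    raw = f"{entity_id}:{event_type}:{bucket}"
    return hashlib.sha256(raw.encode()).hexdigest()

a = computed_event_id("order-123", "order.reminder", 1_700_003_000)
b = computed_event_id("order-123", "order.reminder", 1_700_003_500)  # same hour
print(a == b)  # True -- retries and recomputations dedupe to one event
```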
FAQ

Why do message queues deliver events at-least-once instead of exactly-once?
Exactly-once delivery requires distributed consensus between the message broker and all consumers: the broker must guarantee the consumer received AND processed the message, and the consumer must confirm it processed exactly once. This is expensive, requiring two-phase commit or equivalent. Kafka, SQS, and most brokers instead guarantee at-least-once: the message will be delivered, but may be delivered more than once if the consumer crashes after processing but before acknowledging. Designing consumers to be idempotent is far simpler and more scalable than requiring exactly-once delivery from the broker.

How does INSERT ON CONFLICT DO NOTHING prevent duplicate event processing?
Store a ProcessedEvent record keyed by (event_id, processor_id). Before processing, attempt INSERT INTO ProcessedEvent ... ON CONFLICT DO NOTHING. If the INSERT succeeds (rowcount=1), this is the first time seeing this event; proceed with processing. If the INSERT is a no-op (rowcount=0), the event was already processed; skip it. This is atomic: two concurrent workers processing the same event simultaneously will both attempt the INSERT, but only one will succeed. The database PRIMARY KEY constraint enforces mutual exclusion at the storage layer.

When should you use Redis SET NX for deduplication versus a database table?
Redis SET NX (set if not exists) is atomic, roughly 10x faster than a DB write, and works well for high-throughput deduplication where some duplicates are acceptable. Use Redis when the event rate is very high (>10K/sec), strict durability is not required, and TTL-based cleanup is acceptable. Use a database ProcessedEvent table when events are financially sensitive (payments, inventory), durability is required (Redis can lose data on restart without AOF/RDB persistence), or you need a queryable audit log of processed events. For payments: DB. For analytics event dedup: Redis.

How do you clean up old deduplication records efficiently?
Use table partitioning by time: CREATE TABLE processed_events_2024_01 PARTITION OF ProcessedEvent FOR VALUES FROM ('2024-01-01') TO ('2024-02-01'). At the start of each month, DROP the oldest partition. Dropping a partition is a metadata operation; it takes milliseconds regardless of row count. Contrast with DELETE WHERE processed_at < '2024-01-01': on a table with 100M rows, this DELETE runs for hours and generates massive WAL writes, impacting production query performance. Always use partitioning for append-only tables with time-based retention.

How do you generate stable event IDs for deduplication?
The producer must assign event IDs before publishing; the consumer should never generate the deduplication key. Best practices: (1) UUID v4 for individual user actions (a button click generates one unique event). (2) Deterministic ID for computed events: SHA-256(entity_id + event_type + timestamp_bucket) ensures the same logical event always produces the same ID. (3) Outbox pattern: INSERT the event into an Outbox table within the same transaction as the business logic change; the outbox row's primary key becomes the event ID. This guarantees the event is published exactly once and is idempotently replayable.