Low Level Design: Kafka Advanced Internals

Apache Kafka is a distributed event streaming platform built around a partitioned, replicated, append-only log. Understanding Kafka internals — partitioning strategies, replication, consumer group coordination, and offset management — is essential for designing high-throughput, fault-tolerant event-driven systems and for reasoning about guarantees like exactly-once delivery.

Partition and Replication

Each Kafka topic is divided into partitions. Each partition is an ordered, immutable log stored on disk. Each partition has one leader (handles reads and writes) and N-1 followers (replicas). The leader tracks which replicas are fully caught up: the In-Sync Replicas (ISR) set. With acks=all, a produce request is acknowledged only after every ISR member has replicated the message. If a replica falls too far behind, it is removed from the ISR until it catches up. The min.insync.replicas setting defines the smallest ISR the leader will accept acks=all writes against: if the ISR shrinks below it, producers receive errors rather than silently risking data loss.
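The leader/ISR interaction can be sketched as a toy model (this is not Kafka's actual code; the class and method names are invented for illustration). The leader appends, followers fetch to catch up, the acknowledged offset is the minimum replicated across the ISR, and writes fail fast when the ISR shrinks below min.insync.replicas:

```python
class Partition:
    """Toy model of one partition: a leader log, follower fetch offsets,
    and an ISR set gating acks=all acknowledgements."""

    def __init__(self, replicas, leader, min_insync=2):
        self.leader = leader
        self.log = []
        self.fetched = {r: 0 for r in replicas}  # next offset each replica needs
        self.isr = set(replicas)                 # all replicas start in sync
        self.min_insync = min_insync

    def produce(self, msg):
        """Append to the leader log; reject if the ISR is too small."""
        if len(self.isr) < self.min_insync:
            raise RuntimeError("NOT_ENOUGH_REPLICAS")
        self.log.append(msg)
        self.fetched[self.leader] = len(self.log)  # leader has it immediately
        return len(self.log) - 1                   # offset of the new record

    def replica_fetch(self, replica):
        """A follower pulls everything it is missing and rejoins the ISR."""
        self.fetched[replica] = len(self.log)
        self.isr.add(replica)

    def drop_from_isr(self, replica):
        """The leader removes a lagging replica from the ISR."""
        self.isr.discard(replica)

    def ack_offset(self):
        """Count of records replicated to every ISR member (the high watermark):
        only records below this offset are acknowledged to producers."""
        return min(self.fetched[r] for r in self.isr)
```

A produce is only "safe" once every ISR member has fetched it; dropping a slow follower from the ISR lets the high watermark advance without it.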

Partitioning Strategy

The partition key determines which partition a message goes to. The same key always hashes to the same partition (Kafka's default partitioner uses murmur2 modulo the partition count), guaranteeing ordering for all messages with the same key. Choose partition keys based on what ordering you need: user_id ensures all events for a user are ordered; order_id ensures all order events are ordered. A null key spreads records across partitions for load distribution with no ordering guarantee (round-robin historically; since Kafka 2.4 the sticky partitioner, which fills one batch at a time). Hot keys (one user generating 90% of traffic) cause partition hotspots; split them by appending a random suffix to the key, accepting that ordering is lost across the resulting sub-keys.
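A minimal sketch of key-based routing and hot-key splitting (the hash here is crc32 as a stand-in; Kafka's producer actually uses murmur2, and the function names are invented for illustration):

```python
import random
import zlib

NUM_PARTITIONS = 6

def partition_for(key):
    """Keyed records: deterministic hash mod partition count, so the same
    key always lands on the same partition. None means no key: pick any
    partition (stand-in for round-robin / sticky partitioning)."""
    if key is None:
        return random.randrange(NUM_PARTITIONS)
    return zlib.crc32(key) % NUM_PARTITIONS

def split_hot_key(key, fanout=4):
    """Spread a hot key across `fanout` sub-keys by appending a random
    suffix: trades per-key ordering for load distribution."""
    return f"{key}#{random.randrange(fanout)}".encode()
```

Because `split_hot_key` produces up to `fanout` distinct keys, a single hot user's traffic spreads over up to `fanout` partitions, but consumers can no longer assume one ordered stream per user.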

Consumer Group Rebalancing

A consumer group assigns each partition to exactly one consumer. When a consumer joins or leaves, the group coordinator (a Kafka broker) triggers a rebalance. The classic eager protocol revokes every partition from every consumer, reassigns, then resumes: consumption stops group-wide, the “stop-the-world” rebalance. Cooperative rebalancing (Kafka 2.4+) rebalances incrementally: only partitions that actually move are paused, not the entire group. Static group membership (group.instance.id) prevents rebalances on rolling restarts: the broker waits for the consumer to reconnect (up to the session timeout) before reassigning its partitions.
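The difference between eager and cooperative rebalancing comes down to which partitions pause. A small sketch (invented helper, not the client's API) computes the moved set; eager would pause everything, cooperative pauses only this set:

```python
def cooperative_rebalance(old, new):
    """Given old and new assignments ({consumer: set_of_partitions}),
    return only the partitions each consumer must give up. Under the
    cooperative protocol only these pause; an eager rebalance would
    revoke every partition from every consumer."""
    moved = {}
    for consumer, parts in old.items():
        revoked = parts - new.get(consumer, set())
        if revoked:
            moved[consumer] = revoked
    return moved
```

For example, when c2 joins a group where c1 owns partitions 0-3 and the new assignment is {c1: {0, 1}, c2: {2, 3}}, only partitions 2 and 3 pause while they move; c1 keeps consuming 0 and 1 throughout.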

Offset Management

Consumers track which messages have been processed by committing their offset to the __consumer_offsets topic. Two commit modes: auto-commit (commits periodically, risks reprocessing on crash) and manual commit (commit only after processing succeeds). Commit after processing, not before — committing before processing risks losing messages on crash. At-least-once delivery: commit after the message is fully processed. Exactly-once: use Kafka transactions — consume messages, process, and commit the output and the offset in the same atomic transaction.
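The commit-after-processing rule can be shown with a toy consumer loop (a simulation with invented names, not the consumer client API): a crash between processing and committing re-delivers the record on restart, which is a duplicate rather than a loss:

```python
def run_consumer(log, committed_offset, process, crash_before_commit_at=None):
    """Process records from committed_offset onward, committing only after
    each record is processed. A simulated crash before the commit leaves
    the offset behind, so that record is re-delivered on restart
    (at-least-once: duplicates possible, loss is not)."""
    offset = committed_offset
    for i in range(offset, len(log)):
        process(log[i])                # 1. process the record
        if crash_before_commit_at == i:
            return offset              # crash: commit never advanced
        offset = i + 1                 # 2. manual commit after processing
    return offset
```

Committing in the opposite order (before processing) would return the advanced offset even on crash, and the in-flight record would be skipped on restart: a lost message.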

Kafka Transactions

Kafka transactions enable atomic multi-partition writes: write output to multiple partitions and commit the consumer offset, all atomically. If the producer crashes mid-transaction, the transaction is aborted; the partial writes remain in the log but are marked aborted, and consumers in READ_COMMITTED isolation skip them, seeing only committed messages. The transactional producer requires a transactional.id; the broker maintains a producer epoch to prevent zombie writers from completing stale transactions. Transactions add ~10ms latency but provide exactly-once semantics end-to-end.
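Zombie fencing via producer epochs can be sketched as a toy coordinator (invented class, not broker code): each re-initialization for the same transactional.id bumps the epoch, and requests carrying a stale epoch are rejected:

```python
class TxnCoordinator:
    """Toy model of epoch fencing. Initializing transactions for a
    transactional.id bumps its epoch; a producer instance that restarts
    gets a higher epoch, and any surviving 'zombie' instance still
    holding the old epoch is fenced on its next request."""

    def __init__(self):
        self.epochs = {}  # transactional.id -> current epoch

    def init_transactions(self, txn_id):
        epoch = self.epochs.get(txn_id, -1) + 1
        self.epochs[txn_id] = epoch
        return epoch

    def commit(self, txn_id, epoch):
        if epoch != self.epochs[txn_id]:
            raise RuntimeError("PRODUCER_FENCED")
        return "committed"
```

This is why transactional.id must be stable across restarts of the same logical producer: the restarted instance inherits the id, bumps the epoch, and thereby invalidates the crashed (or merely slow) predecessor.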

Log Compaction

Log compaction retains only the most recent message per key, discarding older messages with the same key. This provides a changelog-style topic: consumers can rebuild the latest state by reading from the beginning (like materializing a table from its event log). Use cases: database change streams (CDC), configuration topics, user preference updates. Compaction does not delete messages with null values immediately (tombstones are retained for a configurable period to allow consumers to process the deletion). With cleanup.policy=compact alone there is no time-based retention: data is retained until superseded by a newer value for the same key (a combined compact,delete policy adds time-based expiry on top).
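The compaction rule itself is simple enough to sketch directly (a toy function, not the broker's log cleaner, which compacts segment by segment): keep the last value per key, and keep tombstones until their retention window passes:

```python
def compact(log, tombstone_retention_passed=False):
    """Compact a log of (key, value) records: keep only the latest value
    per key. A None value is a tombstone (delete marker); it survives
    compaction until the retention window passes, so lagging consumers
    still observe the deletion."""
    latest = {}
    for key, value in log:           # later records overwrite earlier ones
        latest[key] = value
    if tombstone_retention_passed:   # finally drop deleted keys entirely
        latest = {k: v for k, v in latest.items() if v is not None}
    return latest
```

Reading the compacted result from the beginning reconstructs exactly the latest state per key, which is what makes compacted topics usable as a table-materialization source.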

KRaft Mode (ZooKeeper Removal)

Traditional Kafka uses ZooKeeper for metadata management (topic configs, ISR tracking, controller election). KRaft mode (Kafka 2.8+, production-ready in 3.3+) replaces ZooKeeper with an internal Raft-based metadata quorum. The controller role is now managed by Kafka brokers themselves using Raft consensus. Benefits: simpler operations (one system instead of two), faster controller failover (seconds instead of tens of seconds), support for 10x more partitions per cluster. New Kafka deployments should use KRaft; ZooKeeper mode is deprecated and removed entirely in Kafka 4.0.
