System Design: Distributed Counter — Atomic Operations, Sharded Counters, Approximate Counting, Redis INCR

Counting things at scale — page views, likes, inventory levels, active users — is surprisingly hard in distributed systems. A single counter becomes a bottleneck when thousands of servers increment it simultaneously. This guide covers distributed counter architectures from simple atomic operations to sharded counters and approximate counting — a focused system design topic that demonstrates deep understanding of concurrency and scale.

The Single Counter Problem

A naive counter in a database (UPDATE posts SET like_count = like_count + 1 WHERE id = 123) creates a hot row. With 10,000 concurrent likes per second on a viral post, all 10,000 transactions contend for the same row lock. Most transactions wait, retry, or time out. The database becomes a bottleneck on this single row while the rest of the system sits idle. Three solutions:

(1) Redis INCR — atomic in-memory increment, 100K+ ops/sec on a single key. Use for real-time counters (view counts, online users) and periodically flush to the database. Risk: data loss if Redis crashes before the flush.

(2) Sharded counters — split one counter into N sub-counters (counter_123_shard_0, counter_123_shard_1, …). Each increment goes to a random shard; the total count is the SUM of all shards. This spreads write contention across N keys — with N=10 shards, each shard handles 1/10 of the traffic.

(3) Approximate counting — for analytics where exact counts are unnecessary, use HyperLogLog (unique counts) or probabilistic counters. Trade accuracy for performance.
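The sharded-counter idea can be sketched as a minimal in-process version, where per-shard threading locks stand in for per-row database locks (the class name ShardedCounter is illustrative, not from any library):

```python
import random
import threading

class ShardedCounter:
    """One logical counter split into N sub-counters to spread write contention."""

    def __init__(self, num_shards: int = 10):
        self.shards = [0] * num_shards
        # One lock per shard: writers hitting different shards never contend.
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def increment(self, amount: int = 1) -> None:
        # O(1) write: pick a random shard and lock only that shard.
        i = random.randrange(len(self.shards))
        with self.locks[i]:
            self.shards[i] += amount

    def value(self) -> int:
        # O(N) read: sum every shard.
        return sum(self.shards)

counter = ShardedCounter(num_shards=10)
for _ in range(10_000):
    counter.increment()
print(counter.value())  # 10000 — exact, no matter how increments spread over shards
```

The random shard choice means no coordination is needed between writers; correctness comes from each individual shard increment being atomic, not from any global ordering.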

Sharded Counters in Detail

Google Cloud Firestore and DynamoDB both recommend sharded counters for high-throughput writes. Architecture: instead of one document/row for the counter, create N shards, each storing a partial count. Increment: pick a random shard (shard_id = random(0, N-1)) and atomically increment it — writes are O(1). Read: fetch all N shards and sum the values — reads are O(N). For N=10, reading requires 10 fetches, acceptable for most use cases. Dynamic sharding: start with 1 shard; when contention is detected (high retry rates), automatically split into more shards, using the per-counter error rate to trigger splits. Tradeoff: more shards mean higher write throughput but slower reads and more storage. For counters that are read frequently (displayed on every page load), cache the aggregated sum with a short TTL (1-5 seconds): the cache serves reads while the shards absorb writes. DynamoDB example: use counter_id as the partition key and shard_id as the sort key, so each shard is a separate item. Atomic update: UpdateExpression SET count = count + 1. Read all shards with a single Query on counter_id.
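The short-TTL cache over the O(N) shard read can be sketched as follows — a minimal in-process version where fetch_shards stands in for fetching the shard items from the database, and the class name CachedShardSum is illustrative:

```python
import time

class CachedShardSum:
    """Cache the aggregated shard sum for ttl seconds so hot reads skip the O(N) fetch."""

    def __init__(self, fetch_shards, ttl: float = 5.0):
        self.fetch_shards = fetch_shards  # callable returning the list of shard values
        self.ttl = ttl
        self._cached = None
        self._expires = 0.0

    def read(self) -> int:
        now = time.monotonic()
        if self._cached is None or now >= self._expires:
            # Cache miss or expired: do the O(N) fetch-and-sum, then cache it.
            self._cached = sum(self.fetch_shards())
            self._expires = now + self.ttl
        return self._cached

shards = [3, 5, 2]                              # stand-in for N shard rows
reader = CachedShardSum(lambda: shards, ttl=5.0)
print(reader.read())   # 10
shards[0] += 4         # a write lands on shard 0
print(reader.read())   # still 10 — stale by up to ttl seconds, by design
```

The staleness window equals the TTL, which is exactly the "displayed count may lag by a few seconds" tradeoff the pattern accepts.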

Exact vs Approximate Counting

Not all counts need to be exact. Facebook does not show the exact like count on posts with millions of likes — “1.2M likes” is sufficient. Exact counting: use sharded counters or Redis INCR. Required for: inventory (must not oversell), financial transactions (the ledger must balance), and rate limiting (limits must be enforced exactly). Approximate counting:

(1) Sampled counting — increment with probability 1/p; the estimated count is tracked_increments * p. With p=100, you track 1% of events — 100x less write traffic. The standard error grows roughly as sqrt(actual_count * p), so the relative error shrinks as the count grows.

(2) HyperLogLog for unique counts — estimate distinct elements (unique visitors, unique IPs) using about 12 KB of memory regardless of cardinality. Redis PFADD/PFCOUNT, ~2% error.

(3) Count-Min Sketch for frequencies — estimate how often each element appears in a stream. It overestimates but never underestimates. Use for top-K heavy hitters (most viewed products, most searched queries).

Decision rule: use exact counting when business correctness requires it (inventory, billing). Use approximate counting for analytics, dashboards, and any display where “~1.2M” is acceptable.
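The increment-with-probability-1/p technique can be simulated in a few lines — a sketch with an illustrative function name and a fixed seed so the run is reproducible:

```python
import random

def sampled_count(events: int, p: int, seed: int = 42) -> int:
    """Estimate a count by incrementing with probability 1/p,
    then scaling the tracked count back up by p."""
    rng = random.Random(seed)
    tracked = 0
    for _ in range(events):
        if rng.random() < 1.0 / p:
            tracked += 1          # only ~1/p of events touch the counter
    return tracked * p            # unbiased estimate of the true count

true_count = 1_000_000
estimate = sampled_count(true_count, p=100)
print(estimate)  # close to 1,000,000, with 100x fewer counter writes
```

With p=100 and a million events, the estimate's standard deviation is roughly sqrt(1,000,000 * 100) ≈ 10,000, i.e. about 1% relative error — fine for a dashboard, unacceptable for a ledger.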

Counter Patterns in Production

View counter (YouTube): every video view increments a counter, and at YouTube scale (500M+ video views per hour), a single counter per video cannot absorb the writes for popular videos. Architecture: batch view events in a Kafka topic. A consumer aggregates counts in memory over a 5-second window and writes a single batched increment to the database, turning 10,000 individual writes into 1 batch write. The displayed view count may lag by a few seconds — acceptable.

Like counter (Twitter/Instagram): likes are user-visible and need near-real-time updates. Use Redis INCR for the real-time count displayed in the UI, and asynchronously persist to the database. On Redis failure, re-derive the count from the database likes table (COUNT of like records for the post).

Inventory counter (e-commerce): must be exact to prevent overselling. Use a database row with SELECT FOR UPDATE or an atomic conditional update: UPDATE inventory SET count = count - 1 WHERE product_id = X AND count > 0. If 0 rows are updated, the item is out of stock. Redis can pre-check availability (fast rejection of out-of-stock requests) while the database remains the authoritative source.

Rate limiter counter: see the token bucket pattern — Redis INCR with a TTL implements a sliding-window counter for rate limiting.
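The conditional-update inventory pattern can be demonstrated end to end with an in-memory SQLite database standing in for the production store (schema and names are illustrative):

```python
import sqlite3

# In-memory SQLite standing in for the production database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, count INTEGER)")
db.execute("INSERT INTO inventory VALUES (1, 3)")  # 3 units in stock

def try_purchase(conn: sqlite3.Connection, product_id: int) -> bool:
    """Atomically decrement stock only if it is positive.
    The WHERE clause makes check-and-decrement a single statement,
    so two concurrent buyers can never both take the last unit."""
    cur = conn.execute(
        "UPDATE inventory SET count = count - 1 "
        "WHERE product_id = ? AND count > 0",
        (product_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means out of stock

results = [try_purchase(db, 1) for _ in range(5)]
print(results)  # [True, True, True, False, False]
```

The key design point is that the stock check lives inside the UPDATE's WHERE clause rather than in a separate SELECT, so there is no read-then-write window for a race condition.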
