Counting things at scale — page views, likes, inventory levels, active users — is surprisingly hard in distributed systems. A single counter becomes a bottleneck when thousands of servers increment it simultaneously. This guide covers distributed counter architectures from simple atomic operations to sharded counters and approximate counting — a focused system design topic that demonstrates deep understanding of concurrency and scale.
The Single Counter Problem
A naive counter in a database (UPDATE posts SET like_count = like_count + 1 WHERE id = 123) creates a hot row. With 10,000 concurrent likes per second on a viral post, all 10,000 transactions contend for the same row lock. Most transactions wait, retry, or timeout. The database becomes a bottleneck for this single row while the rest of the system is idle. Solutions: (1) Redis INCR — atomic increment in memory, 100K+ ops/sec on a single key. Use for real-time counters (view counts, online users). Periodically flush to the database. Risk: data loss if Redis crashes before the flush. (2) Sharded counters — split one counter into N sub-counters (counter_123_shard_0, counter_123_shard_1, …). Each increment goes to a random shard. The total count is SUM of all shards. This distributes the write contention across N keys. With N=10 shards, each shard handles 1/10 of the traffic. (3) Approximate counting — for analytics where exact counts are unnecessary, use HyperLogLog (unique counts) or probabilistic counters. Trade accuracy for performance.
Sharded Counters in Detail
Google Cloud Firestore and DynamoDB both recommend sharded counters for high-throughput writes. Architecture: instead of one document/row for the counter, create N shards. Each shard stores a partial count. Increment: pick a random shard (shard_id = random(0, N-1)), atomically increment that shard. Read: fetch all N shards, sum the values. Write is O(1), read is O(N). For N=10, reading requires 10 fetches — acceptable for most use cases. Dynamic sharding: start with 1 shard. When contention is detected (high retry rates), automatically split into more shards. Monitor the error rate per counter to trigger splits. Tradeoff: more shards = higher write throughput but slower reads and more storage. For counters read frequently (displayed on every page load), cache the aggregated sum with a short TTL (1-5 seconds). The cache serves reads; shards handle writes. DynamoDB example: partition key = counter_id#shard_id. Each shard is a separate item. Atomic update: UpdateExpression SET count = count + 1. Read all shards with a query on the counter_id prefix.
Exact vs Approximate Counting
Not all counts need to be exact. Facebook does not show the exact like count for posts with millions of likes — “1.2M likes” is sufficient. Exact counting: use sharded counters or Redis INCR. Required for: inventory (must not oversell), financial transactions (ledger must balance), and rate limiting (must enforce exact limits). Approximate counting: (1) Lossy counting — increment with probability 1/p. The estimated count is actual_increments * p. With p=100, you track 1% of events. 100x less write traffic. Error: proportional to sqrt(actual_count). (2) HyperLogLog for unique counts — estimate distinct elements (unique visitors, unique IPs) using 12 KB of memory regardless of cardinality. Redis PFADD/PFCOUNT. ~2% error. (3) Count-Min Sketch for frequency — estimate the frequency of each element in a stream. Overestimates but never underestimates. Use for: top-K heavy hitters (most viewed products, most searched queries). Decision: use exact counting when business correctness requires it (inventory, billing). Use approximate counting for analytics, dashboards, and any display where “~1.2M” is acceptable.
Counter Patterns in Production
View counter (YouTube): every video view increments a counter. At YouTube scale (500M+ video views per hour), a single counter per video is insufficient for popular videos. Architecture: batch view events in a Kafka topic. A consumer aggregates counts in memory over a 5-second window and writes a single batch increment to the database. This reduces 10,000 individual writes to 1 batch write. The displayed view count may lag by a few seconds — acceptable. Like counter (Twitter/Instagram): likes are user-visible and need near-real-time updates. Use Redis INCR for the real-time count displayed in the UI. Asynchronously persist to the database. On Redis failure, re-derive the count from the database likes table (COUNT of like records for this post). Inventory counter (e-commerce): must be exact to prevent overselling. Use a database row with SELECT FOR UPDATE or an atomic conditional update: UPDATE inventory SET count = count – 1 WHERE product_id = X AND count > 0. If 0 rows are updated, the item is out of stock. Redis can pre-check availability (fast rejection for out-of-stock) while the database is the authoritative source. Rate limiter counter: see the token bucket pattern — Redis INCR with TTL implements a sliding window counter for rate limiting.