System Design Interview: Design a Leaderboard / Top-K System

Top-K problems appear in many real-world systems: game leaderboards, trending hashtags, most-viewed videos, top search queries, most-active users. The challenge at scale: millions of score updates per second, millions of users querying the leaderboard, and requirements for both real-time and accurate rankings.

Requirements

Functional: Update a user’s score. Get global rank for a user. Get top-K users by score. Get users ranked around a specific user (neighbor ranks). Support millions of users.

Non-functional: Score updates in <10ms, leaderboard query in <50ms, 100K score updates/sec, 50K leaderboard reads/sec, real-time or near-real-time rankings.

Solution 1: Redis Sorted Sets (Recommended)

Redis Sorted Sets are the industry-standard solution for leaderboards. Each member carries a floating-point score, and members stay ordered by score. Rank and range operations are O(log N); score lookup by member is O(1).

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)  # return str, not bytes
LEADERBOARD_KEY = "game:leaderboard"

def update_score(user_id: str, score: float) -> None:
    """Set user's score (ZADD replaces existing score)."""
    r.zadd(LEADERBOARD_KEY, {user_id: score})

def increment_score(user_id: str, delta: float) -> float:
    """Increment user's score by delta. Returns new score."""
    return r.zincrby(LEADERBOARD_KEY, delta, user_id)

def get_rank(user_id: str) -> int | None:
    """Get 1-indexed rank (1 = highest score). Returns None if not in leaderboard."""
    rank = r.zrevrank(LEADERBOARD_KEY, user_id)  # ZREVRANK: higher score = lower rank index
    return rank + 1 if rank is not None else None  # convert 0-indexed to 1-indexed for display

def get_top_k(k: int) -> list[tuple[str, float]]:
    """Get top K users with scores. Returns [(user_id, score), ...]"""
    return r.zrevrange(LEADERBOARD_KEY, 0, k - 1, withscores=True)

def get_neighbors(user_id: str, n: int = 5) -> list[tuple]:
    """Get n users above and below user_id in the ranking."""
    rank = r.zrevrank(LEADERBOARD_KEY, user_id)
    if rank is None:
        return []
    start = max(0, rank - n)
    end = rank + n
    return r.zrevrange(LEADERBOARD_KEY, start, end, withscores=True)

Redis Sorted Set internals: Implemented as a skip list + hash table. Skip list provides O(log N) rank/range queries. Hash table provides O(1) score lookup by member. ZADD, ZRANK, ZINCRBY, ZRANGE are all O(log N). Stores ~50 bytes per member.
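The sorted-set semantics can be sketched in plain Python with a sorted list plus a dict. This is a toy model, not how Redis is implemented: `bisect` gives the same O(log N) rank and range lookups, but inserts here cost O(N) for list shifting, which is exactly the cost the skip list eliminates. All names below are illustrative.

```python
import bisect

class TinyLeaderboard:
    """In-memory sketch of sorted-set semantics (illustrative only)."""
    def __init__(self):
        self._scores = []   # ascending (score, user_id) pairs, kept sorted
        self._by_user = {}  # user_id -> score, O(1) lookup (the "hash table" half)

    def update(self, user_id, score):
        old = self._by_user.get(user_id)
        if old is not None:
            self._scores.remove((old, user_id))   # O(N) here; O(log N) in a skip list
        bisect.insort(self._scores, (score, user_id))
        self._by_user[user_id] = score

    def rank(self, user_id):
        """0-indexed rank, highest score first (like ZREVRANK)."""
        score = self._by_user.get(user_id)
        if score is None:
            return None
        idx = bisect.bisect_left(self._scores, (score, user_id))
        return len(self._scores) - 1 - idx

    def top_k(self, k):
        return [(u, s) for s, u in reversed(self._scores[-k:])]
```

The dict mirrors the hash-table half of the Redis pair (member to score in O(1)); the sorted list mirrors the skip-list half (order statistics in O(log N)).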

Capacity: 100M users × 50 bytes = 5 GB — fits in a single Redis instance. For 1B users: shard by user_id prefix across multiple Redis instances. Each instance handles a subset of users; global rank requires aggregating across shards.
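Shard routing for the 1B-user case can be as simple as a stable hash of the user ID. A minimal sketch (CRC32 and the shard count are illustrative choices; any stable hash works):

```python
import zlib

NUM_SHARDS = 10  # illustrative shard count

def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user to a shard with a stable hash.

    Every score update and single-user rank lookup for this user hits
    the same shard; only global top-K queries need cross-shard fan-out.
    """
    return zlib.crc32(user_id.encode()) % num_shards
```

Note that adding shards with plain modulo remaps most users; consistent hashing avoids that if reshards are expected.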

Solution 2: Database + Rank Computation

For moderate scale (<10M users), a database approach works:

-- Get user's rank (PostgreSQL)
SELECT COUNT(*) + 1 AS rank
FROM leaderboard
WHERE score > (SELECT score FROM leaderboard WHERE user_id = $1);

-- Top K
SELECT user_id, score, RANK() OVER (ORDER BY score DESC) AS rank
FROM leaderboard
ORDER BY score DESC
LIMIT $1;

Pros: exact ranking, full SQL expressiveness (filter by country, time range). Cons: the rank query is O(N) without an index on score. Optimize with: CREATE INDEX idx_score ON leaderboard (score DESC). For "rank of user X", the planner can then walk the index and stop once it passes X's score, making the count O(rank) rather than a full table scan.
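The COUNT(*) + 1 semantics are worth internalizing: a user's rank is one plus the number of strictly higher scores, so tied users share a rank, matching SQL's RANK() rather than ZREVRANK's adjacent-rank behavior. The same logic in plain Python (function name is illustrative):

```python
def sql_style_rank(scores: dict[str, int], user_id: str) -> int:
    """Python equivalent of the COUNT(*) + 1 rank query.

    Tied users share the same rank (RANK() semantics), unlike Redis
    ZREVRANK, which always gives distinct adjacent positions.
    """
    user_score = scores[user_id]
    return 1 + sum(1 for s in scores.values() if s > user_score)
```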

Solution 3: Count-Min Sketch (Approximate Heavy Hitters)

When you don’t need exact rankings and the key space is huge (all URLs ever visited, all search queries), use a Count-Min Sketch — a probabilistic data structure that estimates frequency with bounded error.

import hashlib

class CountMinSketch:
    def __init__(self, width: int = 1000, depth: int = 5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item: str, seed: int) -> int:
        h = hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def increment(self, item: str) -> None:
        for i in range(self.depth):
            self.table[i][self._hash(item, i)] += 1

    def estimate(self, item: str) -> int:
        """Returns minimum count across all hash functions — least overcount."""
        return min(self.table[i][self._hash(item, i)] for i in range(self.depth))

Space: O(width * depth) regardless of the number of distinct items; the sketch overestimates but never underestimates. Error bound: with width w and depth d, each estimate exceeds the true count by at most (e/w) * N (N = total increments) with probability at least 1 - e^(-d). For width=1000, depth=5 that is roughly 0.3% of the stream size with about 99.3% probability. Used for: trending-topic detection, top search queries, network traffic analysis.
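To turn frequency estimates into an actual Top-K list, a common pattern pairs the sketch with a small candidate set: on each event, update the sketch, then let the new estimate compete for one of K slots. A minimal self-contained sketch of that pattern (class name and parameters are illustrative, not from any library):

```python
import hashlib
import heapq

class TopKTracker:
    """Count-Min Sketch plus a bounded candidate set for heavy hitters."""
    def __init__(self, k: int, width: int = 1024, depth: int = 4):
        self.k = k
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]
        self.candidates = {}  # item -> latest CMS estimate, at most k entries

    def _hash(self, item: str, seed: int) -> int:
        h = hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, item: str) -> None:
        for i in range(self.depth):
            self.table[i][self._hash(item, i)] += 1
        est = min(self.table[i][self._hash(item, i)] for i in range(self.depth))
        if item in self.candidates or len(self.candidates) < self.k:
            self.candidates[item] = est           # track or refresh estimate
        else:
            victim = min(self.candidates, key=self.candidates.get)
            if est > self.candidates[victim]:     # evict the weakest candidate
                del self.candidates[victim]
                self.candidates[item] = est

    def top_k(self) -> list[tuple[str, int]]:
        return heapq.nlargest(self.k, self.candidates.items(), key=lambda kv: kv[1])
```

Total space stays fixed at the sketch plus K candidates, no matter how many distinct items flow through.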

Sharded Leaderboard for Global Scale

import concurrent.futures
import heapq

# shards: list of redis.Redis clients, one per shard

def get_global_top_k(shards: list, k: int) -> list[tuple]:
    """Aggregate top-K across shards: fan out, then merge."""
    # Fan out to all shards in parallel
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(shard.zrevrange, LEADERBOARD_KEY, 0, k - 1, withscores=True)
            for shard in shards
        ]
        all_results = [f.result() for f in futures]

    # Merge: pool all local top-K candidates, keep the global top-K by score
    all_entries = [entry for shard_result in all_results for entry in shard_result]
    return heapq.nlargest(k, all_entries, key=lambda e: e[1])

Each shard returns its local top-K; the coordinator merges the num_shards * K candidates and keeps the global top-K. This is correct because every user lives on exactly one shard, so each member of the global top-K necessarily appears in its own shard's local top-K.
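The merge step can be exercised in isolation with hard-coded shard outputs (the data below is made up for illustration):

```python
import heapq

def merge_shard_topk(shard_results: list[list[tuple[str, float]]], k: int):
    """Pool each shard's local top-K, keep the K highest scores overall."""
    candidates = [entry for result in shard_results for entry in result]
    return heapq.nlargest(k, candidates, key=lambda e: e[1])

# Made-up local top-2 lists from two shards:
shard_a = [("u1", 980.0), ("u4", 720.0)]
shard_b = [("u2", 910.0), ("u5", 430.0)]
```

With k=3, the merge picks u1 and u2 from different shards plus u4, exactly what a single global sorted set would return.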

Interview Tips

  • Redis Sorted Sets are the canonical answer — mention ZADD, ZREVRANK, ZINCRBY, ZREVRANGE.
  • Separate score ingestion (high write throughput) from rank queries (read-heavy) — they have different scaling properties.
  • Time-windowed leaderboard: use separate sorted sets per time window (daily, weekly). On score update, update all relevant windows. Expire old windows with Redis TTL.
  • Ties: Redis ZREVRANK returns 0-indexed rank; two users with equal scores get adjacent ranks (not equal). For tie-handling: use composite score = score * 10^10 + (MAX_INT – join_timestamp) to ensure earlier joiners rank higher on ties.
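The tie-breaking composite from the last bullet can be sketched as follows. The 10^10 multiplier and MAX_TS cap are illustrative choices that assume timestamps stay below 10^10; also note that Redis scores are IEEE-754 doubles with 53 bits of integer precision, so very large composites silently lose precision, a caveat worth raising in the interview.

```python
MAX_TS = 2**31 - 1  # illustrative timestamp ceiling (unix seconds)

def composite_score(score: int, join_timestamp: int) -> int:
    """Pack score and inverted join time into one sortable number.

    Higher score always wins; on equal scores, the earlier joiner
    (smaller timestamp) gets the larger composite and ranks first.
    """
    return score * 10**10 + (MAX_TS - join_timestamp)
```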
