Question 1

Why use 1-minute buckets instead of a true sliding window for trend counting?

Accepted Answer

A true sliding window maintains a timestamp for every engagement event: if 10M events occurred in the past hour, you store 10M timestamps. Memory: 10M × 8 bytes = 80MB per topic. With 100K active topics: 8TB of Redis memory. Impractical. 1-minute buckets trade precision for efficiency: instead of exact event timestamps, store one counter per topic per minute. To get the 1-hour count: sum 60 bucket values (60 GET operations via pipeline). Memory: 1 int per topic per minute × 60 minutes × 100K topics = 100K × 60 × 4 bytes = 24MB. The bucket approach slightly overestimates by including events from the current partially-elapsed minute, and slightly underestimates by losing sub-minute precision — acceptable for trending purposes where approximate ranking is sufficient.

Question 2

How does exponential decay prevent stale topics from trending indefinitely?

Accepted Answer

Without decay, a topic that surged 50 minutes ago still carries its full weight until the bucket exits the window at exactly 60 minutes — then it drops to zero suddenly. With decay weight e^(-lambda * age_minutes): a 30-minute-old event has weight e^(-0.05 × 30) ≈ 0.22 (22% of live weight). A 50-minute-old event has weight e^(-0.05 × 50) ≈ 0.08 (8%). This creates a smooth score decrease as engagement ages, preventing cliff effects at the window boundary. The decay constant (lambda=0.05) is tunable: smaller lambda = slower decay (topics stay trending longer), larger lambda = faster decay (only recent activity matters). Twitter uses a similar half-life decay for its trending algorithm.

Question 3

How does the Count-Min Sketch solve the heavy hitters problem for trending?

Accepted Answer

The candidate selection problem: from all hashtags in the universe, which ones should you compute scores for? You can't score all possible hashtags. Count-Min Sketch maintains a 2D array of counters (width × depth, e.g., 1000 × 5 = 5000 counters). For each incoming event, hash the hashtag using depth different hash functions and increment the corresponding counter in each row. To estimate count: take the minimum across all depth rows for this hashtag. The minimum gives an upper-bound estimate — it overestimates but never underestimates. After processing N events, the top-K heavy hitters (hashtags with estimated count ≥ N/K) are the candidates. This is O(1) space regardless of hashtag cardinality and O(depth) per update — suitable for streaming at millions of events/second.

Question 4

How do you prevent a bot network from artificially trending a hashtag?

Accepted Answer

Bot signals: (1) burst pattern — 100K events in 10 seconds from newly created accounts (natural trending is gradual); (2) account age — accounts created in the last 7 days have lower weight per event (multiply by 0.1); (3) IP concentration — 1000 events from the same /24 CIDR block; (4) identical post text — many accounts posting the exact same content. Mitigations: rate limit engagement per user per topic (one increment per 5 minutes); apply account age weighting before incrementing buckets; detect velocity anomalies (if score triples in 1 minute, flag for review rather than surfacing immediately); maintain a human-reviewed blocklist of abused hashtags. These signals compound — a topic is flagged only when multiple signals trigger simultaneously, reducing false positives.

Question 5

How do you display trending topics differently across geographic regions?

Accepted Answer

Global trending surfaces topics popular worldwide, which may be irrelevant to users in specific regions. Regional trending: scope the engagement events by user location (IP geolocation or user profile location). Maintain separate Redis bucket keys per region: trend:US:{topic}:{bucket}, trend:JP:{topic}:{bucket}. The TopicScorer runs per region, computing top-K for each. Store regional rankings: trending:ranked:US, trending:ranked:JP. API: GET /trending?region=US. Location detection: use IP geolocation for unregistered users; use profile location for registered users. Coverage: for regions with insufficient data (trend buckets have < 1000 total events), fall back to global trending. This is how Twitter's "Trends for you" works — city-level trending falls back to country, then global.

Trending Topics Low-Level Design: Sliding Window Counts, Time Decay, and Top-K Computation

Core Data Model

Redis Sliding Window Counter

Top-K Computation with Time Decay

Key Interview Points