A trending topics system surfaces content gaining rapid engagement — the top hashtags on Twitter, trending searches on Google, or rising products on Amazon. Core challenges: computing approximate top-K over a sliding time window with low latency, applying time decay so older engagement counts less, filtering spam and bot manipulation, and refreshing the list frequently without expensive full scans.
Core Data Model
-- Raw engagement events (write path: Kafka → Flink/Spark)
-- Not stored long-term in Postgres; processed into aggregates

-- Pre-computed trending scores (updated every 5 minutes)
CREATE TABLE TrendingItem (
    topic        TEXT NOT NULL,
    category     TEXT NOT NULL DEFAULT 'hashtag',  -- 'hashtag', 'search', 'product'
    score        NUMERIC(12,4) NOT NULL,
    velocity     NUMERIC(12,4) NOT NULL,           -- rate of score change between refreshes
    rank         INT NOT NULL,
    window_start TIMESTAMPTZ NOT NULL,             -- e.g. "last 1 hour"
    computed_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (topic, category, window_start)
);

-- Postgres partial-index predicates cannot contain subqueries, so index by
-- (category, window_start, rank) and filter to the latest window at query time:
--   WHERE window_start = (SELECT MAX(window_start) FROM TrendingItem)
CREATE INDEX idx_trending_rank ON TrendingItem (category, window_start DESC, rank);
-- Sliding window counts (Redis-based, not Postgres)
-- Key: trend:{category}:{topic}:{bucket} (1-minute buckets)
-- Value: integer count
-- TTL: 2 hours (window size + buffer)
Redis Sliding Window Counter
import time
import redis
from datetime import datetime, timezone

r = redis.Redis(host='redis', decode_responses=True)

WINDOW_SECONDS = 3600  # 1-hour trending window
BUCKET_SECONDS = 60    # 1-minute buckets
NUM_BUCKETS = WINDOW_SECONDS // BUCKET_SECONDS  # 60 buckets

def record_engagement(category: str, topic: str):
    """
    Increment the current time bucket for this topic.
    Uses 1-minute buckets to approximate a sliding window.
    """
    topic_normalized = topic.lower().strip().lstrip('#')
    now = int(time.time())
    bucket = (now // BUCKET_SECONDS) * BUCKET_SECONDS  # floor to bucket boundary
    key = f"trend:{category}:{topic_normalized}:{bucket}"
    r.incr(key)
    r.expire(key, WINDOW_SECONDS + BUCKET_SECONDS)  # TTL = window + one extra bucket

def get_topic_count(category: str, topic: str) -> int:
    """
    Sum all bucket counts within the sliding window.
    Approximation: uses fixed 1-minute buckets — a true sliding window would
    require per-event timestamps but is 10-100x more expensive.
    """
    topic_normalized = topic.lower().strip().lstrip('#')
    now = int(time.time())
    pipeline = r.pipeline()
    for i in range(NUM_BUCKETS):
        bucket = ((now - i * BUCKET_SECONDS) // BUCKET_SECONDS) * BUCKET_SECONDS
        pipeline.get(f"trend:{category}:{topic_normalized}:{bucket}")
    results = pipeline.execute()
    return sum(int(v) for v in results if v)
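For contrast with the bucket approximation, an exact sliding window keeps one timestamp per event (in Redis this would typically be a sorted set trimmed with ZREMRANGEBYSCORE). A minimal in-memory sketch, purely illustrative, to make the memory trade-off concrete:

```python
import time
from collections import deque

class ExactSlidingWindowCounter:
    """Exact window count by storing one timestamp per event.
    Memory grows with event volume -- the cost the bucket scheme avoids."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self.events = deque()  # monotonically increasing timestamps

    def record(self, now=None):
        self.events.append(now if now is not None else time.time())

    def count(self, now=None):
        now = now if now is not None else time.time()
        # Evict events that have aged out of the window before counting
        while self.events and self.events[0] <= now - self.window_seconds:
            self.events.popleft()
        return len(self.events)
```

At 10M events/hour per hot topic this stores 10M timestamps per topic, which is why the production path above settles for 60 counters instead.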
Top-K Computation with Time Decay
def compute_trending_scores(category: str, candidate_topics: list[str]) -> list[dict]:
    """
    Score each candidate topic with a decay-weighted count (a Wilson-score
    variant is an alternative): recent engagement counts more than older.
    """
    import math  # hoisted out of the per-topic loop

    DECAY_LAMBDA = 0.05  # weight = e^(-lambda * age_in_minutes)
    now = int(time.time())
    results = []
    for topic in candidate_topics:
        topic_normalized = topic.lower().strip().lstrip('#')
        # Fetch all bucket counts in the window in one pipelined round trip
        pipeline = r.pipeline()
        bucket_starts = []
        for i in range(NUM_BUCKETS):
            bucket_start = ((now - i * BUCKET_SECONDS) // BUCKET_SECONDS) * BUCKET_SECONDS
            pipeline.get(f"trend:{category}:{topic_normalized}:{bucket_start}")
            bucket_starts.append(bucket_start)
        raw_counts = pipeline.execute()
        # Apply exponential decay: older buckets get lower weight
        score = 0.0
        total_count = 0
        for bucket_start, count_str in zip(bucket_starts, raw_counts):
            count = int(count_str) if count_str else 0
            age_minutes = (now - bucket_start) / 60
            score += count * math.exp(-DECAY_LAMBDA * age_minutes)
            total_count += count
        if total_count > 10:  # minimum engagement threshold to filter noise and spam
            results.append({"topic": topic, "score": score,
                            "total_count": total_count, "category": category})
    # Sort by score descending, return top-K candidates
    results.sort(key=lambda x: x["score"], reverse=True)
    return results[:50]  # top 50 candidates
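A useful way to reason about the decay constant is its half-life, the age at which an event's weight falls to 0.5: half_life = ln(2) / lambda. A quick check of the constants used above:

```python
import math

DECAY_LAMBDA = 0.05  # per minute, as in compute_trending_scores

def weight(age_minutes: float) -> float:
    """Decay weight applied to a bucket of the given age."""
    return math.exp(-DECAY_LAMBDA * age_minutes)

half_life = math.log(2) / DECAY_LAMBDA
print(round(half_life, 1))   # ~13.9 minutes until weight halves
print(round(weight(60), 3))  # ~0.05: an hour-old event keeps ~5% of live weight
```

So lambda = 0.05 means a topic must keep generating engagement roughly every quarter hour to hold its rank; halving lambda roughly doubles how long a surge lingers.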
def refresh_trending(conn, category: str):
    """
    Periodic job (every 5 minutes): compute top trending topics
    and update the TrendingItem table plus the Redis ranking cache.
    """
    # Discover active topics via SCAN over trend:{category}:* keys.
    # SCAN walks the keyspace; at large scale, maintain an explicit
    # candidate set (e.g. heavy hitters) instead of scanning.
    active_topics = set()
    cursor = 0
    while True:
        cursor, keys = r.scan(cursor, match=f"trend:{category}:*", count=1000)
        for key in keys:
            parts = key.split(":")
            if len(parts) >= 4:
                active_topics.add(parts[2])  # trend:{category}:{topic}:{bucket}
        if cursor == 0:
            break

    trending = compute_trending_scores(category, list(active_topics))

    # Write to Postgres and rebuild the Redis sorted set in one pipeline
    window_start = datetime.now(timezone.utc).replace(second=0, microsecond=0)
    pipeline = r.pipeline()
    pipeline.delete(f"trending:ranked:{category}")  # clear old rankings
    with conn.cursor() as cur:
        for rank, item in enumerate(trending[:20], start=1):
            cur.execute("""
                INSERT INTO TrendingItem (topic, category, score, velocity, rank, window_start)
                VALUES (%s, %s, %s, 0, %s, %s)
                ON CONFLICT (topic, category, window_start) DO UPDATE
                SET score = EXCLUDED.score, rank = EXCLUDED.rank, computed_at = NOW()
            """, (item['topic'], category, item['score'], rank, window_start))
            pipeline.zadd(f"trending:ranked:{category}", {item['topic']: item['score']})
    conn.commit()
    pipeline.expire(f"trending:ranked:{category}", 300)  # 5-minute TTL
    pipeline.execute()
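The read path can then serve requests straight from the precomputed sorted set, with no scoring work per request. A hedged sketch (the client is passed in; the key name matches the refresh job above):

```python
def get_trending(r, category: str, limit: int = 10) -> list[dict]:
    """Serve the trending widget from the precomputed Redis ranking.
    ZREVRANGE returns (member, score) pairs, highest score first."""
    entries = r.zrevrange(f"trending:ranked:{category}", 0, limit - 1,
                          withscores=True)
    return [{"rank": i + 1, "topic": topic, "score": score}
            for i, (topic, score) in enumerate(entries)]
```

If the sorted set has expired (refresh job down past the 5-minute TTL), a sensible fallback is the rows for the latest window_start in TrendingItem.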
Key Interview Points
- Approximate top-K vs exact: Exact top-K over a 1-hour window requires storing all engagement events — at 100M events/hour, this is a massive dataset. Approximate top-K with Count-Min Sketch reduces memory to O(width × depth) (e.g., 1,000 × 5 = 5,000 counters). Count-Min Sketch overestimates but never underestimates — acceptable for trending. Redis sorted sets hold top-N explicitly; the question is how to populate them without scanning billions of events.
- Heavy hitters problem: The candidate set (which topics to score) is itself a top-K problem. Use the Space Saving algorithm or Misra-Gries to maintain a list of probable heavy hitters with guaranteed error bounds. Alternative: use Kafka partitioned by topic — a consumer per partition maintains a local top-K, and a reducer merges partition results every 5 minutes (Flink pattern).
- Time decay prevents stale trends: A topic that trended 3 hours ago should not still appear in the top-10. Exponential decay (e^(-lambda * age)) smoothly reduces the influence of older buckets. Lambda controls decay speed: lambda=0.05 → a 60-minute-old event has weight e^(-3) ≈ 0.05 (5% of live weight). Alternatively, use a 1-hour fixed window with hard cutoff — simpler but creates step-function jumps as events age out.
- Spam and bot manipulation: Bots can inflate topics by generating fake engagement. Mitigations: (1) rate limit: at most 1 engagement per (user_id, topic) per 5 minutes; (2) new account penalty: engagement from accounts < 7 days old counts as 0.1 votes; (3) velocity check: a topic that goes from 0 to 10M events in 1 minute is likely spam — cap score increase to 10× per refresh cycle; (4) content moderation blocklist: filtered topics never appear in trending regardless of score.
- Personalized trending: Global trending may show topics irrelevant to a user. Personalize by weighting topics by the user’s interest graph: topics popular among the user’s followees get a 2× boost. Compute personalized scores lazily on feed load rather than pre-computing for all users — only fetch if the user requests the trending widget. Cache per user for 5 minutes in Redis.
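The Count-Min Sketch mentioned above fits in a few lines. A toy sketch for illustration, using seeded blake2b as the hash family (a real deployment would use a library implementation, e.g. RedisBloom's CMS commands):

```python
import hashlib

class CountMinSketch:
    """width x depth counters; estimates may overcount, never undercount."""

    def __init__(self, width: int = 1000, depth: int = 5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One independent-ish hash per row via a per-row salt
        h = hashlib.blake2b(item.encode(), salt=row.to_bytes(8, "big")).digest()
        return int.from_bytes(h[:8], "big") % self.width

    def add(self, item: str, count: int = 1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Min across rows gives the tightest upper bound on the true count
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))
```

Total memory is width × depth integers regardless of how many distinct hashtags exist, which is what makes it viable at millions of events per second.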
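The per-(user, topic) rate limit in the spam mitigations maps naturally onto Redis SET with NX and EX. A hedged sketch, with the client passed in and the key prefix assumed:

```python
def allow_engagement(r, user_id: str, topic: str,
                     window_seconds: int = 300) -> bool:
    """Return True at most once per (user, topic) per window.
    SET NX EX is a single atomic command, so two concurrent duplicate
    engagements cannot both pass the check."""
    key = f"ratelimit:{user_id}:{topic}"
    return bool(r.set(key, "1", nx=True, ex=window_seconds))
```

Call this before record_engagement; only increments that pass the gate reach the trend buckets.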