Trending Topics Low-Level Design: Sliding Window Counts, Time Decay, and Top-K Computation

A trending topics system surfaces content gaining rapid engagement — the top hashtags on Twitter, trending searches on Google, or rising products on Amazon. Core challenges: computing approximate top-K over a sliding time window with low latency, applying time decay so older engagement counts less, filtering spam and bot manipulation, and refreshing the list frequently without expensive full scans.

Core Data Model

-- Raw engagement events (write path: Kafka → Flink/Spark)
-- Not stored long-term in Postgres; processed into aggregates

-- Pre-computed trending scores (updated every 5 minutes)
CREATE TABLE TrendingItem (
    topic         TEXT NOT NULL,
    category      TEXT NOT NULL DEFAULT 'hashtag',  -- 'hashtag','search','product'
    score         NUMERIC(12,4) NOT NULL,
    velocity      NUMERIC(12,4) NOT NULL,  -- rate of change (acceleration)
    rank          INT NOT NULL,
    window_start  TIMESTAMPTZ NOT NULL,    -- e.g. "last 1 hour"
    computed_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (topic, category, window_start)
);
CREATE INDEX idx_trending_rank ON TrendingItem (category, rank) WHERE window_start = (SELECT MAX(window_start) FROM TrendingItem);

-- Sliding window counts (Redis-based, not Postgres)
-- Key: trend:{category}:{topic}:{bucket}  (1-minute buckets)
-- Value: integer count
-- TTL: 2 hours (window size + buffer)

Redis Sliding Window Counter

import time, redis
from datetime import datetime, timezone

r = redis.Redis(host='redis', decode_responses=True)

WINDOW_SECONDS = 3600   # 1-hour trending window
BUCKET_SECONDS = 60     # 1-minute buckets
NUM_BUCKETS = WINDOW_SECONDS // BUCKET_SECONDS  # 60 buckets

def record_engagement(category: str, topic: str):
    """
    Increment the current time bucket for this topic.
    Uses 1-minute buckets to approximate a sliding window.
    """
    topic_normalized = topic.lower().strip().lstrip('#')
    now = int(time.time())
    bucket = (now // BUCKET_SECONDS) * BUCKET_SECONDS  # floor to bucket boundary

    key = f"trend:{category}:{topic_normalized}:{bucket}"
    r.incr(key)
    r.expire(key, WINDOW_SECONDS + BUCKET_SECONDS)  # TTL = window + one extra bucket

def get_topic_count(category: str, topic: str) -> int:
    """
    Sum all bucket counts within the sliding window.
    Approximation: uses fixed 1-minute buckets — true sliding window would require
    per-event timestamps but is 10-100x more expensive.
    """
    topic_normalized = topic.lower().strip().lstrip('#')
    now = int(time.time())
    total = 0

    pipeline = r.pipeline()
    buckets = []
    for i in range(NUM_BUCKETS):
        bucket = ((now - i * BUCKET_SECONDS) // BUCKET_SECONDS) * BUCKET_SECONDS
        key = f"trend:{category}:{topic_normalized}:{bucket}"
        pipeline.get(key)
        buckets.append(key)

    results = pipeline.execute()
    total = sum(int(v) for v in results if v)
    return total

Top-K Computation with Time Decay

def compute_trending_scores(category: str, candidate_topics: list[str]) -> list[dict]:
    """
    Score each candidate topic using Wilson score or simple decay-weighted count.
    Decay: recent engagement counts more than older engagement.
    """
    now = int(time.time())
    results = []

    for topic in candidate_topics:
        topic_normalized = topic.lower().strip().lstrip('#')
        # Fetch all bucket counts in the window
        pipeline = r.pipeline()
        bucket_counts = []
        for i in range(NUM_BUCKETS):
            bucket_start = ((now - i * BUCKET_SECONDS) // BUCKET_SECONDS) * BUCKET_SECONDS
            key = f"trend:{category}:{topic_normalized}:{bucket_start}"
            pipeline.get(key)
            bucket_counts.append((bucket_start, key))

        raw_counts = pipeline.execute()

        # Apply exponential decay: older buckets get lower weight
        # Weight = e^(-lambda * age_in_minutes), lambda = 0.05
        import math
        DECAY_LAMBDA = 0.05
        score = 0.0
        total_count = 0
        for (bucket_start, _), count_str in zip(bucket_counts, raw_counts):
            count = int(count_str) if count_str else 0
            age_minutes = (now - bucket_start) / 60
            weight = math.exp(-DECAY_LAMBDA * age_minutes)
            score += count * weight
            total_count += count

        if total_count > 10:  # minimum engagement threshold to avoid spam
            results.append({"topic": topic, "score": score,
                            "total_count": total_count, "category": category})

    # Sort by score descending, return top-K
    results.sort(key=lambda x: x["score"], reverse=True)
    return results[:50]  # top 50 candidates

def refresh_trending(conn, category: str):
    """
    Periodic job (every 5 minutes): compute top trending topics
    and update TrendingItem table + Redis cache.
    """
    # Get active topics from Redis SCAN (find all trend:{category}:* keys)
    active_topics = set()
    cursor = 0
    while True:
        cursor, keys = r.scan(cursor, match=f"trend:{category}:*", count=1000)
        for key in keys:
            parts = key.split(":")
            if len(parts) >= 3:
                active_topics.add(parts[2])  # topic name
        if cursor == 0:
            break

    trending = compute_trending_scores(category, list(active_topics))

    # Write to Postgres and Redis sorted set
    window_start = datetime.now(timezone.utc).replace(second=0, microsecond=0)
    pipeline = r.pipeline()
    pipeline.delete(f"trending:ranked:{category}")  # clear old rankings

    with conn.cursor() as cur:
        for rank, item in enumerate(trending[:20], start=1):
            cur.execute("""
                INSERT INTO TrendingItem (topic, category, score, velocity, rank, window_start)
                VALUES (%s, %s, %s, 0, %s, %s)
                ON CONFLICT (topic, category, window_start) DO UPDATE
                SET score=EXCLUDED.score, rank=EXCLUDED.rank, computed_at=NOW()
            """, (item['topic'], category, item['score'], rank, window_start))
            pipeline.zadd(f"trending:ranked:{category}", {item['topic']: item['score']})

    conn.commit()
    pipeline.expire(f"trending:ranked:{category}", 300)  # 5-minute TTL
    pipeline.execute()

Key Interview Points

  • Approximate top-K vs exact: Exact top-K over a 1-hour window requires storing all engagement events — at 100M events/hour, this is a massive dataset. Approximate top-K with Count-Min Sketch reduces memory to O(width × depth) (e.g., 1,000 × 5 = 5,000 counters). Count-Min Sketch overestimates but never underestimates — acceptable for trending. Redis sorted sets hold top-N explicitly; the question is how to populate them without scanning billions of events.
  • Heavy hitters problem: The candidate set (which topics to score) is itself a top-K problem. Use the Space Saving algorithm or Misra-Gries to maintain a list of probable heavy hitters with guaranteed error bounds. Alternative: use Kafka partitioned by topic — a consumer per partition maintains a local top-K, and a reducer merges partition results every 5 minutes (Flink pattern).
  • Time decay prevents stale trends: A topic that trended 3 hours ago should not still appear in the top-10. Exponential decay (e^(-lambda * age)) smoothly reduces the influence of older buckets. Lambda controls decay speed: lambda=0.05 → a 60-minute-old event has weight e^(-3) ≈ 0.05 (5% of live weight). Alternatively, use a 1-hour fixed window with hard cutoff — simpler but creates step-function jumps as events age out.
  • Spam and bot manipulation: Bots can inflate topics by generating fake engagement. Mitigations: (1) rate limit: at most 1 engagement per (user_id, topic) per 5 minutes; (2) new account penalty: engagement from accounts < 7 days old counts as 0.1 votes; (3) velocity check: a topic that goes from 0 to 10M events in 1 minute is likely spam — cap score increase to 10× per refresh cycle; (4) content moderation blocklist: filtered topics never appear in trending regardless of score.
  • Personalized trending: Global trending may show topics irrelevant to a user. Personalize by weighting topics by the user’s interest graph: topics popular among the user’s followees get a 2× boost. Compute personalized scores lazily on feed load rather than pre-computing for all users — only fetch if the user requests the trending widget. Cache per user for 5 minutes in Redis.

Trending topics and hashtag system design is discussed in Twitter system design interview questions.

Trending topics and search trend system design is covered in Google system design interview preparation.

Trending topics and professional content trending design is discussed in LinkedIn system design interview guide.

See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering

See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering

Scroll to Top