Trending Topics Low-Level Design: Sliding Window Counts, Time Decay, and Top-K Computation

A trending topics system surfaces content gaining rapid engagement — the top hashtags on Twitter, trending searches on Google, or rising products on Amazon. Core challenges: computing approximate top-K over a sliding time window with low latency, applying time decay so older engagement counts less, filtering spam and bot manipulation, and refreshing the list frequently without expensive full scans.

Core Data Model

-- Raw engagement events (write path: Kafka → Flink/Spark)
-- Not stored long-term in Postgres; processed into aggregates

-- Pre-computed trending scores (updated every 5 minutes)
CREATE TABLE TrendingItem (
    topic         TEXT NOT NULL,
    category      TEXT NOT NULL DEFAULT 'hashtag',  -- 'hashtag','search','product'
    score         NUMERIC(12,4) NOT NULL,
    velocity      NUMERIC(12,4) NOT NULL,  -- rate of change of the score between refreshes
    rank          INT NOT NULL,
    window_start  TIMESTAMPTZ NOT NULL,    -- e.g. "last 1 hour"
    computed_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (topic, category, window_start)
);
-- Note: Postgres partial indexes cannot contain subqueries. Index on
-- (category, window_start, rank) and filter to the latest window_start at query time.
CREATE INDEX idx_trending_rank ON TrendingItem (category, window_start DESC, rank);

-- Sliding window counts (Redis-based, not Postgres)
-- Key: trend:{category}:{topic}:{bucket}  (1-minute buckets)
-- Value: integer count
-- TTL: window size + one extra bucket (~61 minutes; see record_engagement below)

Redis Sliding Window Counter

import time, redis
from datetime import datetime, timezone

r = redis.Redis(host='redis', decode_responses=True)

WINDOW_SECONDS = 3600   # 1-hour trending window
BUCKET_SECONDS = 60     # 1-minute buckets
NUM_BUCKETS = WINDOW_SECONDS // BUCKET_SECONDS  # 60 buckets

def record_engagement(category: str, topic: str):
    """
    Increment the current time bucket for this topic.
    Uses 1-minute buckets to approximate a sliding window.
    """
    topic_normalized = topic.lower().strip().lstrip('#')
    now = int(time.time())
    bucket = (now // BUCKET_SECONDS) * BUCKET_SECONDS  # floor to bucket boundary

    key = f"trend:{category}:{topic_normalized}:{bucket}"
    r.incr(key)
    r.expire(key, WINDOW_SECONDS + BUCKET_SECONDS)  # TTL = window + one extra bucket

def get_topic_count(category: str, topic: str) -> int:
    """
    Sum all bucket counts within the sliding window.
    Approximation: uses fixed 1-minute buckets — a true sliding window would
    require per-event timestamps and be 10-100x more expensive.
    """
    topic_normalized = topic.lower().strip().lstrip('#')
    now = int(time.time())

    # Fetch all 60 bucket counts in one pipelined round trip
    pipeline = r.pipeline()
    for i in range(NUM_BUCKETS):
        bucket = ((now - i * BUCKET_SECONDS) // BUCKET_SECONDS) * BUCKET_SECONDS
        pipeline.get(f"trend:{category}:{topic_normalized}:{bucket}")

    results = pipeline.execute()
    return sum(int(v) for v in results if v)
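The bucket-flooring arithmetic above is easy to verify offline with fixed Unix timestamps — no Redis required:

```python
BUCKET_SECONDS = 60

def bucket_for(ts: int) -> int:
    """Floor a Unix timestamp to its 1-minute bucket boundary."""
    return (ts // BUCKET_SECONDS) * BUCKET_SECONDS

# Two events 30 seconds apart in the same minute share one bucket key;
# an event in the next minute maps to a new bucket.
assert bucket_for(1_700_000_005) == bucket_for(1_700_000_035) == 1_699_999_980
assert bucket_for(1_700_000_065) == 1_700_000_040
```

Because every event in a minute collapses into one counter, two topics with identical counts can rank differently depending on sub-minute timing — the precision the approximation gives up.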

Top-K Computation with Time Decay

import math

DECAY_LAMBDA = 0.05  # exponential decay constant, per minute of bucket age

def compute_trending_scores(category: str, candidate_topics: list[str]) -> list[dict]:
    """
    Score each candidate topic with a decay-weighted bucket count.
    Decay: recent engagement counts more than older engagement.
    """
    now = int(time.time())
    results = []

    for topic in candidate_topics:
        topic_normalized = topic.lower().strip().lstrip('#')
        # Fetch all bucket counts in the window with one pipelined round trip
        pipeline = r.pipeline()
        bucket_starts = []
        for i in range(NUM_BUCKETS):
            bucket_start = ((now - i * BUCKET_SECONDS) // BUCKET_SECONDS) * BUCKET_SECONDS
            pipeline.get(f"trend:{category}:{topic_normalized}:{bucket_start}")
            bucket_starts.append(bucket_start)

        raw_counts = pipeline.execute()

        # Apply exponential decay: weight = e^(-lambda * age_in_minutes)
        score = 0.0
        total_count = 0
        for bucket_start, count_str in zip(bucket_starts, raw_counts):
            count = int(count_str) if count_str else 0
            age_minutes = (now - bucket_start) / 60
            score += count * math.exp(-DECAY_LAMBDA * age_minutes)
            total_count += count

        if total_count > 10:  # minimum engagement threshold to filter noise
            results.append({"topic": topic, "score": score,
                            "total_count": total_count, "category": category})

    # Sort by score descending, return top-K candidates
    results.sort(key=lambda x: x["score"], reverse=True)
    return results[:50]  # top 50 candidates
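A decay constant of 0.05 per minute implies a half-life of ln(2)/lambda ≈ 13.9 minutes. The weights quoted later in this article can be checked with pure math, no Redis needed:

```python
import math

DECAY_LAMBDA = 0.05  # per minute, as in compute_trending_scores above

def decay_weight(age_minutes: float) -> float:
    """Weight of an engagement event that is age_minutes old."""
    return math.exp(-DECAY_LAMBDA * age_minutes)

half_life = math.log(2) / DECAY_LAMBDA   # minutes until weight halves

assert decay_weight(0) == 1.0                  # live events: full weight
assert abs(decay_weight(30) - 0.2231) < 1e-3   # 30 min old: ~22% weight
assert abs(decay_weight(60) - 0.0498) < 1e-3   # 60 min old: ~5% weight
assert 13.8 < half_life < 14.0
```

Tuning lambda is equivalent to picking a half-life: doubling lambda halves the half-life, so a "breaking news" surface might use lambda ≈ 0.1 while a "this week" surface uses a much smaller value.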

def refresh_trending(conn, category: str):
    """
    Periodic job (every 5 minutes): compute top trending topics
    and update TrendingItem table + Redis cache.
    """
    # Get active topics from Redis SCAN (find all trend:{category}:* keys)
    active_topics = set()
    cursor = 0
    while True:
        cursor, keys = r.scan(cursor, match=f"trend:{category}:*", count=1000)
        for key in keys:
            parts = key.split(":")
            if len(parts) >= 3:
                active_topics.add(parts[2])  # topic name
        if cursor == 0:
            break

    trending = compute_trending_scores(category, list(active_topics))

    # Write to Postgres and Redis sorted set
    window_start = datetime.now(timezone.utc).replace(second=0, microsecond=0)
    pipeline = r.pipeline()
    pipeline.delete(f"trending:ranked:{category}")  # clear old rankings

    with conn.cursor() as cur:
        for rank, item in enumerate(trending[:20], start=1):
            cur.execute("""
                INSERT INTO TrendingItem (topic, category, score, velocity, rank, window_start)
                VALUES (%s, %s, %s, 0, %s, %s)
                ON CONFLICT (topic, category, window_start) DO UPDATE
                SET score=EXCLUDED.score, rank=EXCLUDED.rank, computed_at=NOW()
            """, (item['topic'], category, item['score'], rank, window_start))
            pipeline.zadd(f"trending:ranked:{category}", {item['topic']: item['score']})

    conn.commit()
    pipeline.expire(f"trending:ranked:{category}", 300)  # 5-minute TTL
    pipeline.execute()

Key Interview Points

  • Approximate top-K vs exact: Exact top-K over a 1-hour window requires storing all engagement events — at 100M events/hour, this is a massive dataset. Approximate top-K with Count-Min Sketch reduces memory to O(width × depth) (e.g., 1,000 × 5 = 5,000 counters). Count-Min Sketch overestimates but never underestimates — acceptable for trending. Redis sorted sets hold top-N explicitly; the question is how to populate them without scanning billions of events.
  • Heavy hitters problem: The candidate set (which topics to score) is itself a top-K problem. Use the Space Saving algorithm or Misra-Gries to maintain a list of probable heavy hitters with guaranteed error bounds. Alternative: use Kafka partitioned by topic — a consumer per partition maintains a local top-K, and a reducer merges partition results every 5 minutes (Flink pattern).
  • Time decay prevents stale trends: A topic that trended 3 hours ago should not still appear in the top-10. Exponential decay (e^(-lambda * age)) smoothly reduces the influence of older buckets. Lambda controls decay speed: lambda=0.05 → a 60-minute-old event has weight e^(-3) ≈ 0.05 (5% of live weight). Alternatively, use a 1-hour fixed window with hard cutoff — simpler but creates step-function jumps as events age out.
  • Spam and bot manipulation: Bots can inflate topics by generating fake engagement. Mitigations: (1) rate limit: at most 1 engagement per (user_id, topic) per 5 minutes; (2) new account penalty: engagement from accounts < 7 days old counts as 0.1 votes; (3) velocity check: a topic that goes from 0 to 10M events in 1 minute is likely spam — cap score increase to 10× per refresh cycle; (4) content moderation blocklist: filtered topics never appear in trending regardless of score.
  • Personalized trending: Global trending may show topics irrelevant to a user. Personalize by weighting topics by the user’s interest graph: topics popular among the user’s followees get a 2× boost. Compute personalized scores lazily on feed load rather than pre-computing for all users — only fetch if the user requests the trending widget. Cache per user for 5 minutes in Redis.
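The Count-Min Sketch from the first bullet is small enough to sketch inline. This is an illustrative pure-Python version; the width/depth values and the salted-md5 hashing scheme are arbitrary choices for clarity, not a production implementation (which would use fast non-cryptographic hashes):

```python
import hashlib

class CountMinSketch:
    def __init__(self, width: int = 1000, depth: int = 5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _positions(self, item: str):
        # Derive `depth` hash positions by salting one hash function per row
        for row in range(self.depth):
            digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item: str, count: int = 1):
        for row, col in self._positions(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Minimum across rows is an upper bound: collisions only inflate counters
        return min(self.table[row][col] for row, col in self._positions(item))

cms = CountMinSketch()
cms.add("#worldcup", count=1000)
cms.add("#obscure")
assert cms.estimate("#worldcup") >= 1000  # never underestimates
```

Memory is fixed at width × depth counters regardless of how many distinct hashtags stream through, which is exactly why it fits the candidate-selection role described above.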
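Misra-Gries, mentioned in the heavy-hitters bullet, can likewise be sketched in a few lines. It keeps at most k-1 counters and guarantees that any item whose true frequency exceeds n/k (n = stream length) survives to the end — a minimal illustrative version:

```python
def misra_gries(stream, k: int) -> dict:
    """Heavy-hitter candidates: keeps at most k-1 counters."""
    counters: dict = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Stream item matched no counter and no slot is free:
            # decrement everything, evicting counters that reach zero
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = ["#a"] * 60 + ["#b"] * 30 + ["#c"] * 5 + ["#d"] * 5
survivors = misra_gries(stream, k=3)
assert "#a" in survivors  # frequency 60 > 100/3, guaranteed to survive
```

The surviving counters are candidates only (counts are underestimates), so a second pass — or the Redis bucket counts above — supplies the exact scores for ranking.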

Frequently Asked Questions

Why use 1-minute buckets instead of a true sliding window for trend counting?
A true sliding window maintains a timestamp for every engagement event: if 10M events occurred in the past hour, you store 10M timestamps. Memory: 10M × 8 bytes = 80MB per topic; with 100K active topics of that size, on the order of 8TB of Redis memory — impractical. 1-minute buckets trade precision for efficiency: instead of exact event timestamps, store one counter per topic per minute. To get the 1-hour count, sum 60 bucket values (60 GET operations via pipeline). Memory: one int per topic per minute × 60 minutes × 100K topics = 100K × 60 × 4 bytes = 24MB. The bucket approach slightly overestimates by including events from the current partially-elapsed minute and loses sub-minute precision — acceptable for trending, where approximate ranking is sufficient.

How does exponential decay prevent stale topics from trending indefinitely?
Without decay, a topic that surged 50 minutes ago still carries its full weight until the bucket exits the window at exactly 60 minutes — then it drops to zero suddenly. With decay weight e^(-lambda × age_minutes): a 30-minute-old event has weight e^(-0.05 × 30) ≈ 0.22 (22% of live weight); a 50-minute-old event has weight e^(-0.05 × 50) ≈ 0.08 (8%). Scores decrease smoothly as engagement ages, preventing cliff effects at the window boundary. The decay constant (lambda = 0.05) is tunable: smaller lambda means slower decay (topics stay trending longer); larger lambda means faster decay (only recent activity matters). Twitter uses a similar half-life decay for its trending algorithm.

How does the Count-Min Sketch solve the heavy hitters problem for trending?
The candidate selection problem: of all hashtags in the universe, which should you compute scores for? You cannot score every possible hashtag. A Count-Min Sketch maintains a 2D array of counters (width × depth, e.g. 1000 × 5 = 5000 counters). For each incoming event, hash the hashtag with depth different hash functions and increment the corresponding counter in each row. To estimate a count, take the minimum across the depth rows — an upper-bound estimate that overestimates but never underestimates. After processing N events, hashtags with estimated count ≥ N/K are the heavy-hitter candidates. Space is O(width × depth) regardless of hashtag cardinality, and updates are O(depth) — suitable for streaming at millions of events/second.

How do you prevent a bot network from artificially trending a hashtag?
Bot signals: (1) burst pattern — 100K events in 10 seconds from newly created accounts (natural trending is gradual); (2) account age — accounts created in the last 7 days get lower weight per event (multiply by 0.1); (3) IP concentration — 1000 events from the same /24 CIDR block; (4) identical post text — many accounts posting the exact same content. Mitigations: rate limit engagement per user per topic (one increment per 5 minutes); apply account-age weighting before incrementing buckets; detect velocity anomalies (if a score triples in 1 minute, flag for review rather than surfacing immediately); maintain a human-reviewed blocklist of abused hashtags. These signals compound — a topic is flagged only when multiple signals trigger simultaneously, reducing false positives.

How do you display trending topics differently across geographic regions?
Global trending surfaces topics popular worldwide, which may be irrelevant to users in specific regions. Regional trending scopes engagement events by user location (IP geolocation for unregistered users, profile location for registered users) and maintains separate Redis bucket keys per region: trend:US:{topic}:{bucket}, trend:JP:{topic}:{bucket}. The scoring job runs per region, computing top-K for each, and stores regional rankings in trending:ranked:US, trending:ranked:JP. API: GET /trending?region=US. For regions with insufficient data (fewer than ~1000 total events in the window), fall back to global trending. This is how Twitter's "Trends for you" works — city-level trending falls back to country, then global.

