System Design Interview: Real-Time Bidding Platform (Ad Exchange)

The global programmatic advertising market is on the order of $500B a year. Every time a web page loads, a real-time auction runs in under 100 milliseconds to decide which ad to show. Google, Meta, Amazon, and The Trade Desk process billions of these auctions per day (Google's DoubleClick exchange now lives on as Google Ad Manager). Designing an RTB platform is a common question at ad-tech companies and at any company monetizing through programmatic advertising.

The RTB Ecosystem

Publisher: a website or app with ad inventory (slots where ads appear). Publishers use a Supply-Side Platform (SSP) to sell their inventory programmatically.

SSP (Supply-Side Platform): aggregates publisher inventory and conducts the auction. Sends bid requests to DSPs. Selects the winner and returns the winning ad creative.

DSP (Demand-Side Platform): represents advertisers. Receives bid requests, evaluates targeting criteria, and submits bids for impressions that match their campaigns.

Ad Exchange: the marketplace connecting SSPs and DSPs. Routes bid requests and clears the transaction. In practice the SSP and exchange roles overlap heavily (many SSPs operate their own exchange), which is why this article treats the SSP as the auctioneer.

Flow: User loads a page → Publisher calls SSP with ad slot details → SSP sends bid requests to 50+ DSPs simultaneously → DSPs respond within 80ms → SSP runs second-price auction → Winner’s ad is returned to the browser.

The Auction: 100ms Budget

The entire RTB cycle must complete in under 100ms (IAB standard). Time budget breakdown:

  • 10ms: SSP receives impression opportunity, prepares bid request
  • 10ms: bid request transmitted to DSPs (network)
  • 60ms: DSP processing time (targeting, bidding, budget checks)
  • 10ms: DSP response transmitted back
  • 10ms: SSP runs auction, selects winner, returns ad markup

DSPs that do not respond within their 60ms processing budget are excluded from the auction; the timeout is strictly enforced. Beyond per-auction exclusion, any DSP whose end-to-end p99 response time exceeds 80ms will typically see its bid request volume throttled by SSPs.
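The fan-out-with-deadline pattern can be sketched with asyncio. Everything here is simulated for illustration (the DSP call, its latency distribution, and the bid values are made up); the 60ms timeout mirrors the budget above:

```python
import asyncio
import random
from typing import Optional

async def request_bid(dsp_id: int, timeout_s: float = 0.060) -> Optional[float]:
    """Call one (simulated) DSP; discard the bid if it misses the deadline."""
    async def dsp_call() -> float:
        # Simulated DSP latency: most respond in time, some are slow.
        await asyncio.sleep(random.uniform(0.005, 0.100))
        return round(random.uniform(0.50, 5.00), 2)  # CPM bid in dollars

    try:
        return await asyncio.wait_for(dsp_call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # late DSPs are simply excluded from this auction

async def fan_out(n_dsps: int) -> list:
    """Fan the bid request out to all DSPs in parallel; keep on-time bids."""
    results = await asyncio.gather(*(request_bid(i) for i in range(n_dsps)))
    return [bid for bid in results if bid is not None]

bids = asyncio.run(fan_out(50))
```

Note that the SSP does not wait for stragglers: `asyncio.wait_for` cancels each slow call at the deadline, so the auction always closes on time regardless of how many DSPs miss it.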

DSP Architecture

The DSP must evaluate a bid request, match targeting criteria, calculate a bid price, and respond in <60ms. Requirements: process 500K bid requests/second, each requiring real-time user targeting and budget checks.

from typing import Optional

class DSPBidEngine:
    def bid(self, request: BidRequest) -> Optional[BidResponse]:
        # 1. Frequency cap: has this user already seen the ad too many times?
        user_id = request.user_id
        raw = self.freq_store.get(f"freq:{user_id}:{request.ad_id}")
        freq = int(raw) if raw is not None else 0  # missing key = never seen
        if freq >= self.campaign.daily_frequency_cap:
            return None  # skip: user capped

        # 2. Targeting: does this impression match campaign criteria?
        if not self.targeting.matches(request):
            return None  # wrong audience, geo, or context

        # 3. Budget check: does the campaign have remaining budget?
        if not self.budget_service.has_budget(self.campaign.id):
            return None  # paused or exhausted

        # 4. Calculate bid price (ML model or fixed CPM)
        bid_price = self.bidder.predict_value(request)

        return BidResponse(price=bid_price, creative=self.ad_creative)

Each check must be O(1). Frequency caps and user data live in Redis (a single GET per check, <1ms). Targeting evaluation uses pre-compiled boolean expression trees held in memory. Budget state is maintained in Redis and decremented atomically on each win (e.g. DECRBY, with spend tracked as integer micro-dollars to avoid floating-point drift).

Targeting

Targeting criteria define which users an ad should reach:

  • Demographic: age range, gender, income bracket (from first-party data or data partners)
  • Geographic: country, DMA, city, zip code radius
  • Behavioral: users who visited specific URLs, searched for keywords, or belong to audience segments (auto intenders, luxury shoppers)
  • Contextual: page content category (IAB taxonomy), keywords on the page
  • Retargeting: users who visited the advertiser’s site (tracked via pixel)

User segments are stored as bitsets (one bit per user): checking a single user's membership is a one-bit test, and intersecting two segments is a single bitwise AND. User data is joined with the bid request in memory from a pre-loaded segment store, refreshed hourly from the DMP.
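A minimal sketch of the bitset idea, using Python ints as bitsets and assuming every user has been assigned a dense integer index (the segment names and indices are illustrative):

```python
# Assumed layout: each segment is one big integer used as a bitset,
# where bit i set means "user with index i is a member".

def make_segment(user_indices) -> int:
    bits = 0
    for idx in user_indices:
        bits |= 1 << idx
    return bits

def in_segment(segment: int, user_idx: int) -> bool:
    # Membership check: a single shift-and-mask bit test.
    return (segment >> user_idx) & 1 == 1

auto_intenders = make_segment([3, 17, 42])
luxury_shoppers = make_segment([17, 99])

# Targeting "auto intenders AND luxury shoppers" is one bitwise AND:
target = auto_intenders & luxury_shoppers
```

In production the bitsets would be backed by a compressed bitmap library (e.g. Roaring bitmaps) rather than raw Python ints, but the operations are the same.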

Frequency Capping

Frequency capping limits how many times a user sees a given ad (typically 3-5 times per day). Implementation: a Redis counter per user-campaign pair: HINCRBY freq:{user_id}:{campaign_id} impressions 1. If the returned value exceeds the cap, skip the bid. The tricky part: the TTL must expire at midnight UTC, not on a rolling 24-hour window, so use EXPIREAT with the timestamp of the next midnight. At 500M users × 1000 campaigns that is up to 500B keys, too much for a single Redis cluster. Solution: use a probabilistic data structure such as a Count-Min Sketch for approximate frequency counting, trading a small overcount probability (roughly 1-5%) for an order-of-magnitude memory reduction. (HyperLogLog does not fit this problem: it estimates set cardinality, not per-key counts.)
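A minimal, self-contained Count-Min Sketch along the lines suggested above (the width, depth, and md5-based hashing are illustrative choices, not a production tuning):

```python
import hashlib

class CountMinSketch:
    """Approximate per-key counters in O(width * depth) space.
    Estimates may overcount due to hash collisions, never undercount."""

    def __init__(self, width: int = 2048, depth: int = 4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, key: str):
        # One independent hash per row, derived by salting with the row index.
        for row in range(self.depth):
            h = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
            yield row, int(h, 16) % self.width

    def add(self, key: str) -> None:
        for row, col in self._buckets(key):
            self.table[row][col] += 1

    def estimate(self, key: str) -> int:
        # The minimum across rows is the least-collided (tightest) estimate.
        return min(self.table[row][col] for row, col in self._buckets(key))

caps = CountMinSketch()
for _ in range(4):
    caps.add("user123:campaign9")
```

Because the sketch only ever overcounts, a capped user is never accidentally shown extra impressions; the error mode is skipping a bid for a user who was not actually capped, which is the safe direction for advertisers.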

Budget Pacing

A campaign with a $1000 daily budget should not exhaust it in the first hour. Budget pacing distributes spend evenly throughout the day. Algorithm: target spend rate = remaining_budget / remaining_hours. If actual spend rate exceeds the target, throttle bids (bid less frequently or at lower prices) until the rate normalizes. Implementation: a pacing service runs every 60 seconds, computes the ratio of actual spend to target spend per campaign, and sets a throttle factor (0.0 to 1.0) in Redis. The DSP checks this factor: if random() > throttle_factor, skip the bid. This implements stochastic throttling — unbiased across impression opportunities.
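The pacing arithmetic above can be sketched directly (the function names and the division guard are illustrative, not part of any specific system):

```python
import random

def throttle_factor(remaining_budget: float, remaining_hours: float,
                    actual_spend_rate: float) -> float:
    """Ratio of target spend rate to actual spend rate, clamped to [0, 1].
    1.0 = bid on every matching opportunity; 0.5 = bid on about half."""
    if actual_spend_rate <= 0:
        return 1.0  # not spending at all: never throttle
    target_rate = remaining_budget / max(remaining_hours, 1e-9)
    return max(0.0, min(1.0, target_rate / actual_spend_rate))

def should_bid(factor: float) -> bool:
    # Stochastic throttling: keep each opportunity with probability `factor`,
    # which is unbiased across impression types and times of day.
    return random.random() <= factor
```

For example, a campaign with $600 left, 12 hours remaining, and a current burn rate of $100/hour has a target rate of $50/hour, so it should bid on roughly half of its matching opportunities.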

Win Notification and Impression Tracking

In a second-price auction, the DSP with the highest bid wins but pays the second-highest bid + $0.01. (Most major exchanges have since moved to first-price auctions, but second-price remains the textbook model.) The win price is sent back to the DSP via a win notification URL embedded in the ad markup (a server-to-server callback). The DSP records bid_id, win_price, campaign_id, user_id, and timestamp to a Kafka topic, which is batch-loaded into the campaign analytics database (ClickHouse) for reporting. The SSP fires a pixel impression tracker when the ad is rendered, recording the impression for billing and frequency cap updates.
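The second-price clearing rule can be sketched as a small function (the names and the floor-price handling are assumptions for illustration):

```python
def run_second_price_auction(bids: dict, floor: float = 0.0):
    """bids maps dsp_id -> bid price. Returns (winner, clearing_price),
    or None if no bid meets the floor. Winner pays second price + $0.01;
    a sole eligible bidder pays floor + $0.01."""
    eligible = {dsp: price for dsp, price in bids.items() if price >= floor}
    if not eligible:
        return None
    ranked = sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)
    winner, _top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else floor
    return winner, round(second + 0.01, 2)
```

The key property: the winner's payment depends only on the other bids, which is what makes truthful bidding the dominant strategy in a second-price auction.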

Scale Numbers

  • Google Display Network: 3 trillion ad impressions per year = 95K impressions/second
  • Each impression generates 50+ bid requests (one per DSP) = 4.7M bid requests/second to DSPs globally
  • Win notification writes: 95K/second to Kafka → ClickHouse
  • User segment data: 500M users × 1000 segments = 500B bits = 62.5GB as a raw bitset (sparse segments compress much smaller); fits in RAM on a large multi-socket server

Frequently Asked Questions

How does a real-time bidding auction work end-to-end?

When a user loads a webpage, the publisher's ad server sends a bid request to an SSP (Supply-Side Platform) containing the impression opportunity: ad slot dimensions, page URL, anonymized user identifiers (cookie IDs or mobile advertising IDs), and floor price. The SSP forwards the bid request simultaneously to dozens of DSPs (Demand-Side Platforms). Each DSP must respond within 80-100ms total (the bid request must arrive, be processed, and the response returned before the auction closes). The DSP looks up the user's audience segment in its data store, retrieves active campaigns targeting this segment, scores each against the impression using a bid price model (often a neural network or LightGBM), and returns the highest bid in an OpenRTB 2.x format JSON response. The SSP runs a second-price auction: the highest bidder wins but pays the second-highest bid price plus one cent. The winning ad creative URL is returned to the browser, which fetches and renders the ad. Win/loss notifications are sent asynchronously — DSPs use these to track spend pacing and impression frequency.

How does frequency capping work in programmatic advertising?

Frequency capping limits how many times a user sees the same ad within a time window (e.g., no more than 5 impressions per user per day). The DSP must check and update the frequency counter within the 80-100ms bid response window. Implementation using Redis: each user-campaign pair has a key with a TTL equal to the cap window. On each bid request, HINCRBY user:{uid} campaign:{cid} 1 atomically increments the counter and returns the new value. If the returned value exceeds the cap, the DSP does not bid on this impression. The key expires automatically after the window closes. At scale (1 million QPS), Redis handles this with sub-millisecond latency. For cross-device frequency capping (same user on phone and desktop), the DSP must have a deterministic cross-device graph linking device IDs to a person ID, and use the person ID as the key. Probabilistic approaches use Count-Min Sketch: a compact probabilistic data structure that estimates frequency with bounded error, using O(width × depth) space regardless of the number of unique users, suitable when approximate counts are acceptable.

How does budget pacing work to spread ad spend evenly throughout the day?

Without pacing, a campaign with a $10,000 daily budget would spend it all in the first hour on high-traffic morning inventory, leaving no budget for afternoon and evening audiences. Pacing spreads spend evenly. The simplest approach is throttling: track current spend rate vs. target rate (budget / elapsed_day_fraction). If spending too fast, randomly skip a fraction of bid opportunities. If spending too slow, increase bid aggressiveness. Implementation: a campaign pacing service updates a bid multiplier every 60 seconds based on spend rate. The DSP fetches this multiplier from a local cache and applies it to the base bid price. For smooth pacing, token bucket rate limiting is used: campaigns receive tokens at a rate proportional to their budget allocation. Each bid costs one token. If the bucket is empty, the DSP skips the bid. The bucket fills at the configured rate (e.g., 100 tokens/minute for a campaign targeting 100 impressions/minute). Redis DECR on a token counter with a background refill job implements this efficiently. Real systems use feedback control (PID controllers) to handle non-uniform traffic patterns — more inventory is available during peak hours, requiring the pacing algorithm to be reactive.
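The token-bucket variant described above can be sketched in Python; the injectable `now` clock is a testing convenience, not part of any production design:

```python
import time
from typing import Optional

class TokenBucket:
    """Token-bucket pacer: tokens refill continuously at `rate` per second,
    up to `capacity`; each bid attempt consumes one token."""

    def __init__(self, rate: float, capacity: float,
                 now: Optional[float] = None):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def try_bid(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Lazy refill: credit tokens for the time elapsed since the last call,
        # so no background timer is needed on the hot path.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The lazy-refill formulation matches how this is typically done in Redis too: store (tokens, last_timestamp) and compute the refill inside a Lua script at check time, rather than running a separate refill job per campaign.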

