System Design: Real-Time Bidding (RTB) Platform — Ad Auction in 100ms (2025)

RTB Architecture Overview

Real-Time Bidding is the programmatic ad auction system that runs every time a user loads a web page with an ad slot. Timeline: user visits a page → publisher sends a bid request to an SSP (Supply-Side Platform) → SSP broadcasts the bid request to 10-50 DSPs (Demand-Side Platforms) → each DSP evaluates and responds with a bid in < 100ms → SSP runs the auction (second-price) → winning DSP is notified → ad is served. The entire cycle must complete in < 100-150ms wall-clock time before the page renders, which means DSP bidding logic has a hard budget of ~50ms including network round-trip. This is one of the most latency-sensitive distributed systems at scale: 5-10 million bid requests per second globally.
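The ~50ms DSP-side budget can be sanity-checked with back-of-envelope arithmetic. The per-stage numbers below are illustrative assumptions, not measurements:

```python
# Back-of-envelope latency ledger for one auction. All numbers are
# illustrative assumptions consistent with the ~50ms DSP budget above.
BUDGET_MS = 50          # DSP budget, including network round-trip
NETWORK_RTT_MS = 15     # assumed SSP <-> DSP round trip
stages_ms = {
    "user_profile_lookup": 5,     # Redis fetch
    "campaign_matching": 10,      # precomputed index scan
    "model_inference": 10,        # CTR / bid-price model
    "response_serialization": 2,
}
compute_ms = sum(stages_ms.values())                  # 27ms of DSP work
headroom_ms = BUDGET_MS - NETWORK_RTT_MS - compute_ms
print(headroom_ms)  # 8ms of slack before the SSP deadline
```

If any stage overruns, the slack is consumed first; the timeout guard in the bidding engine exists precisely because this slack can go negative under load.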

Bid Request and Response

The OpenRTB standard (IAB) defines the bid request/response schema. Key fields in the bid request: id (auction ID), imp (impression array with ad slot specs: size, floor price, placement type), site/app (publisher URL, category, language), user (user ID, plus age and gender if available), device (IP, user agent, geo, device type), and at (auction type: 1 = first-price, 2 = second-price). The DSP responds with id (matching the request ID) and seatbid[].bid entries containing: price (CPM in USD), adid (creative ID), adm (ad markup), nurl (win notification URL), and lurl (loss notification URL).
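A minimal OpenRTB 2.x-style request/response pair, built from the fields above; all values (IDs, prices, URLs) are illustrative:

```python
import json

# Minimal OpenRTB 2.x-style bid request (illustrative values only).
bid_request = {
    "id": "auction-123",
    "at": 2,  # auction type: 2 = second-price
    "imp": [{
        "id": "1",
        "banner": {"w": 300, "h": 250},  # ad slot size
        "bidfloor": 0.50,                # floor price, CPM in USD
    }],
    "site": {"page": "https://news.example.com/article", "cat": ["IAB12"]},
    "device": {"ua": "Mozilla/5.0", "ip": "203.0.113.7",
               "geo": {"country": "USA"}},
    "user": {"id": "u-789"},
}

# Matching bid response. price is CPM in USD; the SSP substitutes the
# clearing price for ${AUCTION_PRICE} when it fires the nurl.
bid_response = {
    "id": bid_request["id"],  # must echo the auction ID
    "seatbid": [{
        "bid": [{
            "impid": "1",
            "price": 2.75,
            "adid": "creative-42",
            "adm": "<div>ad markup</div>",
            "nurl": "https://dsp.example.com/win?aid=auction-123&price=${AUCTION_PRICE}",
        }]
    }],
}

print(json.dumps(bid_response, indent=2))
```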

# Bid request processing (DSP side). user_store, campaign_index, and rank
# are stand-ins for the components described elsewhere in this article.
import time
from typing import Optional

class BidEngine:
    def process_bid_request(self, request: BidRequest) -> Optional[BidResponse]:
        start = time.monotonic_ns()

        # 1. User lookup: fetch user profile from Redis (< 5ms budget)
        profile = self.user_store.get(request.user.id)

        # 2. Campaign matching: precomputed index of campaigns eligible for
        #    this slot, filtered by geo, frequency cap, and remaining budget
        candidates = self.campaign_index.match(request, profile)
        if not candidates:
            return None

        # 3. Pricing: rank candidates by predicted value, compute bid CPM
        best_campaign, bid_price = self.rank(candidates, request, profile)

        # 4. Deadline check: abandon the auction if we are out of time
        elapsed_ms = (time.monotonic_ns() - start) / 1_000_000
        if elapsed_ms > 45:  # timeout guard
            return None

        return BidResponse(
            id=request.id,
            seatbid=[SeatBid(bid=[Bid(
                price=bid_price,
                adid=best_campaign.creative_id,
                nurl=f"https://dsp.example.com/win?aid={request.id}&price=${{AUCTION_PRICE}}"
            )])]
        )

Budget Pacing with Redis

Advertisers set daily budgets. DSPs must pace spend evenly (not blow the budget in the first hour). Token bucket pacing: each campaign has a Redis counter for tokens. A background job refills tokens every second at refill_rate = daily_budget / 86400 in budget units (equivalently, daily_budget / (86400 * avg_cpm / 1000) impressions per second at the average CPM). On each bid: DECRBY the tokens counter by the bid price; if the counter is exhausted, skip the auction. A throttle ratio = actual_spend / ideal_spend adjusts pacing: if the ratio is > 1.0, slow down; if it is < 0.8, speed up.
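A minimal in-memory sketch of the token bucket, with tokens denominated in dollars. The class and method names are illustrative; a production DSP would keep the counter in Redis (DECRBY/INCRBY on integer micro-dollars) rather than in process memory:

```python
class PacingBucket:
    """Token-bucket pacer; tokens are budget dollars. An in-memory
    stand-in for the per-campaign Redis counter described above."""

    def __init__(self, daily_budget: float):
        self.daily_budget = daily_budget
        self.refill_per_sec = daily_budget / 86400  # even spend all day
        self.tokens = 0.0

    def refill(self, seconds: float = 1.0) -> None:
        # Background job: runs every second, capped at the daily budget.
        self.tokens = min(self.tokens + self.refill_per_sec * seconds,
                          self.daily_budget)

    def try_bid(self, bid_price_cpm: float) -> bool:
        # One impression at bid_price_cpm CPM costs price/1000 dollars.
        cost = bid_price_cpm / 1000
        if self.tokens < cost:
            return False         # bucket empty: skip this auction
        self.tokens -= cost      # optimistic deduction at bid time
        return True

bucket = PacingBucket(daily_budget=864.0)  # $864/day -> $0.01/sec refill
bucket.refill(seconds=10)                  # 10s of refill = $0.10
print(bucket.try_bid(50.0))                # costs $0.05 -> True
print(bucket.try_bid(80.0))                # $0.08 > $0.05 left -> False
```

The optimistic deduction at bid time is reconciled against the actual clearing price when the win notification arrives.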

Win/Loss Notification and Attribution

When the SSP selects a winner: the SSP fires the win notification URL (nurl) with the clearing price substituted for ${AUCTION_PRICE}. The DSP receives this HTTP callback and: (1) records the win and clearing price, (2) deducts the actual clearing price from the budget (vs. the bid price deducted optimistically), (3) credits back the difference. For second-price auctions, the clearing price < bid price in most cases. Loss notifications (lurl) are optionally fired for losing bids. Attribution: when the served ad is clicked or viewed, the DSP fires impression/click pixels, logging event data to Kafka. Attribution joins click events to the original bid record to credit conversions to the correct campaign.
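A sketch of the budget reconciliation step of the nurl callback, using plain dicts as stand-ins for the bid log and the Redis budget counters (the function name and dict shapes are illustrative):

```python
def handle_win_notification(bid_id: str, clearing_price_cpm: float,
                            bids: dict, budgets: dict) -> float:
    """Reconcile the optimistic deduction made at bid time against the
    actual clearing price reported on the nurl callback. bids maps
    bid_id -> (campaign_id, bid_price_cpm); budgets maps
    campaign_id -> remaining budget in dollars."""
    campaign_id, bid_price_cpm = bids[bid_id]
    # Second-price auction: clearing price <= bid price, so the
    # difference is credited back to the campaign's budget.
    refund = (bid_price_cpm - clearing_price_cpm) / 1000  # per impression
    budgets[campaign_id] += refund
    return refund

bids = {"auction-123": ("camp-7", 2.75)}   # bid 2.75 CPM at auction time
budgets = {"camp-7": 10.0}
refund = handle_win_notification("auction-123", 1.90, bids, budgets)
# Credits back (2.75 - 1.90) / 1000 dollars for this impression
```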

Frequency Capping

Frequency cap: limit how many times a user sees an ad (e.g., max 3 impressions per day per campaign). Implementation: Redis sorted set per (user_id, campaign_id) with impression timestamps. On each bid: ZRANGEBYSCORE to count impressions in the last 24 hours. If count >= cap: do not bid. ZADD the timestamp on win notification. Expiry: ZREMRANGEBYSCORE to remove entries older than 24 hours. TTL on the key = 25 hours. Cross-device frequency capping: use a device graph to link user IDs across devices and aggregate frequency counts. For very high scale (100M+ users): use a Bloom filter or Count-Min Sketch per campaign to approximate frequency — allows false positives (skip some valid impressions) but prevents over-serving with O(1) operations.
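The exact (sorted set) variant can be sketched as follows, with a plain dict of timestamp lists standing in for Redis (each list mirrors one sorted set; the Redis commands each step corresponds to are noted in comments):

```python
CAP = 3                # max impressions per campaign per user
WINDOW = 24 * 3600     # 24-hour window, in seconds

def should_bid(impressions: dict, user_id: str, campaign_id: str,
               now: float) -> bool:
    """Frequency-cap check. impressions maps (user_id, campaign_id) to
    a list of impression timestamps, an in-memory stand-in for the
    per-(user, campaign) Redis sorted set."""
    key = (user_id, campaign_id)
    ts = impressions.get(key, [])
    # ZREMRANGEBYSCORE: drop entries older than the 24h window
    ts = [t for t in ts if t > now - WINDOW]
    impressions[key] = ts
    # Count entries in window: bid only if the user is under the cap
    return len(ts) < CAP

def record_win(impressions, user_id, campaign_id, now) -> None:
    # ZADD on win notification: record the impression timestamp
    impressions.setdefault((user_id, campaign_id), []).append(now)

imps, now = {}, 1_000_000.0
for _ in range(3):
    assert should_bid(imps, "u1", "c1", now)
    record_win(imps, "u1", "c1", now)
print(should_bid(imps, "u1", "c1", now))               # capped -> False
print(should_bid(imps, "u1", "c1", now + WINDOW + 1))  # expired -> True
```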


Frequently Asked Questions

Why must DSPs respond to bid requests in under 100ms?

The RTB auction must complete before the web page renders the ad slot, typically within 100-150ms of page load. The SSP deducts its own processing time (10-20ms), leaving 80-100ms for DSPs to receive the bid request, evaluate it, and return a response. The SSP sets a timeout and ignores any response after the deadline; DSPs that consistently time out are penalized with reduced bid request volume. This forces DSPs to optimize every component: feature retrieval from Redis in under 5ms, model inference in under 10ms, campaign matching via precomputed indexes rather than database queries, and response serialization within the remaining budget.

How does smooth budget pacing avoid early budget exhaustion?

Smooth pacing distributes ad spend across the day while respecting that traffic volume is not uniform (it peaks in the morning and evening). The ideal spend curve is precomputed from historical traffic data: for each hour, what fraction of daily impressions typically occurs? At any point in the day, ideal_spend_so_far = daily_budget * cumulative_traffic_fraction_so_far. A throttle ratio = actual_spend / ideal_spend controls the bid rate: below 0.8, increase the bid rate (underpacing, risk of not spending the budget); above 1.0, reduce it (overpacing, risk of exhausting the budget early). Implementation: a Redis counter tracks actual spend; a background job recomputes the throttle ratio every second and writes it to a Redis key that the bidding engine reads before each bid decision.

How does frequency capping at scale use probabilistic data structures?

Exact frequency capping requires a Redis sorted set per (user, campaign) to track impression timestamps: O(1) per check but O(users * campaigns) memory, which is prohibitive at billion-user scale. The probabilistic alternative is a Count-Min Sketch (CMS) per campaign: a 2D array of counters with multiple hash functions. On each impression, increment CMS[h_i(user_id)] for each hash function i; the frequency estimate is the minimum of those counters. A CMS overestimates but never underestimates, so it may over-cap a user (show fewer ads than the cap allows) but never under-cap (never exceeds the cap). Space is O(campaigns) instead of O(users * campaigns). A Bloom filter variant answers only the binary question "has this user seen >= N impressions?" and is even more compact.
