Low Level Design: Ad Delivery Service

What Is an Ad Delivery Service?

The Ad Delivery Service is the final stage in the ad pipeline. Once the auction has selected a winning creative, the delivery service is responsible for: serving the creative asset (image, video, HTML) to the user's device, recording the impression, enforcing frequency caps, and triggering billing events. It sits at the intersection of high-throughput asset serving and stateful per-user accounting.

Unlike the targeting and auction layers, which operate on in-memory data and return in single-digit milliseconds, the delivery service must also interact with a CDN for asset delivery and a distributed counter store for frequency capping. Correctness of impression counting directly affects advertiser billing, so this component has stricter durability requirements than the auction log.

Data Model

CREATE TABLE creatives (
    creative_id     BIGINT PRIMARY KEY,
    campaign_id     BIGINT REFERENCES campaigns(campaign_id),
    format          ENUM('banner','video','native'),
    asset_url       TEXT,
    click_url       TEXT,
    width           INT,
    height          INT
);

CREATE TABLE impressions (
    impression_id   UUID PRIMARY KEY,
    auction_id      UUID,
    creative_id     BIGINT,
    user_id         BIGINT,
    slot_id         BIGINT,
    served_at       TIMESTAMP,
    billed          BOOLEAN DEFAULT FALSE
);

CREATE TABLE frequency_caps (
    campaign_id     BIGINT,
    user_id         BIGINT,
    window          ENUM('hour','day','week'),
    max_impressions INT,
    PRIMARY KEY (campaign_id, user_id, window)
);

-- Counters held in Redis, not SQL:
-- KEY: freq:{campaign_id}:{user_id}:{window_bucket}
-- VALUE: integer count, TTL = window duration

Core Algorithm: Frequency Capping

Frequency capping prevents a single user from seeing the same ad too many times, which degrades user experience and wastes advertiser spend. The delivery service enforces caps at serve time using a Redis counter per (campaign, user, time window):

  1. Pre-serve check: Before committing to serve the winning creative, the service calls INCR on the Redis key for each configured window (e.g., daily cap of 5). If any counter exceeds the cap after increment, the service rolls back with DECR and triggers a re-auction with the winner excluded. This adds one round-trip to Redis but keeps the check atomic.
  2. TTL management: Each Redis key's TTL is set to the window duration when the key is first created. The window bucket, computed as floor(unix_timestamp / window_seconds), is embedded in the key, so every user's key for a given window rolls over at the same wall-clock boundary; the TTL then garbage-collects the stale bucket's key.
  3. Impression recording: After a successful cap check, the impression is written to a Kafka topic. A consumer fleet persists records to the impressions table in micro-batches and marks the billed flag once the billing system confirms the charge.
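The pre-serve check above can be sketched in Python. A plain dict stands in for Redis here so the example is self-contained; in production each counter update would be an INCR/DECR against the shared Redis cluster, and the key format mirrors the freq:{campaign_id}:{user_id}:{window_bucket} scheme from the data model. The function and variable names are illustrative, not from the source.

```python
import time

# In-memory stand-in for Redis: key -> count. In production these reads and
# writes would be INCR/DECR commands against a shared Redis cluster.
counters = {}

def window_bucket(window_seconds, now=None):
    """Bucket index; all keys for a window roll over at the same boundary."""
    now = int(now if now is not None else time.time())
    return now // window_seconds

def cap_key(campaign_id, user_id, window_seconds, now=None):
    return f"freq:{campaign_id}:{user_id}:{window_bucket(window_seconds, now)}"

def try_reserve_impression(campaign_id, user_id, caps, now=None):
    """caps: list of (window_seconds, max_impressions) pairs.

    Increment every configured window's counter; if any cap is exceeded
    after increment, roll back all increments (the INCR-then-DECR scheme
    from step 1) and return False so the caller can re-auction."""
    incremented = []
    for window_seconds, max_impressions in caps:
        key = cap_key(campaign_id, user_id, window_seconds, now)
        counters[key] = counters.get(key, 0) + 1   # INCR
        incremented.append(key)
        if counters[key] > max_impressions:
            for k in incremented:                  # roll back with DECR
                counters[k] -= 1
            return False
    return True
```

Because the check-and-increment is a single INCR, two concurrent requests cannot both slip under the cap; the losing request observes a count above the limit and rolls back.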

Asset Serving Workflow

Creative assets are pre-uploaded to object storage (S3 or equivalent) and distributed to a CDN. The delivery service returns a redirect or an inline asset URL with a tracking pixel appended. Click tracking uses a redirect through the delivery service so click events can be recorded before forwarding the user to the advertiser's landing page. Video completion events are reported client-side via a VAST/VPAID wrapper that calls back to the delivery service at quartile boundaries (25%, 50%, 75%, 100%).
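The click-tracking redirect described above can be sketched as follows. The endpoint name, parameter names, and URL scheme are assumptions for illustration: the idea is only that the creative's click URL points at the delivery service, which records the event and then 302-redirects to the advertiser's landing page carried in a query parameter.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint: the delivery service's /click handler records the
# click event, then issues a 302 redirect to the "dest" landing page.
CLICK_ENDPOINT = "https://delivery.example.com/click"

def build_click_url(impression_id, creative_id, landing_url):
    """Build the tracking URL embedded in the served creative."""
    params = {
        "imp": impression_id,   # ties the click back to the impression record
        "cr": creative_id,
        "dest": landing_url,    # where /click redirects after recording
    }
    return f"{CLICK_ENDPOINT}?{urlencode(params)}"
```

Routing the click through the delivery service trades one extra redirect hop for guaranteed server-side recording, which is why click counts are generally more trustworthy than client-fired pixels.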

Failure Handling and Latency Requirements

  • Target latency: Asset URL returned to the ad server in <20 ms; CDN serves the asset independently.
  • Redis failure: If the frequency cap store is unreachable, the service fails open (serves the ad without a cap check) to preserve fill rate, and flags the impression for manual review. This is configurable per advertiser.
  • Impression deduplication: The impression_id UUID is generated client-side and sent with the beacon. The delivery service uses an idempotency check (INSERT … ON CONFLICT DO NOTHING) to handle duplicate beacons from retried requests.
  • Billing durability: The Kafka impression topic is configured with replication factor 3 and acks=all. Loss of impression records is treated as a P0 incident since it directly affects revenue accounting.
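The impression deduplication bullet can be demonstrated end to end. This minimal sketch uses an in-memory SQLite database (which supports ON CONFLICT … DO NOTHING in version 3.24+) purely so the example runs standalone; a production deployment would issue the same statement shape against the real impressions table.

```python
import sqlite3

# In-memory database standing in for the impressions store.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE impressions (
        impression_id TEXT PRIMARY KEY,
        creative_id   INTEGER,
        served_at     TEXT
    )
""")

def record_impression(impression_id, creative_id, served_at):
    """Idempotent insert: a retried beacon carrying the same client-generated
    impression_id hits the primary key and is silently dropped."""
    db.execute(
        "INSERT INTO impressions (impression_id, creative_id, served_at) "
        "VALUES (?, ?, ?) ON CONFLICT(impression_id) DO NOTHING",
        (impression_id, creative_id, served_at),
    )
    db.commit()
```

The client-generated UUID is what makes this safe: the server cannot distinguish a retry from a new impression on its own, so the idempotency key must be minted before the first send attempt.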

Scalability Considerations

The delivery service is stateless at the application layer; all per-user state lives in Redis and the database. Redis is sharded by user_id using consistent hashing, with each shard replicated for read scaling. At peak load, frequency cap checks represent the dominant Redis QPS; pipelining multiple cap window checks into a single round-trip reduces latency by 2-3x. CDN offloads nearly all asset bytes from the delivery fleet. Impression Kafka partitions are keyed by user_id to preserve ordering for downstream event-stream joins (e.g., attribution pipelines that match impressions to conversions).
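The consistent-hashing shard selection mentioned above can be sketched as a standard hash ring with virtual nodes. Shard names and the vnode count are illustrative; the point is that a given user_id always maps to the same Redis shard, and adding or removing a shard remaps only a fraction of keys.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring for picking the frequency-cap Redis shard.

    Each shard is placed on the ring multiple times (virtual nodes) to
    smooth the key distribution across shards."""

    def __init__(self, shards, vnodes=64):
        self.ring = []  # sorted list of (hash, shard)
        for shard in shards:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{shard}#{i}"), shard))
        self.ring.sort()
        self._keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def shard_for(self, user_id):
        """First shard clockwise from the user's position on the ring."""
        idx = bisect.bisect(self._keys, self._hash(str(user_id))) % len(self.ring)
        return self.ring[idx][1]
```

Keying by user_id (rather than campaign_id) keeps all of one user's cap counters on a single shard, which is what makes the pipelined multi-window check a single round-trip.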

Summary

The Ad Delivery Service closes the ad pipeline loop by serving creatives, enforcing frequency caps, and recording billable impressions with strong durability guarantees. The most interview-worthy design decisions are: atomic Redis INCR for cap enforcement, fail-open vs fail-closed policy for Redis outages, impression deduplication via idempotency keys, and Kafka durability settings for billing-critical event streams.

Frequently Asked Questions

What is the difference between ad serving and ad delivery?

Ad serving refers to the real-time selection and rendering of an ad for a single impression request. Ad delivery is the broader system that manages how a campaign's budget and goals are spread across time and inventory, ensuring the right number of impressions are served per day without exhausting the budget too early or under-delivering.

How do you implement pacing in an ad delivery system?

Pacing controls the rate at which a campaign spends its budget. Standard pacing distributes spend evenly across the day using a throttle probability computed from remaining budget and remaining time. Accelerated pacing enters every eligible auction immediately. Token-bucket algorithms or probabilistic throttling with feedback loops from a central budget server are common implementations.

How do you handle frequency capping across distributed ad servers?

Frequency capping limits how many times a user sees a given ad. In a distributed system, each ad server can't have perfect global state, so approximate counters stored in a shared cache like Redis with short TTLs are used. Some systems use probabilistic data structures like Count-Min Sketch to track frequency with bounded memory and tolerate slight over-delivery.

What consistency guarantees does an ad delivery system need, and how are they achieved?

Ad delivery typically favors availability over strict consistency (AP in CAP terms). Budget counters use eventual consistency with periodic reconciliation against a source-of-truth billing ledger. Overspend is bounded by setting a soft cap below the true limit. Critical operations like campaign activation or pausing may require stronger consistency via a distributed lock or a leader-based budget server.

