Low Level Design: Experiment Framework

What Is an Experiment Framework?

An Experiment Framework is the foundational infrastructure layer that enables product teams to run controlled experiments (A/B tests, multivariate tests, holdouts, and feature rollouts) at scale across an organization. Unlike a single A/B testing tool, a framework is a platform that many teams share simultaneously, enforcing consistency in randomization, metric ownership, and statistical rigor. It is the backbone of data-driven product development at companies like Google, Meta, and Uber.

Data Model

-- MySQL dialect. `key` is a reserved word in MySQL, so it is quoted throughout.
-- Foreign keys are declared as table constraints because MySQL parses but
-- ignores inline column-level REFERENCES clauses.

-- Namespaces prevent experiment collisions
CREATE TABLE namespaces (
  id          BIGINT PRIMARY KEY AUTO_INCREMENT,
  `key`       VARCHAR(128) UNIQUE,
  description TEXT,
  total_slots INT DEFAULT 1000   -- slots available for experiments
);

-- Experiments claim slots within a namespace
CREATE TABLE experiments (
  id            BIGINT PRIMARY KEY AUTO_INCREMENT,
  namespace_id  BIGINT,
  `key`         VARCHAR(128) UNIQUE,
  type          ENUM('ab','multivariate','holdout','rollout'),
  slot_start    INT,
  slot_end      INT,
  owner_team    VARCHAR(128),
  status        ENUM('draft','running','stopped','archived'),
  created_at    TIMESTAMP,
  FOREIGN KEY (namespace_id) REFERENCES namespaces(id)
);

-- Variants within experiments
CREATE TABLE variants (
  id            BIGINT PRIMARY KEY AUTO_INCREMENT,
  experiment_id BIGINT,
  `key`         VARCHAR(64),
  allocation    FLOAT,     -- fraction of experiment traffic
  config        JSON,
  FOREIGN KEY (experiment_id) REFERENCES experiments(id)
);

-- Metrics registered by teams
CREATE TABLE metrics (
  id            BIGINT PRIMARY KEY AUTO_INCREMENT,
  `key`         VARCHAR(128) UNIQUE,
  name          VARCHAR(256),
  type          ENUM('conversion','mean','ratio','percentile'),
  event_type    VARCHAR(128),
  aggregation   VARCHAR(32)   -- sum, count, avg, p95
);

-- Experiment-metric bindings (primary + guardrail metrics)
CREATE TABLE experiment_metrics (
  experiment_id BIGINT,
  metric_id     BIGINT,
  role          ENUM('primary','guardrail','informational'),
  PRIMARY KEY (experiment_id, metric_id),
  FOREIGN KEY (experiment_id) REFERENCES experiments(id),
  FOREIGN KEY (metric_id) REFERENCES metrics(id)
);
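
The schema alone does not stop a team from saving an experiment whose variant allocations fail to sum to 1 or whose slot range falls outside the namespace. A minimal sketch of an application-level check follows. The class and function names are hypothetical, and it assumes slot_start and slot_end are both inclusive and zero-based, which the schema does not state.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Variant:
    key: str
    allocation: float  # fraction of the experiment's traffic
    config: dict = field(default_factory=dict)


@dataclass
class Experiment:
    key: str
    slot_start: int  # assumed inclusive, zero-based
    slot_end: int    # assumed inclusive
    variants: list


def validate_experiment(exp: Experiment, total_slots: int = 1000) -> list:
    """Return a list of problems; an empty list means the definition looks sane."""
    problems = []
    if not (0 <= exp.slot_start <= exp.slot_end < total_slots):
        problems.append("slot range outside [0, total_slots)")
    if not exp.variants:
        problems.append("experiment has no variants")
    elif not math.isclose(sum(v.allocation for v in exp.variants), 1.0, abs_tol=1e-9):
        problems.append("variant allocations do not sum to 1.0")
    if any(v.allocation < 0 for v in exp.variants):
        problems.append("negative allocation")
    keys = [v.key for v in exp.variants]
    if len(keys) != len(set(keys)):
        problems.append("duplicate variant key")
    return problems
```

Running a check like this before an experiment leaves the draft state keeps malformed rows out of the config snapshot that the SDKs evaluate.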

Core Algorithm: Namespace-Based Randomization

The framework must support many simultaneous experiments without their assignments interfering with each other. The namespace model solves this:

  1. Namespace slot assignment: hash the entity ID within a namespace to one of N slots (e.g., 1000). Each experiment claims a contiguous range of slots.
  2. Experiment lookup: find which experiment owns the entity’s slot. If none, the entity is in the holdout or unallocated pool.
  3. Variant assignment: within the matched experiment, apply a second hash (using the experiment key as a salt) to assign the entity to a variant by allocation weights.
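  4. Layer stacking: different namespaces represent independent layers (e.g., UI layer, ranking layer, pricing layer). An entity participates in at most one experiment per layer but can be in experiments on several layers at once. Because each layer hashes with a different salt, an entity's slot in one layer is statistically independent of its slot in another, assuming the hash function distributes uniformly.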

Pseudocode for a single layer lookup:

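# hash() here means a stable hash (e.g., MurmurHash3 or SHA-256) that returns the
# same value in every process and SDK language, not a per-process salted hash.
# The ":" separator keeps ("ab", "c") and ("a", "bc") from hashing to the same input.
slot = hash(namespace_key + ":" + entity_id) mod total_slots
experiment = find_experiment_by_slot(namespace_id, slot)   # running experiments only
if experiment is None: return default_config
variant_bucket = hash(experiment.key + ":" + entity_id) mod 100
variant = assign_by_cumulative_weight(experiment.variants, variant_bucket)
return variant.config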
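
The pseudocode above can be fleshed out into a runnable sketch. This version makes several assumptions that are not in the original: SHA-256 as the stable hash, inclusive slot ranges, an in-memory list in place of the config snapshot, and a return value that includes the variant key. Helper names mirror the pseudocode where possible; stable_hash and evaluate_layer are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Variant:
    key: str
    allocation: float  # fraction of experiment traffic, e.g. 0.5
    config: dict


@dataclass
class Experiment:
    key: str
    slot_start: int  # assumed inclusive
    slot_end: int    # assumed inclusive
    status: str
    variants: list


def stable_hash(*parts: str) -> int:
    """Deterministic 64-bit hash. Python's built-in hash() is salted per process."""
    digest = hashlib.sha256(":".join(parts).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def find_experiment_by_slot(experiments, slot):
    """Return the running experiment that owns this slot, or None."""
    for exp in experiments:
        if exp.status == "running" and exp.slot_start <= slot <= exp.slot_end:
            return exp
    return None


def assign_by_cumulative_weight(variants, bucket, buckets=100):
    """Walk the variants in order and pick the first whose cumulative share covers the bucket."""
    cumulative = 0.0
    for variant in variants:
        cumulative += variant.allocation * buckets
        if bucket < cumulative:
            return variant
    return variants[-1]  # guards against floating-point shortfall in the sum


def evaluate_layer(namespace_key, total_slots, experiments, entity_id, default_config):
    """Return (variant_key or None, config) for one entity in one layer."""
    slot = stable_hash(namespace_key, entity_id) % total_slots
    exp = find_experiment_by_slot(experiments, slot)
    if exp is None:
        return None, default_config
    bucket = stable_hash(exp.key, entity_id) % 100
    variant = assign_by_cumulative_weight(exp.variants, bucket)
    return variant.key, variant.config
```

Calling evaluate_layer once per namespace, each with its own namespace_key, gives the layered behavior described in step 4. Note that 100 variant buckets limit allocation granularity to 1%; a larger modulus would allow finer splits.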

Metric Collection and Analysis Pipeline

The framework ingests raw events from application services and joins them with assignment data to produce per-variant metric aggregates:

  • Event ingestion: application services emit events (clicks, conversions, latency samples) to a Kafka topic tagged with entity ID and timestamp.
  • Assignment join: a streaming job (Flink or Spark Structured Streaming) looks up the entity’s experiment assignment at event time and enriches each event record.
  • Aggregation: enriched events land in a columnar store (ClickHouse, BigQuery, Druid). Scheduled jobs compute per-variant metric values, standard errors, and p-values or Bayesian posteriors.
  • Guardrail alerts: if a guardrail metric (e.g., error rate, p99 latency) degrades beyond a threshold, the framework triggers an alert and can optionally auto-stop the experiment.
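
As a rough illustration of the aggregation and guardrail steps, the sketch below computes a lift, standard error, and two-sided p-value for a conversion metric with a pooled two-proportion z-test, plus a simple relative-threshold guardrail check. The choice of test, the function names, and the threshold semantics are assumptions; the source only says that p-values or Bayesian posteriors are computed.

```python
import math
from dataclasses import dataclass


@dataclass
class VariantStats:
    users: int        # distinct entities assigned to the variant
    conversions: int  # entities with at least one conversion event

    @property
    def rate(self) -> float:
        return self.conversions / self.users


def two_proportion_z_test(control: VariantStats, treatment: VariantStats):
    """Return (absolute lift, standard error, two-sided p-value)."""
    lift = treatment.rate - control.rate
    pooled = (control.conversions + treatment.conversions) / (control.users + treatment.users)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control.users + 1 / treatment.users))
    if se == 0:
        return lift, 0.0, 1.0  # no variance observed, so no evidence either way
    z = lift / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return lift, se, p_value


def guardrail_breached(control_value: float, treatment_value: float,
                       max_relative_increase: float) -> bool:
    """True if a lower-is-better metric rose by more than the allowed relative amount."""
    if control_value == 0:
        return treatment_value > 0
    return (treatment_value - control_value) / control_value > max_relative_increase
```

A production pipeline would typically add sequential-testing or multiple-comparison corrections, since checking p-values repeatedly on a schedule inflates false positives.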

Failure Handling and Performance

  • SDK-local evaluation: all assignment logic runs in-process using a cached config snapshot. No network call is on the critical path.
  • Config propagation SLA: target under 30 seconds between an experiment or flag change and every SDK instance seeing the update. Push changes over server-sent events (SSE), with periodic polling as a fallback when the stream drops.
  • Idempotent event delivery: event producers include a UUID; the ingestion layer deduplicates on write to prevent inflated metric counts during retries.
  • Experiment collision detection: the framework UI warns if two experiments targeting the same population would compete for slots, preventing unintentional under-allocation.
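
Two of the bullets above reduce to small checks, sketched here. The sketch reads "compete for slots" as overlapping slot ranges within one namespace and assumes inclusive ranges; the in-memory set stands in for whatever deduplication store the ingestion layer actually uses. All names are hypothetical.

```python
def find_slot_conflicts(existing, slot_start, slot_end):
    """Return keys of experiments whose inclusive slot range overlaps the proposed one.

    existing: iterable of (experiment_key, slot_start, slot_end) for one namespace.
    """
    return [key for key, start, end in existing
            if start <= slot_end and slot_start <= end]


class Deduplicator:
    """In-memory stand-in for dedup-on-write, keyed by the producer's event UUID."""

    def __init__(self):
        self._seen = set()

    def accept(self, event_id: str) -> bool:
        """Return True the first time an event ID is seen, False on retries."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True
```

In a real deployment the seen-set would need a bounded lifetime (for example, a TTL slightly longer than the producer's retry window) so that it does not grow without limit.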

Scalability Considerations

  • Config serving: publish experiment configs as versioned JSON bundles to a CDN. SDKs download diffs rather than full snapshots to minimize bandwidth.
  • Metric store partitioning: partition by experiment ID and date. Retention policies archive raw events after 90 days while preserving aggregates indefinitely.
  • Self-service and governance: at scale, hundreds of experiments run simultaneously. The framework must provide a UI for experiment creation, a review workflow for statistical setup, and automated checks (minimum detectable effect, required sample size) before an experiment goes live.
  • Holdout groups: reserve a global holdout (e.g., 1–2% of users who see no experiments) to measure the cumulative effect of all shipped features over time.
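
The automated pre-launch checks and the global holdout can be sketched as follows. The sample-size function uses the standard normal-approximation formula for comparing two proportions; the default alpha and power values, the holdout salt, and the function names are illustrative assumptions rather than anything the source specifies.

```python
import hashlib
import math
from statistics import NormalDist


def required_sample_size(baseline_rate: float, mde_abs: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-variant sample size to detect an absolute change of mde_abs in a conversion rate."""
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if not (0 < p1 < 1 and 0 < p2 < 1) or mde_abs == 0:
        raise ValueError("rates must lie strictly between 0 and 1 and the MDE must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def in_global_holdout(entity_id: str, holdout_fraction: float = 0.01,
                      salt: str = "global_holdout") -> bool:
    """Deterministically place a fixed fraction of entities in the global holdout."""
    digest = hashlib.sha256(f"{salt}:{entity_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < holdout_fraction * 10_000
```

A launch gate could compare required_sample_size against the traffic an experiment's slot range is expected to receive and block the launch when the experiment could not reach significance in a reasonable time. The holdout check would run before any layer lookup so that held-out users see no experiments at all.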

Summary

An Experiment Framework is a multi-tenant experimentation infrastructure that enforces statistical correctness, prevents cross-experiment contamination, and scales to support an entire engineering organization. The key design decisions are namespace-based slot allocation for isolation, SDK-local evaluation for performance, and a streaming pipeline for metric enrichment. In interviews, emphasize the difference between a one-off A/B test and a shared framework: the latter requires governance, collision prevention, guardrail automation, and a robust config delivery system.
