What Is an Experiment Framework?
An experiment framework is the shared infrastructure layer that lets product teams run controlled experiments (A/B tests, multivariate tests, holdouts, and feature rollouts) at scale across an organization. Unlike a standalone A/B testing tool, a framework is a platform that many teams use at the same time, and it enforces consistency in randomization, metric ownership, and statistical rigor. Companies such as Google, Meta, and Uber have publicly described internal platforms of this kind.
Data Model
-- MySQL-style DDL. `key` is a reserved word in MySQL, so it is quoted with backticks.
-- Namespaces prevent experiment collisions
CREATE TABLE namespaces (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    `key` VARCHAR(128) UNIQUE,
    description TEXT,
    total_slots INT DEFAULT 1000 -- slots available for experiments
);
-- Experiments claim slots within a namespace
CREATE TABLE experiments (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    namespace_id BIGINT REFERENCES namespaces(id),
    `key` VARCHAR(128) UNIQUE,
    type ENUM('ab','multivariate','holdout','rollout'),
    slot_start INT,
    slot_end INT,
    owner_team VARCHAR(128),
    status ENUM('draft','running','stopped','archived'),
    created_at TIMESTAMP
);
-- Variants within experiments
CREATE TABLE variants (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    experiment_id BIGINT REFERENCES experiments(id),
    `key` VARCHAR(64),
    allocation FLOAT, -- fraction of experiment traffic
    config JSON
);
-- Metrics registered by teams
CREATE TABLE metrics (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    `key` VARCHAR(128) UNIQUE,
    name VARCHAR(256),
    type ENUM('conversion','mean','ratio','percentile'),
    event_type VARCHAR(128),
    aggregation VARCHAR(32) -- sum, count, avg, p95
);
-- Experiment-metric bindings (primary + guardrail metrics)
CREATE TABLE experiment_metrics (
    experiment_id BIGINT,
    metric_id BIGINT,
    role ENUM('primary','guardrail','informational'),
    PRIMARY KEY (experiment_id, metric_id)
);
Core Algorithm: Namespace-Based Randomization
The framework must support many simultaneous experiments without their assignments interfering with each other. The namespace model solves this:
- Namespace slot assignment: hash the entity ID within a namespace to one of N slots (e.g., 1000). Each experiment claims a contiguous range of slots.
- Experiment lookup: find which experiment owns the entity’s slot. If no experiment owns it, the entity is in the holdout or the unallocated pool and receives the default configuration.
- Variant assignment: within the matched experiment, apply a second hash (using the experiment key as a salt) to assign the entity to a variant by allocation weights.
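- Layer stacking: different namespaces represent independent layers (e.g., UI layer, ranking layer, pricing layer). An entity participates in at most one experiment per layer but can be in experiments in several layers at once. Because each namespace hashes with its own key as the salt, slot assignments across layers are effectively uncorrelated, provided the hash function mixes well.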
Pseudocode for a single layer lookup:
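# The ":" separator avoids ambiguous concatenation ("ab" + "c" vs "a" + "bc").
slot = hash(namespace_key + ":" + entity_id) mod total_slots
experiment = find_experiment_by_slot(namespace_id, slot)  # range lookup on slot_start..slot_end
if experiment is None:
    return default_config  # holdout or unallocated
variant_bucket = hash(experiment.key + ":" + entity_id) mod 100  # 1% granularity
variant = assign_by_cumulative_weight(experiment.variants, variant_bucket)
return variant.config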
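The lookup above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class names, the `stable_hash` helper, the choice of SHA-256, and the treatment of `slot_end` as inclusive are all assumptions that the schema does not specify.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Variant:
    key: str
    allocation: float  # fraction of the experiment's traffic; weights should sum to 1.0
    config: dict = field(default_factory=dict)


@dataclass
class Experiment:
    key: str
    slot_start: int  # inclusive
    slot_end: int    # inclusive (assumption; the schema does not say)
    variants: list


def stable_hash(salt: str, entity_id: str) -> int:
    """Deterministic 64-bit hash of salt and entity ID (hypothetical helper)."""
    digest = hashlib.sha256(f"{salt}:{entity_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign(namespace_key, total_slots, experiments, entity_id,
           default_config=None, variant_buckets=100):
    """Return (variant_key, config), or (None, default_config) if no experiment owns the slot."""
    slot = stable_hash(namespace_key, entity_id) % total_slots
    experiment = next(
        (e for e in experiments if e.slot_start <= slot <= e.slot_end), None
    )
    if experiment is None:
        return None, default_config  # holdout or unallocated pool

    # Second hash, salted with the experiment key, mapped to a point in [0, 1).
    point = (stable_hash(experiment.key, entity_id) % variant_buckets) / variant_buckets
    cumulative = 0.0
    for variant in experiment.variants:
        cumulative += variant.allocation
        if point < cumulative:
            return variant.key, variant.config
    # Guard against floating-point rounding when weights sum to slightly under 1.0.
    last = experiment.variants[-1]
    return last.key, last.config


checkout = Experiment("checkout_test", 0, 499, [
    Variant("control", 0.5, {"button": "blue"}),
    Variant("treatment", 0.5, {"button": "green"}),
])
print(assign("ui", 1000, [checkout], "user-42"))
```

Python's built-in `hash()` is deliberately avoided because it is randomized per process for strings, which would break the requirement that the same entity always lands in the same slot.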
Metric Collection and Analysis Pipeline
The framework ingests raw events from application services and joins them with assignment data to produce per-variant metric aggregates:
- Event ingestion: application services emit events (clicks, conversions, latency samples) to a Kafka topic tagged with entity ID and timestamp.
- Assignment join: a streaming job (Flink or Spark Structured Streaming) looks up the entity’s experiment assignment at event time and enriches each event record.
- Aggregation: enriched events land in a columnar store (ClickHouse, BigQuery, Druid). Scheduled jobs compute per-variant metric values, standard errors, and p-values or Bayesian posteriors.
- Guardrail alerts: if a guardrail metric (e.g., error rate, p99 latency) degrades beyond a threshold, the framework triggers an alert and can optionally auto-stop the experiment.
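For a conversion-type metric, the aggregation and guardrail steps might look like the sketch below. It uses a pooled two-proportion z-test, which is one common choice and not something the pipeline above prescribes. `VariantStats`, `two_proportion_z_test`, `guardrail_breached`, and the threshold parameters are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class VariantStats:
    users: int        # entities assigned to the variant
    conversions: int  # entities with at least one qualifying event

    @property
    def rate(self) -> float:
        return self.conversions / self.users


def two_proportion_z_test(control: VariantStats, treatment: VariantStats):
    """Return (absolute lift, standard error, two-sided p-value)."""
    lift = treatment.rate - control.rate
    pooled = (control.conversions + treatment.conversions) / (control.users + treatment.users)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control.users + 1 / treatment.users))
    if se == 0:
        return lift, 0.0, 1.0  # no variation observed, so no evidence of a difference
    z = lift / se
    # Two-sided tail probability of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lift, se, p_value


def guardrail_breached(control, treatment, max_relative_increase, alpha=0.05):
    """True when a 'lower is better' rate (e.g. error rate) rose significantly past the margin."""
    lift, _, p_value = two_proportion_z_test(control, treatment)
    if control.rate == 0:
        return treatment.rate > 0 and p_value < alpha
    return lift / control.rate > max_relative_increase and p_value < alpha


control = VariantStats(users=10_000, conversions=1_000)
treatment = VariantStats(users=10_000, conversions=1_200)
print(two_proportion_z_test(control, treatment))
```

Requiring both a relative margin and statistical significance keeps an auto-stop from firing on noise early in an experiment, at the cost of reacting more slowly to real regressions.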
Failure Handling and Performance
- SDK-local evaluation: all assignment logic runs in-process using a cached config snapshot. No network call is on the critical path.
- Config propagation SLA: target under 30 seconds from a flag or experiment change until every SDK instance sees the update. Use server-sent events (SSE) for push, with periodic polling as a fallback.
- Idempotent event delivery: event producers include a UUID; the ingestion layer deduplicates on write to prevent inflated metric counts during retries.
- Experiment collision detection: the framework UI warns if two experiments targeting the same population would compete for slots, preventing unintentional under-allocation.
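The collision check in the last bullet reduces to an interval-overlap test on slot ranges within one namespace. The sketch below assumes inclusive `slot_start` and `slot_end` values; `SlotClaim`, `find_collisions`, and `first_free_range` are hypothetical names, and a real framework would also compare targeting rules, which are out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotClaim:
    experiment_key: str
    slot_start: int  # inclusive
    slot_end: int    # inclusive


def find_collisions(existing, candidate):
    """Return the existing claims whose slot range overlaps the candidate's range."""
    return [
        claim for claim in existing
        if claim.slot_start <= candidate.slot_end
        and candidate.slot_start <= claim.slot_end
    ]


def first_free_range(existing, width, total_slots=1000):
    """Return the lowest free (start, end) range of `width` slots, or None if none fits."""
    cursor = 0  # lowest slot not yet known to be claimed
    for claim in sorted(existing, key=lambda c: c.slot_start):
        if claim.slot_start - cursor >= width:
            return cursor, cursor + width - 1
        cursor = max(cursor, claim.slot_end + 1)
    if total_slots - cursor >= width:
        return cursor, cursor + width - 1
    return None


running = [SlotClaim("search_rank_v2", 0, 99), SlotClaim("new_onboarding", 200, 299)]
print(find_collisions(running, SlotClaim("price_test", 50, 149)))
print(first_free_range(running, width=100))
```

Running the same check server-side when an experiment moves from draft to running, and not only in the UI, would prevent two concurrent edits from both passing the warning.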
Scalability Considerations
- Config serving: publish experiment configs as versioned JSON bundles to a CDN. SDKs download diffs rather than full snapshots to minimize bandwidth.
- Metric store partitioning: partition by experiment ID and date. Retention policies archive raw events after 90 days while preserving aggregates indefinitely.
- Self-service and governance: at scale, hundreds of experiments run simultaneously. The framework must provide a UI for experiment creation, a review workflow for statistical setup, and automated checks (minimum detectable effect, required sample size) before an experiment goes live.
- Holdout groups: reserve a global holdout (e.g., 1–2% of users who see no experiments) to measure the cumulative effect of all shipped features over time.
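One of the automated pre-launch checks mentioned above, required sample size for a given minimum detectable effect, can be approximated with the standard normal-approximation formula for comparing two proportions. The function name, parameters, and default significance and power levels below are assumptions for illustration; they are not values the framework prescribes.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Approximate users needed per variant to detect an absolute lift of `mde_abs`.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / mde_abs^2
    for a two-sided test on conversion rates.
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)


def days_to_reach(sample_per_variant, num_variants, eligible_users_per_day):
    """Rough run time, assuming traffic is steady and each user is counted once."""
    return math.ceil(sample_per_variant * num_variants / eligible_users_per_day)


n = required_sample_size(baseline_rate=0.10, mde_abs=0.01)
print(n, days_to_reach(n, num_variants=2, eligible_users_per_day=5_000))
```

Because the required sample grows with the inverse square of the effect size, halving the minimum detectable effect roughly quadruples the traffic an experiment needs, which is why a review step before launch is useful.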
Summary
An Experiment Framework is a multi-tenant experimentation infrastructure that enforces statistical correctness, prevents cross-experiment contamination, and scales to support an entire engineering organization. The key design decisions are namespace-based slot allocation for isolation, SDK-local evaluation for performance, and a streaming pipeline for metric enrichment. In interviews, emphasize the difference between a one-off A/B test and a shared framework: the latter requires governance, collision prevention, guardrail automation, and a robust config delivery system.