System Design: A/B Testing Platform — Experiment Assignment, Metric Collection, and Statistical Analysis

Requirements

An A/B testing platform enables product teams to run controlled experiments: show variant A to 50% of users and variant B to the other 50%, collect outcome metrics, and determine which variant wins with statistical confidence. Core requirements: consistent assignment (a user always sees the same variant), bucketing at scale (millions of assignments per second), metric collection (clicks, conversions, revenue), statistical analysis (p-values, confidence intervals), and experiment management (define, launch, stop, archive). A/B testing is how companies like Netflix, Airbnb, and LinkedIn make data-driven product decisions. Netflix runs 250+ concurrent experiments. LinkedIn runs thousands per year. This is a common system design question at data-driven product companies.

Experiment Assignment

Assignment must be: consistent (same user always gets the same variant), random (no systematic bias), and fast (< 1 ms, since it runs on every page load). Hashing-based assignment: hash(user_id + experiment_id) mod 100 gives a stable bucket number 0-99. Assign users in buckets 0-49 to control and 50-99 to treatment. The hash is deterministic — same inputs always produce the same output — so no state and no database lookup are required. Algorithm: SHA-256, or MurmurHash3 (faster, non-cryptographic). Why include experiment_id in the hash: hashing user_id alone would put each user in the same bucket for every experiment, so the same set of users would always land in treatment together, correlating assignments across experiments. Combining user_id and experiment_id makes assignments independent across experiments. Traffic allocation is flexible: 50/50, 80/20 (ramp test), 33/33/33 (three-way test). Bucket ranges are computed from the allocation percentages. Holdout groups: reserve a permanent holdout (e.g., 5% of users who never see any experiment) to measure the cumulative effect of all experiments over time. Mutual exclusion: experiments on the same surface should be mutually exclusive, because a user in both would confound their effects. Layer-based architecture: group mutually exclusive experiments into the same layer; experiments within a layer own non-overlapping bucket ranges, so a user is in at most one experiment per layer.
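A minimal sketch of deterministic bucketing and variant assignment, assuming Python. MD5 stands in for MurmurHash3 here to stay in the standard library (a production system would use a faster non-cryptographic hash); the function names and the holdout fallback are illustrative.

```python
import hashlib

def bucket(user_id: str, experiment_id: str) -> int:
    """Stable bucket 0-99 for a (user, experiment) pair. Deterministic:
    same inputs always hash to the same bucket, so no state is needed."""
    key = f"{user_id}:{experiment_id}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % 100

def assign(user_id: str, experiment_id: str,
           allocation: list[tuple[str, int]]) -> str:
    """allocation: [(variant_name, percent), ...] summing to <= 100.
    Bucket ranges are computed from the allocation percentages."""
    b = bucket(user_id, experiment_id)
    cumulative = 0
    for variant, percent in allocation:
        cumulative += percent
        if b < cumulative:
            return variant
    return "holdout"  # buckets beyond the allocated ranges see no variant

# Same inputs always produce the same variant: no database lookup.
split = [("control", 50), ("treatment", 50)]
assert assign("user-42", "checkout-color", split) == \
       assign("user-42", "checkout-color", split)
```

Because the hash also covers experiment_id, the same user generally lands in different buckets for different experiments, which is what keeps assignments independent.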

Event Collection and Metric Pipeline

Every user action that might be a metric is logged as an event: page view, click, add to cart, purchase, session duration, error. Event schema: {user_id, experiment_id, variant, event_type, timestamp, properties (JSON)}. Collection path: client SDK → event ingestion API → Kafka → stream processor → metrics store. SDK: JavaScript/mobile SDK intercepts clicks and logs events. Intercepts checkout completion for revenue metrics. Batches events (send every 5 seconds or on page unload) to reduce HTTP requests. Assignment event: logged when a user is first assigned to an experiment. All subsequent events join on user_id + experiment_id. Metric aggregation: for each experiment + variant: count of users exposed, count of conversions, sum of revenue. Pre-aggregate per hour in the stream processor (Flink). Store in ClickHouse: (experiment_id, variant, metric_name, bucket_hour, value, user_count). Dashboard queries aggregate over the desired time window.
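The hourly pre-aggregation step can be sketched in plain Python rather than Flink; the event fields follow the schema above, and the output row shape matches the ClickHouse columns (experiment_id, variant, metric_name, bucket_hour, value, user_count). The `amount` property for revenue events is an assumption for the example.

```python
from collections import defaultdict

def aggregate_hourly(events):
    """Fold raw events into per-hour rows: sum of values plus a
    distinct-user count per (experiment, variant, metric, hour)."""
    rows = defaultdict(lambda: {"value": 0.0, "users": set()})
    for e in events:
        hour = e["timestamp"] - e["timestamp"] % 3600  # truncate to the hour
        key = (e["experiment_id"], e["variant"], e["event_type"], hour)
        # Revenue events carry an amount; count-type events contribute 1.
        rows[key]["value"] += e.get("properties", {}).get("amount", 1)
        rows[key]["users"].add(e["user_id"])
    return [
        {"experiment_id": exp, "variant": var, "metric_name": metric,
         "bucket_hour": hour, "value": r["value"], "user_count": len(r["users"])}
        for (exp, var, metric, hour), r in rows.items()
    ]

events = [
    {"user_id": "u1", "experiment_id": "exp1", "variant": "treatment",
     "event_type": "purchase", "timestamp": 7200, "properties": {"amount": 30.0}},
    {"user_id": "u2", "experiment_id": "exp1", "variant": "treatment",
     "event_type": "purchase", "timestamp": 7300, "properties": {"amount": 10.0}},
]
rows = aggregate_hourly(events)
assert rows[0]["value"] == 40.0 and rows[0]["user_count"] == 2
```

Pre-aggregating per hour keeps dashboard queries cheap: they sum a handful of hourly rows instead of scanning billions of raw events.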

Statistical Analysis

Goal: determine whether the observed difference between variants is statistically significant or could be due to chance. Two-sample t-test for continuous metrics (revenue per user): compute t-statistic = (mean_A − mean_B) / sqrt(var_A/n_A + var_B/n_B), then convert to a p-value. If p < 0.05, reject the null hypothesis: the difference is significant at the 5% level. Chi-squared test for binary metrics (conversion rate): compare observed vs. expected conversion counts. Multiple testing problem: testing 10 metrics per experiment at p < 0.05 gives roughly a 40% chance of at least one false positive (1 − 0.95^10 ≈ 0.40) even when no variant has any real effect. Fix: Bonferroni correction (divide alpha by the number of metrics: 0.05/10 = 0.005 threshold per metric) or control the False Discovery Rate (Benjamini-Hochberg). Sequential testing: in traditional A/B testing you must pre-commit to a sample size and not peek at results early, because peeking inflates the false positive rate. Sequential methods (e.g., always-valid p-values, mSPRT) allow continuous monitoring without inflating error rates. Power analysis: before launching an experiment, compute the sample size required to detect a minimum detectable effect (MDE) with the desired power (typically 80%) at the chosen significance level (typically 5%). This ensures the experiment collects enough data before any conclusion is drawn. Minimum runtime: typically 1-2 full business cycles (1-2 weeks) to account for day-of-week effects.
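The two-sample (Welch's) t-test above can be sketched with the standard library alone. The p-value here uses the normal approximation via `erfc`, which is reasonable at A/B-test sample sizes; a production analysis would use `scipy.stats.ttest_ind(equal_var=False)` instead.

```python
import math
from statistics import mean, variance

def welch_t_test(a, b):
    """Two-sample t-test with unequal variances:
    t = (mean_A - mean_B) / sqrt(var_A/n_A + var_B/n_B)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    # Two-sided p-value, normal approximation (valid for large n).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

def significant(p: float, num_metrics: int, alpha: float = 0.05) -> bool:
    """Bonferroni correction: compare p against alpha / number of metrics."""
    return p < alpha / num_metrics

# Control vs. a treatment shifted up by 3 units: clearly significant.
control = [i % 10 for i in range(200)]
treatment = [x + 3 for x in control]
t, p = welch_t_test(control, treatment)
assert p < 0.05 and significant(p, num_metrics=10)
```

Note how the Bonferroni threshold (0.005 for 10 metrics) is stricter than the single-metric 0.05: a borderline result on one of many metrics stops counting as a win.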

Experiment Management and Guardrails

Experiment lifecycle: DRAFT → REVIEW → RUNNING → STOPPED → ARCHIVED. Review: product, engineering, and data science sign off before launch. Guardrail metrics: in addition to the primary metric (e.g., conversion rate), monitor guardrail metrics that should not regress: latency, error rate, revenue per user. If a guardrail regresses significantly, automatically stop the experiment and alert the team. This prevents a "winning" experiment from harming the business in unmeasured ways. Ramp-up: start with 1% of traffic, verify there are no errors or guardrail regressions, then ramp to 10%, 50%, and 100% over hours or days. This reduces the blast radius of bugs. Interaction effects: two experiments running on the same page may interact. The layer system rules out interactions within a layer (a user is in at most one experiment per layer), but experiments in different layers can still interact. CUPED (Controlled-experiment Using Pre-Experiment Data): use each user's pre-experiment behavior as a covariate to reduce variance, allowing smaller effects to be detected with the same sample size. Standard practice at mature experimentation platforms.
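The CUPED adjustment can be sketched directly from its definition: theta = Cov(metric, covariate) / Var(covariate), and each user's adjusted value is metric − theta * (covariate − mean(covariate)). The example data below is illustrative.

```python
from statistics import mean, variance

def cuped_adjust(metric, covariate):
    """Variance-reduced metric values. `covariate` is each user's
    pre-experiment value of the same metric; because it is measured
    before assignment, subtracting its contribution does not bias
    the treatment-effect estimate."""
    my, mx = mean(metric), mean(covariate)
    n = len(metric)
    cov = sum((y - my) * (x - mx) for y, x in zip(metric, covariate)) / (n - 1)
    theta = cov / variance(covariate)
    return [y - theta * (x - mx) for y, x in zip(metric, covariate)]

# Pre-experiment spend strongly predicts in-experiment spend, so the
# adjustment strips out most of the between-user variance.
pre = [float(x) for x in range(100)]
during = [2.0 * x + (x % 7) for x in pre]
adjusted = cuped_adjust(during, pre)
assert variance(adjusted) < variance(during)   # variance reduced
assert abs(mean(adjusted) - mean(during)) < 1e-9  # mean unchanged
```

Because the mean is preserved while the variance shrinks, the same t-test applied to the adjusted values reaches significance with fewer users.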


Frequently Asked Questions

How do you ensure consistent experiment assignment for the same user across sessions?

Use deterministic hash-based assignment: hash(user_id + experiment_id) mod 100 gives a stable bucket (0-99) for each user-experiment pair. The bucket never changes for the same user and experiment, so the user always sees the same variant whether they reload, log out, or return days later. This is computed locally with no database lookup. The hash must be consistent across all services and platforms: use the same algorithm (e.g., MurmurHash3) everywhere.

What is the mutual exclusion problem in A/B testing and how do experiment layers solve it?

Mutual exclusion: if users can be in multiple experiments simultaneously, the experiments' effects confound each other (you cannot tell which variant caused an observed change). Layers solve this: each layer is an independent traffic partition. An experiment lives in exactly one layer and owns a slice of that layer's traffic. A user is assigned to at most one experiment per layer, but can be in experiments across different layers simultaneously, since different layers test orthogonal features. Google's Overlapping Experiment Infrastructure popularized this approach.

How does CUPED reduce variance in A/B test metric analysis?

CUPED (Controlled-experiment Using Pre-Experiment Data) uses a pre-experiment covariate (e.g., the same metric measured in the week before the experiment) to reduce variance in the treatment metric. Adjusted metric = metric − theta * (covariate − mean(covariate)), where theta = Cov(metric, covariate) / Var(covariate). Since the covariate is uncorrelated with treatment assignment, this adjustment does not bias the estimate. It typically reduces variance by 40-70%, giving the same statistical power with fewer users or a shorter experiment runtime.

How do you handle novelty effects and ramp-up strategy for a new feature experiment?

Novelty effect: users engage more with any new feature simply because it is new, inflating treatment metrics. Mitigation: run the experiment for at least 2 full weeks; novelty effects typically decay within a week. Also analyze by user tenure in treatment: if long-exposed users show lower lift than recent entrants, novelty is a factor. Ramp-up: start with 1% traffic to catch crashes and data pipeline issues before full rollout. Use canary metrics (error rate, latency, crash rate) as kill-switch triggers. Progressively increase to 5%, 10%, and 50% before analyzing.

What is the minimum detectable effect (MDE) and how does it affect experiment design?

MDE is the smallest true effect size the experiment is designed to detect with the specified statistical power (typically 80%). A smaller MDE requires more users (longer runtime). Formula: n = (z_alpha/2 + z_beta)^2 * 2 * sigma^2 / delta^2, where delta is the MDE, sigma^2 is the metric variance, z_alpha/2 is the critical value for the false positive rate, and z_beta for the false negative rate. In practice: use a sample size calculator with your baseline metric value, expected relative lift (MDE), significance level (0.05), and power (0.80) to get the required users per variant.
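The sample-size formula above can be sketched in a few lines. The z-values 1.96 and 0.84 are the standard normal quantiles for a two-sided alpha of 0.05 and 80% power; the binary-metric variance p(1 − p) used in the example is an assumption for illustration.

```python
import math

def required_n(sigma: float, mde: float,
               z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Users needed per variant: n = (z_alpha/2 + z_beta)^2 * 2*sigma^2 / mde^2."""
    return math.ceil((z_alpha + z_beta) ** 2 * 2 * sigma ** 2 / mde ** 2)

# Binary metric: 5% baseline conversion, detect a 0.5 pp absolute lift.
sigma = math.sqrt(0.05 * 0.95)  # sqrt(p * (1 - p))
n = required_n(sigma, mde=0.005)  # roughly 30k users per variant
```

The quadratic dependence on the MDE is the key design lever: halving the effect you want to detect quadruples the required sample size, which is why tiny lifts need weeks of traffic.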

See also: Databricks Interview Prep

See also: Meta Interview Prep

See also: Netflix Interview Prep
