Question 1

Why use two separate hashes for enrollment and variant assignment?

Accepted Answer

Using one hash for both enrollment and variant assignment couples the two decisions: users in the "enrolled" set are those whose hash falls below traffic_pct, and variant assignment then reuses the same hash space. This can create non-uniform or unstable variant splits. Example: traffic_pct=10, two variants 50/50. With one hash, users with hash 0-9 are enrolled; users 0-4 get variant A, 5-9 get variant B. A user's variant is now fully determined by their position in the enrollment range: users at the enrollment boundary (hash=8,9) always get variant B and are never available for variant A. The split within the enrolled group is numerically correct, but each arm is a fixed, contiguous slice of the enrollment hash range rather than an independent random sample, and if traffic_pct is later raised to 20 under the same scheme, the A/B boundary moves to 10 and users 5-9 switch from B to A. Two independent hashes avoid this: enrollment_hash = MD5("enroll:key:user") % 100, variant_hash = MD5("assign:key:user") % total_weight. The enrolled population is hash-uniform; the variant assignment within it is also hash-uniform and independent of enrollment.
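The two-hash scheme described above can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the function names `_bucket` and `assign`, the `(name, weight)` variant representation, and the exact string layout fed to MD5 are assumptions chosen to mirror the "enroll:key:user" and "assign:key:user" patterns in the answer.

```python
import hashlib
from typing import List, Optional, Tuple


def _bucket(salt: str, experiment_key: str, user_id: str, modulus: int) -> int:
    """Map (salt, experiment, user) to a stable integer in [0, modulus)."""
    raw = f"{salt}:{experiment_key}:{user_id}".encode("utf-8")
    return int(hashlib.md5(raw).hexdigest(), 16) % modulus


def assign(
    experiment_key: str,
    user_id: str,
    traffic_pct: int,
    variants: List[Tuple[str, int]],  # hypothetical shape: [(name, weight), ...]
) -> Optional[str]:
    """Return the variant name, or None if the user is not enrolled."""
    # Hash 1: enrollment. Only users whose bucket is below traffic_pct take part.
    if _bucket("enroll", experiment_key, user_id, 100) >= traffic_pct:
        return None

    # Hash 2: variant choice, salted differently so it is independent of hash 1.
    total_weight = sum(weight for _, weight in variants)
    point = _bucket("assign", experiment_key, user_id, total_weight)

    # Walk the cumulative weights to find the variant that owns this point.
    cumulative = 0
    for name, weight in variants:
        cumulative += weight
        if point < cumulative:
            return name
    return None  # only reachable if all weights are zero


if __name__ == "__main__":
    split = [("A", 50), ("B", 50)]
    print(assign("checkout_button", "user-123", 10, split))
```

Because the variant hash does not depend on traffic_pct, raising the traffic percentage only enrolls additional users; anyone already enrolled keeps the variant they had.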

Question 2

How do you prevent novelty effect bias from inflating treatment metrics?

Accepted Answer

The novelty effect: users interact more with a new UI simply because it is new, so engagement metrics spike in week 1 and decay back toward baseline by around week 3. If you conclude an experiment after 3 days showing a 20% lift, you are largely measuring the novelty effect, not the true effect. Mitigation: (1) run the experiment for at least 2 full business cycles (2 weeks minimum for weekly-cycle products); (2) analyze the time trend of the lift (treatment vs. control) by week since assignment: if the lift is decaying week-over-week, the novelty effect is present; (3) exclude users in their first N days of product use from the experiment, because new users have their own novelty effects unrelated to the feature; (4) hold-out cohort: keep tracking the treatment group for 30 days post-assignment even if the feature ships, and compare week-1 vs. week-4 metrics to measure novelty decay.
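The week-over-week check in mitigation (2) might look like the sketch below. The helper names and the weekly rates are invented for illustration, and "strictly lower every week" is a deliberately crude heuristic; a real analysis would also account for noise in each weekly estimate.

```python
from typing import List


def weekly_lift(control: List[float], treatment: List[float]) -> List[float]:
    """Relative lift of treatment over control, one value per week."""
    return [(t - c) / c for c, t in zip(control, treatment)]


def lift_is_decaying(lifts: List[float], tolerance: float = 0.0) -> bool:
    """True when each week's lift is lower than the previous week's by more
    than `tolerance`. A crude signal that a novelty effect may be present."""
    return len(lifts) >= 2 and all(
        later < earlier - tolerance for earlier, later in zip(lifts, lifts[1:])
    )


if __name__ == "__main__":
    # Made-up weekly engagement rates, weeks 1 to 4 since assignment.
    control_rate = [0.100, 0.100, 0.100, 0.100]
    treatment_rate = [0.120, 0.110, 0.104, 0.101]
    lifts = weekly_lift(control_rate, treatment_rate)
    print([round(x, 3) for x in lifts], lift_is_decaying(lifts))
```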

Question 3

How do you handle a user who clears cookies and gets a new session mid-experiment?

Accepted Answer

Cookie clearing creates a new anonymous session, but the experiment system assigns the user to a variant based on user_id, not session. If the user is authenticated, the assignment lookup is by user_id and is stable regardless of session state. The UNIQUE(experiment_id, user_id) constraint ensures there is exactly one assignment record per user per experiment, and that record persists across sessions. The problem arises for unauthenticated experiments: if assignment is session-based, a new session means a new bucket. For unauthenticated A/B testing, use a long-lived cookie (1-year expiry) as the stable identifier, not a session cookie. On cookie clear, the user still gets a new identifier and therefore a new assignment; this is acceptable for most web experiments because most users don't clear cookies, so the impact is small. If stable assignment for unauthenticated users is critical (e.g. a pricing experiment), require authentication before enrollment.
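A small sketch of how the UNIQUE(experiment_id, user_id) constraint keeps assignments stable is shown below, using an in-memory SQLite database. The column set is an assumption (only the table's uniqueness constraint is given above), and `stable_id` and `get_or_assign` are hypothetical helpers; storing an "anon:" cookie identifier in the user_id column is one possible convention, not necessarily the one the design uses.

```python
import sqlite3
from typing import Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory assignment table with the uniqueness constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE ExperimentAssignment (
               experiment_id INTEGER NOT NULL,
               user_id       TEXT    NOT NULL,
               variant       TEXT    NOT NULL,
               UNIQUE (experiment_id, user_id)
           )"""
    )
    return conn


def stable_id(user_id: Optional[str], cookie_id: str) -> str:
    """Prefer the authenticated user_id; fall back to the long-lived cookie."""
    return f"user:{user_id}" if user_id else f"anon:{cookie_id}"


def get_or_assign(conn: sqlite3.Connection, experiment_id: int,
                  subject_id: str, proposed_variant: str) -> str:
    """Insert the proposed variant only if no row exists, then read it back."""
    conn.execute(
        "INSERT OR IGNORE INTO ExperimentAssignment "
        "(experiment_id, user_id, variant) VALUES (?, ?, ?)",
        (experiment_id, subject_id, proposed_variant),
    )
    row = conn.execute(
        "SELECT variant FROM ExperimentAssignment "
        "WHERE experiment_id = ? AND user_id = ?",
        (experiment_id, subject_id),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = open_store()
    # Authenticated user: the cookie changes, the assignment does not.
    print(get_or_assign(conn, 1, stable_id("42", "cookie-a"), "A"))
    print(get_or_assign(conn, 1, stable_id("42", "cookie-b"), "B"))
```

For an anonymous visitor the identifier is the cookie itself, so a cleared cookie produces a new row and possibly a different variant, which matches the trade-off described above.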

Question 4

What is the minimum sample size needed before concluding an experiment?

Accepted Answer

Sample size depends on three inputs, at a chosen significance level α (typically 0.05): (1) baseline conversion rate (p); (2) minimum detectable effect (MDE), the smallest relative lift worth detecting; (3) statistical power (typically 80%). Formula: n ≈ 16 * p * (1-p) / (MDE_absolute)^2 per variant for α=0.05, power=0.80, where MDE_absolute = p * MDE_relative. Examples: p=5% baseline, 10% relative MDE (detect 0.5pp lift) → n≈30,400/variant; p=5%, 5% relative MDE (detect 0.25pp lift) → n≈121,600/variant. Halving the MDE quadruples the required sample. Running underpowered experiments (stopping at 200 users) leads to false negatives (missing real effects) or false positives (peeking at the p-value and stopping when it first crosses 0.05). The "peeking problem": if you check significance 10 times during the experiment, your true false positive rate is roughly 20%, not 5% (the ~40% figure sometimes quoted, 1 - 0.95^10, assumes the checks are independent, which successive looks at accumulating data are not). Fix: use sequential testing methods (mSPRT) that allow valid early stopping, or pre-commit to a sample size and check only once.
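The arithmetic behind the rule of thumb can be checked with a short sketch. The constant 16 is a rounding of 2 * (z_{α/2} + z_β)^2, which is about 15.7 for α=0.05 and power=0.80 under the usual two-sided normal approximation with equal variance in both arms (an assumption of this sketch). The function names are illustrative.

```python
import math
from statistics import NormalDist


def rule_of_thumb_n(p: float, relative_mde: float) -> float:
    """n per variant from n ≈ 16 * p * (1 - p) / delta^2 (alpha=0.05, power=0.80)."""
    delta = p * relative_mde  # absolute MDE
    return 16 * p * (1 - p) / delta ** 2


def normal_approx_n(p: float, relative_mde: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Same formula with the constant computed from normal quantiles.

    Assumes a two-sided test and uses the baseline variance p * (1 - p)
    for both arms.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power=0.80
    delta = p * relative_mde
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * p * (1 - p) / delta ** 2)


if __name__ == "__main__":
    for mde in (0.10, 0.05):
        print(mde, round(rule_of_thumb_n(0.05, mde)), normal_approx_n(0.05, mde))
```

The rule of thumb is slightly conservative because it rounds the constant up from about 15.7 to 16.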

Question 5

How do you measure the long-term impact of an experiment beyond the test period?

Accepted Answer

Short-run experiment metrics (7-day conversion) may not capture long-run effects (30-day retention, 6-month LTV). A treatment that improves checkout conversion by 5% might reduce retention by 2%, in which case you may have shipped a net-negative change. Long-run measurement: (1) holdback group: keep 5% of users on the control experience after shipping, even after the rollout is otherwise complete. Compare holdback vs. treatment users at 30, 60, 90 days. (2) Cohort analysis: track the assignment cohorts in the data warehouse: SELECT variant, AVG(ltv_90d) FROM ExperimentAssignment JOIN UserLTV USING (user_id) WHERE experiment_id=$id GROUP BY variant. (3) Observational analysis: after the experiment, the assignment record still exists in the database. Run a regression 6 months later comparing users assigned to treatment vs. control, controlling for confounders. This is cheaper than maintaining a live holdback but less rigorous, because once the feature ships to everyone the two groups no longer receive different experiences.
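The cohort query in (2) can be tried end to end against an in-memory SQLite database, as in the sketch below. The table and column names come from the query above; the column types, the rows, and the helper names are made up for illustration, and the `$id` placeholder is written as SQLite's `?` parameter.

```python
import sqlite3
from typing import Dict

COHORT_QUERY = """
SELECT variant, AVG(ltv_90d)
FROM ExperimentAssignment JOIN UserLTV USING (user_id)
WHERE experiment_id = ?
GROUP BY variant
ORDER BY variant
"""


def build_demo_warehouse() -> sqlite3.Connection:
    """Create the two tables and load a handful of invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ExperimentAssignment (
            experiment_id INTEGER, user_id TEXT, variant TEXT);
        CREATE TABLE UserLTV (user_id TEXT, ltv_90d REAL);
        """
    )
    conn.executemany(
        "INSERT INTO ExperimentAssignment VALUES (?, ?, ?)",
        [(1, "u1", "control"), (1, "u2", "control"),
         (1, "u3", "treatment"), (1, "u4", "treatment"),
         (2, "u1", "treatment")],
    )
    conn.executemany(
        "INSERT INTO UserLTV VALUES (?, ?)",
        [("u1", 10.0), ("u2", 20.0), ("u3", 30.0), ("u4", 50.0)],
    )
    return conn


def ltv_by_variant(conn: sqlite3.Connection, experiment_id: int) -> Dict[str, float]:
    """Average 90-day LTV per variant for one experiment."""
    return dict(conn.execute(COHORT_QUERY, (experiment_id,)).fetchall())


if __name__ == "__main__":
    print(ltv_by_variant(build_demo_warehouse(), 1))
```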

A/B Experiment System Low-Level Design: Assignment Engine, Metric Tracking, and Statistical Significance

A/B Experiment System: Low-Level Design

Core Data Model

Assignment Engine

Event Tracking

Results Aggregation Query

Statistical Significance Check

Key Design Decisions