Holdout Group Service: Low-Level Design
A holdout group service maintains a permanently withheld cohort of users who never receive new features, enabling long-term measurement of the cumulative impact of all product changes. Unlike individual A/B tests, which measure the effect of a single feature, a holdout measures the compounding value of everything shipped over months.
The Holdout Concept
Individual A/B tests answer “does feature X improve metric Y?” But they can't answer “what is the total value of all the features we shipped this quarter?” The holdout group answers that question:
- X% of users are permanently assigned to the holdout — they never receive new features
- The remaining users receive features normally
- After 6–12 months, compare holdout users vs non-holdout users on long-term metrics (retention, LTV, engagement)
- The difference is the cumulative impact of all shipped features
Assignment Stability
Holdout assignment must be permanent and stable. Unlike A/B test assignments, which can be re-randomized between experiments, holdout membership never changes for the lifetime of the holdout:
- Assigned at account creation time based on hash(user_id + holdout_salt) mod 100 < holdout_pct
- The salt is a fixed secret and is never rotated
- Re-randomization would contaminate the holdout by gradually exposing holdout users to features
- New users are assigned to holdout or non-holdout at signup; existing users retain their historic assignment
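The assignment rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the choice of SHA-256, the salt value, and the names `is_in_holdout`, `HOLDOUT_SALT`, and `HOLDOUT_PCT` are assumptions made for the example. A cryptographic digest is used instead of Python's built-in `hash()`, because string hashing in Python is randomized per process by default and would not give a stable assignment.

```python
import hashlib

# Hypothetical values for illustration; a real salt would be a fixed secret.
HOLDOUT_SALT = "example-holdout-salt"
HOLDOUT_PCT = 5  # holdout size in percent


def is_in_holdout(user_id: str, salt: str = HOLDOUT_SALT,
                  holdout_pct: int = HOLDOUT_PCT) -> bool:
    """Deterministically map a user to a bucket in [0, 100) and test the range."""
    # Concatenate user_id and salt, as in hash(user_id + holdout_salt).
    digest = hashlib.sha256(f"{user_id}{salt}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce mod 100.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < holdout_pct
```

Because the function depends only on `user_id` and the salt, calling it again months later returns the same answer, which is the stability property this section requires. Changing the salt would reshuffle every user, which is why it must never be rotated.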
Holdout Schema
holdout_assignments(
    user_id UUID PRIMARY KEY,
    holdout_group_id VARCHAR, -- e.g., "global_holdout_2025"
    assigned_at TIMESTAMP,
    is_holdout BOOLEAN
)
The table is append-only and never updated. A separate holdout_groups table defines the holdout configuration: size, start date, description, and owning team.
Interaction with the A/B Assignment System
The holdout check runs as the first step in experiment assignment:
- Look up is_holdout for the user
- If is_holdout = true, skip all experiment assignment and return the default (control) experience for everything
- If is_holdout = false, proceed with normal experiment assignment logic
Holdout users are excluded from all feature flags and experiments. They must receive the baseline product experience as it existed at the holdout start date.
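As a rough sketch of that gating order, the function below checks holdout membership before delegating to the normal assignment logic. The names `assign_variant`, `holdout_lookup`, and `assign_fn` are hypothetical, and the in-memory dictionary stands in for a lookup against the holdout_assignments table.

```python
from typing import Callable, Dict


def assign_variant(user_id: str,
                   experiment_id: str,
                   holdout_lookup: Dict[str, bool],
                   assign_fn: Callable[[str, str], str]) -> str:
    """Return the variant for a user, checking the holdout first."""
    # Every user is expected to have a row from signup; a missing row
    # raises KeyError rather than silently guessing a cohort.
    if holdout_lookup[user_id]:
        # Holdout users always get the default (control) experience.
        return "control"
    # Non-holdout users go through the normal experiment assignment.
    return assign_fn(user_id, experiment_id)
```

Placing the check first means the experiment logic never runs for holdout users, so no individual experiment has to remember to exclude them.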
Cumulative Impact Measurement
After the holdout period (typically 6–12 months), an analysis compares the two cohorts:
- Metrics: 90-day retention, average revenue per user, feature adoption, engagement depth
- Statistical test: two-sample t-test or Mann-Whitney U on the metric distributions
- Effect size: percent lift of non-holdout vs holdout on each metric
This analysis captures compounding effects that individual A/B tests miss — features that individually showed small lifts may compound to a large cumulative effect, or may partially cancel each other out.
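The comparison can be illustrated with a small standard-library sketch that computes the percent lift and a two-sample test statistic with unequal variances. The function name `compare_cohorts` is hypothetical, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only for large cohorts.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def compare_cohorts(holdout: list, treated: list) -> dict:
    """Percent lift of non-holdout over holdout, plus a large-sample test."""
    m_h, m_t = mean(holdout), mean(treated)
    # Effect size: percent lift of non-holdout vs holdout.
    lift_pct = (m_t - m_h) / m_h * 100.0
    # Standard error of the difference in means (unequal variances).
    se = sqrt(variance(holdout) / len(holdout) +
              variance(treated) / len(treated))
    z = (m_t - m_h) / se
    # Two-sided p-value under the normal approximation.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return {"lift_pct": lift_pct, "z": z, "p_value": p_value}
```

For heavily skewed metrics such as revenue per user, a rank-based test like Mann-Whitney U may be the better choice, as noted above.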
Bias Detection
Before drawing conclusions, verify that the holdout and non-holdout cohorts are comparable on pre-holdout baseline metrics:
- Age of account, activity level, geography, device type
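- Run a balance check: a t-test on each continuous baseline metric and a chi-square test on categorical ones such as geography or device type. A significant imbalance suggests the holdout assignment is flawed, though when many metrics are tested a few marginal results are expected by chance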
- Covariate adjustment (CUPED): reduce variance in the analysis by controlling for pre-experiment metric values, increasing statistical power without increasing sample size
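A minimal sketch of the CUPED adjustment follows, assuming each user has one pre-period value and one post-period value of the same metric. The function name `cuped_adjust` is hypothetical. The coefficient theta is estimated on the pooled data from both cohorts, and the adjusted values are then compared by cohort in place of the raw metric.

```python
from statistics import mean, variance


def cuped_adjust(pre: list, post: list) -> list:
    """Return post-period values adjusted by the pre-period covariate."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have one value per user")
    pre_mean, post_mean = mean(pre), mean(post)
    n = len(pre)
    # Sample covariance between pre-period and post-period values.
    cov = sum((x - pre_mean) * (y - post_mean)
              for x, y in zip(pre, post)) / (n - 1)
    var_pre = variance(pre)
    if var_pre == 0:
        raise ValueError("pre-period metric has zero variance")
    theta = cov / var_pre
    # Subtract the part of each value that the pre-period value explains.
    return [y - theta * (x - pre_mean) for x, y in zip(pre, post)]
```

The adjustment leaves the overall mean unchanged and removes variance in proportion to how strongly the pre-period metric correlates with the post-period metric.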
Holdout Size Trade-Off
- Larger holdout — more statistical power, more accurate cumulative measurement, but more users deprived of improvements
- Smaller holdout — less deprivation, but higher variance in the cumulative measurement — may not detect small effects
- Typical range: 1–5% of the user base. At 1%, a product with 10 million users still has a 100,000-user holdout, which is usually enough for reliable measurement of moderate effects
- New features must explicitly exclude holdout users even if they are not A/B tested — the holdout team reviews all feature flag configurations
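The size trade-off described above can be made concrete by estimating the smallest difference in means a given holdout percentage could detect. The sketch below uses the standard two-sample formula under a normal approximation; the function name, the default significance level of 0.05, and the default power of 0.8 are assumptions for the example.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(total_users: int,
                              holdout_pct: float,
                              sigma: float,
                              alpha: float = 0.05,
                              power: float = 0.8) -> float:
    """Smallest absolute difference in means detectable at the given power."""
    n_holdout = total_users * holdout_pct / 100.0
    n_other = total_users - n_holdout
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    # sigma is the per-user standard deviation of the metric.
    return (z_alpha + z_power) * sigma * sqrt(1.0 / n_holdout + 1.0 / n_other)
```

Since the holdout is the smaller cohort, its size dominates the result: moving from a 1% to a 5% holdout shrinks the detectable effect by roughly a factor of the square root of five.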
Contamination Prevention
Social products face a specific challenge: holdout users interact with non-holdout users, and new features affecting non-holdout users can indirectly change holdout user behavior (network effects, viral content, etc.). Mitigation strategies:
- Cluster-based holdout — assign entire social clusters (friend groups) to holdout together, minimizing cross-group interaction
- Metric selection — use metrics that are less susceptible to interference (individual-level behavior vs social graph metrics)
- Contamination quantification — estimate the magnitude of spillover by analyzing holdout users' exposure to non-holdout users' content
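One simple way to approach contamination quantification is to measure what share of the content seen by holdout users was authored by non-holdout users. The sketch below assumes an impression log of (viewer_id, author_id) pairs; that log format and the function name `exposure_share` are hypothetical.

```python
from typing import Iterable, Set, Tuple


def exposure_share(impressions: Iterable[Tuple[str, str]],
                   holdout_users: Set[str]) -> float:
    """Fraction of holdout users' impressions authored by non-holdout users."""
    total = 0
    exposed = 0
    for viewer_id, author_id in impressions:
        if viewer_id not in holdout_users:
            continue  # only impressions seen by holdout users count
        total += 1
        if author_id not in holdout_users:
            exposed += 1
    # Return 0.0 when holdout users saw nothing, to avoid dividing by zero.
    return exposed / total if total else 0.0
```

A share close to 1.0 indicates that individual-level randomization leaves holdout users heavily exposed, which strengthens the case for cluster-based assignment.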
Stratified Holdout
The holdout must be representative across all user segments to avoid selection bias in the cumulative measurement:
- Stratify assignment by geography, user tenure, platform (iOS/Android/web), and activity tier
- Verify stratification by comparing segment distributions between holdout and non-holdout
- If the product's user base is growing fast, account for the fact that new users joining during the holdout period will have different baseline behavior than existing users
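Verifying stratification can start with a direct comparison of segment shares between the two cohorts. The sketch below assumes each user carries a single segment label (for example a platform) and that both cohorts are non-empty; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def segment_shares(segments: List[str]) -> Dict[str, float]:
    """Share of users in each segment; assumes a non-empty list."""
    counts = Counter(segments)
    total = sum(counts.values())
    return {segment: count / total for segment, count in counts.items()}


def max_share_gap(holdout_segments: List[str],
                  other_segments: List[str]) -> float:
    """Largest absolute difference in segment share between the cohorts."""
    holdout = segment_shares(holdout_segments)
    other = segment_shares(other_segments)
    all_segments = set(holdout) | set(other)
    # A segment missing from one cohort counts as a share of zero there.
    return max(abs(holdout.get(s, 0.0) - other.get(s, 0.0))
               for s in all_segments)
```

A large gap in any segment is a signal to re-examine the assignment before trusting the cumulative measurement. The same check can be repeated per dimension: geography, tenure, platform, and activity tier.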
Holdout Graduation
At the end of the holdout period, the holdout group is “graduated” — released to receive all current features simultaneously:
- Measure the short-term impact of releasing all withheld features at once as a validation signal
- The cumulative impact measured at graduation should be consistent with the longitudinal comparison
- After graduation, the holdout group dissolves — users are eligible for future experiments normally
- A new holdout cohort may be created for the next measurement period with a fresh random assignment
Summary
The holdout group service complements individual A/B tests by providing long-term cumulative impact measurement. Permanent assignment stability, explicit exclusion from all experiments and feature flags, bias verification, contamination prevention, and stratified composition are the key engineering requirements for a valid holdout that produces trustworthy results.