Low-Level Design: Subscription Box Service — Curation, Billing Cycles, Inventory Allocation, and Churn

Core Entities

Subscriber: subscriber_id, user_id, plan_id, status (ACTIVE, PAUSED, CANCELLED), billing_cycle_anchor, next_billing_date, shipping_address, preferences[], skip_count. Plan: plan_id, name, price_cents, billing_interval (MONTHLY, QUARTERLY), box_size (SMALL, MEDIUM, LARGE), items_per_box. Box: box_id, subscriber_id, billing_period (2024-01), status (PENDING, CURATED, PACKED, SHIPPED, DELIVERED), curation_notes, items[]. Product: product_id, name, category, cost_cents, inventory_quantity, tags[]. BoxItem: box_id, product_id, quantity, unit_cost_cents.

Billing Cycles

Subscription billing must be reliable and idempotent. Billing anchor: the day of month the customer first subscribed (anchor=15 -> billed on the 15th of each month). On billing date: (1) Verify subscriber is ACTIVE. (2) Check skip requests (subscriber requested to skip this month). (3) Attempt payment via payment provider using stored payment method. (4) On success: create a Box record for this billing period, mark billing period as paid. (5) On payment failure: retry with exponential backoff (retry on day+1, day+3, day+7). After 3 failed attempts: dunning email sequence (update payment method request), suspend the subscription. Idempotency: store (subscriber_id, billing_period) with a unique constraint — the billing job can safely retry without double-charging.

Box Curation

Curation selects which products go into each subscriber box based on preferences and inventory. Rule-based curation: match subscriber tags (e.g., “vegan”, “fitness”) to product tags. Exclude products previously sent to this subscriber (track history per subscriber). Ensure items_per_box products are selected, covering required categories (e.g., at least 1 snack, 1 beauty item). ML-based curation: collaborative filtering on subscriber-product rating history. Predict ratings for unseen products; select the top-N highest predicted rating products that pass inventory checks. Curation pipeline: run as a batch job 5-7 days before shipping date (to allow inventory locking). Generate a curation proposal; human review for QA; lock inventory on approval.

Inventory Allocation

Inventory must be reserved before boxes are packed. Challenge: with 100K subscribers and 20 product SKUs per box period, some products may sell out before all boxes are curated. Allocation strategy: (1) Run allocation in subscriber priority order (longest-tenured subscribers first). (2) For each box: decrement inventory.available for each selected product. If any product is out of stock: swap to the next best alternative (same category, same tags). (3) Reserve inventory atomically: UPDATE products SET reserved = reserved + qty, available = available – qty WHERE available >= qty. If 0 rows affected: out of stock, apply substitution logic. Track allocation conflicts (substitution rate) as a supply chain KPI.

Churn Prediction and Prevention

Churn prediction model features: number of boxes skipped in the last 3 months, days since last login, product rating average (low ratings predict churn), subscriber tenure, customer service contact frequency, payment failure history. Model: gradient boosting classifier trained on churned vs retained subscribers. Score all active subscribers monthly; flag high-risk subscribers (churn probability > 0.7). Retention actions: personalized discount offer (10% off next 3 months), survey to understand dissatisfaction, curation adjustment (swap categories the subscriber has rated poorly), pause offer (pause up to 2 months without cancellation). Track retention action effectiveness: A/B test offers against a control group. Store churn_score and risk_tier per subscriber in the subscriber table.

Skip and Pause Management

Subscribers can skip a month or pause for up to 3 months. Skip: before the billing cutoff (typically 5 days before billing date), subscriber requests a skip. Store skip_request for the current billing period. Billing job checks for skip requests before attempting payment. If skipped: no charge, no box for that period. Next billing date advances by one interval. Pause: set subscriber.status = PAUSED, pause_until = future_date. Billing job skips PAUSED subscribers. Resume: either automatically on pause_until date or manually. Resume sends a welcome-back email and schedules the next billing cycle. Limit skips: allow 2-3 skips per year (excess skips indicate a churn risk, trigger a retention workflow instead of allowing another skip).

{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How do you prevent double-charging in subscription billing?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Subscription billing must be idempotent — running the billing job twice for the same period must not charge the customer twice. Implementation: store a billing_records table with a unique constraint on (subscriber_id, billing_period). Before charging: INSERT INTO billing_records (subscriber_id, billing_period, status=PENDING) — if this fails with a unique constraint violation, the period was already billed, skip. On successful payment: update status=PAID. On failure: update status=FAILED for retry tracking. The payment provider call uses an idempotency key: hash(subscriber_id + billing_period). If the provider receives the same key twice, it returns the original response without re-charging. This two-layer idempotency (database + payment provider) ensures no double charges even if the job crashes mid-execution.”
}
},
{
“@type”: “Question”,
“name”: “How do you run subscription billing at scale (millions of subscribers)?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Billing one million subscribers sequentially would take hours. Parallel batch processing: partition subscribers by ID ranges and run billing workers in parallel. Use a message queue: enqueue one billing message per subscriber, worker pool consumes and processes. Stagger billing: not all subscribers are billed on the same day — billing_cycle_anchor spreads billing across the month (anchor=1 for day 1, anchor=15 for day 15). This flattens the billing load. Priority queues: process subscribers on their exact billing date first; retry failed subscribers in a lower-priority queue. Idempotency: each billing job is re-runnable. Monitoring: track billing_success_rate, payment_failure_rate, retry_queue_depth in real time. Alert if success rate drops below 95%.”
}
},
{
“@type”: “Question”,
“name”: “How does the box curation algorithm work?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Curation selects products for each subscriber box while satisfying constraints: (1) Category requirements: box must include at least 1 item per required category. (2) Preference matching: items must match subscriber preference tags (vegan, fragrance-free). (3) Novelty: items must not have been sent to this subscriber in the past N months (track per-subscriber item history). (4) Inventory availability: items must have sufficient stock. Rule-based approach: filter products by category and preference tags, exclude recent items, select randomly or by popularity. ML approach: train a rating prediction model on subscriber ratings, predict scores for all eligible products, rank by predicted rating, select top N while satisfying category constraints. Output: a ranked list of products per subscriber. Human review gates the final selection before inventory is locked.”
}
},
{
“@type”: “Question”,
“name”: “How do you handle payment failures in subscription billing?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Payment failures fall into two categories: hard failures (card expired, fraud block — will not succeed on retry) and soft failures (temporary decline, insufficient funds — may succeed later). Dunning management: on first failure, retry after 1 day. On second failure: retry after 3 days + send email asking to update payment. On third failure: retry after 7 days + final warning email. After 3 failures: suspend the subscription (status=SUSPENDED), send a cancellation notice with a reactivation link. For hard failures (returned by payment provider decline codes): skip the retry cycle, send an immediate update payment request. Grace period: keep the subscription active for 7 days after a failure to allow payment recovery without losing the subscription. Track failure reason codes from the payment provider for analytics.”
}
},
{
“@type”: “Question”,
“name”: “How would you implement a churn prediction system for subscriptions?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Churn prediction uses behavioral signals to identify at-risk subscribers before they cancel. Feature engineering: months_since_last_skip (more skips = higher churn risk), average_product_rating (low satisfaction), days_since_login (disengagement), number_of_customer_service_contacts (friction), consecutive_months_subscribed (longer = lower churn), total_spend (higher value = lower churn). Model: gradient boosting (XGBoost/LightGBM) or logistic regression. Train on historical data: subscribers who churned (label=1) vs retained (label=0). Score all active subscribers monthly. Intervention thresholds: churn_prob > 0.8: proactive outreach + discount offer. 0.6-0.8: personalized curation adjustment + survey. 0.4-0.6: targeted email campaign. Track intervention lift: compare churn rate of treated vs control cohort with A/B test.”
}
}
]
}

Asked at: Shopify Interview Guide

Asked at: Stripe Interview Guide

Asked at: Airbnb Interview Guide

Asked at: DoorDash Interview Guide

Scroll to Top