Overview
A funnel analysis service determines what fraction of users complete an ordered sequence of steps — for example, landing page → signup → onboarding → first purchase. It must handle arbitrary event sequences, configurable conversion windows, step-level filtering, cohort segmentation, and both real-time and batch computation modes. This low level design covers the data model, matching algorithm, storage schema, query API, and the key trade-offs between real-time and batch approaches.
Requirements
Functional
- Define funnels as an ordered list of event steps, each with optional property filters.
- Compute per-step user counts and conversion rates.
- Support conversion windows (e.g., user must complete all steps within 7 days of step 1).
- Support strict ordering (steps must occur in exact order) and loose ordering (steps can have intervening events).
- Segment results by dimensions (country, device, plan, etc.).
- Support cohort analysis (group users by the date they entered the funnel).
- Provide both near-real-time results (data updated within 5 minutes) and historical results.
Non-functional
- Support funnels with up to 10 steps.
- Query response time < 5 s for last-30-day funnels on tenants with up to 10 M users.
- Event throughput: shares the 1 M events/sec ingestion pipeline with the broader analytics system.
- Support up to 1 000 active funnel definitions per tenant.
Funnel Definition Schema
{
funnel_id: "f_abc123",
tenant_id: "acme",
name: "Signup to Purchase",
steps: [
{
step_index: 0,
event_type: "landing_page_view",
filters: []
},
{
step_index: 1,
event_type: "signup_completed",
filters: [
{field: "properties.source", op: "eq", value: "organic"}
]
},
{
step_index: 2,
event_type: "onboarding_completed",
filters: []
},
{
step_index: 3,
event_type: "purchase",
filters: [
{field: "properties.plan", op: "in", value: ["pro","enterprise"]}
]
}
],
conversion_window_hours: 168, // 7 days
order_mode: "strict", // or loose
created_at: "2024-04-01T00:00:00Z"
}
Event Sequence Matching Algorithm
Core Problem
Given a user's event history (ordered by timestamp), determine which funnel steps they completed in order within the conversion window. The algorithm must be correct under both strict and loose ordering modes.
Strict Ordering (most common)
In strict mode, each step must be the immediately next matching event after the previous step (no intervening events of later step types are allowed between steps).
function match_strict(events, steps, window_hours):
step_idx = 0
entry_ts = null
results = [] // (step_index, matched_event_id, ts)
for event in events (sorted by ts asc):
if step_idx == 0:
if matches(event, steps[0]):
entry_ts = event.ts
results.append((0, event.event_id, event.ts))
step_idx = 1
else:
// Check conversion window
if event.ts - entry_ts > window_hours * 3600:
break
// In strict mode: if event matches a LATER step, invalidate
// intermediate steps that were skipped
if matches(event, steps[step_idx]):
results.append((step_idx, event.event_id, event.ts))
step_idx += 1
if step_idx == len(steps):
break
elif any_later_step_matches(event, steps, step_idx):
// Skip: do not advance, strict ordering violated
pass
return results // user converted through step_idx - 1
This is O(E) per user where E is the number of events in the time window — typically a few hundred at most.
Loose Ordering
In loose mode, steps must occur in order but can have any intervening events. The algorithm simplifies: scan events in time order, advance the step pointer whenever the current event matches the current step.
function match_loose(events, steps, window_hours):
step_idx = 0
entry_ts = null
for event in events (sorted by ts asc):
if step_idx == 0 and matches(event, steps[0]):
entry_ts = event.ts
step_idx = 1
elif step_idx > 0:
if event.ts - entry_ts > window_hours * 3600:
break
if matches(event, steps[step_idx]):
step_idx += 1
if step_idx == len(steps):
break
return step_idx // number of steps completed
Conversion Window
The conversion window is anchored to step 0 (funnel entry). A user who completes step 0 at time T must complete all subsequent steps before T + window. This means:
- A user who completes steps 0-2 in 5 days but does not complete step 3 within 7 days is counted as converted through step 2, not step 3.
- If the same user later completes step 3 (beyond the window), they are counted as a new funnel entry starting from the next step-0 event.
- Users can enter the funnel multiple times. The service counts each entry separately by default; a "unique users" mode deduplifies on user_id, counting only the first completed step.
Step Ordering Nuance: Re-entry
If a user completes step 0 multiple times (e.g., visits the landing page twice), the service creates multiple funnel entries, one per step-0 occurrence. Each entry is evaluated independently. The "entered funnel" count at step 0 is the number of entries, not unique users. This is the standard Mixpanel/Amplitude behavior.
Batch Computation
When to Use Batch
Historical funnel queries (date ranges in the past) are computed by a batch Spark job. The job runs on demand (triggered by the query API for cache misses) and on a nightly schedule to pre-warm the cache for common date ranges.
Spark Job
// Pseudo-code
events_df = spark.read.clickhouse(
"SELECT user_id, event_type, received_at, properties FROM events "
+ "WHERE tenant_id = 'acme' "
+ "AND received_at BETWEEN start AND end + conversion_window"
)
// Add funnel step index to each event (-1 if does not match any step)
labeled_df = events_df
.withColumn("step_index", match_steps_udf(col("event_type"), col("properties")))
.filter(col("step_index") >= 0)
// Per user, apply sequence matching
results_df = labeled_df
.groupBy("user_id")
.agg(match_funnel_udf(
collect_list(struct("step_index", "received_at")).orderBy("received_at"),
conversion_window_hours,
order_mode
).alias("max_step"))
// Aggregate across users
funnel_df = results_df
.groupBy("max_step")
.count()
.orderBy("max_step")
The match_funnel_udf is a Python or Scala UDF implementing the matching algorithm above. Grouping by user_id before applying the UDF means the per-user event list fits in memory for all realistic users.
Cohort Segmentation in Batch
To segment by cohort (entry date), the batch job partitions funnel entries by toDate(entry_ts) and computes the funnel for each date separately. This produces a time series of daily cohort conversion rates without re-scanning all events — only the entry date partitioning changes.
Real-Time Computation
When to Use Real-Time
For the "last 24 hours" or "today" time range, batch latency is unacceptable. The real-time path uses a Flink stateful job that maintains per-user funnel state as events arrive.
Flink Stateful Funnel Job
State per user (keyed stream on user_id):
UserFunnelState {
active_entries: List[FunnelEntry]
// FunnelEntry = {entry_ts, max_step_completed, last_step_ts}
}
On each incoming event:
- Determine which funnel steps the event matches (an event can match steps in multiple funnel definitions — the job handles all active funnels for the tenant).
- For each active entry, attempt to advance the step pointer.
- Expire entries where now() – entry_ts > conversion_window_hours (no longer eligible).
- If the event matches step 0, create a new entry.
- Emit a rollup update to a Redis sorted set whenever an entry advances or expires.
Real-Time State Storage
Per-funnel live counts are stored in Redis sorted sets, one per funnel per step:
Key: funnel:{tenant}:{funnel_id}:step:{step_idx}:date:{YYYYMMDD}
Type: PFADD (HyperLogLog for unique user counts)
INCR for total event counts
The Query API reads from Redis for real-time ranges and from the batch result cache (stored in ClickHouse or S3) for historical ranges. The handoff point is configurable (default: use real-time for ranges ending within the last 6 hours).
Query API
Request
POST /v1/funnels/query
{
"funnel_id": "f_abc123",
"tenant_id": "acme",
"time_range": {
"start": "2024-04-06T00:00:00Z",
"end": "2024-04-13T00:00:00Z"
},
"segment_by": "country",
"cohort_by": "entry_date",
"filters": [
{field: "country", op: "eq", value: "US"}
]
}
Response
{
"funnel_id": "f_abc123",
"steps": [
{step_index: 0, event_type: "landing_page_view", count: 10000, conversion_rate: 1.0},
{step_index: 1, event_type: "signup_completed", count: 3200, conversion_rate: 0.32},
{step_index: 2, event_type: "onboarding_completed", count: 2100, conversion_rate: 0.656},
{step_index: 3, event_type: "purchase", count: 840, conversion_rate: 0.4}
],
"cohorts": [
{
"date": "2024-04-06",
"steps": [...]
}
],
"segments": [
{"country": "US", "steps": [...]}
],
"computed_at": "2024-04-13T10:05:00Z",
"data_freshness": "batch" // or realtime
}
Funnel Result Storage (Batch Cache)
Completed batch funnel results are stored in ClickHouse:
CREATE TABLE funnel_results (
tenant_id String,
funnel_id String,
time_range_start Date,
time_range_end Date,
segment_key String, -- e.g. "country:US" or "__all__"
cohort_date Date,
step_index UInt8,
user_count UInt64,
computed_at DateTime
) ENGINE = ReplacingMergeTree(computed_at)
PARTITION BY (tenant_id, toYYYYMM(time_range_start))
ORDER BY (tenant_id, funnel_id, time_range_start, time_range_end, segment_key, cohort_date, step_index);
The Query API first checks this table. On cache miss, it enqueues a batch Spark job and returns HTTP 202 with a job_id. The client polls GET /v1/funnels/jobs/{job_id} until status = completed, then re-issues the original query.
Handling Large Tenant Volumes
For tenants with > 10 M users in the query window, the batch Spark job uses a sampling approach for exploratory queries: compute the funnel on a 10% sample (stratified by user_id hash) and scale up the counts. The response includes a sampled: true flag and a confidence interval. Full (unsampled) results are available with a longer SLA (up to 60 s).
Correctness Edge Cases
- Clock skew: events arriving out of order due to mobile offline buffering. The batch job uses received_at for ordering; the real-time job uses a bounded watermark. Historical batch results are authoritative.
- User merge: when two anonymous IDs are merged into one user_id (e.g., sign in on a second device), the funnel entry created under the old anonymous_id may now be attributed to the canonical user_id. The nightly identity resolution job rewrites affected funnel_results rows.
- Funnel definition changes: changing a funnel definition (adding a step, changing filters) invalidates all cached results for that funnel. The service stores a funnel_version hash in funnel_results and invalidates on mismatch.
- Partial completions near the time range boundary: a user who entered the funnel at the end of the query window may not have had time to complete later steps. The service excludes entries where entry_ts + conversion_window > time_range_end to avoid undercounting conversions for recent entrants.
Interview Tips
- The interviewer will often ask about the strict vs loose ordering trade-off. Strict ordering is more intuitive for product funnels (no skipping steps) but harder to implement correctly. Be prepared to code the matching algorithm on a whiteboard.
- Explain the re-entry semantics clearly: most analytics tools count funnel entries, not unique users at step 0. Know how to switch between modes.
- Discuss when to use real-time Flink vs batch Spark: real-time state per user is expensive to maintain for thousands of concurrent funnel definitions. A common approach is to limit real-time to tenant-default funnels and use batch for ad-hoc queries.
- The conversion window boundary problem (excluding incomplete entries near the end of the time range) is a subtle correctness issue that demonstrates depth of understanding.
- Know ClickHouse FINAL keyword: SELECT … FROM funnel_results FINAL is needed when reading from ReplacingMergeTree before background merges complete, to guarantee deduplication of concurrent batch writes.
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering
See also: Uber Interview Guide 2026: Dispatch Systems, Geospatial Algorithms, and Marketplace Engineering