User Segmentation Engine Low-Level Design: Real-Time Evaluation, JSONB Rules, and Incremental Updates

What Is a User Segmentation Engine?

A user segmentation engine partitions a user base into named groups (segments) based on attribute rules. Segments power targeted marketing, personalized product experiences, A/B test allocation, and access control. The core challenge is keeping segment membership current as user attributes change — the evaluation must be fast enough to be useful but cheap enough to run continuously at scale.

A segment is defined by a rule set: a tree of conditions (attribute comparisons) connected by AND/OR/NOT operators. Examples: “users who signed up in the last 30 days AND have made at least one purchase AND are in the US” or “users with account_tier = 'premium' OR lifetime_value > 1000”.

Rule Types and JSONB Storage

Storing segment rules as structured JSONB enables programmatic evaluation without a custom DSL parser. A rule set is a JSON object with a logical operator and a list of conditions:

{
  "operator": "AND",
  "conditions": [
    {"attribute": "country", "op": "eq", "value": "US"},
    {"attribute": "account_age_days", "op": "gte", "value": 30},
    {"attribute": "lifetime_value", "op": "gt", "value": 100},
    {
      "operator": "OR",
      "conditions": [
        {"attribute": "account_tier", "op": "in", "value": ["premium", "enterprise"]},
        {"attribute": "referral_source", "op": "eq", "value": "partner"}
      ]
    }
  ]
}

Supported condition operators: eq, neq, gt, gte, lt, lte, in, not_in, contains (substring), is_null, is_not_null. Rules are stored as JSONB in the Segment table, evaluated in application code against a user's attribute snapshot.

Real-Time Evaluation on Attribute Change

The event-driven approach: when a user attribute changes (e.g., a purchase raises lifetime_value from 80 to 120), the system emits an attribute_changed event. The segmentation engine subscribes to this event stream and re-evaluates all segments that reference the changed attribute.

The attribute index inverts the relationship: for each attribute name, maintain a list of segment_ids that reference it. When attribute X changes for user U, look up all segments referencing attribute X and re-evaluate those segments for user U. This avoids evaluating the entire segment portfolio for every attribute change — only segments that could be affected by the change are evaluated.
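The inverted index can be built by walking each segment's rule tree and collecting the attribute names it references. A minimal sketch, with illustrative helper names (`collect_attributes`, `build_attribute_index`) and `segments_db` as an in-memory map of segment_id to segment row:

```python
from collections import defaultdict

def collect_attributes(rules: dict) -> set[str]:
    """Recursively collect every attribute name referenced by a rule tree."""
    if "attribute" in rules:          # leaf condition
        return {rules["attribute"]}
    attrs: set[str] = set()
    for cond in rules.get("conditions", []):
        attrs |= collect_attributes(cond)
    return attrs

def build_attribute_index(segments_db: dict) -> dict[str, set[int]]:
    """Invert segment -> attributes into attribute -> segment_ids."""
    index: dict[str, set[int]] = defaultdict(set)
    for seg_id, segment in segments_db.items():
        for attr in collect_attributes(segment["rules"]):
            index[attr].add(seg_id)
    return index
```

The index is small (one entry per distinct attribute name) and can live in memory; it only needs rebuilding when a segment's rules change, not on user attribute changes.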

On re-evaluation, compare the new membership result (in/out) with the current SegmentMembership record. If the user joined, upsert the row: insert it with added_at set, or clear removed_at (and refresh added_at) if a row from a previous membership exists, since (segment_id, user_id) is the primary key. If the user left, set removed_at on the existing row.

Incremental Update with Dirty Flags

For batch recomputation (e.g., nightly full refresh), scanning the entire user table is expensive at scale. The dirty flag approach tracks which users had attribute changes since the last compute run:

  1. Maintain a DirtyUser table (or a flag column in UserAttributeSnapshot) that records users whose attributes changed since the last compute.
  2. The compute job reads only dirty users, re-evaluates all segments for them, and updates SegmentMembership.
  3. After processing, clear the dirty flags.

This reduces compute cost proportionally to the fraction of users whose attributes changed since the last run. For stable user populations (e.g., 1% of users change attributes daily), this is a 100× speedup over full recompute.
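The mark/drain cycle around a compute run can be sketched in a few lines; `DirtySet` here is a hypothetical in-memory stand-in for the DirtyUser table, not part of the schema below:

```python
class DirtySet:
    """In-memory stand-in for a DirtyUser table: mark on change, drain on compute."""
    def __init__(self):
        self._dirty: set[int] = set()

    def mark(self, user_id: int) -> None:
        # Called by the attribute_changed event handler.
        self._dirty.add(user_id)

    def drain(self) -> list[int]:
        # Called by the compute job: take the current dirty set and clear it
        # in one step, so changes arriving mid-run land in the next cycle.
        batch, self._dirty = list(self._dirty), set()
        return batch
```

In the database version, one option is DELETE FROM DirtyUser ... RETURNING user_id inside the same transaction as the membership updates, so a crash mid-run rolls back both the drain and the partial results.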

Sampling for Size Estimation

Before a full segment compute, stakeholders often want to know approximately how many users a new segment will match, and running a full evaluation is expensive. PostgreSQL's TABLESAMPLE BERNOULLI(p) returns each row independently with probability p%, sampled during the scan, so the segment rule is evaluated on only a fraction of users (BERNOULLI still reads every page; TABLESAMPLE SYSTEM samples whole pages and is cheaper, but less uniform when related rows are clustered). Evaluating the rule on a 1% sample and multiplying the match count by 100 gives a usable estimate at roughly 1% of the evaluation cost. For rough go/no-go decisions (e.g., “will this segment have at least 10,000 users?”), 1% sampling is almost always sufficient.
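The accuracy of such an estimate follows the standard Bernoulli-sampling error formula; this small helper (an illustrative addition, not part of the implementation sketch below) computes the 95% confidence half-width for a given match rate and sample size:

```python
import math

def sampling_ci(match_fraction: float, sample_size: int) -> float:
    """95% confidence half-width for a segment match-rate estimate
    from a Bernoulli sample of sample_size rows."""
    se = math.sqrt(match_fraction * (1 - match_fraction) / sample_size)
    return 1.96 * se

# 1% sample of 1M users (n = 10,000), true match rate 10%:
half_width = sampling_ci(0.10, 10_000)   # ~0.0059, i.e. 10% +/- 0.6%
```

For small segments the half-width stays small in absolute terms but becomes large relative to the match rate, which is why rarer segments need larger samples.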

Composable Segments

Composable segments define their membership as a logical combination of other segments rather than raw attribute rules. “US premium users who are also in the high-engagement cohort” can be defined as an intersection of three existing segments. The evaluation engine resolves composable segments by looking up the SegmentMembership sets of the component segments and applying set intersection/union/difference. Composable segments enable segment reuse and avoid duplicating complex rule logic across multiple segments.
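Resolving a composable segment over in-memory membership sets is direct set algebra. A sketch, assuming a hypothetical definition shape of {"operator": ..., "segments": [...]} (this shape is illustrative; the text above does not fix one):

```python
def resolve_composable(definition: dict,
                       membership_sets: dict[int, set[int]]) -> set[int]:
    """Combine component segments' membership sets with set operations.

    definition: {"operator": "AND"|"OR"|"NOT", "segments": [segment_id, ...]}
    membership_sets: segment_id -> set of current member user_ids
    """
    sets = [membership_sets.get(sid, set()) for sid in definition["segments"]]
    if not sets:
        return set()
    op = definition["operator"]
    if op == "AND":
        return set.intersection(*sets)
    if op == "OR":
        return set.union(*sets)
    if op == "NOT":   # members of the first segment, minus all the rest
        return sets[0].difference(*sets[1:]) if len(sets) > 1 else sets[0]
    raise ValueError(f"unknown operator: {op}")
```

Component sets must already be computed when this runs, which is why composable segments are evaluated in dependency order after all base segments.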

Segment Export

Segment membership is consumed by downstream systems: marketing platforms (Braze, Iterable), ad targeting (Facebook Custom Audiences), analytics (Amplitude cohorts), and feature flag services (LaunchDarkly). Exports can be push-based (triggered by compute completion) or pull-based (downstream system polls the SegmentMembership table). Batch exports typically generate a CSV or JSON of user_ids and upload to S3 or send via API. Export size monitoring tracks membership_count over time to detect segment drift (a segment that should have ~100K users suddenly having 10K or 1M indicates a rule or data problem).
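The drift check described above can be a simple ratio test against the previous export's count; the 3× threshold here is an illustrative default, not a recommendation from the text:

```python
def check_segment_drift(previous_count: int, current_count: int,
                        max_ratio: float = 3.0) -> bool:
    """Return True if membership count changed suspiciously between exports.

    Flags any change larger than max_ratio in either direction, catching
    both a segment collapsing (~100K -> 10K) and exploding (~100K -> 1M),
    either of which suggests a rule or data problem.
    """
    if previous_count == 0:
        return current_count > 0
    ratio = current_count / previous_count
    return ratio > max_ratio or ratio < 1.0 / max_ratio
```

A flagged export would typically be held for review rather than pushed to the downstream platform.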

SQL Data Model

-- Segment definitions
CREATE TABLE Segment (
    id          BIGSERIAL PRIMARY KEY,
    name        VARCHAR(255) NOT NULL UNIQUE,
    rules       JSONB NOT NULL,             -- rule tree as described above
    is_composable BOOLEAN NOT NULL DEFAULT FALSE,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Current segment membership
CREATE TABLE SegmentMembership (
    segment_id  BIGINT NOT NULL REFERENCES Segment(id),
    user_id     BIGINT NOT NULL,
    added_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    removed_at  TIMESTAMPTZ,               -- NULL means currently a member
    PRIMARY KEY (segment_id, user_id)
);

-- Latest user attribute snapshot for evaluation
CREATE TABLE UserAttributeSnapshot (
    user_id     BIGINT PRIMARY KEY,
    attributes  JSONB NOT NULL,
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Compute job audit log
CREATE TABLE SegmentComputeJob (
    id              BIGSERIAL PRIMARY KEY,
    segment_id      BIGINT NOT NULL REFERENCES Segment(id),
    triggered_by    VARCHAR(64) NOT NULL,   -- "event", "schedule", "manual"
    started_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    completed_at    TIMESTAMPTZ,
    added_count     BIGINT,
    removed_count   BIGINT
);

CREATE INDEX idx_membership_segment ON SegmentMembership(segment_id) WHERE removed_at IS NULL;
CREATE INDEX idx_membership_user ON SegmentMembership(user_id) WHERE removed_at IS NULL;
CREATE INDEX idx_snapshot_updated ON UserAttributeSnapshot(updated_at);

Python Implementation Sketch

import random

def evaluate_condition(condition: dict, attributes: dict) -> bool:
    """Evaluate a single condition against a user's attribute map."""
    attr = condition["attribute"]
    op = condition["op"]
    val = condition.get("value")
    user_val = attributes.get(attr)
    if op == "is_null":
        return user_val is None
    if op == "is_not_null":
        return user_val is not None
    if user_val is None:
        return False
    if op == "eq":       return user_val == val
    if op == "neq":      return user_val != val
    if op == "gt":       return user_val > val
    if op == "gte":      return user_val >= val
    if op == "lt":       return user_val < val
    if op == "lte":      return user_val <= val
    if op == "in":       return user_val in val
    if op == "not_in":   return user_val not in val
    if op == "contains": return isinstance(user_val, str) and val in user_val
    return False

def evaluate_rules(rules: dict, attributes: dict) -> bool:
    """Recursively evaluate a rule tree against user attributes."""
    if "attribute" in rules:
        return evaluate_condition(rules, attributes)
    operator = rules.get("operator", "AND")
    conditions = rules.get("conditions", [])
    if operator == "AND":
        return all(evaluate_rules(c, attributes) for c in conditions)
    if operator == "OR":
        return any(evaluate_rules(c, attributes) for c in conditions)
    if operator == "NOT":
        return not evaluate_rules(conditions[0], attributes)
    return False

def evaluate_segment(segment_id: int, user_id: int,
                     attributes: dict, segments_db: dict) -> bool:
    """Evaluate whether a user belongs to a segment."""
    segment = segments_db.get(segment_id)
    if not segment:
        return False
    return evaluate_rules(segment["rules"], attributes)

def incremental_update(changed_user_ids: list[int],
                       segments_db: dict,
                       snapshots_db: dict,
                       membership_db: dict) -> dict:
    """Re-evaluate all segments for users with changed attributes."""
    stats = {"added": 0, "removed": 0}
    for user_id in changed_user_ids:
        attrs = snapshots_db.get(user_id, {}).get("attributes", {})
        for seg_id, segment in segments_db.items():
            currently_member = membership_db.get((seg_id, user_id), False)
            should_be_member = evaluate_rules(segment["rules"], attrs)
            if should_be_member and not currently_member:
                membership_db[(seg_id, user_id)] = True
                stats["added"] += 1
            elif not should_be_member and currently_member:
                membership_db[(seg_id, user_id)] = False
                stats["removed"] += 1
    return stats

def estimate_size(segment_id: int, segments_db: dict,
                  all_users: list[dict], sample_rate: float = 0.01) -> int:
    """Estimate segment size via random sampling."""
    segment = segments_db.get(segment_id)
    if not segment:
        return 0
    sample = random.sample(all_users, max(1, int(len(all_users) * sample_rate)))
    matches = sum(1 for u in sample
                  if evaluate_rules(segment["rules"], u.get("attributes", {})))
    return int(matches / sample_rate)

def export_membership(segment_id: int, membership_db: dict,
                      destination: str) -> list[int]:
    """Return list of current member user_ids for export."""
    members = [uid for (sid, uid), is_member in membership_db.items()
               if sid == segment_id and is_member]
    # In practice: write to S3 as CSV or call marketing platform API
    print(f"Exporting {len(members)} users for segment {segment_id} to {destination}")
    return members

Frequently Asked Questions

When should I use incremental update vs full recompute for segment membership?

Use incremental update (dirty flag or event-driven re-evaluation) when only a small fraction of users change attributes between compute cycles, when latency requirements demand near-real-time membership updates, or when the user base is too large to scan fully within the compute window. Use full recompute when segment rules change (a new or modified condition requires re-evaluating all users, not just those with attribute changes), when data quality issues require a clean rebuild, or when the system has not run for a long time and the dirty flag set is stale. Many systems combine both: event-driven incremental for real-time updates and a scheduled nightly full recompute for correctness verification.

How does JSONB rule evaluation perform at scale?

JSONB rule evaluation in application code is fast for individual users: a typical rule tree with 5-10 conditions evaluates in microseconds. At scale (millions of users), the bottleneck is reading UserAttributeSnapshot rows from the database, not the evaluation logic itself. Optimization strategies: cache attribute snapshots in Redis for hot users, batch reads using IN clauses or COPY to reduce database round-trips, and use PostgreSQL's JSONB operators to push simple predicates into the database query itself (e.g., attributes->>'country' = 'US' evaluated as a SQL filter). Complex rule trees with nested OR conditions may benefit from short-circuit evaluation: check the most selective conditions first.

How accurate is TABLESAMPLE for segment size estimation?

TABLESAMPLE BERNOULLI(p) samples each row independently with probability p. For a segment that matches fraction f of users, the sample estimate has standard error sqrt(f*(1-f)/n), where n is the sample size. At 1% sampling (n = 10,000 from 1 million users) with f = 0.1 (10% of users match), the standard error is sqrt(0.1*0.9/10000) = 0.003, giving a 95% confidence interval of 10% ± 0.6%. For rough go/no-go decisions, 1% sampling is highly accurate. For small segments (f < 0.001), use larger samples (5-10%) to get enough matches for a meaningful estimate.

How do composable segments work?

A composable segment defines membership as a set operation (intersection, union, difference) on the membership sets of other segments. Instead of evaluating attribute rules, the engine performs membership_A INTERSECT membership_B for AND-composed segments, membership_A UNION membership_B for OR, and membership_A EXCEPT membership_B for NOT. In SQL: SELECT user_id FROM SegmentMembership WHERE segment_id = A AND removed_at IS NULL INTERSECT SELECT user_id FROM SegmentMembership WHERE segment_id = B AND removed_at IS NULL. Composable segments are evaluated after all base segment memberships are computed, in dependency order; circular dependencies must be detected and rejected at rule creation time.



