What Is a Scoring Service?
A scoring service is the write-side component responsible for receiving raw score events, applying business rules (multipliers, caps, bonuses), and producing a final authoritative score for each entity. It is the upstream producer that feeds leaderboards and ranking systems. Correctness, idempotency, and throughput are its primary concerns.
Data Model
Each scoring event is persisted before processing to enable replay and audit:
CREATE TABLE score_events (
event_id UUID PRIMARY KEY,
entity_id BIGINT NOT NULL,
namespace VARCHAR(64) NOT NULL,
event_type VARCHAR(64) NOT NULL, -- e.g. 'kill', 'purchase', 'click'
raw_value BIGINT NOT NULL,
multiplier DECIMAL(5,2) NOT NULL DEFAULT 1.00,
final_value BIGINT NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
processed_at TIMESTAMP,
status VARCHAR(16) NOT NULL DEFAULT 'pending', -- 'pending', 'applied', 'rejected'
INDEX idx_entity_namespace (entity_id, namespace),
INDEX idx_status_created (status, created_at)
);
CREATE TABLE entity_totals (
entity_id BIGINT NOT NULL,
namespace VARCHAR(64) NOT NULL,
total_score BIGINT NOT NULL DEFAULT 0,
last_event UUID,
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (entity_id, namespace)
);
Core Algorithm and Workflow
Event Ingestion
- Client calls POST /score-events with {entity_id, event_type, raw_value, idempotency_key}.
- API validates the payload and checks the idempotency key against a Redis cache (SET idempotency:{key} 1 NX EX 86400). Duplicate requests return the original result immediately.
- Event is written to score_events with status pending and published to the Kafka topic raw-score-events.
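The idempotency check above can be sketched as follows. This is a minimal in-memory stand-in for the Redis SET ... NX EX call; the IdempotencyStore class and ingest function are illustrative names, not part of the service's actual API.

```python
import time

class IdempotencyStore:
    """In-memory stand-in for Redis SET key 1 NX EX ttl semantics."""
    def __init__(self):
        self._keys = {}  # key -> expiry timestamp

    def set_nx_ex(self, key, ttl_seconds):
        """Return True if the key was newly set (first request),
        False if it already exists and has not expired (duplicate)."""
        now = time.time()
        expiry = self._keys.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate request
        self._keys[key] = now + ttl_seconds
        return True

def ingest(store, event):
    """Accept an event only once per idempotency key."""
    if not store.set_nx_ex(f"idempotency:{event['idempotency_key']}", 86400):
        return {"status": "duplicate"}
    # ...write to score_events with status 'pending', publish to Kafka...
    return {"status": "accepted"}
```

In production the store is Redis itself, so the deduplication window is shared across all API instances.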
Score Processing
- Score processor consumer reads from Kafka.
- Looks up the rule set for (namespace, event_type): base points, multiplier, cap, bonus conditions.
- Applies rules: final_value = MIN(raw_value * multiplier + bonus, cap).
- Atomically increments entity_totals.total_score using an optimistic lock (compare-and-swap on updated_at) or a database row lock.
- Updates score_events status to applied and publishes a score-applied event to a downstream Kafka topic consumed by leaderboard and ranking services.
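The rule-application step can be sketched as a pure function. This assumes bonus defaults to 0 and an absent cap means uncapped; the truncation to an integer matches the BIGINT final_value column.

```python
def apply_rules(raw_value, multiplier=1.0, bonus=0, cap=None):
    """Compute final_value = min(raw_value * multiplier + bonus, cap).
    cap=None means uncapped; the result is truncated to an integer
    to match the BIGINT final_value column."""
    value = int(raw_value * multiplier) + bonus
    if cap is not None:
        value = min(value, cap)
    return value
```

For example, apply_rules(100, 2.0, 50) yields 250, and the same event under a cap of 200 yields 200.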
Rule Engine
Rules are stored in a configuration table and cached in memory with a short TTL:
CREATE TABLE scoring_rules (
namespace VARCHAR(64) NOT NULL,
event_type VARCHAR(64) NOT NULL,
base_points BIGINT NOT NULL,
multiplier DECIMAL(5,2) NOT NULL DEFAULT 1.00,
score_cap BIGINT,
bonus_json TEXT, -- JSON blob for conditional bonuses
PRIMARY KEY (namespace, event_type)
);
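The in-memory TTL cache in front of scoring_rules can be sketched like this. The loader callback (which would query the scoring_rules table) and the RuleCache name are assumptions for illustration.

```python
import time

class RuleCache:
    """Caches (namespace, event_type) -> rule dict with a short TTL,
    so rule changes propagate within ttl_seconds without a deploy."""
    def __init__(self, loader, ttl_seconds=30):
        self._loader = loader   # e.g. a function querying scoring_rules
        self._ttl = ttl_seconds
        self._cache = {}        # key -> (rule, fetched_at)

    def get(self, namespace, event_type):
        key = (namespace, event_type)
        entry = self._cache.get(key)
        if entry and time.time() - entry[1] < self._ttl:
            return entry[0]
        rule = self._loader(namespace, event_type)  # DB round-trip
        self._cache[key] = (rule, time.time())
        return rule
```

A short TTL trades a small window of stale rules for a large reduction in database reads on the hot path.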
Failure Handling and Consistency
At-least-once delivery: Kafka guarantees at-least-once delivery. Idempotency keys at the API layer prevent client-side duplicates. Processor-side deduplication checks score_events.status before applying: if applied, skip and acknowledge.
Processor crash mid-flight: The event remains pending in the database. A sweeper job periodically retries events stuck in pending beyond a timeout, re-publishing them to Kafka. This guarantees eventual processing.
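The sweeper's selection logic can be sketched as follows. In production this is a SQL query over the (status, created_at) index; here events is a list of dicts mirroring score_events rows, and the function name is illustrative.

```python
import time

def find_stuck_events(events, timeout_seconds, now=None):
    """Return events still 'pending' past the timeout, for re-publishing
    to Kafka. Mirrors a query on score_events(status, created_at)."""
    now = now if now is not None else time.time()
    return [e for e in events
            if e["status"] == "pending"
            and now - e["created_at"] > timeout_seconds]
```

The timeout must exceed the worst-case normal processing latency, or the sweeper will re-publish events that are merely slow, relying on processor-side deduplication to absorb the duplicates.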
Rule misconfiguration: Events that fail rule validation are marked rejected with an error code. A dead-letter queue holds them for manual review and potential reprocessing after rule correction.
Optimistic concurrency: If two processors race to update entity_totals, the loser retries. Under high contention, a Redis-based lock (SET lock:entity:{id} 1 NX EX 5) serializes updates per entity while keeping throughput high across entities.
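The compare-and-swap retry loop can be sketched like this. The fetch and cas_update callbacks are hypothetical database helpers: fetch reads the entity_totals row, and cas_update issues an UPDATE guarded by WHERE updated_at = <seen value>, returning True only if a row was changed.

```python
def increment_total(fetch, cas_update, entity_id, namespace, delta,
                    max_retries=5):
    """Optimistic-lock increment: read the row, then write only if
    updated_at is unchanged since the read (compare-and-swap)."""
    for _ in range(max_retries):
        row = fetch(entity_id, namespace)  # total_score, updated_at
        new_total = row["total_score"] + delta
        if cas_update(entity_id, namespace, new_total, row["updated_at"]):
            return new_total
        # another processor won the race; re-read and retry
    raise RuntimeError("contention too high; fall back to a row lock")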
Scalability Considerations
Kafka partitioning by entity_id: Partitioning the raw-score-events topic by entity_id ensures all events for a given entity are processed in order by the same consumer, eliminating per-entity race conditions without explicit locking.
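The principle behind key-based partitioning can be sketched as a stable hash. Kafka's default partitioner uses murmur2 rather than MD5; this illustrative function only shows the property that matters: the same entity_id always maps to the same partition.

```python
import hashlib

def partition_for(entity_id, num_partitions):
    """Stable partition assignment so every event for an entity
    lands on the same partition (and thus the same consumer)."""
    digest = hashlib.md5(str(entity_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Because assignment is deterministic, no coordination is needed: any producer instance computes the same partition for a given entity.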
Horizontal consumer scaling: Add consumer instances up to the partition count to scale throughput linearly. Partition count should be sized for peak write volume with headroom.
Score aggregation offload: For entities with extremely high event rates (e.g., a popular streamer receiving thousands of tip events per second), a mini-aggregator buffers events in memory for 100ms and emits a single batched increment, reducing database write pressure.
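The mini-aggregator can be sketched as a buffer keyed by entity. The MiniAggregator name and flush callback are illustrative; in production, flush() runs on a ~100 ms timer and the callback performs the batched database increment.

```python
class MiniAggregator:
    """Buffers per-entity increments and emits one batched increment
    per flush window, cutting database write volume for hot entities."""
    def __init__(self, emit):
        self._emit = emit     # callback: (entity_id, total_delta)
        self._pending = {}    # entity_id -> accumulated delta

    def add(self, entity_id, delta):
        self._pending[entity_id] = self._pending.get(entity_id, 0) + delta

    def flush(self):
        """In production, called on a ~100 ms timer."""
        for entity_id, delta in self._pending.items():
            self._emit(entity_id, delta)
        self._pending.clear()
```

A thousand tip events in a window collapse into a single increment per entity, at the cost of up to one flush interval of staleness.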
Read-your-writes: After submitting a score event, clients may immediately query their score. Route these reads to the relational entity_totals table (with replication lag awareness) rather than Redis, which may not yet reflect the latest applied event.
Database sharding: Shard entity_totals and score_events by entity_id % N across database nodes. The scoring service routes writes to the correct shard. A scatter-gather is only needed for cross-entity analytics, which belongs in a data warehouse, not the operational store.
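The shard routing described above reduces to a modulo on the entity key; a minimal sketch, assuming shard handles are indexed by shard id:

```python
def shard_for(entity_id, num_shards):
    """Route all reads and writes for an entity to a fixed shard,
    matching the entity_id % N scheme described above."""
    return entity_id % num_shards
```

Note that a plain modulo requires resharding (moving rows) when N changes; consistent hashing is a common alternative when shard counts are expected to grow.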
Summary
A scoring service is the authoritative write path for score data. It enforces idempotency at ingestion, applies configurable rule-based transformations, and persists results durably before propagating downstream. Kafka partitioning by entity_id provides natural ordering and horizontal scale. Optimistic concurrency and sweeper jobs guarantee correctness under failures. The service is designed to be the single source of truth, decoupling raw event volume from the leaderboard and ranking read paths.