Low Level Design: Gaming Matchmaking Service

A matchmaking service pairs players into balanced, low-latency game sessions. The design touches skill rating systems, queue management, match formation algorithms, lobby lifecycle, and anti-abuse controls. Below is a class-level walkthrough suitable for a system design interview.

Player Queue Entry

When a player requests a match, the client sends a QueueRequest with the player’s ID, selected game mode, and optional party members. The QueueService fetches the player’s current MMR (matchmaking rating) from the RatingStore (a Redis hash keyed by player ID), measures or accepts a reported ping to regional servers, and creates a QueueEntry:

QueueEntry {
  playerId: string
  partyId: string | null
  mmr: float
  mmrRange: [float, float]   // expands over time
  region: string
  pingMs: int
  queuedAt: timestamp
  mode: GameMode
}

Entries are stored in a per-region, per-mode sorted set in Redis, scored by queue time so that longest-waiting players get priority consideration. Party members are bundled under a single PartyEntry, with the highest MMR in the party anchoring the bracket. Anchoring on the lowest MMR would let a strong player reach weaker lobbies through a low-rated partner, which is the boosting pattern this rule is meant to prevent.
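The queue layout above can be sketched in memory as follows. This is a minimal illustration, not the service's actual storage code: the InMemoryQueues class and the "queue:{region}:{mode}" key format are hypothetical, and a sorted Python list stands in for the Redis sorted set scored by queue time.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueueEntry:
    player_id: str
    mmr: float
    region: str
    mode: str
    ping_ms: int
    queued_at: float                 # epoch seconds; doubles as the sort score
    party_id: Optional[str] = None


class InMemoryQueues:
    """Hypothetical stand-in for the per-region, per-mode Redis sorted sets."""

    def __init__(self) -> None:
        self._queues = {}            # key -> sorted list of (queued_at, player_id)
        self._entries = {}           # player_id -> QueueEntry

    @staticmethod
    def key(region: str, mode: str) -> str:
        # Assumed key naming; the source does not specify one.
        return f"queue:{region}:{mode}"

    def enqueue(self, entry: QueueEntry) -> None:
        queue = self._queues.setdefault(self.key(entry.region, entry.mode), [])
        # Keep the list ordered by queue time, like a sorted set ordered by score.
        bisect.insort(queue, (entry.queued_at, entry.player_id))
        self._entries[entry.player_id] = entry

    def longest_waiting(self, region: str, mode: str, n: int) -> list:
        """Return up to n entries, oldest queue time first."""
        queue = self._queues.get(self.key(region, mode), [])
        return [self._entries[pid] for _, pid in queue[:n]]
```

Because the score is the enqueue timestamp, reading from the front of the structure yields the players who have waited longest, which is the ordering the matchmaker wants when it pulls candidates.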

Match Formation Algorithm

A background MatchmakerWorker runs every 500ms per region/mode shard. It pulls candidate entries from the sorted set and attempts to form a complete lobby (e.g., 10 players for a 5v5 mode). The core logic:

  • MMR bucketing: group candidates by MMR bucket (e.g., 100-point wide buckets). Try to fill a lobby from a single bucket first.
  • Search radius expansion: if a lobby can’t be filled within 30 seconds, expand the acceptable MMR range by ±50 points per additional 15 seconds elapsed, up to a configured maximum spread (e.g., ±300).
  • Latency gate: candidates whose ping to the selected server exceeds a threshold (e.g., 120ms) are filtered out unless the queue is sparse and wait time exceeds 90 seconds.
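  • Team balance: once enough candidates are gathered, assign teams with a greedy heuristic that keeps the MMR delta between teams small. For a 5v5, sort by MMR and assign in snake-draft order (A, B, B, A, A, B, ...) rather than strictly alternating, which would give team A the stronger player of every pair.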
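The rules in this list can be sketched as three small functions. The numbers are the examples quoted above (30-second grace period, ±50 per 15 seconds, ±300 cap, 120ms ping limit, 90-second relaxation); the function names and the starting half-width of ±50 (half of a 100-point bucket) are assumptions made for illustration.

```python
def mmr_window(waited_s: float, base: float = 50.0, step: float = 50.0,
               grace_s: float = 30.0, interval_s: float = 15.0,
               max_spread: float = 300.0) -> float:
    """Half-width of the acceptable MMR range after waiting waited_s seconds."""
    if waited_s <= grace_s:
        return base
    # One expansion step for every full interval beyond the grace period.
    steps = int((waited_s - grace_s) // interval_s)
    return min(base + step * steps, max_spread)


def passes_latency_gate(ping_ms: int, waited_s: float, queue_is_sparse: bool,
                        max_ping_ms: int = 120,
                        relax_after_s: float = 90.0) -> bool:
    """High-ping candidates pass only in a sparse queue after a long wait."""
    if ping_ms <= max_ping_ms:
        return True
    return queue_is_sparse and waited_s > relax_after_s


def snake_draft(mmrs: list) -> tuple:
    """Split players into two teams using pick order A, B, B, A, A, B, B, A, ..."""
    ordered = sorted(mmrs, reverse=True)
    team_a, team_b = [], []
    for i, mmr in enumerate(ordered):
        (team_a if i % 4 in (0, 3) else team_b).append(mmr)
    return team_a, team_b
```

Snake order is a heuristic rather than an optimal partition: it does not guarantee the minimum possible MMR gap, but it is cheap and avoids giving one team the better player of every adjacent pair.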

When a valid lobby is formed, the worker emits a LobbyFormedEvent and removes all matched entries from the queue atomically via a Lua script to avoid race conditions with other workers.
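The all-or-nothing behavior the Lua script provides can be illustrated with an in-process analogy. This is not Redis code: the QueueClaims class is hypothetical, and a lock plays the role that Redis plays when it runs a script without interleaving other commands.

```python
import threading


class QueueClaims:
    """In-process analogy of an atomic check-then-remove over queued players."""

    def __init__(self, player_ids) -> None:
        self._waiting = set(player_ids)
        self._lock = threading.Lock()

    def claim_all(self, lobby_ids) -> bool:
        """Remove every id in lobby_ids, or none of them if any is already gone."""
        wanted = set(lobby_ids)
        with self._lock:
            if not wanted <= self._waiting:
                return False      # another worker already matched someone
            self._waiting -= wanted
            return True
```

A worker whose claim fails simply discards the candidate lobby and retries on its next tick, so no player can end up in two lobbies.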

Session Creation and Lobby Management

The LobbyService receives the LobbyFormedEvent and creates a GameSession record in the session database (Postgres). It assigns a dedicated game server instance via the FleetManager, which selects the nearest available server using latency data. The session record holds:

  • Session ID, game mode, map
  • Player list with team assignment
  • Server IP and port
  • Status: LOBBY | IN_PROGRESS | COMPLETED | ABANDONED

Players receive a LobbyReadyNotification via WebSocket. They have 30 seconds to accept. If any player declines or times out, the lobby is dissolved and remaining players are re-queued with their original queue time preserved (so they don’t lose wait-time progress).
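A minimal sketch of the ready-check outcome, assuming that "remaining players" means those who accepted and that a timeout is treated the same as a decline. The Waiting type and resolve_ready_check function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Waiting:
    player_id: str
    queued_at: float   # original enqueue time, carried through the ready check


def resolve_ready_check(lobby: list, accepted: set) -> tuple:
    """Return ("start", []) if everyone accepted, else ("dissolve", requeue)."""
    if all(p.player_id in accepted for p in lobby):
        return "start", []
    # Players who accepted go back into the queue with queued_at unchanged,
    # so they keep their place relative to newer arrivals.
    requeue = [p for p in lobby if p.player_id in accepted]
    return "dissolve", requeue
```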

Backfill for Disconnected Players

Once a session is IN_PROGRESS, the game server reports player disconnections to the SessionMonitor. If a slot opens and the game mode allows backfill (e.g., battle royale but not competitive ranked), the monitor emits a BackfillRequest. The matchmaker serves the request through the normal queue flow but gives it a priority boost, targeting players already in queue whose MMR is close to the running session’s average.
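Candidate selection for a backfill slot might look like the sketch below. The pick_backfill function and the ±100 tolerance are assumptions; the source only says the candidate's MMR should fit the session's average.

```python
from statistics import mean
from typing import Optional


def pick_backfill(session_mmrs: list, queued: dict,
                  max_delta: float = 100.0) -> Optional[str]:
    """Pick the queued player whose MMR is closest to the session average.

    queued maps player_id -> MMR. Returns None if nobody is within max_delta.
    """
    target = mean(session_mmrs)
    in_range = {pid: mmr for pid, mmr in queued.items()
                if abs(mmr - target) <= max_delta}
    if not in_range:
        return None
    return min(in_range, key=lambda pid: abs(in_range[pid] - target))
```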

Skill Rating Update (Elo)

After a session completes, the RatingService processes the result. Using Elo:

Expected score for player A vs B:
  E_A = 1 / (1 + 10^((MMR_B - MMR_A) / 400))

New rating (S_A is the actual score: 1 for a win, 0.5 for a draw, 0 for a loss):
  MMR_A_new = MMR_A + K * (S_A - E_A)

K factor: 32 for new players (< 30 games), 16 for established
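The formulas translate directly into code. The function names are illustrative, and score_a follows the usual Elo convention of 1 for a win, 0.5 for a draw, and 0 for a loss.

```python
def expected_score(mmr_a: float, mmr_b: float) -> float:
    """E_A = 1 / (1 + 10^((MMR_B - MMR_A) / 400))."""
    return 1.0 / (1.0 + 10 ** ((mmr_b - mmr_a) / 400.0))


def k_factor(games_played: int) -> int:
    """32 for new players (fewer than 30 games), 16 once established."""
    return 32 if games_played < 30 else 16


def updated_mmr(mmr_a: float, mmr_b: float, score_a: float,
                games_played: int) -> float:
    """MMR_A_new = MMR_A + K * (S_A - E_A)."""
    return mmr_a + k_factor(games_played) * (score_a - expected_score(mmr_a, mmr_b))
```

For two evenly rated players E_A is 0.5, so a new player at 1500 who beats another 1500 player moves to 1500 + 32 * 0.5 = 1516, while an established player losing the same matchup drops by 16 * 0.5 = 8 points.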

TrueSkill is an alternative for team games, modeling each rating as a Gaussian (mu, sigma) and updating all players simultaneously. The RatingStore is updated and a rating history event is appended to a Kafka topic for analytics.

Anti-Smurf Detection

A new account with suspiciously high win rates triggers the SmurfDetector. Signals include: win rate > 80% in first 20 games, KDA far above bracket average, IP/device fingerprint shared with a high-MMR account, and account age under 7 days. When flagged, the player’s provisional MMR is fast-tracked upward (applying a 2× K factor) and optionally held in a separate smurfing pool until calibration completes.
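The signals can be combined in a simple rule-based check, sketched below. The win-rate and account-age thresholds come from the text; the KDA ratio of 2.0, the requirement of at least two signals, and all names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    games_played: int
    wins: int
    kda: float
    bracket_avg_kda: float
    account_age_days: int
    shares_fingerprint_with_high_mmr: bool


def smurf_signals(s: AccountStats, kda_ratio: float = 2.0) -> list:
    """Return the names of the smurf signals this account trips."""
    signals = []
    if 0 < s.games_played <= 20 and s.wins / s.games_played > 0.80:
        signals.append("high_early_win_rate")
    # "Far above bracket average" is modeled as a ratio threshold (assumed).
    if s.bracket_avg_kda > 0 and s.kda / s.bracket_avg_kda >= kda_ratio:
        signals.append("kda_outlier")
    if s.shares_fingerprint_with_high_mmr:
        signals.append("shared_fingerprint")
    if s.account_age_days < 7:
        signals.append("new_account")
    return signals


def effective_k(base_k: int, signals: list, min_signals: int = 2) -> int:
    """Double the K factor for flagged accounts so their MMR converges faster."""
    return base_k * 2 if len(signals) >= min_signals else base_k
```

Requiring more than one signal is one way to limit false positives, since a legitimately strong new player can trip the win-rate check alone.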

Regional Server Selection

The FleetManager maintains a registry of game server pools per region (us-east, eu-west, ap-southeast, etc.). Server selection for a lobby picks the region where the sum of all player pings is minimized. If the optimal region has no available capacity, the next-best region is used and players are notified of the expected latency. Autoscaling triggers are based on queue depth per region, with a predictive scale-up before peak hours using historical traffic data.
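Region selection with a capacity fallback can be sketched as follows. The pick_region function and its input shapes are hypothetical; the ranking criterion (lowest summed ping) and the fallback to the next-best region come from the description above.

```python
from typing import Optional


def pick_region(pings: dict, capacity: dict) -> Optional[str]:
    """Choose the region with the lowest total ping that still has free servers.

    pings maps region -> list of per-player pings in ms for the lobby.
    capacity maps region -> number of available server instances.
    """
    ranked = sorted(pings, key=lambda region: sum(pings[region]))
    for region in ranked:
        if capacity.get(region, 0) > 0:
            return region
    return None   # no capacity anywhere; the caller must wait or scale up
```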

Key Design Decisions

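  • Redis for queue storage: O(log N) sorted set inserts and removals. Redis TTLs apply to whole keys, not to individual sorted set members, so stale entries need an explicit sweep (for example, driven by per-player heartbeat keys that do expire)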
  • Lua scripts for atomic lobby formation to prevent double-matching
  • Kafka for rating update events and analytics decoupling
  • Postgres for durable session records with FK integrity
  • Horizontal scaling of MatchmakerWorkers by sharding on region + mode

Interview tip: clarify game mode and expected concurrency early (1K vs 1M concurrent players changes the design substantially), then walk through queue entry, formation, and rating update before getting into anti-smurf detection or backfill.

Frequently Asked Questions

What is a matchmaking service in system design?

A matchmaking service pairs players into game sessions based on criteria such as skill rating, latency, region, and game mode. It maintains a pool of waiting players, evaluates compatibility, and creates lobbies once a suitable group is assembled. Core components include a queue manager, a rating engine, a lobby service, and a backfill mechanism for handling dropouts.

How does Elo or TrueSkill rating affect matchmaking decisions?

Rating systems like Elo (two-player) or TrueSkill (multiplayer) assign each player a numeric skill estimate, often with an uncertainty component. The matchmaker filters or ranks candidates whose rating falls within an acceptable window of the queuing player’s rating. TrueSkill also tracks variance, so newer players have wider windows until their skill is better established, allowing faster initial placement at the cost of potentially less balanced early matches.

How do you expand matchmaking search radius over time to prevent long queues?

The matchmaker applies a time-based relaxation policy: the acceptable skill delta, ping threshold, or region restriction widens at defined intervals (e.g., every 15 seconds). A priority queue or sorted set ordered by wait time lets the system preferentially match players who have waited longest. Hard caps prevent the window from expanding so far that match quality becomes unacceptable, and telemetry on average wait times drives tuning of the relaxation curve.

How do you handle player dropout and backfill during a match?

When a player disconnects, the game server reports it and a backfill request is sent to the matchmaking service with the open slot’s requirements (rating range, role, game state). The matchmaker can draw on players already in the normal queue or, as a variant, maintain a secondary backfill queue of players who opted in to joining in-progress games. A selected backfill candidate is given a short reservation window; if they decline or time out, the next candidate is tried. The original player may also be offered a rejoin token valid for a short period before the slot is surrendered to backfill.
