Problem Overview
A matchmaking service groups players into balanced game sessions based on skill, latency, and preference. The system must minimize wait time while maximizing match quality. Poor matchmaking produces one-sided games; slow matchmaking drives players away — the design must balance both continuously.
Requirements and Constraints
Functional Requirements
- Players enqueue individually or as a pre-formed party
- Match players by skill rating (ELO or MMR) with configurable tolerance windows
- Prefer players with low mutual latency (ping-based region affinity)
- Create a lobby when a valid group is found; notify all matched players
- Allow players to cancel while in queue
Non-Functional Requirements
- Median wait time under 30 seconds for popular game modes
- Match quality: average ELO delta within a match under 150 points at steady state
- Handle 100,000 concurrent players in queue
- Lobby creation latency under 200 ms after a match is found
Core Data Model
Queue Entry
CREATE TABLE matchmaking_queue (
ticket_id UUID PRIMARY KEY,
player_id BIGINT NOT NULL,
party_id UUID, -- null for solo
game_mode VARCHAR(64) NOT NULL,
mmr INT NOT NULL,
region VARCHAR(32) NOT NULL,
enqueued_at TIMESTAMP NOT NULL DEFAULT NOW(),
status ENUM('waiting','matched','cancelled') DEFAULT 'waiting',
tolerance INT NOT NULL DEFAULT 50 -- current MMR tolerance window
);
Lobby
CREATE TABLE lobbies (
lobby_id UUID PRIMARY KEY,
game_mode VARCHAR(64) NOT NULL,
region VARCHAR(32) NOT NULL,
server_ip VARCHAR(45),
status ENUM('forming','ready','started','expired') DEFAULT 'forming',
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE lobby_players (
lobby_id UUID NOT NULL,
player_id BIGINT NOT NULL,
team TINYINT,
PRIMARY KEY (lobby_id, player_id)
);
Key Algorithms and Logic
Skill-Based Matching with Tolerance Expansion
Each ticket starts with a tight MMR window (±50). A background job runs every 2 seconds and widens the tolerance by 25 points per iteration up to a cap (±300). This ensures no player waits indefinitely. The algorithm iterates over waiting tickets sorted by enqueue time and attempts to form a complete group (e.g., 5v5 = 10 players) where all players fall within the current maximum pairwise MMR delta threshold.
Greedy group formation: sort tickets by MMR ascending. Use a sliding window to find the smallest MMR range that fits exactly N players. If multiple windows qualify, prefer the one with the lowest average wait time.
Party Handling
A party is treated as a single atomic unit with an aggregate MMR equal to the average member MMR. The party occupies N slots when matched, where N is party size. The remaining slots are filled with solo players whose MMR falls within tolerance of the party average. Teams are balanced by distributing the party to one side and filling the other side to match total team MMR within a threshold.
Region and Latency Affinity
Players report their region on enqueue. The matchmaker first attempts same-region matches. After 15 seconds without a match, it expands to adjacent regions. Predicted ping is estimated from region pairs using a pre-built latency matrix; matches exceeding 80 ms predicted ping are only allowed after full tolerance expansion.
Matching Worker Architecture
Queue state lives in Redis as a sorted set per (game_mode, region) keyed by MMR: queue:{mode}:{region}. Matching workers (stateless) use a distributed lock per (mode, region) to prevent duplicate match attempts. Workers use ZRANGEBYSCORE to extract candidates in the current MMR window, attempt group formation in memory, and atomically remove matched tickets with a Lua script that checks status before deletion.
API Design
POST /v1/matchmaking/tickets
Body: { "player_id": "...", "game_mode": "ranked_5v5", "region": "us-east", "mmr": 1420 }
Response: { "ticket_id": "uuid", "status": "waiting", "estimated_wait_sec": 18 }
DELETE /v1/matchmaking/tickets/{ticket_id}
Response: 204 No Content
GET /v1/matchmaking/tickets/{ticket_id}/status
Response: { "status": "matched", "lobby_id": "uuid" }
POST /v1/lobbies/{lobby_id}/ready
Body: { "player_id": "..." }
Response: { "lobby_status": "forming", "ready_count": 7, "total": 10 }
Lobby Lifecycle
Once a match is formed, a lobby record is created and all matched players receive a WebSocket push notification. Players have 30 seconds to click “Accept.” If any player declines or times out, the lobby is dissolved and non-offending players are re-queued with a priority boost (they retain their original enqueue timestamp). Players who repeatedly decline receive a short matchmaking cooldown.
Scalability Considerations
- Horizontal matching workers: Shard queues by (mode, region) — each shard has exactly one active worker holding a Redis lock, with hot standby for failover.
- Queue depth monitoring: Alert when median wait time exceeds 45 seconds; auto-scale matching worker frequency (run every 1 s instead of 2 s) to clear backlog.
- Game server provisioning: Pre-warm server fleet based on current queue depth and historical match rate; call a game server allocation API asynchronously as soon as a match is found.
- State durability: Redis queue is the hot path, but all ticket state is also written to the relational DB for auditability and replay on Redis failure.
Failure Modes and Mitigations
- Worker crash mid-match: Lua script atomicity ensures tickets are either all removed or none are. Orphaned “matched” tickets have a 60-second TTL watchdog that returns them to “waiting.”
- Redis restart: Rebuild queue from DB rows where status = 'waiting' on startup; use AOF persistence to reduce rebuild scope.
- MMR manipulation: Rate-limit ticket submissions per player; validate MMR server-side against the ranked profile service rather than trusting client-supplied values.
{
“@context”: “https://schema.org”,
“@type”: “FAQPage”,
“mainEntity”: [
{
“@type”: “Question”,
“name”: “How does ELO/MMR skill matching work in a matchmaking system?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Each player has a numeric rating (ELO or MMR). The matchmaker finds opponents whose ratings fall within an acceptable delta of the queuing player's rating. Ratings are stored per game mode and updated after each match using a K-factor weighted formula, so recent performance influences future match quality.”
}
},
{
“@type”: “Question”,
“name”: “What is queue expansion policy and when should it apply?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Queue expansion widens the acceptable MMR search range over time to prevent players from waiting indefinitely. A tiered policy starts with a tight delta (e.g. ±50 MMR), then expands in steps every 30–60 seconds. After a configurable ceiling (e.g. ±300 MMR) the player is placed in a fill match or shown an estimated wait time.”
}
},
{
“@type”: “Question”,
“name”: “How is aggregate MMR calculated for a party queuing together?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “Party MMR is typically computed as the weighted average of individual member ratings, sometimes boosted by the highest member's MMR to account for smurf protection. The resulting aggregate is then used as if it were a single player's rating when searching for opponents, ensuring fair team composition.”
}
},
{
“@type”: “Question”,
“name”: “What are the key states in a lobby lifecycle?”,
“acceptedAnswer”: {
“@type”: “Answer”,
“text”: “A lobby progresses through: QUEUING (players added to pool), MATCHED (group assembled, lock acquired), ACCEPTING (each player confirms readiness with a timeout), LOADING (game server allocated, map loaded), IN_GAME (active session), and CLOSED (results persisted, MMR updated). Any acceptance timeout or disconnect before IN_GAME returns players to QUEUING.”
}
}
]
}
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering