What Is a Deadlock?
A deadlock is a situation in which two or more transactions are each waiting for a lock held by another transaction in the same set, creating a circular dependency. No transaction can make progress: each is blocked waiting for another to release its lock, so without intervention all of them would wait indefinitely.
The four conditions for deadlock (Coffman conditions): mutual exclusion (locks are exclusive), hold and wait (a transaction holds locks while waiting for more), no preemption (locks cannot be forcibly taken), and circular wait (a cycle in the wait-for graph).
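The circular-wait condition can be made concrete with the classic two-transaction case: T1 holds lock A and requests lock B, while T2 holds B and requests A. A minimal sketch (transaction and resource names are illustrative, not from any real lock manager):

```python
# T1 holds lock A and wants lock B; T2 holds lock B and wants lock A.
held = {"A": "T1", "B": "T2"}    # resource -> transaction holding its lock
wants = {"T1": "B", "T2": "A"}   # transaction -> resource it is waiting for

def blocked_on(txn: str) -> str:
    """Return the transaction that `txn` is blocked waiting for."""
    return held[wants[txn]]

# Each transaction waits on the other: a circular wait of length two.
assert blocked_on("T1") == "T2"
assert blocked_on("T2") == "T1"
```

All four Coffman conditions hold here: the locks are exclusive, each transaction holds one lock while waiting for another, neither lock can be forcibly taken, and the two wait relationships form a cycle.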
Wait-For Graph
A wait-for graph is a directed graph where:
- Nodes represent active transactions.
- Directed edges represent waiting relationships: an edge T1 → T2 means T1 is currently blocked waiting for a lock held by T2.
A deadlock exists if and only if there is a cycle in the wait-for graph. Cycle detection in the wait-for graph is therefore the core algorithm for deadlock detection.
The wait-for graph is maintained in real time: an edge is added when a transaction blocks on a lock request, and removed when that request is granted or when either transaction commits or aborts.
Cycle Detection Algorithm
Cycle detection uses depth-first search (DFS):
- Maintain a recursion stack tracking the current DFS path.
- For each unvisited node, run DFS: visit the node, add it to the recursion stack, then recursively visit all neighbors.
- If a neighbor is already in the current recursion stack, a back edge is found — a cycle exists.
- When backtracking, remove the node from the recursion stack.
Complexity: O(V + E) per detection run, where V is the number of active transactions and E is the number of wait-for edges. For OLTP databases with hundreds of concurrent transactions, this is fast enough to run every few milliseconds.
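The steps above can be sketched as a standalone function over an adjacency list. This is a minimal illustration on a toy graph; the dict-of-sets representation and node names are assumptions for the example:

```python
from typing import Dict, Set

def has_cycle(edges: Dict[str, Set[str]]) -> bool:
    """Return True if the directed graph has a cycle (DFS with a recursion stack)."""
    visited: Set[str] = set()
    rec_stack: Set[str] = set()

    def dfs(node: str) -> bool:
        visited.add(node)
        rec_stack.add(node)
        for neighbor in edges.get(node, ()):
            if neighbor not in visited:
                if dfs(neighbor):
                    return True
            elif neighbor in rec_stack:
                return True  # back edge into the current DFS path: cycle
        rec_stack.discard(node)  # backtrack
        return False

    # Start a DFS from every not-yet-visited node: O(V + E) overall.
    return any(dfs(n) for n in list(edges) if n not in visited)

# T1 -> T2 -> T3 -> T1 is a cycle; dropping the T3 -> T1 edge breaks it.
assert has_cycle({"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}})
assert not has_cycle({"T1": {"T2"}, "T2": {"T3"}})
```

Each node and edge is visited at most once, which is where the O(V + E) bound comes from.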
Detection Frequency
Detection frequency is a tradeoff:
- Too frequent: Wastes CPU on detection overhead; most runs find nothing.
- Too infrequent: Deadlocked transactions block for longer, increasing latency and holding locks that other transactions are waiting for (cascading delays).
PostgreSQL runs deadlock detection ~1 second after a lock wait begins (deadlock_timeout parameter, default 1s). This delay means most lock waits resolve naturally before detection even starts, avoiding unnecessary overhead.
Victim Selection
Once a cycle is found, one transaction (the victim) must be aborted to break the cycle. Victim selection criteria, roughly in order of preference:
- Least work done: Fewest rows written, fewest operations executed. Minimizes wasted work on rollback.
- Youngest transaction: Most recently started. Assumes shorter-lived transactions have less accumulated state.
- Lowest priority: Application-defined priority if available.
- Smallest transaction ID: Deterministic tiebreaker to ensure consistent behavior.
PostgreSQL does not use cost-based victim selection. The backend whose deadlock_timeout expires runs the deadlock check itself; if it finds a hard cycle, it aborts its own transaction (after first attempting to resolve the situation by rearranging lock wait queues), so the victim is effectively the transaction that detected the deadlock.
Abort and Retry
The victim transaction receives a deadlock error (e.g., ERROR: deadlock detected in PostgreSQL). The application must:
- Catch the deadlock error.
- Roll back the transaction (the database may have already done so).
- Wait briefly (optional small delay to let competing transactions complete).
- Retry the entire transaction from the beginning.
The retry must re-read all data — the transaction cannot resume from where it was aborted, as its view of the data may now be stale.
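The retry loop can be sketched as follows. DeadlockError here is a stand-in for whatever your driver raises (PostgreSQL reports SQLSTATE 40P01), and flaky_txn simulates a transaction that deadlocks twice before succeeding:

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for a driver-specific deadlock exception (e.g. SQLSTATE 40P01)."""

def with_deadlock_retry(txn_fn, max_retries: int = 3, base_delay: float = 0.05):
    """Run txn_fn, retrying from scratch on deadlock with jittered backoff."""
    for attempt in range(max_retries + 1):
        try:
            return txn_fn()  # must re-read all data on every attempt
        except DeadlockError:
            if attempt == max_retries:
                raise
            # Brief randomized wait lets competing transactions finish.
            time.sleep(base_delay * (2 ** attempt) * random.random())

# Simulated transaction that deadlocks twice, then succeeds.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DeadlockError()
    return "committed"

assert with_deadlock_retry(flaky_txn) == "committed"
assert attempts["n"] == 3
```

Note that txn_fn is re-executed from the beginning on every attempt, matching the rule that the retry must re-read all data.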
Timeout-Based Prevention (Alternative to Detection)
An alternative to deadlock detection is a lock timeout: if a transaction cannot acquire a lock within a configured time, it is aborted. This is simpler to implement, since no graph maintenance is required, but it has a key weakness: it may abort transactions that are not deadlocked at all, merely waiting on a slow transaction to release its lock, so their work is wasted unnecessarily.
In practice, both mechanisms are used together: lock_timeout as a safety net (prevents indefinite waits even in non-deadlock scenarios) and deadlock detection for fast, precise identification of actual circular waits.
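The timeout idea maps directly onto ordinary lock APIs. A sketch using Python's threading.Lock as a stand-in for a database lock manager: the second acquire below would block forever without a timeout, and the timeout turns an indefinite wait into a recoverable failure the caller can handle:

```python
import threading

lock = threading.Lock()
lock.acquire()  # first acquirer holds the lock

# A second acquirer gives up after the timeout instead of waiting forever,
# mirroring lock_timeout as a safety net against indefinite waits.
got_it = lock.acquire(timeout=0.05)
assert got_it is False

lock.release()
assert lock.acquire(timeout=0.05) is True  # succeeds once the lock is free
```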
Distributed Deadlocks
In distributed systems where transactions span multiple database nodes, wait-for edges may cross node boundaries. A local deadlock detector on each node sees only local edges and cannot detect cross-node cycles.
Solutions:
- Centralized coordinator: All nodes report their local wait-for edges to a central coordinator that builds the global graph and runs cycle detection.
- Edge timeout: Use lock timeouts as the primary mechanism; full distributed deadlock detection is complex and rarely worth the overhead.
- Lock ordering: Prevent distributed deadlocks by acquiring locks in a global consistent order across all nodes.
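The lock-ordering approach fits in a few lines; the resource-id format below is hypothetical, and the point is only that a globally consistent sort makes every transaction acquire locks in the same order, so no circular wait can form:

```python
from typing import List

def acquire_in_order(resources: List[str]) -> List[str]:
    """Return the acquisition order under a global (lexicographic) ordering."""
    return sorted(resources)

# Two transactions that touch the same resources in different orders end up
# acquiring them identically, so neither can hold a lock the other needs
# while waiting "backwards" in the global order.
t1 = acquire_in_order(["accounts:42", "accounts:7"])
t2 = acquire_in_order(["accounts:7", "accounts:42"])
assert t1 == t2
```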
Phantom Deadlocks in Distributed Detection
Distributed deadlock detection can produce phantom deadlocks: the detector sees a cycle in the global graph, but by the time it processes the cycle, one transaction in the cycle has already committed and released its locks — the cycle no longer exists. To avoid aborting a committed transaction, the coordinator must confirm that all participating transactions are still active before aborting any victim.
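That liveness check can be sketched as follows; the function and its policy are illustrative, and the pick callback stands in for whatever victim-selection rule the coordinator uses:

```python
from typing import Callable, List, Optional, Set

def choose_victim_if_cycle_live(
    cycle: List[str],
    active_txns: Set[str],
    pick: Callable[[List[str]], str],
) -> Optional[str]:
    """Abort only if every transaction in the detected cycle is still active."""
    if all(t in active_txns for t in cycle):
        return pick(cycle)
    return None  # phantom: someone already committed, so the cycle is gone

# Live cycle: a victim is chosen. Stale cycle: no one is aborted.
assert choose_victim_if_cycle_live(["T1", "T2"], {"T1", "T2"}, min) == "T1"
assert choose_victim_if_cycle_live(["T1", "T2"], {"T2"}, min) is None
```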
SQL Schema
-- Wait-for graph edges: each row represents T_waiter waiting for T_holder
CREATE TABLE LockWaitEdge (
    waiter_txn_id   UUID NOT NULL,
    holder_txn_id   UUID NOT NULL,
    resource_id     VARCHAR(256) NOT NULL,
    edge_created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (waiter_txn_id, holder_txn_id, resource_id)
);
CREATE INDEX idx_lockwaitedge_holder ON LockWaitEdge(holder_txn_id);
CREATE INDEX idx_lockwaitedge_created ON LockWaitEdge(edge_created_at);

-- Detected deadlock cycles and resolution
CREATE TABLE DeadlockCycle (
    cycle_id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    participating_txns JSONB NOT NULL, -- list of txn_ids in cycle
    victim_txn_id      UUID NOT NULL,
    detected_at        TIMESTAMPTZ NOT NULL DEFAULT now(),
    resolved_at        TIMESTAMPTZ
);
CREATE INDEX idx_deadlockcycle_victim ON DeadlockCycle(victim_txn_id);
CREATE INDEX idx_deadlockcycle_detected ON DeadlockCycle(detected_at);
Python Implementation
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class TxnInfo:
    txn_id: str
    write_count: int = 0
    started_at: float = 0.0

class WaitForGraph:
    def __init__(self):
        # adjacency list: waiter -> set of holders
        self.edges: Dict[str, Set[str]] = defaultdict(set)
        self.txns: Dict[str, TxnInfo] = {}

    def register_txn(self, txn_id: str, started_at: float) -> None:
        self.txns[txn_id] = TxnInfo(txn_id=txn_id, started_at=started_at)

    def add_edge(self, waiter: str, holder: str) -> None:
        """Record that waiter is blocked waiting for a lock held by holder."""
        self.edges[waiter].add(holder)

    def remove_edge(self, waiter: str, holder: str) -> None:
        """Remove edge when lock is granted or waiter is aborted."""
        self.edges[waiter].discard(holder)

    def remove_txn(self, txn_id: str) -> None:
        """Remove all edges involving txn_id (commit or abort)."""
        self.edges.pop(txn_id, None)
        for edges in self.edges.values():
            edges.discard(txn_id)
        self.txns.pop(txn_id, None)

    def detect_cycles(self) -> List[List[str]]:
        """Find all cycles in the wait-for graph using DFS."""
        visited: Set[str] = set()
        rec_stack: Set[str] = set()
        cycles: List[List[str]] = []
        path: List[str] = []

        def dfs(node: str) -> None:
            visited.add(node)
            rec_stack.add(node)
            path.append(node)
            for neighbor in self.edges.get(node, []):
                if neighbor not in visited:
                    dfs(neighbor)
                elif neighbor in rec_stack:
                    # Back edge: extract the cycle from the current path
                    idx = path.index(neighbor)
                    cycles.append(path[idx:])
            path.pop()
            rec_stack.discard(node)

        for node in list(self.edges.keys()):
            if node not in visited:
                dfs(node)
        return cycles

    def select_victim(self, cycle: List[str]) -> str:
        """
        Select the victim from a cycle.
        Prefer the transaction with the fewest writes (least rollback cost).
        Tiebreak by most recent start time (youngest).
        """
        def score(txn_id: str):
            info = self.txns.get(txn_id)
            if info is None:
                return (0, 0.0)
            return (info.write_count, -info.started_at)
        return min(cycle, key=score)

    def abort_victim(self, txn_id: str) -> None:
        """Abort the victim: remove from graph and signal the transaction."""
        self.remove_txn(txn_id)
        # In production: signal the transaction thread to raise DeadlockError

    def run_detection(self) -> List[str]:
        """
        Run one round of deadlock detection.
        Returns list of aborted victim txn_ids.
        """
        cycles = self.detect_cycles()
        aborted: List[str] = []
        for cycle in cycles:
            # Confirm all transactions in cycle are still active
            active_cycle = [t for t in cycle if t in self.txns]
            if len(active_cycle) < 2:
                continue  # Phantom deadlock: cycle already resolved
            victim = self.select_victim(active_cycle)
            self.abort_victim(victim)
            aborted.append(victim)
        return aborted
FAQ
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is the time complexity of DFS-based cycle detection in the wait-for graph?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "DFS cycle detection runs in O(V + E) where V is the number of active transactions and E is the number of wait-for edges. For a typical OLTP database with hundreds of concurrent transactions, this is sub-millisecond. PostgreSQL defers the check until a lock wait has lasted deadlock_timeout (default 1 second)."
      }
    },
    {
      "@type": "Question",
      "name": "What criteria determine which transaction is chosen as the deadlock victim?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The primary criterion is minimizing wasted work: prefer the transaction with the fewest writes, which has the lowest rollback cost. Secondary criteria include transaction age (prefer youngest, as less work has been done) and application-defined priority. PostgreSQL takes a simpler route: the transaction whose deadlock_timeout expired runs the check and, on finding a hard cycle, aborts itself."
      }
    },
    {
      "@type": "Question",
      "name": "What is the difference between deadlock detection and deadlock prevention?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Deadlock detection allows deadlocks to occur and then breaks them by aborting a victim transaction. It has low overhead for non-deadlock workloads but adds latency when deadlocks do occur. Deadlock prevention uses strategies like lock ordering or lock timeout to ensure deadlocks cannot form, eliminating detection overhead but potentially aborting non-deadlocked transactions (timeout approach) or requiring application coordination (ordering approach)."
      }
    },
    {
      "@type": "Question",
      "name": "How are distributed deadlocks handled?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Distributed deadlocks require a coordinator that aggregates wait-for edges from all nodes and runs global cycle detection. Each node reports its local edges to the coordinator periodically or on lock wait events. The coordinator detects cross-node cycles and instructs one node to abort the victim. Phantom deadlocks (stale edges showing a cycle that no longer exists) are handled by confirming all transactions in the cycle are still active before aborting."
      }
    }
  ]
}
- DFS cycle detection complexity: O(V + E); sub-millisecond for typical OLTP transaction counts.
- Victim selection criteria: fewest writes (lowest rollback cost), then youngest start time; PostgreSQL instead aborts the transaction whose deadlock check found the cycle.
- Detection vs prevention: Detection aborts after the fact (lower overhead normally, higher latency on deadlock); prevention uses ordering or timeouts (no detection cost but may abort non-deadlocked transactions).
- Distributed deadlock: Requires centralized coordinator to aggregate global wait-for graph; phantom deadlocks (stale edges) require active-transaction confirmation before aborting.
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How is a wait-for graph used to detect deadlocks?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A wait-for graph is a directed graph where each node is a transaction and an edge T1 -> T2 means T1 is waiting for a lock held by T2; a deadlock exists if and only if the graph contains a cycle. The lock manager periodically runs cycle detection (e.g., DFS) on this graph, and any cycle found indicates a set of mutually blocked transactions that must be broken by aborting one of them."
      }
    },
    {
      "@type": "Question",
      "name": "How is a victim chosen during deadlock resolution?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The victim selection heuristic typically aborts the transaction that minimizes rollback cost — often the youngest transaction (fewest locks held, least work done) or the transaction with the lowest priority. PostgreSQL aborts the transaction that detected the deadlock (the one whose lock request closed the cycle), which is a simple rule that avoids complex cost estimation."
      }
    },
    {
      "@type": "Question",
      "name": "How does PostgreSQL detect and break deadlocks?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "PostgreSQL waits deadlock_timeout (default 1 s) before checking for a deadlock; when a lock wait exceeds the threshold, the backend traverses the wait-for graph from its own transaction and aborts with ERROR 40P01 if a cycle is found. This lazy approach avoids the overhead of continuous graph maintenance and is effective because most lock waits resolve within the timeout window."
      }
    },
    {
      "@type": "Question",
      "name": "How are distributed deadlocks detected across multiple nodes?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Each node maintains a local wait-for graph and periodically ships it to a central deadlock detector or exchanges graphs with peers; the detector merges the graphs and searches for global cycles that span node boundaries. Alternatively, systems use timeout-based detection (aborting any transaction that waits longer than a threshold) to avoid the complexity of distributed cycle detection at the cost of occasionally aborting transactions that would not have deadlocked."
      }
    }
  ]
}