Low Level Design: Skip List

Structure

A skip list is a probabilistic data structure that maintains a sorted collection and supports O(log N) expected time for search, insert, and delete — matching a balanced BST, but without the complexity of rotations or rebalancing.

The structure consists of multiple levels of singly linked lists (or doubly linked for bidirectional traversal). Level 0 is the complete sorted list containing every element. Higher levels are "express lanes" — sparse sublists where each node skips over many level-0 nodes, allowing faster traversal of long ranges.

Each node stores:

class Node:
    key:     comparable
    value:   any
    forward: Node[]   # forward[i] = next node at level i

A node participating in levels 0 through k has a forward array of length k+1. The skip list itself has a header node with a forward array of length MAX_LEVEL. The number of levels a node participates in is determined randomly at insert time.

A skip list with N elements expects about N/2^i nodes at level i (for p = 1/2; see the level-assignment section below). At the highest populated level, only O(1) nodes are expected. This geometric decay is what gives the structure its logarithmic search time.

Search

Search starts at the top-left corner: the header node at the highest non-empty level.

  1. At the current node and level: look at current.forward[level].
  2. If that node exists and its key < target: advance right (current = current.forward[level]).
  3. If it is null or its key >= target: drop down one level (level -= 1).
  4. Repeat until level 0 has been processed.
  5. Check current.forward[0]: if it exists and its key equals target, return its value. Otherwise, the key is not present.

At each level, the pointer advances only while the next key is strictly less than the target. This ensures the final position at level 0 is either the target or its immediate predecessor.
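The traversal can be sketched on a small hand-built list. The Node layout and the four-key example below are illustrative assumptions, not any particular implementation:

```python
class Node:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * (level + 1)  # forward[i] = next node at level i

def search(header, top_level, target):
    current = header
    for level in range(top_level, -1, -1):          # start at the highest level
        while (current.forward[level] is not None
               and current.forward[level].key < target):
            current = current.forward[level]        # advance right
        # next key >= target (or end of level): drop down one level
    candidate = current.forward[0]
    if candidate is not None and candidate.key == target:
        return candidate.value
    return None

# Hand-built skip list over keys 3, 7, 12, 19 (node heights chosen by hand):
# level 2:  header ------------------> 12
# level 1:  header ------> 7 -------> 12
# level 0:  header -> 3 -> 7 -> 12 -> 19
header = Node(None, None, 2)
n3, n7, n12, n19 = Node(3, "a", 0), Node(7, "b", 1), Node(12, "c", 2), Node(19, "d", 0)
header.forward = [n3, n7, n12]
n3.forward[0] = n7
n7.forward[0], n7.forward[1] = n12, n12
n12.forward[0] = n19        # levels 1 and 2 end at n12

print(search(header, 2, 12))   # c
print(search(header, 2, 10))   # None
```

Note how the search for 10 ends at node 7, the immediate predecessor, exactly as the invariant above requires.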

Expected time complexity is O(log N) because there are O(log N) levels and each level requires O(1) expected comparisons before dropping down. In practice the constant factor is small — benchmarks show skip list search competitive with red-black tree search, as their asymptotic equivalence would suggest.

Insert

Insert combines a modified search with node splicing:

  1. Run the search algorithm, but record the rightmost node visited at each level in an update[] array. update[i] is the predecessor of the insert position at level i.
  2. Determine the new node’s level using randomized level assignment (see below).
  3. For levels above the current maximum: extend the update[] array with the header node and update the skip list’s level counter.
  4. Create the new node. For each level i from 0 to new node’s level: splice the node in by setting new_node.forward[i] = update[i].forward[i] and update[i].forward[i] = new_node.

The splicing is O(log N) expected because the new node participates in O(log N) levels on average, and each splice is O(1). The search phase is also O(log N). Total: O(log N) expected.
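The four steps can be sketched as a runnable function. Names (Node, MAX_LEVEL, insert) are illustrative; levels are 0-indexed to match the Structure section:

```python
from random import random

MAX_LEVEL = 16  # illustrative cap

class Node:
    def __init__(self, key, value, level):
        self.key, self.value = key, value
        self.forward = [None] * (level + 1)  # forward[i] = next node at level i

def random_level(p=0.5):
    level = 0
    while random() < p and level < MAX_LEVEL - 1:
        level += 1
    return level

def insert(header, list_level, key, value):
    """Insert (key, value); returns the (possibly raised) list level."""
    # Step 1: search, recording the rightmost node visited at each level.
    # Pre-filling update[] with the header also covers step 3 for any
    # levels above the current maximum.
    update, current = [header] * MAX_LEVEL, header
    for i in range(list_level, -1, -1):
        while current.forward[i] is not None and current.forward[i].key < key:
            current = current.forward[i]
        update[i] = current
    # Step 2: randomized level for the new node.
    lvl = random_level()
    # Step 4: splice the node in at each level from 0 up to lvl.
    node = Node(key, value, lvl)
    for i in range(lvl + 1):
        node.forward[i] = update[i].forward[i]
        update[i].forward[i] = node
    return max(list_level, lvl)

header = Node(None, None, MAX_LEVEL - 1)
level = 0
for k in [5, 1, 9, 3]:
    level = insert(header, level, k, k * 10)

# Level-0 traversal yields the keys in sorted order:
keys, node = [], header.forward[0]
while node is not None:
    keys.append(node.key)
    node = node.forward[0]
print(keys)  # [1, 3, 5, 9]
```

Whatever levels the random assignment picks, the level-0 list stays fully sorted, which is why the final traversal is deterministic.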

Delete

Delete is the mirror of insert:

  1. Run the search algorithm, collecting predecessors in update[].
  2. Check that update[0].forward[0] exists and has the target key. If not, key absent — return.
  3. For each level i from 0 to the node’s level: if update[i].forward[i] == target_node, set update[i].forward[i] = target_node.forward[i] to bypass the target.
  4. Free the target node.
  5. If the top level’s header pointer is now null, lower the skip list’s maximum level until the highest remaining level is non-empty.

Collecting predecessors via the search traversal and then unlinking is O(log N) expected. No rebalancing step exists — the probabilistic structure self-maintains its expected shape over many insertions and deletions.
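A runnable sketch of the delete steps; the minimal scaffolding (Node, insert, predecessor search) is repeated so the snippet stands alone, and all names are illustrative:

```python
from random import random

MAX_LEVEL = 16  # illustrative cap

class Node:
    def __init__(self, key, value, level):
        self.key, self.value = key, value
        self.forward = [None] * (level + 1)

class SkipList:
    def __init__(self, p=0.5):
        self.p = p
        self.header = Node(None, None, MAX_LEVEL - 1)
        self.level = 0  # highest level currently in use

    def _random_level(self):
        level = 0
        while random() < self.p and level < MAX_LEVEL - 1:
            level += 1
        return level

    def _predecessors(self, key):
        # Rightmost node with key < target at each level (step 1).
        update, current = [self.header] * MAX_LEVEL, self.header
        for i in range(self.level, -1, -1):
            while current.forward[i] is not None and current.forward[i].key < key:
                current = current.forward[i]
            update[i] = current
        return update

    def insert(self, key, value):
        update = self._predecessors(key)
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        node = Node(key, value, lvl)
        for i in range(lvl + 1):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def search(self, key):
        node = self._predecessors(key)[0].forward[0]
        return node.value if node is not None and node.key == key else None

    def delete(self, key):
        update = self._predecessors(key)             # step 1
        target = update[0].forward[0]
        if target is None or target.key != key:      # step 2: key absent
            return False
        for i in range(len(target.forward)):         # step 3: bypass target
            if update[i].forward[i] is target:
                update[i].forward[i] = target.forward[i]
        # Step 4: Python reclaims the node once it is unreferenced.
        while self.level > 0 and self.header.forward[self.level] is None:
            self.level -= 1                          # step 5: shrink top level
        return True
```

For example, after inserting 2, 4, 6, calling delete(4) returns True and subsequent search(4) returns None, while the other keys remain reachable.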

Probabilistic Level Assignment

The level of each new node is chosen randomly using a geometric distribution:

from random import random

def random_level(p=0.5, max_level=32):
    level = 1
    while random() < p and level < max_level:   # continue with probability p
        level += 1
    return level

With p=0.5: level 1 with probability 1/2, level 2 with probability 1/4, level 3 with probability 1/8, and so on. Expected level = 1/(1-p) = 2. Expected number of nodes at level i = N / 2^i.
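These figures can be checked empirically with a quick simulation (seeded for reproducibility; the sample size and tolerances are illustrative):

```python
from random import random, seed

def random_level(p=0.5, max_level=32):
    level = 1
    while random() < p and level < max_level:
        level += 1
    return level

seed(42)
samples = [random_level(p=0.5) for _ in range(100_000)]
mean = sum(samples) / len(samples)                        # expected ~2.0
frac_ge_2 = sum(s >= 2 for s in samples) / len(samples)   # expected ~0.5

samples25 = [random_level(p=0.25) for _ in range(100_000)]
mean25 = sum(samples25) / len(samples25)                  # expected ~1.33

print(round(mean, 2), round(frac_ge_2, 2), round(mean25, 2))
```

The p=0.25 mean of about 1.33 is the same figure cited for Redis below.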

The maximum level cap should be set to ceil(log_{1/p}(N)) — for p=0.5 and N=2^32 this is 32. Capping at MAX_LEVEL ensures that creating an extremely high node (which the unbounded geometric distribution could theoretically produce) does not waste memory on mostly-null forward pointers.

The choice of p trades space against speed: lower p means fewer levels, less memory, but more nodes per level and slower search. Redis uses p=0.25 rather than 0.5 specifically to reduce memory overhead, accepting a slightly larger constant in the O(log N) bound.

Time Complexity

All three core operations — search, insert, delete — run in O(log N) expected time. "Expected" refers to the average over random level assignments, not over a random input distribution; the algorithm is correct for any key sequence.

Worst case is O(N): if every node is assigned the maximum level, or if (with vanishingly small probability) every node ends up at level 1, the structure degenerates to a linear scan. However, the probability that any particular operation takes more than c * log N time decreases exponentially in c — the same kind of tail bound as quicksort. In practice, worst-case behavior is effectively never observed, even on large data sets.

Comparison with balanced BSTs:

  • AVL and red-black trees guarantee O(log N) worst case; skip lists only guarantee O(log N) expected.
  • Skip lists require no rotations or rebalancing — simpler to implement correctly.
  • Skip lists support efficient range queries: after finding the start of the range at level 0, traverse forward pointers at level 0 to enumerate all keys in range. This is cache-friendlier than an in-order BST traversal.
  • Concurrent skip lists are significantly simpler to implement correctly than concurrent balanced BSTs (see below).

Redis Sorted Set Implementation

Redis sorted sets (ZSET) are the canonical production example of skip lists. Each sorted set is implemented as a skip list + hashmap hybrid:

  • The skip list maintains elements in score order, enabling efficient range queries — ZRANGE, ZRANGEBYSCORE, ZRANGEBYLEX, ZREVRANGE — in O(log N + M), where M is the number of returned elements, and rank lookup (ZRANK) in O(log N).
  • The hashmap maps each member string to its score, enabling O(1) score lookup for ZSCORE and O(log N) update for ZADD (hashmap gives the current score, skip list is updated if score changed).

Redis uses MAX_LEVEL=32 and p=0.25. The lower p reduces average node height from 2 (p=0.5) to 1.33, significantly reducing memory usage for the forward pointer arrays at the cost of a larger constant in search time. For Redis workloads (typically millions of members per sorted set) this tradeoff is worthwhile.

Redis’s skip list nodes also store a span field at each level: the number of nodes at level 0 that each forward pointer skips over. This enables O(log N) ZRANK (rank by score) by summing spans along the search path.
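A sketch of span-based rank lookup on a small hand-built list, in the spirit of Redis's rank computation (the Node layout, span bookkeeping, and example list are illustrative assumptions; a real implementation must also maintain spans during insert and delete):

```python
class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)
        self.span = [0] * (level + 1)   # level-0 links crossed by forward[i]

def rank(header, top_level, target):
    """1-based rank of target by key order, or 0 if absent."""
    current, r = header, 0
    for i in range(top_level, -1, -1):
        while (current.forward[i] is not None
               and current.forward[i].key <= target):
            r += current.span[i]        # accumulate the skipped distance
            current = current.forward[i]
    return r if current.key == target else 0

# Hand-built list over keys 3, 7, 12, 19 with explicit spans:
# level 2:  header --(3)------------------> 12
# level 1:  header --(2)------> 7 --(1)---> 12
# level 0:  header -(1)-> 3 -(1)-> 7 -(1)-> 12 -(1)-> 19
header = Node(None, 2)
n3, n7, n12, n19 = Node(3, 0), Node(7, 1), Node(12, 2), Node(19, 0)
header.forward, header.span = [n3, n7, n12], [1, 2, 3]
n3.forward[0], n3.span[0] = n7, 1
n7.forward, n7.span = [n12, n12], [1, 1]
n12.forward[0], n12.span[0] = n19, 1

print(rank(header, 2, 12))  # 3
print(rank(header, 2, 19))  # 4
print(rank(header, 2, 10))  # 0
```

The rank of 12 is found in a single top-level hop: the header's level-2 span already counts the three level-0 links it skips.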

Concurrent Skip List

Skip lists have a significant advantage over balanced BSTs for concurrent access: lock-free concurrent skip lists are practical and well-understood, while lock-free balanced BSTs are notoriously complex.

The key insight enabling lock-free operation is that delete can be split into two phases:

  1. Logical deletion: Mark the node as deleted by setting a flag bit in its level-0 forward pointer (using the low bit of the pointer, which is always 0 for aligned allocations). A marked node is considered absent for all operations. This is a single atomic CAS.
  2. Physical deletion: Concurrent search operations that encounter a marked node help remove it by updating predecessors. Physical removal is eventually consistent — no operation waits for it.

Insert uses CAS operations to splice the new node into each level. If a CAS fails (concurrent modification), the insert retries from the affected level.

Java’s ConcurrentSkipListMap implements this scheme. Its get, put, and remove operations are lock-free (CAS-based, never blocking) and linearizable; iteration is weakly consistent rather than linearizable. It is the go-to choice for concurrent sorted maps in Java when sorted-order or range-query properties are needed, replacing Collections.synchronizedSortedMap(new TreeMap(...)) with far better concurrent throughput.
