Cursor Pagination Low-Level Design – Tech Interview Dot Org

Cursor Pagination — Low-Level Design

Cursor pagination (also called keyset pagination) navigates large datasets stably and efficiently by using a pointer to the last seen row rather than a numeric offset. It is the correct pagination strategy for APIs, feeds, and any dataset that changes while users browse. This pattern is asked at every company building a data-heavy API.

Why OFFSET Pagination Fails at Scale

-- OFFSET pagination
SELECT * FROM Post ORDER BY created_at DESC LIMIT 20 OFFSET 1000;

Problems:
1. Performance: OFFSET 1000 scans and discards 1000 rows before returning 20.
   At OFFSET 1,000,000: scans 1M rows — seconds of latency.

2. Stability: if a new post is inserted at the top while a user is on page 3,
   all subsequent pages shift by 1. Page 4 now contains items that were
   on page 3 before — the user sees the same item twice.

3. Database work grows linearly with page number: O(offset + limit).

Cursor Pagination by Single Column

-- First page: no cursor
SELECT * FROM Post
WHERE author_id = %(uid)s
ORDER BY created_at DESC
LIMIT 21;  -- fetch 21, return 20 to detect if there is a next page

-- Extract cursor from last returned item:
next_cursor = base64(json({'created_at': last_item.created_at.isoformat()}))

-- Next page: use cursor
SELECT * FROM Post
WHERE author_id = %(uid)s
  AND created_at < %(cursor_created_at)s  -- strictly less than cursor
ORDER BY created_at DESC
LIMIT 21;

-- Index that makes this efficient:
CREATE INDEX idx_post_author_time ON Post(author_id, created_at DESC);
-- Query becomes: index seek to author_id + created_at < cursor, then forward scan

Compound Cursor (Stable Tiebreaker)

-- Problem: sorting by created_at alone is unstable if two rows share a timestamp
-- (batch imports, millisecond collisions). The same row may appear on two pages.

-- Solution: compound cursor on (created_at DESC, id DESC)
-- id is globally unique, so the compound is always stable.

SELECT * FROM Post
WHERE author_id = %(uid)s
  AND (created_at, id) < (%(cursor_ts)s, %(cursor_id)s)
ORDER BY created_at DESC, id DESC
LIMIT 21;

-- Encode cursor:
def encode_cursor(row):
    return base64.b64encode(json.dumps({
        'ts': row.created_at.isoformat(),
        'id': row.id
    }).encode()).decode()

def decode_cursor(cursor_str):
    data = json.loads(base64.b64decode(cursor_str))
    return data['ts'], data['id']

# The (created_at, id) < (ts, id) condition uses PostgreSQL's
# row value comparison — equivalent to:
# created_at < ts OR (created_at = ts AND id < cursor_id)

Cursor for “Sort by Score” (Non-Monotonic)

-- Sorting by upvote_count: scores can change while paginating
-- (a post gains votes between page 1 and page 2 fetches)
-- Accept some instability; use compound cursor as best-effort.

SELECT * FROM Post
WHERE (upvote_count, id) < (%(cursor_votes)s, %(cursor_id)s)
ORDER BY upvote_count DESC, id DESC
LIMIT 20;

-- Index:
CREATE INDEX idx_post_votes ON Post(upvote_count DESC, id DESC);

-- For "sort by score" feeds: accept that new viral posts
-- may appear unexpectedly on later pages. This is unavoidable
-- without fully materializing the sorted list.

API Response Shape

def paginate(query_fn, cursor_str=None, limit=20):
    cursor = decode_cursor(cursor_str) if cursor_str else None
    items = query_fn(cursor=cursor, limit=limit + 1)

    has_next = len(items) > limit
    if has_next:
        items = items[:limit]

    next_cursor = encode_cursor(items[-1]) if has_next else None

    return {
        'items': items,
        'next_cursor': next_cursor,    # null on last page
        'has_next': has_next,
    }

# Response:
{
  "items": [...],
  "next_cursor": "eyJ0cyI6IjIwMjQtMDEtMTVUMTA6MzA6MDAiLCJpZCI6MTIzNH0=",
  "has_next": true
}
# Client passes next_cursor as ?cursor= on the next request

Bi-Directional Cursor Pagination

-- Supporting both "next page" and "previous page" requires two cursor types

-- Forward cursor (next page): items where (ts, id)  cursor, ORDER ASC, then reverse

def get_page(cursor=None, direction='next', limit=20):
    if direction == 'next' or cursor is None:
        rows = db.execute("""
            SELECT * FROM Post WHERE (created_at, id)  (%(ts)s, %(id)s)
            ORDER BY created_at ASC, id ASC LIMIT %(lim)s
        """, ...)
        rows = list(reversed(rows))  # Re-reverse to consistent DESC order

    return rows

Key Interview Points

Cursor pagination is O(1) regardless of page depth: The database seeks to the cursor position using the index and scans forward. No rows before the cursor are visited. OFFSET pagination is O(offset) — it gets slower as the user goes deeper.
Always use a tiebreaker: A cursor on a non-unique column (created_at) can skip or duplicate rows if multiple rows share the same value. Add the primary key as a tiebreaker to make the cursor unambiguous.
Encode cursors as opaque tokens: Base64-encode the cursor data so clients treat it as an opaque string. This allows changing the cursor format (e.g., adding fields) without breaking clients that just pass it back.
Fetch limit+1 to detect next page: Fetching one extra row and checking if it exists is cheaper than a COUNT(*) query. If N+1 rows are returned, there is a next page; return only N rows to the client.

{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”Why does offset pagination break at scale and what replaces it?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”OFFSET N scans and discards N rows before returning results — at OFFSET 100000, the database does 100,000 units of work to return 20 rows. Performance degrades linearly with page depth. Additionally, concurrent inserts cause data drift: a new row inserted between page 1 and page 2 fetches shifts the offset boundary, causing the same row to appear on both pages. Cursor pagination replaces this: the cursor encodes the exact position of the last seen row (e.g., {id, created_at}), and the query uses a keyset condition WHERE (created_at, id) < (cursor_ts, cursor_id) ORDER BY created_at DESC, id DESC LIMIT 20. This is O(log N) regardless of page depth and immune to data drift.”}},{“@type”:”Question”,”name”:”What must a cursor encode and why should it be opaque to the client?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”A cursor must encode all columns used in the ORDER BY clause — typically {id, created_at} for a time-ordered feed. Both columns are needed because created_at alone is not unique (multiple rows can have the same timestamp); the id breaks ties. The cursor should be base64-encoded JSON, not a raw integer or exposed column value, for two reasons: (1) opacity prevents clients from constructing cursors manually and coupling to your DB schema, (2) you can change the cursor internals (add a new sort column) without breaking the client API contract. Always validate decoded cursors on the server before using them in queries.”}},{“@type”:”Question”,”name”:”How do you return has_next without a COUNT(*) query?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Request LIMIT + 1 rows. If the result set has limit+1 rows, there is a next page — set has_next=true and return only the first limit rows. If the result set has limit or fewer rows, has_next=false. This costs exactly one extra row of work rather than a full COUNT(*) which scans the entire result set. Never use SELECT COUNT(*) for pagination — on a table with millions of rows, it is a full table scan (or full index scan) that adds hundreds of milliseconds to every paginated request.”}},{“@type”:”Question”,”name”:”What index is required for cursor pagination to be efficient?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”The index must exactly match the WHERE clause and ORDER BY clause. For SELECT * FROM Post WHERE (created_at, id) < (%s, %s) ORDER BY created_at DESC, id DESC LIMIT 20, create: CREATE INDEX idx_post_pagination ON Post(created_at DESC, id DESC). PostgreSQL can use this index for both the keyset WHERE condition and the ORDER BY, making the query an index range scan starting from the cursor position rather than a sequential scan. Without this index, PostgreSQL performs a full sequential scan and sort on every request, making cursor pagination no faster than offset.”}},{“@type”:”Question”,”name”:”How does cursor pagination work with filtered queries?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”The filter column must be added to the index as the leading column. For a feed filtered by user_id: CREATE INDEX idx_post_user ON Post(user_id, created_at DESC, id DESC). The query becomes: WHERE user_id = :uid AND (created_at, id) < (:ts, :id) ORDER BY created_at DESC, id DESC LIMIT 20. The cursor is valid only within the context of the same filter — a cursor from user_id=5’s feed cannot be used for user_id=6. Encode any filter parameters in the cursor or require the client to pass them consistently. Changing filters requires starting from the first page.”}}]}

Cursor-based feed pagination and infinite scroll design is discussed in Twitter system design interview questions.

Feed pagination and cursor-based API design is covered in LinkedIn system design interview preparation.

News feed pagination and cursor API design is discussed in Meta system design interview guide.