Cursor-Based Sync Low-Level Design: Incremental Delta Pull, Conflict Resolution, and Offline Support

A cursor-based incremental sync system lets clients efficiently download only the data that has changed since their last sync, rather than re-fetching the entire dataset. It is common in mobile apps, offline-first systems, and integration APIs. The core challenges are: defining a reliable sync cursor (what counts as “changed since X”), handling deletes without tombstoning every table, resolving conflicts when both sides mutate the same record, and scaling the sync endpoint without full-table scans.

Core Data Model

-- Monotonically increasing sequence number assigned to every write
-- Acts as the sync cursor
CREATE SEQUENCE global_sync_seq START 1 INCREMENT 1;

-- Every mutable table gets a sync_seq column
ALTER TABLE Product ADD COLUMN sync_seq BIGINT DEFAULT nextval('global_sync_seq');
ALTER TABLE "Order" ADD COLUMN sync_seq BIGINT DEFAULT nextval('global_sync_seq');
ALTER TABLE "User"  ADD COLUMN sync_seq BIGINT DEFAULT nextval('global_sync_seq');
-- ("Order" and "User" are reserved words in PostgreSQL and must be quoted)

-- Trigger: update sync_seq on every INSERT or UPDATE
CREATE OR REPLACE FUNCTION set_sync_seq()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    NEW.sync_seq := nextval('global_sync_seq');
    RETURN NEW;
END;
$$;

CREATE TRIGGER trg_product_sync BEFORE INSERT OR UPDATE ON Product
    FOR EACH ROW EXECUTE FUNCTION set_sync_seq();
-- Create the same trigger on "Order" and "User" as well

-- Soft-delete for sync visibility
ALTER TABLE Product ADD COLUMN deleted_at TIMESTAMPTZ;
-- Rows with deleted_at IS NOT NULL appear in sync with a deleted flag
-- Physical deletion only happens after a retention window (e.g., 90 days)

-- Client sync state (persisted server-side for diagnostic purposes)
CREATE TABLE ClientSyncState (
    client_id    TEXT NOT NULL,
    user_id      UUID NOT NULL,
    last_sync_seq BIGINT NOT NULL DEFAULT 0,
    synced_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (client_id, user_id)
);
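
The delta queries below filter on (user_id, sync_seq); without a composite index each pull degenerates into a full-table scan. A sketch of the supporting indexes and the retention job (index names are illustrative):

```sql
-- Composite index: makes WHERE user_id = ? AND sync_seq > ? ORDER BY sync_seq
-- an index range scan instead of a sequential scan
CREATE INDEX idx_product_user_seq ON Product (user_id, sync_seq);
CREATE INDEX idx_order_user_seq   ON "Order" (user_id, sync_seq);

-- Nightly retention job: physically remove rows soft-deleted > 90 days ago
DELETE FROM Product
WHERE deleted_at IS NOT NULL
  AND deleted_at < NOW() - INTERVAL '90 days';
```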

Sync API: Delta Pull

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SyncResponse:
    changes: list[dict]
    cursor: int          # new sync position for next call
    has_more: bool       # true if more changes available in this batch

PAGE_SIZE = 500

def get_changes_since(conn, user_id: str, client_id: str,
                      since_seq: int) -> SyncResponse:
    """
    Return up to PAGE_SIZE records modified after since_seq, across all
    synced tables, as a unified list of changes with type, operation,
    and payload. `conn` is a psycopg2-style connection.
    """
    changes = []

    # Sync Products (user-scoped)
    with conn.cursor() as cur:
        cur.execute("""
            SELECT product_id, name, price_cents, sync_seq,
                   deleted_at IS NOT NULL AS is_deleted
            FROM Product
            WHERE user_id = %s AND sync_seq > %s
            ORDER BY sync_seq ASC
            LIMIT %s
        """, (user_id, since_seq, PAGE_SIZE))
        for row in cur.fetchall():
            changes.append({
                "type": "product",
                "id": str(row[0]),
                "operation": "delete" if row[4] else "upsert",
                "payload": None if row[4] else {
                    "product_id": str(row[0]),
                    "name": row[1],
                    "price_cents": row[2]
                },
                "seq": row[3]
            })

    # Sync Orders ("Order" is a reserved word and must be quoted)
    with conn.cursor() as cur:
        cur.execute("""
            SELECT order_id, status, total_cents, sync_seq,
                   deleted_at IS NOT NULL AS is_deleted
            FROM "Order"
            WHERE user_id = %s AND sync_seq > %s
            ORDER BY sync_seq ASC
            LIMIT %s
        """, (user_id, since_seq, PAGE_SIZE))
        for row in cur.fetchall():
            changes.append({
                "type": "order",
                "id": str(row[0]),
                "operation": "delete" if row[4] else "upsert",
                "payload": None if row[4] else {
                    "order_id": str(row[0]),
                    "status": row[1],
                    "total_cents": row[2]
                },
                "seq": row[3]
            })

    # Merge, sort by seq, keep the first PAGE_SIZE. Truncated rows all carry
    # a higher seq than the returned cursor, so the next pull re-fetches them.
    changes.sort(key=lambda x: x["seq"])
    has_more = len(changes) >= PAGE_SIZE  # may cost one extra empty round trip
    if has_more:
        changes = changes[:PAGE_SIZE]

    new_cursor = changes[-1]["seq"] if changes else since_seq

    # Record this client's sync position (diagnostics, stale-client cleanup)
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO ClientSyncState (client_id, user_id, last_sync_seq)
            VALUES (%s, %s, %s)
            ON CONFLICT (client_id, user_id) DO UPDATE
            SET last_sync_seq = EXCLUDED.last_sync_seq, synced_at = NOW()
        """, (client_id, user_id, new_cursor))
    conn.commit()

    return SyncResponse(changes=changes, cursor=new_cursor, has_more=has_more)
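
On the client side this endpoint is consumed in a pull loop, as described in the pagination notes below. A minimal sketch, assuming an injectable `fetch` function that stands in for the HTTP call (the `SyncPage` shape and `make_fake_fetch` helper are illustrative):

```python
from dataclasses import dataclass


@dataclass
class SyncPage:
    changes: list
    cursor: int
    has_more: bool


def sync_all(fetch, start_cursor: int, apply_change) -> int:
    """Pull pages until has_more is False; return the final cursor.

    `fetch(cursor)` stands in for GET /sync?since=<cursor>; `apply_change`
    writes one change into the local store.
    """
    cursor = start_cursor
    while True:
        page = fetch(cursor)
        for change in page.changes:
            apply_change(change)
        cursor = page.cursor    # persist durably (e.g. SQLite) in real code
        if not page.has_more:
            return cursor


def make_fake_fetch(data, page_size=2):
    """Toy in-memory stand-in for the server's delta endpoint."""
    def fetch(cursor):
        pending = [c for c in data if c["seq"] > cursor][:page_size]
        new_cursor = pending[-1]["seq"] if pending else cursor
        return SyncPage(pending, new_cursor, len(pending) == page_size)
    return fetch


applied = []
fetch = make_fake_fetch([{"seq": s} for s in range(1, 6)])
final = sync_all(fetch, 0, applied.append)   # pages: [1,2], [3,4], [5]
```

Because the cursor is persisted after each page, an interrupted first sync resumes from the last page boundary instead of restarting from zero.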

Conflict Resolution: Last-Write-Wins

def apply_client_update(conn, user_id: str, resource_type: str,
                          resource_id: str, client_payload: dict,
                          client_version: int) -> dict:
    """
    Client pushes a local change back to the server.
    Version check (LWW by server sequence): the server copy wins if its
    sync_seq has advanced past the version the client last saw; otherwise
    the client's update is applied.
    Returns {"status": "accepted"} or {"status": "conflict", "server_record": ...}
    """
    if resource_type == "product":
        with conn.cursor() as cur:
            cur.execute(
                "SELECT sync_seq, name, price_cents FROM Product WHERE product_id=%s AND user_id=%s",
                (resource_id, user_id)
            )
            server_row = cur.fetchone()

        if server_row and server_row[0] > client_version:
            # Server is newer — conflict. Return server record; client must merge.
            return {
                "status": "conflict",
                "server_record": {
                    "product_id": resource_id,
                    "name": server_row[1],
                    "price_cents": server_row[2],
                    "sync_seq": server_row[0]
                }
            }

        # Client version is current: accept the update. The sync_seq guard
        # makes this a compare-and-set, so a write that lands between the
        # SELECT above and this UPDATE cannot be silently clobbered.
        with conn.cursor() as cur:
            cur.execute("""
                UPDATE Product
                SET name=%s, price_cents=%s
                WHERE product_id=%s AND user_id=%s AND sync_seq <= %s
            """, (client_payload['name'], client_payload['price_cents'],
                  resource_id, user_id, client_version))
            updated = cur.rowcount
        conn.commit()
        if updated == 0:
            # Row missing or modified concurrently; client must re-sync
            return {"status": "conflict", "server_record": None}
        return {"status": "accepted"}

    raise ValueError(f"Unknown resource type: {resource_type}")
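
On a "conflict" response the client must merge and retry. A minimal retry wrapper, assuming a `push` function standing in for the HTTP call to the endpoint above and a per-resource-type `merge` function (both names are illustrative):

```python
def push_with_retry(push, record: dict, version: int, merge,
                    max_attempts: int = 3) -> dict:
    """Push a local record; on conflict, merge with the server copy and retry.

    `push(record, version)` returns {"status": "accepted"} or
    {"status": "conflict", "server_record": {...}}.
    `merge(local, server)` resolves the two copies into a new payload.
    """
    for _ in range(max_attempts):
        result = push(record, version)
        if result["status"] == "accepted":
            return result
        server = result["server_record"]
        record = merge(record, server)   # e.g. field-level LWW
        version = server["sync_seq"]     # retry against the newer version
    return {"status": "gave_up", "record": record}


# Toy server: rejects the first attempt, then accepts
state = {"attempts": 0}

def fake_push(record, version):
    state["attempts"] += 1
    if state["attempts"] == 1:
        return {"status": "conflict",
                "server_record": {"name": "server-name", "sync_seq": 7}}
    return {"status": "accepted"}

# Merge policy here: keep local field values, adopt the server's sync_seq
result = push_with_retry(fake_push, {"name": "local-name"}, 3,
                         merge=lambda local, server: {**server, **local})
```

Bounding the retries matters: under heavy contention an unbounded loop can livelock, so after `max_attempts` the client surfaces the conflict to the user instead.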

Key Interview Points

  • Sequence number vs timestamp cursor: Timestamps have two problems: collisions (two writes in the same millisecond share a timestamp, so a cursor of last_synced_at can skip or re-send rows) and clock drift (NTP corrections can move time backwards). A database sequence (nextval) is monotonically increasing and never reuses a value. It is not gapless: rollbacks and caching leave holes, but clients only need ordering, not consecutive values. One caveat: sequence values are assigned at write time, not commit time, so a slow transaction can commit a low seq after a reader has already advanced past it; mitigate by only serving rows older than the oldest in-flight transaction, or by accepting a small read lag. In multi-region setups, use a Hybrid Logical Clock or a globally ordered sequence service.
  • Soft delete for sync visibility: If you physically delete a row, a client syncing after the deletion will never see the delete; its local copy stays stale forever. Soft delete (deleted_at IS NOT NULL) keeps the row visible in the sync stream with operation='delete', and the client removes it from its local store. Physical cleanup: after a 90-day retention window, a nightly job permanently deletes soft-deleted rows, removing them from the sync stream.
  • Multi-table unified sync stream: Clients need to sync Products, Orders, and Users in a single request. Maintain a unified sync_seq across all tables (single Postgres sequence shared by all triggers). The query filters each table with sync_seq > since, merges the results, sorts by seq, and returns the top PAGE_SIZE. This ensures correct ordering of cross-table changes: if a Product was updated at seq=100 and an Order at seq=101, they arrive in that order.
  • Pagination and resume: For users with large datasets (first sync after 6 months offline), a single sync request may need to return 100K records. PAGE_SIZE=500 means 200 round trips — correct behavior. The client calls the API in a loop: while has_more: call sync(cursor=new_cursor). On each call, the cursor advances. If interrupted mid-sync, resume from the last successful cursor. Store the cursor durably on the client (SQLite local DB, not in-memory).
  • Conflict resolution strategies: Last-write-wins (LWW) by server sequence is the simplest: server data always wins if newer. This is appropriate for most CRUD apps. For collaborative editing (both client and server can independently edit the same field), use three-way merge: compare client change, server change, and common ancestor. For financial data: reject client updates that conflict — require the client to re-fetch and re-apply their change. Choose the strategy per resource type, not globally.
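
For the multi-region case mentioned above, a Hybrid Logical Clock pairs wall-clock milliseconds with a logical counter so cursors stay monotonic even when physical clocks stall or step backwards. A minimal sketch (the `now_ms` parameter is injected rather than read from the system clock, purely for testability):

```python
class HybridLogicalClock:
    """Monotonic timestamp as a (physical_ms, logical_counter) pair."""

    def __init__(self):
        self.last_ms = 0
        self.counter = 0

    def tick(self, now_ms: int) -> tuple[int, int]:
        """Timestamp a local event. Never goes backwards, even if now_ms does."""
        if now_ms > self.last_ms:
            self.last_ms, self.counter = now_ms, 0
        else:
            self.counter += 1            # clock stalled or stepped back
        return (self.last_ms, self.counter)

    def receive(self, remote: tuple[int, int], now_ms: int) -> tuple[int, int]:
        """Merge a remote timestamp so causal order is preserved."""
        r_ms, r_ctr = remote
        if now_ms > self.last_ms and now_ms > r_ms:
            self.last_ms, self.counter = now_ms, 0
        elif r_ms > self.last_ms:
            self.last_ms, self.counter = r_ms, r_ctr + 1
        elif r_ms == self.last_ms:
            self.counter = max(self.counter, r_ctr) + 1
        else:
            self.counter += 1
        return (self.last_ms, self.counter)


hlc = HybridLogicalClock()
t1 = hlc.tick(1000)                  # (1000, 0)
t2 = hlc.tick(999)                   # clock stepped back: (1000, 1), still > t1
t3 = hlc.receive((1500, 2), 1000)    # remote node is ahead: (1500, 3)
```

Tuples compare lexicographically, so HLC timestamps can be used directly as a sync cursor: they totally order events and respect causality across nodes without a single global sequence.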

