Question 1

Why use a global Postgres sequence instead of updated_at timestamps for sync cursors?

Accepted Answer

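updated_at timestamps have two fatal problems for sync: (1) clock skew: servers in a cluster may have clocks differing by 10–100 ms, so a client filtering on last_synced_at may miss rows written on a node whose clock was slightly behind; (2) duplicate timestamps: two rows can share the same updated_at if they were written in the same millisecond. A cursor of updated_at > last_synced_at then skips a tied row that became visible after the last pull, while >= re-fetches rows the client already has, and the client cannot tell which tied rows it has already seen. A BIGSERIAL column (which is backed by an ordinary PostgreSQL sequence) avoids both problems: every nextval() call returns a distinct value, so no two rows in the same database share a sync_seq, and no clock is involved. The client stores the max sync_seq it has seen, and the next pull fetches WHERE sync_seq > last_seen. One caveat: sequence values are assigned when nextval() runs, not when the transaction commits. A long-running transaction can therefore commit sync_seq 100 after a client has already pulled sync_seq 101 and moved its cursor past 100. To make the cursor exact, the server must never hand out a cursor past a value whose transaction may still be in flight, for example by assigning sync_seq under a lock at commit time (serializing writers) or by holding back recently assigned values until every transaction that could own a lower value has finished.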
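The duplicate-timestamp problem is easy to reproduce without a database. The sketch below is a minimal in-memory model in plain Python; the Row type and the two pull functions are hypothetical. It assumes the later row's sync_seq is larger than every value the client has already pulled, which a real server must guarantee as described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Row:
    id: str
    updated_at_ms: int  # wall-clock timestamp in milliseconds
    sync_seq: int       # value from a global sequence (unique per row version)


def pull_by_timestamp(rows: List[Row], last_ts: int) -> Tuple[List[Row], int]:
    """Timestamp cursor: fetch rows strictly newer than the last timestamp seen."""
    batch = [r for r in rows if r.updated_at_ms > last_ts]
    return batch, max((r.updated_at_ms for r in batch), default=last_ts)


def pull_by_seq(rows: List[Row], last_seq: int) -> Tuple[List[Row], int]:
    """Sequence cursor: fetch rows with a sync_seq above the last one seen."""
    batch = [r for r in rows if r.sync_seq > last_seq]
    return batch, max((r.sync_seq for r in batch), default=last_seq)


# Row "a" is written and pulled; row "b" is written in the same millisecond
# but only becomes visible after the first pull.
server = [Row("a", 1000, 1)]
_, ts_cursor = pull_by_timestamp(server, 0)
_, seq_cursor = pull_by_seq(server, 0)

server.append(Row("b", 1000, 2))
ts_batch2, _ = pull_by_timestamp(server, ts_cursor)
seq_batch2, _ = pull_by_seq(server, seq_cursor)

print([r.id for r in ts_batch2])   # []
print([r.id for r in seq_batch2])  # ['b']
```

The timestamp cursor silently loses row "b" because its cursor already equals b's timestamp. The sequence cursor has no ties to break.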

Question 2

How does soft deletion work in the sync protocol?

Accepted Answer

A physical DELETE removes the row, so clients polling for changes since their last sync_seq never see it and never learn to remove it locally. With soft delete, the table gets a deleted_at TIMESTAMPTZ column; "deleting" a row sets deleted_at and leaves the row in place. Because this is an UPDATE, the BEFORE UPDATE trigger fires and assigns the row a new sync_seq, so it appears in the next sync delta with deleted_at IS NOT NULL, and clients remove it from their local store. Once every active client has synced past the tombstone (typically enforced with a 30-day inactivity threshold), a cleanup job can physically delete the row. A client that has been inactive longer than the threshold may have missed purged tombstones and must do a full resync instead of a delta pull. The key rule: exclude soft-deleted rows from application queries (add WHERE deleted_at IS NULL to every read query), but always include them in sync delta responses.
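The two read paths and the client-side handling of tombstones can be modelled in a few lines. The sketch below is an in-memory model, not a PostgreSQL implementation. The Note, Server and apply_delta names are hypothetical, and _next_seq stands in for the trigger calling nextval().

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Note:
    id: str
    body: str
    sync_seq: int
    deleted_at: Optional[str] = None  # None means the row is live


class Server:
    def __init__(self) -> None:
        self.rows: Dict[str, Note] = {}
        self._seq = 0

    def _next_seq(self) -> int:
        # Stand-in for the trigger calling nextval() on every INSERT/UPDATE.
        self._seq += 1
        return self._seq

    def upsert(self, note_id: str, body: str) -> None:
        self.rows[note_id] = Note(note_id, body, self._next_seq())

    def soft_delete(self, note_id: str, when: str) -> None:
        # An UPDATE, not a DELETE: the row stays and gets a fresh sync_seq.
        self.rows[note_id] = replace(
            self.rows[note_id], deleted_at=when, sync_seq=self._next_seq()
        )

    def app_query(self) -> List[str]:
        # Application reads hide tombstones (WHERE deleted_at IS NULL).
        return sorted(n.id for n in self.rows.values() if n.deleted_at is None)

    def sync_delta(self, since: int) -> List[Note]:
        # Sync reads include tombstones so clients learn about deletions.
        changed = [n for n in self.rows.values() if n.sync_seq > since]
        return sorted(changed, key=lambda n: n.sync_seq)


def apply_delta(local: Dict[str, str], delta: List[Note], cursor: int) -> int:
    """Apply a delta to the client's local store and return the new cursor."""
    for note in delta:
        if note.deleted_at is not None:
            local.pop(note.id, None)  # tombstone: remove the local copy
        else:
            local[note.id] = note.body
    return max((n.sync_seq for n in delta), default=cursor)


server = Server()
server.upsert("n1", "hello")
server.upsert("n2", "world")

local: Dict[str, str] = {}
cursor = apply_delta(local, server.sync_delta(0), 0)

server.soft_delete("n1", "2024-01-01T00:00:00Z")
cursor = apply_delta(local, server.sync_delta(cursor), cursor)

print(local)               # {'n2': 'world'}
print(cursor)              # 3
print(server.app_query())  # ['n2']
```

The tombstone for "n1" still exists on the server until a cleanup job purges it. Only the application read path hides it.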

Question 3

How do you handle conflicts when two offline clients edit the same record?

Accepted Answer

Last-Write-Wins (LWW) by wall clock: compare the updated_at timestamp of the incoming client change with the server row's; the later timestamp wins. It is simple and predictable, but the comparison is only meaningful if both timestamps come from reasonably synchronized clocks. Use server time, not client time, for the server copy's timestamp, and keep in mind that an offline client's timestamp reflects its own, possibly skewed, clock. For most fields on most objects (profile name, settings, document title), LWW is acceptable: the user who edited last wins. Merge-friendly alternatives: (1) field-level LWW: track updated_at per field rather than per row, so each field resolves independently and concurrent edits to different fields both survive; (2) CRDTs (Conflict-free Replicated Data Types) for structures such as append-only comment lists and activity feeds: CRDTs guarantee that replicas converge without coordination; (3) operational transformation (OT) for collaborative text (the Google Docs model): it is complex to implement, so reserve it for genuine real-time collaborative editing.
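The difference between row-level and field-level LWW shows up when two clients edit different fields of the same record. The sketch below is a minimal illustration with hypothetical function names. It assumes the timestamps are already comparable, and on a tie it keeps the server's value. For field-level LWW, the incoming record carries only the fields the client actually changed.

```python
from typing import Dict, Tuple

Row = Dict[str, str]
FieldRecord = Dict[str, Tuple[str, int]]  # field -> (value, timestamp_ms)


def merge_row_lww(server_row: Row, server_ts: int,
                  client_row: Row, client_ts: int) -> Tuple[Row, int]:
    """Row-level LWW: the whole row with the later timestamp wins (ties keep the server row)."""
    if client_ts > server_ts:
        return client_row, client_ts
    return server_row, server_ts


def merge_field_lww(server: FieldRecord, incoming: FieldRecord) -> FieldRecord:
    """Field-level LWW: each field keeps whichever value has the later timestamp."""
    merged = dict(server)
    for field, (value, ts) in incoming.items():
        current = merged.get(field)
        if current is None or ts > current[1]:
            merged[field] = (value, ts)
    return merged


# Client A renames the document at t=150; client B, offline, recolors it at t=120.
row, ts = {"title": "Draft", "color": "red"}, 100
row, ts = merge_row_lww(row, ts, {"title": "Final", "color": "red"}, 150)
row, ts = merge_row_lww(row, ts, {"title": "Draft", "color": "blue"}, 120)
print(row)  # {'title': 'Final', 'color': 'red'}  (B's color edit is lost)

fields = {"title": ("Draft", 100), "color": ("red", 100)}
fields = merge_field_lww(fields, {"title": ("Final", 150)})
fields = merge_field_lww(fields, {"color": ("blue", 120)})
print(fields)  # {'title': ('Final', 150), 'color': ('blue', 120)}
```

Row-level LWW discards B's entire row because it is older than A's. Field-level LWW keeps both edits because they touched different fields.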

Question 4

How do you efficiently sync multiple entity types (users, posts, comments) in one request?

Accepted Answer

A single GET /sync?since=12345 endpoint returns changes across all entity types, sorted by sync_seq. The server combines one query per table with UNION ALL, then orders and limits the merged result: SELECT sync_seq, 'user' AS type, id, ... FROM users WHERE sync_seq > $since AND user_id=$uid UNION ALL SELECT sync_seq, 'post' AS type, id, ... FROM posts WHERE sync_seq > $since AND user_id=$uid ORDER BY sync_seq ASC LIMIT 1000. Every branch must return the same column list, and the trailing ORDER BY and LIMIT apply to the combined result. The LIMIT 1000 caps the response size. The client advances its cursor to the max sync_seq in the response and requests the next page if it received a full 1000 rows, since more data may be available. This single-endpoint approach is simpler than per-entity endpoints. Because one global sequence numbers every table, the client also applies changes in a single, consistent sync_seq order.
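The merge-and-paginate loop can be sketched in memory. The sketch below assumes each table is a list of (sync_seq, type, id) tuples already sorted by sync_seq, as an index scan would return them. sync_page stands in for the UNION ALL query, and both function names are hypothetical. A page size of 2 keeps the example short, and the 1000-row limit works the same way.

```python
import heapq
from typing import Dict, List, Tuple

Change = Tuple[int, str, str]  # (sync_seq, entity type, id)


def sync_page(tables: Dict[str, List[Change]], since: int, limit: int) -> List[Change]:
    """In-memory stand-in for the UNION ALL ... ORDER BY sync_seq ASC LIMIT query.

    Each table's list must already be sorted by sync_seq.
    """
    # Each table contributes at most `limit` rows; no more can survive the global LIMIT.
    per_table = [[c for c in rows if c[0] > since][:limit] for rows in tables.values()]
    # heapq.merge combines the sorted per-table streams into one sync_seq-ordered stream.
    return list(heapq.merge(*per_table))[:limit]


def pull_all(tables: Dict[str, List[Change]], cursor: int,
             limit: int) -> Tuple[List[Change], int]:
    """Client loop: keep requesting pages until one comes back short."""
    applied: List[Change] = []
    while True:
        page = sync_page(tables, cursor, limit)
        applied.extend(page)  # a real client would apply each change to its local store
        if page:
            cursor = page[-1][0]  # max sync_seq in the page
        if len(page) < limit:
            return applied, cursor


tables = {
    "user": [(1, "user", "u1"), (4, "user", "u1")],
    "post": [(2, "post", "p1"), (3, "post", "p2"), (5, "post", "p3")],
}
changes, cursor = pull_all(tables, cursor=0, limit=2)
print([seq for seq, _, _ in changes])  # [1, 2, 3, 4, 5]
print(cursor)                          # 5
```

If the last page happens to be exactly full, the loop makes one extra request that returns an empty page. That is the usual cost of using "page was full" as the signal that more data may be available.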

Question 5

How do you prevent the sync_seq sequence from becoming a write bottleneck?

Accepted Answer

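A single PostgreSQL sequence serves all tables, so every INSERT or UPDATE across the database calls nextval() on the same counter. At high write rates (for example 10K writes/second across all tables), this sequence can become a contention point, because each nextval() call briefly locks the sequence. Mitigations: (1) sequence caching: ALTER SEQUENCE ... CACHE 100 lets each session preallocate 100 values and hand them out locally, so the shared sequence is touched up to 100x less often. Cached values that a session never uses are lost when the session ends, which leaves gaps, and gaps are harmless because clients need ordering, not consecutive values. The larger trade-off is ordering. With CACHE greater than 1, sessions draw from different preallocated blocks, so values are no longer issued in time order: session A may write 2 after session B has already written 101. A cursor-based client can then skip rows unless the server also holds back values that might still be assigned below the cursor. (2) Partition the sequence: use a separate sequence per entity type and include the entity type in the cursor. The client sends last_seen as {"user":450,"post":320,"comment":891}. This is more complex and gives up the single global order across entity types. It does eliminate contention between tables, although writes to one hot table still share that table's sequence.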
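Option (2) changes the cursor from a single number to a map, as in the last_seen example above. The sketch below is a hypothetical in-memory handler in which each entity type's list holds values from that type's own sequence. The function name and the default of 0 for an entity type the client has never seen are assumptions.

```python
from typing import Dict, List, Tuple

Change = Tuple[int, str]  # (value from this entity type's own sequence, id)


def pull_per_entity(tables: Dict[str, List[Change]], cursors: Dict[str, int],
                    limit: int) -> Tuple[Dict[str, List[Change]], Dict[str, int]]:
    """Hypothetical handler for per-entity sequences: one cursor per entity type."""
    delta: Dict[str, List[Change]] = {}
    next_cursors = dict(cursors)
    for entity, rows in tables.items():
        since = cursors.get(entity, 0)  # an entity type the client has never seen starts at 0
        batch = sorted(c for c in rows if c[0] > since)[:limit]
        delta[entity] = batch
        if batch:
            next_cursors[entity] = batch[-1][0]  # advance only this type's cursor
    return delta, next_cursors


tables = {
    "user": [(450, "u1"), (451, "u2")],
    "post": [(320, "p9"), (321, "p10"), (322, "p11")],
    "comment": [(891, "c1")],
}
delta, cursors = pull_per_entity(
    tables, {"user": 450, "post": 320, "comment": 891}, limit=1000)
print(cursors)  # {'user': 451, 'post': 322, 'comment': 891}
```

Because each type's values come from an independent sequence, nothing orders a comment relative to a post. The client must therefore tolerate, for instance, receiving a comment before the post it belongs to.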

Cursor-Based Sync Low-Level Design: Incremental Delta Pull, Conflict Resolution, and Offline Support

Core Data Model

Sync API: Delta Pull

Conflict Resolution: Last-Write-Wins

Key Interview Points