A bookmark (save) system lets users collect and organize content they want to revisit — articles, products, listings, posts. Core design challenges: supporting folders and collections, syncing across devices in real time, fast retrieval by folder, handling very large bookmark counts (power users with 50,000+ bookmarks), and enabling full-text search across saved content metadata.
Core Data Model
CREATE TABLE BookmarkFolder (
folder_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL,
name TEXT NOT NULL,
parent_id UUID REFERENCES BookmarkFolder(folder_id), -- nested folders
sort_order INT NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (user_id, name, parent_id)
);
-- Postgres treats NULLs as distinct under a UNIQUE constraint, so the
-- constraint above does not deduplicate root-level folder names
-- (parent_id IS NULL); cover the root level with a partial unique index:
CREATE UNIQUE INDEX idx_folder_root_name ON BookmarkFolder (user_id, name)
    WHERE parent_id IS NULL;
CREATE INDEX idx_folder_user ON BookmarkFolder (user_id, parent_id);
CREATE TABLE Bookmark (
bookmark_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL,
folder_id UUID REFERENCES BookmarkFolder(folder_id) ON DELETE SET NULL,
content_type TEXT NOT NULL, -- 'article', 'product', 'post', 'video'
content_id TEXT NOT NULL,
-- Snapshot metadata for display without re-fetching
title TEXT,
thumbnail_url TEXT,
url TEXT,
-- Sync
sort_order INT NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (user_id, content_type, content_id) -- one bookmark per content per user
);
CREATE INDEX idx_bookmark_user_folder ON Bookmark (user_id, folder_id, sort_order);
CREATE INDEX idx_bookmark_user_recent ON Bookmark (user_id, created_at DESC);
Bookmark CRUD
from uuid import uuid4
from datetime import datetime, timezone
import psycopg2
MAX_BOOKMARKS_PER_USER = 50_000
def add_bookmark(conn, user_id: str, content_type: str, content_id: str,
folder_id: str | None = None, metadata: dict | None = None) -> str:
"""
Add a bookmark. Returns existing bookmark_id if already bookmarked (idempotent).
Enforces per-user limit.
"""
    meta = metadata or {}
    with conn.cursor() as cur:
        # Enforce the per-user limit. At scale, replace COUNT(*) with a
        # maintained counter column (see Key Interview Points).
        cur.execute("SELECT COUNT(*) FROM Bookmark WHERE user_id=%s", (user_id,))
        if cur.fetchone()[0] >= MAX_BOOKMARKS_PER_USER:
            raise ValueError(f"Bookmark limit ({MAX_BOOKMARKS_PER_USER}) reached")
        # ON CONFLICT DO NOTHING makes the insert idempotent and avoids the
        # race inherent in a separate check-then-insert.
        cur.execute("""
            INSERT INTO Bookmark
                (bookmark_id, user_id, folder_id, content_type, content_id,
                 title, thumbnail_url, url)
            VALUES (%s,%s,%s,%s,%s,%s,%s,%s)
            ON CONFLICT (user_id, content_type, content_id) DO NOTHING
            RETURNING bookmark_id
        """, (str(uuid4()), user_id, folder_id, content_type, content_id,
              meta.get('title'), meta.get('thumbnail_url'), meta.get('url')))
        row = cur.fetchone()
        if row is None:
            # Conflict: already bookmarked; return the existing id.
            cur.execute(
                "SELECT bookmark_id FROM Bookmark "
                "WHERE user_id=%s AND content_type=%s AND content_id=%s",
                (user_id, content_type, content_id))
            row = cur.fetchone()
    conn.commit()
    return str(row[0])
def move_bookmark(conn, bookmark_id: str, user_id: str, new_folder_id: str | None):
"""Move a bookmark to a different folder (or to root if folder_id=None)."""
with conn.cursor() as cur:
cur.execute(
"UPDATE Bookmark SET folder_id=%s, updated_at=NOW() WHERE bookmark_id=%s AND user_id=%s",
(new_folder_id, bookmark_id, user_id)
)
if cur.rowcount == 0:
raise ValueError("Bookmark not found or not owned by user")
conn.commit()
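The CRUD above has no remove. A remove should also append to the BookmarkChangeLog table (defined in the sync section below) in the same transaction as the DELETE, so a syncing client can never observe the deletion without its log row. A minimal runnable sketch of that transaction shape — in-memory sqlite stands in for Postgres here, and the trimmed-down tables mirror only the columns the example touches:

```python
import sqlite3
from uuid import uuid4

# sqlite (stdlib) stands in for Postgres; the point is the transaction
# shape, not the engine. Table names mirror the schema in this article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bookmark (
        bookmark_id TEXT PRIMARY KEY, user_id TEXT NOT NULL, folder_id TEXT);
    CREATE TABLE BookmarkChangeLog (
        change_id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id TEXT NOT NULL, operation TEXT NOT NULL, bookmark_id TEXT);
""")

def remove_bookmark(conn, bookmark_id: str, user_id: str) -> None:
    """Delete a bookmark and log the 'remove' event atomically."""
    with conn:  # one transaction: commits on success, rolls back on exception
        cur = conn.execute(
            "DELETE FROM Bookmark WHERE bookmark_id=? AND user_id=?",
            (bookmark_id, user_id))
        if cur.rowcount == 0:
            raise ValueError("Bookmark not found or not owned by user")
        conn.execute(
            "INSERT INTO BookmarkChangeLog (user_id, operation, bookmark_id) "
            "VALUES (?, 'remove', ?)",
            (user_id, bookmark_id))

bid = str(uuid4())
conn.execute("INSERT INTO Bookmark VALUES (?, 'u1', NULL)", (bid,))
remove_bookmark(conn, bid, "u1")
```

The same pattern applies to add and move: write the mutation and its log row inside one transaction.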
def get_bookmarks(conn, user_id: str, folder_id: str | None = None,
limit: int = 50, before: datetime | None = None) -> list[dict]:
"""Cursor-paginated bookmark list for a folder (or root/all)."""
with conn.cursor() as cur:
params = [user_id]
        folder_clause = "AND folder_id IS NULL" if folder_id is None else "AND folder_id = %s"
        if folder_id is not None:
            params.append(folder_id)
cursor_clause = ""
if before:
cursor_clause = "AND created_at < %s"
params.append(before)
cur.execute(f"""
SELECT bookmark_id, content_type, content_id, title, thumbnail_url, url,
folder_id, created_at
FROM Bookmark
WHERE user_id = %s {folder_clause} {cursor_clause}
ORDER BY created_at DESC
LIMIT %s
""", params + [limit])
cols = [d[0] for d in cur.description]
return [dict(zip(cols, row)) for row in cur.fetchall()]
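A client walks this cursor by passing the oldest created_at it has seen as the next before value, stopping on a short page. A minimal sketch of that loop — the fetch_page callback is hypothetical and stands in for a call to get_bookmarks:

```python
from datetime import datetime, timedelta, timezone

def paginate_all(fetch_page, limit: int = 50) -> list[dict]:
    """Collect every row by walking the created_at cursor backwards.

    fetch_page(limit, before) must return rows sorted by created_at DESC,
    mirroring get_bookmarks; `before` is None on the first call.
    """
    rows, before = [], None
    while True:
        page = fetch_page(limit=limit, before=before)
        rows.extend(page)
        if len(page) < limit:            # short page: no more data
            return rows
        before = page[-1]["created_at"]  # oldest row seen becomes the cursor

# Demo against an in-memory "table" of 7 rows, newest first.
base = datetime(2024, 1, 1, tzinfo=timezone.utc)
table = [{"bookmark_id": i, "created_at": base - timedelta(hours=i)}
         for i in range(7)]

def fake_fetch(limit, before):
    rows = [r for r in table if before is None or r["created_at"] < before]
    return rows[:limit]

all_rows = paginate_all(fake_fetch, limit=3)
```

Keyset pagination like this stays O(page) even at 50,000 bookmarks, where OFFSET-based paging would degrade linearly.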
Sync Across Devices (Last-Write-Wins with Change Log)
-- Change log for sync (replicated to client on reconnect)
CREATE TABLE BookmarkChangeLog (
change_id BIGSERIAL PRIMARY KEY,
user_id UUID NOT NULL,
operation TEXT NOT NULL, -- 'add', 'remove', 'move', 'rename_folder'
bookmark_id UUID,
folder_id UUID,
payload JSONB,
occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_changelog_user ON BookmarkChangeLog (user_id, change_id ASC);
def get_changes_since(conn, user_id: str, since_change_id: int) -> list[dict]:
"""
Return all bookmark changes since the client's last sync point.
Client stores last seen change_id; sends it on reconnect.
"""
with conn.cursor() as cur:
cur.execute("""
SELECT change_id, operation, bookmark_id, folder_id, payload, occurred_at
FROM BookmarkChangeLog
WHERE user_id = %s AND change_id > %s
ORDER BY change_id ASC
LIMIT 1000
""", (user_id, since_change_id))
cols = [d[0] for d in cur.description]
return [dict(zip(cols, row)) for row in cur.fetchall()]
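On the client, the returned changes replay in change_id order against the local store. A minimal in-memory sketch — the local store is a dict keyed by bookmark_id, and the exact payload shape per operation is an assumption:

```python
def apply_changes(local: dict[str, dict], changes: list[dict]) -> int:
    """Replay server changes onto a local bookmark store.
    Returns the last applied change_id, which the client persists
    as its next sync cursor."""
    last_id = 0
    for ch in changes:
        op, bid = ch["operation"], ch["bookmark_id"]
        if op == "add":
            local[bid] = dict(ch["payload"])   # full bookmark snapshot
        elif op == "remove":
            local.pop(bid, None)               # tolerate already-gone rows
        elif op == "move":
            if bid in local:
                local[bid]["folder_id"] = ch["folder_id"]
        last_id = ch["change_id"]
    return last_id

store: dict[str, dict] = {}
changes = [
    {"change_id": 1, "operation": "add", "bookmark_id": "b1",
     "folder_id": None, "payload": {"title": "Post", "folder_id": None}},
    {"change_id": 2, "operation": "move", "bookmark_id": "b1", "folder_id": "f1"},
    {"change_id": 3, "operation": "add", "bookmark_id": "b2",
     "folder_id": None, "payload": {"title": "Article", "folder_id": None}},
    {"change_id": 4, "operation": "remove", "bookmark_id": "b2"},
]
cursor = apply_changes(store, changes)
```

Because each operation is idempotent-ish (remove tolerates missing rows, add overwrites), replaying an overlapping batch after a dropped connection is safe.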
Key Interview Points
- Snapshot metadata for offline display: When a user views their bookmarks, the bookmarked URLs may be dead (404), paywalled, or slow. Store a snapshot of the title and thumbnail at bookmark time in the Bookmark row. This enables fast, reliable list display without re-fetching the original URL. Refresh snapshots in a background job (weekly) — detect dead links and mark them.
- UNIQUE constraint deduplication: UNIQUE (user_id, content_type, content_id) prevents the same content from being bookmarked twice. Inserting with ON CONFLICT DO NOTHING on this key makes add_bookmark idempotent — double-tapping the bookmark icon on the client won't create two rows; return the existing bookmark_id on conflict instead of an error.
- Change log for sync: Mobile apps go offline and come back. On reconnect, send the last seen change_id; receive all changes since then. This is cheaper than comparing full bookmark lists (O(changes since disconnect) vs O(total bookmarks)). Prune the change log after 30 days — devices offline longer than 30 days do a full sync instead. change_id is a BIGSERIAL (monotonically increasing), so ordering by it gives a reliable event sequence.
- Folder hierarchy depth: Cap nesting depth at 3–5 levels (similar to browser bookmarks). Deep hierarchies add complexity to tree queries without user benefit. Validate on folder creation: SELECT COUNT(*) traversing ancestors via parent_id until NULL — reject if depth would exceed limit. Use the materialized path pattern (from lld-organization-hierarchy) if deeper hierarchies are needed.
- Full-text search across bookmarks: CREATE INDEX ON Bookmark USING gin(to_tsvector('english', coalesce(title,'') || ' ' || coalesce(url,''))). Query with the same expression so the GIN index is actually used: SELECT * FROM Bookmark WHERE user_id = X AND to_tsvector('english', coalesce(title,'') || ' ' || coalesce(url,'')) @@ plainto_tsquery('english', $query). This covers saved content titles and URLs. For richer search (inside the saved content body), index via Elasticsearch, storing the document text at bookmark time.
Frequently Asked Questions

Why snapshot metadata at bookmark time instead of fetching it live?
At display time, fetching metadata (title, thumbnail) for 50 bookmarked URLs would require 50 HTTP requests — slow and unreliable. Worse: URLs go dead (404), get paywalled, or change their titles. If metadata is stored at bookmark time, the list always renders instantly from the database — no external requests needed. The snapshot also keeps bookmarks useful after the original content is deleted or moved. Trade-off: the stored title may be slightly outdated (article renamed). Mitigate with a background refresh job that re-fetches metadata weekly for bookmarks viewed in the last 30 days. Mark dead links (HTTP 404/410) with a broken_link = TRUE flag and show a warning icon in the UI.

How does the change log enable reliable multi-device sync?
Mobile apps go offline for hours or days. On reconnect, the app sends its last seen change_id and the server returns everything since: SELECT * FROM BookmarkChangeLog WHERE user_id = X AND change_id > last_seen ORDER BY change_id ASC LIMIT 1000. The client applies each change in sequence — add inserts a bookmark, remove deletes one, move changes folder_id. change_id is a BIGSERIAL, monotonically increasing per insert, so ordering by it gives a reliable event sequence. This delta sync costs O(changes since last sync) instead of the O(total bookmarks) of comparing full lists. Prune the log after 30 days; clients offline longer fall back to a full sync (download all bookmark data).

How do you implement nested folders without deep recursion?
BookmarkFolder.parent_id forms a tree. Querying "all bookmarks in folder X and its subfolders" naively requires a recursive CTE or application-level traversal. Limit depth to 3–4 levels (enforced in create_folder validation: traverse the parent_id chain to the root and reject if the new depth would exceed the cap). With that constraint the recursive CTE stays fast: WITH RECURSIVE sub AS (SELECT folder_id FROM BookmarkFolder WHERE folder_id = $1 UNION ALL SELECT f.folder_id FROM BookmarkFolder f JOIN sub ON f.parent_id = sub.folder_id) SELECT bookmark_id FROM Bookmark WHERE folder_id IN (SELECT folder_id FROM sub). For deeper hierarchies (rarely needed), use a materialized path as in the org hierarchy design.

How do you efficiently implement bookmark full-text search?
PostgreSQL full-text search over bookmarks: CREATE INDEX ON Bookmark USING GIN(to_tsvector('english', coalesce(title,'') || ' ' || coalesce(url,''))). Query: SELECT * FROM Bookmark WHERE user_id = X AND to_tsvector('english', coalesce(title,'') || ' ' || coalesce(url,'')) @@ plainto_tsquery('english', $query). The GIN index keeps this fast per user (each user has at most 50K bookmarks). For richer search inside saved article text, store the extracted body (truncated to 5,000 characters) at bookmark time and include it in the tsvector. Rank results: SELECT *, ts_rank(to_tsvector(…), query) AS rank ORDER BY rank DESC. PostgreSQL's built-in FTS is sufficient at this scale — no Elasticsearch needed.

How do you handle the 50,000 bookmark limit for power users?
The limit exists for performance (unlimited bookmarks would make pagination and sync expensive) and storage. Enforcement: before inserting, compare the user's bookmark count with MAX_BOOKMARKS_PER_USER. If at the limit: offer an upgrade to a "Power User" tier with a higher cap, or suggest archiving old bookmarks (delete the oldest 1,000 and export them to CSV). A soft-limit approach works well: warn at 50,000 ("You're approaching your bookmark limit") and hard-stop at 60,000. To keep the count check fast, maintain a bookmark_count column in UserStats (increment/decrement on add/remove) instead of running COUNT(*) on every add. The UNIQUE constraint on (user_id, content_type, content_id) also bounds the count by preventing duplicates.