Bookmark System Low-Level Design: Folders, Sync, and Full-Text Search

A bookmark (save) system lets users collect and organize content they want to revisit — articles, products, listings, posts. Core design challenges: supporting folders and collections, syncing across devices in real time, fast retrieval by folder, handling very large bookmark counts (power users with 50,000+ bookmarks), and enabling full-text search across saved content metadata.

Core Data Model

CREATE TABLE BookmarkFolder (
    folder_id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id     UUID NOT NULL,
    name        TEXT NOT NULL,
    parent_id   UUID REFERENCES BookmarkFolder(folder_id),  -- nested folders
    sort_order  INT NOT NULL DEFAULT 0,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (user_id, name, parent_id)
);
CREATE INDEX idx_folder_user ON BookmarkFolder (user_id, parent_id);

CREATE TABLE Bookmark (
    bookmark_id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id       UUID NOT NULL,
    folder_id     UUID REFERENCES BookmarkFolder(folder_id) ON DELETE SET NULL,
    content_type  TEXT NOT NULL,    -- 'article', 'product', 'post', 'video'
    content_id    TEXT NOT NULL,
    -- Snapshot metadata for display without re-fetching
    title         TEXT,
    thumbnail_url TEXT,
    url           TEXT,
    -- Sync
    sort_order    INT NOT NULL DEFAULT 0,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (user_id, content_type, content_id)   -- one bookmark per content per user
);
CREATE INDEX idx_bookmark_user_folder ON Bookmark (user_id, folder_id, sort_order);
CREATE INDEX idx_bookmark_user_recent ON Bookmark (user_id, created_at DESC);

Bookmark CRUD

from uuid import uuid4
from datetime import datetime, timezone
import psycopg2

MAX_BOOKMARKS_PER_USER = 50_000

def add_bookmark(conn, user_id: str, content_type: str, content_id: str,
                  folder_id: str | None = None, metadata: dict | None = None) -> str:
    """
    Add a bookmark. Returns existing bookmark_id if already bookmarked (idempotent).
    Enforces per-user limit.
    """
    # Check existing
    with conn.cursor() as cur:
        cur.execute(
            "SELECT bookmark_id FROM Bookmark WHERE user_id=%s AND content_type=%s AND content_id=%s",
            (user_id, content_type, content_id)
        )
        existing = cur.fetchone()
    if existing:
        return str(existing[0])

    # Check limit
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM Bookmark WHERE user_id=%s", (user_id,))
        count = cur.fetchone()[0]
    if count >= MAX_BOOKMARKS_PER_USER:
        raise ValueError(f"Bookmark limit ({MAX_BOOKMARKS_PER_USER}) reached")

    meta = metadata or {}
    bookmark_id = str(uuid4())
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO Bookmark
            (bookmark_id, user_id, folder_id, content_type, content_id,
             title, thumbnail_url, url)
            VALUES (%s,%s,%s,%s,%s,%s,%s,%s)
        """, (bookmark_id, user_id, folder_id, content_type, content_id,
              meta.get('title'), meta.get('thumbnail_url'), meta.get('url')))
    conn.commit()
    return bookmark_id

def move_bookmark(conn, bookmark_id: str, user_id: str, new_folder_id: str | None):
    """Move a bookmark to a different folder (or to root if folder_id=None)."""
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE Bookmark SET folder_id=%s, updated_at=NOW() WHERE bookmark_id=%s AND user_id=%s",
            (new_folder_id, bookmark_id, user_id)
        )
        if cur.rowcount == 0:
            raise ValueError("Bookmark not found or not owned by user")
    conn.commit()

def get_bookmarks(conn, user_id: str, folder_id: str | None = None,
                   limit: int = 50, before: datetime | None = None) -> list[dict]:
    """Cursor-paginated bookmark list for a folder (or root/all)."""
    with conn.cursor() as cur:
        params = [user_id]
        folder_clause = "AND folder_id IS NULL" if folder_id is None else "AND folder_id = %s"
        if folder_id:
            params.append(folder_id)
        cursor_clause = ""
        if before:
            cursor_clause = "AND created_at < %s"
            params.append(before)

        cur.execute(f"""
            SELECT bookmark_id, content_type, content_id, title, thumbnail_url, url,
                   folder_id, created_at
            FROM Bookmark
            WHERE user_id = %s {folder_clause} {cursor_clause}
            ORDER BY created_at DESC
            LIMIT %s
        """, params + [limit])

        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]

Sync Across Devices (Last-Write-Wins with Change Log)

-- Change log for sync (replicated to client on reconnect)
CREATE TABLE BookmarkChangeLog (
    change_id    BIGSERIAL PRIMARY KEY,
    user_id      UUID NOT NULL,
    operation    TEXT NOT NULL,   -- 'add', 'remove', 'move', 'rename_folder'
    bookmark_id  UUID,
    folder_id    UUID,
    payload      JSONB,
    occurred_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_changelog_user ON BookmarkChangeLog (user_id, change_id ASC);

def get_changes_since(conn, user_id: str, since_change_id: int) -> list[dict]:
    """
    Return all bookmark changes since the client's last sync point.
    Client stores last seen change_id; sends it on reconnect.
    """
    with conn.cursor() as cur:
        cur.execute("""
            SELECT change_id, operation, bookmark_id, folder_id, payload, occurred_at
            FROM BookmarkChangeLog
            WHERE user_id = %s AND change_id > %s
            ORDER BY change_id ASC
            LIMIT 1000
        """, (user_id, since_change_id))
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]

Key Interview Points

  • Snapshot metadata for offline display: When a user views their bookmarks, the bookmarked URLs may be dead (404), paywalled, or slow. Store a snapshot of the title and thumbnail at bookmark time in the Bookmark row. This enables fast, reliable list display without re-fetching the original URL. Refresh snapshots in a background job (weekly) — detect dead links and mark them.
  • UNIQUE constraint deduplication: UNIQUE (user_id, content_type, content_id) prevents the same content from being bookmarked twice. The INSERT … ON CONFLICT pattern makes add_bookmark idempotent — double-tapping the bookmark icon from the client won’t create two rows. Return the existing bookmark_id on conflict instead of an error.
  • Change log for sync: Mobile apps go offline and come back. On reconnect, send the last seen change_id; receive all changes since then. This is cheaper than comparing full bookmark lists (O(changes since disconnect) vs O(total bookmarks)). Prune the change log after 30 days — devices offline longer than 30 days do a full sync instead. change_id is a BIGSERIAL (monotonically increasing), so ordering by it gives a reliable event sequence.
  • Folder hierarchy depth: Cap nesting depth at 3–5 levels (similar to browser bookmarks). Deep hierarchies add complexity to tree queries without user benefit. Validate on folder creation: SELECT COUNT(*) traversing ancestors via parent_id until NULL — reject if depth would exceed limit. Use the materialized path pattern (from lld-organization-hierarchy) if deeper hierarchies are needed.
  • Full-text search across bookmarks: SELECT * FROM Bookmark WHERE user_id = X AND to_tsvector(title || ‘ ‘ || url) @@ plainto_tsquery($query). Create a GIN index: CREATE INDEX ON Bookmark USING gin(to_tsvector(‘english’, coalesce(title,”) || ‘ ‘ || coalesce(url,”))). This enables full-text search across saved content titles. For richer search (search inside saved content body), index via Elasticsearch where the document text is stored at bookmark time.

{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”Why snapshot metadata at bookmark time instead of fetching it live?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”At display time, fetching metadata (title, thumbnail) for 50 bookmarked URLs would require 50 HTTP requests — slow and unreliable. Worse: URLs go dead (404), get paywalled, or change their title. If metadata is stored at bookmark time, the list always renders instantly from the database — no external requests needed. The snapshot also means bookmarks remain useful even after the original content is deleted or moved. Trade-off: the stored title may be slightly outdated (article renamed). Mitigate with a background refresh job that re-fetches metadata for bookmarks viewed in the last 30 days (weekly refresh). Mark dead links (HTTP 404/410) with a broken_link=TRUE flag and show a warning icon in the UI.”}},{“@type”:”Question”,”name”:”How does the change log enable reliable multi-device sync?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Mobile apps go offline for hours or days. On reconnect, the app sends its last seen change_id. The server returns all changes since that ID: SELECT * FROM BookmarkChangeLog WHERE user_id=X AND change_id > last_seen ORDER BY change_id ASC LIMIT 1000. The client applies each change in sequence — add inserts a bookmark, remove deletes one, move changes folder_id. The change_id is a BIGSERIAL (auto-increment) — monotonically increasing per database insert, providing a reliable ordering. This is cheaper than comparing full bookmark lists (O(N) for N bookmarks) — the delta sync is O(changes since last sync). Prune the log after 30 days; clients offline longer than 30 days fall back to a full sync (download all bookmark data).”}},{“@type”:”Question”,”name”:”How do you implement nested folders without deep recursion?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”BookmarkFolder.parent_id forms a tree. Querying "all bookmarks in folder X and its subfolders" naively requires recursive CTE or application-level traversal. Limit depth to 3–4 levels (cap in create_folder validation: traverse parent_id chain to root, reject if depth would exceed 4). With this constraint, recursive CTE is fast: WITH RECURSIVE sub AS (SELECT folder_id FROM BookmarkFolder WHERE folder_id = $1 UNION ALL SELECT f.folder_id FROM BookmarkFolder f JOIN sub ON f.parent_id = sub.folder_id) SELECT bookmark_id FROM Bookmark WHERE folder_id IN (SELECT folder_id FROM sub). For deeper hierarchies (rarely needed): use materialized path as in org hierarchy design.”}},{“@type”:”Question”,”name”:”How do you efficiently implement bookmark full-text search?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”PostgreSQL full-text search over bookmarks: CREATE INDEX ON Bookmark USING GIN(to_tsvector(‘english’, coalesce(title,”) || ‘ ‘ || coalesce(url,”))). Query: SELECT * FROM Bookmark WHERE user_id=X AND to_tsvector(‘english’, coalesce(title,”) || ‘ ‘ || coalesce(url,”)) @@ plainto_tsquery(‘english’, $query). The GIN index makes this fast for individual users’ bookmarks (each user has at most 50K bookmarks). For richer search (inside saved article text): store extracted body text at bookmark time (truncated to 5,000 characters) and include it in the tsvector. Rank results: SELECT *, ts_rank(to_tsvector(…), query) AS rank ORDER BY rank DESC. PostgreSQL built-in FTS is sufficient at this scale — no Elasticsearch needed.”}},{“@type”:”Question”,”name”:”How do you handle the 50,000 bookmark limit for power users?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”The limit exists for performance (unlimited bookmarks would make pagination and sync expensive) and storage. Enforcement: before inserting, SELECT COUNT(*) FROM Bookmark WHERE user_id=X and compare with MAX_BOOKMARKS_PER_USER. If at limit: offer to upgrade to a "Power User" tier with higher limits, or suggest archiving old bookmarks (DELETE oldest 1,000 + export to CSV). Soft limit approach: allow users to reach 60,000 but show a warning at 50,000 ("You’re approaching your bookmark limit"). Hard stop at 60,000. For the COUNT query to be fast: maintain a bookmark_count column in UserStats (increment/decrement on add/remove) — avoids a full COUNT(*) on each add operation. The UNIQUE constraint on (user_id, content_type, content_id) also implicitly bounds count by preventing duplicates.”}}]}

Bookmark and content organization system design is discussed in Atlassian system design interview questions.

Bookmark and saved content system design is covered in LinkedIn system design interview preparation.

Bookmark and cross-device sync system design is discussed in Google system design interview guide.

Scroll to Top