GDPR Right to Erasure Low-Level Design: Anonymization, Legal Holds, and Audit Trail

GDPR Article 17 grants users the “right to erasure” — the ability to request that all their personal data be deleted. Implementing this correctly across a distributed system requires: identifying every location where user data exists, deleting or anonymizing it within 30 days, handling data that must be retained for legal reasons (invoices, fraud records), and providing a verifiable audit trail of the deletion. Getting this wrong risks fines up to €20M or 4% of global annual revenue.

Core Data Model

CREATE TYPE erasure_status AS ENUM (
    'requested','in_progress','completed','partial','failed'
);

CREATE TABLE ErasureRequest (
    request_id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID NOT NULL UNIQUE,   -- one pending request per user
    email           TEXT NOT NULL,          -- capture at request time (PII)
    status          erasure_status NOT NULL DEFAULT 'requested',
    requested_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deadline        TIMESTAMPTZ NOT NULL DEFAULT NOW() + interval '30 days',
    completed_at    TIMESTAMPTZ,
    -- Track progress per data category
    categories_done TEXT[] NOT NULL DEFAULT '{}',
    categories_failed TEXT[] NOT NULL DEFAULT '{}'
);

-- Audit trail: what was deleted/anonymized and when
CREATE TABLE ErasureAudit (
    audit_id        BIGSERIAL PRIMARY KEY,
    request_id      UUID NOT NULL REFERENCES ErasureRequest(request_id),
    category        TEXT NOT NULL,  -- 'profile', 'orders', 'sessions', 'backups'
    action          TEXT NOT NULL,  -- 'deleted', 'anonymized', 'retained_legal'
    record_count    INT,
    details         TEXT,
    performed_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Legal hold registry: data that must be retained despite erasure request
CREATE TABLE LegalRetentionHold (
    hold_id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID NOT NULL,
    category        TEXT NOT NULL,
    reason          TEXT NOT NULL,  -- 'tax_record_7yr', 'fraud_investigation', 'court_order'
    retain_until    TIMESTAMPTZ NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Erasure Pipeline

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable
import psycopg2

@dataclass
class ErasureCategory:
    name: str
    handler: Callable
    can_anonymize: bool  # True if data can be anonymized instead of deleted

def delete_user_profile(conn, user_id: str, request_id: str) -> int:
    """Delete or anonymize core user profile."""
    with conn.cursor() as cur:
        # Anonymize: replace PII with placeholder, keep user_id for FK integrity
        cur.execute("""
            UPDATE User SET
                email = %s,
                name = 'Deleted User',
                phone = NULL,
                avatar_url = NULL,
                date_of_birth = NULL,
                address = NULL
            WHERE user_id = %s
        """, (f"deleted_{user_id}@erasure.invalid", user_id))
        count = cur.rowcount
    conn.commit()
    return count

def delete_user_sessions(conn, user_id: str, request_id: str) -> int:
    """Delete all sessions — revoke all active logins."""
    with conn.cursor() as cur:
        cur.execute("DELETE FROM UserSession WHERE user_id = %s", (user_id,))
        count = cur.rowcount
    conn.commit()
    return count

def delete_user_content(conn, user_id: str, request_id: str) -> int:
    """Delete user-generated content (posts, comments, reviews)."""
    with conn.cursor() as cur:
        cur.execute("DELETE FROM Review WHERE user_id = %s", (user_id,))
        reviews = cur.rowcount
        cur.execute("DELETE FROM Comment WHERE user_id = %s", (user_id,))
        comments = cur.rowcount
        # Posts: anonymize instead of delete (maintains thread integrity)
        cur.execute("""
            UPDATE Post SET user_id = NULL, author_name = '[Deleted]'
            WHERE user_id = %s
        """, (user_id,))
        posts = cur.rowcount
    conn.commit()
    return reviews + comments + posts

def handle_orders_with_legal_hold(conn, user_id: str, request_id: str) -> int:
    """
    Orders within the legal retention window (7 years for tax records) must be retained.
    Anonymize PII fields but keep financial data.
    """
    with conn.cursor() as cur:
        # Anonymize customer details, keep amounts and dates for accounting
        cur.execute("""
            UPDATE Order SET
                shipping_name = 'Deleted User',
                shipping_address = NULL,
                billing_address = NULL,
                customer_email = %s
            WHERE user_id = %s
              AND placed_at > NOW() - interval '7 years'
        """, (f"deleted_{user_id}@erasure.invalid", user_id))
        retained_count = cur.rowcount

        # Orders older than 7 years can be deleted
        cur.execute("DELETE FROM Order WHERE user_id = %s AND placed_at <= NOW() - interval '7 years'", (user_id,))
        deleted_count = cur.rowcount
    conn.commit()

    # Record the legal hold
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO LegalRetentionHold (user_id, category, reason, retain_until)
            VALUES (%s, 'orders', 'tax_record_7yr', NOW() + interval '7 years')
            ON CONFLICT DO NOTHING
        """, (user_id,))
    conn.commit()

    return retained_count + deleted_count

ERASURE_CATEGORIES = [
    ErasureCategory('sessions',    delete_user_sessions,      can_anonymize=False),
    ErasureCategory('content',     delete_user_content,       can_anonymize=True),
    ErasureCategory('orders',      handle_orders_with_legal_hold, can_anonymize=True),
    ErasureCategory('profile',     delete_user_profile,       can_anonymize=True),
]

def process_erasure_request(conn, request_id: str):
    """Run the full erasure pipeline for a request. Checkpoints progress."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT user_id, categories_done FROM ErasureRequest WHERE request_id=%s FOR UPDATE",
            (request_id,)
        )
        row = cur.fetchone()
    if not row:
        return
    user_id, done = row

    with conn.cursor() as cur:
        cur.execute("UPDATE ErasureRequest SET status='in_progress' WHERE request_id=%s", (request_id,))
    conn.commit()

    for category in ERASURE_CATEGORIES:
        if category.name in done:
            continue  # already processed — skip

        try:
            count = category.handler(conn, user_id, request_id)
            with conn.cursor() as cur:
                cur.execute("""
                    UPDATE ErasureRequest
                    SET categories_done = array_append(categories_done, %s)
                    WHERE request_id=%s
                """, (category.name, request_id))
                cur.execute("""
                    INSERT INTO ErasureAudit (request_id, category, action, record_count)
                    VALUES (%s,%s,'deleted',%s)
                """, (request_id, category.name, count))
            conn.commit()
        except Exception as e:
            with conn.cursor() as cur:
                cur.execute("""
                    UPDATE ErasureRequest
                    SET categories_failed = array_append(categories_failed, %s)
                    WHERE request_id=%s
                """, (category.name, request_id))
            conn.commit()

    # Check completion
    with conn.cursor() as cur:
        cur.execute(
            "SELECT categories_done, categories_failed FROM ErasureRequest WHERE request_id=%s",
            (request_id,)
        )
        done, failed = cur.fetchone()

    all_cats = {c.name for c in ERASURE_CATEGORIES}
    if not failed:
        final_status = 'completed'
    elif set(done) == all_cats:
        final_status = 'completed'
    else:
        final_status = 'partial'

    with conn.cursor() as cur:
        cur.execute(
            "UPDATE ErasureRequest SET status=%s, completed_at=NOW() WHERE request_id=%s",
            (final_status, request_id)
        )
    conn.commit()

Key Interview Points

  • Anonymization vs deletion: Physical deletion is impossible for data that other records reference (a FK violation breaks the DB). Anonymize instead: replace PII (email, name, address) with a non-identifying placeholder, keeping the row for referential integrity. The GDPR considers anonymized data to no longer be personal data — once a user cannot be re-identified, the obligation is fulfilled.
  • Legal retention exceptions: GDPR Article 17(3) explicitly allows retaining data for legal obligations. Tax law in most EU countries requires invoice and transaction records for 7 years. Fraud investigation records may need to be retained indefinitely. Document every exception in the LegalRetentionHold table — regulators want proof that retained data has a legitimate basis, not that you simply failed to delete it.
  • Backup data: Database backups retain a full snapshot. After erasure, the user’s data persists in backup files for the backup retention period. GDPR guidance: backups are exempt during the retention window, but when backups are restored (e.g., disaster recovery), the erasure must be re-applied before the restored data enters production. Document this policy and test it — it is a common audit finding.
  • Cascading deletions across systems: User data exists in: main database, Elasticsearch index, S3 (profile pictures, uploaded files), email marketing lists, analytics platforms (Mixpanel, Amplitude), CDN cache. Each system needs an erasure handler. Use a GDPR inventory document to list all data stores. Automate where possible (APIs to delete from email lists, S3 object deletion, search index updates). Some third-party processors have their own 30-day SLAs — trigger them immediately on request receipt.
  • The 30-day deadline: GDPR Article 12 requires responding within 1 month (extendable to 3 months for complex requests). Store deadline = requested_at + 30 days. Monitor: alert when requests approach deadline with >0 uncompleted categories. Prioritize erasure jobs over other background work. Report monthly: “X erasure requests received, Y completed on time, Z completed late.” Late completions must be documented with a reason.

{“@context”:”https://schema.org”,”@type”:”FAQPage”,”mainEntity”:[{“@type”:”Question”,”name”:”Why anonymize data instead of physically deleting it for GDPR erasure?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Physical DELETE breaks foreign key constraints: orders referencing user_id, audit logs referencing actor_id, and comments referencing author_id all have FK relationships. Deleting the user row would require cascading deletes across dozens of tables — destroying legitimate business records. The GDPR right to erasure is the right to erasure of *personal data*, not the right to destroy business records. Anonymization satisfies GDPR: replace the personal identifiers (name, email, phone, address) with anonymous tokens. The orders still exist for accounting; the audit logs still exist for compliance — but no one can identify whose they are. The anonymization is irreversible (you delete the original values, not just mask them) — a user_id=42 row with email="erased-42@anon.invalid" and name="[Deleted]" cannot be re-identified.”}},{“@type”:”Question”,”name”:”How do legal holds interact with GDPR erasure requests?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”GDPR Article 17(3) explicitly allows refusing erasure when retention is required by law. Tax and financial regulations (US IRC, EU VAT directives) typically require 7-year retention of transaction records. A LegalRetentionHold record marks specific categories (orders, invoices, payments) for a user as exempt from erasure. The erasure process checks for holds: if any hold exists for a category with retain_until in the future, skip that category. Log the skip in ErasureAudit with reason=’legal_hold’. When the hold expires (retain_until passes), re-run the erasure for the held categories. The key compliance requirement: document the hold reason in ErasureAudit. A regulator auditing a GDPR erasure request needs to see "orders retained due to 7-year tax obligation, expires 2031-03-15" — not a mysterious gap.”}},{“@type”:”Question”,”name”:”How do you ensure the erasure process is resumable after a partial failure?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Erasure touches multiple categories (profile, payments, orders, media, analytics, third-party) across different services. If the worker crashes after processing profile but before analytics, a naive retry would re-anonymize the profile (already erased — now you have a double-write with a different anon token, possibly creating an inconsistency). Checkpoint pattern: categories_done JSONB in ErasureRequest records each completed category as an array: ["profile", "payments"]. On retry, load the ErasureRequest and skip categories already in categories_done. Each category completes atomically: the anonymization SQL and the categories_done update are in the same transaction. A crash anywhere within a category leaves it incomplete but not partially committed — the retry re-runs the full category. This is the same pattern used in Stripe’s idempotent API design.”}},{“@type”:”Question”,”name”:”How do you handle third-party data erasure (analytics, marketing, support systems)?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Third parties (Segment, Mixpanel, Intercom, Zendesk) hold copies of user data. GDPR requires erasure from all systems, not just your database. Each vendor provides a deletion API. Third-party erasure is unreliable — APIs time out, rate-limit, or return ambiguous errors. Track each vendor separately: third_party_erasures JSONB in ErasureRequest maps vendor → status (pending/complete/failed). Retry failed vendor calls independently with exponential backoff. Maintain a deadline: all third-party deletions must complete within 30 days (within the 30-day GDPR window). Send a daily digest to the privacy team listing any vendor erasures pending >7 days. For vendors without deletion APIs (legacy systems): maintain a suppression list (erased user IDs) checked before any new data is sent to that vendor.”}},{“@type”:”Question”,”name”:”How do you validate that erasure was actually complete?”,”acceptedAnswer”:{“@type”:”Answer”,”text”:”Post-erasure verification: after all categories are marked done, run a verification scan. For each PII field in the schema, query for the erased user’s original values: SELECT COUNT(*) FROM UserProfile WHERE email=%s (should be 0 after erasure). Verification queries are pre-generated from a schema registry that maps tables and columns containing PII. Log verification results in ErasureAudit: "verified_clean: [UserProfile, PaymentMethod, UserAddress]" or "pii_residual: [SupportTicket.email — 3 rows not anonymized]". Residual PII triggers an immediate alert and a follow-up anonymization pass. Run this verification within 24 hours of erasure completion. For compliance reporting: ErasureAudit provides a complete timeline (request received, each category completed, verification passed) that can be exported as a PDF for Data Subject Access Request (DSAR) responses.”}}]}

GDPR erasure and data privacy system design is discussed in LinkedIn system design interview questions.

GDPR erasure and user data deletion design is covered in Meta system design interview preparation.

GDPR right to erasure and data anonymization design is discussed in Google system design interview guide.

Scroll to Top