GDPR Data Deletion System Low-Level Design

What is GDPR Right to Erasure?

GDPR Article 17 gives EU users the “right to be forgotten”: they can request deletion of the personal data your company holds about them. Once you receive a valid request, you must erase the data without undue delay and at the latest within one month (Article 12(3)); that period can be extended by two further months for complex or numerous requests. This design uses 30 days as the working deadline. This sounds simple but is technically complex: data is spread across dozens of services, databases, data warehouses, backups, caches, search indexes, and third-party integrations. Building a systematic deletion pipeline is essential for any product serving EU users.

Requirements

  • Accept user deletion requests via API or in-product UI
  • Delete or anonymize personal data across all systems within 30 days
  • Systems: primary DB, read replicas, analytics DB, search index, CDN/object storage, email lists, third-party integrations
  • Maintain an audit log of deletion requests and completion status (required for compliance: you must be able to show that the erasure happened)
  • Handle partial failures: retry failed deletions, alert on stale requests
  • Business data (financial transactions, fraud logs) may need to be retained — anonymize rather than delete

Data Inventory: Know Where PII Lives

Before building deletion, catalog where personal data exists:

Primary DB tables with PII:
  users: email, name, phone, address, date_of_birth
  orders: shipping_address, billing_address (linked to user_id)
  messages: content (may contain PII), sender_id
  sessions: ip_address, user_agent

External systems:
  Analytics (Mixpanel, Amplitude): user events by user_id
  Email (SendGrid/Mailchimp): subscriber list, send history
  CDN/S3: profile photos, uploaded documents
  Elasticsearch: indexed user profiles, message content
  Data warehouse (Snowflake/BigQuery): user events, analytics tables
  Third-party: Intercom, Salesforce, Stripe customer objects
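
The inventory above can also be kept as data, so that the deletion pipeline and compliance reviews work from the same list. The following is a minimal sketch, assuming a hypothetical `PiiLocation` record and `Action` enum (neither comes from a real library). The entries mirror part of the catalog above, and the action chosen for each entry is illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DELETE = "delete"
    ANONYMIZE = "anonymize"


@dataclass(frozen=True)
class PiiLocation:
    system: str      # which deletion handler owns this data
    resource: str    # table, index, or bucket prefix (names are illustrative)
    fields: tuple    # PII fields stored there
    action: Action   # hard delete, or anonymize and retain the row


PII_INVENTORY = [
    PiiLocation("primary_db", "users",
                ("email", "name", "phone", "address", "date_of_birth"), Action.ANONYMIZE),
    PiiLocation("primary_db", "orders",
                ("shipping_address", "billing_address"), Action.ANONYMIZE),
    PiiLocation("primary_db", "messages", ("content",), Action.DELETE),
    PiiLocation("primary_db", "sessions", ("ip_address", "user_agent"), Action.DELETE),
    PiiLocation("elasticsearch", "user_profiles", ("profile", "message_content"), Action.DELETE),
    PiiLocation("s3", "uploads", ("profile_photos", "documents"), Action.DELETE),
]


def systems_requiring_tasks(inventory):
    """Return the distinct systems that need a deletion task, in first-seen order."""
    seen = []
    for loc in inventory:
        if loc.system not in seen:
            seen.append(loc.system)
    return seen
```

Keeping the list in code makes it possible to check that every system named in the inventory has a matching deletion handler.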

Deletion Request Data Model

DeletionRequest(request_id UUID, user_id UUID, email VARCHAR,
                requested_at TIMESTAMP, deadline TIMESTAMP,   -- requested_at + 30 days
                status ENUM(PENDING, IN_PROGRESS, COMPLETED, FAILED),
                completed_at TIMESTAMP,
                requester_ip VARCHAR, verification_method VARCHAR)

DeletionTask(task_id UUID, request_id UUID,
             system VARCHAR,     -- 'primary_db', 'elasticsearch', 'sendgrid', 's3', ...
             status ENUM(PENDING, COMPLETED, FAILED, SKIPPED),
             error_message TEXT,
             attempted_at TIMESTAMP, completed_at TIMESTAMP,
             retry_count INT DEFAULT 0)
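
As an illustration of the request model, the sketch below expresses `DeletionRequest` as a Python dataclass and derives `deadline` from `requested_at`. The 30-day constant follows the requirement above. The class layout, the defaults, and the `is_overdue` helper are assumptions for illustration, not a prescribed schema.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

DEADLINE_DAYS = 30  # working deadline used in this design


@dataclass
class DeletionRequest:
    user_id: uuid.UUID
    email: str
    requested_at: datetime
    request_id: uuid.UUID = field(default_factory=uuid.uuid4)
    status: str = "PENDING"
    completed_at: Optional[datetime] = None
    deadline: datetime = field(init=False)  # derived, never passed in

    def __post_init__(self):
        self.deadline = self.requested_at + timedelta(days=DEADLINE_DAYS)

    def is_overdue(self, now: datetime) -> bool:
        """True when the request is still open after its deadline."""
        return self.status != "COMPLETED" and now > self.deadline


req = DeletionRequest(
    user_id=uuid.uuid4(),
    email="user@example.com",
    requested_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(req.deadline.isoformat())  # 2024-01-31T00:00:00+00:00
```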

Deletion Orchestration

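def process_deletion(request_id):
    request = db.get(DeletionRequest, request_id)
    user_id = request.user_id
    db.update(request, status='IN_PROGRESS')

    # One handler per system; each handler must be idempotent so it can be retried safely
    tasks = [
        ('primary_db', delete_from_primary_db),
        ('elasticsearch', delete_from_elasticsearch),
        ('s3_uploads', delete_s3_objects),
        ('sendgrid', remove_from_sendgrid),
        ('analytics', anonymize_analytics_events),
        ('data_warehouse', anonymize_warehouse_data),
    ]

    for system_name, handler in tasks:
        # Reuse the task row from an earlier run so a re-run retries instead of duplicating.
        # db.find_one is an assumed helper that returns None when no row matches.
        task = db.find_one(DeletionTask, request_id=request_id, system=system_name)
        if task is None:
            task = db.create(DeletionTask(request_id=request_id, system=system_name))
        elif task.status == 'COMPLETED':
            continue  # already done in a previous run
        db.update(task, attempted_at=now())
        try:
            handler(user_id)
            db.update(task, status='COMPLETED', completed_at=now(), error_message=None)
        except Exception as e:
            # Record the failure and move on so one system cannot block the others
            db.update(task, status='FAILED', error_message=str(e),
                      retry_count=task.retry_count + 1)

    failed = db.count(DeletionTask, request_id=request_id, status='FAILED')
    if failed == 0:
        db.update(request, status='COMPLETED', completed_at=now())
    else:
        db.update(request, status='FAILED')
        alert_compliance_team(request_id, failed_tasks=failed)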
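
Failed tasks need to be retried on a schedule rather than left in `FAILED`. The following is a minimal sketch of exponential backoff with a retry cap, using an in-memory `Task` class as a stand-in for `DeletionTask`. `MAX_RETRIES`, `BASE_DELAY`, `next_attempt_at`, and `retry_task` are hypothetical names, and the values are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_RETRIES = 5                      # assumed cap before escalating to a human
BASE_DELAY = timedelta(minutes=5)    # assumed delay before the first retry


@dataclass
class Task:
    system: str
    status: str = "PENDING"
    retry_count: int = 0
    attempted_at: Optional[datetime] = None
    error_message: Optional[str] = None


def next_attempt_at(task):
    """Earliest time a FAILED task may run again, or None if it should not be retried."""
    if task.status != "FAILED" or task.attempted_at is None:
        return None
    if task.retry_count >= MAX_RETRIES:
        return None
    # The delay doubles with every recorded failure: 5 min, 10 min, 20 min, ...
    return task.attempted_at + BASE_DELAY * (2 ** task.retry_count)


def retry_task(task, handler, user_id, now):
    """Run the handler once if the backoff window has elapsed; return True if it ran."""
    due = next_attempt_at(task)
    if due is None or now < due:
        return False
    task.attempted_at = now
    try:
        handler(user_id)
        task.status, task.error_message = "COMPLETED", None
    except Exception as exc:
        task.status, task.error_message = "FAILED", str(exc)
        task.retry_count += 1
    return True
```

A task that reaches the retry cap stays in `FAILED`, which is the point at which a human on the compliance team should take over.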

Delete vs. Anonymize

Some data must be retained for legal/business reasons but personal identifiers must be removed:

def delete_from_primary_db(user_id):
    # Every statement below is idempotent, so the handler can be retried after a failure.

    # Hard delete: truly remove the record
    db.execute('DELETE FROM sessions WHERE user_id=?', user_id)
    # messages references its author through sender_id (see the data inventory)
    db.execute('DELETE FROM messages WHERE sender_id=?', user_id)

    # Anonymize: retain business data, remove PII
    db.execute('''
        UPDATE users SET
            email = 'deleted-' || user_id || '@deleted.invalid',
            name = 'Deleted User',
            phone = NULL,
            address = NULL,
            date_of_birth = NULL,
            status = 'DELETED'
        WHERE user_id=?
    ''', user_id)

    # Orders must be retained for financial records; anonymize the name and addresses
    db.execute('''
        UPDATE orders SET
            shipping_name = 'Deleted User',
            shipping_address = '[deleted]',
            billing_address = '[deleted]'
        WHERE user_id=?
    ''', user_id)

Backup Handling

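Backups contain snapshots of PII, and rewriting them in place is rarely practical. Common approaches:

  • Exclude deleted users from future backups (feasible for incremental backups)
  • Accept that existing backups still contain the user’s data and enforce a backup retention policy (e.g., a 90-day TTL); once the TTL passes, the backups holding that data expire
  • Document the backup retention policy for compliance. The GDPR text sets no separate deadline for backups; regulator guidance (for example from the UK ICO) is to put backup data beyond use until it is overwritten on schedule, which also means re-applying deletions if a backup is ever restored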
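
One consequence of letting backups expire on their own is that a restore can bring deleted users back. A common mitigation is to replay completed deletion requests against the restored data. The sketch below assumes the audit log survives the restore (for example because it is stored separately), that it can be queried for completed requests, and that deletion handlers are idempotent. `requests_to_replay`, `replay_deletions`, and the dictionary shape are hypothetical.

```python
from datetime import datetime


def requests_to_replay(completed_requests, backup_taken_at):
    """Return user_ids whose deletion completed after the backup snapshot was taken.

    Those users exist again in the restored data and must be deleted once more.
    """
    return [
        r["user_id"]
        for r in completed_requests
        if r["status"] == "COMPLETED" and r["completed_at"] > backup_taken_at
    ]


def replay_deletions(completed_requests, backup_taken_at, handler):
    """Re-run an idempotent deletion handler for every affected user; return the count."""
    user_ids = requests_to_replay(completed_requests, backup_taken_at)
    for user_id in user_ids:
        handler(user_id)
    return len(user_ids)
```

Requests that were still open at restore time are not covered here; they continue through the normal pipeline.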

Key Design Decisions

  • Task-per-system model: each system retries independently; one failure doesn’t block the others
  • Anonymize financial records rather than delete: legal retention obligations can override the erasure right for transactional data
  • Audit log of deletion tasks: you need to be able to prove that you deleted the data
  • Deadline alert: a cron job checks DeletionRequests WHERE deadline < NOW() + INTERVAL '7 days' AND status != 'COMPLETED' and alerts the compliance team while there is still time to act (the 7-day lead time is an example value)
  • Verification before deletion: require re-authentication or email confirmation before processing a deletion request
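
The deadline alert can be sketched as a pure function that the cron job calls with the open requests. It assumes each request is a dictionary with `status` and `deadline` keys. `find_at_risk_requests` and the 7-day `warning_window` are hypothetical choices, picked so that the alert fires before the deadline passes rather than after.

```python
from datetime import datetime, timedelta


def find_at_risk_requests(requests, now, warning_window=timedelta(days=7)):
    """Split incomplete requests into (overdue, due_soon) lists."""
    overdue, due_soon = [], []
    for req in requests:
        if req["status"] == "COMPLETED":
            continue
        if req["deadline"] < now:
            overdue.append(req)       # deadline already missed: escalate immediately
        elif req["deadline"] - now <= warning_window:
            due_soon.append(req)      # still fixable: alert the compliance team
    return overdue, due_soon
```

Separating the two lists lets overdue requests page someone while requests that are merely close to the deadline produce a lower-priority alert.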

