What is GDPR Right to Erasure?
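GDPR Article 17 gives individuals in the EU the “right to be forgotten”: they can ask you to delete the personal data your company holds about them. You must act on a valid request without undue delay and at the latest within one month (Article 12(3) allows an extension of up to two further months for complex or numerous requests); this article uses 30 days as the working deadline. The right is not absolute: Article 17(3) exempts, among other cases, data you must keep to comply with a legal obligation. This sounds simple but is technically complex, because data is spread across dozens of services, databases, data warehouses, backups, caches, search indexes, and third-party integrations. Any product serving EU users needs a systematic deletion pipeline.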
Requirements
- Accept user deletion requests via API or in-product UI
- Delete or anonymize personal data across all systems within 30 days
- Systems: primary DB, read replicas, analytics DB, search index, CDN/object storage, email lists, third-party integrations
- Maintain audit log of deletion requests and completion status (ironically, required for compliance)
- Handle partial failures: retry failed deletions, alert on stale requests
- Business data (financial transactions, fraud logs) may need to be retained — anonymize rather than delete
Data Inventory: Know Where PII Lives
Before building deletion, catalog where personal data exists:
Primary DB tables with PII:
- users: email, name, phone, address, date_of_birth
- orders: shipping_address, billing_address (linked to user_id)
- messages: content (may contain PII), sender_id
- sessions: ip_address, user_agent
External systems:
- Analytics (Mixpanel, Amplitude): user events by user_id
- Email (SendGrid/Mailchimp): subscriber list, send history
- CDN/S3: profile photos, uploaded documents
- Elasticsearch: indexed user profiles, message content
- Data warehouse (Snowflake/BigQuery): user events, analytics tables
- Third-party: Intercom, Salesforce, Stripe customer objects
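One way to keep this inventory from drifting out of date is to hold it as data that the deletion pipeline reads. The sketch below is illustrative only: `PiiLocation`, `PII_INVENTORY`, and `systems_requiring_tasks` are hypothetical names, and the delete/anonymize choice per entry is an assumption to confirm with legal counsel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiLocation:
    system: str     # logical system name, e.g. 'primary_db'
    resource: str   # table, index, bucket, or vendor object
    fields: tuple   # PII fields stored there
    action: str     # 'delete' or 'anonymize' (assumed policy, not legal advice)


# Hypothetical registry mirroring the inventory listed above.
PII_INVENTORY = [
    PiiLocation("primary_db", "users",
                ("email", "name", "phone", "address", "date_of_birth"), "anonymize"),
    PiiLocation("primary_db", "orders",
                ("shipping_address", "billing_address"), "anonymize"),
    PiiLocation("primary_db", "messages", ("content", "sender_id"), "delete"),
    PiiLocation("primary_db", "sessions", ("ip_address", "user_agent"), "delete"),
    PiiLocation("analytics", "user events", ("user_id",), "anonymize"),
    PiiLocation("sendgrid", "subscriber list", ("email",), "delete"),
    PiiLocation("s3_uploads", "profile photos, documents", ("file contents",), "delete"),
    PiiLocation("elasticsearch", "user profiles, messages", ("profile", "content"), "delete"),
    PiiLocation("data_warehouse", "analytics tables", ("user_id",), "anonymize"),
    PiiLocation("third_party", "Intercom, Salesforce, Stripe", ("customer object",), "delete"),
]


def systems_requiring_tasks(inventory):
    """Group resources by system, preserving first-seen order of systems."""
    by_system = {}
    for loc in inventory:
        by_system.setdefault(loc.system, []).append(loc.resource)
    return by_system


if __name__ == "__main__":
    for system, resources in systems_requiring_tasks(PII_INVENTORY).items():
        print(system, "->", resources)
```

Each key returned by `systems_requiring_tasks` corresponds to one deletion task, so adding a new PII store to the registry makes the gap visible if no handler exists for it.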
Deletion Request Data Model
DeletionRequest(
    request_id UUID, user_id UUID, email VARCHAR,
    requested_at TIMESTAMP, deadline TIMESTAMP,  -- requested_at + 30 days
    status ENUM(PENDING, IN_PROGRESS, COMPLETED, FAILED),
    completed_at TIMESTAMP,
    requester_ip VARCHAR, verification_method VARCHAR)

DeletionTask(
    task_id UUID, request_id UUID,
    system VARCHAR,  -- 'primary_db', 'elasticsearch', 'sendgrid', 's3', ...
    status ENUM(PENDING, COMPLETED, FAILED, SKIPPED),
    error_message TEXT,
    attempted_at TIMESTAMP, completed_at TIMESTAMP,
    retry_count INT DEFAULT 0)
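As a rough illustration of how the deadline column can be derived rather than set by hand, the sketch below mirrors part of the DeletionRequest schema as a Python dataclass. `RESPONSE_WINDOW` is an assumption taken from the 30-day figure used in this article; the class is a simplified stand-in, not a full ORM model.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: the 30-day working deadline used throughout this article.
RESPONSE_WINDOW = timedelta(days=30)


@dataclass
class DeletionRequest:
    user_id: str
    email: str
    requested_at: datetime
    # Derived in __post_init__ so it can never disagree with requested_at.
    deadline: datetime = field(init=False)
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "PENDING"
    completed_at: Optional[datetime] = None

    def __post_init__(self):
        self.deadline = self.requested_at + RESPONSE_WINDOW


if __name__ == "__main__":
    req = DeletionRequest("u1", "ada@example.com",
                          datetime(2024, 1, 1, tzinfo=timezone.utc))
    print(req.request_id, req.status, req.deadline.isoformat())
```

Storing timestamps in UTC, as in the usage above, avoids off-by-one-day errors when the deadline is compared against the current time.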
Deletion Orchestration
# Pseudocode: db, now(), the handlers, and alert_compliance_team are placeholders
def process_deletion(request_id):
    request = db.get(DeletionRequest, request_id)
    user_id = request.user_id
    db.update(request, status='IN_PROGRESS')

    # One task per system, so each can succeed or fail independently
    tasks = [
        ('primary_db', delete_from_primary_db),
        ('elasticsearch', delete_from_elasticsearch),
        ('s3_uploads', delete_s3_objects),
        ('sendgrid', remove_from_sendgrid),
        ('analytics', anonymize_analytics_events),
        ('data_warehouse', anonymize_warehouse_data),
    ]

    for system_name, handler in tasks:
        task = db.create(DeletionTask(request_id=request_id, system=system_name,
                                      attempted_at=now()))
        try:
            handler(user_id)
            db.update(task, status='COMPLETED', completed_at=now())
        except Exception as e:
            # Record the failure and move on; one system must not block the others
            db.update(task, status='FAILED', error_message=str(e))

    failed = db.count(DeletionTask, request_id=request_id, status='FAILED')
    if failed == 0:
        db.update(request, status='COMPLETED', completed_at=now())
    else:
        db.update(request, status='FAILED')
        alert_compliance_team(request_id, failed_tasks=failed)
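The requirements call for retrying failed deletions, which the orchestration loop does not do by itself. A minimal sketch of a retry pass follows, using an in-memory `Task` class in place of the DeletionTask table. `MAX_RETRIES`, `backoff_seconds`, and `retry_failed_tasks` are hypothetical names, and the sketch assumes every handler is idempotent (safe to run again after a partial failure).

```python
from dataclasses import dataclass
from typing import Optional

MAX_RETRIES = 5  # assumption: after this many retries, escalate to a human


@dataclass
class Task:
    system: str
    status: str = "PENDING"
    retry_count: int = 0
    error_message: Optional[str] = None


def backoff_seconds(retry_count, base=60, cap=3600):
    """Exponential backoff: 60s, 120s, 240s, ... capped at one hour."""
    return min(cap, base * 2 ** retry_count)


def retry_failed_tasks(tasks, handlers, user_id):
    """Re-run FAILED tasks once each; return tasks that have exhausted retries."""
    exhausted = []
    for task in tasks:
        if task.status != "FAILED":
            continue
        if task.retry_count >= MAX_RETRIES:
            exhausted.append(task)
            continue
        task.retry_count += 1
        try:
            handlers[task.system](user_id)
            task.status = "COMPLETED"
            task.error_message = None
        except Exception as exc:
            task.error_message = str(exc)
            if task.retry_count >= MAX_RETRIES:
                exhausted.append(task)
    return exhausted


if __name__ == "__main__":
    def always_fails(user_id):
        raise RuntimeError("vendor API unavailable")

    tasks = [Task("sendgrid", status="FAILED")]
    for _ in range(MAX_RETRIES):
        stuck = retry_failed_tasks(tasks, {"sendgrid": always_fails}, "u1")
    print([t.system for t in stuck], tasks[0].retry_count)
```

A scheduler would call `retry_failed_tasks` after waiting `backoff_seconds(task.retry_count)`, and alert the compliance team about anything returned as exhausted.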
Delete vs. Anonymize
Some data must be retained for legal/business reasons but personal identifiers must be removed:
def delete_from_primary_db(user_id):
    # Hard delete: truly remove the record
    db.execute('DELETE FROM sessions WHERE user_id=?', user_id)
    # messages references the user through sender_id (see the data inventory)
    db.execute('DELETE FROM messages WHERE sender_id=?', user_id)

    # Anonymize: retain business data, remove PII
    db.execute('''
        UPDATE users SET
            email = 'deleted-' || user_id || '@deleted.invalid',
            name = 'Deleted User',
            phone = NULL,
            address = NULL,
            date_of_birth = NULL,
            status = 'DELETED'
        WHERE user_id=?
    ''', user_id)

    # Orders must be retained for financial records; anonymize the address
    db.execute('''
        UPDATE orders SET
            shipping_name = 'Deleted User',
            shipping_address = '[deleted]',
            billing_address = '[deleted]'
        WHERE user_id=?
    ''', user_id)
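To see the delete-versus-anonymize split run end to end, here is a small self-contained sketch against an in-memory SQLite database. The table layout is a trimmed-down assumption based on the inventory, and `anonymize_user` is a hypothetical name. Wrapping the statements in one transaction means a failure partway through leaves no half-anonymized user.

```python
import sqlite3


def anonymize_user(conn, user_id):
    # 'with conn' commits on success and rolls back if any statement raises.
    with conn:
        conn.execute("DELETE FROM sessions WHERE user_id = ?", (user_id,))
        conn.execute(
            """UPDATE users SET
                   email = 'deleted-' || user_id || '@deleted.invalid',
                   name = 'Deleted User',
                   phone = NULL,
                   status = 'DELETED'
               WHERE user_id = ?""",
            (user_id,),
        )
        # Keep the order row (financial record) but strip the address.
        conn.execute(
            "UPDATE orders SET shipping_address = '[deleted]' WHERE user_id = ?",
            (user_id,),
        )


# Toy schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users(user_id TEXT PRIMARY KEY, email TEXT, name TEXT,
                   phone TEXT, status TEXT);
CREATE TABLE sessions(user_id TEXT, ip_address TEXT);
CREATE TABLE orders(order_id INTEGER PRIMARY KEY, user_id TEXT,
                    shipping_address TEXT, total_cents INTEGER);
INSERT INTO users VALUES ('u1', 'ada@example.com', 'Ada', '555-0100', 'ACTIVE');
INSERT INTO sessions VALUES ('u1', '203.0.113.7');
INSERT INTO orders VALUES (1, 'u1', '1 Main St', 4200);
""")

anonymize_user(conn, "u1")

if __name__ == "__main__":
    print(conn.execute("SELECT * FROM users").fetchall())
    print(conn.execute("SELECT * FROM orders").fetchall())
```

The order total survives for accounting while the session rows and contact details are gone. Note that the placeholder email still embeds the user_id, so the row remains linkable to other records keyed by that ID.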
Backup Handling
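Backups contain snapshots of PII, and rewriting them for every request is rarely practical. Options: (1) Exclude deleted users from future backups (feasible for incremental backups). (2) Accept that existing backups contain PII and set a backup retention policy (e.g., a 90-day backup TTL); once the TTL passes, the backups containing the user’s data expire naturally. (3) Document the backup retention policy for compliance. GDPR has no explicit carve-out for backups, but regulator guidance (for example, from the UK ICO) generally accepts that data which cannot be erased from a backup immediately is put “beyond use” until the backup expires. Whenever a backup is restored, re-apply every deletion completed since that backup was taken; otherwise the restore resurrects erased data.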
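Two small calculations follow from a TTL-based backup policy: when a user's data is guaranteed to be gone from every backup, and which erasures must be replayed after a restore. The sketch below assumes the 90-day TTL from the example above; `BACKUP_TTL`, `pii_gone_from_backups_at`, and `users_to_reerase` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the example 90-day backup retention period.
BACKUP_TTL = timedelta(days=90)


def pii_gone_from_backups_at(completed_at):
    """Latest moment a backup taken before the erasure can still exist."""
    return completed_at + BACKUP_TTL


def users_to_reerase(completed_requests, backup_taken_at):
    """Return user_ids whose erasure completed after the backup was taken.

    completed_requests: iterable of (user_id, completed_at) pairs.
    Restoring the backup would bring these users' data back, so their
    deletion tasks must be run again on the restored system.
    """
    return [uid for uid, completed_at in completed_requests
            if completed_at > backup_taken_at]


if __name__ == "__main__":
    utc = timezone.utc
    done = [("u1", datetime(2024, 3, 1, tzinfo=utc)),
            ("u2", datetime(2024, 3, 20, tzinfo=utc))]
    print(users_to_reerase(done, backup_taken_at=datetime(2024, 3, 10, tzinfo=utc)))
    print(pii_gone_from_backups_at(datetime(2024, 3, 1, tzinfo=utc)).date())
```

The replay step works only if the audit log of completed deletions is itself kept at least as long as the oldest backup.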
Key Design Decisions
- Task-per-system model: each system gets its own task and retry state, so one failure doesn’t block the others
- Anonymize financial records rather than delete: legal retention requirements outweigh the erasure right for transactional data
- Audit log of deletion tasks: ironic but required, because you need to prove you deleted the data; keep only the identifiers needed for that proof
- 30-day deadline alert: a cron job checks DeletionRequests WHERE status != 'COMPLETED' AND deadline < NOW() + INTERVAL '7 days' (the lead time is an example) and alerts the compliance team while there is still time to act, not after the deadline has passed
- Verification before deletion: require re-authentication or email confirmation before processing a deletion request
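Email confirmation is one way to implement the verification step. The sketch below signs a short-lived token with HMAC-SHA256 using only the Python standard library. `SECRET_KEY`, `TOKEN_TTL_SECONDS`, and both function names are assumptions for illustration; a real system would load the key from a secret store and also mark tokens as used so that each works only once.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # assumption: loaded from a secret store
TOKEN_TTL_SECONDS = 24 * 3600               # assumption: link valid for 24 hours


def _sign(payload):
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def make_confirmation_token(user_id, now=None):
    """Build 'user_id:issued_at:signature' to embed in the confirmation email."""
    issued_at = int(now if now is not None else time.time())
    payload = f"{user_id}:{issued_at}"
    return f"{payload}:{_sign(payload)}"


def verify_confirmation_token(token, now=None):
    """Return the user_id if the token is authentic and unexpired, else None."""
    try:
        user_id, issued_at, sig = token.rsplit(":", 2)
        issued = int(issued_at)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{user_id}:{issued_at}")
    # compare_digest avoids leaking the signature through timing differences.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    current = now if now is not None else time.time()
    if current - issued > TOKEN_TTL_SECONDS:
        return None
    return user_id


if __name__ == "__main__":
    token = make_confirmation_token("u1")
    print(verify_confirmation_token(token))
```

Only after `verify_confirmation_token` returns a user_id would the service create the DeletionRequest and record the verification method used.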