Low Level Design: Soft Delete vs Hard Delete Design

Deleting data is irreversible by default. Soft delete marks records as deleted without removing them from the database; hard delete removes them permanently. The choice between soft and hard delete affects data recovery, referential integrity, compliance, storage costs, query performance, and audit requirements. Most production systems use soft delete for critical business data and hard delete for ephemeral or compliance-constrained data.

Soft Delete Implementation

Add a deleted_at TIMESTAMP column (NULL = active, non-NULL = deleted). Every query that should exclude deleted records must include WHERE deleted_at IS NULL. This is the critical maintenance burden of soft delete: any query that omits the filter leaks deleted records to the caller. Enforcement strategies: application-level default scopes (Rails’ paranoia gem, Django’s soft-delete mixin), database-level row security policies (PostgreSQL RLS: CREATE POLICY on the table filtering deleted_at IS NULL), and database views (expose a view that filters deleted records; application queries the view). Views and RLS are the most reliable: the filter is enforced at the database level regardless of application code.

Advantages of Soft Delete

Recovery: accidentally deleted records can be restored (UPDATE SET deleted_at = NULL). Audit trail: deleted records remain in the database with their deletion timestamp — useful for compliance reporting and user support (“why is my order gone?”). Referential integrity: foreign keys remain valid — a deleted user’s orders still reference the user row; hard delete of the user would break order foreign keys without cascading deletes. Historical analytics: queries can include deleted records for historical analysis (what was the monthly churn rate?). Business intelligence: soft-deleted data provides complete historical context for analytics that needs to include churned customers or cancelled orders.

Disadvantages of Soft Delete

Query complexity: every query must filter deleted_at IS NULL or risk leaking deleted data. This filter can’t be forgotten. Performance: soft-deleted rows remain in indexes and table scans, consuming storage and I/O for rows that are logically gone. A users table with 10M deleted and 1M active users wastes 10x storage and index space on deleted data. Unique constraints: a unique constraint on email allows only one record with a given email — after soft-deleting a user, a new registration with the same email fails the uniqueness check. Workaround: partial unique index (CREATE UNIQUE INDEX ON users(email) WHERE deleted_at IS NULL) — only active records must be unique. GDPR: soft-deleted data is not deleted for GDPR purposes — right-to-erasure requires actual deletion, not just a deleted_at flag.

Hard Delete Use Cases

Hard delete permanently removes the row: DELETE FROM users WHERE id = 42. Use hard delete for: GDPR right-to-erasure (the regulation requires actual deletion, not a flag), ephemeral data (shopping cart items, temporary tokens — no value in retaining), compliance-constrained data (PII that must not be retained beyond a specific period), and storage-constrained scenarios where soft-deleted rows would accumulate without bound. Before hard deleting: cascade to all related records (orders, sessions, audit logs — or handle in the application), remove from search indexes (Elasticsearch), invalidate caches, and archive if historical data is needed.

Archival vs Deletion

Between soft delete (keep in primary table) and hard delete (remove permanently), archival moves data to a separate archive store. Archive the record to a cold table (archive_users) or object storage (S3 as JSON/Parquet) before deleting from the primary table. This: removes the data from active queries (performance improvement), preserves historical data for analytics and compliance, reduces primary database size and index size, and enables GDPR deletion (delete from the archive store within the right-to-erasure timeline). Archival is particularly valuable for time-series data and operational records that need 7-year compliance retention but are never queried in normal operations.

Handling Foreign Keys

Deleting a parent record when child records reference it fails with a foreign key constraint. Options: cascade delete (DELETE on parent automatically deletes children — dangerous for soft delete, destructive for hard delete), restrict (prevent deletion if children exist — safest, but requires the application to handle the constraint error), set null (set the child’s foreign key to NULL when the parent is deleted — appropriate for optional references), soft delete parent only (the foreign key remains valid; children see the parent as soft-deleted and must be designed to handle this), and orphan cleanup (a background job finds and handles orphaned children after hard deletion). Design foreign key behavior explicitly — the database default (RESTRICT) is safe but not always appropriate for the business logic.

GDPR Compliance and the Right to Erasure

GDPR Article 17 requires deleting personal data when requested. Soft delete alone is insufficient — the data is still in the database and backups. True erasure requires: hard deleting the record from all primary tables, deleting derived records (activity logs, email archives, analytics events), anonymizing backup data within the retention period, purging from search indexes and caches, and deleting or anonymizing in third-party systems (analytics platforms, CDPs, CRMs). Track erasure requests with a status (pending → in_progress → completed) and verify completion across all data stores. Test the erasure process: restore from a backup taken after erasure and verify the data is gone. Most GDPR erasure implementations are incomplete — they delete from the primary database but miss derived datasets and backups.

{ “@context”: “https://schema.org”, “@type”: “FAQPage”, “mainEntity”: [ { “@type”: “Question”, “name”: “What is the biggest operational risk of soft delete?”, “acceptedAnswer”: {“@type”: “Answer”, “text”: “The biggest risk is leaking soft-deleted records to API responses when a query omits the WHERE deleted_at IS NULL filter. Any query that scans the table without the filter will return logically deleted data to the caller. Mitigation: enforce the filter at the database level (PostgreSQL Row-Level Security policy or a view that filters deleted rows) rather than relying on application-level discipline. This makes the filter unforgettable. A secondary risk is unique constraint violations: soft-deleting a user doesn’t free their email for re-registration unless you use a partial unique index (WHERE deleted_at IS NULL).”} }, { “@type”: “Question”, “name”: “Is soft delete sufficient for GDPR right-to-erasure compliance?”, “acceptedAnswer”: {“@type”: “Answer”, “text”: “No. GDPR Article 17 requires actual deletion of personal data, not just marking it as deleted. Soft-deleted data remains in the database (and backups) and is still subject to GDPR. True erasure requires: hard deleting from primary tables, deleting from derived datasets (analytics, audit logs, search indexes, caches), anonymizing backup data within the retention period, and deleting from third-party systems (CDPs, analytics platforms). Track erasure requests with a state machine (pending → processing → completed) and verify completion across all data stores. Most soft-delete implementations are insufficient for GDPR erasure — they satisfy the UI requirement (the record is gone) but not the legal requirement.”} }, { “@type”: “Question”, “name”: “How do you handle foreign keys when hard-deleting a parent record?”, “acceptedAnswer”: {“@type”: “Answer”, “text”: “Foreign key options on parent delete: CASCADE (automatically delete child records — dangerous, easy to lose data); RESTRICT/NO ACTION (block deletion if children exist — safest, forces the application to handle the constraint explicitly); SET NULL (set the child’s foreign key to NULL — appropriate for optional references where the child can exist independently); SET DEFAULT (set to a default value — rarely used). For GDPR erasure: consider anonymizing the parent record (replace PII with a placeholder) rather than hard-deleting, which preserves relational integrity for order history and audit trails while removing personal data. Design the deletion behavior explicitly in schema migrations — the database default (RESTRICT) is safe but may not match business logic.”} }, { “@type”: “Question”, “name”: “When should you use data archival instead of soft or hard delete?”, “acceptedAnswer”: {“@type”: “Answer”, “text”: “Use archival when you need both performance (remove old data from active tables) and historical access (compliance reporting, analytics, customer support). Archive records to a cold store (a separate archive table, S3 as Parquet, or a data warehouse) before removing from the primary table. This removes data from indexes and active queries (improving read/write performance), retains complete historical records for analytics and compliance, enables GDPR erasure (delete from the archive store within the required timeline), and reduces primary database size. Archival is particularly valuable for: order history older than 2 years, financial transactions for compliance retention, and operational logs that are never queried in real-time but must be retained.”} } ] }