Archive System — Low-Level Design
An archive system moves cold data from primary storage to cheaper, slower tiers while keeping it accessible on demand. This pattern appears at every company managing large datasets: email archiving, log archiving, database record archiving, and document storage.
Tiered Storage Model
Hot tier: Primary PostgreSQL / MySQL — latest 90 days, sub-10ms reads
Warm tier: Separate "archive" DB or read replica — 90 days to 2 years
Cold tier: S3 / GCS object storage — 2+ years, minutes to retrieve
Glacier: AWS Glacier / S3 Glacier — 7+ years, hours to retrieve
Data schema to track location across tiers:
Record
id BIGSERIAL PK
created_at TIMESTAMPTZ NOT NULL
archived_at TIMESTAMPTZ -- null = still in hot tier
archive_tier TEXT -- 'warm', 'cold', 'glacier'
archive_key TEXT -- S3 key or warm DB reference
data JSONB -- null if moved to another tier
ArchiveJob
id BIGSERIAL PK
tier_target TEXT NOT NULL
cutoff_date TIMESTAMPTZ NOT NULL
status TEXT DEFAULT 'running'
records_moved INT DEFAULT 0
started_at TIMESTAMPTZ
completed_at TIMESTAMPTZ
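In application code, the Record row might be mirrored by a plain dataclass along these lines (a sketch; the class and helper names are illustrative, not part of the schema):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Record:
    # Mirrors the Record table above; data is None once the payload
    # has moved to another tier.
    id: int
    created_at: datetime
    archived_at: Optional[datetime] = None  # None = still in hot tier
    archive_tier: Optional[str] = None      # 'warm', 'cold', 'glacier'
    archive_key: Optional[str] = None       # S3 key or warm-DB reference
    data: Optional[dict] = None

def is_hot(record: Record) -> bool:
    """A record is hot iff it has never been archived."""
    return record.archived_at is None
```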
Archive Worker: Moving to Cold Tier
import gzip
import json

def archive_batch_to_s3(cutoff_date, batch_size=1000):
    while True:
        # Fetch a batch of records to archive
        records = db.execute("""
            SELECT * FROM Record
            WHERE created_at < %(cutoff)s
              AND archived_at IS NULL
            ORDER BY created_at
            LIMIT %(batch_size)s
        """, {'cutoff': cutoff_date, 'batch_size': batch_size})
        if not records:
            break

        # Serialize to gzipped JSONL and write to S3. The object is keyed
        # by the first record's month; a batch that spans a month boundary
        # still lands in a single object.
        year_month = records[0].created_at.strftime('%Y/%m')
        s3_key = f'archive/{year_month}/{records[0].id}-{records[-1].id}.jsonl.gz'
        payload = gzip.compress(
            # dict(r) assumes dict-like rows; default=str serializes timestamps
            '\n'.join(json.dumps(dict(r), default=str) for r in records).encode()
        )
        s3.put_object(Bucket='my-archive', Key=s3_key, Body=payload)

        # Mark records as archived and remove the payload from the hot tier
        ids = [r.id for r in records]
        db.execute("""
            UPDATE Record
            SET archived_at=NOW(), archive_tier='cold',
                archive_key=%(key)s, data=NULL
            WHERE id = ANY(%(ids)s)
        """, {'key': s3_key, 'ids': ids})
        db.commit()
Retrieval: Restoring an Archived Record
def get_record(record_id):
    record = db.get(Record, record_id)
    if record.data is not None:
        return record.data  # Hot tier — return immediately
    if record.archive_tier == 'warm':
        return warm_db.get(record_id)
    if record.archive_tier == 'cold':
        return restore_from_s3(record.archive_key, record_id)
    if record.archive_tier == 'glacier':
        # Initiate restore (async — takes hours); boto3 S3 client signature.
        # The restored copy stays readable for 7 days.
        s3.restore_object(
            Bucket='my-archive',
            Key=record.archive_key,
            RestoreRequest={'Days': 7, 'GlacierJobParameters': {'Tier': 'Standard'}},
        )
        raise ArchiveRestorePending('Glacier restore initiated, retry in 3-5 hours')

def restore_from_s3(s3_key, target_id):
    obj = s3.get_object(Bucket='my-archive', Key=s3_key)
    lines = gzip.decompress(obj['Body'].read()).decode().split('\n')
    for line in lines:
        r = json.loads(line)
        if r['id'] == target_id:
            return r
    raise NotFound(f'Record {target_id} not found in archive file {s3_key}')
Indexing Archived Data
Problem: archived records have data=NULL in the primary DB. You need to find records without fetching every S3 file.
Solution: Keep a lightweight index in PostgreSQL even after archiving:
-- After archiving, keep id, created_at, archive_key, and key searchable fields.
-- NULL out only the large data column, and only for archived rows —
-- a blanket DROP COLUMN would also destroy hot-tier payloads:
UPDATE Record SET data = NULL WHERE archived_at IS NOT NULL;
-- Add index for date-range queries on archived records
CREATE INDEX idx_record_archived ON Record(created_at)
WHERE archived_at IS NOT NULL;
-- For text search on archived content: Elasticsearch index
-- Pipe records through Kafka to ES before/during archival
-- ES retains searchable fields; full record retrieved from S3
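The document shipped to Elasticsearch can be built by projecting each full record down to its searchable fields before the payload leaves the hot tier. A minimal sketch (the field list and function name are assumptions; pick whatever fields your queries actually filter on):

```python
# Fields kept searchable after archival; everything else lives only in S3.
SEARCHABLE_FIELDS = ('id', 'created_at', 'user_id', 'archive_key')

def build_search_doc(record: dict) -> dict:
    """Project a full record down to the slim document indexed in ES.

    The large payload is intentionally dropped; retrieval of the full
    record goes back to S3 via archive_key.
    """
    return {k: record[k] for k in SEARCHABLE_FIELDS if k in record}
```

The resulting dict is what gets indexed; the multi-KB payload never reaches Elasticsearch, which keeps the index small and cheap.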
Archive Policy Configuration
ArchivePolicy
id BIGSERIAL PK
record_type TEXT NOT NULL -- 'orders', 'logs', 'messages'
hot_retention INT NOT NULL -- days to keep in hot tier
warm_retention INT -- days in warm tier (null = skip)
cold_retention INT -- days in cold tier
glacier_enabled BOOLEAN DEFAULT false
compress BOOLEAN DEFAULT true
enabled BOOLEAN DEFAULT true
Store policies in the database — not hardcoded — so they can be updated without code deploys. The archive worker reads the policy table to determine per-record-type cutoff dates.
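Turning a policy row into absolute cutoff dates might look like this (a sketch; the helper name and dict-shaped policy are assumptions on top of the schema above):

```python
from datetime import datetime, timedelta, timezone

def cutoff_dates(policy, now=None):
    """Translate retention windows (in days) into absolute cutoffs.

    Records created before cutoffs['warm'] are due to leave the hot tier,
    records before cutoffs['cold'] are due to leave the warm tier, and so
    on down the chain. A None retention means the tier is skipped.
    """
    now = now or datetime.now(timezone.utc)
    cutoffs = {'warm': now - timedelta(days=policy['hot_retention'])}
    if policy['warm_retention'] is not None:
        cutoffs['cold'] = cutoffs['warm'] - timedelta(days=policy['warm_retention'])
    if policy['glacier_enabled'] and 'cold' in cutoffs \
            and policy['cold_retention'] is not None:
        cutoffs['glacier'] = cutoffs['cold'] - timedelta(days=policy['cold_retention'])
    return cutoffs
```

The worker would run one such computation per enabled record_type and feed each cutoff into the corresponding archive batch job.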
Safety: Soft Delete Before Hard Delete
-- Phase 1 (day 0): Mark records for archival, do NOT delete yet
UPDATE Record SET archived_at=NOW(), archive_tier='cold', archive_key=:key, data=NULL
 WHERE id = ANY(:ids)  -- scope to the batch just written; without a WHERE this would hit every row
-- Phase 2 (day 7): Verify S3 file is accessible and correct count
SELECT count(*) FROM Record WHERE archive_key=:key
-- Compare against line count in S3 file
-- Phase 3 (day 30): Hard delete raw data if verified
-- At this point data=NULL already; the S3 file is the record of truth
Never hard-delete from S3 during the same job that writes to S3. Add a verification step with a delay.
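The Phase 2 check reduces to a line-count comparison between the DB and the S3 object. A minimal sketch (the function name is hypothetical):

```python
import gzip

def verify_archive_file(payload_gz: bytes, expected_count: int) -> bool:
    """Phase 2 check: the gzipped JSONL object must decompress cleanly
    and contain exactly one line per archived record."""
    try:
        lines = gzip.decompress(payload_gz).decode().splitlines()
    except (OSError, UnicodeDecodeError):
        return False  # corrupt or truncated upload: do NOT proceed to hard delete
    return len(lines) == expected_count
```

In practice you would fetch the object body from S3, take expected_count from `SELECT count(*) FROM Record WHERE archive_key=:key`, and only schedule Phase 3 if the check passes.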
Key Interview Points
- Separate archival from deletion: Archive first, verify, delete later. Data loss from a buggy archive job is catastrophic.
- Keep index rows in primary DB: Retain id, created_at, archive_key. The full payload moves to S3; searching remains fast.
- Batch size tuning: 1,000-5,000 records per batch avoids long transactions and excessive S3 PUTs. S3 PUT costs $0.005/1K requests — batching matters.
- Glacier retrieval latency: Standard retrieval takes 3-5 hours; Expedited costs $0.03/GB but takes 1-5 min. Design UI to set expectations — no “instant” access for glacier data.
- Compression: JSONL typically compresses 5-10x under gzip. A 1GB archive batch becomes 100-200MB in S3. Always compress.
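The batch-size cost point can be checked with back-of-envelope arithmetic (using the list price quoted above, which may drift):

```python
PUT_COST_PER_1K = 0.005  # USD per 1,000 S3 PUT requests (list price)

def put_cost(total_records: int, batch_size: int) -> float:
    """S3 PUT cost for archiving total_records, one object per batch."""
    batches = -(-total_records // batch_size)  # ceiling division
    return batches / 1000 * PUT_COST_PER_1K
```

At 100M records, batch size 100 costs about $5.00 in PUTs versus $0.10 at 5,000; the dollar amounts are modest, but the same batching also bounds transaction length and object count.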