Why Hot-Cold Tiering?
Not all data is accessed equally. A small fraction of items (the “hot” set) drives the majority of reads, while the bulk of the dataset (the “cold” set) is rarely or never accessed. Storing all data on fast, expensive SSD storage wastes money. Hot-cold tiering automatically moves data between storage tiers based on access frequency, delivering both performance and cost efficiency.
Typical cost differential: SSD NVMe at $0.10/GB/month vs S3 object storage at $0.023/GB/month — over 4x savings for cold data, and real-world workloads often have 80-90% of data in the cold tier.
Access Frequency Tracking
Each item maintains an access counter with time-decay. Two common models:
- LFU with exponential decay: access count is halved every N hours. Recent accesses count more than old ones. An item accessed 100 times two weeks ago may score lower than one accessed 10 times today.
- Sliding window count: count accesses in the last 30 days only. Simple; no decay math required. Old accesses fall off naturally as the window slides.
On every read, access_count_30d is incremented and last_accessed_at is updated. A background job re-computes counts by querying the access event log, keeping the counter accurate over rolling windows.
Hot Promotion Threshold
An item is promoted to the hot tier if its access count exceeds a threshold within the tracking window. For example: promote if access_count_30d > 100. The threshold is tunable per data class — user profile photos may have a lower threshold than product images.
Promotion is executed by the background tiering job, not on the read path, to avoid write amplification on every access.
Cold Migration
Items not accessed in N days (configurable per tier) are migrated to cold storage:
- Background batch job queries
DataItem WHERE last_accessed_at < now() - interval 'N days' AND tier = 'hot'. - For each candidate: compress the item, upload to S3 (or HDD path), record the
cold_storage_keyin the metadata store. - Mark the item as
tier = 'cold'in the hot-tier metadata store. - Optionally delete the hot-tier copy (or retain for a short grace period).
Migration is async and non-blocking. Reads during migration are served from the source (hot) tier until the migration flag is cleared.
Retrieval on Cold Miss
When a read request arrives for a cold item:
- Look up
cold_storage_keyfrom the metadata store. - Fetch the compressed object from S3/HDD.
- Decompress and serve the item.
- Evaluate whether to re-promote: if the item's access pattern crosses the hot threshold again, schedule a promotion job.
Cold retrieval adds latency (10ms-1s depending on storage backend). Surface this to callers via a cache-hit header or metric so SLAs can be set appropriately.
Warm Tier (Optional Middle Layer)
A warm tier (e.g., HDD local disk) can sit between hot SSD and cold object storage. Items moving from hot to cold pass through warm first, allowing re-promotion without a full S3 round-trip. Warm tier adds operational complexity but reduces re-promotion latency.
SQL Schema
CREATE TABLE DataItem (
id BIGSERIAL PRIMARY KEY,
item_key TEXT UNIQUE NOT NULL,
tier TEXT NOT NULL DEFAULT 'hot' CHECK (tier IN ('hot','warm','cold')),
access_count_30d INT NOT NULL DEFAULT 0,
last_accessed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
cold_storage_key TEXT,
size_bytes BIGINT,
checksum TEXT
);
CREATE INDEX idx_item_tier_access ON DataItem (tier, last_accessed_at);
CREATE TABLE TierMigrationJob (
id BIGSERIAL PRIMARY KEY,
item_id BIGINT REFERENCES DataItem(id),
from_tier TEXT NOT NULL,
to_tier TEXT NOT NULL,
started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
completed_at TIMESTAMPTZ,
bytes_migrated BIGINT,
status TEXT NOT NULL DEFAULT 'running' CHECK (status IN ('running','complete','failed'))
);
CREATE TABLE AccessEvent (
id BIGSERIAL PRIMARY KEY,
item_id BIGINT REFERENCES DataItem(id),
accessed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_access_item_time ON AccessEvent (item_id, accessed_at);
Python Implementation Sketch
import boto3, time, gzip, json
from datetime import datetime, timedelta
class TieringManager:
def __init__(self, db, s3_bucket: str):
self.db = db
self.s3 = boto3.client('s3')
self.s3_bucket = s3_bucket
self.hot_threshold = 100 # accesses in 30 days
self.cold_after_days = 30 # migrate if not accessed in N days
def track_access(self, item_id: int):
self.db.execute(
"INSERT INTO AccessEvent (item_id) VALUES (%s)", (item_id,)
)
self.db.execute(
"UPDATE DataItem SET access_count_30d = access_count_30d + 1, last_accessed_at = now() WHERE id = %s",
(item_id,)
)
def evaluate_tier(self, item_id: int) -> str:
row = self.db.fetchone(
"SELECT tier, access_count_30d, last_accessed_at FROM DataItem WHERE id = %s", (item_id,)
)
if not row:
return 'unknown'
if row['tier'] == 'cold' and row['access_count_30d'] >= self.hot_threshold:
return 'promote'
if row['tier'] == 'hot' and (datetime.utcnow() - row['last_accessed_at']).days >= self.cold_after_days:
return 'demote'
return row['tier']
def migrate_to_cold(self, item_id: int):
row = self.db.fetchone("SELECT * FROM DataItem WHERE id = %s", (item_id,))
job_id = self.db.fetchone(
"INSERT INTO TierMigrationJob (item_id, from_tier, to_tier) VALUES (%s, %s, %s) RETURNING id",
(item_id, 'hot', 'cold')
)['id']
try:
data = self._load_from_hot(row['item_key'])
compressed = gzip.compress(json.dumps(data).encode())
cold_key = f"cold/{item_id}/{row['item_key']}"
self.s3.put_object(Bucket=self.s3_bucket, Key=cold_key, Body=compressed)
self.db.execute(
"UPDATE DataItem SET tier = 'cold', cold_storage_key = %s WHERE id = %s",
(cold_key, item_id)
)
self.db.execute(
"UPDATE TierMigrationJob SET completed_at = now(), bytes_migrated = %s, status = 'complete' WHERE id = %s",
(len(compressed), job_id)
)
except Exception as e:
self.db.execute("UPDATE TierMigrationJob SET status = 'failed' WHERE id = %s", (job_id,))
raise
def retrieve_cold(self, item_id: int) -> dict:
row = self.db.fetchone("SELECT cold_storage_key FROM DataItem WHERE id = %s", (item_id,))
obj = self.s3.get_object(Bucket=self.s3_bucket, Key=row['cold_storage_key'])
compressed = obj['Body'].read()
data = json.loads(gzip.decompress(compressed).decode())
self.track_access(item_id)
if self.evaluate_tier(item_id) == 'promote':
# Schedule async promotion back to hot
pass
return data
def _load_from_hot(self, item_key: str) -> dict:
return self.db.fetchone("SELECT * FROM hot_storage WHERE item_key = %s", (item_key,))
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Databricks Interview Guide 2026: Spark Internals, Delta Lake, and Lakehouse Architecture