Hot-Cold Data Tiering Low-Level Design: Access Pattern Detection, Tier Migration, and Storage Cost Optimization

Why Hot-Cold Tiering?

Not all data is accessed equally. A small fraction of items (the “hot” set) drives the majority of reads, while the bulk of the dataset (the “cold” set) is rarely or never accessed. Storing all data on fast, expensive SSD storage wastes money. Hot-cold tiering automatically moves data between storage tiers based on access frequency, delivering both performance and cost efficiency.

Typical cost differential: SSD NVMe at $0.10/GB/month vs S3 object storage at $0.023/GB/month — over 4x savings for cold data, and real-world workloads often have 80-90% of data in the cold tier.
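To make the economics concrete, here is a back-of-the-envelope calculation using the per-GB prices above. The 100 TB dataset size and 85% cold fraction are illustrative assumptions, not measurements from a specific workload:

```python
# Illustrative cost model using the per-GB prices quoted above.
SSD_PER_GB = 0.10    # $/GB/month, NVMe SSD
S3_PER_GB = 0.023    # $/GB/month, S3 standard

total_gb = 100 * 1024   # assume a 100 TB dataset
cold_frac = 0.85        # assume 85% of bytes are cold

all_ssd_cost = total_gb * SSD_PER_GB
tiered_cost = (total_gb * (1 - cold_frac) * SSD_PER_GB
               + total_gb * cold_frac * S3_PER_GB)
savings = 1 - tiered_cost / all_ssd_cost

print(f"all-SSD: ${all_ssd_cost:,.0f}/mo, tiered: ${tiered_cost:,.0f}/mo, "
      f"savings: {savings:.0%}")
```

Under these assumptions the tiered layout cuts the monthly storage bill by roughly two-thirds, even though the hot 15% still sits on full-price SSD.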

Access Frequency Tracking

Each item maintains an access counter with time-decay. Two common models:

  • LFU with exponential decay: access count is halved every N hours. Recent accesses count more than old ones. An item accessed 100 times two weeks ago may score lower than one accessed 10 times today.
  • Sliding window count: count accesses in the last 30 days only. Simple; no decay math required. Old accesses fall off naturally as the window slides.
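The decay variant reduces to a small pure function. This is a minimal sketch; the 12-hour half-life is an illustrative default, not a prescribed value:

```python
def decayed_count(raw_count: float, hours_elapsed: float,
                  half_life_hours: float = 12.0) -> float:
    """Exponential-decay LFU score: the count halves every half_life_hours."""
    return raw_count * 0.5 ** (hours_elapsed / half_life_hours)

# 100 accesses from two weeks ago decay to effectively zero at a 12h
# half-life, so they score far below 10 accesses made just now.
old = decayed_count(100, hours_elapsed=14 * 24)   # ~3.7e-7
recent = decayed_count(10, hours_elapsed=0)       # 10.0
```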

On every read, access_count_30d is incremented and last_accessed_at is updated. A background job re-computes counts by querying the access event log, keeping the counter accurate over rolling windows.
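The recompute job can be expressed as one correlated-subquery UPDATE. To keep this sketch self-contained and runnable, it uses an SQLite connection with epoch-second timestamps as a stand-in for the Postgres schema shown later (where the cutoff would be `now() - interval '30 days'`):

```python
import sqlite3
import time

def recompute_counts(conn: sqlite3.Connection, window_days: int = 30) -> None:
    """Rebuild access_count_30d for every item from the raw AccessEvent log.

    SQLite stand-in for the Postgres metadata store; timestamps are
    epoch seconds so the window cutoff is plain arithmetic.
    """
    cutoff = time.time() - window_days * 86400
    conn.execute(
        """
        UPDATE DataItem SET access_count_30d = (
            SELECT COUNT(*) FROM AccessEvent
            WHERE AccessEvent.item_id = DataItem.id
              AND AccessEvent.accessed_at >= ?
        )
        """,
        (cutoff,),
    )
```

Running this on a schedule corrects the drift that pure increment-on-read counters accumulate as old events age out of the window.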

Hot Promotion Threshold

An item is promoted to the hot tier if its access count exceeds a threshold within the tracking window. For example: promote if access_count_30d > 100. The threshold is tunable per data class — user profile photos may have a lower threshold than product images.

Promotion is executed by the background tiering job, not on the read path, to avoid write amplification on every access.
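That background promotion pass might look like the following sketch. SQLite again stands in for the metadata store; a real job would also copy the item's bytes back to SSD before flipping the tier flag:

```python
import sqlite3

def run_promotion_batch(conn: sqlite3.Connection, hot_threshold: int = 100) -> list[int]:
    """Promote non-hot items whose 30-day access count crossed the threshold."""
    rows = conn.execute(
        "SELECT id FROM DataItem WHERE tier != 'hot' AND access_count_30d > ?",
        (hot_threshold,),
    ).fetchall()
    promoted = [r[0] for r in rows]
    for item_id in promoted:
        # Real promotion: fetch from cold storage, write to SSD, then flip.
        conn.execute("UPDATE DataItem SET tier = 'hot' WHERE id = ?", (item_id,))
    return promoted
```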

Cold Migration

Items not accessed in N days (configurable per tier) are migrated to cold storage:

  1. Background batch job queries DataItem WHERE last_accessed_at < now() - interval 'N days' AND tier = 'hot'.
  2. For each candidate: compress the item, upload to S3 (or HDD path), record the cold_storage_key in the metadata store.
  3. Mark the item as tier = 'cold' in the hot-tier metadata store.
  4. Optionally delete the hot-tier copy (or retain for a short grace period).

Migration is async and non-blocking. Reads during migration are served from the source (hot) tier until the migration flag is cleared.
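Step 1's candidate query can be sketched as follows. Oldest-first ordering and a batch limit keep each run bounded; SQLite with epoch-second timestamps again stands in for the Postgres query in the list above:

```python
import sqlite3
import time

def find_cold_candidates(conn: sqlite3.Connection,
                         cold_after_days: int = 30,
                         limit: int = 1000) -> list[int]:
    """Select hot items untouched for N days, oldest first, in bounded batches."""
    cutoff = time.time() - cold_after_days * 86400
    rows = conn.execute(
        "SELECT id FROM DataItem WHERE tier = 'hot' AND last_accessed_at < ? "
        "ORDER BY last_accessed_at LIMIT ?",
        (cutoff, limit),
    ).fetchall()
    return [r[0] for r in rows]
```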

Retrieval on Cold Miss

When a read request arrives for a cold item:

  1. Look up cold_storage_key from the metadata store.
  2. Fetch the compressed object from S3/HDD.
  3. Decompress and serve the item.
  4. Evaluate whether to re-promote: if the item's access pattern crosses the hot threshold again, schedule a promotion job.

Cold retrieval adds latency (10ms-1s depending on storage backend). Surface this to callers via a cache-hit header or metric so SLAs can be set appropriately.

Warm Tier (Optional Middle Layer)

A warm tier (e.g., HDD local disk) can sit between hot SSD and cold object storage. Items moving from hot to cold pass through warm first, allowing re-promotion without a full S3 round-trip. Warm tier adds operational complexity but reduces re-promotion latency.

SQL Schema

CREATE TABLE DataItem (
    id                  BIGSERIAL PRIMARY KEY,
    item_key            TEXT UNIQUE NOT NULL,
    tier                TEXT NOT NULL DEFAULT 'hot' CHECK (tier IN ('hot','warm','cold')),
    access_count_30d    INT NOT NULL DEFAULT 0,
    last_accessed_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    cold_storage_key    TEXT,
    size_bytes          BIGINT,
    checksum            TEXT
);

CREATE INDEX idx_item_tier_access ON DataItem (tier, last_accessed_at);

CREATE TABLE TierMigrationJob (
    id              BIGSERIAL PRIMARY KEY,
    item_id         BIGINT REFERENCES DataItem(id),
    from_tier       TEXT NOT NULL,
    to_tier         TEXT NOT NULL,
    started_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    completed_at    TIMESTAMPTZ,
    bytes_migrated  BIGINT,
    status          TEXT NOT NULL DEFAULT 'running' CHECK (status IN ('running','complete','failed'))
);

CREATE TABLE AccessEvent (
    id          BIGSERIAL PRIMARY KEY,
    item_id     BIGINT REFERENCES DataItem(id),
    accessed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_access_item_time ON AccessEvent (item_id, accessed_at);

Python Implementation Sketch

import boto3, gzip, json
from datetime import datetime, timezone

class TieringManager:
    def __init__(self, db, s3_bucket: str):
        self.db = db
        self.s3 = boto3.client('s3')
        self.s3_bucket = s3_bucket
        self.hot_threshold = 100   # accesses in 30 days
        self.cold_after_days = 30  # migrate if not accessed in N days

    def track_access(self, item_id: int):
        self.db.execute(
            "INSERT INTO AccessEvent (item_id) VALUES (%s)", (item_id,)
        )
        self.db.execute(
            "UPDATE DataItem SET access_count_30d = access_count_30d + 1, last_accessed_at = now() WHERE id = %s",
            (item_id,)
        )

    def evaluate_tier(self, item_id: int) -> str:
        row = self.db.fetchone(
            "SELECT tier, access_count_30d, last_accessed_at FROM DataItem WHERE id = %s", (item_id,)
        )
        if not row:
            return 'unknown'
        if row['tier'] == 'cold' and row['access_count_30d'] >= self.hot_threshold:
            return 'promote'
        if row['tier'] == 'hot' and (datetime.now(timezone.utc) - row['last_accessed_at']).days >= self.cold_after_days:
            return 'demote'
        return row['tier']

    def migrate_to_cold(self, item_id: int):
        row = self.db.fetchone("SELECT * FROM DataItem WHERE id = %s", (item_id,))
        job_id = self.db.fetchone(
            "INSERT INTO TierMigrationJob (item_id, from_tier, to_tier) VALUES (%s, %s, %s) RETURNING id",
            (item_id, 'hot', 'cold')
        )['id']
        try:
            data = self._load_from_hot(row['item_key'])
            compressed = gzip.compress(json.dumps(data).encode())
            cold_key = f"cold/{item_id}/{row['item_key']}"
            self.s3.put_object(Bucket=self.s3_bucket, Key=cold_key, Body=compressed)
            self.db.execute(
                "UPDATE DataItem SET tier = 'cold', cold_storage_key = %s WHERE id = %s",
                (cold_key, item_id)
            )
            self.db.execute(
                "UPDATE TierMigrationJob SET completed_at = now(), bytes_migrated = %s, status = 'complete' WHERE id = %s",
                (len(compressed), job_id)
            )
        except Exception:
            self.db.execute("UPDATE TierMigrationJob SET status = 'failed' WHERE id = %s", (job_id,))
            raise

    def retrieve_cold(self, item_id: int) -> dict:
        row = self.db.fetchone("SELECT cold_storage_key FROM DataItem WHERE id = %s", (item_id,))
        if not row or not row['cold_storage_key']:
            raise KeyError(f"item {item_id} has no cold-tier copy")
        obj = self.s3.get_object(Bucket=self.s3_bucket, Key=row['cold_storage_key'])
        compressed = obj['Body'].read()
        data = json.loads(gzip.decompress(compressed).decode())
        self.track_access(item_id)
        if self.evaluate_tier(item_id) == 'promote':
            # Schedule async promotion back to hot
            pass
        return data

    def _load_from_hot(self, item_key: str) -> dict:
        return self.db.fetchone("SELECT * FROM hot_storage WHERE item_key = %s", (item_key,))

FAQ

How does access decay work in hot-cold tiering?
Access decay reduces the weight of older accesses over time, ensuring that items that were popular long ago do not block cold migration indefinitely. In LFU with exponential decay, the access count is halved periodically (e.g., every 12 hours). A sliding window count achieves similar results by only counting accesses within a fixed recent window (e.g., 30 days), letting old accesses fall off naturally.

What threshold triggers migration to the cold tier?
Migration is triggered when an item has not been accessed for N days (e.g., 30 days). The threshold is configurable per data class. The tiering job queries for hot-tier items where last_accessed_at is older than the threshold and schedules them for cold migration. Items near the threshold can be placed in a warm tier as a buffer before full cold migration.

What is the retrieval latency for cold-tier data?
Cold-tier retrieval latency depends on the storage backend. S3 GET requests typically complete in 10-100ms for small objects. HDD sequential reads may be 1-10ms but require the object to be on a local or NAS-attached disk. Decompression adds CPU time proportional to object size. Applications should surface cold hits to callers via response headers or metrics so SLAs can account for the higher p99 latency.

When should a cold item be re-promoted to the hot tier?
Re-promotion should be considered when a cold item's access count crosses the hot-tier promotion threshold again within the tracking window. This is evaluated after each cold retrieval. Re-promotion is executed asynchronously by the tiering job to avoid blocking the read path. A hysteresis margin (e.g., requiring 2x the demotion threshold for re-promotion) prevents thrashing between tiers.

How is access frequency tracked for tiering decisions?
Access frequency is tracked using a sliding-window counter or an approximate data structure such as a Count-Min Sketch per object key, recording read and write hits over configurable time windows (e.g., last 1h and last 24h). A tiering daemon periodically evaluates each object's frequency score against configurable hot/cold thresholds and marks objects for promotion or demotion accordingly.

How is data migrated between tiers without downtime?
Migration follows a copy-then-delete pattern: the object is written to the destination tier first, the metadata pointer is atomically updated to reference the new location, and only then is the source copy deleted. During the migration window, reads are served from the source tier or redirected transparently, so clients experience no interruption.

How does a read from the cold tier work?
A read miss on the hot tier triggers a fetch from cold storage (e.g., object store or tape), which is returned to the caller while optionally being promoted back to the hot tier if access frequency warrants it. Because cold-tier reads can have latency measured in seconds or minutes, the system may queue the request and deliver results asynchronously, surfacing a 202 Accepted response and a polling endpoint to the caller.

How are tiering thresholds tuned?
Thresholds are tuned by modeling the cost function that balances storage cost per tier against the latency and bandwidth penalty of cold reads, then selecting the access-frequency cutoff that minimizes total cost given observed access distributions. Adaptive systems continuously adjust thresholds using feedback from hit-rate metrics, storage utilization, and egress costs rather than relying on static configuration.
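The hysteresis rule for re-promotion can be sketched as a pure function. The 2x margin and 100-access threshold are illustrative defaults, not prescribed values:

```python
def should_promote(count_30d: int, previously_demoted: bool,
                   threshold: int = 100, hysteresis: float = 2.0) -> bool:
    """Require a higher bar for re-promotion than for first promotion,
    so items hovering near the threshold don't thrash between tiers."""
    bar = threshold * hysteresis if previously_demoted else threshold
    return count_30d > bar
```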
