Question 1

How does access decay work in hot-cold tiering?

Accepted Answer

Access decay reduces the weight of older accesses over time, ensuring that items that were popular long ago do not block cold migration indefinitely. In LFU with exponential decay, the access count is halved periodically (e.g., every 12 hours). A sliding window count achieves similar results by only counting accesses within a fixed recent window (e.g., 30 days), letting old accesses fall off naturally.
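Both decay strategies can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: the names `decayed_count` and `SlidingWindowCounter` are hypothetical, and the decay is applied continuously with a 12-hour half-life, which matches periodic halving at each 12-hour boundary.

```python
from collections import deque

HALF_LIFE_SECONDS = 12 * 3600  # assumed half-life, taken from the 12-hour example


def decayed_count(count: float, elapsed_seconds: float,
                  half_life: float = HALF_LIFE_SECONDS) -> float:
    """Return the access count after exponential decay over elapsed_seconds."""
    return count * 0.5 ** (elapsed_seconds / half_life)


class SlidingWindowCounter:
    """Counts only accesses that fall within the last `window_seconds`."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.timestamps = deque()  # access times, oldest first

    def record(self, now: float) -> None:
        self.timestamps.append(now)

    def count(self, now: float) -> int:
        # Drop accesses that have aged out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        return len(self.timestamps)
```

Storing one timestamp per access is exact but memory-hungry; a production system would more likely keep per-bucket counts (e.g., one counter per day) and sum the buckets inside the window.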

Question 2

What threshold triggers migration to the cold tier?

Accepted Answer

Migration is triggered when an item has not been accessed for N days (e.g., 30 days). The threshold is configurable per data class. The tiering job queries for hot-tier items where last_accessed_at is older than the threshold and schedules them for cold migration. Items near the threshold can be placed in a warm tier as a buffer before full cold migration.
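The candidate query might look like the sketch below. The table name `objects`, its columns, and the function `find_cold_candidates` are assumptions made for illustration (only `last_accessed_at` appears in the answer above); timestamps are stored as epoch seconds to keep the example short.

```python
import sqlite3

# Hypothetical minimal schema; a real deployment would carry more columns.
SCHEMA = """
CREATE TABLE objects (
    object_key       TEXT PRIMARY KEY,
    tier             TEXT NOT NULL,
    data_class       TEXT NOT NULL,
    last_accessed_at INTEGER NOT NULL  -- epoch seconds
)
"""


def find_cold_candidates(conn: sqlite3.Connection, now: int,
                         threshold_days_by_class: dict) -> list:
    """Return keys of hot-tier objects idle longer than their class threshold."""
    candidates = []
    for data_class, days in threshold_days_by_class.items():
        cutoff = now - days * 86400
        rows = conn.execute(
            "SELECT object_key FROM objects "
            "WHERE tier = 'hot' AND data_class = ? AND last_accessed_at < ? "
            "ORDER BY object_key",
            (data_class, cutoff),
        ).fetchall()
        candidates.extend(key for (key,) in rows)
    return candidates
```

An index on `(tier, data_class, last_accessed_at)` would keep this scan cheap as the table grows.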

Question 3

What is the retrieval latency for cold-tier data?

Accepted Answer

Cold-tier retrieval latency depends on the storage backend. S3 GET requests typically complete in 10-100ms for small objects. A read from a local or NAS-attached HDD may add only 1-10ms, but it requires the object to live on such a disk. Archival backends such as tape can take seconds to minutes. Decompression adds CPU time proportional to object size. Applications should surface cold hits to callers via response headers or metrics so that SLAs can account for the higher p99 latency.

Question 4

When should a cold item be re-promoted to the hot tier?

Accepted Answer

Re-promotion should be considered when a cold item's access count crosses the hot-tier promotion threshold again within the tracking window. This is evaluated after each cold retrieval. Re-promotion is executed asynchronously by the tiering job to avoid blocking the read path. A hysteresis margin (e.g., require 2x the demotion threshold for re-promotion) prevents thrashing between tiers.
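The hysteresis rule can be expressed as a small decision function. This is a sketch under the assumption that both demotion and promotion are driven by an access count within the tracking window; `next_tier` and its parameters are hypothetical names.

```python
def next_tier(current_tier: str, access_count: int,
              demote_below: int, hysteresis: float = 2.0) -> str:
    """Decide the tier for an object, with a gap between the two thresholds.

    Hot objects are demoted when the count drops below `demote_below`;
    cold objects are promoted only at `demote_below * hysteresis` or above.
    """
    promote_at = demote_below * hysteresis
    if current_tier == "cold" and access_count >= promote_at:
        return "hot"
    if current_tier == "hot" and access_count < demote_below:
        return "cold"
    return current_tier
```

With `demote_below=10`, an object whose count hovers between 10 and 19 stays wherever it already is, which is what prevents thrashing.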

Question 5

How is access frequency tracked for tiering decisions?

Accepted Answer

Access frequency is tracked either with a sliding-window counter per object key or with a shared approximate structure such as a Count-Min Sketch, which estimates per-key counts in fixed memory. Read and write hits are recorded over configurable time windows (e.g., last 1h and last 24h). A tiering daemon periodically evaluates each object's frequency score against configurable hot/cold thresholds and marks objects for promotion or demotion accordingly.
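For readers unfamiliar with the Count-Min Sketch, a minimal version is shown below. It is a sketch for illustration only: the width, depth, and choice of salted BLAKE2b as the hash family are assumptions, and windowing (e.g., one sketch per hour) is left out.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counter; estimates never undercount."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _indexes(self, key: str):
        # One independent hash per row, derived by salting BLAKE2b with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in self._indexes(key):
            self.rows[row][col] += count

    def estimate(self, key: str) -> int:
        # Collisions only inflate cells, so the minimum is the tightest estimate.
        return min(self.rows[row][col] for row, col in self._indexes(key))
```

Because estimates can only err upward, a sketch may occasionally keep a cold object in the hot tier a little too long, but it will not demote a genuinely hot one because of a hash collision.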

Question 6

How is data migrated between tiers without downtime?

Accepted Answer

Migration follows a copy-then-delete pattern: the object is written to the destination tier first, the metadata pointer is atomically updated to reference the new location, and only then is the source copy deleted. Reads are served from the source tier until the pointer is updated and from the destination tier afterward, so clients experience no interruption.
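The three steps can be illustrated with an in-memory stand-in for the two tiers. Everything here is hypothetical (`TieredStore`, the dict-backed tiers, the single lock); a real system would use its metadata store's atomic update instead of a lock, and would also need to handle writes that arrive mid-migration, which this sketch ignores.

```python
import threading


class TieredStore:
    """Toy two-tier store demonstrating copy, pointer flip, then delete."""

    def __init__(self):
        self.tiers = {"hot": {}, "cold": {}}
        self.location = {}  # object key -> tier name (the metadata pointer)
        self._lock = threading.Lock()

    def put(self, key, data, tier="hot"):
        self.tiers[tier][key] = data
        with self._lock:
            self.location[key] = tier

    def get(self, key):
        # Resolve the pointer and read under one lock so the source copy
        # cannot be deleted between the two steps.
        with self._lock:
            return self.tiers[self.location[key]][key]

    def migrate(self, key, dest):
        with self._lock:
            source = self.location[key]
        if source == dest:
            return
        # 1. Copy to the destination while reads still hit the source.
        self.tiers[dest][key] = self.tiers[source][key]
        # 2. Atomically flip the metadata pointer.
        with self._lock:
            self.location[key] = dest
        # 3. Only now remove the source copy.
        del self.tiers[source][key]
```

If the process crashes between steps 1 and 2, the pointer still references the intact source copy and the orphaned destination copy can be garbage-collected later.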

Question 7

How does a read from cold tier work?

Accepted Answer

A read miss on the hot tier triggers a fetch from cold storage (e.g., object store or tape). The data is returned to the caller and is optionally promoted back to the hot tier if access frequency warrants it. Because reads from slow cold backends can have latency measured in seconds or minutes, the system may queue the request and deliver results asynchronously, returning a 202 Accepted response with a polling endpoint to the caller.
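One way to picture this read path is a handler that returns an HTTP-style `(status, headers, body)` tuple. The sketch below is hypothetical throughout: the `X-Storage-Tier` header, the `/retrievals/<id>` polling path, the `cold_is_slow` flag, and the dict-backed tiers are illustrative assumptions, not part of any particular framework.

```python
import uuid


def read_object(key, hot, cold, jobs, cold_is_slow):
    """Serve a read; fall back to cold storage on a hot-tier miss."""
    if key in hot:
        return 200, {"X-Storage-Tier": "hot"}, hot[key]
    if key not in cold:
        return 404, {}, None
    if not cold_is_slow:
        # Fast cold backend: answer inline, but flag the cold hit.
        return 200, {"X-Storage-Tier": "cold"}, cold[key]
    # Slow backend: queue the retrieval and hand back a polling location.
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"key": key, "status": "pending"}
    return 202, {"Location": f"/retrievals/{job_id}"}, None


def poll_retrieval(job_id, jobs, cold):
    """Polling endpoint: 202 while pending, 200 with the data once done."""
    job = jobs[job_id]
    if job["status"] != "done":
        return 202, {}, None
    return 200, {"X-Storage-Tier": "cold"}, cold[job["key"]]
```

A background worker (not shown) would perform the actual retrieval and set the job's status to `"done"`.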

Question 8

How are tiering thresholds tuned?

Accepted Answer

Thresholds are tuned by modeling the cost function that balances storage cost per tier against the latency and bandwidth penalty of cold reads, then selecting the access-frequency cutoff that minimizes total cost given observed access distributions. Adaptive systems continuously adjust thresholds using feedback from hit-rate metrics, storage utilization, and egress costs rather than relying on static configuration.
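A toy version of that cost model is sketched below. The pricing parameters, the per-object `(size_gb, reads_per_month)` representation, and the function names are all assumptions for illustration; latency penalties are folded into a single per-GB cold-read price, which is a simplification.

```python
def total_monthly_cost(objects, cutoff, hot_price, cold_price, cold_read_price):
    """Cost of placing objects with reads >= cutoff in hot, the rest in cold.

    objects: iterable of (size_gb, reads_per_month)
    hot_price, cold_price: storage price per GB-month
    cold_read_price: retrieval/egress price per GB read from cold
    """
    cost = 0.0
    for size_gb, reads in objects:
        if reads >= cutoff:
            cost += size_gb * hot_price
        else:
            cost += size_gb * cold_price + reads * size_gb * cold_read_price
    return cost


def best_cutoff(objects, candidates, hot_price, cold_price, cold_read_price):
    """Pick the candidate access-frequency cutoff with the lowest total cost."""
    return min(
        candidates,
        key=lambda c: total_monthly_cost(
            objects, c, hot_price, cold_price, cold_read_price
        ),
    )
```

An adaptive system could rerun `best_cutoff` on a schedule against the most recent access distribution rather than fixing the cutoff in configuration.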

Hot-Cold Data Tiering Low-Level Design: Access Pattern Detection, Tier Migration, and Storage Cost Optimization

Why Hot-Cold Tiering?

Access Frequency Tracking

Hot Promotion Threshold

Cold Migration

Retrieval on Cold Miss

Warm Tier (Optional Middle Layer)

SQL Schema

Python Implementation Sketch