What Is an Edge Cache Service?
An Edge Cache Service is a low-latency, high-throughput caching layer deployed at the network edge, physically close to end users. Unlike a centralized cache (e.g., a Redis cluster in one data center), edge caches are co-located with CDN PoPs, ISP exchange points, or on-premises infrastructure to minimize round-trip time. They store computed responses, database query results, and static assets so that downstream services are shielded from repeated expensive work.
Edge caches are a foundational component in modern distributed architectures. Services like Varnish, Nginx proxy cache, Cloudflare Workers KV, and AWS Lambda@Edge all implement variants of this pattern.
Data Model and Schema
The core data structures of an edge cache are kept deliberately simple for maximum throughput.
-- In-memory cache entry (conceptual)
STRUCT CacheEntry {
    key          : string   -- normalized request fingerprint
    value        : bytes    -- serialized response body
    headers      : map      -- response headers to replay
    created_at   : unix_ts
    expires_at   : unix_ts
    etag         : string
    size         : int      -- bytes
    access_count : int
}

-- LRU eviction index (doubly linked list + hash map)
STRUCT LRUIndex {
    head     : *Node  -- most recently used
    tail     : *Node  -- least recently used (eviction candidate)
    lookup   : HashMap<key, *Node>
    capacity : int    -- max entries or max bytes
}
Persistent metadata (for warm restarts or multi-node coordination) can be written to a local RocksDB instance or pushed to a shared distributed store such as Redis or etcd.
Core Algorithm and Workflow
The edge cache lifecycle consists of four operations: get, set, evict, and invalidate.
function get(key):
    entry = lru_index.lookup(key)
    if entry is null:
        return MISS
    if entry.expires_at < now():
        lru_index.remove(key)
        current_bytes -= entry.size   -- keep byte accounting in sync on removal
        return MISS
    lru_index.move_to_head(entry)
    entry.access_count++
    return HIT(entry.value, entry.headers)

function set(key, value, headers, ttl):
    if size(value) > MAX_OBJECT_SIZE:
        return   -- skip oversized objects
    old = lru_index.lookup(key)
    if old is not null:
        lru_index.remove(key)         -- replace an existing entry cleanly
        current_bytes -= old.size
    while current_bytes + size(value) > capacity_bytes:
        evict_lru()
    entry = CacheEntry{key, value, headers,
                       created_at: now(), expires_at: now() + ttl,
                       etag: headers.get("ETag"),
                       size: size(value), access_count: 0}
    lru_index.insert_at_head(entry)
    current_bytes += entry.size       -- account for the newly inserted bytes

function evict_lru():
    victim = lru_index.tail
    lru_index.remove(victim.key)
    current_bytes -= victim.size

function invalidate(pattern):
    for key in lru_index.keys():
        if matches(key, pattern):
            entry = lru_index.lookup(key)
            lru_index.remove(key)
            current_bytes -= entry.size
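The lifecycle above can be sketched as a small Python class, using collections.OrderedDict to stand in for the doubly linked list plus hash map (the class and field names are illustrative, not a production implementation):

```python
import time
from collections import OrderedDict

class EdgeCache:
    """Minimal TTL + byte-capped LRU cache (illustrative sketch)."""

    def __init__(self, capacity_bytes, max_object_size):
        self.capacity_bytes = capacity_bytes
        self.max_object_size = max_object_size
        self.current_bytes = 0
        self.entries = OrderedDict()  # key -> (value, headers, expires_at, size)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(key)
        if entry is None:
            return None  # MISS
        value, headers, expires_at, size = entry
        if expires_at < now:
            del self.entries[key]
            self.current_bytes -= size  # keep byte accounting in sync
            return None  # expired -> MISS
        self.entries.move_to_end(key)  # mark most recently used
        return value, headers  # HIT

    def set(self, key, value, headers, ttl, now=None):
        now = time.time() if now is None else now
        size = len(value)
        if size > self.max_object_size:
            return  # skip oversized objects
        if key in self.entries:
            self.current_bytes -= self.entries.pop(key)[3]
        while self.entries and self.current_bytes + size > self.capacity_bytes:
            self._evict_lru()
        self.entries[key] = (value, headers, now + ttl, size)
        self.current_bytes += size

    def _evict_lru(self):
        # popitem(last=False) removes the least recently used entry.
        _, (_, _, _, size) = self.entries.popitem(last=False)
        self.current_bytes -= size
```

Note that eviction and expiry both decrement `current_bytes`; forgetting either is a common source of slow memory-accounting drift.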
Cache key construction is critical. A typical key combines: scheme + host + path + sorted_query_params + vary_headers. The Vary HTTP header instructs the cache to store separate entries for different Accept-Encoding or Accept-Language values.
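A key-construction helper along those lines might look like this (a sketch; the `vary_headers` list would normally come from the upstream response's Vary header):

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

def cache_key(url, request_headers, vary_headers=()):
    """Build a normalized cache key from scheme + host + path +
    sorted query params + the request header values named by Vary."""
    parts = urlsplit(url)
    # Sorting query params makes ?a=1&b=2 and ?b=2&a=1 hit the same entry.
    query = urlencode(sorted(parse_qsl(parts.query)))
    vary = "|".join(
        f"{h.lower()}={request_headers.get(h, '')}" for h in sorted(vary_headers)
    )
    return f"{parts.scheme}://{parts.netloc}{parts.path}?{query}#{vary}"
```

With `Accept-Encoding` in the Vary list, gzip and brotli responses get distinct entries, while query-parameter order no longer fragments the cache.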
Failure Handling and Performance
- Thundering herd protection: Use a mutex or promise-based lock per cache key so that only one goroutine/thread fetches from the upstream when a key expires. All others wait and share the result.
- Stale-while-revalidate: Serve the stale cached entry immediately while triggering an asynchronous background refresh. This eliminates latency spikes on expiry.
- Negative caching: Cache 404 and 503 responses for a short TTL to prevent repeated upstream calls for non-existent resources under adversarial or bursty traffic.
- Memory pressure: Monitor cache byte usage continuously. When memory exceeds a soft limit, begin proactive LRU eviction rather than waiting for hard limits to trigger OOM kills.
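The first bullet, request coalescing behind a per-key lock, can be sketched with standard threading primitives (the `Coalescer` name and structure are illustrative assumptions, not a specific library's API):

```python
import threading

class Coalescer:
    """Collapse concurrent misses for the same key into one upstream fetch."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (Event, shared result holder)

    def fetch(self, key, upstream):
        with self._lock:
            pending = self._inflight.get(key)
            if pending is None:
                # First caller for this key becomes the leader.
                event, holder = threading.Event(), {}
                self._inflight[key] = (event, holder)
                leader = True
            else:
                event, holder = pending
                leader = False
        if leader:
            try:
                holder["value"] = upstream(key)  # only the leader hits upstream
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()  # wake all waiting followers
        else:
            event.wait()  # followers block, then share the leader's result
        return holder.get("value")
```

Followers that arrive while a fetch is in flight never touch the origin; they wake when the leader's result lands and return the same value.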
Scalability Considerations
- Consistent hashing: When multiple edge cache nodes form a cluster, consistent hashing routes requests for the same key to the same node, maximizing hit rates and minimizing cross-node traffic.
- Hot key detection: Monitor per-key access frequency. Extremely hot keys can be replicated to all nodes (broadcast invalidation required) or handled via local micro-caches at the application layer.
- Compression: Compress cache values with LZ4 or Zstandard to increase effective cache capacity. LZ4 is preferred at the edge due to its low decompression latency.
- Write-through vs write-behind: For caches that also back a datastore, write-through updates the cache and store synchronously. Write-behind batches store updates asynchronously for higher write throughput at the cost of potential data loss on crash.
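The consistent-hashing bullet can be sketched as a simple hash ring with virtual nodes (the vnode count and md5-based hash here are illustrative choices, not a prescribed scheme):

```python
import bisect
import hashlib

class HashRing:
    """Map keys to cache nodes via consistent hashing with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        # Stable 64-bit hash; any well-distributed hash works here.
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key):
        # First vnode clockwise from the key's hash; wrap around at the end.
        i = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[i][1]
```

The payoff is that removing a node only remaps the keys that lived on it; every other key keeps routing to the same node, preserving those nodes' hit rates.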
Summary
An Edge Cache Service is a near-user caching layer that uses LRU eviction, TTL-based expiry, and careful cache key design to deliver low-latency responses at scale. The most important design choices are the eviction policy, thundering herd protection, and invalidation strategy. In system design interviews, be prepared to discuss how you handle cache stampedes, how you size memory capacity, and how you coordinate invalidation across a distributed fleet of edge nodes.
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is an edge cache and how does it differ from a CDN?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "An edge cache is a caching layer positioned at the network edge, close to end users, that stores copies of content to reduce round-trip time and origin load. While a CDN is a full commercial service with global PoPs, routing, and management planes, an edge cache is the specific caching component within that infrastructure. Companies like Meta and Google operate their own proprietary edge caches tuned to their traffic patterns rather than using third-party CDNs."
      }
    },
    {
      "@type": "Question",
      "name": "What eviction policies are used in edge cache design?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Common eviction policies include LRU (Least Recently Used), LFU (Least Frequently Used), and SLRU (Segmented LRU). At the edge, LRU variants dominate because access patterns are bursty and recency is a strong predictor of future demand. Some implementations use TinyLFU or W-TinyLFU (as in Caffeine) to better handle scan-resistant and frequency-skewed workloads."
      }
    },
    {
      "@type": "Question",
      "name": "How do you handle cache stampedes in an edge cache?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Cache stampedes occur when a popular cached item expires and many requests simultaneously reach the origin. Mitigations include: request coalescing (collapsing concurrent cache-miss requests into a single origin fetch), probabilistic early expiry (recomputing the cache entry slightly before it expires), mutex-based locking per cache key, and serving stale content while revalidating asynchronously in the background."
      }
    },
    {
      "@type": "Question",
      "name": "How is consistency maintained between edge cache nodes?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Edge caches are typically designed for eventual consistency rather than strong consistency, since strict coordination across geographically distributed nodes is prohibitively expensive. Techniques include TTL-based expiry, push-based invalidation via a pub/sub control plane, and versioned cache keys tied to content hashes. For write-heavy or highly personalized content, requests are often routed past the cache to origin directly."
      }
    }
  ]
}
See also: Scale AI Interview Guide 2026: Data Infrastructure, RLHF Pipelines, and ML Engineering
See also: Meta Interview Guide 2026: Facebook, Instagram, WhatsApp Engineering