A memory allocator manages heap memory — fulfilling malloc/new requests and returning freed memory for reuse. The C standard library allocator (ptmalloc), jemalloc (used by Firefox, Redis, Rust), tcmalloc (used by Go, Chrome), and mimalloc implement different strategies for trading off fragmentation, throughput, cache efficiency, and multi-threaded performance. Understanding allocator design is valuable for systems programming interviews and for explaining why language runtimes and databases implement custom allocators.
Free List and Segregated Size Classes
A naive allocator maintains a single linked list of free memory blocks. malloc() searches for a block large enough (first-fit or best-fit), splits it if larger, and returns a pointer. free() prepends the block to the free list. Problems: O(n) search time, and fragmentation (many small free blocks, none able to satisfy a large allocation). Modern allocators use size classes: memory is organized into fixed-size buckets (8, 16, 32, 64, 128, 256 bytes…). A malloc(20) request is rounded up to the next size class (32 bytes) and served from that class’s free list in O(1). Size classes accept bounded internal fragmentation in exchange for far less external fragmentation, and they make allocation and deallocation O(1) for small objects.
// Simplified size-class allocator
typedef struct FreeBlock {
    struct FreeBlock *next;   // singly-linked list of free blocks
} FreeBlock;

typedef struct {
    size_t size;
    FreeBlock *free_list;
} SizeClass;

static SizeClass size_classes[NUM_SIZE_CLASSES];

// jemalloc-style size classes (bytes):
// Small: 8, 16, 32, 48, 64, 80, 96, 112, 128, 192, 256, 320, 384, 448, 512
// Medium: 1KB, 2KB, 4KB, 8KB
// Large: served directly from the OS via mmap

void *malloc(size_t size) {
    int sc = size_class_index(size);   // round up to the smallest class that fits
    if (size_classes[sc].free_list != NULL) {
        // O(1): pop the head of the free list
        FreeBlock *block = size_classes[sc].free_list;
        size_classes[sc].free_list = block->next;
        return (void *)block;
    }
    // Refill: request a new slab from the OS, partition it into blocks
    return refill_and_alloc(sc);
}
Thread-Local Caches (TCMalloc/jemalloc)
Multi-threaded allocation with a shared free list requires locking — a bottleneck for high-throughput servers. TCMalloc (Thread-Caching Malloc) gives each thread its own free-list cache (thread-local storage, no locks). Most small allocations are served from the thread-local cache without any synchronization. When the thread-local cache empties, it refills in batches from a central cache (with locking); when it grows too full, it returns blocks to the central cache. This design achieves near-zero contention for common allocation patterns. jemalloc combines per-thread caches (tcaches) with a fixed set of shared arenas — by default a small multiple of the CPU count — which bounds memory overhead when threads greatly outnumber CPUs; modern TCMalloc can likewise operate in a per-CPU cache mode.
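A minimal two-level cache for a single size class might look like the sketch below. The names (cached_alloc, cached_free) and the batch sizes are illustrative, not tcmalloc's real API; the point is the lock-free fast path and the batched, locked slow path:

```c
#include <stddef.h>
#include <pthread.h>

typedef struct FreeBlock { struct FreeBlock *next; } FreeBlock;

/* Central cache shared by all threads: protected by a lock. */
static FreeBlock *central_list;
static pthread_mutex_t central_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread cache: the fast path needs no synchronization. */
static __thread FreeBlock *local_list;
static __thread int local_count;

enum { REFILL_BATCH = 8, LOCAL_MAX = 16 };

void *cached_alloc(void) {
    if (!local_list) {                 /* slow path: batch refill under lock */
        pthread_mutex_lock(&central_lock);
        for (int i = 0; i < REFILL_BATCH && central_list; i++) {
            FreeBlock *b = central_list;
            central_list = b->next;
            b->next = local_list;      /* push onto the local list */
            local_list = b;
            local_count++;
        }
        pthread_mutex_unlock(&central_lock);
    }
    if (!local_list) return NULL;      /* central cache exhausted too */
    FreeBlock *b = local_list;         /* fast path: no lock */
    local_list = b->next;
    local_count--;
    return b;
}

void cached_free(void *p) {
    FreeBlock *b = p;                  /* fast path: push locally */
    b->next = local_list;
    local_list = b;
    if (++local_count > LOCAL_MAX) {   /* too full: return a batch */
        pthread_mutex_lock(&central_lock);
        for (int i = 0; i < REFILL_BATCH && local_list; i++) {
            FreeBlock *x = local_list;
            local_list = x->next;
            x->next = central_list;
            central_list = x;
            local_count--;
        }
        pthread_mutex_unlock(&central_lock);
    }
}
```

Batching the refill amortizes one lock acquisition over many allocations, which is why contention stays near zero even under heavy allocation load.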
Memory Fragmentation
Internal fragmentation: allocated block is larger than requested (malloc(20) allocates a 32-byte size class block — 12 bytes wasted). External fragmentation: free memory exists but is split into non-contiguous chunks too small to satisfy large requests. Heap compaction (moving live objects to eliminate gaps) solves external fragmentation but requires updating all pointers — practical in garbage-collected languages (JVM, Go) but impossible in C/C++ with raw pointers. Redis uses jemalloc and periodically calls MEMORY PURGE to return unused pages to the OS. Postgres uses its own memory context allocator (palloc) that allocates from slabs per query — all freed at query end, avoiding per-object free() overhead entirely.
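The internal-fragmentation cost is easy to quantify: with power-of-two size classes the worst case wastes just under half the block. A toy arithmetic sketch (not any real allocator's class table):

```c
#include <stddef.h>

/* Round a request up to the next power-of-two class (minimum 8 bytes). */
size_t round_up_pow2(size_t n) {
    size_t c = 8;
    while (c < n) c <<= 1;
    return c;
}

/* Internal fragmentation = class size minus requested size.
 * Worst case for power-of-two classes: 33 bytes -> 64-byte class,
 * wasting 31 bytes (just under 50%). */
size_t internal_waste(size_t requested) {
    return round_up_pow2(requested) - requested;
}
```

This is why production allocators add intermediate classes (48, 80, 96, …): denser spacing caps the waste at a much smaller fraction of each block.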
Arena Allocators and Region-Based Memory
An arena allocator allocates memory linearly from a large pre-allocated block (bump pointer allocation). Allocation is O(1) (just increment a pointer). Deallocation of individual objects is a no-op — the entire arena is freed at once when all objects are no longer needed. Perfect for request-scoped allocations: allocate all memory for processing one HTTP request into an arena; free the entire arena when the response is sent. No per-object free() calls, no fragmentation, no synchronization overhead. Databases (PostgreSQL MemoryContext), compilers (LLVM BumpPtrAllocator), and web servers use arena allocators for request handling. Drawback: cannot free individual objects — must free all or nothing.
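A bump-pointer arena fits in a few lines. This sketch uses a single fixed-capacity chunk for clarity; real arenas (PostgreSQL's MemoryContext, LLVM's BumpPtrAllocator) chain new chunks as they fill:

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    char  *base;       /* start of the pre-allocated block */
    size_t used;       /* bump pointer offset */
    size_t capacity;
} Arena;

Arena arena_create(size_t capacity) {
    Arena a = { malloc(capacity), 0, capacity };
    return a;
}

void *arena_alloc(Arena *a, size_t size) {
    size = (size + 7) & ~(size_t)7;           /* 8-byte alignment */
    if (a->used + size > a->capacity)
        return NULL;                          /* real arenas grab a new chunk */
    void *p = a->base + a->used;              /* O(1): bump the pointer */
    a->used += size;
    return p;
}

/* Freeing an individual object is a no-op; reset reclaims everything. */
void arena_reset(Arena *a)   { a->used = 0; }
void arena_destroy(Arena *a) { free(a->base); a->base = NULL; }
```

For the request-scoped pattern described above, the handler calls arena_alloc freely while building the response, then one arena_reset (or arena_destroy) when the response is sent.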
Key Interview Discussion Points
- sbrk vs mmap: glibc's malloc grows the heap contiguously with sbrk() for small allocations; large allocations (above MMAP_THRESHOLD, 128KB by default) use mmap() for anonymous memory — mmap'd pages can be returned to the OS independently
- Memory pools: pre-allocate a fixed number of same-size objects and manage them with a free list — eliminates fragmentation and provides O(1) allocation for known object types (network packet buffers, database row objects)
- False sharing: two threads allocating adjacent memory on the same CPU cache line cause cache-coherence traffic when either modifies its object — allocators mitigate this by serving each thread from separate slabs, and performance-critical shared structures are explicitly aligned to the cache line size (typically 64 bytes)
- Garbage collector pressure: in GC languages, allocation rate directly drives GC frequency; allocating in hot paths triggers more GC pauses — object pooling (sync.Pool in Go, ObjectPool in Java) reduces allocation rate
- ASAN/valgrind: AddressSanitizer instruments every memory access to detect use-after-free, buffer overflows, and memory leaks — replaces the allocator with a shadow memory version that tracks allocation metadata
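The memory-pool point from the list above can be sketched as a fixed array of slots with the free list threaded through the free slots themselves (PACKET_SIZE and POOL_CAPACITY are illustrative values):

```c
#include <stddef.h>

#define PACKET_SIZE   256
#define POOL_CAPACITY 64

/* While a slot is free, its storage holds the free-list link;
 * while allocated, the same storage holds the payload. */
typedef union Slot {
    union Slot *next;
    char payload[PACKET_SIZE];
} Slot;

static Slot  pool[POOL_CAPACITY];
static Slot *pool_free;

void pool_init(void) {
    for (int i = 0; i < POOL_CAPACITY - 1; i++)
        pool[i].next = &pool[i + 1];
    pool[POOL_CAPACITY - 1].next = NULL;
    pool_free = &pool[0];
}

void *pool_alloc(void) {      /* O(1), zero fragmentation */
    if (!pool_free) return NULL;
    Slot *s = pool_free;
    pool_free = s->next;
    return s;
}

void pool_release(void *p) {  /* O(1): push back onto the free list */
    Slot *s = p;
    s->next = pool_free;
    pool_free = s;
}
```

Because every slot has the same size, there is no search, no splitting, and no external fragmentation — exactly the property that makes pools popular for packet buffers and row objects.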