Garbage collection determines when and how unused memory is reclaimed. Understanding GC internals helps you debug memory leaks, tune application performance, and answer system design questions about latency-sensitive services. This guide covers GC algorithms, JVM garbage collectors, GC tuning strategies, and memory leak detection — essential for senior backend engineering and performance engineering interviews.
Why Garbage Collection Matters for System Design
GC pauses directly impact application latency. A GC pause of 200ms in a service with a 100ms P99 latency target causes a latency spike that violates the SLO. In distributed systems, GC pauses can trigger cascading failures: a service pauses for GC, misses heartbeats, is marked as unhealthy by the load balancer, traffic shifts to remaining instances, overloading them. In consensus systems (Raft, ZooKeeper), a GC pause on the leader can cause a leader election, disrupting the entire cluster. Understanding GC behavior is critical for: choosing the right GC collector for your workload, sizing heap memory appropriately, diagnosing latency spikes, and designing systems that tolerate GC pauses. Languages with GC: Java (JVM), Go, Python, C#, JavaScript (V8). Languages without GC: C, C++ (manual memory management), Rust (ownership system, no runtime GC).
Mark-and-Sweep Algorithm
The foundational GC algorithm. Two phases: (1) Mark — starting from root references (stack variables, static fields, CPU registers), traverse all reachable objects and mark them as alive. Any object not reachable from a root is garbage. (2) Sweep — scan the entire heap and free memory occupied by unmarked (dead) objects. The mark phase uses graph traversal (DFS or BFS) to follow references from live objects to other live objects. Time: O(L) where L is the number of live objects (not total objects — dead objects are not traversed). The sweep phase is O(H) where H is the heap size (must scan every allocated block). Problem: stop-the-world — the application must be paused during marking to prevent reference graph changes. If an object is moved or a reference is updated during marking, the GC may incorrectly classify a live object as dead. This pause scales with heap size and number of live objects. A 10GB heap with millions of live objects can pause for seconds.
Generational Garbage Collection
The generational hypothesis: most objects die young. A web request creates temporary objects (request parsing, response building) that become garbage when the request completes. Long-lived objects (caches, connection pools) are a minority. Generational GC exploits this by dividing the heap into generations: Young Generation — newly allocated objects. Collected frequently (minor GC). Most objects die here, so minor GC is fast (small heap region, mostly garbage). Old Generation (Tenured) — objects that survived multiple minor GCs. Collected infrequently (major GC). Major GC is slower (larger region, more live objects). JVM default heap layout: Eden space (where new objects are allocated) + two Survivor spaces (objects that survive one minor GC are copied here) = Young Generation. After surviving N minor GCs (default N=15), objects are promoted to the Old Generation. Minor GC (young generation) typically takes 5-50ms. Major GC (old generation) can take 100ms-several seconds. Tuning strategy: size the young generation large enough that most short-lived objects die before promotion. This reduces pressure on the old generation and minimizes major GC frequency.
JVM Garbage Collectors
The JVM offers multiple collectors for different workloads: (1) G1 GC (Garbage-First) — the default since Java 9. Divides the heap into equal-sized regions (1-32MB). Collects regions with the most garbage first (hence “garbage-first”). Targets a configurable pause time (default 200ms). Mixes young and old generation collection incrementally. Best for: general-purpose applications with moderate heap sizes (4-32GB). (2) ZGC (Z Garbage Collector) — designed for ultra-low latency. Pause times under 1ms regardless of heap size (tested up to 16TB). Uses colored pointers and load barriers for concurrent marking and compaction. Best for: latency-sensitive applications (real-time bidding, trading systems, interactive services). Available since Java 15. (3) Shenandoah — similar goals to ZGC (sub-millisecond pauses). Uses Brooks pointers for concurrent compaction. Available in OpenJDK. (4) Parallel GC — maximizes throughput by using multiple threads for GC. Higher pause times but better total throughput. Best for: batch processing where throughput matters more than latency. (5) Serial GC — single-threaded. Best for: small heaps (<100MB) and single-core machines.
GC Tuning Strategies
Tuning goals: minimize GC pause frequency and duration while maintaining throughput. Key parameters (G1 GC): (1) Heap size (-Xms, -Xmx) — set min and max equal to avoid resizing overhead. Size: too small causes frequent GC, too large causes long pauses. Start with 2-4x the live data set size. (2) Pause time target (-XX:MaxGCPauseMillis=200) — G1 adjusts its behavior to meet this target. Lower values mean more frequent, shorter collections. (3) Region size (-XX:G1HeapRegionSize) — affects allocation and collection granularity. Auto-calculated based on heap size. (4) Young generation size — G1 auto-tunes this to meet the pause target. Override with -XX:NewRatio if needed. Monitoring: enable GC logging (-Xlog:gc*:file=gc.log). Analyze with GCViewer or GCEasy. Key metrics: GC pause duration (P99), GC frequency, allocation rate, promotion rate, and heap occupancy after GC. Red flags: frequent full GCs (heap too small or memory leak), increasing heap occupancy after GC (memory leak — objects accumulate but are never freed), and long pauses (heap too large, wrong collector choice).
Memory Leaks in Garbage-Collected Languages
GC does not prevent memory leaks — it prevents dangling pointers. A memory leak in a GC language occurs when objects are kept reachable but never used. Common causes: (1) Growing collections — a cache map that adds entries but never removes them. Over time, the map consumes the entire heap. Fix: use a bounded cache (LRU eviction) or weak references. (2) Listener/callback registration — registering an event listener without unregistering it when the object is no longer needed. The event system holds a reference, preventing GC. (3) Static fields — static collections that accumulate data over the application lifetime. (4) ThreadLocal variables — values stored in ThreadLocal are not GC-eligible until the thread ends. In thread pools (web servers), threads live forever, so ThreadLocal values accumulate. Fix: call ThreadLocal.remove() after use. (5) Connection leaks — database connections or HTTP connections that are opened but not closed. The connection object remains in the pool, and the pool grows. Detection: take a heap dump (jmap -dump:live,format=b,file=heap.hprof) and analyze with Eclipse MAT or VisualVM. Look for dominator trees: large objects that retain many other objects. Track heap usage over time — a steadily increasing trend after GC indicates a leak.
Go and Python GC Compared
Go GC: a concurrent, tri-color mark-and-sweep collector with no generational separation. Go trades throughput for low latency: GC pauses are typically under 1ms because most work is done concurrently while the application runs. The GC triggers based on the GOGC environment variable (default 100): GC runs when the heap grows to 2x the live data size. Go has no tuning knobs beyond GOGC — simplicity over configurability. Since Go 1.19, soft memory limits (GOMEMLIMIT) provide better control. Python GC: uses reference counting as the primary mechanism (each object has a counter tracking references to it; when the count drops to zero, the object is freed immediately). A cyclic garbage collector handles reference cycles (two objects referencing each other with no external references). The cyclic GC runs periodically based on allocation thresholds. Python GC pauses are generally short but unpredictable. The GIL (Global Interpreter Lock) means only one thread executes Python code at a time, limiting true parallelism but simplifying GC. For CPU-bound workloads, Python applications use multiprocessing (separate processes with separate heaps and GCs) to achieve parallelism.