Garbage collection is the automatic memory management mechanism used by runtimes like the JVM, .NET CLR, and Go. Understanding its internals is essential for writing low-latency services, diagnosing pauses, and tuning production systems. This post digs into the algorithms, trade-offs, and modern GC implementations.
Reachability and GC Roots
A garbage collector does not track memory deallocations — it tracks reachability. An object is live if it can be reached by following references from a set of GC roots. GC roots are the starting points of the object graph and include: local variables and method parameters on the JVM thread stacks, static fields of loaded classes, active JNI (Java Native Interface) references held by native code, and references in monitor/synchronization objects.
Any object not reachable from any GC root is, by definition, garbage and eligible for collection. The collector never needs to ask "who allocated this?" — it only asks "can anything still reach this?"
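To make reachability concrete, here is a toy model: the heap is a map from made-up object ids to the ids they reference, and liveness is a plain graph traversal from the roots. This is an illustration of the concept, not how a real collector stores its heap.

```java
import java.util.*;

// Toy reachability: an object is live iff some GC root can reach it.
// Object "ids" and the heap map are illustrative stand-ins for real objects.
class Reachability {
    static Set<Integer> liveSet(Map<Integer, List<Integer>> refs, List<Integer> roots) {
        Set<Integer> live = new HashSet<>();
        Deque<Integer> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            int obj = worklist.pop();
            if (live.add(obj))  // first visit: follow its outgoing references
                worklist.addAll(refs.getOrDefault(obj, List.of()));
        }
        return live;
    }

    public static void main(String[] args) {
        // 1 -> 2 -> 3 and 4 -> 5, with 1 as the only root: 4 and 5 are garbage.
        Map<Integer, List<Integer>> refs =
                Map.of(1, List.of(2), 2, List.of(3), 4, List.of(5));
        System.out.println(liveSet(refs, List.of(1)));  // prints the live set {1, 2, 3}
    }
}
```

Note that object 5 is garbage even though object 4 still references it: reachability from the roots, not reference counts, decides liveness.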
Mark-and-Sweep
The classic algorithm has two phases. In the mark phase, the collector traverses the object graph starting from GC roots and marks every reachable object. In the sweep phase, it scans the entire heap and reclaims memory occupied by unmarked (unreachable) objects.
The core problem with naive mark-and-sweep is heap fragmentation. After repeated collection cycles, the free memory consists of many small non-contiguous chunks. Allocation of large objects may fail even when enough total free bytes exist, and the allocator must maintain complex free-lists. This is why production runtimes rarely use naive mark-and-sweep on its own: they pair it with compaction, copying, or size-segregated free lists (Go's non-moving collector takes the last approach) to keep fragmentation in check.
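Both phases can be sketched over a toy heap of object ids. A real collector walks raw memory with mark bits rather than hash maps; everything below is illustrative only.

```java
import java.util.*;

// Minimal mark-and-sweep over a toy heap of object ids.
class MarkSweep {
    final Map<Integer, List<Integer>> heap = new HashMap<>(); // id -> outgoing refs
    final Set<Integer> roots = new HashSet<>();

    // Runs one collection cycle and returns the set of reclaimed ids.
    Set<Integer> collect() {
        // Mark phase: traverse the object graph starting from the roots.
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            int obj = stack.pop();
            if (marked.add(obj)) stack.addAll(heap.getOrDefault(obj, List.of()));
        }
        // Sweep phase: scan every heap slot and reclaim whatever is unmarked.
        Set<Integer> freed = new HashSet<>(heap.keySet());
        freed.removeAll(marked);
        freed.forEach(heap::remove);
        return freed;
    }
}
```

Note the asymmetry the section describes: marking touches only live objects, while sweeping must visit the whole heap.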
Mark-Compact
Mark-compact extends mark-and-sweep by adding a compaction phase: after marking, surviving objects are slid toward one end of the heap, eliminating gaps. All pointers to moved objects must be updated. The result is a dense heap with a single contiguous free region, enabling fast bump-pointer allocation.
The cost is high: compaction requires multiple passes over the heap (to compute new addresses, update references, and copy data) and is inherently a stop-the-world operation. It is used in Java’s Serial and Parallel GC for old-generation collections, and is acceptable when throughput matters more than latency.
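The address-computation pass can be sketched as follows. Object sizes and liveness flags are toy inputs; the reference-rewriting and copying passes are only indicated in comments.

```java
// Sliding compaction, pass 1: assign each survivor a forwarding address at
// the low end of the heap, preserving allocation order. Toy model only.
class MarkCompact {
    static int[] forwardingAddresses(int[] size, boolean[] live) {
        int[] newAddr = new int[size.length];
        int free = 0;  // bump pointer for the compacted heap
        for (int i = 0; i < size.length; i++) {
            if (live[i]) { newAddr[i] = free; free += size[i]; }
            else newAddr[i] = -1;  // dead object: no forwarding address
        }
        // Pass 2 (not shown): rewrite every reference field through newAddr.
        // Pass 3 (not shown): move each survivor's bytes to newAddr[i].
        return newAddr;
    }
}
```

With three 4-byte objects of which the middle one is dead, the survivors land at offsets 0 and 4, leaving a single contiguous free region above them.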
Copying GC (Semi-Space)
Copying collectors divide the heap into two equal halves: from-space and to-space. At collection time, live objects are copied from from-space into to-space. When copying is complete, from-space is entirely free — no sweep needed, no fragmentation. The roles of the two halves flip for the next cycle.
Allocation is trivially fast: a bump pointer advances through to-space. The major drawback is a 2x memory overhead — half the heap is always empty. This trade-off is acceptable for short-lived objects (young generation), which is exactly where copying GC is most widely deployed.
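A Cheney-style breadth-first evacuation can be sketched like this. "Copying" an object here just means appending its id to the to-space list, with the list index standing in for its new address; all names are illustrative.

```java
import java.util.*;

// Cheney-style semi-space evacuation sketch: live objects are copied
// breadth-first into to-space; everything left behind is garbage.
class SemiSpace {
    static List<Integer> evacuate(Map<Integer, List<Integer>> fromSpace,
                                  List<Integer> roots) {
        List<Integer> toSpace = new ArrayList<>();          // bump allocation = append
        Map<Integer, Integer> forwarded = new HashMap<>();  // from-id -> to-space slot
        for (int r : roots) copy(r, toSpace, forwarded);
        // Cheney scan: a second pointer sweeps to-space, evacuating referents.
        for (int scan = 0; scan < toSpace.size(); scan++)
            for (int ref : fromSpace.getOrDefault(toSpace.get(scan), List.of()))
                copy(ref, toSpace, forwarded);
        return toSpace;  // from-space is now free in its entirety
    }

    private static void copy(int obj, List<Integer> to, Map<Integer, Integer> fwd) {
        if (!fwd.containsKey(obj)) { fwd.put(obj, to.size()); to.add(obj); }
    }
}
```

The forwarding map is the key trick: an object already copied is never copied twice, so shared structure survives intact.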
Generational Hypothesis and Generational GC
The generational hypothesis is the empirical observation that most objects die young. Studies across many workloads show that the vast majority of allocated objects become unreachable within milliseconds or within a single function call. This insight drives the design of all modern production GCs.
A generational GC divides the heap into a young generation (new allocations, collected frequently via copying GC — called a minor GC) and an old generation (objects that survived several minor GCs, collected less frequently — called a major or full GC). Objects are promoted from young to old after surviving a configurable number of collections. Minor GCs are fast because they only scan the young generation; the old generation is treated as a root source via remembered sets tracking cross-generational references.
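A minor GC with age-based promotion might be sketched like this. The threshold of 2 is arbitrary (HotSpot's real knob is -XX:MaxTenuringThreshold), and liveYoung stands in for the result of tracing the roots plus the remembered set.

```java
import java.util.*;

// Toy minor GC with age-based promotion from young to old generation.
class Generational {
    static final int TENURING_THRESHOLD = 2;             // arbitrary for this sketch
    final Map<Integer, Integer> young = new HashMap<>(); // id -> survived minor GCs
    final Set<Integer> old = new HashSet<>();

    // liveYoung models the young objects found reachable from the roots
    // plus the remembered set of old-to-young references.
    void minorGC(Set<Integer> liveYoung) {
        Map<Integer, Integer> survivors = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : young.entrySet()) {
            if (!liveYoung.contains(e.getKey())) continue;   // dead: reclaimed for free
            int age = e.getValue() + 1;
            if (age >= TENURING_THRESHOLD) old.add(e.getKey()); // promote to old gen
            else survivors.put(e.getKey(), age);                // keep in survivor space
        }
        young.clear();
        young.putAll(survivors);
    }
}
```

Dead young objects cost nothing to reclaim, which is exactly why the generational hypothesis makes minor GCs cheap.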
Tri-Color Marking
Tri-color marking is the foundation for concurrent and incremental GC algorithms. Objects are assigned one of three colors:
- White: not yet visited — either garbage or not yet reached.
- Gray: discovered (reachable from roots) but not yet fully scanned — its references have not been followed.
- Black: fully scanned — all references from this object have been followed.
The collector starts by graying every object directly referenced by a GC root, then repeatedly picks a gray object, grays any white objects it references, and marks it black. When the gray set is empty, all remaining white objects are unreachable and can be reclaimed.
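The loop above translates almost directly into code; object ids and the reference map are illustrative stand-ins for a real heap.

```java
import java.util.*;

// Tri-color mark loop: roots start gray; gray objects are scanned (blackened),
// graying their white referents; when no gray remains, white = garbage.
class TriColor {
    enum Color { WHITE, GRAY, BLACK }

    static Map<Integer, Color> mark(Map<Integer, List<Integer>> refs, List<Integer> roots) {
        Map<Integer, Color> color = new HashMap<>();
        refs.keySet().forEach(o -> color.put(o, Color.WHITE));
        Deque<Integer> gray = new ArrayDeque<>();
        for (int r : roots) { color.put(r, Color.GRAY); gray.add(r); }
        while (!gray.isEmpty()) {
            int obj = gray.pop();
            for (int ref : refs.getOrDefault(obj, List.of())) {
                if (color.getOrDefault(ref, Color.WHITE) == Color.WHITE) {
                    color.put(ref, Color.GRAY);  // discovered but not yet scanned
                    gray.add(ref);
                }
            }
            color.put(obj, Color.BLACK);  // fully scanned
        }
        return color;  // every object still WHITE is unreachable
    }
}
```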
The challenge with concurrent marking is that the application (mutator) can modify the object graph while the GC is running. A mutator could create a new reference from a black object to a white object and then delete the only gray-to-white reference — making the white object look like garbage even though it’s reachable. Write barriers prevent this: on every reference store, the runtime either re-grays the source object (Dijkstra barrier) or records the overwritten reference (Yuasa barrier), maintaining the tri-color invariant.
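A Dijkstra-style insertion barrier can be sketched as a hook that runs on every reference store; the color map and gray queue below are toy models of the concurrent marker's state, not a real runtime's data structures.

```java
import java.util.*;

// Dijkstra-style insertion write barrier (sketch). On every reference store
// performed while concurrent marking is running, a white store target is
// shaded gray, so a black object can never end up pointing at an
// unscanned white object.
class WriteBarrier {
    enum Color { WHITE, GRAY, BLACK }

    static void storeRef(Map<Integer, Color> color, Deque<Integer> grayQueue,
                         int source, int target) {
        // ... the actual pointer store (source.field = target) happens here ...
        if (color.getOrDefault(target, Color.WHITE) == Color.WHITE) {
            color.put(target, Color.GRAY);  // shade the target
            grayQueue.add(target);          // the concurrent marker will scan it
        }
    }
}
```

A Yuasa-style deletion barrier would instead shade the reference being overwritten, protecting the snapshot of the graph at marking start; both variants restore the tri-color invariant the paragraph describes.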
Stop-the-World vs. Concurrent GC
A stop-the-world (STW) pause freezes all application threads while the GC runs. This simplifies the algorithm — the object graph cannot change during collection — but causes visible latency spikes. The Serial and Parallel GCs in the JVM are fully STW.
Concurrent marking (used in CMS and G1) runs the mark phase alongside the application. This dramatically reduces pause times but requires write barriers and a final STW remark phase to handle mutations that occurred during concurrent marking. Incremental GC breaks work into small slices interleaved with application execution, bounding individual pauses.
G1GC
G1 (Garbage First), the default JVM GC since Java 9, divides the heap into many equal-sized regions (typically 1–32 MB each). Regions are dynamically assigned as young, old, or humongous (for large objects). This regionalized layout allows G1 to collect the regions with the most garbage first (hence the name), maximizing reclaimed bytes per pause.
G1 performs concurrent marking to build a live-data map, then schedules mixed collections that collect all young regions plus a selection of the most profitable old regions. The key feature is a configurable pause time goal (-XX:MaxGCPauseMillis, default 200ms) — G1 uses pause prediction models to select how many regions to collect within the target. It does not guarantee the goal but attempts to meet it.
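The "garbage first" selection policy can be caricatured as a greedy loop: sort candidate regions by reclaimable garbage, then take regions while their predicted evacuation times fit in the pause budget. The Region record and its cost numbers are invented inputs for illustration, not real G1 internals (and the sketch needs Java 16+ for records).

```java
import java.util.*;

// Toy collection-set selection in the spirit of G1: most garbage first,
// constrained by a pause-time budget. All inputs are hypothetical.
class CollectionSet {
    record Region(int id, long garbageBytes, double predictedMillis) {}

    static List<Region> select(List<Region> candidates, double pauseGoalMillis) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingLong(Region::garbageBytes).reversed());
        List<Region> chosen = new ArrayList<>();
        double budget = pauseGoalMillis;
        for (Region r : sorted) {
            if (r.predictedMillis() <= budget) {  // fits in the remaining budget
                chosen.add(r);
                budget -= r.predictedMillis();
            }
        }
        return chosen;
    }
}
```

This also shows why the pause goal is best-effort: the prediction model, not a hard deadline, bounds the work.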
ZGC and Shenandoah
Both ZGC (production since Java 15) and Shenandoah target very short pauses, typically a millisecond or less, regardless of heap size. They achieve this via concurrent relocation: objects are moved while the application is running.
ZGC uses colored pointers: metadata bits embedded in the 64-bit pointer itself encode the GC state of the referenced object. Load barriers — code injected at every object reference load — check these bits and perform any needed fixup (forwarding to the new address) on the fly. ZGC’s STW pauses are limited to root scanning and are typically under 1ms even on terabyte heaps.
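The idea of colored pointers can be illustrated with plain bit masks. The bit positions and names below are made up for the example; real ZGC pointer layouts differ between releases.

```java
// Illustrative colored-pointer decode: a few high bits of a 64-bit "pointer"
// carry GC metadata, the rest is the address. Layout is invented for this sketch.
class ColoredPointer {
    static final long ADDRESS_MASK = (1L << 44) - 1;  // low 44 bits = address (assumption)
    static final long MARKED_BIT   = 1L << 44;        // toy metadata bit
    static final long REMAPPED_BIT = 1L << 45;        // toy metadata bit

    static long address(long ptr)     { return ptr & ADDRESS_MASK; }
    static boolean marked(long ptr)   { return (ptr & MARKED_BIT) != 0; }
    static boolean remapped(long ptr) { return (ptr & REMAPPED_BIT) != 0; }

    // A load barrier would test these bits on every reference load and, if the
    // pointer is stale, follow the forwarding table and heal the field in place.
}
```

Because the check is a mask-and-branch on bits already in a register, the common (no fixup needed) path stays cheap, which is what makes per-load barriers affordable.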
Shenandoah achieves concurrent relocation differently: it installs a Brooks pointer — an extra forwarding word prepended to every object. During relocation, the Brooks pointer is updated to point to the new copy. Load barriers check this indirection. Both approaches pay a small per-access overhead to eliminate long pauses.
GC Tuning Parameters and Log Analysis
Key JVM GC tuning levers include:
- -Xms / -Xmx: initial and maximum heap size. Setting them equal avoids resize pauses.
- -XX:NewRatio: ratio of old to young generation size (default 2 means young = 1/3 of heap).
- -XX:SurvivorRatio: ratio of eden to each survivor space within the young generation.
- -XX:MaxGCPauseMillis: pause-time goal for G1.
- -XX:GCTimeRatio: target ratio of GC time to application time.
- -XX:+UseZGC / -XX:+UseShenandoahGC: select the collector.
Enable GC logging with -Xlog:gc*:file=gc.log:time,uptime,level,tags. Key signals to watch: pause duration, allocation rate, promotion rate (young-to-old), and humongous allocation frequency. Old-generation occupancy that keeps climbing after each collection cycle indicates a memory leak or an undersized heap. Frequent full GCs signal that promotion out of the young generation is overwhelming the old generation; tune survivor spaces or increase the heap.
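Putting the levers together, a launch command might look like this. The heap size, pause goal, and jar name are placeholders, not recommendations:

```shell
# Hypothetical service launch: G1, fixed 8 GB heap, 50 ms pause goal, GC logging on.
java -XX:+UseG1GC \
     -Xms8g -Xmx8g \
     -XX:MaxGCPauseMillis=50 \
     -Xlog:gc*:file=gc.log:time,uptime,level,tags \
     -jar service.jar
```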