A Node.js service I once helped debug had a steady, slowly rising memory profile in production. Every fifteen minutes, residency would spike, drop, and resume the climb. The team had read enough about V8 garbage collection to be dangerous: someone had added --expose-gc and a manual global.gc() call on a timer, hoping that "running GC more often" would help. It did not. The forced collections cost more than they saved, and the underlying leak (a Map that never deleted its keys) kept the application's working set growing regardless. The fix was a one-line code change. The detour through manual GC had cost two days.
The opinion I want to defend across this piece is that V8's garbage collector is engineered to be invisible most of the time. The engineering effort behind that invisibility is fascinating to read about, but the right relationship between application code and the collector is "leave it alone, write code that does not produce avoidable garbage, and reach for diagnostics only when the heap profile says you have to." Mystifying the GC is what leads to bad fixes; learning the model lets you skip them.
This is not an exhaustive walkthrough of every V8 internal. It is the working engineer's tour: enough mental model to understand what GC is doing during a heap snapshot, what the names in a flame graph mean, and which categories of mistake make the GC's job harder than it needs to be.
The generational hypothesis is the spine
Generational garbage collectors rely on a single empirical observation: most objects die young. In a typical program, the vast majority of allocations become garbage shortly after they are created (function-local data, short-lived intermediate values, request-scoped objects), while a small minority survive long enough to matter (the application's long-lived state, framework instances, caches). This is the generational hypothesis, and V8's collector is built around exploiting it.
The exploit is simple: split the heap into two regions, called the young generation (or "new space") and the old generation (or "old space"). New allocations go into the young generation. The young generation is collected frequently and quickly, because most of what it contains is already garbage by the time the collector arrives. The few survivors are promoted into the old generation, which is collected less often, with a more thorough algorithm.
The difference in collection frequency between the two generations is what makes the model pay off. Most allocations never escape the young generation, so the cheap, fast collector handles them; only the survivors end up in the more expensive collector's territory.
The Scavenger: cheap collection of the young generation
The young generation in V8 is split into two equal halves, called semi-spaces: "from-space" and "to-space". At any moment, allocations happen only in from-space. The to-space sits empty, waiting.
When from-space fills up, V8 runs a scavenge. The collector walks every reachable object in from-space and copies the live ones into to-space, packing them adjacently. Dead objects are not moved. After the walk, the roles flip: the freshly-packed to-space becomes the new from-space, and the now-empty from-space becomes the new to-space.
This is the Cheney algorithm, a copying garbage collector. The runtime cost is proportional to the live data, not the total allocations. If 95 percent of the young generation is garbage, the scavenger touches only the 5 percent that survives. The dead objects cost nothing to collect; they are simply abandoned in from-space, which gets reused as scratch on the next cycle.
The cost of the cheap collection is paid in two ways. First, the young generation can hold at most half the physical memory it occupies, because the other half must always be empty as the destination. Second, every survivor pays the cost of being copied. An object that survives many scavenges in a row gets copied many times, which is wasted work. The fix for the second cost is promotion: if an object survives two scavenges, V8 promotes it to the old generation instead of copying it back into to-space. The hypothesis is that twice-survived objects are likely long-lived, and promoting them avoids the per-cycle copy cost.
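A toy model of the copy step makes the cost argument concrete. The sketch below is not V8's scavenger. It is a minimal Cheney-style copy over plain JavaScript objects, and the names `scavenge`, `refs`, and `forwarded` are invented for the illustration.

```javascript
// Toy Cheney-style semi-space copy. Hypothetical model, not V8's implementation.
// Each "object" is { id, refs: [...] }; fromSpace is just an array of them.
function scavenge(roots, fromSpace) {
  const toSpace = [];            // destination semi-space, empty at the start
  const forwarded = new Map();   // old object -> its copy in to-space

  // Copy one object unless it was already copied; return the copy.
  function copy(obj) {
    let moved = forwarded.get(obj);
    if (moved === undefined) {
      moved = { id: obj.id, refs: obj.refs.slice() };
      forwarded.set(obj, moved);
      toSpace.push(moved);       // survivors are packed adjacently
    }
    return moved;
  }

  const newRoots = roots.map(copy);

  // Cheney scan: toSpace doubles as the work queue. Everything before
  // `scan` has had its references fixed up; everything after is pending.
  for (let scan = 0; scan < toSpace.length; scan++) {
    const obj = toSpace[scan];
    obj.refs = obj.refs.map(copy);
  }

  // Dead objects were never visited. The old from-space is simply dropped.
  return {
    roots: newRoots,
    toSpace,
    visited: toSpace.length,
    abandoned: fromSpace.length - toSpace.length,
  };
}

// 1,000 allocations, of which only a cycle of 3 is reachable from the root.
const fromSpace = Array.from({ length: 1000 }, (_, id) => ({ id, refs: [] }));
fromSpace[0].refs.push(fromSpace[1]);
fromSpace[1].refs.push(fromSpace[2]);
fromSpace[2].refs.push(fromSpace[0]); // a cycle is fine: forwarding handles it

const result = scavenge([fromSpace[0]], fromSpace);
console.log(result.visited, result.abandoned); // 3 997
```

The loop runs three times for a thousand allocations. That is the whole appeal of a copying collector when most of the space is garbage.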
Mark-Sweep-Compact: the heavier collection of the old generation
The old generation is a poor fit for the copying algorithm. The space is too large, the live ratio is too high, and the cost of copying everything would dominate. Instead, V8 runs a mark-sweep-compact collector.
Mark. Walk the object graph from the GC roots (the call stack, global objects, anything pinned by the runtime). Mark every reachable object as live. Anything not marked is garbage.
Sweep. Walk the heap. For every unmarked region, add it to a free list so future allocations can reuse the space.
Compact. Move live objects together to defragment the heap. Without compaction, repeated sweeps leave the heap in a Swiss-cheese pattern, which slows allocation and limits how big a contiguous object can be allocated. Compaction is expensive (every moved object has its references updated everywhere they appear), so V8 does it selectively, on pages that have become heavily fragmented.
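The mark and sweep phases described above can be sketched in a few lines. This is a toy over plain objects with invented names (`mark`, `sweep`, `marked`, `freeList`); V8's real collector works on raw memory pages and mark bitmaps, and the compaction step is left out here.

```javascript
// Toy mark and sweep over a heap modelled as an array of { name, refs, marked }.
// Hypothetical names throughout; this is an illustration, not V8's code.
function mark(roots) {
  const worklist = [...roots];
  while (worklist.length > 0) {
    const obj = worklist.pop();
    if (obj.marked) continue;       // already visited; this also breaks cycles
    obj.marked = true;
    worklist.push(...obj.refs);
  }
}

function sweep(heap) {
  const freeList = [];
  const survivors = [];
  for (const obj of heap) {
    if (obj.marked) {
      obj.marked = false;           // reset for the next cycle
      survivors.push(obj);
    } else {
      freeList.push(obj.name);      // unmarked means unreachable: reclaim it
    }
  }
  return { survivors, freeList };
}

const make = (name) => ({ name, refs: [], marked: false });
const a = make('a'), b = make('b'), c = make('c'), d = make('d');
a.refs.push(b);
c.refs.push(d);   // c and d reference each other, but no root reaches them
d.refs.push(c);

mark([a]);
const { survivors, freeList } = sweep([a, b, c, d]);
console.log(survivors.map((o) => o.name), freeList); // [ 'a', 'b' ] [ 'c', 'd' ]
```

Note that `c` and `d` are reclaimed even though they point at each other. Reachability from the roots is what counts, not reference counts.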
A full mark-sweep-compact pause used to be the source of multi-hundred-millisecond hiccups in early V8. Two engineering pushes have made it usable for production: incremental marking and concurrent collection.
Incremental marking: pay the mark cost in small slices
The mark phase can be the longest part of a major GC, because it walks every reachable object. Doing the whole walk in one stop-the-world pause is a problem for latency-sensitive applications (web pages must paint at 60Hz, servers must respond to requests promptly).
Incremental marking breaks the walk into many small steps interleaved with application execution. Each step processes a bounded batch of objects, then yields back to the JS code. The collector tracks its progress with mark bits and a worklist of objects it has reached but not yet scanned. Small write barriers in the application code (added by the engine, transparent to user code) inform the collector when an already-scanned object is mutated to point to an object it has not yet reached, so the collector can mark the new target before the cycle finishes.
The trade-off is that the total mark time is somewhat higher than a single stop-the-world pass (because of the bookkeeping), but the longest individual pause is much shorter (often single-digit milliseconds instead of hundreds). For interactive applications, "shorter individual pause" is what matters; total throughput is rarely the bottleneck.
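To see why the write barrier is needed, here is a toy incremental marker. It is a simplified model with invented names (`IncrementalMarker`, `shade`, `step`, `write`), using the usual white/grey/black colouring and a barrier that shades the target of every store, which I am assuming as a stand-in for what the engine does.

```javascript
// Toy incremental marker with a write barrier. Hypothetical model only.
// Colours: white = not seen, grey = seen but fields not scanned, black = done.
class IncrementalMarker {
  constructor(roots) {
    this.worklist = [];
    for (const r of roots) this.shade(r);
  }

  // Turn a white object grey and queue it for scanning.
  shade(obj) {
    if (obj.colour === 'white') {
      obj.colour = 'grey';
      this.worklist.push(obj);
    }
  }

  // One bounded slice of marking; returns true when the worklist is empty.
  step(budget) {
    while (budget-- > 0 && this.worklist.length > 0) {
      const obj = this.worklist.pop();
      for (const child of obj.refs) this.shade(child);
      obj.colour = 'black';
    }
    return this.worklist.length === 0;
  }

  // Stand-in for the barrier around a pointer store: perform the store,
  // then make sure the target cannot be missed by the marker.
  write(holder, target) {
    holder.refs.push(target);
    this.shade(target);
  }
}

const make = (name) => ({ name, refs: [], colour: 'white' });
const root = make('root');
const late = make('late');
const marker = new IncrementalMarker([root]);

marker.step(1);                 // slice 1: root is scanned and turns black
marker.write(root, late);       // the application mutates an already-scanned object
while (!marker.step(1)) {}      // remaining slices
console.log(late.colour);       // black
```

Remove the `this.shade(target)` line and `late` stays white, which a sweep would treat as garbage even though `root` points to it.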
Orinoco: concurrent collection on a worker thread
The latest generation of V8's collector, codenamed Orinoco, runs significant portions of the GC on helper threads, concurrently with the application. Marking is concurrent: helper threads walk the heap while the JS main thread runs application code. Sweeping also runs concurrently, and compaction is spread across helper threads when possible.
The architectural shift Orinoco represents is moving from "stop the world for a small slice" to "barely stop the world at all, do the work in the background". The main thread still has brief pauses for the parts that cannot be safely done concurrently (the final mark of any objects that the application touched during the concurrent walk, the so-called remark phase), but the bulk of the work happens off the critical path.
The user-visible result for application engineers: in modern Node and Chrome, you should not see GC pauses dominating your performance traces unless you have a working set in the hundreds of megabytes or are allocating very aggressively. If you are seeing them, the diagnosis is usually "your application is doing something that the collector cannot avoid", not "the collector is slow".
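One way to check whether GC pauses actually show up in a Node process is the `gc` entry type in `perf_hooks`. The sketch below is a minimal illustration, assuming a recent Node release where the GC kind is exposed as `entry.detail.kind`. The label strings and the `gcKindLabel` helper are my own naming, not part of the API.

```javascript
const { PerformanceObserver, constants } = require('node:perf_hooks');

// Map perf_hooks GC kind constants to readable labels (labels are my own).
const KIND_LABELS = new Map([
  [constants.NODE_PERFORMANCE_GC_MINOR, 'minor (scavenge)'],
  [constants.NODE_PERFORMANCE_GC_MAJOR, 'major (mark-sweep-compact)'],
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL, 'incremental marking'],
  [constants.NODE_PERFORMANCE_GC_WEAKCB, 'weak callbacks'],
]);

function gcKindLabel(kind) {
  return KIND_LABELS.get(kind) ?? `unknown (${kind})`;
}

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Optional chaining guards against runtimes that do not populate `detail`.
    const label = gcKindLabel(entry.detail?.kind);
    console.log(`${label}: ${entry.duration.toFixed(2)} ms`);
  }
});
observer.observe({ entryTypes: ['gc'] });

// Produce some short-lived garbage so there is something to observe.
let sink = 0;
for (let i = 0; i < 1e6; i++) {
  sink += [i, i + 1].length;
}

// Stop observing after a moment so the script can exit.
setTimeout(() => observer.disconnect(), 100);
```

What gets printed depends on the Node version, the heap size, and how much of the loop the optimizer removes, so treat the output as a starting point for a real trace rather than a benchmark.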
What "the collector cannot avoid" looks like in real code
Five patterns I have traced to bad memory behavior, in roughly decreasing order of frequency.
Caches without bounds. A Map that accumulates entries on every request and never evicts. The map keeps every key and value reachable, so no collection, minor or major, can reclaim them, and the old generation keeps growing. The fix is an LRU or TTL eviction strategy, or in the right shape of problem, a WeakMap, which holds its keys weakly so an entry disappears once nothing else references the key.
Closures over large captured environments. A function defined inside a hot loop that closes over a large array, then registered as an event listener. The listener pins the array for as long as it stays subscribed. The fix is to pull the closure out of the loop and reference the array narrowly, or to unsubscribe the listener explicitly when the work is done.
Detached DOM nodes. A node removed from the document but still referenced in JavaScript (kept in an array, captured by a closure). The browser cannot collect it because JS still holds a reference; JS rarely notices because the node is no longer visually present. The fix is the same as listeners: drop the JS reference when the node is no longer needed.
Mid-life-promotion thrash. Objects allocated in the young generation that survive long enough to promote, then become garbage shortly after. They paid the promotion cost (the copy into old space) for nothing, and now sit in old space waiting for the next major GC. The fix when this pattern dominates is hard to apply at the user level (V8's tuning is mostly opaque), but reducing the rate of allocation in the hot path (object pooling, reuse of buffers, avoiding spurious string concatenations) makes the issue smaller.
Inconsistent object shapes that fragment hidden classes. Not strictly a GC issue, but related: V8's optimizer assumes objects of the "same shape" share a hidden class. Adding properties to objects in different orders, or removing properties dynamically, fragments the hidden-class graph, makes the call sites that see those objects polymorphic, and slows the optimizer. The cost of the resulting deoptimisation is paid in extra allocations and extra type-check overhead, which feeds back into more GC work.
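For the first pattern, the unbounded cache, a bounded replacement does not need a library. The sketch below is one minimal way to do LRU eviction on top of Map's insertion-order iteration; `LruCache` and `maxEntries` are illustrative names, and a production cache would likely also want TTLs and size accounting.

```javascript
// Minimal LRU cache built on Map's insertion-order iteration.
// `LruCache` and `maxEntries` are illustrative names.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    // Re-insert to move the key to the "most recently used" end.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }

  get size() {
    return this.entries.size;
  }
}

const cache = new LruCache(2);
cache.set('a', 1);
cache.set('b', 2);
cache.get('a');        // touch 'a' so 'b' becomes the eviction candidate
cache.set('c', 3);     // evicts 'b'
console.log([...cache.entries.keys()]); // [ 'a', 'c' ]
```

The point is the invariant, not the data structure: the cache can never hold more than `maxEntries` values alive, so the collector can reclaim everything that has been evicted.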
What application code can actually do
The single biggest lever is to allocate less. Every allocation eventually has to be dealt with by the collector; a lower allocation rate means fewer collections and fewer objects promoted by accident. The patterns I lean on:
Reuse objects in tight loops. If a loop creates { x, y } on every iteration, hoist the object out and reassign its fields. The hot-path allocations vanish.
Prefer array methods that mutate in place when ownership permits. arr.push and arr[i] = x cost less than [...arr, x], and they do not create the intermediate array that the spread does.
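For string-heavy work, avoid building strings you will throw away, and measure before swapping += for an array and join. Each += allocates a new string object, though V8 often represents it as a small rope node rather than a full copy; join allocates the array plus one final string, so which is cheaper depends on the number and size of the pieces.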
For binary data, use TypedArray and Buffer (Node), which are flat memory regions, not chains of small heap objects. Their GC profile is fundamentally different from a Number[].
When you legitimately need long-lived state, give it a clear owner and a clear lifetime. Singletons that grow without bound, listeners that never unsubscribe, and modules that accumulate caches are all "long-lived" in a way the collector cannot help with.
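The first of these patterns, reusing an object in a tight loop, looks like this in practice. The function names and the flat `[x0, y0, x1, y1, ...]` input layout are assumptions made for the example.

```javascript
// Allocating version: one fresh { x, y } per iteration.
function sumSquaredLengthsAllocating(points) {
  let total = 0;
  for (let i = 0; i < points.length; i += 2) {
    const p = { x: points[i], y: points[i + 1] };
    total += p.x * p.x + p.y * p.y;
  }
  return total;
}

// Reusing version: one scratch object, fields reassigned in place.
function sumSquaredLengthsReusing(points) {
  const p = { x: 0, y: 0 };
  let total = 0;
  for (let i = 0; i < points.length; i += 2) {
    p.x = points[i];
    p.y = points[i + 1];
    total += p.x * p.x + p.y * p.y;
  }
  return total;
}

// Flat typed array of coordinates: (3, 4) and (6, 8).
const points = new Float64Array([3, 4, 6, 8]);
console.log(sumSquaredLengthsAllocating(points), sumSquaredLengthsReusing(points)); // 125 125
```

In a function this small the optimizer may already remove the allocation in the first version, so the rewrite only pays off where a profile shows the allocation is real. It is also only safe when nothing else holds on to the scratch object between iterations.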
When in doubt, measure. The Chrome DevTools Memory panel and Node's --inspect heap snapshots are the same kind of tool; both let you take a snapshot, force a GC, take another snapshot, and diff them. If the diff shows objects that should have been collected but were not, the snapshot tells you who is keeping them alive. The diagnostic loop is empirical, not theoretical: do not guess; profile.
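Before reaching for a full snapshot, a coarse before-and-after reading of the heap is often enough to confirm that something is being retained. The sketch below uses `v8.getHeapStatistics()` from Node's `v8` module; the `leaky` map and the `usedHeapMiB` helper are invented for the example and stand in for whatever structure you suspect.

```javascript
const v8 = require('node:v8');

// Report used heap in MiB from V8's own statistics.
function usedHeapMiB() {
  return v8.getHeapStatistics().used_heap_size / (1024 * 1024);
}

const leaky = new Map();   // stands in for the unbounded cache
const before = usedHeapMiB();
for (let i = 0; i < 200_000; i++) {
  leaky.set(`request-${i}`, { payload: new Array(8).fill(i) });
}
const after = usedHeapMiB();

console.log(
  `heap grew by ~${(after - before).toFixed(1)} MiB with ${leaky.size} entries retained`
);

// For a real investigation, v8.writeHeapSnapshot() writes a .heapsnapshot file
// that the Chrome DevTools Memory panel can load and diff.
```

The exact number varies by Node version and platform. What matters is the trend across repeated readings, and the snapshot diff is what tells you who holds the references.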
Tuning flags and when to ignore them
V8 exposes flags that tune GC behavior: --max-old-space-size, --max-semi-space-size, --gc-interval, --expose-gc (for global.gc()), and many others. Most teams should ignore most of them most of the time. The defaults are tuned by people whose entire job is tuning them, against benchmarks that approximate real workloads better than ad-hoc fiddling does.
The two flags I have actually used in production:
--max-old-space-size (the value is in MiB), set to the container's memory limit minus a buffer, when running Node in a memory-bounded container. Without this, V8 picks a heap limit from its own heuristics, which can be smaller than the container allows. Setting it explicitly avoids "process died with a heap out-of-memory error at half the memory limit" surprises.
--inspect, for taking heap snapshots when the heap profile in production looks wrong. This is not a tuning flag, but it is the right tool for the diagnosis the tuning flags are unlikely to help with anyway.
The flags I avoid: anything claiming to "make the GC faster" without a corresponding benchmark, anything that disables incremental or concurrent collection, and --expose-gc plus manual global.gc() calls. The opening anecdote was the manual-GC trap; the lesson stuck.
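When setting --max-old-space-size, it is worth confirming what limit the process actually ended up with. The sketch below reads it back through `v8.getHeapStatistics()`. The `oldSpaceFlagFor` helper and its 25 percent headroom are my own assumptions for illustration, not a recommendation from the V8 or Node teams.

```javascript
const v8 = require('node:v8');

// heap_size_limit is reported in bytes; convert to MiB to compare with the flag.
const limitMiB = Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
console.log(`V8 heap size limit: ${limitMiB} MiB`);

// Hypothetical sizing helper: leave headroom for memory outside the JS heap
// (buffers, native addons, thread stacks). The default fraction is an assumption.
function oldSpaceFlagFor(containerLimitMiB, headroomFraction = 0.25) {
  const mib = Math.floor(containerLimitMiB * (1 - headroomFraction));
  return `--max-old-space-size=${mib}`;
}

console.log(oldSpaceFlagFor(2048)); // --max-old-space-size=1536
```

Logging the limit once at startup makes the "killed at half the container limit" class of surprise much easier to spot in production logs.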
Hidden classes and inline caches: the optimizer's view of objects
The collector and the optimizer interact more than people realise. V8 assigns each JS object a hidden class that describes its property layout. Objects created with the same property names in the same order share a hidden class, which lets the optimizer cache property-access offsets and produce specialized code for that shape. Objects with mismatched shapes make property-access sites polymorphic and can trigger deoptimisation, which means more dynamic dispatch, more allocations, and more GC pressure.
Two practical rules emerge. First, define all object properties in the constructor, not lazily. A type whose properties are added in different orders across different code paths ends up spread over several hidden classes. Second, avoid delete. Deleting a property can transition the object into a slow "dictionary mode" that disables several optimizations, including hidden-class sharing; assigning undefined or null instead usually leaves the property layout alone.
These are micro-optimisations that mostly do not matter, until they do. The right time to think about them is when a hot path's profile shows time inside V8 internals (deopt, ICs, type feedback) rather than inside your own JS frames.
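The two rules look like this in code. Hidden classes are not observable from plain JavaScript, so the sketch uses property order from `Object.keys` as a visible proxy for layout; `Session` and `makeSessionLazily` are invented names.

```javascript
// Same properties, same order, every time: instances can share one layout.
class Session {
  constructor(id, user) {
    this.id = id;
    this.user = user;
    this.lastSeen = null;   // declared up front even though it is filled in later
  }
}

// Shape-fragmenting alternative, shown for contrast.
function makeSessionLazily(id, user, seenFirst) {
  const s = {};
  if (seenFirst) {
    s.lastSeen = null;      // a different insertion order gives a different layout
    s.id = id;
    s.user = user;
  } else {
    s.id = id;
    s.user = user;          // lastSeen is added later, if at all
  }
  return s;
}

const session = new Session(1, 'ada');
session.lastSeen = Date.now();   // no new property is added here
session.user = undefined;        // clearing a field without `delete`

console.log(Object.keys(session));                            // [ 'id', 'user', 'lastSeen' ]
console.log(Object.keys(makeSessionLazily(1, 'ada', true)));  // [ 'lastSeen', 'id', 'user' ]
console.log(Object.keys(makeSessionLazily(1, 'ada', false))); // [ 'id', 'user' ]
```

Three code paths, three property layouts for what is conceptually one type. The class version has exactly one.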
Escape analysis and stack allocation
V8's optimizer can sometimes prove, through escape analysis, that an allocated object never leaves the function it was created in. If the object's lifetime is bounded by the function call, the optimizer can elide the heap allocation entirely, keeping the fields in registers or on the stack instead (scalar replacement). This is often described as "stack allocation" of objects, and it makes the per-loop-iteration allocation overhead vanish for the cases the analyzer can prove.
Engineers cannot trigger escape analysis directly, but the patterns that help it succeed are: small, short-lived objects, defined in one place, never returned from the function, never assigned to a longer-lived structure, never captured by a closure that escapes. These are the same patterns that produce "easy to read, easy to reason about" code, which is a satisfying coincidence.
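A pair of functions shows the difference between an object that stays local and one that escapes. Whether the optimizer actually elides the first allocation depends on the V8 version and on the function getting hot enough to be optimized, so read this as a sketch of the shape of the code, with invented function names, not as a guarantee.

```javascript
// Candidate for allocation elision: `v` never leaves the function.
function lengthSquared(x, y) {
  const v = { x, y };
  return v.x * v.x + v.y * v.y;
}

// Escapes: the object is stored in a longer-lived array,
// so it has to exist on the heap.
const history = [];
function lengthSquaredLogged(x, y) {
  const v = { x, y };
  history.push(v);
  return v.x * v.x + v.y * v.y;
}

console.log(lengthSquared(3, 4), lengthSquaredLogged(3, 4), history.length); // 25 25 1
```

Both functions compute the same value. The only difference is that the second one hands its temporary to something that outlives the call, and that one line is what takes the allocation out of the optimizer's reach.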
What I would tell my team about V8 GC
The condensed version, in five bullets:
- The collector is engineered to be invisible. Most of the time, it is. Trust the defaults.
- The young generation is collected with a copying scavenger; the old generation with mark-sweep-compact. The split is what makes the system fast.
- Long-lived objects need clear ownership. Caches without eviction, listeners without unsubscribe, and globals that accumulate are the realistic causes of memory bugs in production.
- Allocation rate matters more than total residency for steady-state performance. Reduce allocations in hot paths first.
- When you suspect a leak, take heap snapshots. The diagnostic is empirical. The tuning flags are usually not the answer.
GC literature can read like a tour through architecture porn (semi-spaces, write barriers, remembered sets, parallel marking), and the architecture is genuinely interesting if you enjoy that kind of thing. None of it changes the application engineer's job, which is to write code that does not produce avoidable work for the collector. The architecture exists so we do not have to think about it; the failure mode is when application code prevents the collector from doing what it was designed to do.
The opening anecdote's manual-GC fix was a category error: the team thought of the collector as a service they could nudge into running more often. It is closer to a self-tuning system that works against a workload your application defines. Change the workload and the collector adapts. Try to control the collector directly and you have skipped the part that actually matters.
