The Hidden Cost of Garbage Collection

The Promise and the Trade

Garbage collection was supposed to be the end of memory bugs. No more free() on a dangling pointer. No more use-after-free. No more double-free crashes at 3 AM. The runtime would handle it, and you’d never think about memory again.

That promise was half-kept. GC did eliminate specific classes of memory corruption bugs. But it replaced them with something else: unpredictable pauses, invisible memory leaks, allocation pressure that degrades throughput under load, and a false sense of security that makes you sloppy about object lifetimes.

You traded segfaults for latency spikes. Whether that’s a good trade depends on whether you understand what you gave up.

How Garbage Collectors Actually Work

Every garbage collector answers the same question: which objects are still reachable, and which can be reclaimed? The algorithms differ in how they answer it.

Reference Counting

The simplest approach. Every object carries a counter. When something points to the object, increment the counter. When a reference goes away, decrement it. When the counter hits zero, free the memory immediately.

import sys

a = []          # refcount: 1 (a points to it)
b = a           # refcount: 2 (a and b point to it)
print(sys.getrefcount(a))  # 3 — includes the temporary ref from getrefcount's argument

del b           # refcount: 1
del a           # refcount: 0 → freed immediately

CPython uses reference counting as its primary GC mechanism. The advantage is determinism: an object is freed the instant its reference count reaches zero. No stop-the-world pauses. No batching.

The fatal flaw: cycles. If object A references object B and object B references object A, both reference counts are at least 1 forever, even if nothing else in the program can reach either object.

class Node:
    def __init__(self):
        self.ref = None

a = Node()
b = Node()
a.ref = b   # a→b
b.ref = a   # b→a

del a        # a's refcount: 1 (b.ref still points to it)
del b        # b's refcount: 1 (a.ref still points to it)
# Both objects are unreachable, but reference counting alone never frees them.

This is why CPython has a second garbage collector on top of reference counting — a cyclic GC that periodically scans for unreachable reference cycles.
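One way to watch the cyclic collector reclaim such a pair is to observe one of the objects through a weak reference. This is a minimal sketch: the helper name make_cycle is made up for illustration, and the exact count returned by gc.collect() differs between CPython versions.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.ref = None

def make_cycle():
    """Build a two-node cycle and return only a weak reference to it."""
    a = Node()
    b = Node()
    a.ref = b
    b.ref = a
    return weakref.ref(a)  # a weak reference does not keep `a` alive

gc.disable()                          # keep the automatic collector out of the way
probe = make_cycle()                  # the cycle is now unreachable from our code
alive_before = probe() is not None    # refcounting alone has not freed it
found = gc.collect()                  # run the cyclic collector by hand
alive_after = probe() is not None     # the cycle has been reclaimed
gc.enable()

print(alive_before, alive_after, found)
```

The weak reference goes dead only after the explicit collection, which is the gap between "unreachable" and "freed" that pure reference counting cannot close.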

Mark-and-Sweep

The classic tracing algorithm. Start from a set of roots — global variables, stack variables, CPU registers. Follow every reference. Mark every object you can reach. Then sweep through all allocated objects and free anything that isn’t marked.

The problem is the pause. While the collector is tracing references, the program’s object graph can’t be mutating, or the collector will miss references or free live objects. A naive mark-and-sweep stops the world — all application threads freeze while GC runs.
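The two phases can be sketched on a toy heap. This is an illustration only, assuming a heap modeled as a dict that maps each object name to the names it references; real collectors work on raw memory and object headers, not Python dicts.

```python
def mark_and_sweep(heap, roots):
    """Toy collector. `heap` maps an object name to the names it references."""
    marked = set()
    worklist = list(roots)
    # Mark phase: follow every reference reachable from the roots.
    while worklist:
        name = worklist.pop()
        if name in marked:
            continue
        marked.add(name)
        worklist.extend(heap[name])
    # Sweep phase: anything left unmarked is garbage.
    garbage = {name for name in heap if name not in marked}
    for name in garbage:
        del heap[name]
    return garbage

heap = {
    "A": ["B"],   # reachable from the root
    "B": [],
    "C": ["D"],   # C and D form an unreachable cycle
    "D": ["C"],
    "E": ["A"],   # E points at a live object, but nothing points at E
}
freed = mark_and_sweep(heap, roots=["A"])
print(sorted(freed))   # ['C', 'D', 'E']
print(sorted(heap))    # ['A', 'B']
```

Note that the C/D cycle is reclaimed without any special handling: tracing only cares about reachability from the roots. The sketch also shows why the graph must hold still, since a reference added to the heap dict mid-loop could be missed.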

Generational Collection

An empirical observation called the generational hypothesis: most objects die young. A temporary string built for a log message, a loop variable, a short-lived HTTP request object — these are allocated and become garbage within milliseconds.

Generational collectors exploit this by dividing objects into generations:

Young generation (Gen 0): Newly allocated objects. Collected frequently and quickly, because most are already dead.
Old generation (Gen 1, Gen 2): Objects that survived multiple young collections. Collected rarely, because they’re likely long-lived.

This is a throughput optimization: instead of scanning all objects every time, you scan the small young generation frequently and the large old generation infrequently.
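The promotion scheme can be sketched with a toy model. Everything here is hypothetical: the class name, the promote_after parameter, and the idea of passing in the live set by hand (a real collector discovers liveness by tracing).

```python
class ToyGenerationalHeap:
    """Toy model: objects are names, liveness is supplied by the caller."""

    def __init__(self, promote_after=2):
        self.young = {}            # name -> minor collections survived
        self.old = set()
        self.promote_after = promote_after

    def allocate(self, name):
        self.young[name] = 0

    def minor_collect(self, live):
        """Collect only the young generation; the old generation is not scanned."""
        freed = sorted(n for n in self.young if n not in live)
        for name in freed:
            del self.young[name]
        for name in list(self.young):
            self.young[name] += 1
            if self.young[name] >= self.promote_after:
                del self.young[name]
                self.old.add(name)   # survivor is promoted
        return freed

    def major_collect(self, live):
        """Collect both generations (the rare, expensive case)."""
        freed = self.minor_collect(live)
        dead_old = sorted(n for n in self.old if n not in live)
        self.old.difference_update(dead_old)
        return freed + dead_old

heap = ToyGenerationalHeap()
heap.allocate("config")            # long-lived
heap.allocate("request-1")         # short-lived
first = heap.minor_collect(live={"config"})
heap.allocate("request-2")
second = heap.minor_collect(live={"config"})
print(first, second, sorted(heap.old))
# ['request-1'] ['request-2'] ['config']
```

Each minor collection touches only the young dict, so its cost tracks the number of recent allocations rather than the size of the whole heap.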

Java’s G1GC, .NET’s GC, and Python’s cyclic collector all use generational strategies. Go’s collector is the notable exception: it is a concurrent mark-and-sweep collector with no generations.

Reading Java’s G1GC Logs

Java’s G1 (Garbage-First) collector divides the heap into regions and collects the regions with the most garbage first. When things go wrong — and they will — the GC logs tell you exactly what happened.

Enable detailed GC logging:

java -Xlog:gc*:file=gc.log:time,uptime,level,tags -jar myapp.jar

Here’s what a young generation pause looks like:

[2026-02-27T10:15:32.451+0000][12.442s][info][gc] GC(14) Pause Young
    (Normal) (G1 Evacuation Pause) 512M->128M(2048M) 8.234ms

Translation: GC event #14, a young generation pause, reduced heap usage from 512 MB to 128 MB (heap size 2048 MB), and it took 8.234 milliseconds. Eight milliseconds where your application threads did nothing.
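If you want to pull those fields out of a log programmatically, a regular expression is enough for a first pass. This is a sketch under assumptions: it expects the layout shown above with each event on a single line and sizes in megabytes, and the names PAUSE_RE and parse_pause are invented here. Real logs can use other units and extra tags.

```python
import re

# Assumes the unified-logging layout shown above, one event per line, sizes in M.
PAUSE_RE = re.compile(
    r"GC\((?P<id>\d+)\) Pause (?P<kind>Young|Full)"
    r"(?P<detail>(?: \([^)]*\))*) "
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<heap>\d+)M\) "
    r"(?P<ms>\d+(?:\.\d+)?)ms"
)

def parse_pause(line):
    """Return the pause fields as a dict, or None if the line is not a pause."""
    m = PAUSE_RE.search(line)
    if m is None:
        return None
    before, after = int(m["before"]), int(m["after"])
    return {
        "id": int(m["id"]),
        "kind": m["kind"],
        "detail": m["detail"].strip(),
        "before_mb": before,
        "after_mb": after,
        "heap_mb": int(m["heap"]),
        "reclaimed_mb": before - after,
        "pause_ms": float(m["ms"]),
    }

line = ("[2026-02-27T10:15:32.451+0000][12.442s][info][gc] GC(14) Pause Young "
        "(Normal) (G1 Evacuation Pause) 512M->128M(2048M) 8.234ms")
event = parse_pause(line)
print(event["reclaimed_mb"], event["pause_ms"])   # 384 8.234
```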

Here’s a mixed collection — when G1 also collects some old generation regions:

[2026-02-27T10:15:45.891+0000][25.882s][info][gc] GC(31) Pause Young
    (Mixed) (G1 Evacuation Pause) 1024M->384M(2048M) 42.891ms

Forty-three milliseconds. If your API latency target is 50 ms, a single GC pause just consumed 86% of your budget.

And here’s the event you never want to see:

[2026-02-27T10:16:01.234+0000][41.225s][info][gc] GC(45) Pause Full
    (G1 Evacuation Failure) 1920M->890M(2048M) 1842.567ms

A full GC. 1.8 seconds. Every application thread was frozen. Your users got timeouts. Your load balancer marked the instance unhealthy. This happens when G1 cannot find enough free regions to copy surviving objects into during an evacuation, so it falls back to a stop-the-world collection of the entire heap.

The key metrics to watch:

Metric	What it means	Danger sign
Pause time	Duration of stop-the-world pause	>50ms for latency-sensitive apps
Allocation rate	MB/s of new objects	>1 GB/s can overwhelm the collector
Promotion rate	MB/s moved to old generation	High rate means old gen fills fast
Full GC events	Old generation ran out of space	Any full GC in production is a fire
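Allocation rate is not printed on the pause lines themselves, but it can be estimated from two consecutive pauses: the heap grows from the previous "after" figure to the next "before" figure, and that growth divided by the elapsed time approximates what the application allocated. The sketch below assumes you have already extracted (uptime, before, after) tuples; the function name and the sample numbers are made up and are not taken from the logs above.

```python
def allocation_rate_mb_per_s(events):
    """Estimate allocation rate between consecutive pauses.

    `events` is a list of (uptime_s, before_mb, after_mb) tuples in log order.
    The estimate ignores the duration of the pauses themselves.
    """
    rates = []
    for (t0, _, after0), (t1, before1, _) in zip(events, events[1:]):
        elapsed = t1 - t0
        if elapsed > 0:
            rates.append((before1 - after0) / elapsed)
    return rates

# Hypothetical consecutive young pauses.
events = [
    (10.0, 512, 128),
    (12.0, 528, 136),   # heap grew 128 -> 528 MB in 2 s
    (13.0, 536, 140),   # heap grew 136 -> 536 MB in 1 s
]
rates = allocation_rate_mb_per_s(events)
print(rates)   # [200.0, 400.0]
```

A rate that doubles between intervals, as in this made-up data, is the kind of trend worth catching before it turns into a full GC.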

Python’s Dual System

CPython runs two garbage collection mechanisms side by side, and many Python programmers don’t know either exists.

Reference counting handles the common case: objects freed immediately when their refcount drops to zero. Deterministic, fast, no pauses.

Cyclic GC runs periodically to find reference cycles that refcounting misses. It scans objects in three generations:

import gc

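# Counters that drive collection (not a census of live objects)
print(gc.get_count())  # e.g., (687, 8, 2)
# (container allocations minus deallocations since the last Gen 0 collection,
#  Gen 0 collections since the last Gen 1 collection,
#  Gen 1 collections since the last Gen 2 collection)

# See the thresholds that trigger collection
print(gc.get_threshold())  # (700, 10, 10) is the long-standing default; newer versions may differ
# Gen 0 is collected when the first counter exceeds 700
# Gen 1 is collected after 10 Gen 0 collections
# Gen 2 is collected after 10 Gen 1 collections (CPython applies a further heuristic here)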

# Get collection statistics
stats = gc.get_stats()
for i, gen in enumerate(stats):
    print(f"Gen {i}: {gen['collections']} collections, "
          f"{gen['collected']} objects collected, "
          f"{gen['uncollectable']} uncollectable")

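The cyclic collector is a stop-the-world tracing collector limited to container objects (lists, dicts, class instances: things that can hold references). Unlike classic mark-and-sweep, it does not start from a root set; it works out which tracked objects are referenced only by other objects in the group being examined. It doesn’t scan integers or strings because they hold no references and so can’t form cycles.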
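You can ask the collector which objects it tracks with gc.is_tracked(). A small sketch; the Node class is only a placeholder for any user-defined class.

```python
import gc

class Node:
    pass

tracked = {
    "int": gc.is_tracked(42),          # atomic value, holds no references
    "str": gc.is_tracked("hello"),     # same
    "list": gc.is_tracked([]),         # container, can take part in a cycle
    "instance": gc.is_tracked(Node()), # instances can reference other objects
}
print(tracked)
# {'int': False, 'str': False, 'list': True, 'instance': True}
```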

You can measure the cost directly:

import gc
import time

gc.collect()  # Clear everything first
gc.disable()  # Disable automatic GC

# Allocate a lot of cyclic garbage
for _ in range(100_000):
    a = {}
    b = {}
    a['ref'] = b
    b['ref'] = a

start = time.perf_counter()
collected = gc.collect()
elapsed = time.perf_counter() - start
print(f"Collected {collected} objects in {elapsed*1000:.1f}ms")

gc.enable()

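On a typical machine, collecting these roughly 200,000 cyclically linked dicts takes on the order of 15-40 ms, though the exact figure depends on hardware and Python version. In a web server handling requests, that’s a latency spike of the same size for some unlucky request.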

The Memory Leak You’re Probably Writing Right Now

GC doesn’t prevent memory leaks. It prevents one kind of memory leak — forgetting to call free(). It does nothing about the more common kind in managed languages: holding a reference you forgot about.

Here’s a pattern that shows up in many long-running Python applications:

import urllib.request

class RequestCache:
    """Cache recent API responses for deduplication."""

    def __init__(self):
        self._cache = {}

    def get_or_fetch(self, url):
        if url in self._cache:
            return self._cache[url]

        response = self._fetch(url)
        self._cache[url] = response  # Stored forever
        return response

    def _fetch(self, url):
        with urllib.request.urlopen(url) as r:
            return r.read()  # Could be megabytes

cache = RequestCache()  # Module-level — lives forever

This cache grows without bound. Every unique URL adds an entry that will never be evicted. The GC can see that every object in the cache is reachable — cache._cache points to all of them — so it correctly does nothing. The memory grows until the process is killed by the OOM killer or the container runtime.
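When you suspect this kind of growth, the standard library's tracemalloc module can point at the line doing the accumulating. The sketch below is an assumption-laden stand-in: it replaces the network fetch with a fixed 10 KB bytes object, and the names store and remember are invented for the demonstration.

```python
import tracemalloc

store = {}   # stands in for an unbounded cache

def remember(key):
    store[key] = bytes(10_000)   # pretend this is a response body

tracemalloc.start()
before = tracemalloc.take_snapshot()

for i in range(1_000):
    remember(f"https://example.invalid/item/{i}")

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Rank source lines by how much more memory they hold now than before.
top = after.compare_to(before, "lineno")[0]
frame = top.traceback[0]
print(f"{frame.filename}:{frame.lineno} grew by {top.size_diff / 1_000_000:.1f} MB")
```

The top entry should be the line that stores into the dict, holding roughly 10 MB that the collector will never reclaim while store is reachable.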

The fix is straightforward, but requires you to think about object lifetimes — exactly the thing GC was supposed to free you from:

import urllib.request
from collections import OrderedDict

class BoundedCache:
    def __init__(self, max_size=1000):
        self._cache = OrderedDict()
        self._max_size = max_size

    def get_or_fetch(self, url):
        if url in self._cache:
            self._cache.move_to_end(url)  # Mark as most recently used
            return self._cache[url]

        response = self._fetch(url)
        self._cache[url] = response
        if len(self._cache) > self._max_size:
            self._cache.popitem(last=False)  # Evict least recently used
        return response

    def _fetch(self, url):
        # Same fetch as in RequestCache
        with urllib.request.urlopen(url) as r:
            return r.read()

Or use functools.lru_cache, or cachetools.TTLCache, or any other bounded eviction strategy. The point is: you still have to think about when objects should die. GC only automates the mechanical act of freeing memory, not the design decision of when something should be freed.
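As a rough illustration of the functools.lru_cache route, the sketch below uses a deliberately tiny maxsize so the eviction is visible, and a placeholder fetch function that builds a fake body instead of touching the network. Both the function and the calls list are hypothetical.

```python
from functools import lru_cache

calls = []   # records which URLs actually reached the slow path

@lru_cache(maxsize=2)          # tiny bound so eviction is easy to see
def fetch(url):
    calls.append(url)
    return f"body of {url}".encode()   # placeholder instead of a network call

fetch("/a")
fetch("/b")
fetch("/a")        # hit: served from the cache
fetch("/c")        # cache is full, so the least recently used entry (/b) is evicted
fetch("/b")        # miss again: /b has to be refetched

info = fetch.cache_info()
print(calls)       # ['/a', '/b', '/c', '/b']
print(info)        # CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```

The bound is a design decision you make up front: pick maxsize from how much memory you can spend, not from how many distinct keys you expect.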

Measuring GC Impact Across Languages

Every major runtime gives you tools to observe GC behavior. Use them before you start guessing.

Java:

# Detailed GC logging
java -Xlog:gc*,gc+phases=debug:file=gc.log:time,uptime,level,tags \
     -XX:+UseG1GC -jar app.jar

Python:

import gc

gc.set_debug(gc.DEBUG_STATS)  # Print GC stats to stderr on each collection

# Or programmatically
gc.callbacks.append(lambda phase, info: print(f"GC {phase}: {info}"))
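The same callback hook can time each collection rather than just print it. A minimal sketch, assuming the documented callback contract (phase is "start" or "stop", and info carries "generation" and "collected"); the names track_gc, pauses, and _started are made up here.

```python
import gc
import time

pauses = []      # (generation, milliseconds, objects collected)
_started = {}

def track_gc(phase, info):
    # Called by the gc module with phase "start" before a collection
    # and "stop" after it.
    if phase == "start":
        _started["t"] = time.perf_counter()
    elif phase == "stop" and "t" in _started:
        elapsed_ms = (time.perf_counter() - _started.pop("t")) * 1000
        pauses.append((info["generation"], elapsed_ms, info["collected"]))

gc.callbacks.append(track_gc)
try:
    # Create some cyclic garbage, then force a full collection.
    for _ in range(10_000):
        a, b = {}, {}
        a["ref"], b["ref"] = b, a
    gc.collect()
finally:
    gc.callbacks.remove(track_gc)

worst = max(pauses, key=lambda p: p[1])
print(f"{len(pauses)} collections, worst pause {worst[1]:.2f} ms (gen {worst[0]})")
```

In a service you would feed these numbers into your metrics system instead of printing them, so GC pauses can be lined up against request latency.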

Go:

GODEBUG=gctrace=1 ./myapp
# Output:
# gc 1 @0.012s 2%: 0.015+1.2+0.003 ms clock, 0.12+0.8/1.1/0+0.024 ms cpu,
#   4->4->1 MB, 5 MB goal, 8 P
# Translation: GC #1, 2% of program time spent in GC so far, 1.2 ms of concurrent marking, 4 MB heap at start with 1 MB still live

The Go line is dense but revealing. The format 4->4->1 MB means: heap size at GC start → heap size at GC end → live heap (what survived marking). If the last number keeps climbing from one cycle to the next, your live set is growing, which usually points to a leak or an unbounded cache rather than a slow collector.
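To track these numbers over time, the trace line can be parsed with a short script. This sketch is written in Python to match the other examples and assumes the gctrace layout shown above; the names HEAP_RE and parse_gctrace are invented, and other Go versions may print additional fields.

```python
import re

# Assumes the gctrace layout shown above; newer Go versions may add fields.
HEAP_RE = re.compile(
    r"gc (?P<n>\d+) @(?P<at>[\d.]+)s (?P<pct>\d+)%: .*?"
    r"(?P<start>\d+)->(?P<end>\d+)->(?P<live>\d+) MB, (?P<goal>\d+) MB goal"
)

def parse_gctrace(line):
    """Return the heap figures from one gctrace line, or None if it does not match."""
    m = HEAP_RE.search(line)
    if m is None:
        return None
    return {
        "cycle": int(m["n"]),
        "uptime_s": float(m["at"]),
        "gc_percent_of_time": int(m["pct"]),
        "heap_start_mb": int(m["start"]),
        "heap_end_mb": int(m["end"]),
        "live_mb": int(m["live"]),
        "goal_mb": int(m["goal"]),
    }

line = ("gc 1 @0.012s 2%: 0.015+1.2+0.003 ms clock, "
        "0.12+0.8/1.1/0+0.024 ms cpu, 4->4->1 MB, 5 MB goal, 8 P")
sample = parse_gctrace(line)
print(sample["heap_start_mb"], sample["heap_end_mb"], sample["live_mb"])   # 4 4 1
```

Plotting the heap figures per cycle makes a growing heap obvious long before the process runs out of memory.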

The Irony

Garbage collection was a genuine advance. It eliminated an entire category of catastrophic bugs — use-after-free, double-free, heap corruption — that caused real security vulnerabilities and real crashes.

But the abstraction created a false comfort. Engineers stopped thinking about memory lifetimes because they believed the problem was solved. It wasn’t solved. It was transformed. Instead of a segfault that crashes immediately and gives you a core dump, you get a slow memory leak that manifests as gradually increasing RAM usage over weeks, or a GC pause that causes a cascade of timeouts across your microservices at peak traffic.

The segfault was honest. When it fired, it told you something was wrong, right away, with a core dump you could inspect. The GC pause is subtle. It happens intermittently, correlates with allocation rate rather than any specific code path, and is invisible unless you’re actively monitoring for it.

You don’t need to write your own allocator. You don’t need to abandon garbage-collected languages. But you absolutely need to understand what the collector is doing, how to measure it, and when it’s working against you. The alternative is debugging production latency with a blindfold on.