JVM Memory in Containers: MaxRAMPercentage and OOM Prevention

The main chapter established that the JVM controls only heap memory, while the container cgroup enforces a limit on all memory: heap, metaspace, code cache, thread stacks, direct buffers, and native allocations combined. This section quantifies each memory region, shows how to measure them inside a running container, and demonstrates the configuration that prevents OOM kills for the content platform article service.

Symptom: Mysterious Container Restarts

The article service runs with -Xmx3g in a 4GB container. Heap usage after GC stays below 1.8GB. Grafana shows comfortable headroom. Then, twice per day, the pod restarts. Kubernetes reports OOMKilled. The JVM never threw OutOfMemoryError. No heap dump was generated.

The heap is not the problem.

Cause: Off-Heap Memory Exceeds the Container Limit

# JVM memory report using NativeMemoryTracking
# Add -XX:NativeMemoryTracking=summary to JVM flags
docker exec article-service jcmd 1 VM.native_memory summary

# Output (at the time of OOM kill):
Total: reserved=6155MB, committed=4312MB

  Heap:              reserved=3072MB, committed=2048MB    # -Xmx3g, used ~2GB
  Class (Metaspace): reserved=1088MB, committed=327MB     # Unbounded default
  Thread:            reserved=612MB,  committed=612MB     # 300 threads × 2MB
  Code:              reserved=256MB,  committed=198MB     # JIT compiled code
  GC:                reserved=185MB,  committed=185MB     # GC data structures
  Internal:          reserved=42MB,   committed=42MB      # JVM internal
  Symbol:            reserved=18MB,   committed=18MB      # String table, symbols
  Other:             reserved=882MB,  committed=882MB     # Direct buffers + mmap

# committed total: 4312MB > 4096MB container limit = OOM kill

The breakdown:

Container limit:    4096MB

JVM Heap:           2048MB (committed, not max)
Metaspace:           327MB (no cap set, grew past 300MB)
Thread stacks:       612MB (300 threads × ~2MB each, above the 1MB Linux default)
Code cache:          198MB (JIT compiled methods)
GC structures:       185MB (G1 region metadata, remembered sets)
Direct buffers:      320MB (Netty I/O buffers, file-mapped regions)
Native/internal:      60MB (JNI, JVM internal malloc)
OS overhead:         150MB (libc, mapped libraries, page tables)
────────────────────────────────────────────────────────────────
Total:             3900MB (fits... sometimes)

Spike scenario:
  Metaspace grows to 350MB (+23MB from lazy class loading)
  Thread count spikes to 350 (+100MB stacks from request burst)
  Direct buffer allocation for large response: +80MB
  New total: 4103MB > 4096MB → OOM KILL
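
The arithmetic above is easy to script so it can be rerun whenever a region estimate changes. The following is a minimal sketch that uses only the figures from the breakdown and the spike scenario; the class and method names (MemoryBudget, totalMb, steadyState, spike) are illustrative and not part of the article service.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sums per-region memory estimates and compares them with the container limit. */
final class MemoryBudget {

    /** Returns the sum of all region sizes in MB. */
    static long totalMb(Map<String, Long> regionsMb) {
        long total = 0;
        for (long mb : regionsMb.values()) {
            total += mb;
        }
        return total;
    }

    /** Steady-state figures from the breakdown above, in MB. */
    static Map<String, Long> steadyState() {
        Map<String, Long> regions = new LinkedHashMap<>();
        regions.put("heap", 2048L);
        regions.put("metaspace", 327L);
        regions.put("threadStacks", 612L);
        regions.put("codeCache", 198L);
        regions.put("gc", 185L);
        regions.put("directBuffers", 320L);
        regions.put("nativeInternal", 60L);
        regions.put("osOverhead", 150L);
        return regions;
    }

    /** Applies the spike scenario: +23MB metaspace, +100MB stacks, +80MB direct buffers. */
    static Map<String, Long> spike() {
        Map<String, Long> regions = steadyState();
        regions.merge("metaspace", 23L, Long::sum);
        regions.merge("threadStacks", 100L, Long::sum);
        regions.merge("directBuffers", 80L, Long::sum);
        return regions;
    }

    public static void main(String[] args) {
        long limitMb = 4096;
        long steady = totalMb(steadyState());
        long spiked = totalMb(spike());
        System.out.printf("steady=%dMB headroom=%dMB%n", steady, limitMb - steady);
        System.out.printf("spike=%dMB overLimit=%b%n", spiked, spiked > limitMb);
    }
}
```

Keeping the regions in a map makes it cheap to swap in measured values from a later NMT report and see how much headroom remains.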

The Complete Memory Map

Every JVM process inside a container consumes memory in these regions:

┌─────────────────────────────────────────────────────────────────┐
│ Container Memory Limit (memory.max)                             │
│                                                                 │
│  ┌──────────────────────────────────────────┐                   │
│  │ JVM Heap (-Xmx / MaxRAMPercentage)       │ GC-managed        │
│  │ Objects, arrays, string pool             │ Shrinks on GC     │
│  │ Size: 0 to Xmx                           │                   │
│  └──────────────────────────────────────────┘                   │
│                                                                 │
│  ┌──────────────────────────────────────────┐                   │
│  │ Metaspace (MaxMetaspaceSize)             │ Grows with        │
│  │ Class metadata, method bytecodes         │ loaded classes    │
│  │ Default: unbounded                       │                   │
│  └──────────────────────────────────────────┘                   │
│                                                                 │
│  ┌──────────────────────────────────────────┐                   │
│  │ Code Cache (ReservedCodeCacheSize)       │ JIT compiled      │
│  │ Compiled native methods                  │ code              │
│  │ Default: 240MB (tiered compilation)      │                   │
│  └──────────────────────────────────────────┘                   │
│                                                                 │
│  ┌──────────────────────────────────────────┐                   │
│  │ Thread Stacks (Xss × thread count)       │ Per-thread        │
│  │ Call frames, local variables             │ allocation        │
│  │ Default: 1MB (varies by platform)        │                   │
│  └──────────────────────────────────────────┘                   │
│                                                                 │
│  ┌──────────────────────────────────────────┐                   │
│  │ Direct ByteBuffers (MaxDirectMemorySize) │ Off-heap I/O      │
│  │ NIO channels, Netty buffers, mmap        │ Not GC-managed    │
│  └──────────────────────────────────────────┘                   │
│                                                                 │
│  ┌──────────────────────────────────────────┐                   │
│  │ GC Metadata                              │ Proportional      │
│  │ Remembered sets, card tables, bitmaps    │ to heap size      │
│  └──────────────────────────────────────────┘                   │
│                                                                 │
│  ┌──────────────────────────────────────────┐                   │
│  │ OS / Native                              │ Kernel + libc     │
│  │ glibc malloc arenas, mmap, page tables   │ overhead          │
│  └──────────────────────────────────────────┘                   │
└─────────────────────────────────────────────────────────────────┘

Fix: Capping Every Memory Region

Heap: MaxRAMPercentage

# SLOW: Fixed -Xmx that does not account for container size
java -Xmx3g -jar article-service.jar
# Heap: 3GB. Leaves only 1GB for everything else. Fragile.

# SLOW: MaxRAMPercentage too high
java -XX:MaxRAMPercentage=75.0 -jar article-service.jar
# Container=4GB → Heap=3GB. Same problem.

# FAST: MaxRAMPercentage=50 with explicit caps on everything else
java -XX:MaxRAMPercentage=50.0 -jar article-service.jar
# Container=4GB → Heap=2GB. Leaves 2GB for non-heap + OS.

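MaxRAMPercentage sets the maximum heap size (the effective -Xmx) as a percentage of the container memory limit, which the JVM reads from memory.max in cgroup v2 or memory.limit_in_bytes in cgroup v1. It is ignored when -Xmx is set explicitly. Resulting heap sizes: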

Container limit    MaxRAMPercentage    Resulting -Xmx
────────────────────────────────────────────────────
2GB                50%                 1GB
4GB                50%                 2GB
8GB                50%                 4GB
4GB                75%                 3GB (too aggressive)
4GB                25%                 1GB (too conservative)
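
To confirm what the JVM actually chose, the expected value can be compared with Runtime.maxMemory() from inside the container. The sketch below is a simplification: HeapSizing and expectedMaxHeapMb are hypothetical names, the calculation assumes no explicit -Xmx, and it ignores the small alignment rounding the JVM applies. Some collectors also report a value slightly below the configured maximum through Runtime.maxMemory().

```java
/** Compares the heap size implied by MaxRAMPercentage with what the JVM reports. */
final class HeapSizing {

    /** Approximate max heap in MB for a container limit and a MaxRAMPercentage value. */
    static long expectedMaxHeapMb(long containerLimitMb, double maxRamPercentage) {
        return (long) (containerLimitMb * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        // Effective maximum heap as seen by the running JVM.
        long actualMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Effective max heap: " + actualMb + "MB");
        System.out.println("Expected for 4096MB at 50%: "
            + expectedMaxHeapMb(4096, 50.0) + "MB");
    }
}
```

If the two numbers differ widely, the JVM is probably not seeing the container limit or an -Xmx flag is overriding the percentage.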

Why 50% is the right default for Java services with significant non-heap usage (like Spring Boot with Netty):

4GB container, MaxRAMPercentage=50:
  Heap:            2048MB (50%)
  Metaspace:        256MB (capped)
  Code cache:       128MB (capped)
  Thread stacks:    100MB (200 threads × 512KB)
  Direct buffers:   256MB (capped)
  GC metadata:      120MB (proportional to 2GB heap)
  Native/JVM:        80MB
  OS overhead:      200MB
  ────────────────────────
  Total:           3188MB (78% of limit)
  Headroom:         908MB (22%)

22% headroom absorbs spikes without OOM kills. This is the safe configuration.

Metaspace: Capping Class Metadata

Metaspace stores class metadata: class definitions, method bytecodes, constant pools, annotations. It grows whenever a new class is loaded and shrinks only when a class loader is unloaded. Without a cap, a classloader leak keeps it growing until the container is killed.

# Monitor metaspace growth
docker exec article-service jcmd 1 VM.metaspace

# Output:
#   MaxMetaspaceSize: unlimited    ← problem
#   Used: 312MB
#   Committed: 327MB
#   Reserved: 1088MB

# Typical metaspace usage for a Spring Boot app:
#   After startup:     80-120MB (framework + app classes)
#   After warm-up:     120-180MB (lazy-loaded classes)
#   After 24 hours:    150-250MB (reflection, proxies, dynamic classes)
#   Memory leak:       300MB+ and growing (classloader leak)

# FAST: Cap metaspace to prevent unbounded growth
java -XX:MaxMetaspaceSize=256m -jar article-service.jar
# If metaspace hits 256MB: JVM throws OutOfMemoryError: Metaspace
# This is diagnosable (dump, diagnose, restart), unlike a
# silent cgroup OOM kill, which leaves no chance to write a heap dump.

Sources of metaspace growth:

Source                          Typical Size    Growth Pattern
───────────────────────────────────────────────────────────────
Spring Framework (core)         40MB            Fixed at startup
Spring Data JPA (proxies)       15MB            Fixed at startup
Jackson (type serializers)      10MB            Grows with new types
Hibernate (entity metamodel)    20MB            Fixed after entity scan
Reflection (Method handles)     5-15MB          Grows with endpoint count
Groovy/Script engines           10-50MB         Grows per script eval
CGLIB dynamic proxies           5-20MB          Grows with AOP usage
Classloader leaks               unbounded       Grows forever (bug)

For the content platform (no Hibernate, no Groovy), 256MB is generous. The actual steady-state usage is 160MB.
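
Metaspace usage can also be read in-process, which is convenient for alerting before the cap is reached. The sketch below assumes a HotSpot JVM, where the memory pool is named "Metaspace"; MetaspaceProbe and its methods are illustrative names, not existing service code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Reads metaspace usage from the platform memory pool MXBeans. */
final class MetaspaceProbe {

    /** Returns usage of the pool named "Metaspace", or null if this JVM has no such pool. */
    static MemoryUsage metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage();
            }
        }
        return null;
    }

    /** Fraction of the cap in use, or -1 when no cap is set (getMax() reports -1). */
    static double fractionOfCap(long usedBytes, long maxBytes) {
        return maxBytes <= 0 ? -1.0 : (double) usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage usage = metaspaceUsage();
        if (usage == null) {
            System.out.println("No pool named Metaspace on this JVM");
            return;
        }
        long mb = 1024 * 1024;
        System.out.println("Metaspace used=" + usage.getUsed() / mb
            + "MB committed=" + usage.getCommitted() / mb + "MB");
        double fraction = fractionOfCap(usage.getUsed(), usage.getMax());
        System.out.println(fraction < 0
            ? "MaxMetaspaceSize: unlimited"
            : String.format("Using %.0f%% of MaxMetaspaceSize", fraction * 100));
    }
}
```

An alert at, say, 80% of the cap gives time to investigate before OutOfMemoryError: Metaspace is thrown; the threshold is a judgment call rather than a measured value.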

Thread Stacks: Controlling Per-Thread Memory

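Each JVM thread reserves a fixed-size stack at creation. The default on 64-bit Linux is 1MB (-Xss1m), so 300 threads reserve 300MB regardless of how deep the call stacks actually go. The kernel charges only the stack pages a thread has actually touched, so resident usage is usually lower than the reservation, but the full size is the worst case to budget for.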

# Count live threads
docker exec article-service jcmd 1 Thread.print | \
  grep -c '^"'
# Output: 287 threads

# Check max stack frame depth across all threads:
docker exec article-service jcmd 1 Thread.print | \
  awk '/^"/{n=0} /^[[:space:]]+at /{n++; if(n>max)max=n} END{print max+0}'
# Output: 42 frames

# 42 frames × ~100 bytes per frame (rough estimate, frame sizes vary) = ~4KB per thread
# Default stack: 1MB (1024KB)
# Utilization: 4KB / 1024KB = 0.4%

# SLOW: Default 1MB stacks, 300 threads = 300MB
java -jar article-service.jar
# Thread stack memory: 300MB

# FAST: 512KB stacks (enough for most service workloads; verify under load)
java -Xss512k -jar article-service.jar
# Thread stack memory: 150MB (saved 150MB)

# More aggressive: 256KB stacks (test thoroughly, deep recursion will fail)
java -Xss256k -jar article-service.jar
# Thread stack memory: 75MB (saved 225MB)
# Risk: StackOverflowError on deep call chains (recursive parsers, serializers)

The content platform uses -Xss512k. No StackOverflowError in 6 months of production. The deepest observed call chain is 67 frames, well within the 512KB budget.
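
The same measurement can be taken from inside the JVM, which avoids parsing jcmd output. This is a sketch under stated assumptions: StackDepthProbe, maxDepth, and stackBudgetMb are hypothetical names, and Thread.getAllStackTraces() pauses threads briefly to capture traces, so it suits occasional sampling rather than a hot path.

```java
import java.util.Map;

/** Samples live thread stacks and estimates the worst-case stack reservation. */
final class StackDepthProbe {

    /** Returns the deepest stack, in frames, among the given stack traces. */
    static int maxDepth(Map<Thread, StackTraceElement[]> traces) {
        int max = 0;
        for (StackTraceElement[] frames : traces.values()) {
            max = Math.max(max, frames.length);
        }
        return max;
    }

    /** Worst-case stack reservation in MB for a thread count and an -Xss value in KB. */
    static long stackBudgetMb(int threads, int xssKb) {
        return (long) threads * xssKb / 1024;
    }

    public static void main(String[] args) {
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        System.out.println("threads: " + traces.size());
        System.out.println("deepest stack: " + maxDepth(traces) + " frames");
        System.out.println("reservation at -Xss1m:   "
            + stackBudgetMb(traces.size(), 1024) + "MB");
        System.out.println("reservation at -Xss512k: "
            + stackBudgetMb(traces.size(), 512) + "MB");
    }
}
```

Frame count is only a proxy for stack bytes, since frame sizes differ per method, so a load test with the reduced -Xss remains the real check.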

DirectByteBuffer: The Hidden Memory Consumer

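Netty and NIO channels allocate memory outside the heap using ByteBuffer.allocateDirect(), and memory-mapped files (FileChannel.map()) live outside the heap as well. This memory does not appear in heap dumps and is not bounded by -Xmx. The garbage collector manages it only indirectly: the native memory is released after the owning ByteBuffer object is collected, so a heap under little GC pressure can hold on to direct memory for a long time.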

// How Netty allocates direct memory for I/O
// Channels share a pooled direct buffer allocator that is split into arenas
// Default pool: 16MB per arena × 2 × nCPU arenas = significant memory

// SLOW: Default Netty allocator on 32-core host (JVM sees host cores)
// Netty arenas: 2 × 32 = 64 arenas (default: 2× nCPU)
// Per arena: 16MB
// Total direct buffer pool: 64 × 16MB = 1024MB (a quarter of the 4GB container limit on its own)

// FAST: Limit Netty arenas to match container CPU
// Set as JVM system properties, before Netty initializes:
// -Dio.netty.allocator.numDirectArenas=4
// -Dio.netty.allocator.numHeapArenas=4
// Total direct buffer pool: 4 × 16MB = 64MB
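
The pool arithmetic above can be checked against the CPU count the JVM actually reports. The sketch below models only the simplified rule from the comments (2 arenas per CPU, one 16MB chunk per arena); NettyArenaEstimate is a hypothetical helper, not a Netty API. Treat the result as a rough estimate: Netty may lower the arena count based on available direct memory, chunk sizes differ between Netty versions, and an arena can hold more than one chunk under load.

```java
/** Estimates the pooled direct memory implied by the arena count. */
final class NettyArenaEstimate {

    /** Arena count under the simplified rule used above: 2 x visible CPUs. */
    static int defaultArenas(int availableProcessors) {
        return 2 * availableProcessors;
    }

    /** Pool size in MB if every arena holds exactly one chunk of chunkMb. */
    static long poolMb(int arenas, int chunkMb) {
        return (long) arenas * chunkMb;
    }

    public static void main(String[] args) {
        // What the JVM reports; ActiveProcessorCount or CPU limits change this value.
        int cores = Runtime.getRuntime().availableProcessors();
        int arenas = defaultArenas(cores);
        System.out.println("JVM sees " + cores + " CPUs, " + arenas + " arenas, "
            + poolMb(arenas, 16) + "MB at 16MB per arena");
        System.out.println("Capped at 4 arenas: " + poolMb(4, 16) + "MB");
    }
}
```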

# Cap JVM-level direct memory
java -XX:MaxDirectMemorySize=256m -jar article-service.jar
# Any allocation beyond 256MB throws OutOfMemoryError: Direct buffer memory
# Recoverable error, not a silent OOM kill.

# Monitor direct buffer usage
docker exec article-service jcmd 1 VM.native_memory summary | grep -A2 "Other"
# Or via JMX:
# java.nio:type=BufferPool,name=direct → MemoryUsed

Diagnosing a direct buffer leak:

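// Prometheus metric for direct buffer monitoring
// (@Scheduled only fires if scheduling is enabled, e.g. @EnableScheduling on a configuration class)
import io.prometheus.client.Gauge;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
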
@Component
public class DirectBufferMetrics {

    private final Gauge directBufferUsed = Gauge.build()
        .name("jvm_direct_buffer_used_bytes")
        .help("Direct buffer memory used")
        .register();

    private final Gauge directBufferCapacity = Gauge.build()
        .name("jvm_direct_buffer_capacity_bytes")
        .help("Direct buffer memory capacity")
        .register();

    @Scheduled(fixedRate = 10000)
    public void collect() {
        List<BufferPoolMXBean> pools = ManagementFactory
            .getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                directBufferUsed.set(pool.getMemoryUsed());
                directBufferCapacity.set(pool.getTotalCapacity());
            }
        }
    }
}

Content platform direct buffer usage pattern:

  Startup:        12MB (initial NIO buffers)
  Under load:     80-120MB (Netty read/write buffers, pooled)
  Peak (burst):   180MB (concurrent large article responses)
  After GC:       drops to 60-80MB (unreferenced buffers reclaimed)

  MaxDirectMemorySize=256MB gives 76MB headroom above peak.

Full Memory Audit with NativeMemoryTracking

# Enable NativeMemoryTracking (detail mode adds roughly 5-10% overhead, use in staging)
java -XX:NativeMemoryTracking=detail -jar article-service.jar

# Take baseline after warm-up (10 minutes of traffic)
docker exec article-service jcmd 1 VM.native_memory baseline

# Wait 1 hour, then compare
docker exec article-service jcmd 1 VM.native_memory summary.diff

Native Memory Tracking diff (1 hour):

Total: reserved=4212MB +67MB, committed=3188MB +87MB

  Heap:       reserved=2048MB, committed=2048MB +0MB       ← stable
  Class:      reserved=1088MB, committed=178MB +12MB       ← growing (expected)
  Thread:     reserved=304MB +51MB, committed=304MB +51MB  ← thread count increased
  Code:       reserved=256MB, committed=142MB +8MB         ← JIT compiling
  GC:         reserved=120MB, committed=120MB +0MB         ← stable
  Internal:   reserved=38MB +2MB, committed=38MB +2MB      ← minor growth
  Other:      reserved=358MB +14MB, committed=358MB +14MB  ← direct buffers

Diagnosis:
  Thread growth: 200 → 250 threads (+51MB stacks)
    → Check for thread pool leak (cached thread pool not bounded)
  Metaspace growth: +12MB in 1 hour
    → Normal: lazy class loading from Spring DI
  Direct buffer growth: +14MB
    → Normal: connection pool warming under increasing traffic
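
If the thread growth does trace back to an unbounded cached thread pool, one common remedy is a ThreadPoolExecutor with a hard maximum and a bounded queue. The sketch below is one way to do that, not the article service's actual pool; BoundedPool, create, runBurst, and the sizes used are illustrative assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** A thread pool whose thread count, and therefore stack memory, has a hard ceiling. */
final class BoundedPool {

    /**
     * Creates a pool that never exceeds maxThreads. Extra threads beyond coreThreads
     * are started only once the queue is full; when both are exhausted, the
     * submitting thread runs the task itself, which applies backpressure.
     */
    static ThreadPoolExecutor create(int coreThreads, int maxThreads, int queueCapacity) {
        return new ThreadPoolExecutor(
            coreThreads, maxThreads,
            60, TimeUnit.SECONDS,                       // idle non-core threads exit after 60s
            new ArrayBlockingQueue<>(queueCapacity),
            new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits a burst of short tasks and returns the largest pool size observed. */
    static int runBurst(int tasks) {
        ThreadPoolExecutor pool = create(2, 8, 16);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(2);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) {
        System.out.println("largest pool size during burst: " + runBurst(200));
    }
}
```

With a ceiling in place, the thread stack line of the memory budget becomes maxThreads × -Xss instead of an open-ended figure.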

Detecting Memory Leaks

# Run diff every hour. If committed total grows linearly:
# Hour 0: committed=3188MB
# Hour 1: committed=3275MB (+87MB)
# Hour 2: committed=3360MB (+85MB)
# Hour 3: committed=3448MB (+88MB)
# Linear growth = leak. Time to OOM: (4096-3188)/87 = 10.4 hours.

# Identify the leaking region from the diff.
# If Thread committed grows: thread pool leak.
# If Class committed grows: classloader leak.
# If Other committed grows: direct buffer or native memory leak.
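
The time-to-OOM estimate above is a simple linear projection and can be automated from the hourly samples. A minimal sketch, assuming evenly spaced samples in MB; LeakProjection and its method names are hypothetical.

```java
/** Projects time to the container limit from evenly spaced committed-memory samples. */
final class LeakProjection {

    /** Average growth per sampling interval in MB, from the first to the last sample. */
    static double growthPerIntervalMb(long[] committedMb) {
        if (committedMb.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        long delta = committedMb[committedMb.length - 1] - committedMb[0];
        return (double) delta / (committedMb.length - 1);
    }

    /** Intervals left until limitMb at the given rate; infinity if memory is not growing. */
    static double intervalsToLimit(long currentMb, long limitMb, double ratePerIntervalMb) {
        if (ratePerIntervalMb <= 0) {
            return Double.POSITIVE_INFINITY;
        }
        return (limitMb - currentMb) / ratePerIntervalMb;
    }

    public static void main(String[] args) {
        long[] hourly = {3188, 3275, 3360, 3448};   // samples from the listing above
        double rate = growthPerIntervalMb(hourly);
        System.out.printf("growth: %.1fMB/hour%n", rate);
        System.out.printf("hours to 4096MB from hour 0: %.1f%n",
            intervalsToLimit(hourly[0], 4096, rate));
        System.out.printf("hours to 4096MB from now:    %.1f%n",
            intervalsToLimit(hourly[hourly.length - 1], 4096, rate));
    }
}
```

A real check would also confirm that growth is roughly constant between samples before calling it a leak, since warm-up growth flattens out on its own.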

The Final Configuration

The content platform article service, after six iterations of memory tuning:

# Dockerfile
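# Assumption to verify: JRE images may not ship jcmd. The jcmd commands in this
# section need a JDK-based image or a jcmd binary added to the image.
FROM eclipse-temurin:21-jre-alpine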

ENV JAVA_OPTS="\
  -XX:+UseContainerSupport \
  -XX:ActiveProcessorCount=2 \
  -XX:MaxRAMPercentage=50.0 \
  -XX:MaxMetaspaceSize=256m \
  -XX:ReservedCodeCacheSize=128m \
  -XX:MaxDirectMemorySize=256m \
  -XX:+UseG1GC \
  -XX:ParallelGCThreads=4 \
  -XX:ConcGCThreads=2 \
  -XX:CICompilerCount=2 \
  -Xss512k \
  -XX:NativeMemoryTracking=summary \
  -Xlog:gc*:file=/var/log/gc.log:time,uptime,level,tags:filecount=5,filesize=10m"

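COPY article-service.jar /app/article-service.jar
# exec replaces the shell so the JVM runs as PID 1: it receives SIGTERM directly
# and matches the "jcmd 1" commands used throughout this section.
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar /app/article-service.jar"]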

# Kubernetes deployment
resources:
  requests:
    cpu: "1"
    memory: "4Gi"
  limits:
    memory: "4Gi"
    # No CPU limit (Section 1)

Memory budget verification:

Container limit:        4096MB

  Heap (50%):            2048MB
  Metaspace (cap):        256MB
  Code cache (cap):       128MB
  Thread stacks:          100MB  (200 threads × 512KB)
  Direct buffers (cap):   256MB
  GC metadata:            120MB
  Native/JVM internal:     80MB
  OS overhead:            200MB
  ──────────────────────────────
  Total (max):           3188MB  (78% of limit)
  Headroom:               908MB  (22%)

Worst case (all caps hit simultaneously):
  Total:                 3448MB  (84% of limit)
  Headroom:               648MB  (16%)
  Still safe. No OOM kill.
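
A budget only holds if the flags actually reached the JVM, and a typo in JAVA_OPTS can silently leave a region uncapped. The sketch below reads the effective values at startup through HotSpotDiagnosticMXBean, so it assumes a HotSpot-based JVM; FlagAudit is a hypothetical class name. Values come back as strings in each flag's native unit: bytes for the size flags, KB for ThreadStackSize. A MaxDirectMemorySize of 0 means the flag was not set.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Reports the effective values of the memory-related flags used in this section. */
final class FlagAudit {

    static final String[] FLAGS = {
        "MaxHeapSize",
        "MaxMetaspaceSize",
        "ReservedCodeCacheSize",
        "MaxDirectMemorySize",
        "ThreadStackSize"
    };

    /** Returns flag name to effective value, as reported by the running JVM. */
    static Map<String, String> effectiveFlags() {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map<String, String> values = new LinkedHashMap<>();
        for (String flag : FLAGS) {
            values.put(flag, bean.getVMOption(flag).getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        // Log once at startup so the effective caps appear next to the deployment logs.
        effectiveFlags().forEach((flag, value) ->
            System.out.println(flag + " = " + value));
    }
}
```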

Proof: OOM Kill Elimination

Before (Xmx=3g, no caps on metaspace/direct/threads):
  OOM kills per week:        12-21 (about twice per day)
  Time between restarts:     8-14 hours (varies with traffic)
  Memory at kill:            4090-4096MB (cgroup limit)
  JVM heap at kill:          1800-2400MB (not full)
  Metaspace at kill:         280-340MB (unbounded growth)
  Thread count at kill:      280-350 (unbounded pool)

After (MaxRAMPercentage=50, all caps set):
  OOM kills per week:        0
  Uptime record:             34 days (restarted for JDK update)
  Memory usage (P95):        3100MB (76% of limit)
  Memory usage (max):        3400MB (83% of limit)
  Headroom at peak:          696MB (17%)
  JVM heap after GC (P95):   1200MB (59% of Xmx)

Trade-off: The heap is 2GB instead of 3GB. The article service’s working set fits in 1.2GB. GC runs more frequently (every 45 seconds instead of every 90 seconds), but each GC pause is shorter (15ms instead of 25ms) because the smaller heap gives G1 smaller young collections with less live data to copy per pause. The net effect on latency is neutral. The net effect on stability is transformative.
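
GC frequency and average collection time can be tracked in-process to confirm this trade-off after a heap change. The sketch below uses the standard GarbageCollectorMXBean counters; GcStats is a hypothetical name. The collection time is an approximation of pause time: depending on the collector and JDK version, some beans include concurrent work, so GC logs remain the authoritative source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Aggregates collection counts and times across all garbage collector beans. */
final class GcStats {

    /** Total collections since JVM start; beans reporting -1 (undefined) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Total accumulated collection time in milliseconds since JVM start. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    /** Average milliseconds per collection, or 0 when nothing has been collected yet. */
    static double averageMillis(long collections, long millis) {
        return collections == 0 ? 0.0 : (double) millis / collections;
    }

    public static void main(String[] args) {
        long count = totalCollections();
        long millis = totalCollectionMillis();
        System.out.printf("collections=%d totalMs=%d avgMs=%.1f%n",
            count, millis, averageMillis(count, millis));
    }
}
```

Sampling these counters at a fixed interval and taking differences gives collections per minute and average time per collection for that window.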