JVM Memory in Containers: MaxRAMPercentage and OOM Prevention
The main chapter established that -Xmx bounds only the heap, while the container cgroup limits all of the process's memory combined: heap, metaspace, code cache, thread stacks, direct buffers, and other native allocations. This section quantifies each memory region, shows how to measure them inside a running container, and walks through the configuration that prevents OOM kills for the content platform's article service.
Symptom: Mysterious Container Restarts
The article service runs with -Xmx3g in a 4GB container. Heap usage after GC stays below 1.8GB. Grafana shows comfortable headroom. Then, twice per day, the pod restarts. Kubernetes reports OOMKilled. The JVM never threw OutOfMemoryError. No heap dump was generated.
The heap is not the problem.
Cause: Off-Heap Memory Exceeds the Container Limit
# JVM memory report using NativeMemoryTracking
# Add -XX:NativeMemoryTracking=summary to JVM flags
docker exec article-service jcmd 1 VM.native_memory summary
# Output (at the time of OOM kill):
Total: reserved=6155MB, committed=4312MB
Heap: reserved=3072MB, committed=2048MB # -Xmx3g, used ~2GB
Class (Metaspace): reserved=1088MB, committed=327MB # Unbounded default
Thread: reserved=612MB, committed=612MB # ~300 threads, ~2MB each
Code: reserved=256MB, committed=198MB # JIT compiled code
GC: reserved=185MB, committed=185MB # GC data structures
Internal: reserved=42MB, committed=42MB # JVM internal
Symbol: reserved=18MB, committed=18MB # String table, symbols
Other: reserved=882MB, committed=882MB # Direct buffers and other native allocations
# committed total: 4312MB > 4096MB container limit = OOM kill
The breakdown:
Container limit: 4096MB
JVM Heap: 2048MB (committed, not max)
Metaspace: 327MB (no cap set, grew past 300MB)
Thread stacks: 612MB (~300 threads at ~2MB each, per NMT)
Code cache: 198MB (JIT compiled methods)
GC structures: 185MB (G1 region metadata, remembered sets)
Direct buffers: 320MB (Netty I/O buffers, file-mapped regions)
Native/internal: 60MB (JNI, JVM internal malloc)
OS overhead: 150MB (libc, mapped libraries, page tables)
────────────────────────────────────────────────────────────────
Total: 3900MB (fits... sometimes)
Spike scenario:
Metaspace grows to 350MB (+23MB from lazy class loading)
Thread count spikes to 350 (+100MB stacks from request burst)
Direct buffer allocation for large response: +80MB
New total: 4103MB > 4096MB → OOM KILL
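The breakdown and spike arithmetic above is worth re-running whenever a single region changes. A minimal sketch, assuming the per-region figures are typed in by hand in MB; the class `MemoryBudget` is hypothetical and measures nothing itself:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: sums per-region estimates (MB) and compares them
// with the cgroup memory limit. It does not measure anything itself.
final class MemoryBudget {
    static long total(Map<String, Long> regionsMb) {
        long sum = 0;
        for (long mb : regionsMb.values()) {
            sum += mb;
        }
        return sum;
    }

    static long headroomMb(Map<String, Long> regionsMb, long limitMb) {
        // Negative headroom means the cgroup would OOM-kill the process.
        return limitMb - total(regionsMb);
    }

    public static void main(String[] args) {
        Map<String, Long> regions = new LinkedHashMap<>();
        regions.put("heap (committed)", 2048L);
        regions.put("metaspace", 327L);
        regions.put("thread stacks", 612L);
        regions.put("code cache", 198L);
        regions.put("gc structures", 185L);
        regions.put("direct buffers", 320L);
        regions.put("native/internal", 60L);
        regions.put("os overhead", 150L);
        long limit = 4096;
        System.out.println("steady state: " + total(regions) + "MB, headroom "
                + headroomMb(regions, limit) + "MB");

        // Spike scenario from the text: +23MB metaspace, +100MB stacks, +80MB direct buffers.
        regions.merge("metaspace", 23L, Long::sum);
        regions.merge("thread stacks", 100L, Long::sum);
        regions.merge("direct buffers", 80L, Long::sum);
        System.out.println("spike: " + total(regions) + "MB, headroom "
                + headroomMb(regions, limit) + "MB");
    }
}
```

Running `main` prints a 3900MB total with 196MB of headroom, then 4103MB with -7MB of headroom after the spike.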
The Complete Memory Map
Every JVM process inside a container consumes memory in these regions:
┌─────────────────────────────────────────────────────────────────┐
│ Container Memory Limit (memory.max)                             │
│                                                                 │
│ ┌──────────────────────────────────────────┐                    │
│ │ JVM Heap (-Xmx / MaxRAMPercentage)       │ GC-managed         │
│ │ Objects, arrays, string pool             │ Shrinks on GC      │
│ │ Size: 0 to Xmx                           │                    │
│ └──────────────────────────────────────────┘                    │
│                                                                 │
│ ┌──────────────────────────────────────────┐                    │
│ │ Metaspace (MaxMetaspaceSize)             │ Grows with         │
│ │ Class metadata, method bytecodes         │ loaded classes     │
│ │ Default: unbounded                       │                    │
│ └──────────────────────────────────────────┘                    │
│                                                                 │
│ ┌──────────────────────────────────────────┐                    │
│ │ Code Cache (ReservedCodeCacheSize)       │ JIT compiled       │
│ │ Compiled native methods                  │ code               │
│ │ Default: 240MB (tiered compilation)      │                    │
│ └──────────────────────────────────────────┘                    │
│                                                                 │
│ ┌──────────────────────────────────────────┐                    │
│ │ Thread Stacks (Xss × thread count)       │ Per-thread         │
│ │ Call frames, local variables             │ allocation         │
│ │ Default: 1MB (varies by platform)        │                    │
│ └──────────────────────────────────────────┘                    │
│                                                                 │
│ ┌──────────────────────────────────────────┐                    │
│ │ Direct ByteBuffers (MaxDirectMemorySize) │ Off-heap I/O       │
│ │ NIO channels, Netty buffers              │ Not GC-managed     │
│ └──────────────────────────────────────────┘                    │
│                                                                 │
│ ┌──────────────────────────────────────────┐                    │
│ │ GC Metadata                              │ Proportional       │
│ │ Remembered sets, card tables, bitmaps    │ to heap size       │
│ └──────────────────────────────────────────┘                    │
│                                                                 │
│ ┌──────────────────────────────────────────┐                    │
│ │ OS / Native                              │ Kernel + libc      │
│ │ glibc malloc arenas, mmap, page tables   │ overhead           │
│ └──────────────────────────────────────────┘                    │
└─────────────────────────────────────────────────────────────────┘
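Only some of these regions are visible from inside the JVM. As a sketch, the standard `java.lang.management` API lists the heap and non-heap memory pools plus the NIO buffer pools; pool names such as `Metaspace` or `CodeHeap 'non-nmethods'` are HotSpot-specific, and the class name `JvmMemoryRegions` is hypothetical:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Map;
import java.util.TreeMap;

// Sketch: prints the memory regions the JVM exposes over JMX.
// Thread stacks, GC metadata and malloc'd native memory have no MXBean here.
final class JvmMemoryRegions {
    /** Returns "TYPE: pool name" -> used bytes for memory pools and NIO buffer pools. */
    static Map<String, Long> usedBytesByRegion() {
        Map<String, Long> used = new TreeMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                used.put(pool.getType().name() + ": " + pool.getName(), usage.getUsed());
            }
        }
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            used.put("BUFFER: " + pool.getName(), pool.getMemoryUsed());
        }
        return used;
    }

    public static void main(String[] args) {
        usedBytesByRegion().forEach((name, bytes) ->
                System.out.printf("%-45s %8d KB%n", name, bytes / 1024));
    }
}
```

Thread stacks, GC metadata, and malloc'd native memory have no entry in this list, which is why the NativeMemoryTracking report is needed to see them.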
Fix: Capping Every Memory Region
Heap: MaxRAMPercentage
# SLOW: Fixed -Xmx that does not account for container size
java -Xmx3g -jar article-service.jar
# Heap: 3GB. Leaves only 1GB for everything else. Fragile.
# SLOW: MaxRAMPercentage too high
java -XX:MaxRAMPercentage=75.0 -jar article-service.jar
# Container=4GB → Heap=3GB. Same problem.
# FAST: MaxRAMPercentage=50 with explicit caps on everything else
java -XX:MaxRAMPercentage=50.0 -jar article-service.jar
# Container=4GB → Heap=2GB. Leaves 2GB for non-heap + OS.
MaxRAMPercentage reads the container memory limit (memory.max on cgroup v2, memory.limit_in_bytes on cgroup v1) and sets the maximum heap size to that percentage of it:
Container limit   MaxRAMPercentage   Resulting -Xmx
────────────────────────────────────────────────────
2GB               50%                1GB
4GB               50%                2GB
8GB               50%                4GB
4GB               75%                3GB (too aggressive)
4GB               25%                1GB (too conservative)
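The table is plain percentage arithmetic over the cgroup limit. A sketch of that calculation, assuming cgroup v2 with the limit exposed at `/sys/fs/cgroup/memory.max` inside the container; the class `HeapFromLimit` is hypothetical, and the JVM's own logic also handles cgroup v1 and aligns the heap size, so `Runtime.maxMemory()` can differ slightly from the computed value:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

// Sketch of the arithmetic behind MaxRAMPercentage on cgroup v2 (not the JVM's code).
final class HeapFromLimit {
    static final Path MEMORY_MAX = Path.of("/sys/fs/cgroup/memory.max");

    /** memory.max contains either a byte count or the literal "max" (no limit). */
    static OptionalLong parseMemoryMax(String content) {
        String value = content.trim();
        if (value.equals("max")) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(value));
    }

    static long maxHeapBytes(long limitBytes, double maxRamPercentage) {
        return (long) (limitBytes * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) throws IOException {
        if (Files.exists(MEMORY_MAX)) {
            OptionalLong limit = parseMemoryMax(Files.readString(MEMORY_MAX));
            System.out.println("memory.max: "
                    + (limit.isPresent() ? limit.getAsLong() + " bytes" : "unlimited"));
            limit.ifPresent(l -> System.out.println(
                    "50% heap would be " + maxHeapBytes(l, 50.0) / (1024 * 1024) + "MB"));
        }
        // What the running JVM actually chose (approximately the effective -Xmx):
        System.out.println("Runtime.maxMemory(): "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024) + "MB");
    }
}
```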
Why 50% is the right default for Java services with significant non-heap usage (like Spring Boot with Netty):
4GB container, MaxRAMPercentage=50:
Heap: 2048MB (50%)
Metaspace: 256MB (capped)
Code cache: 128MB (capped)
Thread stacks: 100MB (200 threads × 512KB)
Direct buffers: 256MB (capped)
GC metadata: 120MB (proportional to 2GB heap)
Native/JVM: 80MB
OS overhead: 200MB
────────────────────────
Total: 3188MB (78% of limit)
Headroom: 908MB (22%)
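The 908MB of headroom (22%) is more than four times the 203MB spike that pushed the original configuration over its limit, so bursts like that are absorbed without OOM kills. This is the safe configuration.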
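The same budget can be derived from the flags instead of typed in as totals. A sketch, assuming the GC, native, and OS figures from the list above as fixed estimates and counting every thread stack at its full -Xss size; the `ConfigBudget` class and its `Config` record are hypothetical:

```java
// Hypothetical budget calculator: derives each region from the JVM flags used in
// this section. GC, native and OS figures are estimates, not values the JVM reports.
final class ConfigBudget {
    record Config(long limitMb, double maxRamPercentage, long metaspaceCapMb, long codeCacheMb,
                  int threads, int xssKb, long directCapMb, long gcMb, long nativeMb, long osMb) {}

    static long totalMb(Config c) {
        long heapMb = (long) (c.limitMb() * c.maxRamPercentage() / 100.0);
        long stacksMb = (long) c.threads() * c.xssKb() / 1024; // worst case: every stack fully used
        return heapMb + c.metaspaceCapMb() + c.codeCacheMb() + stacksMb
                + c.directCapMb() + c.gcMb() + c.nativeMb() + c.osMb();
    }

    public static void main(String[] args) {
        Config c = new Config(4096, 50.0, 256, 128, 200, 512, 256, 120, 80, 200);
        long total = totalMb(c);
        System.out.printf("total %dMB (%.0f%%), headroom %dMB%n",
                total, 100.0 * total / c.limitMb(), c.limitMb() - total);
    }
}
```

With these inputs the total is 3188MB (78%). Changing only `maxRamPercentage` to 75 raises it to 4212MB, past the 4096MB limit, which is why the 75% row in the table above is marked too aggressive.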
Metaspace: Capping Class Metadata
Metaspace stores class metadata: class definitions, method bytecodes, constant pools, annotations. It grows every time a new class is loaded. Without a cap, nothing stops it from growing until the whole process reaches the container limit and is killed.
# Monitor metaspace growth
docker exec article-service jcmd 1 VM.metaspace
# Output:
# MaxMetaspaceSize: unlimited ← problem
# Used: 312MB
# Committed: 327MB
# Reserved: 1088MB
# Typical metaspace usage for a Spring Boot app:
# After startup: 80-120MB (framework + app classes)
# After warm-up: 120-180MB (lazy-loaded classes)
# After 24 hours: 150-250MB (reflection, proxies, dynamic classes)
# Memory leak: 300MB+ and growing (classloader leak)
# FAST: Cap metaspace to prevent unbounded growth
java -XX:MaxMetaspaceSize=256m -jar article-service.jar
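# If metaspace hits 256MB, the JVM throws OutOfMemoryError: Metaspace.
# That failure is visible and diagnosable (logged stack trace, a heap dump
# if -XX:+HeapDumpOnOutOfMemoryError is set, a deliberate restart) instead
# of a silent cgroup OOM kill that leaves no heap dump behind.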
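The same check can run inside the service. A sketch using the standard memory-pool API, assuming a HotSpot JVM where the pool is named `Metaspace`; `getMax()` is expected to report -1 (undefined) when `-XX:MaxMetaspaceSize` is not set, which corresponds to the `unlimited` line in the jcmd output above. `MetaspaceCheck` is a hypothetical name:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Sketch: reads the HotSpot "Metaspace" pool and reports whether it is capped.
final class MetaspaceCheck {
    static String describeMax(long maxBytes) {
        // MemoryUsage uses -1 for "undefined", i.e. no MaxMetaspaceSize cap.
        return maxBytes < 0 ? "unlimited" : (maxBytes / (1024 * 1024)) + "MB";
    }

    static MemoryUsage metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage();
            }
        }
        throw new IllegalStateException("no Metaspace pool (non-HotSpot JVM?)");
    }

    public static void main(String[] args) {
        MemoryUsage u = metaspaceUsage();
        System.out.printf("Metaspace used=%dMB committed=%dMB max=%s%n",
                u.getUsed() / (1024 * 1024), u.getCommitted() / (1024 * 1024),
                describeMax(u.getMax()));
    }
}
```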
Sources of metaspace growth:
Source                          Typical Size   Growth Pattern
───────────────────────────────────────────────────────────────
Spring Framework (core)         40MB           Fixed at startup
Spring Data JPA (proxies)       15MB           Fixed at startup
Jackson (type serializers)      10MB           Grows with new types
Hibernate (entity metamodel)    20MB           Fixed after entity scan
Reflection (Method handles)     5-15MB         Grows with endpoint count
Groovy/Script engines           10-50MB        Grows per script eval
CGLIB dynamic proxies           5-20MB         Grows with AOP usage
Classloader leaks               unbounded      Grows forever (bug)
For the content platform (no Hibernate, no Groovy), 256MB is generous. The actual steady-state usage is 160MB.
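A classloader leak, the unbounded row in the table, tends to show up as a loaded-class count that keeps climbing while few classes are unloaded. A minimal sketch that samples `ClassLoadingMXBean`; the `ClassLoadTrend` class and the idea of comparing samples taken hours apart are illustrative assumptions, not a built-in leak detector:

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Sketch: compare two samples taken some time apart (e.g. hourly).
// A loaded count that rises steadily while unloads stay flat suggests a classloader leak.
final class ClassLoadTrend {
    record Sample(long loadedNow, long totalLoaded, long unloaded) {}

    static Sample sample() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return new Sample(bean.getLoadedClassCount(), bean.getTotalLoadedClassCount(),
                bean.getUnloadedClassCount());
    }

    /** Net classes that stayed loaded between two samples. */
    static long netGrowth(Sample before, Sample after) {
        return after.loadedNow() - before.loadedNow();
    }

    public static void main(String[] args) {
        Sample s = sample();
        System.out.println("loaded=" + s.loadedNow() + " totalLoaded=" + s.totalLoaded()
                + " unloaded=" + s.unloaded());
    }
}
```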
Thread Stacks: Controlling Per-Thread Memory
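Each JVM thread reserves a fixed-size stack when it is created. The default on 64-bit Linux is 1MB (-Xss1m), so 300 threads reserve 300MB. Linux backs stack pages with physical memory only as they are touched, so resident usage is normally far below the reservation; the full reservation is the worst case, reached only if every thread runs deep, and it is the figure a conservative memory budget plans for.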
# Measure actual stack depth
docker exec article-service jcmd 1 Thread.print | \
grep -c "^\""
# Output: 287 threads
# Check max stack frame depth across all threads:
docker exec article-service jcmd 1 Thread.print | \
awk '/^"/{count=0} /^\tat /{count++} /^$/{if(count>max)max=count} END{if(count>max)max=count; print max}'
# Output: 42 frames
# 42 frames × ~100 bytes per frame = ~4KB actual usage per thread
# Default stack: 1MB (1024KB)
# Utilization: 4KB / 1024KB = 0.4%
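The same depth measurement can be taken from inside the process. A sketch using `Thread.getAllStackTraces()`, which reports Java frames only (native frames and per-frame sizes are invisible, so it measures depth, not bytes); `StackDepth` is a hypothetical name:

```java
import java.util.Map;

// Sketch: in-process counterpart of the jcmd/awk pipeline above (Java frames only).
final class StackDepth {
    static int maxJavaDepth() {
        int max = 0;
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            max = Math.max(max, e.getValue().length);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("threads=" + Thread.getAllStackTraces().size()
                + " maxJavaDepth=" + maxJavaDepth());
    }
}
```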
# SLOW: Default 1MB stacks, 300 threads = 300MB
java -jar article-service.jar
# Thread stack reservation: 300MB (worst case)
# FAST: 512KB stacks (ample for typical request-handling call depths)
java -Xss512k -jar article-service.jar
# Thread stack reservation: 150MB (worst case lowered by 150MB)
# More aggressive: 256KB stacks (test thoroughly; deep recursion will fail)
java -Xss256k -jar article-service.jar
# Thread stack reservation: 75MB (worst case lowered by 225MB)
# Risk: StackOverflowError in deeply recursive code paths
The content platform uses -Xss512k. No StackOverflowError in 6 months of production. The deepest observed call chain is 67 frames, well within the 512KB budget.
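Before lowering -Xss, it helps to know how deep a call chain a given stack size allows. A rough probe, assuming HotSpot on Linux honors the `stackSize` argument of the four-argument `Thread` constructor (the JDK documents it as a hint that some platforms ignore). The trivial recursive frame here is much smaller than a real request-handling frame, so the counts are an upper bound; `StackProbe` is hypothetical:

```java
// Sketch: measures recursion depth on a thread created with an explicit stack size.
final class StackProbe {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse();
    }

    static int maxDepth(long stackSizeBytes) throws InterruptedException {
        int[] result = new int[1];
        Thread t = new Thread(null, () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError expected) {
                result[0] = depth; // frames reached before the stack ran out
            }
        }, "stack-probe", stackSizeBytes);
        t.start();
        t.join(); // join makes the probe thread's writes visible here
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("256KB: " + maxDepth(256 * 1024) + " frames");
        System.out.println("512KB: " + maxDepth(512 * 1024) + " frames");
    }
}
```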
DirectByteBuffer: The Hidden Memory Consumer
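Netty and NIO channels allocate I/O buffers outside the heap, typically through ByteBuffer.allocateDirect(). Memory-mapped files (FileChannel.map()) also live outside the heap, but they are tracked separately and are not limited by MaxDirectMemorySize. Direct buffer memory does not appear in heap dumps and does not count against -Xmx. The garbage collector frees it only indirectly: the native memory is released when the small on-heap ByteBuffer object that owns it is collected, so allocation is not bounded by heap limits.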
// How Netty allocates direct memory for I/O:
// the pooled allocator (shared by all channels) spreads allocations across arenas,
// and each arena in use holds at least one chunk (16MB here; the chunk size
// depends on the Netty version and io.netty.allocator.maxOrder).
// SLOW: Default Netty allocator on a 32-core host (no CPU limit, so the JVM sees all host cores)
// Netty direct arenas: 2 × 32 = 64 (default: 2 × available processors)
// Per arena: 16MB
// Total direct buffer pool: 64 × 16MB = 1024MB (a quarter of the 4GB container on its own)
// FAST: Limit Netty arenas to match the container's CPU allocation.
// These are JVM system properties, read when Netty's allocator class initializes,
// so pass them as -D flags (Spring's application.properties does not set them):
// -Dio.netty.allocator.numDirectArenas=4
// -Dio.netty.allocator.numHeapArenas=4
// Total direct buffer pool: 4 × 16MB = 64MB
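The arena arithmetic in these comments can be sketched as follows, assuming the 2 × available processors default and one 16MB chunk per arena as stated above. Actual Netty releases may also cap the arena count by available memory and use a different chunk size, so treat the output as a rough upper-bound estimate. `NettyArenaEstimate` is hypothetical:

```java
// Sketch of the estimate in the comments above; not Netty's own code.
final class NettyArenaEstimate {
    static int defaultDirectArenas(int availableProcessors) {
        return 2 * availableProcessors; // the "2 x nCPU" rule from the text
    }

    static long poolMb(int arenas, long chunkMb) {
        return arenas * chunkMb; // assumes one chunk per arena
    }

    public static void main(String[] args) {
        System.out.println("32 host cores: " + poolMb(defaultDirectArenas(32), 16) + "MB");
        System.out.println("2 processors:  " + poolMb(defaultDirectArenas(2), 16) + "MB");
        System.out.println("explicit 4:    " + poolMb(4, 16) + "MB");
        // The processor count this JVM reports (affected by -XX:ActiveProcessorCount):
        System.out.println("this JVM sees " + Runtime.getRuntime().availableProcessors() + " processors");
    }
}
```

Assuming Netty takes its processor count from `Runtime.availableProcessors()`, as it does by default, the `-XX:ActiveProcessorCount=2` flag in the final configuration alone brings the default down to 4 arenas (64MB).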
# Cap JVM-level direct memory
java -XX:MaxDirectMemorySize=256m -jar article-service.jar
# An allocation that would push total direct memory past 256MB throws
# OutOfMemoryError: Direct buffer memory, an exception the application
# can log and handle instead of a silent OOM kill.
# Monitor direct buffer usage
docker exec article-service jcmd 1 VM.native_memory summary | grep -A2 "Other"
# Or via JMX:
# java.nio:type=BufferPool,name=direct → MemoryUsed
Exporting direct buffer usage as a metric makes a leak visible as a steadily rising line:
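// Prometheus metric for direct buffer monitoring
// Uses the Prometheus simpleclient library and Spring; @Scheduled only fires
// if scheduling is enabled (@EnableScheduling on a configuration class).
import io.prometheus.client.Gauge;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
@Component
public class DirectBufferMetrics {
    private final Gauge directBufferUsed = Gauge.build()
            .name("jvm_direct_buffer_used_bytes")
            .help("Direct buffer memory used")
            .register();
    private final Gauge directBufferCapacity = Gauge.build()
            .name("jvm_direct_buffer_capacity_bytes")
            .help("Direct buffer memory capacity")
            .register();
    // Sample the JDK's "direct" buffer pool every 10 seconds.
    @Scheduled(fixedRate = 10000)
    public void collect() {
        List<BufferPoolMXBean> pools = ManagementFactory
                .getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                directBufferUsed.set(pool.getMemoryUsed());
                directBufferCapacity.set(pool.getTotalCapacity());
            }
        }
    }
}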
Content platform direct buffer usage pattern:
Startup: 12MB (initial NIO buffers)
Under load: 80-120MB (Netty read/write buffers, pooled)
Peak (burst): 180MB (concurrent large article responses)
After GC: drops to 60-80MB (unreferenced buffers reclaimed)
MaxDirectMemorySize=256MB gives 76MB headroom above peak.
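The 76MB figure is cap minus peak, a check a service can also run on itself. A sketch, assuming a HotSpot JVM (for `com.sun.management.HotSpotDiagnosticMXBean`) where an unset `MaxDirectMemorySize` reads as 0 and the JDK then falls back to the maximum heap size as the direct-memory limit; `DirectHeadroom` is hypothetical:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Sketch: compares current direct-buffer usage with the effective direct-memory cap.
final class DirectHeadroom {
    static long headroomMb(long capMb, long usedMb) {
        return capMb - usedMb;
    }

    static long capBytes() {
        HotSpotDiagnosticMXBean hs = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        long flag = Long.parseLong(hs.getVMOption("MaxDirectMemorySize").getValue());
        // 0 means "not set": the limit then defaults to the max heap size.
        return flag > 0 ? flag : Runtime.getRuntime().maxMemory();
    }

    static long usedBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        long capMb = capBytes() / mb;
        long usedMb = usedBytes() / mb;
        System.out.printf("direct used=%dMB cap=%dMB headroom=%dMB%n",
                usedMb, capMb, headroomMb(capMb, usedMb));
    }
}
```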
Full Memory Audit with NativeMemoryTracking
# Enable NativeMemoryTracking in detail mode (NMT costs roughly 5-10% performance, most in detail mode; use in staging)
java -XX:NativeMemoryTracking=detail -jar article-service.jar
# Take baseline after warm-up (10 minutes of traffic)
docker exec article-service jcmd 1 VM.native_memory baseline
# Wait 1 hour, then compare
docker exec article-service jcmd 1 VM.native_memory summary.diff
Native Memory Tracking diff (1 hour):
Total: reserved=4212MB +67MB, committed=3188MB +87MB
Heap: reserved=2048MB, committed=2048MB +0MB ← stable
Class: reserved=1088MB, committed=178MB +12MB ← growing (expected)
Thread: reserved=304MB +51MB, committed=304MB +51MB ← thread count increased
Code: reserved=256MB, committed=142MB +8MB ← JIT compiling
GC: reserved=120MB, committed=120MB +0MB ← stable
Internal: reserved=38MB +2MB, committed=38MB +2MB ← minor growth
Other: reserved=358MB +14MB, committed=358MB +14MB ← direct buffers
Diagnosis:
Thread growth: 200 → 250 threads (+51MB stacks)
→ Check for thread pool leak (cached thread pool not bounded)
Metaspace growth: +12MB in 1 hour
→ Normal: lazy class loading from Spring DI
Direct buffer growth: +14MB
→ Normal: connection pool warming under increasing traffic
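Scanning a diff for growing categories is easy to automate. A sketch that parses lines in the simplified `Category: reserved=..MB, committed=..MB +..MB` form shown above; real `jcmd` output is laid out differently (and reports KB unless `scale=MB` is passed), so the regular expression is an assumption to adapt. `NmtDiffScan` is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extracts the committed-memory delta per category from simplified NMT diff lines.
final class NmtDiffScan {
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\w+):\\s*reserved=\\d+MB(?:\\s*[+-]\\d+MB)?,\\s*committed=\\d+MB(?:\\s*([+-]\\d+)MB)?");

    /** Category -> committed delta in MB (0 when no delta is printed). */
    static Map<String, Integer> committedDeltas(String diffText) {
        Map<String, Integer> deltas = new LinkedHashMap<>();
        for (String line : diffText.split("\n")) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                deltas.put(m.group(1), m.group(2) == null ? 0 : Integer.parseInt(m.group(2)));
            }
        }
        return deltas;
    }

    public static void main(String[] args) {
        String diff = String.join("\n",
                "Heap: reserved=2048MB, committed=2048MB +0MB",
                "Class: reserved=1088MB, committed=178MB +12MB",
                "Thread: reserved=304MB +51MB, committed=304MB +51MB");
        committedDeltas(diff).forEach((category, mb) -> {
            if (mb > 0) {
                System.out.println(category + " grew by " + mb + "MB committed");
            }
        });
    }
}
```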
Detecting Memory Leaks
# Run diff every hour. If committed total grows linearly:
# Hour 0: committed=3188MB
# Hour 1: committed=3275MB (+87MB)
# Hour 2: committed=3360MB (+85MB)
# Hour 3: committed=3448MB (+88MB)
# Linear growth = leak. Time to OOM: (4096-3188)/87 = 10.4 hours.
# Identify the leaking region from the diff.
# If Thread committed grows: thread pool leak.
# If Class committed grows: classloader leak.
# If Other committed grows: direct buffer or native memory leak.
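The time-to-OOM estimate can be made less sensitive to a single noisy hour by fitting a line through all samples. A sketch using an ordinary least-squares slope over hourly samples, assuming growth stays linear; the `LeakForecast` class is hypothetical:

```java
// Sketch: least-squares slope over evenly spaced samples (MB, one per hour)
// and the hours left until the container limit, measured from the latest sample.
final class LeakForecast {
    static double slopePerSample(double[] samples) {
        int n = samples.length;
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double y : samples) {
            meanY += y / n;
        }
        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (samples[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    static double hoursUntilLimit(double[] hourlySamples, double limitMb) {
        double slope = slopePerSample(hourlySamples);
        if (slope <= 0) {
            return Double.POSITIVE_INFINITY; // flat or shrinking: no leak trend
        }
        return (limitMb - hourlySamples[hourlySamples.length - 1]) / slope;
    }

    public static void main(String[] args) {
        double[] committed = {3188, 3275, 3360, 3448};
        System.out.printf("slope %.1f MB/h, ~%.1f h left%n",
                slopePerSample(committed), hoursUntilLimit(committed, 4096));
    }
}
```

For the four samples above, the fitted slope is 86.5MB per hour. Measured from the latest sample (3448MB), that leaves roughly 7.5 hours, compared with the 10.4 hours counted from hour 0.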
The Final Configuration
The content platform article service, after six iterations of memory tuning:
# Dockerfile
FROM eclipse-temurin:21-jre-alpine
ENV JAVA_OPTS="\
-XX:+UseContainerSupport \
-XX:ActiveProcessorCount=2 \
-XX:MaxRAMPercentage=50.0 \
-XX:MaxMetaspaceSize=256m \
-XX:ReservedCodeCacheSize=128m \
-XX:MaxDirectMemorySize=256m \
-XX:+UseG1GC \
-XX:ParallelGCThreads=4 \
-XX:ConcGCThreads=2 \
-XX:CICompilerCount=2 \
-Xss512k \
-XX:NativeMemoryTracking=summary \
-Xlog:gc*:file=/var/log/gc.log:time,uptime,level,tags:filecount=5,filesize=10m"
COPY article-service.jar /app/article-service.jar
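# exec replaces the shell, so java runs as PID 1 (jcmd 1 targets it and SIGTERM reaches the JVM)
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar /app/article-service.jar"]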
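If `JAVA_OPTS` is overridden or never expanded (for example by a Kubernetes `env` entry or a different entrypoint), the JVM starts with its defaults without complaint. One way to catch that is to log the effective flag values at startup. A sketch using `HotSpotDiagnosticMXBean.getVMOption`, assuming a HotSpot-based JDK such as Temurin; `ThreadStackSize` is reported in KB, so `-Xss512k` shows up as 512. `EffectiveMemoryFlags` is hypothetical:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: logs the memory-related flags the running JVM actually uses.
final class EffectiveMemoryFlags {
    static final List<String> FLAGS = List.of(
            "MaxRAMPercentage", "MaxHeapSize", "MaxMetaspaceSize",
            "ReservedCodeCacheSize", "MaxDirectMemorySize", "ThreadStackSize",
            "ActiveProcessorCount");

    static Map<String, String> read() {
        HotSpotDiagnosticMXBean hs = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map<String, String> values = new LinkedHashMap<>();
        for (String flag : FLAGS) {
            values.put(flag, hs.getVMOption(flag).getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        read().forEach((flag, value) -> System.out.println(flag + " = " + value));
        System.out.println("availableProcessors = " + Runtime.getRuntime().availableProcessors());
    }
}
```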
# Kubernetes deployment
resources:
  requests:
    cpu: "1"
    memory: "4Gi"
  limits:
    memory: "4Gi"
  # No CPU limit (Section 1)
Memory budget verification:
Container limit: 4096MB
Heap (50%): 2048MB
Metaspace (cap): 256MB
Code cache (cap): 128MB
Thread stacks: 100MB (200 threads × 512KB)
Direct buffers (cap): 256MB
GC metadata: 120MB
Native/JVM internal: 80MB
OS overhead: 200MB
──────────────────────────────
Total (max): 3188MB (78% of limit)
Headroom: 908MB (22%)
Worst case (all caps hit simultaneously):
Total: 3448MB (84% of limit)
Headroom: 648MB (16%)
Still safe. No OOM kill.
Proof: OOM Kill Elimination
Before (Xmx=3g, no caps on metaspace/direct/threads):
OOM kills per week: 3-5
Time between restarts: 8-14 hours (varies with traffic)
Memory at kill: 4090-4096MB (cgroup limit)
JVM heap at kill: 1800-2400MB (not full)
Metaspace at kill: 280-340MB (unbounded growth)
Thread count at kill: 280-350 (unbounded pool)
After (MaxRAMPercentage=50, all caps set):
OOM kills per week: 0
Uptime record: 34 days (restarted for JDK update)
Memory usage (P95): 3100MB (76% of limit)
Memory usage (max): 3400MB (83% of limit)
Headroom at peak: 696MB (17%)
JVM heap after GC (P95): 1200MB (59% of Xmx)
Trade-off: The heap is 2GB instead of 3GB, and the article service's working set fits in 1.2GB. GC runs more frequently (every 45 seconds instead of every 90 seconds), but each pause is shorter (15ms instead of 25ms) because each collection processes a smaller young generation. The net effect on latency is neutral. The net effect on stability is transformative.