JVM Profiling: BSON Parsing Overhead, GC Pauses, and Memory Leaks
The MongoDB Java Sync Driver runs inside a JVM. Every document returned from a query is deserialized from BSON wire format into Java objects. Every Java object consumes heap memory. Every heap allocation contributes to garbage collection pressure. When you read 10,000 documents per second, the allocation rate, together with how the heap is tuned, decides whether your GC pauses are 5ms or 500ms.
Even senior engineers blame MongoDB when the application stalls for 200ms every few seconds. The database is not stalling. The JVM is pausing to collect the garbage created by BSON deserialization.
This diagram shows the memory amplification at each layer of document deserialization. A 1.2 KB BSON document on the wire becomes a 3.8 KB Document object in the JVM heap (3.2x bloat from Java object headers, per-String object overhead, and boxed numerics). With Spring Data mapping, the same document becomes 5.6 KB (4.7x bloat from reflection lookups, property accessors, and type conversion). At 10,000 documents per second, the raw driver allocates 38 MB/sec while Spring Data allocates 56 MB/sec, a 47% higher allocation rate and therefore 47% more GC pressure.
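The numbers in the diagram are simple products of size and rate. A small sketch makes the arithmetic explicit; the inputs are the estimates quoted above (decimal KB and MB), and the class and method names are hypothetical.

```java
import java.util.Locale;

/** Sketch: the layer-by-layer amplification arithmetic, using the estimates quoted above. */
class AmplificationMath {
    // Heap footprint divided by wire size.
    static double bloat(double heapKb, double wireKb) {
        return heapKb / wireKb;
    }

    // KB per document × documents per second / 1000 = MB per second (decimal units).
    static double mbPerSec(double kbPerDoc, int docsPerSec) {
        return kbPerDoc * docsPerSec / 1000.0;
    }

    public static void main(String[] args) {
        double wireKb = 1.2, rawKb = 3.8, springKb = 5.6;
        double rawMb = mbPerSec(rawKb, 10_000);
        double springMb = mbPerSec(springKb, 10_000);
        System.out.println(String.format(Locale.ROOT, "raw driver:  %.1fx, %.0f MB/s", bloat(rawKb, wireKb), rawMb));
        System.out.println(String.format(Locale.ROOT, "Spring Data: %.1fx, %.0f MB/s", bloat(springKb, wireKb), springMb));
        // Relative increase in allocation rate added by the mapping layer.
        System.out.println(String.format(Locale.ROOT, "increase:    %.0f%%", (springMb / rawMb - 1) * 100));
    }
}
```

This prints 3.2x and 38 MB/s for the raw driver, 4.7x and 56 MB/s for Spring Data, and a 47% increase.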
BSON Deserialization: Where the Bytes Go
A telemetry reading is compact BSON on the wire. Shown in Extended JSON, it looks like this:
{
  "sensorId": "sensor-00042",
  "ts": {"$date": "2026-05-29T14:30:00Z"},
  "temp": 22.4,
  "humidity": 45.2,
  "pressure": 1013.25
}
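To see where the wire bytes go, the sketch below hand-encodes the five fields shown, following the BSON specification: a little-endian int32 document length, then per element one type byte, a NUL-terminated field name and the value, then a trailing 0x00. It is a hypothetical stand-alone encoder, not the driver's BsonBinaryWriter, and it supports only the three types this document uses. The five fields come to 94 bytes. A stored document also carries an _id ObjectId (17 more bytes: type byte, the "_id" name, a 12-byte value), and a query reply wraps each document as an element of the cursor's batch array, which adds a few bytes more and brings a real reading close to the ~120 bytes quoted next.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical minimal BSON encoder for flat documents of String, Double and Instant values. */
class BsonSizeSketch {
    static byte[] encode(Map<String, Object> doc) {
        ByteArrayOutputStream elements = new ByteArrayOutputStream();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object v = e.getValue();
            if (v instanceof String) {
                elements.write(0x02);                          // type 0x02: UTF-8 string
                writeCString(elements, e.getKey());
                byte[] utf8 = ((String) v).getBytes(StandardCharsets.UTF_8);
                writeInt32(elements, utf8.length + 1);         // string length includes its trailing NUL
                elements.write(utf8, 0, utf8.length);
                elements.write(0x00);
            } else if (v instanceof Double) {
                elements.write(0x01);                          // type 0x01: 64-bit IEEE 754 double
                writeCString(elements, e.getKey());
                writeInt64(elements, Double.doubleToLongBits((Double) v));
            } else if (v instanceof Instant) {
                elements.write(0x09);                          // type 0x09: UTC datetime, ms since epoch
                writeCString(elements, e.getKey());
                writeInt64(elements, ((Instant) v).toEpochMilli());
            } else {
                throw new IllegalArgumentException("unsupported type: " + v);
            }
        }
        byte[] body = elements.toByteArray();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt32(out, 4 + body.length + 1);                  // total length counts itself and the terminator
        out.write(body, 0, body.length);
        out.write(0x00);                                       // document terminator
        return out.toByteArray();
    }

    static void writeCString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
        out.write(0x00);
    }

    static void writeInt32(ByteArrayOutputStream out, int v) {
        byte[] b = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array();
        out.write(b, 0, b.length);
    }

    static void writeInt64(ByteArrayOutputStream out, long v) {
        byte[] b = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(v).array();
        out.write(b, 0, b.length);
    }

    // The telemetry reading above, without the _id a stored document would also have.
    static Map<String, Object> telemetryReading() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("sensorId", "sensor-00042");
        doc.put("ts", Instant.parse("2026-05-29T14:30:00Z"));
        doc.put("temp", 22.4);
        doc.put("humidity", 45.2);
        doc.put("pressure", 1013.25);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(encode(telemetryReading()).length); // 94
    }
}
```

Per element, the overhead is the type byte and the field name: "humidity" costs 10 bytes of framing for an 8-byte value.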
Wire size: approximately 120 bytes. The MongoDB driver deserializes this into a Document object (which wraps a LinkedHashMap<String, Object> internally). The resulting Java object graph:
- Document object header: 16 bytes
- LinkedHashMap instance: 48 bytes
- 5 Map.Entry objects: 5 × 32 = 160 bytes
- 5 String keys: 5 × ~56 bytes = 280 bytes
- String value for sensorId: ~72 bytes
- Date object: 24 bytes
- 3 Double objects (boxed): 3 × 24 = 72 bytes
- HashMap internal array: 64 bytes
Total heap allocation: approximately 736 bytes. That is 6.1x the wire size. For every byte MongoDB sends, the JVM allocates 6 bytes.
At 10,000 documents per second, that is roughly 7.4 MB/sec of heap allocation from document deserialization alone. Add the rest of the query path, BSON encoding for writes, and driver internals, and the allocation rate reaches 30-50 MB/sec. With a 2 GB young generation, minor GC fires roughly every 40-70 seconds. With G1GC, those pauses are typically 10-30ms. With a poorly tuned heap, they are 100-300ms.
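The arithmetic in the last two paragraphs can be reproduced in a few lines (736 × 10,000 bytes is 7.36 MB/sec). The per-object sizes are the estimates from the list above, not measurements; actual sizes depend on the JVM version, compressed oops, and object alignment, and a tool such as JOL can measure them. The GC interval simply divides young-generation size by allocation rate, ignoring survivor spaces and G1's adaptive young sizing. Class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;

/** Sketch: heap estimate per document, allocation rate, and minor-GC interval. */
class DeserializationBudget {
    // One line of the estimate: how many objects of a kind, and approximate bytes per object.
    static final class Item {
        final String what;
        final int count;
        final int bytesEach;
        Item(String what, int count, int bytesEach) { this.what = what; this.count = count; this.bytesEach = bytesEach; }
    }

    // Estimates from the list above; not measured.
    static final List<Item> ITEMS = List.of(
            new Item("Document object header", 1, 16),
            new Item("LinkedHashMap instance", 1, 48),
            new Item("Map.Entry objects", 5, 32),
            new Item("String keys", 5, 56),
            new Item("String value for sensorId", 1, 72),
            new Item("Date object", 1, 24),
            new Item("Double objects (boxed)", 3, 24),
            new Item("HashMap internal array", 1, 64));

    static int heapBytesPerDoc() {
        int total = 0;
        for (Item item : ITEMS) total += item.count * item.bytesEach;
        return total;
    }

    // Decimal megabytes allocated per second.
    static double mbPerSec(int bytesPerDoc, int docsPerSec) {
        return (double) bytesPerDoc * docsPerSec / 1_000_000.0;
    }

    // Seconds to fill the young generation once at a steady allocation rate.
    static double secondsBetweenMinorGcs(double youngGenMb, double allocMbPerSec) {
        return youngGenMb / allocMbPerSec;
    }

    public static void main(String[] args) {
        int perDoc = heapBytesPerDoc();
        System.out.println(String.format(Locale.ROOT, "%d bytes/doc, %.1fx a 120-byte wire document, %.2f MB/s at 10,000 docs/s",
                perDoc, perDoc / 120.0, mbPerSec(perDoc, 10_000)));
        // 2 GB young generation taken as 2048 MB; prints ~68 s at 30 MB/s and ~41 s at 50 MB/s.
        for (double rate : new double[] {30, 50}) {
            System.out.println(String.format(Locale.ROOT, "%.0f MB/s: minor GC every ~%.0f s",
                    rate, secondsBetweenMinorGcs(2048, rate)));
        }
    }
}
```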
Profiling with JFR
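Java Flight Recorder (JFR), built into the JDK and open source since JDK 11, records allocation events with low overhead, far less than an instrumenting profiler, so it can run under realistic load: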
java -XX:StartFlightRecording=duration=5m,filename=mongo-alloc.jfr,settings=profile \
-jar telemetry-service.jar
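When adding JVM flags is awkward, for example to capture only one burst of queries, the same profile settings can be applied from inside the process through the jdk.jfr API (JDK 11 and later). A minimal sketch; the class name, output path, and the string-building workload are hypothetical stand-ins for the real read loop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

/** Sketch: record a block of work with JFR's built-in "profile" settings from inside the JVM. */
class InProcessRecording {
    static Path record(Runnable workload, Path out) throws Exception {
        // The same settings file that settings=profile selects on the command line.
        Configuration profile = Configuration.getConfiguration("profile");
        try (Recording recording = new Recording(profile)) {
            recording.start();
            workload.run();
            recording.stop();
            recording.dump(out); // write the captured events to a .jfr file
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the service's read loop: allocate short-lived Strings.
        String[] keep = new String[1_000];
        Runnable workload = () -> {
            for (int i = 0; i < 200_000; i++) keep[i % keep.length] = "sensor-" + i;
        };
        Path file = record(workload, Path.of("mongo-alloc.jfr"));
        System.out.println("wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

The resulting file can be read with the same jfr print command as a command-line recording.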
After capture, analyze with jfr print:
jfr print --events jdk.ObjectAllocationInNewTLAB \
--stack-depth 10 mongo-alloc.jfr | \
grep -A5 "org.bson" | head -40
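The grep works for a quick look. To rank methods instead, the jdk.jfr.consumer API can read the same file and count jdk.ObjectAllocationInNewTLAB events by the innermost org.bson frame on each stack. This is a sketch with a hypothetical class name. It counts events (each one marks an allocation that needed a new TLAB), so the counts are a rough proxy for allocation volume, not an exact byte total.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedMethod;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

/** Sketch: count new-TLAB allocation events by the innermost frame inside a package. */
class TlabEventsByFrame {
    static Map<String, Integer> count(Path jfrFile, String packagePrefix) throws Exception {
        Map<String, Integer> counts = new HashMap<>();
        try (RecordingFile file = new RecordingFile(jfrFile)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                if (!"jdk.ObjectAllocationInNewTLAB".equals(event.getEventType().getName())) continue;
                RecordedStackTrace stack = event.getStackTrace();
                if (stack == null) continue; // stack traces were not recorded for this event
                // Frames are ordered innermost first, so the first match is the closest method in the package.
                for (RecordedFrame frame : stack.getFrames()) {
                    RecordedMethod method = frame.getMethod();
                    if (method == null) continue;
                    String type = method.getType().getName();
                    if (type.startsWith(packagePrefix)) {
                        counts.merge(type + "." + method.getName(), 1, Integer::sum);
                        break;
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        Path path = Path.of(args.length > 0 ? args[0] : "mongo-alloc.jfr");
        count(path, "org.bson.").entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

Run it as java TlabEventsByFrame.java mongo-alloc.jfr (JDK 11 and later can launch a single source file directly).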
The output reveals the hotspot: org.bson.codecs.DocumentCodec.decode() and org.bson.BsonBinaryReader.readString() dominate allocation. Between readString() for string values and the C-string reads that decode field names, the reader creates a new String object, each backed by its own byte array, for every field name and string value in every document. The telemetry document has 5 field names and 1 string value, so at 10,000 documents per second that is 60,000 String allocations per second.
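The mechanism behind that count can be reproduced without the driver. A simplified, hypothetical reader that decodes a NUL-terminated field name from bytes (the way BSON stores names) constructs a fresh String on every call, so the same name in two documents becomes two separate heap objects unless the reader caches decoded names. This is not BsonBinaryReader's code; it only illustrates the allocation pattern.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: why decoding field names from bytes allocates a new String every time. */
class FieldNameAllocation {
    // Decode a NUL-terminated UTF-8 name starting at offset (hypothetical, simplified reader).
    static String readCString(byte[] buf, int offset) {
        int end = offset;
        while (buf[end] != 0) end++;
        return new String(buf, offset, end - offset, StandardCharsets.UTF_8);
    }

    // Strings created per second: one per field name plus one per string value, per document.
    static long stringsPerSecond(int docsPerSec, int fieldNames, int stringValues) {
        return (long) docsPerSec * (fieldNames + stringValues);
    }

    public static void main(String[] args) {
        byte[] encodedName = "sensorId\0".getBytes(StandardCharsets.UTF_8);
        String first = readCString(encodedName, 0);
        String second = readCString(encodedName, 0);
        System.out.println(first.equals(second));                // true: same text
        System.out.println(first == second);                     // false: two distinct heap objects
        System.out.println(stringsPerSecond(10_000, 5, 1));      // 60000
    }
}
```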