unbound mongodb at scale

Connection Pool Mechanics: Sizing, Wait Times, and the Cost of Connection Churn

Chapter 10 of 72

Connection Pool Mechanics

The MongoDB Java Sync Driver maintains a connection pool per MongoClient instance (one pool for each server the client connects to). Every database operation checks out a connection from the pool, uses it for the operation, and returns it. When the pool is exhausted, threads wait in a queue. When a thread has waited longer than maxWaitTime, its operation fails with a MongoTimeoutException.

The default maximum pool size is 100 connections. This default rarely matches a production workload. If the pool is too small, threads wait for connections. If it is too large, the MongoDB server is overwhelmed by connection overhead. Correct sizing requires knowing your throughput and your operation latency, both the average and the tail.

[Figure: Connection pool state machine showing IDLE, CHECKED_OUT, WAIT_QUEUE, and TIMEOUT states, with the impact of pool sizing at different maxPoolSize settings]

This diagram shows the connection lifecycle. A connection starts IDLE in the pool, transitions to CHECKED_OUT when an operation needs it, and returns to IDLE after the operation completes. When all connections are checked out, new requests enter the WAIT_QUEUE. If a connection does not become available within maxWaitTime, the request fails with a TIMEOUT error. The three pool sizing boxes show the trade-off: a pool that is too small (10) causes a 12% timeout rate, a correctly sized pool (50) keeps wait queues near zero, and a pool that is too large (500) wastes server CPU maintaining idle connections.
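The lifecycle described here can be imitated with a counting semaphore. The sketch below is a toy model for building intuition, not the driver's implementation: the class `ToyPool` and its methods `tryCheckOut` and `checkIn` are hypothetical names, and a permit stands in for a real connection. It assumes waiters are served in arrival order (a fair semaphore).

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model of pool checkout and wait-queue timeout. Not the driver's code. */
final class ToyPool {
    private final Semaphore idle;      // one permit per IDLE connection
    private final long maxWaitMillis;  // plays the role of maxWaitTime

    ToyPool(int maxSize, long maxWaitMillis) {
        this.idle = new Semaphore(maxSize, true); // fair: waiters served in FIFO order
        this.maxWaitMillis = maxWaitMillis;
    }

    /**
     * IDLE -> CHECKED_OUT. Blocks in the wait queue for up to maxWaitMillis.
     * Returns false on TIMEOUT (the real driver throws an exception instead).
     */
    boolean tryCheckOut() {
        try {
            return idle.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }

    /** CHECKED_OUT -> IDLE. Callers must only check in what they checked out. */
    void checkIn() {
        idle.release();
    }

    int idleCount() {
        return idle.availablePermits();
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool(2, 50);
        System.out.println(pool.tryCheckOut()); // true: first connection
        System.out.println(pool.tryCheckOut()); // true: second connection
        System.out.println(pool.tryCheckOut()); // false: pool exhausted, waited 50 ms
        pool.checkIn();                         // one connection returns to IDLE
        System.out.println(pool.tryCheckOut()); // true: a connection is available again
    }
}
```

The third checkout is the case that matters in production: the caller pays the full wait time before learning that no connection is available, which is why a short maxWaitTime fails faster under overload.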

The Pool Sizing Formula

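By Little's Law, the minimum number of connections required to sustain a given throughput without queuing is the throughput (in operations per second) multiplied by the average operation duration (in seconds). Here minPoolSize means the smallest pool that avoids queuing, not the driver's minSize setting: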

$$\text{minPoolSize} = \text{throughput} \times \text{avgOperationDuration}$$

For the telemetry platform at 1,000 writes per second with an average write duration of 15ms:

$$\text{minPoolSize} = 1000 \times 0.015 = 15$$

Fifteen connections can sustain 1,000 ops/sec if every operation takes exactly 15ms. But operations do not take exactly 15ms. The p99 write duration is 120ms. During a WiredTiger checkpoint, some operations take 500ms. The pool must handle the p99 case, not the average case.
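To see how quickly the required pool grows with operation duration, the same multiplication can be run for the three durations mentioned so far. This is a minimal sketch: the class `LittlesLaw` and the method `requiredConnections` are illustrative names, durations are taken in whole milliseconds to keep the arithmetic exact, and the 500 ms row assumes every operation is that slow, which overstates a checkpoint where only some are.

```java
/** Connections needed so that no request queues, per Little's Law. */
final class LittlesLaw {

    /**
     * @param opsPerSecond    sustained throughput
     * @param operationMillis time each operation holds a connection
     * @return connections in use at steady state, rounded up
     */
    static long requiredConnections(long opsPerSecond, long operationMillis) {
        long busyMillisPerSecond = opsPerSecond * operationMillis;
        return (busyMillisPerSecond + 999) / 1000; // ceiling division by 1000 ms
    }

    public static void main(String[] args) {
        System.out.println(requiredConnections(1000, 15));  // 15  (average write)
        System.out.println(requiredConnections(1000, 120)); // 120 (p99 write)
        System.out.println(requiredConnections(1000, 500)); // 500 (all writes at checkpoint speed)
    }
}
```

A pool of 15 is therefore only adequate while latency stays at the average. Once writes slow to the p99 duration, the same throughput needs eight times as many connections.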

A practical formula that accounts for tail latency sizes the pool as if every operation took the p99 duration, then adds headroom:

$$\text{poolSize} = \text{throughput} \times \text{p99OperationDuration} \times 1.5$$

$$\text{poolSize} = 1000 \times 0.120 \times 1.5 = 180$$

The 1.5 multiplier provides headroom for burst traffic. Configure the pool accordingly:

```java
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

import java.util.concurrent.TimeUnit;

// FAST: Connection pool sized for the workload
MongoClientSettings settings = MongoClientSettings.builder()
    .applyConnectionString(new ConnectionString("mongodb://mongo-primary:27017"))
    .applyToConnectionPoolSettings(builder -> builder
        .maxSize(180)                                // 1,000 ops/sec x 120 ms p99 x 1.5 headroom
        .minSize(20)                                 // Keep warm connections ready
        .maxWaitTime(2, TimeUnit.SECONDS)            // Fail fast, don't queue forever
        .maxConnectionIdleTime(5, TimeUnit.MINUTES)  // Prune idle connections
        .maxConnectionLifeTime(30, TimeUnit.MINUTES) // Rotate to avoid stale connections
    )
    .build();

// Create one client and share it; each MongoClient owns its own pool
MongoClient client = MongoClients.create(settings);
```
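The maxSize value of 180 comes straight from the practical formula, and it helps to keep that calculation in code so it can be rerun when throughput or latency changes. The helper below is a sketch under stated assumptions: `PoolSizeEstimate`, `withHeadroom`, and `DRIVER_DEFAULT_MAX_SIZE` are hypothetical names, the p99 duration is supplied by the caller from their own measurements, and the constant 100 is the default pool size stated earlier in this chapter.

```java
/** Estimates maxSize from throughput, p99 latency, and a headroom factor. */
final class PoolSizeEstimate {

    /** Default pool size cited in this chapter. */
    static final int DRIVER_DEFAULT_MAX_SIZE = 100;

    /**
     * @param opsPerSecond sustained throughput
     * @param p99Millis    measured p99 operation duration in milliseconds
     * @param headroom     burst multiplier, e.g. 1.5; must be at least 1.0
     */
    static int withHeadroom(long opsPerSecond, long p99Millis, double headroom) {
        if (opsPerSecond < 0 || p99Millis < 0 || headroom < 1.0) {
            throw new IllegalArgumentException("negative input or headroom below 1.0");
        }
        double inFlightAtP99 = (opsPerSecond * p99Millis) / 1000.0;
        return (int) Math.ceil(inFlightAtP99 * headroom);
    }

    public static void main(String[] args) {
        int size = withHeadroom(1000, 120, 1.5);
        System.out.println(size);                           // 180
        System.out.println(size > DRIVER_DEFAULT_MAX_SIZE); // true: the default would queue
    }
}
```

For this workload the estimate exceeds the default, so leaving maxSize unset would cap the pool below what the p99 case needs.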