Connection Pool Tuning: JDBC, HTTP Clients, and the Pool Size That Is Never What You Think
Connection Pool Tuning: JDBC, HTTP Clients, and the Pool Size That Is Never What You Think
The content platform’s article service connects to PostgreSQL. The team set the connection pool size to 100. “More connections means more throughput,” they reasoned. PostgreSQL buckled. Context switching between 100 concurrent connections burned 40% of the database CPU. Query latency doubled. The team reduced the pool to 10 connections. Throughput increased 3x.
Connection pool sizing is counterintuitive. Developers assume more connections mean more parallelism, which means more throughput. The opposite is true past a threshold. Every database connection consumes server memory, competes for CPU time slices, and contends for shared resources (buffer pool, WAL writer, lock tables). The optimal pool size is small, often shockingly small relative to the number of application threads.
This chapter covers the content platform’s connection pool tuning across three layers: JDBC connections to PostgreSQL (HikariCP), HTTP client connections to upstream services, and the queuing theory that determines the right size for both.
PostgreSQL Connection Overhead
Each PostgreSQL connection costs resources on the server:
- Memory: Each connection allocates
work_mem(default 4 MB) for sort and hash operations, plus shared buffer mappings. At 100 connections with default settings: 400 MB ofwork_memalone. - Process: PostgreSQL uses a process-per-connection model. Each connection spawns an OS process with its own virtual memory space. 100 connections = 100 OS processes.
- Context switching: When connections exceed CPU cores, the OS context-switches between processes. Each switch flushes L1/L2 caches and TLB entries. On a 16-core server, 100 connections mean 6.25 processes per core, causing continuous context switching.
The measured overhead on the content platform’s PostgreSQL server (16 cores, 64 GB RAM):
Connections: 10 | CPU utilization: 35% | Query P50: 2.1ms | P99: 8.4ms
Connections: 25 | CPU utilization: 52% | Query P50: 2.3ms | P99: 9.1ms
Connections: 50 | CPU utilization: 71% | Query P50: 3.8ms | P99: 18.2ms
Connections: 100 | CPU utilization: 89% | Query P50: 7.2ms | P99: 45.3ms
Connections: 200 | CPU utilization: 97% | Query P50: 14.1ms | P99: 112ms
Going from 10 to 100 connections increases CPU utilization from 35% to 89%, but query latency increases 3.4x at P50 and 5.4x at P99. The marginal throughput from connections 11-100 is negative: the database processes more connection overhead and less actual query work.
The Pool Sizing Formula
The HikariCP wiki provides a formula attributed to the PostgreSQL team:
$$\text{pool size} = T_n \times (C_m - 1) + 1$$
Where:
- $T_n$ = number of threads that can execute queries simultaneously
- $C_m$ = maximum number of simultaneous connections a single transaction requires (usually 1 unless the transaction accesses multiple databases)
For most applications where each transaction uses exactly 1 connection: pool size $\approx T_n$.
But what is $T_n$? This is where developers guess wrong. $T_n$ is not the number of application threads. It is the number of threads that would be simultaneously waiting on a database response. This depends on the ratio of database wait time to total request processing time.
Little’s Law provides the connection:
$$L = \lambda \times W$$
Where:
- $L$ = average number of connections in use (the pool size you need)
- $\lambda$ = arrival rate of database requests (queries per second)
- $W$ = average time a connection is held (query time + application processing while holding the connection)
For the content platform’s article service:
- $\lambda$ = 15,000 req/s, each making 1.2 database queries on average = 18,000 queries/s
- $W$ = 2.1 ms average query time + 0.5 ms result processing = 2.6 ms = 0.0026 s
- $L$ = 18,000 * 0.0026 = 46.8
This suggests a pool of 47 connections. But the content platform runs on 8 nodes, so each node needs $47 / 8 \approx 6$ connections. Adding headroom for bursts: 10 connections per node, 80 total.
The practical formula that accounts for hardware:
$$\text{pool size per node} = \frac{\text{database cores} \times 2}{\text{application nodes}}$$
For a 16-core PostgreSQL server with 8 application nodes:
$$\text{pool size per node} = \frac{16 \times 2}{8} = 4$$
The multiplier of 2 accounts for disk I/O wait: while one query waits for disk, another can use the CPU. For SSDs with minimal I/O wait, use a multiplier of 1.5. For spinning disks with significant I/O wait, use 3-4.
The content platform uses 10 connections per node, slightly above the formula’s suggestion, to absorb request bursts without queuing.
HikariCP Configuration
HikariCP is the fastest JDBC connection pool for the JVM. Its configuration is minimal by design, but three settings determine performance:
// SLOW: Default/naive configuration
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://db:5432/content");
config.setMaximumPoolSize(100); // Way too many
config.setMinimumIdle(100); // Keeps all connections open
config.setConnectionTimeout(30000); // 30s wait before failure
// No leak detection, no metrics
// FAST: Tuned configuration
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://db:5432/content");
config.setMaximumPoolSize(10);
config.setMinimumIdle(10); // Fixed size: no pool resizing overhead
config.setConnectionTimeout(3000); // Fail fast: 3 seconds
config.setMaxLifetime(1800000); // 30 min: recycle before DB timeout
config.setIdleTimeout(0); // Never idle out (fixed pool)
config.setLeakDetectionThreshold(5000); // Warn if held > 5s
config.setMetricsTrackerFactory(new MicrometerMetricsTrackerFactory(registry));
config.addDataSourceProperty("cachePrepStmts", "true");
config.addDataSourceProperty("prepStmtCacheSize", "250");
config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
Key decisions:
maximumPoolSize(10): Based on the formula. Resist the urge to “add more just in case.”minimumIdle(10)=maximumPoolSize: A fixed-size pool eliminates the overhead of creating and destroying connections under load. Connection creation takes 5-20 ms (TCP handshake + TLS + PostgreSQL authentication). Under burst traffic, a pool that needs to grow incurs this latency on the hot path.connectionTimeout(3000): Fail fast. If a connection is not available in 3 seconds, the request is already degraded. Waiting 30 seconds just delays the inevitable failure.maxLifetime(1800000): Recycle connections before the database’sidle_in_transaction_session_timeoutor network equipment’s idle timeout kills them. Set this 30 seconds shorter than PostgreSQL’sstatement_timeout.leakDetectionThreshold(5000): Logs a warning if a connection is checked out for more than 5 seconds. This catches connection leaks: code that acquires a connection but fails to return it (missingclose()or exception beforeclose()).- Prepared statement caching: PostgreSQL’s query planner caches execution plans for prepared statements. Enabling the cache at the JDBC driver level avoids re-sending the statement text on every execution.
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5, time = 2)
@Measurement(iterations = 5, time = 2)
@Fork(2)
@State(Scope.Benchmark)
public class PoolSizeBenchmark {
@Param({"5", "10", "20", "50", "100"})
private int poolSize;
private HikariDataSource dataSource;
@Setup(Level.Trial)
public void setup() {
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5432/content");
config.setMaximumPoolSize(poolSize);
config.setMinimumIdle(poolSize);
config.setConnectionTimeout(5000);
dataSource = new HikariDataSource(config);
}
@TearDown(Level.Trial)
public void tearDown() {
dataSource.close();
}
@Benchmark
@Threads(100) // 100 concurrent threads competing for connections
public Article queryArticle(ThreadState state) throws Exception {
try (Connection conn = dataSource.getConnection();
PreparedStatement ps = conn.prepareStatement(
"SELECT id, title, body, view_count FROM articles WHERE id = ?")) {
ps.setString(1, state.nextArticleId());
try (ResultSet rs = ps.executeQuery()) {
if (rs.next()) {
return new Article(
rs.getString("id"),
rs.getString("title"),
rs.getString("body"),
rs.getLong("view_count")
);
}
return null;
}
}
}
@State(Scope.Thread)
public static class ThreadState {
private int counter;
private final String[] articleIds = generateArticleIds(10_000);
public String nextArticleId() {
return articleIds[counter++ % articleIds.length];
}
}
}
Results against a 16-core PostgreSQL server:
| Pool Size | Throughput (ops/s) | Avg Latency | P99 Latency | DB CPU |
|---|---|---|---|---|
| 5 | 12,400 | 7.8 ms | 18 ms | 28% |
| 10 | 18,200 | 5.4 ms | 12 ms | 45% |
| 20 | 17,800 | 5.6 ms | 15 ms | 61% |
| 50 | 14,100 | 7.0 ms | 32 ms | 82% |
| 100 | 9,800 | 10.1 ms | 68 ms | 94% |
Peak throughput is at pool size 10. At 5, the pool is slightly undersized: threads queue for connections, adding latency. At 10, the pool matches the formula ($16 \times 2 / 8 \approx 4$ per node, but only one node in this benchmark, so $16 \times 2 = 32$, reduced because 100 threads cannot generate enough load to saturate 32 connections). Beyond 20, context switching on the database server degrades both throughput and latency.
The P99 latency column tells the clearest story. At pool size 10, P99 is 12 ms. At pool size 100, P99 is 68 ms. The 90 extra connections do not add throughput. They add 56 ms of tail latency.
Connection Leak Detection
Connection leaks are the silent killer of connection pools. A leak occurs when application code obtains a connection and does not return it:
// SLOW: Connection leak on exception path
public Article getArticle(String id) throws Exception {
Connection conn = dataSource.getConnection();
PreparedStatement ps = conn.prepareStatement(
"SELECT * FROM articles WHERE id = ?");
ps.setString(1, id);
ResultSet rs = ps.executeQuery();
// If an exception occurs here, conn is never closed
Article article = mapResultSet(rs);
conn.close(); // Never reached on exception
return article;
}
// FAST: Try-with-resources guarantees connection return
public Article getArticle(String id) throws Exception {
try (Connection conn = dataSource.getConnection();
PreparedStatement ps = conn.prepareStatement(
"SELECT * FROM articles WHERE id = ?")) {
ps.setString(1, id);
try (ResultSet rs = ps.executeQuery()) {
return mapResultSet(rs);
}
} // Connection returned to pool even on exception
}
HikariCP’s leakDetectionThreshold logs a stack trace when a connection is held longer than the threshold. This is invaluable for finding leaks in code paths that are exercised infrequently:
[HikariPool-1 housekeeper] WARN - Connection leak detection triggered
for connection PostgreSQL JDBC connection (pid=1234)
on thread http-nio-8080-exec-42,
stack trace follows:
java.lang.Exception: Apparent connection leak detected
at com.zaxxer.hikari.HikariDataSource.getConnection(...)
at com.contentplatform.ArticleService.getArticle(ArticleService.java:47)
at com.contentplatform.ArticleController.get(ArticleController.java:22)
The content platform sets leakDetectionThreshold to 5 seconds in production and 2 seconds in staging. Any database operation that legitimately takes longer than 5 seconds should be reviewed regardless of whether it leaks.
Pool Monitoring with Micrometer
HikariCP integrates with Micrometer to expose pool metrics:
HikariConfig config = new HikariConfig();
config.setMetricsTrackerFactory(
new MicrometerMetricsTrackerFactory(meterRegistry));
This exposes:
| Metric | Meaning | Alert Threshold |
|---|---|---|
hikaricp.connections.active | Connections in use | > 80% of max |
hikaricp.connections.idle | Available connections | < 2 |
hikaricp.connections.pending | Threads waiting for connection | > 0 sustained |
hikaricp.connections.timeout | Connection timeout count | > 0 |
hikaricp.connections.usage | Connection hold time | P99 > 100 ms |
hikaricp.connections.creation | Time to create new connection | P99 > 50 ms |
The critical metric is pending. If threads are consistently waiting for connections, the pool is undersized, or more likely, connections are being held too long. The fix is usually reducing hold time (faster queries, smaller transactions), not increasing the pool.
The content platform dashboards show these metrics alongside database CPU and query latency. A spike in pending threads correlates with a spike in P99 latency, confirming that pool exhaustion, not query slowness, is the bottleneck.
Statements and Result Sets: The Nested Pool
Connection pooling is the first layer. Prepared statement pooling is the second. Each prepared statement involves:
- Parsing the SQL text
- Planning the query (generating the execution plan)
- Executing the plan
Without statement caching, steps 1 and 2 repeat on every execution. PostgreSQL’s pgbouncer and JDBC driver-level caching skip re-parsing for repeated statements.
// PostgreSQL JDBC driver settings for statement caching
config.addDataSourceProperty("cachePrepStmts", "true");
config.addDataSourceProperty("prepStmtCacheSize", "250");
config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
config.addDataSourceProperty("useServerPrepStmts", "true");
prepStmtCacheSize(250): Cache up to 250 distinct prepared statements per connection. The content platform uses approximately 80 distinct queries, so 250 provides ample headroom.prepStmtCacheSqlLimit(2048): Cache statements up to 2048 characters. Some queries with long IN clauses exceed this; increase if needed.useServerPrepStmts(true): Use server-side prepared statements, which cache the execution plan on the PostgreSQL server. This eliminates re-planning overhead for repeated queries.
The throughput impact on the content platform’s most frequent query (article lookup by ID):
| Setting | Throughput | Query P50 |
|---|---|---|
| No statement cache | 15,800 ops/s | 3.1 ms |
| Client-side cache only | 17,400 ops/s | 2.4 ms |
| Server-side prepared | 18,200 ops/s | 2.1 ms |
Server-side prepared statements improve throughput by 15% and reduce P50 by 32%. The gain comes from PostgreSQL reusing the cached execution plan instead of re-running the query planner on every execution. For simple key lookups, the planning phase is a significant fraction of total query time.
The trade-off: server-side prepared statements use memory on the PostgreSQL server per connection. With 80 connections and 80 distinct statements, that is 6,400 cached plans. Each plan is typically 1-10 KB, so the total memory cost is 6-64 MB. Negligible on a 64 GB server.