spring internals

The Event Loop Model and Thread Economics

7 min read Chapter 47 of 78

The Abstraction

server.tomcat.threads.max=200

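That property controls how many requests your Spring MVC application can process concurrently. Not throughput. Concurrency. Once all 200 threads are occupied, request 201 waits for a free thread. Tomcat keeps accepting connections up to server.tomcat.max-connections, then lets the OS queue up to server.tomcat.accept-count more. If that queue fills, the connection is refused.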

Now look at a WebFlux application:

// No thread pool configuration needed.
// Reactor Netty sizes its event loop from the core count:
// max(Runtime.getRuntime().availableProcessors(), 4) worker threads by default.

On an 8-core machine, that is 8 threads. Eight threads doing the work of 200. Not through magic. Through a fundamentally different model of what a thread does during request processing.

The Mechanism: Thread-Per-Request Math

The Tomcat Model

A typical request in the SaaS backend’s notification endpoint:

Total wall-clock time: 100ms
  ├── Read request headers/body:   2ms (I/O)
  ├── Authenticate JWT:            3ms (CPU)
  ├── Query database:             85ms (I/O wait)
  ├── Serialize response:          5ms (CPU)
  └── Write response:              5ms (I/O)

CPU time: 8ms. I/O wait time: 92ms. The thread spends 92% of its life doing nothing. Parked. Waiting for bytes from the network or the database. Holding its stack (typically 1MB reserved by default on 64-bit Linux) while it waits.

The throughput calculation for 200 Tomcat threads:

$$\text{Max throughput} = \frac{\text{threads}}{\text{request duration}} = \frac{200}{0.1\text{s}} = 2{,}000 \text{ req/s}$$

That is the ceiling. To handle 4,000 req/s, you need 400 threads. To handle 10,000 req/s, you need 1,000 threads. Each thread consumes memory:

$$\text{Thread memory} = 1{,}000 \text{ threads} \times 1\text{MB stack} = 1\text{GB}$$

And this is just stack memory. Each blocked thread also holds a JDBC connection, a request buffer, response buffers, and whatever objects are on its call stack. Context switching between 1,000 threads adds overhead that the JVM and OS cannot hide.
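The arithmetic above can be checked in a few lines of Java. This is a minimal sketch: the class and method names (ThreadPerRequestMath, maxThroughput, threadsNeeded, stackMegabytes) are invented for illustration, and the 1MB stack per thread is the same assumption the text makes.

```java
public class ThreadPerRequestMath {

    // Ceiling for a blocking pool: each thread finishes 1000/requestMillis requests per second.
    static double maxThroughput(int threads, long requestMillis) {
        return threads * 1000.0 / requestMillis;
    }

    // Threads required for a target rate when every request pins one thread (rounded up).
    static long threadsNeeded(long targetReqPerSec, long requestMillis) {
        return (targetReqPerSec * requestMillis + 999) / 1000;
    }

    // Stack memory reserved for the pool, given an assumed per-thread stack size in MB.
    static long stackMegabytes(long threads, long stackMbPerThread) {
        return threads * stackMbPerThread;
    }

    public static void main(String[] args) {
        System.out.println(maxThroughput(200, 100));    // 2000.0
        System.out.println(threadsNeeded(4_000, 100));  // 400
        System.out.println(threadsNeeded(10_000, 100)); // 1000
        System.out.println(stackMegabytes(1_000, 1));   // 1000 (MB, roughly 1GB)
    }
}
```

The only inputs are thread count and request duration. CPU cost does not appear anywhere in the formula, which is the point: in this model the pool size, not the processor, sets the limit.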

Where the Time Goes

Visualize what a Tomcat thread does during that 100ms request:

Thread-42 timeline (100ms):
|-I/O+CPU-|----------- BLOCKED (DB I/O) -----------|-CPU+I/O-|
   5ms                    85ms                        10ms

For 85ms, Thread-42 is parked in java.net.SocketInputStream.read(). The OS scheduler takes it off the CPU and runs another thread, which soon blocks on its own I/O and is switched out in turn. Every block and every wake-up is a context switch. This is the overhead: thousands of context switches per second, spent on threads that have nothing to do for most of their lives.

The Event Loop Model

Netty’s worker thread does not wait for I/O. It uses non-blocking I/O (java.nio) with a selector:

Worker-3 event loop iteration:
  1. selector.select()           → "conn-17 has data ready"
  2. Read bytes from conn-17     → non-blocking, bytes already buffered
  3. Process (auth, serialize)   → 8ms CPU
  4. Write response to conn-17   → non-blocking, buffered
  5. selector.select()           → "conn-42 has data ready"
  6. Read bytes from conn-42     → ...

The thread never parks on a single connection. The only place it waits is selector.select(), and only when no connection has work ready. When it issues a database query through a non-blocking driver (R2DBC), it registers a callback: “when data arrives on this connection, call this function.” Then it moves to the next ready connection.
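To make the selector loop concrete, here is a minimal sketch of one event loop thread built directly on java.nio. It is not Netty's implementation: MiniEventLoop, roundTrip, and the echo behaviour are hypothetical stand-ins, and error handling is reduced to "stop the loop".

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class MiniEventLoop implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    public MiniEventLoop() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: the OS picks a free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() { return server.socket().getLocalPort(); }

    public void stop() { running = false; selector.wakeup(); }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try {
            while (running) {
                selector.select(); // the only place this thread waits
                Iterator<SelectionKey> ready = selector.selectedKeys().iterator();
                while (ready.hasNext()) {
                    SelectionKey key = ready.next();
                    ready.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buffer);
                    }
                }
            }
        } catch (IOException failed) {
            // a failed connection simply ends this sketch
        } finally {
            try { selector.close(); server.close(); } catch (IOException ignored) { }
        }
    }

    private static void echo(SocketChannel client, ByteBuffer buffer) throws IOException {
        buffer.clear();
        if (client.read(buffer) < 0) { client.close(); return; } // peer closed the connection
        buffer.flip();
        while (buffer.hasRemaining()) {
            client.write(buffer); // fine for a tiny echo; a real loop would register OP_WRITE
        }
    }

    // Starts the loop on one thread, sends a message, and returns what comes back.
    static String roundTrip(String message) {
        try {
            MiniEventLoop loop = new MiniEventLoop();
            Thread worker = new Thread(loop, "mini-loop-1");
            worker.setDaemon(true);
            worker.start();
            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            byte[] in = new byte[out.length];
            try (Socket socket = new Socket("127.0.0.1", loop.port())) {
                socket.setSoTimeout(2000);
                socket.getOutputStream().write(out);
                socket.getOutputStream().flush();
                InputStream reply = socket.getInputStream();
                int total = 0;
                while (total < in.length) {
                    int n = reply.read(in, total, in.length - total);
                    if (n < 0) break;
                    total += n;
                }
            } finally {
                loop.stop();
            }
            return new String(in, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling MiniEventLoop.roundTrip("ping") should return the same string. One thread serves every connection registered with the selector, and no read or accept call ever blocks it.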

The math changes completely:

$$\text{CPU time per request} = 8\text{ms}$$

$$\text{Max throughput per thread} = \frac{1{,}000\text{ms}}{8\text{ms}} = 125 \text{ req/s (CPU-limited)}$$

$$\text{Max throughput (8 threads)} = 8 \times 125 = 1{,}000 \text{ req/s (CPU-limited)}$$

That is a hard ceiling set by CPU, not by thread count: 8 cores supply 8 seconds of CPU time per second, no matter how many requests are in flight. What the event loop removes is the other limit. While the database processes a query for connection 17, the event loop thread does CPU work for connections 18 through 42, so I/O wait no longer occupies a thread. The effective throughput depends on the CPU cost per request, the ratio of CPU time to I/O time, and the number of concurrent connections.

For our SaaS notification endpoint (8ms CPU, 92ms I/O):

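  • 200 Tomcat threads: 2,000 req/s ceiling, 200MB+ thread stacks
  • 8 Netty threads: no thread-pool ceiling, about 8MB of thread stacks. Throughput is bounded by CPU and by the database instead; the 8,000-15,000 req/s range is reachable only when real per-request CPU cost is around 1ms or less, not the full 8ms budgeted above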

The event loop wins when requests are I/O-bound. The more I/O-bound your requests are, the larger the advantage.
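The two quantities that govern the event loop side, the CPU ceiling and the number of requests in flight, can be sketched the same way. The names EventLoopMath, cpuCeiling, and inFlight are illustrative, and the second method is simply Little's law applied to the chapter's figures.

```java
public class EventLoopMath {

    // CPU-only ceiling: every core supplies 1,000ms of CPU time per second.
    static double cpuCeiling(int cores, double cpuMillisPerRequest) {
        return cores * 1000.0 / cpuMillisPerRequest;
    }

    // Little's law: requests in flight = throughput x time each request spends in the system.
    static double inFlight(double reqPerSec, double latencyMillis) {
        return reqPerSec * latencyMillis / 1000.0;
    }

    public static void main(String[] args) {
        double ceiling = cpuCeiling(8, 8.0);          // 8 cores, 8ms of CPU per request
        double concurrent = inFlight(ceiling, 100.0); // each request still takes 100ms end to end
        System.out.println(ceiling);                  // 1000.0
        System.out.println(concurrent);               // 100.0
        System.out.println(concurrent / 8);           // 12.5 requests interleaved per thread
    }
}
```

At the CPU ceiling, roughly 100 requests are in flight at once. A thread-per-request server needs about 100 threads to hold them; the event loop holds them on 8, because a waiting request costs a registered callback rather than a parked thread.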

The Debuggable Demonstration

The SaaS backend serves tenant notifications. Here is the blocking version under load:

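// Blocking controller (Spring MVC on Tomcat)
import java.util.List;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class NotificationController {

    private final JdbcTemplate jdbcTemplate;

    // Constructor injection: the final field must be assigned for the class to compile
    public NotificationController(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @GetMapping("/api/tenants/{tenantId}/notifications")
    public List<Notification> getNotifications(@PathVariable String tenantId) {
        // The servlet thread blocks here until the database has returned every row
        return jdbcTemplate.query(
            "SELECT * FROM notifications WHERE tenant_id = ? " +
            "ORDER BY created_at DESC LIMIT 50",
            (rs, rowNum) -> new Notification(
                rs.getString("id"),
                rs.getString("tenant_id"),
                rs.getString("message"),
                rs.getTimestamp("created_at").toInstant()
            ),
            tenantId
        );
    }
}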

Load test with hey -n 10000 -c 500 http://localhost:8080/api/tenants/acme/notifications:

Tomcat (200 threads):
  Requests/sec:  1,847
  Avg latency:   267ms
  P99 latency:   1,203ms   ← thread pool exhaustion spikes
  Errors:        12         ← connection refused when queue fills

The reactive version:

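// Reactive controller (Spring WebFlux on Netty)
import java.time.Instant;

import org.springframework.data.r2dbc.core.R2dbcEntityTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class NotificationController {

    private final R2dbcEntityTemplate template;

    // Constructor injection: the final field must be assigned for the class to compile
    public NotificationController(R2dbcEntityTemplate template) {
        this.template = template;
    }

    @GetMapping("/api/tenants/{tenantId}/notifications")
    public Flux<Notification> getNotifications(@PathVariable String tenantId) {
        // Returns immediately; rows are emitted as the driver receives them
        return template.getDatabaseClient()
            .sql("SELECT * FROM notifications WHERE tenant_id = :tenantId " +
                 "ORDER BY created_at DESC LIMIT 50")
            .bind("tenantId", tenantId)
            .map(row -> new Notification(
                row.get("id", String.class),
                row.get("tenant_id", String.class),
                row.get("message", String.class),
                row.get("created_at", Instant.class)
            ))
            .all();
    }
}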

Same load test:

Netty (8 event loop threads):
  Requests/sec:  8,234
  Avg latency:   59ms
  P99 latency:   142ms    ← stable, no thread pool to exhaust
  Errors:        0

4.5x throughput. 8.5x better p99 latency. 25x fewer threads. The difference grows as concurrency increases because the event loop model does not have a fixed thread pool ceiling.

The Failure Mode

<!-- BROKEN: Running WebFlux on Tomcat instead of Netty -->
<!-- pom.xml -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-reactor-netty</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-tomcat</artifactId>
</dependency>

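This compiles. This runs. Your controller methods return Mono and Flux. You believe you have the runtime described above. You do not.

When WebFlux runs on Tomcat, Spring adapts it to the container through the Servlet API’s non-blocking I/O. Requests are dispatched on Tomcat’s worker pool, the same pool that server.tomcat.threads.max sizes, not on a handful of Netty event loop threads. A handler that returns a Mono does not have to hold a thread while it waits, but the thread count, the stack footprint, and the tuning knobs are Tomcat’s. The Netty event loop does not exist, and the thread economics in this chapter do not describe what you are running.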

You can verify this in a debugger. Set a breakpoint in your controller method and inspect Thread.currentThread().getName():

// On Netty: "reactor-http-nio-3"     ← event loop thread
// On Tomcat: "http-nio-8080-exec-7"  ← servlet thread pool

If you see http-nio-*-exec-*, your handlers are running on Tomcat’s thread pool, not on a Netty event loop.
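The same check can be automated, for example in a log statement or a test. The helper below is a hypothetical sketch: RuntimeCheck and ServerThread are invented names, and the patterns rely only on the two thread-name shapes shown above (the reactor-http- prefix is assumed to cover other Reactor Netty transports as well).

```java
public class RuntimeCheck {

    enum ServerThread { NETTY_EVENT_LOOP, TOMCAT_WORKER, OTHER }

    // Classifies a thread by name, using the naming shapes shown in the text.
    static ServerThread classify(String threadName) {
        if (threadName.startsWith("reactor-http-")) {
            return ServerThread.NETTY_EVENT_LOOP;      // e.g. reactor-http-nio-3
        }
        if (threadName.matches("http-nio-.*-exec-\\d+")) {
            return ServerThread.TOMCAT_WORKER;         // e.g. http-nio-8080-exec-7
        }
        return ServerThread.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("reactor-http-nio-3"));   // NETTY_EVENT_LOOP
        System.out.println(classify("http-nio-8080-exec-7")); // TOMCAT_WORKER
        System.out.println(classify(Thread.currentThread().getName())); // OTHER when run from main
    }
}
```

Inside a controller, passing Thread.currentThread().getName() to classify gives the same answer as the debugger without stopping the application.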

The Correct Pattern

<!-- CORRECT: Using the default Netty runtime -->
<!-- pom.xml: just include webflux, Netty comes automatically -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
<!-- No exclusions. No Tomcat. Netty is the default. -->

Verify at startup:

Netty started on port 8080
Started SaasApplication in 2.1 seconds

If the log says Tomcat started on port 8080, check your dependency tree: mvn dependency:tree | grep tomcat. A transitive dependency on spring-boot-starter-web (not webflux) pulls in Spring MVC and Tomcat, and when both stacks are on the classpath Spring Boot auto-configures Spring MVC rather than WebFlux.

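// CORRECT: Application configuration selecting the reactive stack
import org.springframework.boot.SpringApplication;
import org.springframework.boot.WebApplicationType;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class SaasApplication {

    public static void main(String[] args) {
        SpringApplication app = new SpringApplication(SaasApplication.class);
        // Force the reactive stack (useful when both web and webflux are on the classpath).
        // This chooses WebFlux over Spring MVC, not the server: still check the log for "Netty started".
        app.setWebApplicationType(WebApplicationType.REACTIVE);
        app.run(args);
    }
}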

Two rules for the thread economics to work:

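  1. Run on Netty. It is the server Spring Boot selects for WebFlux by default and the one the thread math above describes. Tomcat, Jetty, and Undertow can host WebFlux, but they bring their own thread pools and tuning, so the numbers here do not transfer.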

  2. Never block the event loop thread. The math only works if 8ms of CPU time is the only time a thread spends per request. The moment a thread blocks on JDBC, Thread.sleep(), or a synchronized lock, the event loop stalls and the throughput model collapses to thread-per-request with 8 threads instead of 200. That is worse than Tomcat.
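The second rule can be demonstrated without any framework. In this sketch a single-thread executor stands in for one event loop thread and a latch stands in for a database that has not answered yet; BlockedLoopDemo and quickTaskRuns are hypothetical names, and the 200ms wait is an arbitrary choice.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockedLoopDemo {

    // Returns true if a quick task queued on the loop finishes while the blocking call is pending.
    static boolean quickTaskRuns(boolean blockOnLoop) {
        ExecutorService loop = Executors.newSingleThreadExecutor(); // stand-in for one event loop thread
        ExecutorService offload = Executors.newCachedThreadPool();  // stand-in for a pool meant for blocking work
        CountDownLatch dbReply = new CountDownLatch(1);             // a "database" that has not replied yet
        Runnable blockingCall = () -> {
            try {
                dbReply.await();                                    // blocks the calling thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            (blockOnLoop ? loop : offload).submit(blockingCall);
            Future<?> quick = loop.submit(() -> { });               // another request's CPU work
            quick.get(200, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException stalled) {
            return false;                                           // the loop thread was stuck
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            dbReply.countDown();                                    // release the blocked thread
            loop.shutdown();
            offload.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(quickTaskRuns(true));  // false: the blocking call stalls the loop
        System.out.println(quickTaskRuns(false)); // true: blocking work ran elsewhere
    }
}
```

With the blocking call on the loop thread, every other request queued behind it waits. Moving it to a separate pool keeps the loop free; in Reactor the pool that usually plays this role is Schedulers.boundedElastic().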

The event loop model is an all-or-nothing proposition. Every I/O operation in the request path must be non-blocking. One blocking call in one service method undoes the entire benefit. The next section covers how to detect and handle the cases where blocking is unavoidable.