Kotlin in Depth: Advanced Patterns for Java Engineers

Continuations and State Machines — How Suspension Works

9 min read Chapter 5 of 21

Deconstructing a Suspend Function

You’ve seen the high-level picture: the compiler turns suspend functions into state machines. Now let’s trace through the transformation with enough detail that you could reconstruct the generated code by hand.

Take this function — it fetches a user, retrieves their recent orders, and computes a loyalty score. Three suspension points, two local variables that cross suspension boundaries:

suspend fun computeLoyaltyScore(userId: String): LoyaltyResult {
    val user = fetchUser(userId)                    // suspension point 1
    val orders = fetchRecentOrders(user.id, 90)     // suspension point 2
    val score = calculateScore(orders)              // suspension point 3
    return LoyaltyResult(user.name, score, orders.size)
}

Each call to a suspend function is a potential suspension point. “Potential” because a suspend function may return a result immediately (the fast path). The generated code must handle both cases.

Step 1: The Continuation Subclass

The compiler generates an inner class that extends ContinuationImpl. This class serves double duty — it’s both the callback object and the storage for local state:

// Compiler-generated (simplified)
final class ComputeLoyaltyScoreContinuation extends ContinuationImpl {
    int label = 0;              // current state
    Object result;              // result from last suspension point

    // Saved local variables — only those live across suspension boundaries
    String userId;
    User user;
    List<Order> orders;

    ComputeLoyaltyScoreContinuation(Continuation<? super LoyaltyResult> completion) {
        super(completion);
    }

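    @Override
    protected Object invokeSuspend(Object outcome) {
        this.result = outcome;
        // Set the high bit so the function recognizes a resumption and reuses this object
        this.label |= Integer.MIN_VALUE;
        // Re-enter the function's state machine
        return computeLoyaltyScore(this.userId, this);
    }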
}

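Notice what’s stored as fields: userId, user, and orders. The compiler analyzes which variables are live across at least one suspension point and hoists them into the continuation object. Variables that exist entirely within one state (like score, which is computed and consumed between suspension point 3 and the return) stay as local variables; they don’t need to survive suspension. Strictly speaking, userId is not read after the first suspension point either, so the real compiler would not spill it and would pass null for it on re-entry. This simplified version keeps it as a field so that invokeSuspend has an argument to hand back.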

Step 2: The State Machine Switch

The function body becomes a when/switch dispatch on the label field. Here’s the full transformation:

Object computeLoyaltyScore(String userId, Continuation<? super LoyaltyResult> cont) {

    ComputeLoyaltyScoreContinuation sm;

    // Reuse the continuation if we're resuming, create a new one if first call
    if (cont instanceof ComputeLoyaltyScoreContinuation
            && (((ComputeLoyaltyScoreContinuation) cont).label & Integer.MIN_VALUE) != 0) {
        sm = (ComputeLoyaltyScoreContinuation) cont;
        sm.label -= Integer.MIN_VALUE;  // clear the reuse flag
    } else {
        sm = new ComputeLoyaltyScoreContinuation(cont);
    }

    switch (sm.label) {
        case 0: {
            sm.userId = userId;
            sm.label = 1;
            Object r = fetchUser(userId, sm);
            if (r == COROUTINE_SUSPENDED) return COROUTINE_SUSPENDED;
            // Fast path: fetchUser returned immediately
            sm.result = r;
        }
        // fall-through

        case 1: {
            // Check for exception from previous suspension
            if (sm.result instanceof Result.Failure) {
                throw ((Result.Failure) sm.result).exception;
            }
            User user = (User) sm.result;
            sm.user = user;             // save for later states
            sm.label = 2;
            Object r = fetchRecentOrders(user.getId(), 90, sm);
            if (r == COROUTINE_SUSPENDED) return COROUTINE_SUSPENDED;
            sm.result = r;
        }

        case 2: {
            if (sm.result instanceof Result.Failure) {
                throw ((Result.Failure) sm.result).exception;
            }
            List<Order> orders = (List<Order>) sm.result;
            sm.orders = orders;
            sm.label = 3;
            Object r = calculateScore(orders, sm);
            if (r == COROUTINE_SUSPENDED) return COROUTINE_SUSPENDED;
            sm.result = r;
        }

        case 3: {
            if (sm.result instanceof Result.Failure) {
                throw ((Result.Failure) sm.result).exception;
            }
            int score = (int) sm.result;
            // No more suspension points — return normally
            return new LoyaltyResult(sm.user.getName(), score, sm.orders.size());
        }

        default:
            throw new IllegalStateException("call to 'resume' before 'invoke' with coroutine");
    }
}

Walk through the flow for state 0: we save userId to the continuation, advance the label to 1, and call fetchUser() passing sm (the continuation itself) as the callback. If fetchUser needs to perform I/O, it returns COROUTINE_SUSPENDED. Our function also returns COROUTINE_SUSPENDED, unwinding back up the call stack. No thread is blocked.

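Later, when the network response arrives, whoever completes the operation calls sm.resumeWith(result), usually by way of the dispatcher. The inherited resumeWith implementation calls sm.invokeSuspend(result), which re-enters this method. The label is now 1, so execution jumps to case 1, extracts the user from sm.result, and proceeds.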
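To make the label dispatch concrete, here’s a hand-written Kotlin analogue of the generated code. It is a sketch, not compiler output: LoyaltySketch, fakeFetch, and the pending queue are hypothetical names invented for this illustration, and fakeFetch always suspends so that both resumptions are visible. The loop in step plays the role of the switch fall-through on the fast path.

```kotlin
import kotlin.coroutines.*
import kotlin.coroutines.intrinsics.COROUTINE_SUSPENDED

// Hypothetical stand-in for a compiled suspending call: it parks the
// resumption in `pending` and reports suspension instead of returning a value.
val pending = ArrayDeque<() -> Unit>()

fun fakeFetch(value: Any, cont: Continuation<Any?>): Any? {
    pending.addLast { cont.resume(value) }
    return COROUTINE_SUSPENDED
}

// Hand-written state machine: `label` selects the resume point,
// and fields hold the locals that must survive suspension.
class LoyaltySketch(private val completion: Continuation<String>) : Continuation<Any?> {
    override val context: CoroutineContext get() = completion.context
    private var label = 0
    private var user: String? = null   // spilled: still needed in state 2

    override fun resumeWith(result: Result<Any?>) {
        val outcome = try {
            step(result.getOrThrow())
        } catch (e: Throwable) {
            completion.resumeWithException(e)
            return
        }
        // Only a real value completes the function; the sentinel means "not yet".
        if (outcome !== COROUTINE_SUSPENDED) completion.resume(outcome as String)
    }

    private fun step(input: Any?): Any? {
        var value = input
        while (true) {
            val r: Any? = when (label) {
                0 -> { label = 1; fakeFetch("alice", this) }
                1 -> { user = value as String; label = 2; fakeFetch(3, this) }
                2 -> return "$user: $value orders"
                else -> error("unexpected label $label")
            }
            if (r === COROUTINE_SUSPENDED) return r
            value = r   // fast path: continue straight into the next state
        }
    }
}

fun main() {
    var finalValue: String? = null
    val sm = LoyaltySketch(Continuation(EmptyCoroutineContext) { finalValue = it.getOrThrow() })
    sm.resumeWith(Result.success(null))                          // first entry, label 0
    while (pending.isNotEmpty()) pending.removeFirst().invoke()  // deliver the "responses"
    println(finalValue)                                          // alice: 3 orders
}
```

Each entry into step runs exactly one state and then returns the sentinel, so the call stack is empty between the two deliveries from the queue.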

The COROUTINE_SUSPENDED Sentinel

COROUTINE_SUSPENDED is a singleton object defined in the Kotlin standard library:

// kotlin.coroutines.intrinsics
public val COROUTINE_SUSPENDED: Any
    get() = CoroutineSingletons.COROUTINE_SUSPENDED

It serves one purpose: distinguish between “this function completed synchronously” (returned an actual result) and “this function suspended” (will resume later via the continuation). Every generated state machine checks for this value after each suspension-point call.

This design means a suspend function that happens to have all its data cached can execute through its entire state machine in a single pass — no thread switching, no scheduling overhead. The state machine degrades gracefully to a normal function call when nothing actually suspends.
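Here’s a small sketch of that fast path using only the standard library. The lookup function, the cache map, and the waiting list are hypothetical names made up for this example. A cache hit returns the value directly, and a miss parks the continuation and returns the sentinel.

```kotlin
import kotlin.coroutines.*
import kotlin.coroutines.intrinsics.*

val cache = mutableMapOf<String, String>()
val waiting = mutableListOf<Continuation<String>>()

// Hypothetical lookup: returns a cached value directly, otherwise suspends.
suspend fun lookup(key: String): String =
    suspendCoroutineUninterceptedOrReturn<String> { cont ->
        val hit = cache[key]
        if (hit != null) {
            hit                    // fast path: an ordinary return value, no suspension
        } else {
            waiting.add(cont)      // slow path: park the continuation...
            COROUTINE_SUSPENDED    // ...and signal suspension to the caller
        }
    }

fun main() {
    cache["a"] = "cached-a"
    val log = mutableListOf<String>()
    val body: suspend () -> Unit = {
        log.add(lookup("a"))   // cache hit: completes synchronously
        log.add(lookup("b"))   // cache miss: suspends here
    }
    body.startCoroutine(Continuation(EmptyCoroutineContext) { })
    println(log)                             // [cached-a]
    waiting.removeAt(0).resume("loaded-b")   // simulate the value arriving later
    println(log)                             // [cached-a, loaded-b]
}
```

The first lookup finishes inside the initial startCoroutine call, which is the "single pass" behavior described above. Only the second lookup leaves a continuation behind.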

How Local Variables Survive Suspension

Compare with what happens on the JVM normally: local variables live on the stack frame. When a method returns, its frame is popped and those variables are gone. Since a suspended coroutine returns COROUTINE_SUSPENDED up the entire call stack, every frame is popped.

The compiler solves this by spilling live locals into the continuation object’s fields before each suspension point, then restoring them after resumption. The pattern in the generated code:

// Before suspension: save
sm.user = user;
sm.label = 2;
Object r = fetchRecentOrders(user.getId(), 90, sm);

// After resumption, in a later state: read the spilled local back from the continuation.
// The suspended call's own return value arrives separately, in sm.result.
User user = sm.user;

This is analogous to how a compiler handles register allocation across function calls — caller-saved registers are spilled to the stack before a call and reloaded after. Here, the “stack” is the continuation object on the heap.

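The implication for memory: each suspended coroutine holds one continuation object per suspend function in its call chain, each containing that function’s saved locals. For the function above, that’s on the order of 50 bytes on a typical 64-bit JVM (object header, the label int, and a handful of reference fields, including those inherited from ContinuationImpl). Compare with a Java platform thread’s stack: 512KB to 1MB reserved by default. You can have millions of suspended coroutines for the memory cost of a few thousand threads.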
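A quick back-of-the-envelope calculation shows the scale of that difference. Both constants below are assumptions chosen for illustration (about 50 bytes per continuation, a 1 MiB stack per platform thread), not measurements, and the function names are hypothetical.

```kotlin
// Assumed figures, for illustration only; measure on your own JVM.
const val BYTES_PER_CONTINUATION = 50L            // assumption: one small continuation object
const val BYTES_PER_THREAD_STACK = 1024L * 1024   // assumption: 1 MiB reserved per platform thread

fun coroutineFootprint(count: Long): Long = count * BYTES_PER_CONTINUATION
fun threadFootprint(count: Long): Long = count * BYTES_PER_THREAD_STACK

fun main() {
    val mib = 1024.0 * 1024
    println("1,000,000 suspended continuations: ${coroutineFootprint(1_000_000) / mib} MiB")
    println("2,000 platform thread stacks: ${threadFootprint(2_000) / mib} MiB")
}
```

Under these assumptions, a million suspended continuations need about 48 MiB, while two thousand thread stacks reserve 2,000 MiB.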

Contrast: CompletableFuture Chain

Here’s the same logic expressed as Java CompletableFuture composition:

CompletableFuture<LoyaltyResult> computeLoyaltyScore(String userId) {
    return fetchUserAsync(userId)
        .thenCompose(user ->
            fetchRecentOrdersAsync(user.getId(), 90)
                .thenCompose(orders ->
                    calculateScoreAsync(orders)
                        .thenApply(score ->
                            new LoyaltyResult(user.getName(), score, orders.size())
                        )
                )
        );
}

Notice the nesting. Each thenCompose creates a new lambda, and the lambdas must capture variables from enclosing scopes (user, orders). In the Kotlin coroutine version, those captures are the continuation’s fields — but you never see them. The structural advantage: as a suspend function grows from 3 suspension points to 15, the Kotlin code scales linearly (sequential statements). The CompletableFuture version nests 15 levels deep or requires intermediate variables that break the flow.

The performance characteristics are comparable — both avoid blocking threads, both use heap-allocated closures/continuations. The difference is entirely in readability and maintainability.

Bridging Callback APIs: suspendCoroutine and suspendCancellableCoroutine

When you need to integrate with callback-based Java libraries — OkHttp callbacks, CompletableFuture, Vert.x handlers — you use suspendCoroutine or suspendCancellableCoroutine to convert them into suspend functions.

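import java.util.concurrent.CompletableFuture
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException
import kotlinx.coroutines.suspendCancellableCoroutine

suspend fun <T> CompletableFuture<T>.await(): T =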
    suspendCancellableCoroutine { cont ->
        // Register a callback on the CompletableFuture
        this.whenComplete { result, exception ->
            if (exception != null) {
                cont.resumeWithException(exception)
            } else {
                cont.resume(result)
            }
        }

        // Handle coroutine cancellation → cancel the future
        cont.invokeOnCancellation {
            this.cancel(true)
        }
    }

Let’s break down what happens:

  1. suspendCancellableCoroutine captures the current continuation and passes it to your lambda as cont.
  2. Your lambda registers callbacks on the external API, storing cont for later use.
  3. The lambda returns, and the coroutine suspends (the function returns COROUTINE_SUSPENDED up the stack).
  4. Later, the callback fires and calls cont.resume(result) or cont.resumeWithException(exception).
  5. The dispatcher schedules the coroutine for resumption on the appropriate thread.

The difference between suspendCoroutine and suspendCancellableCoroutine: the cancellable variant gives you invokeOnCancellation to clean up resources if the coroutine is cancelled while waiting. Always prefer the cancellable version in production code — failing to cancel the underlying operation when the coroutine is cancelled leads to resource leaks.
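The five steps above can be watched in isolation with a sketch that needs only the standard library. LegacyClient is a hypothetical callback API invented for this example, standing in for a real library. It uses plain suspendCoroutine so that it runs without kotlinx.coroutines, which also means it has no cancellation hook.

```kotlin
import kotlin.coroutines.*

// Hypothetical callback-style API, standing in for a real library.
class LegacyClient {
    private var onSuccess: ((String) -> Unit)? = null
    private var onError: ((Throwable) -> Unit)? = null

    fun load(onSuccess: (String) -> Unit, onError: (Throwable) -> Unit) {
        this.onSuccess = onSuccess
        this.onError = onError
    }

    // These simulate the library delivering its result at some later time.
    fun succeed(value: String) { onSuccess?.invoke(value) }
    fun fail(cause: Throwable) { onError?.invoke(cause) }
}

// Steps 1 and 2: capture the continuation and register callbacks that hold it.
suspend fun LegacyClient.loadSuspend(): String =
    suspendCoroutine { cont ->
        load(
            onSuccess = { cont.resume(it) },
            onError = { cont.resumeWithException(it) }
        )
    }   // Step 3: the block returns and the coroutine suspends.

fun main() {
    val client = LegacyClient()
    val body: suspend () -> String = { client.loadSuspend().uppercase() }
    body.startCoroutine(Continuation(EmptyCoroutineContext) { result ->
        println("completed: $result")
    })
    println("suspended, waiting for callback")
    client.succeed("payload")   // Step 4: the callback fires and resumes the coroutine.
}
```

This prints "suspended, waiting for callback" followed by "completed: Success(PAYLOAD)". There is no dispatcher in EmptyCoroutineContext, so step 5 collapses into a direct call on the thread that invoked succeed.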

Here’s an OkHttp example that demonstrates the pattern with a real network call:

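import java.io.IOException
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException
import kotlinx.coroutines.suspendCancellableCoroutine
import okhttp3.Call
import okhttp3.Callback
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.Response

suspend fun OkHttpClient.executeSuspend(request: Request): Response =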
    suspendCancellableCoroutine { cont ->
        val call = this.newCall(request)

        call.enqueue(object : Callback {
            override fun onResponse(call: Call, response: Response) {
                cont.resume(response)
            }

            override fun onFailure(call: Call, e: IOException) {
                cont.resumeWithException(e)
            }
        })

        cont.invokeOnCancellation {
            call.cancel()  // Propagate cancellation to the HTTP call
        }
    }

Runnable Example: Watching Suspension Happen

Here’s a complete program that makes the suspension/resumption cycle visible:

import kotlin.coroutines.*
import kotlin.coroutines.intrinsics.*

fun main() {
    var continuation: Continuation<String>? = null

    // Define a suspend lambda using low-level intrinsics
    val block: suspend () -> Unit = {
        println("[coroutine] Before suspension, thread: ${Thread.currentThread().name}")

        val result = suspendCoroutineUninterceptedOrReturn<String> { cont ->
            println("[coroutine] Suspending... saving continuation")
            continuation = cont
            COROUTINE_SUSPENDED  // Actually suspend
        }

        println("[coroutine] Resumed with: '$result', thread: ${Thread.currentThread().name}")
    }

    // Start the coroutine
    println("[main] Starting coroutine")
    block.startCoroutine(Continuation(EmptyCoroutineContext) { result ->
        println("[main] Coroutine completed: $result")
    })

    // At this point, the coroutine is suspended and main continues
    println("[main] Coroutine suspended, doing other work...")
    println("[main] Resuming coroutine now")

    // Resume from wherever we are — could be a different thread in real code
    continuation?.resume("data from callback")
    println("[main] Done")
}

Output:

[main] Starting coroutine
[coroutine] Before suspension, thread: main
[coroutine] Suspending... saving continuation
[main] Coroutine suspended, doing other work...
[main] Resuming coroutine now
[coroutine] Resumed with: 'data from callback', thread: main
[main] Coroutine completed: Success(kotlin.Unit)
[main] Done

Notice that the entire execution happens on a single thread. The coroutine suspends (control returns to main), and when we call resume, execution enters the coroutine’s state machine at the next label — still on the same thread. No thread pool, no context switch. The dispatcher decides whether to resume on the same or a different thread; here, with EmptyCoroutineContext, there’s no dispatcher at all.
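To see the other half of that claim, here’s a variation that resumes the saved continuation from a second thread. It is a sketch using only the standard library, and the function name runAndResumeOn is made up for this example. With no interceptor in the context, the rest of the coroutine simply runs on whichever thread calls resume.

```kotlin
import kotlin.concurrent.thread
import kotlin.coroutines.*

// Starts a coroutine on the calling thread, resumes it from a new thread with
// the given name, and returns the thread names seen before and after suspension.
fun runAndResumeOn(threadName: String): List<String> {
    var saved: Continuation<String>? = null
    val threads = mutableListOf<String>()

    val body: suspend () -> Unit = {
        threads.add(Thread.currentThread().name)
        val value = suspendCoroutine<String> { cont -> saved = cont }
        threads.add(Thread.currentThread().name)
        println("resumed with '$value' on ${Thread.currentThread().name}")
    }
    body.startCoroutine(Continuation(EmptyCoroutineContext) { })

    // Resume from another thread, then wait for it so the result is complete.
    thread(name = threadName) { saved?.resume("data") }.join()
    return threads
}

fun main() {
    println(runAndResumeOn("callback-thread"))
}
```

The second entry in the returned list is callback-thread: the code after the suspension point ran there, even though the coroutine started on the caller’s thread.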

Performance Accounting

Per suspended coroutine, you pay:

  • One heap object (the continuation) — typically 50–200 bytes depending on local variable count
  • No stack frames — the stack is unwound on suspension
  • One virtual dispatch on resumption — the invokeSuspend call

Compare with one Java thread:

  • 512KB–1MB stack (default, configurable)
  • OS kernel thread — subject to scheduler overhead and context switch cost (~1–10μs)
  • Thread-local storage (a ThreadLocalMap per thread, allocated on first use)

And one Loom virtual thread:

  • ~1KB initial stack (grows as needed, stack frames stored on heap)
  • No OS thread while suspended — mounted/unmounted from carrier threads
  • Continuation objects internally — Loom uses a similar mechanism, but at the JVM level

Kotlin coroutines and Loom virtual threads converge on the same underlying idea: heap-allocated continuations instead of pinned stack memory. The difference is where the transformation happens. Loom does it in the JVM runtime (transparent to all JVM languages), while Kotlin does it at compile time (works on any JVM version ≥ 1.8). On JVM 21+, you can even combine the two by running coroutines on a dispatcher backed by virtual threads, for example one created from Executors.newVirtualThreadPerTaskExecutor() with asCoroutineDispatcher(). Dispatchers.IO itself uses a pool of platform threads. The combination gives you the compiler-generated state machines, the library’s structured concurrency, and the JVM-level lightweight threading.
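One way to run coroutines on virtual threads is to wrap a virtual-thread executor as a coroutine dispatcher. The sketch below assumes JDK 21 or later and the kotlinx-coroutines-core library on the classpath. The function name ranOnVirtualThread is hypothetical.

```kotlin
import java.util.concurrent.Executors
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Runs a block on a dispatcher backed by a virtual-thread-per-task executor
// and reports whether the executing thread was virtual (requires JDK 21+).
fun ranOnVirtualThread(): Boolean = runBlocking {
    Executors.newVirtualThreadPerTaskExecutor().asCoroutineDispatcher().use { dispatcher ->
        withContext(dispatcher) { Thread.currentThread().isVirtual }
    }
}

fun main() {
    println("ran on a virtual thread: ${ranOnVirtualThread()}")
}
```

The use call closes the dispatcher and its underlying executor once the block finishes. A long-lived application would more likely create the dispatcher once and share it.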