Skip to main content
fast by design

Instrumenting the Critical Path

8 min read Chapter 86 of 90

Instrumenting the Critical Path

The main chapter showed how auto-instrumentation with the OpenTelemetry Java Agent captures HTTP calls, JDBC queries, and Redis commands. That covers roughly 70% of the time spent in the content platform’s request lifecycle. The other 30% is invisible: in-memory processing, cache serialization, template rendering, business logic that calls no external systems. Auto-instrumentation cannot see inside your code. You need manual spans to fill the gaps.

What Auto-Instrumentation Captures (and What It Misses)

The Java Agent intercepts library calls at well-known extension points. Here is exactly what it sees and does not see in the content platform’s article rendering path:

Request: GET /articles/distributed-tracing-guide

Auto-instrumented (visible in traces):
├── Spring MVC: ArticleController.getArticle()     12ms total span
│   ├── Redis: GET article:cache:dt-guide           2ms
│   ├── PostgreSQL: SELECT * FROM articles           8ms
│   └── HTTP: GET recommendation-service/related    45ms

NOT instrumented (invisible):
├── Markdown parsing (commonmark-java)               28ms
├── Syntax highlighting (Prism)                       35ms
├── Table of contents extraction                       4ms
├── Related article score calculation                 12ms
├── Template rendering (Thymeleaf)                    18ms
└── Response serialization (Jackson)                   6ms

Total auto-instrumented: 67ms
Total uninstrumented: 103ms
Actual request duration: 170ms
Gap: 103ms (61% of request time is invisible)

The auto-instrumented spans account for 39% of the request duration. A performance engineer looking at this trace sees a 170ms request with 67ms of explained time and assumes the remaining 103ms is “framework overhead.” It is not. It is application code doing real work, and some of that work is slow.

Adding Manual Spans

Manual spans require the OpenTelemetry API dependency and a Tracer instance:

<!-- pom.xml -->
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-api</artifactId>
</dependency>

The API dependency has zero transitive dependencies and adds no runtime overhead when no agent is attached. Your code compiles and runs with or without the agent present.

// SLOW: No visibility into rendering pipeline
public RenderedArticle renderArticle(Article article) {
    String html = markdownParser.parse(article.getContent());
    String highlighted = syntaxHighlighter.highlight(html);
    TableOfContents toc = tocExtractor.extract(highlighted);
    return new RenderedArticle(highlighted, toc);
}
// Trace shows: renderArticle() = 85ms. Why? No idea.

// FAST: Manual spans expose the breakdown
private static final Tracer tracer = GlobalOpenTelemetry.getTracer("article-service");

public RenderedArticle renderArticle(Article article) {
    Span renderSpan = tracer.spanBuilder("article.render")
        .setAttribute("article.id", article.getId())
        .setAttribute("article.content.length", article.getContent().length())
        .startSpan();

    try (Scope scope = renderSpan.makeCurrent()) {
        String html;
        Span parseSpan = tracer.spanBuilder("article.render.markdown-parse").startSpan();
        try (Scope parseScope = parseSpan.makeCurrent()) {
            html = markdownParser.parse(article.getContent());
            parseSpan.setAttribute("output.length", html.length());
        } finally {
            parseSpan.end();
        }

        String highlighted;
        Span highlightSpan = tracer.spanBuilder("article.render.syntax-highlight").startSpan();
        try (Scope highlightScope = highlightSpan.makeCurrent()) {
            highlighted = syntaxHighlighter.highlight(html);
            highlightSpan.setAttribute("code.blocks.count", countCodeBlocks(html));
        } finally {
            highlightSpan.end();
        }

        TableOfContents toc;
        Span tocSpan = tracer.spanBuilder("article.render.toc-extract").startSpan();
        try (Scope tocScope = tocSpan.makeCurrent()) {
            toc = tocExtractor.extract(highlighted);
            tocSpan.setAttribute("toc.entries", toc.getEntries().size());
        } finally {
            tocSpan.end();
        }

        return new RenderedArticle(highlighted, toc);
    } finally {
        renderSpan.end();
    }
}

// Trace now shows:
// article.render (85ms)
// ├── article.render.markdown-parse (28ms)  content.length=14200
// ├── article.render.syntax-highlight (35ms) code.blocks.count=12
// └── article.render.toc-extract (4ms)       toc.entries=8
// Missing: 18ms (template rendering, in a different method)

The trace now reveals that syntax highlighting takes 35ms and scales with the number of code blocks. An article with 12 code blocks takes 35ms. An article with 2 code blocks takes 6ms. That correlation is visible because the code.blocks.count attribute is attached to the span.

Custom Attributes That Matter

Attributes turn a span from “something took 35ms” into “syntax highlighting 12 code blocks in a 14,200-character article took 35ms.” The right attributes make the span self-documenting for performance analysis.

Rules for choosing attributes:

  1. Add dimensions you will filter or group by. If you want to answer “are large articles slower?”, you need article.content.length. If you want to answer “which category is slowest?”, you need article.category.

  2. Add counts of items processed. Batch operations should record the batch size. Loop operations should record the iteration count. This lets you calculate per-item cost.

  3. Add cache hit/miss status. Every cache lookup span should record whether it hit or missed. This is the single most useful attribute for performance debugging.

  4. Never add high-cardinality unbounded values as attributes. User IDs with millions of distinct values explode storage. Article IDs with tens of thousands are fine. Query strings are fine if you parameterize them. Full request bodies are never fine.

// Effective attributes for the content platform
span.setAttribute("article.id", articleId);              // medium cardinality, essential for debugging
span.setAttribute("article.category", category);          // low cardinality, good for grouping
span.setAttribute("article.content.length", length);      // numeric, good for correlation
span.setAttribute("cache.hit", cacheResult != null);      // boolean, essential for cache analysis
span.setAttribute("batch.size", items.size());             // numeric, essential for batch operations
span.setAttribute("search.results.count", results.size());// numeric, good for relevance tuning

// Avoid these
span.setAttribute("user.session.id", sessionId);          // high cardinality, rarely useful
span.setAttribute("request.body", requestBody);            // unbounded size, never do this
span.setAttribute("article.content", fullContent);         // unbounded, use length instead

Span Events for Sub-Operation Timing

Span events are timestamped annotations within a span. They are lighter than child spans and better suited for marking points in time rather than durations. Use events when you need to know when something happened within a span but creating a child span would add noise.

public List<Article> searchArticles(String query) {
    Span span = tracer.spanBuilder("article.search")
        .setAttribute("search.query.length", query.length())
        .startSpan();

    try (Scope scope = span.makeCurrent()) {
        // Event: query parsing complete
        ParsedQuery parsed = queryParser.parse(query);
        span.addEvent("query.parsed", Attributes.of(
            AttributeKey.longKey("terms.count"), (long) parsed.getTerms().size(),
            AttributeKey.booleanKey("has.filters"), parsed.hasFilters()
        ));

        // Event: embedding generated
        float[] embedding = embeddingService.embed(parsed.getText());
        span.addEvent("embedding.generated", Attributes.of(
            AttributeKey.longKey("embedding.dimensions"), (long) embedding.length
        ));

        // Event: vector search complete
        List<ScoredPoint> points = qdrantClient.search(embedding, 20);
        span.addEvent("vector.search.complete", Attributes.of(
            AttributeKey.longKey("results.count"), (long) points.size(),
            AttributeKey.doubleKey("top.score"), points.isEmpty() ? 0.0 : points.get(0).getScore()
        ));

        // Event: article hydration complete
        List<Article> articles = articleRepository.findByIds(
            points.stream().map(ScoredPoint::getId).toList()
        );
        span.addEvent("articles.hydrated", Attributes.of(
            AttributeKey.longKey("articles.count"), (long) articles.size()
        ));

        span.setAttribute("search.results.count", articles.size());
        return articles;
    } finally {
        span.end();
    }
}

The resulting span has a single duration (say 145ms) with four events marking internal milestones. In Jaeger, these appear as dots on the span’s timeline:

article.search (145ms)
│ 0ms    query.parsed          terms.count=3, has.filters=true
│ 12ms   embedding.generated   embedding.dimensions=384
│ 108ms  vector.search.complete results.count=20, top.score=0.89
│ 142ms  articles.hydrated     articles.count=20

Breakdown: parsing=12ms, embedding=96ms, vector search=34ms, hydration=3ms
Bottleneck: embedding generation (96ms, 66% of search time)

Events are cheaper than spans in both code complexity and storage. Each event is a single timestamp plus attributes. A span requires start/end timestamps, a span ID, a parent reference, and status. Use events for sequential phases within a single logical operation. Use child spans for operations that call external systems or that you want to independently query and aggregate.

Instrumenting Async and Batch Operations

The content platform’s analytics pipeline processes page views asynchronously via Kafka. Each consumer batch contains 50-200 events. The auto-instrumented Kafka consumer span covers the entire poll-process-commit cycle, but you need to see individual event processing:

// SLOW: Single span for entire batch, no per-event visibility
@KafkaListener(topics = "page-views", batch = "true")
public void processPageViews(List<ConsumerRecord<String, PageViewEvent>> records) {
    for (ConsumerRecord<String, PageViewEvent> record : records) {
        analyticsRepository.save(record.value());
    }
}
// Trace: kafka.consume (340ms) batch.size=150. Which events are slow? Unknown.

// FAST: Per-event spans linked to the producer context
@KafkaListener(topics = "page-views", batch = "true")
public void processPageViews(List<ConsumerRecord<String, PageViewEvent>> records) {
    Span batchSpan = tracer.spanBuilder("analytics.batch-process")
        .setAttribute("batch.size", records.size())
        .startSpan();

    try (Scope scope = batchSpan.makeCurrent()) {
        int errors = 0;
        for (ConsumerRecord<String, PageViewEvent> record : records) {
            Span eventSpan = tracer.spanBuilder("analytics.process-event")
                .setAttribute("event.article.id", record.value().getArticleId())
                .setAttribute("event.type", record.value().getType())
                .startSpan();
            try (Scope eventScope = eventSpan.makeCurrent()) {
                analyticsRepository.save(record.value());
            } catch (Exception e) {
                eventSpan.setStatus(StatusCode.ERROR, e.getMessage());
                eventSpan.recordException(e);
                errors++;
            } finally {
                eventSpan.end();
            }
        }
        batchSpan.setAttribute("batch.errors", errors);
    } finally {
        batchSpan.end();
    }
}

For async operations using CompletableFuture, you must propagate the context explicitly. The OpenTelemetry context is stored in a ThreadLocal. When execution moves to a different thread, the context is lost:

// SLOW: Context lost when crossing thread boundaries
CompletableFuture.supplyAsync(() -> {
    // Span.current() returns a no-op span here
    // Any spans created are orphaned (no parent)
    return recommendationService.getRelated(articleId);
});

// FAST: Wrap the executor to propagate context
ExecutorService tracingExecutor = Context.taskWrapping(Executors.newFixedThreadPool(10));

CompletableFuture.supplyAsync(() -> {
    // Context is propagated, Span.current() returns the correct parent
    return recommendationService.getRelated(articleId);
}, tracingExecutor);

The Context.taskWrapping() wrapper captures the current context at submission time and restores it when the task executes. This is essential for any application that uses thread pools for parallel execution. Without it, async spans are disconnected from their parent traces.

Trade-Offs

Instrumentation granularity vs noise. Every manual span adds code complexity and trace storage. Instrument operations that take more than 5ms or that you actively need to debug. Remove instrumentation from operations you have already optimized and that are stable. Dead instrumentation is dead code.

Attribute cardinality vs query flexibility. More distinct attribute values means more storage in Jaeger’s backend and slower queries. The content platform has roughly 15,000 article IDs. That is manageable. If you have millions of user IDs, store them as span attributes only on sampled traces, not as metric dimensions.

Event density vs readability. A span with 50 events is harder to read than one with 5. Group related work into phases and emit one event per phase. If individual items within a phase need visibility, use a child span with the item count as an attribute.

Context propagation overhead. Each injected header adds bytes to every request. The traceparent header is 55 bytes. Custom baggage headers for propagating business context (user tier, experiment ID) add more. On the content platform’s internal gRPC calls (average 200-byte payload), trace headers add 27% overhead to the wire format. On HTTP calls with multi-KB payloads, the overhead is negligible. Measure before optimizing.