Search at Depth
OpenSearch Internals, Relevance Engineering, and Production Search on the JVM.
This book targets senior Java developers who have used Elasticsearch or OpenSearch as a black box and watched it misbehave under load, return irrelevant results, or fall over during a reindex. It never explains what an HTTP request is or defines JSON from scratch. The reader already has a running search cluster; this book explains what is happening inside it.
Every chapter uses the same domain: a multi-tenant technical documentation search engine. Multiple clients (tenants) each have versioned documentation, code snippets, API references, and changelog entries. Users search across their tenant's content with relevance expectations shaped by developer experience: exact matches on method names, fuzzy matches on concepts, code-aware tokenization, and faceted filtering by version and content type. Every index design decision, query example, relevance scenario, and failure case refers to this platform.
Four opinions run through every chapter:
OpenSearch is the default. Elasticsearch is referenced where the two diverge meaningfully or where community content is predominantly Elasticsearch-specific. Every code example, cluster configuration, and API call uses OpenSearch 2.x syntax and the OpenSearch Java client. The licensing difference is stated once in chapter 1 and not repeated.
Index design is the decision you cannot undo. Mapping mistakes, wrong field types, and poor shard strategies cannot be fixed without a full reindex. Every chapter that touches index configuration treats it as a permanent architectural decision, not a setting to tune later.
Relevance is an engineering problem, not a configuration problem. Boosting fields randomly and hoping results improve is not relevance tuning. Real relevance engineering requires a test set, a scoring metric, and a repeatable evaluation process. The book builds this discipline from chapter 8 onward and applies it consistently.
Semantic search complements lexical search; it does not replace it. kNN vector search produces confidently wrong results when used alone on a technical documentation domain. The correct architecture combines BM25 and dense vector scoring. The book defends this with benchmark numbers, not opinions.
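The last two opinions can be made concrete with a small sketch in plain Java. It is not the book's implementation. It assumes reciprocal rank fusion (RRF) as the way to combine a BM25 ranking with a kNN ranking, and nDCG@k as the scoring metric for a judged test set; this overview names neither, so both are illustrative choices. The class name, method names, and document IDs are hypothetical, and the constant 60 is only the conventional RRF default.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: class and method names are hypothetical, not taken from the book. */
final class HybridRelevanceSketch {

    /** Conventional RRF smoothing constant; an assumption to be tuned against a test set. */
    static final int RRF_K = 60;

    /** Fuses ranked lists of document IDs with reciprocal rank fusion. */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                // Rank is 1-based, so the top hit contributes 1 / (k + 1).
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties broken by ID so the output is deterministic.
        fused.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return fused;
    }

    /**
     * nDCG@k with linear gain. Judgments map document ID to a relevance grade;
     * unjudged documents count as grade 0. Assumes the ranking has no duplicate IDs.
     */
    static double ndcgAtK(List<String> ranking, Map<String, Integer> judgments, int k) {
        double dcg = 0.0;
        for (int i = 0; i < Math.min(k, ranking.size()); i++) {
            dcg += gain(judgments.getOrDefault(ranking.get(i), 0), i);
        }
        // The ideal ranking lists the judged documents by descending grade.
        List<Integer> ideal = new ArrayList<>(judgments.values());
        ideal.sort(Comparator.reverseOrder());
        double idcg = 0.0;
        for (int i = 0; i < Math.min(k, ideal.size()); i++) {
            idcg += gain(ideal.get(i), i);
        }
        return idcg == 0.0 ? 0.0 : dcg / idcg;
    }

    /** Discounted gain at a 0-based position: grade / log2(position + 2). */
    private static double gain(int grade, int index) {
        return grade / (Math.log(index + 2) / Math.log(2));
    }

    public static void main(String[] args) {
        List<String> bm25 = List.of("doc-a", "doc-b", "doc-c"); // hypothetical lexical hits
        List<String> knn = List.of("doc-b", "doc-d", "doc-a");  // hypothetical vector hits
        List<String> fused = fuse(List.of(bm25, knn), RRF_K);
        Map<String, Integer> judgments = Map.of("doc-b", 3, "doc-a", 2);
        System.out.println(fused + " nDCG@3=" + ndcgAtK(fused, judgments, 3));
    }
}
```

RRF works on ranks rather than raw scores, which sidesteps the fact that BM25 scores and vector similarities are on different scales. Whether it outperforms a normalized weighted sum on a given corpus is exactly the kind of question a judged test set and a repeatable metric are meant to answer.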
Code examples use Java 21, Spring Boot 3, and the OpenSearch Java client. Integration tests use Testcontainers. Kafka handles the indexing pipeline. Prometheus and Grafana provide observability. Every chapter follows the same structure: the symptom, the internals, the implementation, the measurement, and the decision rule.
This book was generated using AI assistance.