Combining OpenSearch with Application-Layer Ranking
The Symptom
The documentation platform’s ranking logic grows complex. The function_score query now includes: recency decay, content type boosting, view count popularity, tenant-specific field weights, A/B test variant scoring, and a Painless script that checks if the document is bookmarked by the current user. The query DSL is 200 lines long. Changes require redeploying the OpenSearch index template. Testing requires a running OpenSearch instance. The scoring logic is untestable in unit tests.
The Internals
OpenSearch is a candidate retrieval system, not a complete ranking system. Its strength is efficiently finding the top 100-1000 relevant documents from millions. The final ranking (which of those candidates to show first) often depends on signals that OpenSearch does not have: user preferences, session context, A/B test variants, and business rules.
The two-phase architecture:
- Phase 1: Candidate Retrieval (OpenSearch). BM25 + field boosting retrieves the top N candidates (N = 100-500). This phase uses the inverted index efficiently.
- Phase 2: Reranking (Application Layer). The application applies business-specific scoring to the N candidates. This phase runs in application memory with access to user context, feature stores, and business rules.
The Implementation
Two-Phase Search Service
import java.io.IOException;
import java.util.List;

import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch._types.query_dsl.TextQueryType;

// Candidate and RankedResult are the records declared in ApplicationRanker.
public class TwoPhaseSearchService {
    private final OpenSearchClient client;
    private final ApplicationRanker ranker;
    private static final int CANDIDATE_POOL_SIZE = 200;

    public TwoPhaseSearchService(OpenSearchClient client,
                                 ApplicationRanker ranker) {
        this.client = client;
        this.ranker = ranker;
    }

    public List<RankedResult> search(SearchContext context) throws IOException {
        // Phase 1: Retrieve candidates from OpenSearch
        // Keep the query simple: BM25 relevance + filters only
        var response = client.search(s -> s
            .index("docs-" + context.tenantId())
            .query(q -> q.bool(b -> b
                .must(mu -> mu.multiMatch(mm -> mm
                    .query(context.query())
                    .fields("title^3", "body", "code_snippets")
                    .type(TextQueryType.CrossFields)
                ))
                .filter(f -> f.term(t -> t
                    .field("status").value("published")))
            ))
            // Fetch only the fields the reranker and result page need
            .source(src -> src.filter(f -> f.includes(
                "title", "slug", "content_type", "version",
                "published_date", "view_count", "tags")))
            .size(CANDIDATE_POOL_SIZE),
            DocPage.class
        );

        List<Candidate> candidates = response.hits().hits().stream()
            .map(hit -> new Candidate(
                hit.source(),
                hit.score() // BM25 score from Phase 1
            ))
            .toList();

        // Phase 2: Application-layer reranking
        return ranker.rerank(candidates, context);
    }
}
Application-Layer Ranker
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ApplicationRanker {
    private final UserPreferenceService userPreferences;

    public ApplicationRanker(UserPreferenceService userPreferences) {
        this.userPreferences = userPreferences;
    }

    public record Candidate(DocPage doc, double bm25Score) {}

    public record RankedResult(DocPage doc, double finalScore,
                               Map<String, Double> scoreBreakdown) {}

    public List<RankedResult> rerank(List<Candidate> candidates,
                                     SearchContext context) {
        UserPreferences prefs = userPreferences.get(context.userId());
        return candidates.stream()
            .map(candidate -> score(candidate, context, prefs))
            .sorted(Comparator.comparingDouble(RankedResult::finalScore)
                .reversed())
            .limit(context.pageSize())
            .toList();
    }

    private RankedResult score(Candidate candidate, SearchContext context,
                               UserPreferences prefs) {
        Map<String, Double> breakdown = new LinkedHashMap<>();
        DocPage doc = candidate.doc();

        // Signal 1: BM25 text relevance (normalized to 0-1)
        double textScore = Math.min(candidate.bm25Score() / 20.0, 1.0);
        breakdown.put("text_relevance", textScore);

        // Signal 2: Content type preference
        double typeScore = prefs.contentTypeWeight(doc.contentType());
        breakdown.put("content_type_pref", typeScore);

        // Signal 3: Version match (boost if user's preferred version)
        double versionScore = doc.version().equals(prefs.preferredVersion())
            ? 1.0 : 0.5;
        breakdown.put("version_match", versionScore);

        // Signal 4: Recency (0-1, 1.0 for today, 0.5 for 90 days ago)
        long daysOld = ChronoUnit.DAYS.between(
            doc.publishedDate(), Instant.now());
        double recencyScore = Math.max(0, 1.0 - (daysOld / 180.0));
        breakdown.put("recency", recencyScore);

        // Signal 5: Popularity (log-normalized view count, capped at 1.0).
        // Clamp before use so the breakdown and the final score agree.
        double popularityScore = Math.min(
            Math.log1p(doc.viewCount()) / Math.log1p(10000), 1.0);
        breakdown.put("popularity", popularityScore);

        // Weighted combination (weights sum to 1.0)
        double finalScore =
            textScore * 0.40 +
            typeScore * 0.15 +
            versionScore * 0.20 +
            recencyScore * 0.10 +
            popularityScore * 0.15;

        return new RankedResult(doc, finalScore, breakdown);
    }
}
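The divisor of 20.0 in Signal 1 is a fixed guess at the top of the BM25 range. BM25 scores are unbounded and their scale shifts with the query and the corpus, so a fixed divisor can saturate at 1.0 for some queries and stay near zero for others. One alternative, sketched below, is to normalize each score against the best score in the candidate pool. The class and method names (`PoolNormalizer`, `normalizeByMax`) are hypothetical and not part of the service above.

```java
import java.util.List;

final class PoolNormalizer {
    private PoolNormalizer() {}

    /**
     * Scales each raw score by the maximum score in the pool, so the best
     * candidate gets 1.0. Returns all zeros when the pool is empty or the
     * maximum is not positive. Negative inputs are treated as 0.
     */
    static double[] normalizeByMax(List<Double> rawScores) {
        double max = 0.0;
        for (double s : rawScores) {
            max = Math.max(max, s);
        }
        double[] normalized = new double[rawScores.size()];
        if (max <= 0.0) {
            return normalized; // array is already filled with zeros
        }
        for (int i = 0; i < normalized.length; i++) {
            normalized[i] = Math.max(0.0, rawScores.get(i)) / max;
        }
        return normalized;
    }
}
```

The trade-off is that pool-relative scores are no longer comparable across queries: the best hit of a weak query still receives 1.0. Whether that is acceptable depends on how much weight the text signal should carry against the other signals.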
The score breakdown in RankedResult is critical for debugging. When a user asks “why is this irrelevant document ranked first?”, the breakdown shows which signal promoted it: high popularity? version match? recency?
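The breakdown stores each signal before its weight is applied, so the largest value in the map is not necessarily the signal that contributed most to the final score. A minimal sketch of a helper that applies the weights and names the dominant signal follows. `BreakdownExplainer` and its methods are hypothetical, and the weights map is assumed to use the same keys as the breakdown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BreakdownExplainer {
    private BreakdownExplainer() {}

    /** Multiplies each signal value by its weight; a missing weight counts as 0. */
    static Map<String, Double> contributions(Map<String, Double> breakdown,
                                             Map<String, Double> weights) {
        Map<String, Double> result = new LinkedHashMap<>();
        breakdown.forEach((signal, value) ->
            result.put(signal, value * weights.getOrDefault(signal, 0.0)));
        return result;
    }

    /** Returns the signal with the largest weighted contribution, or null if empty. */
    static String dominantSignal(Map<String, Double> breakdown,
                                 Map<String, Double> weights) {
        return contributions(breakdown, weights).entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey)
            .orElse(null);
    }
}
```

For example, with a text relevance of 0.3 (weight 0.40), a version match of 1.0 (weight 0.20), and a popularity of 0.9 (weight 0.15), the contributions are 0.12, 0.20, and 0.135, so the version match is what pushed the document up.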
Unit Testing the Ranker
// HARDENED: Application-layer ranking is fully unit-testable
// No OpenSearch instance required
@Test
void preferredVersionBoostedAboveOtherVersions() {
    var prefs = new UserPreferences("v3.0",
        Map.of("guide", 1.0, "api-ref", 0.8));
    // Stub the preference lookup so the ranker actually sees prefs.
    // Assumes UserPreferenceService is an interface with a single get(userId) method.
    var ranker = new ApplicationRanker(userId -> prefs);

    // Same BM25 score and view count: only the version differs
    var v3Doc = new Candidate(docWithVersion("v3.0", 100), 10.0);
    var v2Doc = new Candidate(docWithVersion("v2.0", 100), 10.0);
    var context = new SearchContext("tenant-1", "user-1",
        "authentication", 10);

    var results = ranker.rerank(List.of(v3Doc, v2Doc), context);

    assertEquals("v3.0", results.get(0).doc().version(),
        "v3.0 doc should rank higher due to version preference");
}
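One detail affects how deterministic such tests are: the recency signal calls Instant.now() inside the scoring method, so a test that asserts an exact recency value depends on the day it runs. A common way to handle this in Java is to inject a java.time.Clock. The sketch below assumes publishedDate is an Instant, as the ranker's call to Instant.now() suggests. `RecencyScorer` is a hypothetical class, and clamping future dates to 1.0 is an added choice rather than the ranker's current behavior.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class RecencyScorer {
    private final Clock clock;
    private final double windowDays;

    RecencyScorer(Clock clock, double windowDays) {
        this.clock = clock;
        this.windowDays = windowDays;
    }

    /** Linear decay from 1.0 (published now) to 0.0 (windowDays old or older). */
    double score(Instant publishedDate) {
        long daysOld = ChronoUnit.DAYS.between(publishedDate, clock.instant());
        double raw = 1.0 - (daysOld / windowDays);
        // Clamp to [0, 1]; a future publish date would otherwise exceed 1.0
        return Math.max(0.0, Math.min(1.0, raw));
    }
}
```

In production the scorer would receive Clock.systemUTC(). In a test, Clock.fixed(...) pins "now", so a document published 90 days before the fixed instant scores 0.5 with a 180-day window on every run.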
The Measurement
Query complexity and testability comparison:
| Aspect | All-in-OpenSearch | Two-Phase |
|---|---|---|
| Query DSL complexity | 200 lines | 15 lines |
| Scoring signals | Limited to indexed fields | Any data source |
| Unit testable | No (requires cluster) | Yes |
| Deploy scoring changes | Index template update | Application deploy |
| Personalization | Painful (script per user) | Native (user context) |
| Latency (p50) | 35ms (1 phase) | 28ms (retrieval) + 2ms (rerank) = 30ms |
The Decision Rule
Move ranking logic from OpenSearch to the application layer when the scoring depends on user context (preferences, history, session), A/B test variants, or signals from external services (feature store, recommendation engine).
Keep the OpenSearch query simple: BM25 text matching plus filters. The inverted index is optimized for text matching. Business-logic scoring in Painless scripts works against this optimization, because the script runs for every matching document.
Retrieve 10-20x more candidates than the final result page size. A candidate pool of 200 for a page size of 10 provides sufficient diversity for the reranker to reorder without missing relevant documents.
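A fixed pool of 200 matches the 20x guideline only when the page size is 10. As a minimal sketch, the pool size can be derived from the page size instead. `CandidatePool`, the multiplier parameter, and the upper cap are hypothetical; the cap is there so that a large page size does not turn Phase 1 into an expensive deep fetch.

```java
final class CandidatePool {
    private CandidatePool() {}

    /**
     * Returns pageSize * multiplier, capped at maxPool.
     *
     * @throws IllegalArgumentException if any argument is not positive
     */
    static int size(int pageSize, int multiplier, int maxPool) {
        if (pageSize <= 0 || multiplier <= 0 || maxPool <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        long target = (long) pageSize * multiplier; // long avoids int overflow
        return (int) Math.min(target, maxPool);
    }
}
```

With a multiplier of 20, a page size of 10 gives the same pool of 200 used above, while a page size of 50 with a cap of 500 is limited to 500 candidates rather than 1000.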