Combining OpenSearch with Application-Layer Ranking

The Symptom

The documentation platform’s ranking logic has grown complex. The function_score query now includes recency decay, content type boosting, view count popularity, tenant-specific field weights, A/B test variant scoring, and a Painless script that checks whether the document is bookmarked by the current user. The query DSL is 200 lines long. Changes require redeploying the OpenSearch index template. Testing requires a running OpenSearch instance, so the scoring logic cannot be covered by unit tests.

The Internals

OpenSearch is a candidate retrieval system, not a complete ranking system. Its strength is efficiently finding the top 100-1000 relevant documents from millions. The final ranking, meaning which of those candidates to show first, often depends on signals that OpenSearch does not have: user preferences, session context, A/B test variants, and business rules.

The two-phase architecture:

Phase 1: Candidate Retrieval (OpenSearch). BM25 + field boosting retrieves the top N candidates (N = 100-500). This phase uses the inverted index efficiently.
Phase 2: Reranking (Application Layer). The application applies business-specific scoring to the N candidates. This phase runs in application memory with access to user context, feature stores, and business rules.

The Implementation

Two-Phase Search Service

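import java.io.IOException;
import java.util.List;

import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch._types.FieldValue;
import org.opensearch.client.opensearch._types.query_dsl.TextQueryType;

public class TwoPhaseSearchService {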

    private final OpenSearchClient client;
    private final ApplicationRanker ranker;
    private static final int CANDIDATE_POOL_SIZE = 200;

    public TwoPhaseSearchService(OpenSearchClient client,
            ApplicationRanker ranker) {
        this.client = client;
        this.ranker = ranker;
    }

    public List<ApplicationRanker.RankedResult> search(SearchContext context) throws IOException {
        // Phase 1: Retrieve candidates from OpenSearch
        // Keep the query simple: BM25 relevance + filters only
        var response = client.search(s -> s
            .index("docs-" + context.tenantId())
            .query(q -> q.bool(b -> b
                .must(mu -> mu.multiMatch(mm -> mm
                    .query(context.query())
                    .fields("title^3", "body", "code_snippets")
                    .type(TextQueryType.CrossFields)
                ))
                .filter(f -> f.term(t -> t
                    .field("status").value(FieldValue.of("published"))))
            ))
            .source(src -> src.filter(f -> f.includes(
                "title", "slug", "content_type", "version",
                "published_date", "view_count", "tags")))
            .size(CANDIDATE_POOL_SIZE),
            DocPage.class
        );

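        // Candidate is the record nested in ApplicationRanker
        List<ApplicationRanker.Candidate> candidates = response.hits().hits().stream()
            .map(hit -> new ApplicationRanker.Candidate(
                hit.source(),
                // BM25 score from Phase 1; the client returns a nullable Double
                hit.score() != null ? hit.score() : 0.0
            ))
            .toList();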

        // Phase 2: Application-layer reranking
        return ranker.rerank(candidates, context);
    }
}

Application-Layer Ranker

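import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ApplicationRanker {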

    private final UserPreferenceService userPreferences;

    public ApplicationRanker(UserPreferenceService userPreferences) {
        this.userPreferences = userPreferences;
    }

    public record Candidate(DocPage doc, double bm25Score) {}

    public record RankedResult(DocPage doc, double finalScore,
            Map<String, Double> scoreBreakdown) {}

    public List<RankedResult> rerank(List<Candidate> candidates,
            SearchContext context) {

        UserPreferences prefs = userPreferences.get(context.userId());

        return candidates.stream()
            .map(candidate -> score(candidate, context, prefs))
            .sorted(Comparator.comparingDouble(RankedResult::finalScore)
                .reversed())
            .limit(context.pageSize())
            .toList();
    }

    private RankedResult score(Candidate candidate, SearchContext context,
            UserPreferences prefs) {

        Map<String, Double> breakdown = new LinkedHashMap<>();
        DocPage doc = candidate.doc();

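        // Signal 1: BM25 text relevance, scaled to 0-1.
        // BM25 scores are unbounded; 20.0 is a heuristic ceiling that
        // should be tuned against the scores observed in this index.
        double textScore = Math.min(candidate.bm25Score() / 20.0, 1.0);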
        breakdown.put("text_relevance", textScore);

        // Signal 2: Content type preference
        double typeScore = prefs.contentTypeWeight(doc.contentType());
        breakdown.put("content_type_pref", typeScore);

        // Signal 3: Version match (boost if user's preferred version)
        double versionScore = doc.version().equals(prefs.preferredVersion())
            ? 1.0 : 0.5;
        breakdown.put("version_match", versionScore);

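        // Signal 4: Recency (linear decay: 1.0 today, 0.5 at 90 days, 0 at 180+ days)
        // Assumes publishedDate() returns an Instant; clamped so that a
        // future-dated document cannot score above 1.0
        long daysOld = ChronoUnit.DAYS.between(
            doc.publishedDate(), Instant.now());
        double recencyScore = Math.max(0.0,
            Math.min(1.0, 1.0 - (daysOld / 180.0)));
        breakdown.put("recency", recencyScore);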

        // Signal 5: Popularity (log-normalized view count, capped at 1.0)
        // Cap before use so the breakdown and finalScore agree
        double popularityScore = Math.min(
            Math.log1p(doc.viewCount()) / Math.log1p(10000), 1.0);
        breakdown.put("popularity", popularityScore);

        // Weighted combination
        double finalScore =
            textScore * 0.40 +
            typeScore * 0.15 +
            versionScore * 0.20 +
            recencyScore * 0.10 +
            popularityScore * 0.15;

        return new RankedResult(doc, finalScore, breakdown);
    }
}
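
The fixed divisor of 20.0 in Signal 1 is a heuristic, because BM25 scores are unbounded and shift with query length and index statistics. One alternative is to rescale scores relative to the candidate pool itself. The following is a minimal sketch of min-max normalization under that assumption; ScoreNormalizer and minMax are hypothetical names, not part of the platform's code.

```java
import java.util.List;

/** Hypothetical helper: rescales raw BM25 scores to [0, 1] within one candidate pool. */
final class ScoreNormalizer {

    private ScoreNormalizer() {}

    /** Returns scores in the same order as the input, mapped so min -> 0.0 and max -> 1.0. */
    static List<Double> minMax(List<Double> rawScores) {
        if (rawScores.isEmpty()) {
            return List.of();
        }
        double min = rawScores.stream().mapToDouble(Double::doubleValue).min().getAsDouble();
        double max = rawScores.stream().mapToDouble(Double::doubleValue).max().getAsDouble();
        double range = max - min;
        if (range == 0.0) {
            // All candidates matched equally well; treat them as equally relevant.
            return rawScores.stream().map(s -> 1.0).toList();
        }
        return rawScores.stream().map(s -> (s - min) / range).toList();
    }
}
```

The trade-off is that pool-relative scores are not comparable across queries: the weakest candidate always scores 0.0, even when it is a reasonable match. A fixed divisor keeps scores comparable but needs tuning per index.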

The score breakdown in RankedResult is critical for debugging. When a user reports “why is this irrelevant document ranked first?”, the breakdown shows exactly which signal promoted it: high popularity? version match? recency?
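
To turn a breakdown into an answer, multiply each signal by its weight and pick the largest contribution. The sketch below assumes the signal names and weights used in ApplicationRanker.score(); ScoreExplainer and dominantSignal are hypothetical names introduced for illustration.

```java
import java.util.Map;

/** Hypothetical helper that names the signal contributing most to a final score. */
final class ScoreExplainer {

    // Weights copied from the weighted combination in ApplicationRanker.score().
    static final Map<String, Double> WEIGHTS = Map.of(
        "text_relevance", 0.40,
        "content_type_pref", 0.15,
        "version_match", 0.20,
        "recency", 0.10,
        "popularity", 0.15);

    private ScoreExplainer() {}

    /** Returns the signal with the largest weighted contribution, or null for an empty breakdown. */
    static String dominantSignal(Map<String, Double> breakdown) {
        String dominant = null;
        double largest = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : breakdown.entrySet()) {
            // Signals without a known weight contribute nothing.
            double weight = WEIGHTS.getOrDefault(entry.getKey(), 0.0);
            double contribution = entry.getValue() * weight;
            if (contribution > largest) {
                largest = contribution;
                dominant = entry.getKey();
            }
        }
        return dominant;
    }
}
```

For a document with weak text relevance (0.2) but a version match (1.0), the version signal contributes 0.20 against 0.08 for text, which points at the version weight as the thing to review.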

Unit Testing the Ranker

// HARDENED: Application-layer ranking is fully unit-testable
// No OpenSearch instance required

@Test
void preferredVersionBoostedAboveOtherVersions() {
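    var prefs = new UserPreferences("v3.0",
        Map.of("guide", 1.0, "api-ref", 0.8));

    // Stub the preference lookup so the ranker actually sees these prefs.
    // Assumes UserPreferenceService is a single-method interface;
    // otherwise substitute a mock or a fake here.
    var ranker = new ApplicationRanker(userId -> prefs);

    var v3Doc = new Candidate(docWithVersion("v3.0", 100), 10.0);
    var v2Doc = new Candidate(docWithVersion("v2.0", 100), 10.0);

    var context = new SearchContext("tenant-1", "user-1",
        "authentication", 10);

    var results = ranker.rerank(List.of(v3Doc, v2Doc), context);

    assertEquals("v3.0", results.get(0).doc().version(),
        "v3.0 doc should rank higher due to version preference");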
}

The Measurement

Query complexity and testability comparison:

Aspect	All-in-OpenSearch	Two-Phase
Query DSL complexity	200 lines	15 lines
Scoring signals	Limited to indexed fields	Any data source
Unit testable	No (requires cluster)	Yes
Deploy scoring changes	Index template update	Application deploy
Personalization	Painful (script per user)	Native (user context)
Latency (p50)	35 ms (1 phase)	28 ms (retrieval) + 2 ms (rerank) = 30 ms

The Decision Rule

Move ranking logic from OpenSearch to the application layer when the scoring depends on user context (preferences, history, session), A/B test variants, or signals from external services (feature store, recommendation engine).

Keep the OpenSearch query simple: BM25 text matching plus filters. The inverted index is optimized for text matching. Painless scripts run once per matching document at query time, so business-logic scoring in scripts works against this optimization.

Retrieve 10-20x more candidates than the final result page size. A candidate pool of 200 for a page size of 10 provides sufficient diversity for the reranker to reorder without missing relevant documents.
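
The sizing rule reduces to a small calculation. The sketch below is one way to express it; CandidatePool, size, multiplier, and maxPool are hypothetical names, and the cap of 500 used in the usage note is taken from the upper end of the N = 100-500 range given for Phase 1.

```java
/** Hypothetical helper: derives the Phase 1 candidate pool size from the page size. */
final class CandidatePool {

    private CandidatePool() {}

    /**
     * Returns pageSize * multiplier, capped at maxPool so that large pages
     * do not push retrieval beyond what Phase 1 is expected to handle.
     */
    static int size(int pageSize, int multiplier, int maxPool) {
        if (pageSize <= 0 || multiplier <= 0 || maxPool <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Use long arithmetic so an extreme page size cannot overflow int.
        long desired = (long) pageSize * multiplier;
        return (int) Math.min(desired, maxPool);
    }
}
```

With these assumptions, CandidatePool.size(10, 20, 500) gives the pool of 200 described above, while a page size of 50 is capped at 500 rather than growing to 1000.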