Function Score Patterns for Documentation Search

The Symptom

The documentation platform returns a three-year-old tutorial above a recently published guide that covers the same topic with current API versions. The BM25 scores are nearly identical because both documents use similar terminology; the older document has slightly higher term frequency for the relevant keywords, which gives it a marginal BM25 advantage. The user sees outdated information first.

The Internals

BM25 scores documents based on text relevance alone. It does not consider when the document was written, how many users found it useful, or whether it is a reference guide or a changelog entry. Function score modifies the BM25 score using document metadata, injecting business signals into the ranking.

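Function score operates in three steps:

Execute the wrapped query and compute the BM25 score
Evaluate each scoring function whose filter matches the document (a function without a filter matches every document) and multiply its score by the function's weight
Combine the weighted function scores into one modifier (using score_mode), then combine that modifier with the BM25 score (using boost_mode)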

score_mode determines how multiple functions combine: multiply (the default), sum, avg (a weighted average), first, max, min.

boost_mode determines how the combined function score interacts with the query score: multiply (the default), replace, sum, avg, max, min.

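A common pattern is score_mode: sum with boost_mode: multiply. The weighted scores of the matching functions add up to a single modifier, which multiplies the BM25 score. A modifier of 1.0 leaves the score unchanged, a modifier above 1.0 boosts the document, and a modifier below 1.0 demotes it. Because the functions add, an individual function with a weight below 1.0 does not demote a document on its own; it only contributes less to the sum than the other functions. If no function matches a document, Elasticsearch uses a modifier of 1.0.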
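To make the combination rules concrete, here is a small plain-Java model of steps 2 and 3 (Java 17). It is an illustrative sketch, not Elasticsearch source code: the class, record, and method names are hypothetical, weights are assumed positive, and the max_boost cap is ignored.

```java
import java.util.List;

// Illustrative model of function_score combination; not Elasticsearch code.
// Assumes positive weights and ignores max_boost.
final class FunctionScoreModel {

    enum ScoreMode { MULTIPLY, SUM, AVG, FIRST, MAX, MIN }

    enum BoostMode { MULTIPLY, REPLACE, SUM, AVG, MAX, MIN }

    // One function evaluated against one document.
    record FunctionResult(boolean matches, double weight, double rawScore) {
        double weighted() { return weight * rawScore; }
    }

    // Step 3a: combine the matching functions' weighted scores per score_mode.
    static double combineFunctions(List<FunctionResult> results, ScoreMode mode) {
        List<FunctionResult> hits = results.stream().filter(FunctionResult::matches).toList();
        if (hits.isEmpty()) {
            return 1.0; // no function matched: neutral modifier
        }
        double sum = hits.stream().mapToDouble(FunctionResult::weighted).sum();
        return switch (mode) {
            case MULTIPLY -> hits.stream().mapToDouble(FunctionResult::weighted).reduce(1.0, (a, b) -> a * b);
            case SUM -> sum;
            // avg is a weighted average: divide by the sum of the weights.
            case AVG -> sum / hits.stream().mapToDouble(FunctionResult::weight).sum();
            case FIRST -> hits.get(0).weighted();
            case MAX -> hits.stream().mapToDouble(FunctionResult::weighted).max().getAsDouble();
            case MIN -> hits.stream().mapToDouble(FunctionResult::weighted).min().getAsDouble();
        };
    }

    // Step 3b: combine the modifier with the query (BM25) score per boost_mode.
    static double applyBoost(double queryScore, double modifier, BoostMode mode) {
        return switch (mode) {
            case MULTIPLY -> queryScore * modifier;
            case REPLACE -> modifier;
            case SUM -> queryScore + modifier;
            case AVG -> (queryScore + modifier) / 2.0;
            case MAX -> Math.max(queryScore, modifier);
            case MIN -> Math.min(queryScore, modifier);
        };
    }
}
```

With two matching functions whose weighted scores are 2.0 and 1.5, score_mode sum yields a modifier of 3.5, and boost_mode multiply turns a BM25 score of 4.0 into 14.0.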

The Implementation

Recency Boost with Decay Function

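import co.elastic.clients.elasticsearch._types.query_dsl.FunctionScore;
import co.elastic.clients.json.JsonData;

// HARDENED: Exponential decay on document age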
// Documents updated within 30 days get full score.
// Score decays by 50% every 90 days beyond the 30-day offset.

FunctionScore recencyBoost = FunctionScore.of(fn -> fn
    .filter(f -> f.exists(e -> e.field("updated_at")))
    .exp(d -> d
        .field("updated_at")
        .placement(p -> p
            .origin(JsonData.of("now"))
            .offset(JsonData.of("30d"))
            .scale(JsonData.of("90d"))
            .decay(0.5)
        )
    )
    .weight(2.0)
);

The offset parameter creates a flat zone: every document updated within the last 30 days receives the full weight of 2.0. Beyond 30 days, the exponential decay halves the score every 90 days (1.0 at 120 days, 0.5 at 210 days). Without the flat zone, a document updated today would outrank an equally relevant one updated yesterday purely because of a one-day recency difference.
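The decay curve can be checked without a cluster. The minimal sketch below assumes the exp formula from the Elasticsearch decay-function documentation, $e^{\lambda \cdot \max(0,\, |v - \text{origin}| - \text{offset})}$ with $\lambda = \ln(\text{decay}) / \text{scale}$, measures age in days, and multiplies by the function weight. The class name is illustrative.

```java
// Sketch of the exp decay formula applied to document age in days.
// Not client code: it only reproduces the arithmetic behind recencyBoost.
final class ExpDecay {

    // Returns weight * exp(lambda * max(0, |ageDays| - offsetDays)),
    // where lambda = ln(decay) / scaleDays. The result equals weight * decay
    // once ageDays reaches offsetDays + scaleDays.
    static double score(double ageDays, double offsetDays, double scaleDays, double decay, double weight) {
        double lambda = Math.log(decay) / scaleDays;
        double distance = Math.max(0.0, Math.abs(ageDays) - offsetDays);
        return weight * Math.exp(lambda * distance);
    }
}
```

With the article's settings, score(10, 30, 90, 0.5, 2.0) returns 2.0, ages of 120 and 210 days give about 1.0 and 0.5, and a three-year-old page drops well below 0.001.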

Content Type Priority

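import co.elastic.clients.elasticsearch._types.FieldValue;
import co.elastic.clients.elasticsearch._types.query_dsl.FunctionScore;
import java.util.List;

// HARDENED: Weight by content type relevance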
// API references and guides score higher than changelogs

FunctionScore contentTypePriority = FunctionScore.of(fn -> fn
    .filter(f -> f.terms(t -> t
        .field("content_type")
        .terms(tv -> tv.value(List.of(
            FieldValue.of("api_reference"),
            FieldValue.of("guide")
        )))
    ))
    .weight(1.5)
);

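// Under score_mode sum this weight is added to the modifier: changelogs
// rank below api_reference/guide pages (+1.5) but still gain 0.3 over
// pages that match neither content-type filter (+0).
FunctionScore changelogDemotion = FunctionScore.of(fn -> fn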
    .filter(f -> f.term(t -> t
        .field("content_type")
        .value("changelog")
    ))
    .weight(0.3)
);

Popularity Signal with Diminishing Returns

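import co.elastic.clients.elasticsearch._types.query_dsl.FieldValueFactorModifier;
import co.elastic.clients.elasticsearch._types.query_dsl.FunctionScore;

// HARDENED: View count boost with logarithmic scaling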
// Prevents viral pages from permanently occupying top results

FunctionScore popularityBoost = FunctionScore.of(fn -> fn
    .fieldValueFactor(fvf -> fvf
        .field("view_count")
        .modifier(FieldValueFactorModifier.Log1p)
        .factor(0.1)
        .missing(1.0)
    )
    .weight(0.5)
);

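The Log1p modifier applies $\log_{10}(1 + \text{factor} \times \text{field\_value})$: Elasticsearch's log1p uses the common (base-10) logarithm, while ln1p is the natural-log variant. With factor: 0.1:

0 views: $\log_{10}(1 + 0) = 0$
100 views: $\log_{10}(1 + 10) \approx 1.04$
10,000 views: $\log_{10}(1 + 1000) \approx 3.00$

Going from 0 to 100 views lifts the raw score from 0 to about 1.04, while a further 100x increase, from 100 to 10,000 views, multiplies it by only about 2.9. After the 0.5 weight, even a 10,000-view page adds only about 1.5 to the summed modifier, which prevents a single popular page from dominating all queries.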
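The same arithmetic in plain Java, following the field_value_factor steps in order: substitute missing, multiply by factor, apply the log1p modifier, then apply the function weight. The class and method names are illustrative; only the constants come from popularityBoost.

```java
// Illustrative re-computation of popularityBoost's contribution (not client code).
final class PopularityScore {

    static final double FACTOR = 0.1;  // .factor(0.1)
    static final double MISSING = 1.0; // .missing(1.0): used when view_count is absent
    static final double WEIGHT = 0.5;  // .weight(0.5)

    // viewCount may be null to represent a document without a view_count value.
    static double score(Double viewCount) {
        double value = (viewCount == null) ? MISSING : viewCount;
        // log1p in Elasticsearch: common logarithm of (1 + factor * value)
        double modified = Math.log10(1.0 + FACTOR * value);
        return WEIGHT * modified;
    }
}
```

score(0.0) is exactly 0, score(100.0) is about 0.52, and score(10_000.0) is about 1.5.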

Complete Function Score Query

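import co.elastic.clients.elasticsearch._types.query_dsl.FunctionBoostMode;
import co.elastic.clients.elasticsearch._types.query_dsl.FunctionScoreMode;
import co.elastic.clients.elasticsearch._types.query_dsl.Query;
import co.elastic.clients.elasticsearch._types.query_dsl.TextQueryType;

// HARDENED: Production query combining all scoring signals
// tenantId, version, and userQuery are request-scoped Strings supplied by the caller;
// recencyBoost, contentTypePriority, changelogDemotion, and popularityBoost are defined above.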

Query productionQuery = Query.of(q -> q
    .functionScore(fs -> fs
        .query(fq -> fq
            .bool(b -> b
                .filter(f -> f.term(t -> t.field("tenant_id").value(tenantId)))
                .filter(f -> f.term(t -> t.field("version").value(version)))
                .must(mu -> mu.multiMatch(mm -> mm
                    .query(userQuery)
                    .fields("title^3", "body", "code_snippets^0.5", "api_method^5")
                    .type(TextQueryType.CrossFields)
                ))
            )
        )
        .functions(recencyBoost, contentTypePriority, changelogDemotion, popularityBoost)
        .scoreMode(FunctionScoreMode.Sum)
        .boostMode(FunctionBoostMode.Multiply)
    )
);
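Before running the query against a cluster, it can help to estimate the modifier that productionQuery's four functions would assign to specific documents. The hypothetical helper below re-implements each function with the constants used above and sums them, as score_mode: sum does. Because popularityBoost has no filter, at least one function always matches, so the "no match means 1.0" rule never applies here. Ages are in days; the document ages, view counts, and the content type "tutorial" in the usage note are assumed example values.

```java
// Hypothetical offline estimate of the summed modifier from productionQuery's
// four functions; constants mirror recencyBoost, contentTypePriority,
// changelogDemotion, and popularityBoost.
final class ModifierEstimate {

    // recencyBoost: exp decay, offset 30d, scale 90d, decay 0.5, weight 2.0.
    // ageDays == null models a document without updated_at (filter not matched).
    static double recency(Double ageDays) {
        if (ageDays == null) {
            return 0.0;
        }
        double lambda = Math.log(0.5) / 90.0;
        return 2.0 * Math.exp(lambda * Math.max(0.0, ageDays - 30.0));
    }

    // contentTypePriority (+1.5) and changelogDemotion (+0.3); other types match neither.
    static double contentType(String type) {
        if ("api_reference".equals(type) || "guide".equals(type)) {
            return 1.5;
        }
        return "changelog".equals(type) ? 0.3 : 0.0;
    }

    // popularityBoost: log1p (base 10), factor 0.1, missing 1.0, weight 0.5.
    static double popularity(Double views) {
        double value = (views == null) ? 1.0 : views;
        return 0.5 * Math.log10(1.0 + 0.1 * value);
    }

    // score_mode sum: functions whose filter does not match add nothing.
    static double modifier(Double ageDays, String type, Double views) {
        return recency(ageDays) + contentType(type) + popularity(views);
    }
}
```

For the scenario in The Symptom, a three-year-old "tutorial" with 5,000 views gets a modifier of about 1.35, while a guide published 10 days ago with 200 views gets about 4.16, roughly three times larger and enough to outweigh a marginal BM25 edge under boost_mode multiply. The same helper also shows that an old changelog ends up 0.3 above an otherwise identical page of an unboosted type.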

The Measurement

Run the query test set before and after enabling function score:

Category	NDCG@5 (BM25 only)	NDCG@5 (with function score)	Change
Method name	0.82	0.84	+0.02
Concept	0.71	0.78	+0.07
Error message	0.68	0.69	+0.01
Config key	0.79	0.80	+0.01
How-to	0.65	0.73	+0.08
Overall	0.73	0.77	+0.04

The largest improvements are in concept and how-to queries, where recency and content type signals disambiguate between multiple BM25-similar results. Method name and config key queries are already precise from BM25 alone and show minimal change.
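The computation behind the NDCG@5 column is not shown above. A common definition, sketched here, divides the discounted cumulative gain of the returned top five by that of an ideal ordering of the judged documents. The graded judgments, the exponential gain $2^{rel} - 1$, and the class name are assumptions; some evaluation tools use linear gain instead, which changes the absolute values.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch of NDCG@k for a single query, assuming graded relevance judgments
// (0 = irrelevant, higher = better) and exponential gain 2^rel - 1.
final class Ndcg {

    // DCG over the first k grades: sum of (2^rel - 1) / log2(position + 1).
    static double dcg(double[] grades, int k) {
        double dcg = 0.0;
        for (int i = 0; i < Math.min(k, grades.length); i++) {
            dcg += (Math.pow(2.0, grades[i]) - 1.0) / (Math.log(i + 2.0) / Math.log(2.0));
        }
        return dcg;
    }

    // rankedDocIds: result order from the engine; judgments: docId -> grade.
    // Unjudged documents count as grade 0.
    static double ndcgAtK(List<String> rankedDocIds, Map<String, Integer> judgments, int k) {
        double[] actual = rankedDocIds.stream()
                .mapToDouble(id -> judgments.getOrDefault(id, 0))
                .toArray();
        double[] ideal = judgments.values().stream()
                .sorted(Comparator.reverseOrder())
                .mapToDouble(Integer::doubleValue)
                .toArray();
        double idcg = dcg(ideal, k);
        return idcg == 0.0 ? 0.0 : dcg(actual, k) / idcg;
    }
}
```

Averaging ndcgAtK(results, judgments, 5) over the queries in each category yields per-category figures comparable in kind to the rows of the table.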

The Decision Rule

Apply function score only when BM25 alone produces ambiguous rankings (several top results with near-identical scores, as in the symptom above) for a measurable share of queries. If BM25 already produces clear winners for 90% or more of the query test set, function score adds complexity without measurable benefit.
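One way to make "ambiguous" measurable is to count queries whose top two BM25 scores are nearly tied. The sketch below does that; the relative-margin criterion, the class name, and the 5% threshold in the usage note are assumptions to calibrate against your own test set, not values from this article.

```java
import java.util.List;

// Hypothetical helper: estimates how often BM25 fails to produce a clear winner.
// A query counts as ambiguous when its top-two BM25 scores differ by less than
// relativeMargin (for example 0.05 for 5%).
final class AmbiguityCheck {

    static boolean isAmbiguous(double[] topScoresDescending, double relativeMargin) {
        if (topScoresDescending.length < 2 || topScoresDescending[0] <= 0.0) {
            return false; // zero or one hit: nothing to disambiguate
        }
        double gap = (topScoresDescending[0] - topScoresDescending[1]) / topScoresDescending[0];
        return gap < relativeMargin;
    }

    // Fraction of test-set queries whose BM25 ranking is ambiguous.
    static double ambiguousFraction(List<double[]> perQueryTopScores, double relativeMargin) {
        if (perQueryTopScores.isEmpty()) {
            return 0.0;
        }
        long ambiguous = perQueryTopScores.stream()
                .filter(scores -> isAmbiguous(scores, relativeMargin))
                .count();
        return (double) ambiguous / perQueryTopScores.size();
    }
}
```

If ambiguousFraction(scores, 0.05) stays at or below 0.10, BM25 already produces clear winners for at least 90% of the test set, and by the rule above function score is unlikely to pay for its complexity.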

Use decay functions for recency when the documentation platform serves actively maintained software with frequent updates. Use offset to create a flat zone that suppresses meaningless day-to-day score differences.

Always test function score changes against the query test set. A function score that improves concept queries but degrades method name queries is not an improvement; it is a trade-off that must be evaluated holistically.
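A simple guardrail for that rule is to compare NDCG@5 per category rather than only the overall row. The hypothetical helper below reports every category whose score drops by more than a chosen tolerance; the tolerance value is an assumption you would set from the run-to-run noise of your own test set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical guardrail: flags query categories whose NDCG@5 regresses.
final class RegressionCheck {

    // before/after: category -> NDCG@5. A category counts as regressed when it
    // drops by more than `tolerance`, or when it is missing from `after`.
    static List<String> regressions(Map<String, Double> before, Map<String, Double> after, double tolerance) {
        List<String> regressed = new ArrayList<>();
        for (Map.Entry<String, Double> entry : before.entrySet()) {
            Double candidate = after.get(entry.getKey());
            if (candidate == null || entry.getValue() - candidate > tolerance) {
                regressed.add(entry.getKey());
            }
        }
        Collections.sort(regressed); // deterministic order for reporting
        return regressed;
    }
}
```

An empty list means no category lost more than the tolerance; a non-empty list marks the change as a trade-off to review explicitly, even when the overall average improves.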