Nested vs Object Fields and the Query Cost of Each
The Symptom
The documentation platform stores metadata about each code example within a document: the language, the framework version it targets, and whether it has been verified. A search for “Java code examples targeting Spring Boot 3.2” returns documents in which the Java examples target Spring Boot 2.7 and the only examples targeting 3.2 are written in Python. Values from different array elements are cross-matched: the language is taken from one element and the framework version from another.
The Internals
OpenSearch stores JSON objects in two fundamentally different ways, and the choice determines whether array elements maintain their internal associations.
Object fields flatten nested JSON into dot-notation key-value pairs. Given:
{
  "code_examples": [
    { "language": "java", "framework_version": "3.2" },
    { "language": "python", "framework_version": "2.7" }
  ]
}
OpenSearch internally stores this as:
{
  "code_examples.language": ["java", "python"],
  "code_examples.framework_version": ["3.2", "2.7"]
}
The association between language: java and framework_version: 3.2 is lost. A query for documents where code_examples.language = java AND code_examples.framework_version = 2.7 matches this document, even though no single code example has that combination.
Nested fields store each array element as a hidden Lucene document, maintaining the association between fields within each element. The parent document and its nested documents are stored in the same Lucene block, and a nested query can match against individual array elements independently.
// FRAGILE: Object field for structured array data
// Cross-matching between array elements produces false positives.
.properties("code_examples", p -> p.object(o -> o
    .properties("language", pp -> pp.keyword(k -> k))
    .properties("framework_version", pp -> pp.keyword(k -> k))
    .properties("verified", pp -> pp.boolean_(b -> b))
))

// HARDENED: Nested field preserves per-element associations
// Each code example is queryable independently.
.properties("code_examples", p -> p.nested(n -> n
    .properties("language", pp -> pp.keyword(k -> k))
    .properties("framework_version", pp -> pp.keyword(k -> k))
    .properties("verified", pp -> pp.boolean_(b -> b))
))
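The difference between the two mappings can be illustrated without a cluster. The following is a minimal sketch in plain Java that simulates the matching semantics described above; it is not how Lucene is implemented. The class and method names (CrossMatchDemo, CodeExample, objectMatch, nestedMatch) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CrossMatchDemo {

    /** One element of the code_examples array (illustrative type). */
    public static final class CodeExample {
        final String language;
        final String frameworkVersion;

        public CodeExample(String language, String frameworkVersion) {
            this.language = language;
            this.frameworkVersion = frameworkVersion;
        }
    }

    /** Object-field semantics: each sub-field collapses into one multi-valued field. */
    public static Map<String, List<String>> flatten(List<CodeExample> examples) {
        Map<String, List<String>> flat = new LinkedHashMap<>();
        flat.put("code_examples.language", new ArrayList<>());
        flat.put("code_examples.framework_version", new ArrayList<>());
        for (CodeExample e : examples) {
            flat.get("code_examples.language").add(e.language);
            flat.get("code_examples.framework_version").add(e.frameworkVersion);
        }
        return flat;
    }

    /** Each condition is checked against the whole flattened field, so elements can mix. */
    public static boolean objectMatch(List<CodeExample> examples, String language, String version) {
        Map<String, List<String>> flat = flatten(examples);
        return flat.get("code_examples.language").contains(language)
            && flat.get("code_examples.framework_version").contains(version);
    }

    /** Nested semantics: both conditions must hold within the same array element. */
    public static boolean nestedMatch(List<CodeExample> examples, String language, String version) {
        for (CodeExample e : examples) {
            if (e.language.equals(language) && e.frameworkVersion.equals(version)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<CodeExample> doc = List.of(
            new CodeExample("java", "3.2"),
            new CodeExample("python", "2.7"));

        System.out.println(objectMatch(doc, "java", "2.7")); // true: false positive
        System.out.println(nestedMatch(doc, "java", "2.7")); // false: no single element matches
        System.out.println(nestedMatch(doc, "java", "3.2")); // true: first element matches
    }
}
```

The sample document is the one from the flattening example: the object-style check accepts java with 2.7 because each value exists somewhere in the array, while the per-element check rejects it.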
The Implementation
Querying nested fields requires the nested query wrapper:
// HARDENED: Nested query targeting a specific array element combination
// Assumes userQuery is a String in scope. Some opensearch-java versions expect
// FieldValue.of(...) for query(...) and value(...) instead of a raw String.
SearchRequest request = SearchRequest.of(s -> s
    .index("docs-v1")
    .query(q -> q
        .bool(b -> b
            .must(mu -> mu.match(m -> m.field("body").query(userQuery)))
            .filter(f -> f
                .nested(n -> n
                    .path("code_examples")
                    .query(nq -> nq
                        .bool(nb -> nb
                            .must(nm -> nm.term(t -> t
                                .field("code_examples.language").value("java")))
                            .must(nm -> nm.term(t -> t
                                .field("code_examples.framework_version").value("3.2")))
                        )
                    )
                )
            )
        )
    )
);
/*
Equivalent JSON:
{
  "query": {
    "bool": {
      "must": { "match": { "body": "user query" } },
      "filter": {
        "nested": {
          "path": "code_examples",
          "query": {
            "bool": {
              "must": [
                { "term": { "code_examples.language": "java" } },
                { "term": { "code_examples.framework_version": "3.2" } }
              ]
            }
          }
        }
      }
    }
  }
}
*/
The Measurement
The hidden cost of nested documents is in document count. Each nested object creates a hidden Lucene document. A documentation page with 15 code examples produces 16 Lucene documents (1 parent + 15 nested). An index with 100,000 documentation pages, each averaging 10 code examples, contains 1,100,000 Lucene documents, not 100,000.
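The arithmetic behind these figures is one parent document plus one hidden document per nested element. A small sketch, assuming a whole-number average of nested elements per page (the class and method names are hypothetical):

```java
public class NestedDocCount {

    /** Each parent contributes itself plus one hidden Lucene document per nested element. */
    public static long luceneDocCount(long parentDocs, long avgNestedPerParent) {
        return parentDocs * (1 + avgNestedPerParent);
    }

    public static void main(String[] args) {
        System.out.println(luceneDocCount(1, 15));       // 16
        System.out.println(luceneDocCount(100_000, 10)); // 1100000
    }
}
```

Running the same calculation with projected page counts and array sizes gives a quick estimate of how large the index will be in Lucene terms before the mapping is changed.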
| Metric | Object field | Nested field (avg. 10 elements) |
|---|---|---|
| Lucene document count | 100,000 | 1,100,000 |
| Segment size | ~5 GB | ~8 GB |
| Simple match query latency | 12 ms | 14 ms |
| Nested filter query latency | N/A | 22 ms |
| Heap per shard (field data) | 200 MB | 350 MB |
The nested query adds approximately 8 to 10 ms of latency over a simple match query (22 ms versus 12 to 14 ms in the table above) because it must join the nested document matches back to their parent documents. Segment size grows with the number of nested elements.
The Decision Rule
Use nested when array elements have multiple fields that must be queried in combination and false cross-matches would produce incorrect results. The code examples use case is a clear fit: users filter by language AND framework version, and cross-matches return wrong results.
Use object when array elements have a single field or when cross-matching is acceptable. A tags array of strings, for example, does not need nested because there is no internal structure to cross-match.
As a rule of thumb, avoid nested fields when the average number of nested elements per document exceeds 50. At that scale, 100,000 pages would produce at least 5,100,000 Lucene documents, and the hidden document count inflates segment sizes and query latency beyond what most applications can tolerate. Consider denormalizing into a keyword array with concatenated values (e.g., "java:3.2") as an alternative.
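The concatenated-value alternative can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and the ":" delimiter is assumed never to occur inside a language or version value.

```java
import java.util.ArrayList;
import java.util.List;

public class CompositeKeyDemo {

    /** Joins one element's fields into a single keyword value such as "java:3.2". */
    public static String compositeKey(String language, String frameworkVersion) {
        return language + ":" + frameworkVersion;
    }

    /** Builds the keyword array for a page from (language, version) pairs. */
    public static List<String> compositeKeys(List<String[]> examples) {
        List<String> keys = new ArrayList<>();
        for (String[] e : examples) {
            keys.add(compositeKey(e[0], e[1]));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = compositeKeys(List.of(
            new String[] {"java", "3.2"},
            new String[] {"python", "2.7"}));

        System.out.println(keys);                                       // [java:3.2, python:2.7]
        System.out.println(keys.contains(compositeKey("java", "3.2"))); // true
        System.out.println(keys.contains(compositeKey("java", "2.7"))); // false
    }
}
```

Because each pair becomes a single value, an exact match on the combined key cannot cross-match between elements, and no hidden documents are created. The trade-off is that filtering on language alone or version alone is no longer a simple exact match on this field, so those values would likely need to be indexed separately as well.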