Index Design: Mappings, Field Types, and the Mistakes You Cannot Fix Without Reindexing
A mapping change deployed on a Tuesday breaks search on Wednesday. The team added a new field, tags, to documentation pages but did not define it in the mapping. OpenSearch’s dynamic mapping inferred the field type from the first document indexed: a string value, mapped as text with a keyword sub-field. By Thursday, a bulk import sends tags as an array of objects. OpenSearch rejects those documents with a mapper_parsing_exception. An array of strings would have been accepted, because any field can hold multiple values of its mapped type, but a text field cannot hold objects.
The mapping was never designed. It was inferred. Inference is not design. Inference creates production debt.
Dynamic Mapping Is a Development Convenience, Not a Production Strategy
OpenSearch dynamically infers each field's type from the first document that contains it. A string becomes text with a keyword sub-field. An integer becomes long, and a floating-point number becomes float. A boolean becomes boolean. A string that matches the default date formats, such as an ISO-8601 timestamp, becomes date. This is convenient for development and catastrophic for production.
The problems with dynamic mapping:
- Type conflicts. The first document determines the type. If document 1 sends "version": 3 (mapped as long) and document 2 sends "version": "3.0-beta", document 2 is rejected because the string cannot be parsed as a long. In the reverse order nothing is rejected, but every later number is silently indexed as a string.
- Wasted resources. Dynamic mapping creates both a text field and a keyword sub-field for every string. For fields that will never be searched (like internal IDs), the text field wastes inverted index space. For fields that will never be aggregated, the keyword sub-field wastes doc values space.
- Mapping explosion. Log-like documents with dynamic keys (e.g., HTTP headers, arbitrary metadata) create a new field mapping for every unique key. OpenSearch caps an index at 1,000 fields by default (index.mapping.total_fields.limit), and raising the cap only moves the problem: at 10,000 unique field names, mapping updates become a cluster-level bottleneck.
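The type-conflict failure mode can be reproduced with a toy model. The sketch below is not OpenSearch code: ToyDynamicMapper and its rules are hypothetical and deliberately simplified (no date detection, no floats, no sub-fields). It only illustrates that the first value seen fixes a field's type and that later values must be parseable as that type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of first-document-wins type inference. Hypothetical, not OpenSearch code. */
final class ToyDynamicMapper {
    private final Map<String, String> mapping = new LinkedHashMap<>();

    /** Returns true if the value is accepted, false if it conflicts with the mapped type. */
    boolean index(String field, Object value) {
        // The first value seen for a field fixes its type for the life of the index.
        String type = mapping.computeIfAbsent(field, f -> infer(value));
        return accepts(type, value);
    }

    String typeOf(String field) {
        return mapping.get(field);
    }

    private static String infer(Object value) {
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Integer || value instanceof Long) return "long";
        return "text"; // simplification: real dynamic mapping also adds a keyword sub-field
    }

    private static boolean accepts(String type, Object value) {
        switch (type) {
            case "long":
                if (value instanceof Integer || value instanceof Long) return true;
                try {
                    Long.parseLong(String.valueOf(value)); // numeric strings are coerced
                    return true;
                } catch (NumberFormatException e) {
                    return false; // "3.0-beta" cannot become a long
                }
            case "boolean":
                return value instanceof Boolean
                        || "true".equals(value) || "false".equals(value);
            default:
                return true; // a text field stores any scalar as a string
        }
    }

    public static void main(String[] args) {
        ToyDynamicMapper mapper = new ToyDynamicMapper();
        System.out.println(mapper.index("version", 3));          // accepted, mapped as long
        System.out.println(mapper.index("version", "3.0-beta")); // rejected, not a long
    }
}
```

In this model the outcome depends entirely on which document arrives first, which is the core objection to relying on inference in production.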
// FRAGILE: Dynamic mapping enabled (the default)
// Every new field in the document creates a mapping entry automatically.
// Once the index reaches the default limit of 1000 fields
// (index.mapping.total_fields.limit), documents that add new fields are rejected.
CreateIndexRequest request = CreateIndexRequest.of(idx -> idx
    .index("docs-v1")
    .mappings(m -> m
        .dynamic(DynamicMapping.True) // default, shown explicitly
        .properties("title", p -> p.text(t -> t.analyzer("standard")))
        // No other fields defined. Everything else is inferred.
    )
);
// HARDENED: Dynamic mapping disabled with strict enforcement
// Unknown fields are rejected at index time, not silently inferred.
// code_analyzer must already be defined in the index's analysis settings.
CreateIndexRequest request = CreateIndexRequest.of(idx -> idx
    .index("docs-v1")
    .mappings(m -> m
        .dynamic(DynamicMapping.Strict)
        .properties("tenant_id", p -> p.keyword(k -> k))
        .properties("title", p -> p.text(t -> t
            .analyzer("code_analyzer")
            .fields("exact", f -> f.keyword(k -> k.ignoreAbove(512)))
        ))
        .properties("body", p -> p.text(t -> t.analyzer("code_analyzer")))
        .properties("code_snippets", p -> p.text(t -> t.analyzer("whitespace")))
        .properties("api_method", p -> p.keyword(k -> k))
        .properties("version", p -> p.keyword(k -> k))
        .properties("content_type", p -> p.keyword(k -> k))
        .properties("tags", p -> p.keyword(k -> k))
        .properties("created_at", p -> p.date(d -> d
            .format("strict_date_optional_time||epoch_millis")
        ))
        .properties("metadata", p -> p.object(o -> o
            .dynamic(DynamicMapping.False) // Accept metadata, don't index it
        ))
    )
);
With DynamicMapping.Strict, any document containing a field not defined in the mapping is rejected with a strict_dynamic_mapping_exception. The developer must add the field to the mapping first. This makes mapping changes intentional, reviewable, and testable.
The metadata field uses DynamicMapping.False: OpenSearch stores the content in _source but does not index it. This is appropriate for fields that need to be returned in search results but never searched or aggregated.
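The three dynamic settings used above differ only in what happens to a field that is missing from the mapping. A minimal sketch of that decision follows; the Mode and Outcome enums and the onUnknownField helper are hypothetical names for illustration, not part of any OpenSearch API.

```java
/** Hypothetical model of how each dynamic setting treats an unmapped field. */
final class DynamicModes {
    enum Mode { TRUE, FALSE, STRICT }

    enum Outcome {
        MAPPED_AND_INDEXED,   // mapping grows, field becomes searchable
        KEPT_IN_SOURCE_ONLY,  // returned in _source, never searchable
        DOCUMENT_REJECTED     // the indexing request fails
    }

    static Outcome onUnknownField(Mode mode) {
        switch (mode) {
            case TRUE:   return Outcome.MAPPED_AND_INDEXED;
            case FALSE:  return Outcome.KEPT_IN_SOURCE_ONLY;
            case STRICT: return Outcome.DOCUMENT_REJECTED;
            default:     throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + onUnknownField(mode));
        }
    }
}
```

Strict at the index root combined with false on a single catch-all object keeps the schema closed while still leaving one place for unstructured data.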
What Can Be Changed Without Reindexing
The mapping API (PUT /{index}/_mapping) allows adding new fields to an existing mapping. The following cannot be changed in place on an existing index:
- Changing the type of an existing field
- Changing the analyzer of an existing field
- Removing a field from the mapping
- Changing the number of primary shards (an index setting rather than a mapping property, but equally fixed at creation)
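For the one change the mapping API does allow, adding a field, the request is small. The sketch below builds (but does not send) such a request with the JDK's java.net.http classes. The field name language, the index name docs-v1, and the localhost endpoint are assumptions for illustration only.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds, without sending, a request that adds one new field to an existing mapping. */
final class AddFieldRequest {
    // Hypothetical new field; any name not yet present in the mapping works the same way.
    static final String BODY =
            "{\n"
          + "  \"properties\": {\n"
          + "    \"language\": { \"type\": \"keyword\" }\n"
          + "  }\n"
          + "}";

    static HttpRequest build(String baseUrl, String index) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_mapping"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
    }

    public static void main(String[] args) {
        // Assumed local endpoint; adjust host, port, and authentication for a real cluster.
        HttpRequest request = build("http://localhost:9200", "docs-v1");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(BODY);
    }
}
```

Sending the same body with a different type for a field that already exists is the case the API refuses, which is what forces a reindex.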
Which changes require reindexing:
| Change | Reindex Required | Reason |
|---|---|---|
| Add new field | No | New field only affects new documents |
| Change field type | Yes | Existing tokens/doc values are type-specific |
| Change analyzer | Yes | Existing tokens were produced by old analyzer |
| Add sub-field to existing field | No | New sub-field indexes alongside existing; older documents lack it until reindexed or updated |
| Change ignore_above on keyword | No | Only affects new documents |
| Change number_of_shards | Yes | Routing formula changes |
| Change number_of_replicas | No | Replicas are copies of existing shards |
| Change refresh_interval | No | Setting applies to new refreshes |
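The number_of_shards row is the least obvious one. In simplified form, a document's shard is chosen as hash(routing value) mod number of primary shards, so the same document lands on a different shard once the divisor changes. The sketch below illustrates this with String.hashCode() as a stand-in hash. That choice is an assumption of this example, made only because it is deterministic and built in; OpenSearch uses a different hash function and a slightly more involved formula.

```java
/** Simplified illustration of hash-based shard routing. */
final class ShardRouting {
    // Stand-in hash: String.hashCode() is an assumption of this sketch,
    // not the hash function OpenSearch actually uses.
    static int shardFor(String routingValue, int primaryShards) {
        return Math.floorMod(routingValue.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        String id = "a"; // "a".hashCode() == 97
        System.out.println(shardFor(id, 3)); // 97 mod 3 = 1
        System.out.println(shardFor(id, 5)); // 97 mod 5 = 2
    }
}
```

With three primaries the document lives on shard 1. A lookup computed against five primaries would go to shard 2 and miss it, which is why the primary shard count is fixed when the index is created.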
Index Templates
The documentation platform creates new indices regularly (for zero-downtime reindexing, for new tenants in an index-per-tenant model). Index templates ensure every new index gets the correct mapping:
// HARDENED: Index template with component templates for reuse
// Component template for the analysis configuration
client.cluster().putComponentTemplate(ct -> ct
    .name("docs-analysis")
    .template(t -> t
        .settings(s -> s
            .analysis(a -> a
                .analyzer("code_analyzer", an -> an
                    .custom(c -> c
                        .tokenizer("code_tokenizer")
                        .filter("lowercase", "camel_case_split")
                    )
                )
                .tokenizer("code_tokenizer", tok -> tok
                    .definition(d -> d.pattern(p -> p.pattern("[.\\s(){}\\[\\];,<>]")))
                )
                .filter("camel_case_split", f -> f
                    .definition(d -> d.wordDelimiterGraph(w -> w
                        .generateWordParts(true)
                        .splitOnCaseChange(true)
                        .preserveOriginal(true)
                    ))
                )
            )
        )
    )
);
// Index template composing the component template
client.indices().putIndexTemplate(it -> it
    .name("docs-template")
    .indexPatterns("docs-*")
    .composedOf("docs-analysis")
    .template(t -> t
        .settings(s -> s
            .numberOfShards("3")
            .numberOfReplicas("1")
            .refreshInterval(ri -> ri.time("5s"))
        )
        .mappings(m -> m
            .dynamic(DynamicMapping.Strict)
            .properties("tenant_id", p -> p.keyword(k -> k))
            .properties("title", p -> p.text(tx -> tx
                .analyzer("code_analyzer")
                .fields("exact", f -> f.keyword(k -> k.ignoreAbove(512)))
            ))
            .properties("body", p -> p.text(tx -> tx.analyzer("code_analyzer")))
            .properties("version", p -> p.keyword(k -> k))
            .properties("content_type", p -> p.keyword(k -> k))
            // Abbreviated: with strict mapping, every other field the documents
            // carry (tags, created_at, metadata, ...) must be declared here too.
        )
    )
);
Any index created with a name matching docs-* automatically inherits the analysis settings, mappings, and shard configuration. No manual mapping creation required. No inconsistency between indices.
The field type decision tree starts with one question: will this field be searched as full text? If yes, use text with the appropriate analyzer and add a keyword sub-field if aggregation or sorting is also needed. If no, the next question is whether the field needs exact matching, aggregation, or sorting, which leads to keyword, integer/long, date, or boolean. For arrays of objects whose fields must match together within the same object, use nested. For structured objects that are stored but not independently queryable, use object with dynamic mapping disabled.
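The decision tree can be written down as a small function. This is a hypothetical encoding for illustration: FieldTypeChooser, its Kind enum, and its parameters are invented names, and the returned strings are descriptive labels rather than API values.

```java
/** Hypothetical encoding of the field type decision tree described above. */
final class FieldTypeChooser {
    enum Kind { STRING, INTEGER, DATE, BOOLEAN, OBJECT }

    static String choose(Kind kind, boolean fullTextSearched,
                         boolean alsoAggregatedOrSorted, boolean queriedPerObject) {
        if (kind == Kind.OBJECT) {
            // Structured values: nested only when each object is queried on its own.
            return queriedPerObject ? "nested" : "object (dynamic: false)";
        }
        if (kind == Kind.STRING && fullTextSearched) {
            // Full-text fields get a keyword sub-field only when it is needed.
            return alsoAggregatedOrSorted ? "text + keyword sub-field" : "text";
        }
        // Everything else is matched exactly, aggregated, or sorted.
        switch (kind) {
            case INTEGER: return "long";
            case DATE:    return "date";
            case BOOLEAN: return "boolean";
            default:      return "keyword";
        }
    }

    public static void main(String[] args) {
        System.out.println(choose(Kind.STRING, true, true, false));   // e.g. title
        System.out.println(choose(Kind.STRING, false, true, false));  // e.g. tenant_id
        System.out.println(choose(Kind.OBJECT, false, false, false)); // e.g. metadata
    }
}
```

Applied to the fields in the hardened mapping, title comes out as text with a keyword sub-field, tenant_id as keyword, and metadata as an object with dynamic mapping disabled.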
The Decision Rule
Disable dynamic mapping (dynamic: strict) in every production index. Treat mapping changes as schema migrations: define them explicitly, test them with Testcontainers, and deploy them through the same review process as code changes.
Use index templates for any index that will be created more than once. The template is the single source of truth for the index schema.
Prefer keyword over text for fields that are filtered, sorted, or aggregated but never full-text searched. Prefer text with a keyword sub-field for fields that need both full-text search and exact match or aggregation capability. Never use text alone for a field that will be aggregated: aggregating on a text field fails by default and works only after enabling fielddata, which loads the field's terms into heap memory.