search at depth

Index Design: Mappings, Field Types, and the Mistakes You Cannot Fix Without Reindexing

Chapter 13 of 60

A mapping change deployed on a Tuesday breaks search on Wednesday. The team added a new field, tags, to documentation pages but did not define it in the mapping. OpenSearch’s dynamic mapping inferred the field type from the first document indexed: a string value, mapped as text with a keyword sub-field. By Thursday, a bulk import sends tags as an array of objects rather than strings. OpenSearch rejects those documents with a mapper_parsing_exception. An array of plain strings would have been accepted, because any OpenSearch field can hold multiple values, but a field already mapped as text cannot hold objects.

The mapping was never designed. It was inferred. Inference is not design. Inference creates production debt.

Dynamic Mapping Is a Development Convenience, Not a Production Strategy

OpenSearch dynamically infers each field’s type from the first document that contains it. A string becomes text with a keyword sub-field. An integer becomes long, and a floating-point number becomes float. A boolean becomes boolean. A string that matches the default date formats, such as an ISO-8601 timestamp, becomes date. This is convenient for development and catastrophic for production.
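The inference rules can be modelled in a few lines. The sketch below is a simplified simulation in plain Java, not OpenSearch code: the class DynamicInference and its methods are hypothetical, and date detection is approximated with java.time's ISO parsers rather than OpenSearch's own date formats. It illustrates the property that matters here: whichever value arrives first fixes the type.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of dynamic mapping; NOT the OpenSearch implementation.
final class DynamicInference {
    // Field name -> type fixed by the first value seen for that field.
    private final Map<String, String> mapping = new LinkedHashMap<>();

    /** Approximates the default inference rules for a single JSON value. */
    static String inferType(Object value) {
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Integer || value instanceof Long) return "long";
        if (value instanceof Float || value instanceof Double) return "float";
        if (value instanceof Map) return "object";
        if (value instanceof String) {
            return looksLikeIsoDate((String) value) ? "date" : "text+keyword";
        }
        throw new IllegalArgumentException("value kind not covered by this sketch");
    }

    // Assumption: ISO-8601 parsing stands in for OpenSearch's date detection.
    private static boolean looksLikeIsoDate(String s) {
        try { OffsetDateTime.parse(s); return true; } catch (DateTimeParseException ignored) { }
        try { LocalDate.parse(s); return true; } catch (DateTimeParseException ignored) { }
        return false;
    }

    /** Records the type on first sight; later values never change it. */
    String observe(String field, Object value) {
        return mapping.computeIfAbsent(field, f -> inferType(value));
    }

    public static void main(String[] args) {
        DynamicInference index = new DynamicInference();
        System.out.println(index.observe("version", "3.0")); // first value wins
        System.out.println(index.observe("version", 3));     // type is unchanged
    }
}
```

Calling observe twice for the same field with different value kinds returns the type chosen on the first call, which is the root of the type conflicts described next.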

The problems with dynamic mapping:

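  1. Type conflicts. The first document determines the type. If document 1 sends "version": 3 (mapped as long) and document 2 sends "version": "3.0-beta", document 2 is rejected because the string cannot be parsed as a number. In the opposite order nothing is rejected, but 3 is silently indexed as the string "3", and range queries and sorting on the field become lexicographic rather than numeric.
  2. Wasted resources. Dynamic mapping maps every string as text with a keyword sub-field. For fields that will never be full-text searched (like internal IDs), the text field wastes inverted index space. For fields that will never be aggregated or sorted, the keyword sub-field wastes doc values space.
  3. Mapping explosion. Log-like documents with dynamic keys (e.g., HTTP headers, arbitrary metadata) create a new field mapping for every unique key. Every new field is a cluster-state update that must be published to all nodes. The default limit of 1,000 fields per index (index.mapping.total_fields.limit) exists to contain this: once it is reached, documents that introduce new fields are rejected. Raising the limit to accommodate 10,000 unique field names turns mapping updates into a cluster-level bottleneck.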
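
// FRAGILE: Dynamic mapping enabled (the default)
// Every new field in the document creates a mapping entry automatically.
// Each inferred string counts twice (text plus keyword sub-field), so roughly
// 500 unique string fields exhaust the default limit of 1000 mapped fields.

import org.opensearch.client.opensearch._types.mapping.DynamicMapping;
import org.opensearch.client.opensearch.indices.CreateIndexRequest;
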
CreateIndexRequest request = CreateIndexRequest.of(idx -> idx
    .index("docs-v1")
    .mappings(m -> m
        .dynamic(DynamicMapping.True)  // default, shown explicitly
        .properties("title", p -> p.text(t -> t.analyzer("standard")))
        // No other fields defined. Everything else is inferred.
    )
);

// HARDENED: Dynamic mapping disabled with strict enforcement
// Unknown fields are rejected at index time, not silently inferred.

CreateIndexRequest request = CreateIndexRequest.of(idx -> idx
    .index("docs-v1")
    .mappings(m -> m
        .dynamic(DynamicMapping.Strict)
        .properties("tenant_id", p -> p.keyword(k -> k))
        .properties("title", p -> p.text(t -> t
            .analyzer("code_analyzer")
            .fields("exact", f -> f.keyword(k -> k.ignoreAbove(512)))
        ))
        .properties("body", p -> p.text(t -> t.analyzer("code_analyzer")))
        .properties("code_snippets", p -> p.text(t -> t.analyzer("whitespace")))
        .properties("api_method", p -> p.keyword(k -> k))
        .properties("version", p -> p.keyword(k -> k))
        .properties("content_type", p -> p.keyword(k -> k))
        .properties("tags", p -> p.keyword(k -> k))
        .properties("created_at", p -> p.date(d -> d
            .format("strict_date_optional_time||epoch_millis")
        ))
        .properties("metadata", p -> p.object(o -> o
            .dynamic(DynamicMapping.False)  // Accept metadata, don't index it
        ))
    )
);

With DynamicMapping.Strict, any document containing a field not defined in the mapping is rejected with a strict_dynamic_mapping_exception. The developer must add the field to the mapping first. This makes mapping changes intentional, reviewable, and testable.

The metadata field uses DynamicMapping.False: OpenSearch stores the content in _source but does not index it. This is appropriate for fields that need to be returned in search results but never searched or aggregated.
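The three dynamic settings differ only in what happens to a field the mapping does not define. A minimal plain-Java model of that behaviour follows; the class and enum names are hypothetical, and it only looks at top-level fields, so treat it as a simulation of the rules described above rather than client code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of how each `dynamic` setting treats unmapped fields.
final class DynamicModes {
    enum Dynamic { TRUE, FALSE, STRICT }
    enum Outcome { INDEXED, INFERRED_AND_INDEXED, SOURCE_ONLY }

    /**
     * Classifies every top-level field of a document.
     * Throws when the mode is STRICT and an unmapped field is present,
     * mirroring the rejection of the whole document.
     */
    static Map<String, Outcome> classify(Dynamic mode, Set<String> mapped, Map<String, ?> doc) {
        Map<String, Outcome> result = new LinkedHashMap<>();
        for (String field : doc.keySet()) {
            if (mapped.contains(field)) {
                result.put(field, Outcome.INDEXED);
            } else if (mode == Dynamic.TRUE) {
                result.put(field, Outcome.INFERRED_AND_INDEXED); // type guessed from the value
            } else if (mode == Dynamic.FALSE) {
                result.put(field, Outcome.SOURCE_ONLY);          // returned in hits, not searchable
            } else {
                throw new IllegalArgumentException("unmapped field rejected under strict: " + field);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("title", "Intro", "owner", "team-a");
        System.out.println(classify(Dynamic.FALSE, Set.of("title"), doc));
    }
}
```

In the hardened mapping, the index as a whole behaves like STRICT while the metadata object behaves like FALSE for everything beneath it.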

What Can Be Changed Without Reindexing

The mapping API (PUT /{index}/_mapping) allows adding new fields to an existing mapping. It does not allow:

  • Changing the type of an existing field
  • Changing the analyzer of an existing field
  • Removing a field from the mapping
  • Changing the number of primary shards (an index setting rather than a mapping, but equally fixed at index creation)
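Adding a field is the one mapping change that works in place. The sketch below builds, but does not send, the corresponding REST request with the JDK's java.net.http client. The host http://localhost:9200, the index docs-v1, and the field reading_time_minutes are assumptions for illustration, as is the helper class itself.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds PUT /{index}/_mapping for one new field.
final class AddFieldRequest {
    /** JSON body declaring a single new property; existing fields are untouched. */
    static String mappingBody(String field, String type) {
        return "{\"properties\":{\"" + field + "\":{\"type\":\"" + type + "\"}}}";
    }

    static HttpRequest build(String baseUrl, String index, String field, String type) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_mapping"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(mappingBody(field, type)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:9200", "docs-v1", "reading_time_minutes", "integer");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(mappingBody("reading_time_minutes", "integer"));
    }
}
```

Sending the same kind of request with a different type for a field that already exists is the case the list above rules out: OpenSearch refuses it rather than rewriting the existing index structures.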

Which changes require reindexing:

| Change | Reindex required | Reason |
|---|---|---|
| Add new field | No | New field only affects new documents |
| Change field type | Yes | Existing tokens/doc values are type-specific |
| Change analyzer | Yes | Existing tokens were produced by the old analyzer |
| Add sub-field to existing field | No | New sub-field is indexed alongside the existing field, but only for documents indexed or updated after the change |
| Change ignore_above on keyword | No | Only affects new documents |
| Change number_of_shards | Yes | Routing formula changes |
| Change number_of_replicas | No | Replicas are copies of existing shards |
| Change refresh_interval | No | Setting applies to new refreshes |
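For the changes that do require reindexing, a common pattern is to create a new index with the corrected mapping and copy the documents across with the _reindex API. The following sketch only builds the request; the index names docs-v1 and docs-v2, the localhost URL, and the helper class are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds POST /_reindex from an old index into a new one.
final class ReindexRequest {
    /** JSON body naming the source and destination indices. */
    static String body(String source, String dest) {
        return "{\"source\":{\"index\":\"" + source + "\"},\"dest\":{\"index\":\"" + dest + "\"}}";
    }

    static HttpRequest build(String baseUrl, String source, String dest) {
        // wait_for_completion=false asks the cluster to run the copy as a background task.
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_reindex?wait_for_completion=false"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(source, dest)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:9200", "docs-v1", "docs-v2");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body("docs-v1", "docs-v2"));
    }
}
```

The destination index must exist with the new mapping before the copy starts; otherwise its fields would be inferred dynamically, which is the problem this chapter is trying to avoid.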

Index Templates

The documentation platform creates new indices regularly (for zero-downtime reindexing, for new tenants in an index-per-tenant model). Index templates ensure every new index gets the correct mapping:

// HARDENED: Index template with component templates for reuse

// Component template for the analysis configuration
client.cluster().putComponentTemplate(ct -> ct
    .name("docs-analysis")
    .template(t -> t
        .settings(s -> s
            .analysis(a -> a
                .analyzer("code_analyzer", an -> an
                    .custom(c -> c
                        .tokenizer("code_tokenizer")
                        .filter("lowercase", "camel_case_split")
                    )
                )
                .tokenizer("code_tokenizer", tok -> tok
                    .definition(d -> d.pattern(p -> p.pattern("[.\\s(){}\\[\\];,<>]")))
                )
                .filter("camel_case_split", f -> f
                    .definition(d -> d.wordDelimiterGraph(w -> w
                        .generateWordParts(true)
                        .splitOnCaseChange(true)
                        .preserveOriginal(true)
                    ))
                )
            )
        )
    )
);

// Index template composing the component template
client.indices().putIndexTemplate(it -> it
    .name("docs-template")
    .indexPatterns("docs-*")
    .composedOf("docs-analysis")
    .template(t -> t
        .settings(s -> s
            .numberOfShards("3")
            .numberOfReplicas("1")
            .refreshInterval(ri -> ri.time("5s"))
        )
        .mappings(m -> m
            .dynamic(DynamicMapping.Strict)
            .properties("tenant_id", p -> p.keyword(k -> k))
            .properties("title", p -> p.text(tx -> tx
                .analyzer("code_analyzer")
                .fields("exact", f -> f.keyword(k -> k.ignoreAbove(512)))
            ))
            .properties("body", p -> p.text(tx -> tx.analyzer("code_analyzer")))
            .properties("version", p -> p.keyword(k -> k))
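            .properties("content_type", p -> p.keyword(k -> k))
            // Declare the remaining fields (code_snippets, api_method, tags, created_at,
            // metadata) here as well: under Strict, a document containing a field that
            // is missing from the template is rejected.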
        )
    )
);

Any index created with a name matching docs-* automatically inherits the analysis settings, mappings, and shard configuration. No manual mapping creation required. No inconsistency between indices.

Figure: Decision tree for field type selection, showing the path from data characteristics to the correct OpenSearch field type.

The field type decision tree starts with one question: will this field be searched as full text? If yes, use text with the appropriate analyzer and add a keyword sub-field if aggregation or sorting is also needed. If no, the next question is whether the field needs exact matching, aggregation, or sorting, which leads to keyword, integer/long, date, or boolean depending on the data. For arrays of objects whose sub-fields must be matched together within a single element, use nested. For structured objects that are stored but not independently queryable, use object with dynamic mapping disabled.
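The same tree can be written as a small function, which is handy as a checklist during mapping review. This is a sketch with hypothetical names (FieldTypeDecision, Shape, choose); it encodes only the questions from the paragraph above and returns the suggested type as a label.

```java
// Hypothetical encoding of the field type decision tree.
final class FieldTypeDecision {
    /** What the field holds, as far as the decision tree cares. */
    enum Shape { STRING, INTEGER, TIMESTAMP, FLAG, OBJECT }

    /**
     * @param fullTextSearched     the field is searched as full text (strings only)
     * @param exactSortOrAggregate the field also needs exact match, sorting, or aggregation
     * @param queriedPerElement    object sub-fields must match within the same array element
     */
    static String choose(Shape shape, boolean fullTextSearched,
                         boolean exactSortOrAggregate, boolean queriedPerElement) {
        // First question: full-text search?
        if (shape == Shape.STRING && fullTextSearched) {
            return exactSortOrAggregate ? "text + keyword sub-field" : "text";
        }
        // Otherwise pick the exact-value type that matches the data.
        switch (shape) {
            case STRING:    return "keyword";
            case INTEGER:   return "long"; // or integer when the value range allows
            case TIMESTAMP: return "date";
            case FLAG:      return "boolean";
            case OBJECT:    return queriedPerElement ? "nested" : "object (dynamic: false)";
            default:        throw new IllegalArgumentException("unknown shape: " + shape);
        }
    }

    public static void main(String[] args) {
        System.out.println("title    -> " + choose(Shape.STRING, true, true, false));
        System.out.println("tags     -> " + choose(Shape.STRING, false, true, false));
        System.out.println("metadata -> " + choose(Shape.OBJECT, false, false, false));
    }
}
```

Applied to the hardened mapping earlier in the chapter, title resolves to text with a keyword sub-field, tags to keyword, and metadata to an object with dynamic mapping disabled.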

The Decision Rule

Disable dynamic mapping (dynamic: strict) in every production index. Treat mapping changes as schema migrations: define them explicitly, test them with Testcontainers, and deploy them through the same review process as code changes.

Use index templates for any index that will be created more than once. The template is the single source of truth for the index schema.

Prefer keyword over text for fields that are filtered, sorted, or aggregated but never full-text searched. Prefer text with a keyword sub-field for fields that need both full-text search and exact match or aggregation capability. Never use text alone for a field that will be aggregated: text fields have no doc values, so aggregating on them only works after enabling fielddata, which loads the field’s terms into heap memory.