search at depth

Write Path Internals: Lucene Segments, the Translog, and Why Bulk Indexing Beats Single Writes

Chapter 16 of 60

The documentation platform onboards a new enterprise tenant with 2 million pages of documentation. The team indexes them one document at a time using the single-document API. After 18 hours, 400,000 documents are indexed. The cluster’s CPU is at 90%. The translog is 12GB. Search queries against the partially-indexed data take 800ms instead of the usual 20ms. The remaining 1.6 million documents will take another 72 hours at this rate.

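The problem is not cluster capacity. The problem is that single-document indexing pays for an HTTP round trip, request coordination, and (with the default request-level translog durability) a translog fsync for every document. Bulk indexing amortizes these per-request costs across hundreds or thousands of documents. Analysis still runs once per document either way, so it is not where the savings come from.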
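The amortization can be made concrete by counting requests. The sketch below is illustrative arithmetic only: `RequestCountSketch` and `requestsNeeded` are hypothetical names, and it counts requests rather than measuring latency. It assumes each request costs one HTTP round trip and roughly one translog fsync per shard it touches.

```java
// Illustrative arithmetic only: counts requests, it does not measure latency.
final class RequestCountSketch {

    // Number of requests needed to send totalDocs in batches of batchSize
    // (ceiling division, so a partial final batch still counts as one request).
    static long requestsNeeded(long totalDocs, int batchSize) {
        if (totalDocs < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("totalDocs must be >= 0 and batchSize > 0");
        }
        return (totalDocs + batchSize - 1) / batchSize;
    }
}
```

For the 2 million page tenant, `requestsNeeded(2_000_000, 1)` is 2,000,000 requests, while `requestsNeeded(2_000_000, 500)` is 4,000.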

The Segment Lifecycle

Lucene segments are the physical storage unit of the inverted index. Understanding their lifecycle explains most write-side performance characteristics.

Creation. When the in-memory indexing buffer is flushed (during a refresh), its contents are written as a new segment to the filesystem. The segment contains a complete inverted index, doc values, stored fields, and term vectors for the documents it holds.

Immutability. Once written, a segment is never modified. Document updates create a new version in a new segment and mark the old version as deleted in the original segment’s liveDocs bitset. Document deletes only clear the document’s bit in liveDocs (a set bit means the document is live); the deleted document’s data remains in the segment until merge.
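A toy model may help make the immutability rule concrete. This is a minimal sketch, not Lucene's actual classes: `ToySegment` and its methods are hypothetical, and the only thing it is meant to show is that a delete changes the liveDocs bitset while the stored data stays put.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Objects;

// Toy model of an immutable segment; hypothetical, not Lucene's API.
final class ToySegment {
    private final List<String> docs;  // stored data, never modified after construction
    private final BitSet liveDocs;    // bit i set = document i is still live

    ToySegment(List<String> docs) {
        this.docs = List.copyOf(docs);          // unmodifiable copy
        this.liveDocs = new BitSet(docs.size());
        this.liveDocs.set(0, docs.size());      // every document starts out live
    }

    // A delete only clears the live bit; the document's data is untouched.
    void delete(int docId) {
        Objects.checkIndex(docId, docs.size());
        liveDocs.clear(docId);
    }

    boolean isLive(int docId) {
        Objects.checkIndex(docId, docs.size());
        return liveDocs.get(docId);
    }

    int liveCount()   { return liveDocs.cardinality(); }
    int storedCount() { return docs.size(); }
}
```

After deleting one of three documents, `liveCount()` drops to 2 while `storedCount()` stays at 3, which is why heavily updated indexes carry dead weight until a merge runs.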

Merge. The tiered merge policy periodically selects small segments and merges them into larger ones. During merge, deleted documents are physically removed, the inverted index is rebuilt for the merged data, and the new segment replaces the originals. Merge is CPU and I/O intensive.

Deletion. After a successful merge, the original segments are deleted. The disk space is reclaimed. The new merged segment takes their place in the shard’s segment list.
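The merge and deletion steps can be sketched in the same toy style. The `ToyMerge` class below is a hypothetical illustration, not the tiered merge policy: it does not choose which segments to merge, it only shows that merging copies live documents into a new segment and leaves deleted ones behind.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy merge: hypothetical illustration, not Lucene's merge implementation.
final class ToyMerge {

    // A segment is its stored documents plus a liveDocs bitset (set bit = live).
    record Segment(List<String> docs, BitSet liveDocs) {}

    // Copies only live documents into a brand-new segment.
    // The inputs are not modified; the caller drops them once the merge succeeds.
    static Segment merge(List<Segment> inputs) {
        List<String> merged = new ArrayList<>();
        for (Segment segment : inputs) {
            for (int i = 0; i < segment.docs().size(); i++) {
                if (segment.liveDocs().get(i)) {
                    merged.add(segment.docs().get(i));
                }
            }
        }
        BitSet live = new BitSet(merged.size());
        live.set(0, merged.size());  // everything in the merged segment is live
        return new Segment(List.copyOf(merged), live);
    }
}
```

The merged segment is smaller than the sum of its inputs exactly by the space the deleted documents occupied, which is the disk space reclaimed once the originals are removed.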

The tiered merge policy in OpenSearch uses these parameters:

Parameter            Default   Effect
max_merge_at_once    10        Maximum segments merged in one operation
max_merged_segment   5GB       Segments above this size are never merged
segments_per_tier    10        Target segment count per size tier
floor_segment        2MB       Segments smaller than this are rounded up for merge consideration

Bulk Indexing

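// HARDENED: Bulk indexing with optimal batch size and error handling

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch._types.Refresh;
import org.opensearch.client.opensearch.core.BulkRequest;
import org.opensearch.client.opensearch.core.BulkResponse;
import org.opensearch.client.opensearch.core.bulk.BulkResponseItem;

// DocPage is the application's own document type (not shown here).
public class BulkDocumentIndexer {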

    private final OpenSearchClient client;
    private static final int BATCH_SIZE = 500;

    public BulkDocumentIndexer(OpenSearchClient client) {
        this.client = client;
    }

    public BulkIndexResult indexDocuments(String index, String routing,
            List<DocPage> documents) throws IOException {

        int totalSuccess = 0;
        int totalFailed = 0;
        List<String> errors = new ArrayList<>();

        for (int i = 0; i < documents.size(); i += BATCH_SIZE) {
            List<DocPage> batch = documents.subList(
                i, Math.min(i + BATCH_SIZE, documents.size()));

            BulkRequest.Builder bulkBuilder = new BulkRequest.Builder()
                .index(index)
                .routing(routing)
                .refresh(Refresh.False);

            for (DocPage page : batch) {
                bulkBuilder.operations(op -> op
                    .index(idx -> idx
                        .id(page.tenantId() + ":" + page.slug())
                        .document(page)
                    )
                );
            }

            BulkResponse response = client.bulk(bulkBuilder.build());

            if (response.errors()) {
                for (BulkResponseItem item : response.items()) {
                    if (item.error() != null) {
                        totalFailed++;
                        errors.add(item.id() + ": " + item.error().reason());
                    } else {
                        totalSuccess++;
                    }
                }
            } else {
                totalSuccess += batch.size();
            }
        }

        return new BulkIndexResult(totalSuccess, totalFailed, errors);
    }

    public record BulkIndexResult(
        int successCount,
        int failedCount,
        List<String> errors
    ) {}
}

// FRAGILE: Single-document indexing in a loop
// Each iteration: 1 HTTP round trip and, with default request durability, 1 translog fsync.
// 2 million documents = 2 million HTTP round trips.

for (DocPage page : documents) {
    client.index(i -> i
        .index("docs-v1")
        .id(page.tenantId() + ":" + page.slug())
        .document(page)
    );
}

Optimal Batch Size

The optimal batch size balances several constraints:

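  • Request body size. Requests larger than http.max_content_length (100MB by default) are rejected outright, and requests well below that limit can still cause HTTP client timeouts and memory pressure on the coordinating node. Aim for 5-15MB per bulk request.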
  • Document count. 500 to 2,000 documents per batch is typical. The exact number depends on document size.
  • Thread pool saturation. The write thread pool has a finite queue (default: 10,000). If bulk requests arrive faster than they can be processed, requests are rejected with HTTP 429.
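One way to respect both the byte and document-count constraints is to cut batches on whichever limit is hit first. The sketch below is a hedged illustration: `SizeBoundedBatcher` is a hypothetical helper, it assumes documents are already serialized to JSON strings, and it ignores the small per-item action metadata line that the bulk format adds.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups serialized documents into batches bounded
// by total UTF-8 byte size and by document count.
final class SizeBoundedBatcher {

    static List<List<String>> batch(List<String> jsonDocs, long maxBytes, int maxDocs) {
        if (maxBytes <= 0 || maxDocs <= 0) {
            throw new IllegalArgumentException("maxBytes and maxDocs must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;

        for (String doc : jsonDocs) {
            long size = doc.getBytes(StandardCharsets.UTF_8).length;
            // Close the current batch if adding this document would break a limit.
            // A single oversized document still gets a batch of its own.
            boolean full = !current.isEmpty()
                && (currentBytes + size > maxBytes || current.size() >= maxDocs);
            if (full) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(doc);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

A call such as `SizeBoundedBatcher.batch(docs, 10L * 1024 * 1024, 2_000)` would target roughly 10MB or 2,000 documents per bulk request, whichever comes first. Handling HTTP 429 rejections (for example by backing off and resending) is a separate concern that this sketch does not cover.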

Refresh Interval Tuning

The refresh interval controls how frequently new segments are created from the in-memory buffer, making recently-indexed documents searchable.

// HARDENED: Tuned refresh interval for bulk import
// Reduce segment creation during high-write phases.
// Reset to production value after import completes.

// During bulk import: extend refresh interval
client.indices().putSettings(ps -> ps
    .index("docs-v1")
    .settings(s -> s.refreshInterval(ri -> ri.time("30s")))
);

// Perform bulk indexing...
bulkIndexer.indexDocuments("docs-v1", tenantId, allDocuments);

// After import: force refresh and reset interval
client.indices().refresh(r -> r.index("docs-v1"));
client.indices().putSettings(ps -> ps
    .index("docs-v1")
    .settings(s -> s.refreshInterval(ri -> ri.time("5s")))
);

During normal operation with refresh_interval: 1s, every refresh that finds new documents in the buffer creates a new segment, so a shard under continuous write load gains about one segment per second. During a bulk import of 2 million documents at 2,000 docs/second, that means roughly 1,000 segments created per shard in ~17 minutes of indexing, before any merges. Setting the refresh interval to 30s during import reduces segment creation to ~33 segments, dramatically reducing merge overhead.
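The segment counts above follow from simple division. The helper below is a hypothetical back-of-the-envelope estimate: it assumes a constant indexing rate, counts only segments produced by scheduled refreshes, and ignores merges, buffer-full flushes, and the final forced refresh.

```java
// Back-of-the-envelope estimate; hypothetical helper, not an OpenSearch API.
final class RefreshSegmentEstimate {

    // Scheduled refreshes that fire during the import, assuming a constant
    // indexing rate and one new segment per refresh.
    static long segmentsFromRefresh(long totalDocs, long docsPerSecond,
                                    long refreshIntervalSeconds) {
        if (totalDocs < 0 || docsPerSecond <= 0 || refreshIntervalSeconds <= 0) {
            throw new IllegalArgumentException(
                "totalDocs must be >= 0; rate and interval must be positive");
        }
        // Import duration in seconds, rounded up for a partial final second.
        long importSeconds = (totalDocs + docsPerSecond - 1) / docsPerSecond;
        // Only whole refresh intervals that elapse during the import count.
        return importSeconds / refreshIntervalSeconds;
    }
}
```

With 2,000,000 documents at 2,000 docs/second the import runs for 1,000 seconds, so a 1s interval gives 1,000 refreshes and a 30s interval gives 33.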

Figure: Segment lifecycle diagram showing document write to translog, buffer flush to segment, and tiered merge consolidation.

The diagram traces a document from write request to durable, searchable state. The document first enters the translog (durable, not searchable), then the in-memory buffer (durable via translog, not searchable), then a new segment on refresh (searchable, durable via translog), and finally an fsynced segment on flush (searchable, durable on disk, translog truncated). Merge operations consolidate multiple small segments into fewer large ones, physically removing deleted documents in the process.

The Decision Rule

Use bulk indexing for any operation that indexes more than 10 documents. The per-request HTTP and translog overhead of single-document indexing is measurable above this threshold.

Set refresh_interval to 30s or -1 (disabled) during bulk imports. Reset to the production value (1s to 5s) after the import completes and force a refresh. The search visibility delay during import is acceptable; the segment explosion from per-second refreshes during high write volume is not.

Size bulk requests to 5-15MB. Below 5MB, the per-request overhead is not fully amortized. Above 15MB, coordinating node memory pressure and HTTP timeout risk increase.