unbound mongodb at scale

Write Concern Interaction with Retryable Writes

5 min read Chapter 60 of 72

The Symptom

During a primary election (triggered by a rolling restart), the telemetry platform’s ingestion layer throws MongoNotPrimaryException errors. After the election completes (15-30 seconds), some sensor readings appear twice in the database.

The Cause

When the primary steps down, in-flight writes fail with a NotPrimaryError, and the driver retries them on the new primary. The old primary may already have applied a write, and replicated it, before stepping down, in which case the write is already present on the new primary. If the retry is not deduplicated, it inserts a second copy.

Retryable writes were introduced in MongoDB 3.6, and drivers compatible with MongoDB 4.2 and later enable them by default. Each retryable write carries a logical session ID and a transaction number that together act as a server-side transaction ID: if the server has already applied a write with the same identifier, it returns the recorded result without applying the write again.
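
The idempotency check described above can be pictured as a lookup keyed by the session and a per-session transaction number (`lsid` and `txnNumber` in MongoDB's wire protocol). The following is a minimal in-memory sketch of that idea, not the server's implementation; the class `RetryableWriteModel` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a server that deduplicates writes by (session ID, transaction number). */
class RetryableWriteModel {
    // Documents the model "server" has stored.
    private final List<String> stored = new ArrayList<>();
    // Results already returned, keyed by "sessionId:txnNumber".
    private final Map<String, Integer> completed = new HashMap<>();

    /** Applies the insert once per (sessionId, txnNumber); a replay returns the recorded result. */
    int insert(String sessionId, long txnNumber, String document) {
        String key = sessionId + ":" + txnNumber;
        Integer previous = completed.get(key);
        if (previous != null) {
            return previous; // retry of an already-applied write: nothing is inserted again
        }
        stored.add(document);
        completed.put(key, 1);
        return 1; // number of documents inserted
    }

    int storedCount() {
        return stored.size();
    }
}
```

In this model, a retry that reuses the same transaction number leaves a single stored document, while a re-send under a new transaction number (as an application-level retry would produce) stores a second copy.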

However, retryable writes only work for specific operations:

| Operation | Retryable | Notes |
|---|---|---|
| insertOne | Yes | Idempotent via transaction ID |
| insertMany (ordered) | Yes | Retries from the failed point |
| updateOne | Yes | Same update applied once |
| deleteOne | Yes | Same delete applied once |
| insertMany (unordered) | Partial | May retry already-succeeded inserts |
| updateMany | No | Cannot guarantee idempotency |
| deleteMany | No | Cannot guarantee idempotency |
| bulkWrite (mixed) | Partial | Each op retried individually |

import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;

// Retryable writes are enabled by default in MongoDB Java Driver 4.0+.
// Setting retryWrites=true in the connection string makes that explicit.
MongoClientSettings settings = MongoClientSettings.builder()
    .applyConnectionString(new ConnectionString(
        "mongodb://host1,host2,host3/?retryWrites=true"))
    .build();

The duplicate readings occurred because the application used insertMany with ordered: false:

// SLOW: Unordered insertMany can produce duplicates during failover
collection.insertMany(readings, new InsertManyOptions().ordered(false));
// If the primary steps down after inserting 3 of 5 documents,
// the retry inserts all 5 documents on the new primary.
// Documents 1-3 exist on both old and new primary -> duplicates after replication.

The Benchmark

| Scenario | Duplicates during failover | Throughput | Latency during election |
|---|---|---|---|
| insertMany unordered, no dedup | 50-500 per failover | 22,000/s | 15-30s errors, then recovery |
| insertOne loop (retryable) | 0 | 18,000/s | 15-30s errors, auto-retry |
| insertMany ordered (retryable) | 0 | 20,000/s | 15-30s errors, retry from failure point |
| insertMany unordered + unique index | 0 (deduped by index) | 21,500/s | 15-30s errors, DuplicateKeyException on dupes |

The Fix

Option 1: Use ordered insertMany (retryable).

// FAST: Ordered insertMany retries correctly during failover
public void ingestBatch(List<Document> readings) {
    try {
        collection.insertMany(readings, new InsertManyOptions().ordered(true));
    } catch (MongoBulkWriteException e) {
        // Some documents may have succeeded before the error
        int insertedCount = e.getWriteResult().getInsertedCount();
        if (insertedCount < readings.size()) {
            // Retry remaining documents
            List<Document> remaining = readings.subList(insertedCount, readings.size());
            collection.insertMany(remaining, new InsertManyOptions().ordered(true));
        }
    }
}

Option 2: Add a unique compound index for natural deduplication.

// FAST: Unique index prevents duplicates regardless of retry behavior
collection.createIndex(
    Indexes.compoundIndex(
        Indexes.ascending("sensorId"),
        Indexes.ascending("ts"),
        Indexes.ascending("seqNum")  // sequence number within same timestamp
    ),
    new IndexOptions().unique(true)
);

// Insert with duplicate handling
public void ingestBatch(List<Document> readings) {
    try {
        collection.insertMany(readings,
            new InsertManyOptions().ordered(false));
    } catch (MongoBulkWriteException e) {
        // Filter out duplicate key errors (expected during retry)
        long realErrors = e.getWriteErrors().stream()
            .filter(err -> err.getCode() != 11000)  // 11000 = DuplicateKey
            .count();
        if (realErrors > 0) {
            throw e;  // Non-duplicate errors are real failures
        }
        // All errors are duplicates -> safe to ignore
    }
}
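
To see why the unique index makes a replayed batch harmless, the sketch below models the collection as a set of compound keys. It is an in-memory illustration only, assuming unordered semantics where every document is attempted; `UniqueKeyModel` and its methods are hypothetical names, not driver APIs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a collection with a unique index on (sensorId, ts, seqNum). */
class UniqueKeyModel {
    private final Set<String> uniqueIndex = new HashSet<>();
    private int duplicateKeyErrors = 0;

    /** Builds the compound key; the "|" separator is an arbitrary choice for this sketch. */
    static String key(String sensorId, long ts, int seqNum) {
        return sensorId + "|" + ts + "|" + seqNum;
    }

    /** Unordered semantics: every document is attempted; duplicates are counted and skipped. */
    void insertUnordered(List<String> keys) {
        for (String k : keys) {
            if (!uniqueIndex.add(k)) {
                duplicateKeyErrors++; // stands in for error code 11000
            }
        }
    }

    int storedCount() {
        return uniqueIndex.size();
    }

    int duplicateKeyErrors() {
        return duplicateKeyErrors;
    }
}
```

If three of five readings land before a failover and the whole batch is then replayed, the model ends with five stored keys and three duplicate-key rejections, which is the case the catch block in Option 2 treats as safe to ignore.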

Option 3: Use the driver’s built-in retry with insertOne in a loop.

// FAST: Individual retryable inserts (lower throughput but zero duplicates)
public void ingestBatch(List<Document> readings) {
    for (Document reading : readings) {
        try {
            collection.insertOne(reading);
        } catch (MongoException e) {
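            // Retryable writes are labeled "RetryableWriteError";
            // TransientTransactionError applies only to multi-document transactions.
            if (e.hasErrorLabel("RetryableWriteError")) {
                // The driver's automatic retry has already failed by the time
                // this exception surfaces, so propagate it to the caller.
                throw e;
            }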
            logger.error("Insert failed for sensor {}: {}",
                reading.getString("sensorId"), e.getMessage());
        }
    }
}

The Proof

After adding the unique compound index and handling duplicate key errors:

| Metric | Before | After |
|---|---|---|
| Duplicates per failover | 50-500 | 0 |
| Ingestion throughput | 22,000/s | 21,500/s (2.3% reduction) |
| Failover recovery time | 15-30s | 15-30s (unchanged) |
| Storage waste from duplicates | 0.02% per failover | 0% |

The Trade-off

The unique index adds write overhead: every insert must check uniqueness before inserting. On a collection with 500 million documents, the unique index check adds approximately 0.5ms per insert. For the telemetry platform at 50,000 inserts/second, this is 25 CPU-seconds per second of additional index checking across the cluster.
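
The 25 CPU-seconds figure is the per-insert cost multiplied by the insert rate. A small sketch of that arithmetic follows; the class and method names are hypothetical, and the calculation assumes the 0.5ms check is pure CPU time, which is a simplification.

```java
/** Back-of-the-envelope estimate of unique index checking cost. */
class IndexOverheadEstimate {
    /** CPU-seconds consumed per wall-clock second by the uniqueness check. */
    static double cpuSecondsPerSecond(double checkMillisPerInsert, double insertsPerSecond) {
        // milliseconds per insert * inserts per second = milliseconds per second;
        // dividing by 1000 converts to seconds per second.
        return checkMillisPerInsert * insertsPerSecond / 1000.0;
    }
}
```

Calling `IndexOverheadEstimate.cpuSecondsPerSecond(0.5, 50_000)` yields 25, which under the same assumption corresponds to roughly 25 fully busy cores spread across the cluster.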

Ordered insertMany is slower than unordered because the server must apply the documents one at a time, in order, and stop at the first error, so the inserts cannot be parallelized. For the telemetry platform’s batch size of 100 documents, ordered inserts are 10-15% slower than unordered.

The natural deduplication approach (unique index + ignore duplicate key errors) is the most robust. It handles duplicates from any source: driver retries, application retries, or upstream duplicate messages. The cost is the unique index overhead, which is negligible for the telemetry workload where the compound key {sensorId, ts, seqNum} already exists for query performance.

updateMany and deleteMany remain non-retryable. For these operations, the application must implement its own idempotency. Use a “processed” flag or a separate tracking collection to record which operations completed, and check before re-executing.
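
The tracking approach can be sketched as a check-then-record wrapper around the multi-document operation. The version below uses an in-memory set as a stand-in for a tracking collection with a unique index on the operation ID; `IdempotentRunner`, `runOnce`, and the operation IDs are hypothetical names for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Runs each logical operation at most once, keyed by a caller-supplied operation ID. */
class IdempotentRunner {
    // Stand-in for a tracking collection with a unique index on the operation ID.
    private final Set<String> completedOperationIds = new HashSet<>();

    /** Runs the operation only if operationId has not been recorded; returns true if it ran. */
    boolean runOnce(String operationId, Runnable operation) {
        if (completedOperationIds.contains(operationId)) {
            return false; // already completed: skip re-execution
        }
        operation.run();
        completedOperationIds.add(operationId);
        return true;
    }
}
```

In a real deployment the operation and the tracking record would need to be committed together (for example, in one multi-document transaction); otherwise a crash between the two steps would let the operation run again on retry.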