Write Concern Interaction with Retryable Writes

The Symptom

During a primary election (triggered by a rolling restart), the telemetry platform’s ingestion layer throws MongoNotPrimaryException errors. After the election completes (15-30 seconds), some sensor readings appear twice in the database.

The Cause

When the primary steps down, in-flight writes receive a NotPrimaryError, and the driver retries them on the new primary. But the old primary may have already applied a write, and replicated it to the member that becomes the new primary, before stepping down. In that case the document already exists on the new primary, and a retry that the server does not recognize as a repeat inserts it a second time.

Retryable writes were introduced in MongoDB 3.6, and drivers compatible with MongoDB 4.2 and later enable them by default. Each retryable write carries a server-side transaction ID (a session ID plus a transaction number) that makes a retried operation idempotent: if the server has already applied a write with the same transaction ID, it returns success without applying it again.
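
The deduplication idea can be illustrated with a small in-memory model. This is only a sketch, not driver or server code: the class RetryableWriteModel, its methods, and the choice to model the ID as a session ID plus a counter are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory model of a server that deduplicates retried writes. */
final class RetryableWriteModel {
    private final List<String> documents = new ArrayList<>();
    private final Set<String> appliedWrites = new HashSet<>();

    /**
     * Applies the insert unless the same (sessionId, txnNumber) pair was seen before.
     * Returns true if the document was newly written, false if the call was a repeat.
     */
    boolean insert(String sessionId, long txnNumber, String document) {
        String writeId = sessionId + ":" + txnNumber;
        if (!appliedWrites.add(writeId)) {
            return false; // repeat of an applied write: acknowledge, do not reapply
        }
        documents.add(document);
        return true;
    }

    int documentCount() {
        return documents.size();
    }

    public static void main(String[] args) {
        RetryableWriteModel server = new RetryableWriteModel();
        server.insert("session-A", 1L, "{sensorId: 's1', value: 20.5}");
        // The client never saw the acknowledgement, so it resends with the same IDs.
        boolean appliedAgain = server.insert("session-A", 1L, "{sensorId: 's1', value: 20.5}");
        System.out.println("applied again: " + appliedAgain
            + ", documents: " + server.documentCount());
    }
}
```

In this model the second call is acknowledged without storing a second copy, which is the property a retry relies on.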

However, retryable writes only work for specific operations:

Operation	Retryable	Notes
`insertOne`	Yes	Idempotent via transaction ID
`insertMany` (ordered)	Yes	Retries from the failed point
`updateOne`	Yes	Same update applied once
`deleteOne`	Yes	Same delete applied once
`insertMany` (unordered)	Partial	May retry already-succeeded inserts
`updateMany`	No	Cannot guarantee idempotency
`deleteMany`	No	Cannot guarantee idempotency
`bulkWrite` (mixed)	Partial	Each op retried individually

// Retryable writes are enabled by default in MongoDB Java Driver 4.0+.
// Setting retryWrites=true in the connection string makes the choice explicit.
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;

MongoClientSettings settings = MongoClientSettings.builder()
    .applyConnectionString(new ConnectionString(
        "mongodb://host1,host2,host3/?retryWrites=true"))
    .build();

The duplicate readings occurred because the application used insertMany with ordered: false:

// UNSAFE: Unordered insertMany can produce duplicates during failover
collection.insertMany(readings, new InsertManyOptions().ordered(false));
// If the primary steps down after inserting 3 of 5 documents,
// the retry inserts all 5 documents on the new primary.
// Documents 1-3 exist on both old and new primary -> duplicates after replication.
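
The arithmetic of that scenario can be reproduced with a toy model that uses plain lists in place of the database. The class PartialBatchRetryModel and its method names are hypothetical; the sketch assumes the retry resends the full batch and that nothing on the receiving side deduplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory model of a batch resent in full after a partial success. */
final class PartialBatchRetryModel {

    /** Stores the first appliedBeforeFailure documents, as if the primary then stepped down. */
    static void applyPartially(List<String> store, List<String> batch, int appliedBeforeFailure) {
        store.addAll(batch.subList(0, appliedBeforeFailure));
    }

    /** Resends the whole batch with no deduplication on the receiving side. */
    static void retryWholeBatch(List<String> store, List<String> batch) {
        store.addAll(batch);
    }

    /** Counts stored documents beyond the first copy of each distinct value. */
    static long countDuplicates(List<String> store) {
        return store.size() - store.stream().distinct().count();
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        List<String> store = new ArrayList<>();
        applyPartially(store, batch, 3); // r1..r3 land before the step-down
        retryWholeBatch(store, batch);   // all five are sent again
        System.out.println(store.size() + " stored, "
            + countDuplicates(store) + " duplicates");
    }
}
```

With 3 of 5 documents applied before the failure, the store ends up holding 8 documents, 3 of them duplicates.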

The Benchmark

Scenario	Duplicates during failover	Throughput	Latency during election
`insertMany` unordered, no dedup	50-500 per failover	22,000/s	15-30s errors, then recovery
`insertOne` loop (retryable)	0	18,000/s	15-30s errors, auto-retry
`insertMany` ordered (retryable)	0	20,000/s	15-30s errors, retry from failure point
`insertMany` unordered + unique index	0 (deduped by index)	21,500/s	15-30s errors, DuplicateKeyException on dupes

The Fix

Option 1: Use ordered insertMany (retryable).

// FAST: Ordered insertMany retries correctly during failover
public void ingestBatch(List<Document> readings) {
    InsertManyOptions ordered = new InsertManyOptions().ordered(true);
    try {
        collection.insertMany(readings, ordered);
    } catch (MongoBulkWriteException e) {
        // An ordered insert stops at the first failing document, so the
        // inserted count is also the index of the document that failed.
        int insertedCount = e.getWriteResult().getInsertedCount();
        if (insertedCount < readings.size()) {
            // Retry once from the failed document onward. If that document
            // fails again, the exception propagates to the caller.
            List<Document> remaining = readings.subList(insertedCount, readings.size());
            collection.insertMany(remaining, ordered);
        }
    }
}

Option 2: Add a unique compound index for natural deduplication.

// FAST: Unique index prevents duplicates regardless of retry behavior
collection.createIndex(
    Indexes.compoundIndex(
        Indexes.ascending("sensorId"),
        Indexes.ascending("ts"),
        Indexes.ascending("seqNum")  // sequence number within same timestamp
    ),
    new IndexOptions().unique(true)
);

// Insert with duplicate handling
public void ingestBatch(List<Document> readings) {
    try {
        collection.insertMany(readings,
            new InsertManyOptions().ordered(false));
    } catch (MongoBulkWriteException e) {
        // Filter out duplicate key errors (expected during retry)
        long realErrors = e.getWriteErrors().stream()
            .filter(err -> err.getCode() != 11000)  // 11000 = DuplicateKey
            .count();
        if (realErrors > 0) {
            throw e;  // Non-duplicate errors are real failures
        }
        // All errors are duplicates -> safe to ignore
    }
}
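
Why a unique key makes resends harmless can be shown without a database. The following is a minimal sketch in which a map stands in for the unique index; the class UniqueKeyDedupModel and its methods are hypothetical, and a false return value plays the role of a duplicate key error.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of a unique compound key on (sensorId, ts, seqNum). */
final class UniqueKeyDedupModel {
    private final Map<List<Object>, Double> readingsByKey = new HashMap<>();

    /**
     * Stores the reading unless one with the same key already exists.
     * Returns false for a rejected duplicate, the analogue of a duplicate key error.
     */
    boolean insert(String sensorId, long ts, int seqNum, double value) {
        List<Object> key = Arrays.<Object>asList(sensorId, ts, seqNum);
        return readingsByKey.putIfAbsent(key, value) == null;
    }

    int size() {
        return readingsByKey.size();
    }

    public static void main(String[] args) {
        UniqueKeyDedupModel collection = new UniqueKeyDedupModel();
        collection.insert("s1", 1000L, 0, 20.5);
        boolean resendAccepted = collection.insert("s1", 1000L, 0, 20.5); // resend of the same reading
        collection.insert("s1", 1000L, 1, 20.7); // same timestamp, next sequence number
        System.out.println("resend accepted: " + resendAccepted
            + ", stored: " + collection.size());
    }
}
```

The resend is rejected while the reading with the next sequence number is kept, so the outcome does not depend on how many times a batch is delivered.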

Option 3: Use the driver’s built-in retry with insertOne in a loop.

// FAST: Individual retryable inserts (lower throughput but zero duplicates)
public void ingestBatch(List<Document> readings) {
    for (Document reading : readings) {
        try {
            collection.insertOne(reading);
        } catch (MongoException e) {
            if (e.hasErrorLabel("RetryableWriteError")) {
                // The driver's automatic retry has already been attempted and
                // failed; rethrow so the caller can back off and resubmit.
                throw e;
            }
            logger.error("Insert failed for sensor {}: {}",
                reading.getString("sensorId"), e.getMessage());
        }
    }
}

The Proof

After adding the unique compound index and handling duplicate key errors:

Metric	Before	After
Duplicates per failover	50-500	0
Ingestion throughput	22,000/s	21,500/s (2.3% reduction)
Failover recovery time	15-30s	15-30s (unchanged)
Storage waste from duplicates	0.02% per failover	0%

The Trade-off

The unique index adds write overhead: every insert must check uniqueness before inserting. On a collection with 500 million documents, the unique index check adds approximately 0.5ms per insert. For the telemetry platform at 50,000 inserts/second, this is 25 CPU-seconds per second of additional index checking across the cluster.
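
The figure follows directly from the two numbers above, assuming the 0.5 ms per insert is entirely CPU time:

```latex
\[
0.5\ \text{ms/insert} \times 50{,}000\ \text{inserts/s}
  = 25{,}000\ \text{ms/s}
  = 25\ \text{CPU-seconds per second}
\]
```

Under that assumption, the check consumes roughly the capacity of 25 fully busy cores spread across the cluster.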

Ordered insertMany is slower than unordered because the inserts must be applied one after another and processing stops at the first error, so the work cannot be parallelized. For the telemetry platform’s batch size of 100 documents, ordered inserts are 10-15% slower than unordered.

The natural deduplication approach (unique index + ignore duplicate key errors) is the most robust. It handles duplicates from any source: driver retries, application retries, or upstream duplicate messages. The cost is the unique index overhead, which stays small for the telemetry workload because a compound index on {sensorId, ts, seqNum} already exists for query performance, so only the uniqueness check is new.

updateMany and deleteMany remain non-retryable. For these operations, the application must implement its own idempotency. Use a “processed” flag or a separate tracking collection to record which operations completed, and check before re-executing.
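
A tracking-collection approach might look like the following in-memory sketch. The class OperationLedger, the runOnce method, and the operation ID format are all hypothetical. The sketch also assumes a single caller; a real implementation would need to record completion atomically with the operation itself, since a crash between the two steps would leave the operation applied but unrecorded.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory stand-in for a collection that tracks completed operations. */
final class OperationLedger {
    private final Set<String> completed = new HashSet<>();

    /**
     * Runs the operation only if operationId has not completed before.
     * Returns true if the operation ran, false if it was skipped as already done.
     */
    boolean runOnce(String operationId, Runnable operation) {
        if (completed.contains(operationId)) {
            return false;
        }
        operation.run();
        // Recorded only after success: if run() throws, a later call can try again.
        completed.add(operationId);
        return true;
    }

    public static void main(String[] args) {
        OperationLedger ledger = new OperationLedger();
        AtomicInteger executions = new AtomicInteger();
        ledger.runOnce("op-42", executions::incrementAndGet);
        ledger.runOnce("op-42", executions::incrementAndGet); // re-execution after a suspected failure
        System.out.println("executions: " + executions.get());
    }
}
```

The second call with the same ID is skipped, so a bulk update or delete guarded this way takes effect once even if the caller re-executes it.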