unbound mongodb at scale

Ticket Count Tuning and Write-Heavy Workload Balancing

Chapter 42 of 72

The Symptom

The telemetry platform runs a mixed workload: 70% reads (dashboard queries) and 30% writes (sensor ingestion). During peak ingestion (batch imports from offline sensors), the write share jumps to 80%. During these imports, write tickets run out while the read ticket pool stays mostly idle.

Checking ticket utilization during batch import:

db.serverStatus().wiredTiger.concurrentTransactions
{
  "write": { "out": 125, "available": 3, "totalTickets": 128 },
  "read": { "out": 40, "available": 88, "totalTickets": 128 }
}

Write tickets are nearly exhausted (125 of 128 in use), while read tickets have ample headroom (40 of 128 in use). With only 3 write tickets free, any further burst of writes must queue for a ticket, so the batch import’s write throughput is capped by the write ticket pool.
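The reading above can be automated. The following is a minimal sketch, assuming the input has the same shape as the concurrentTransactions document shown earlier; the function name ticketUtilization and the 90% saturation threshold are illustrative choices, not MongoDB defaults.

```javascript
// Summarize a concurrentTransactions document per ticket pool.
// The 0.9 threshold is an assumed cut-off for "nearly exhausted".
function ticketUtilization(concurrentTransactions, threshold = 0.9) {
  const summary = {};
  for (const kind of ["read", "write"]) {
    const { out, totalTickets } = concurrentTransactions[kind];
    const utilization = out / totalTickets;
    summary[kind] = { utilization, saturated: utilization >= threshold };
  }
  return summary;
}

// Values copied from the serverStatus output above.
const sample = {
  write: { out: 125, available: 3, totalTickets: 128 },
  read: { out: 40, available: 88, totalTickets: 128 },
};
const result = ticketUtilization(sample);
console.log("write saturated:", result.write.saturated);
console.log("read saturated:", result.read.saturated);
```

A check like this could run from a monitoring job so that write saturation is flagged before queueing becomes visible in latency.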

The Cause

The default 128/128 split assumes a balanced workload. During batch imports, 128 write tickets are insufficient. Each bulk write operation holds a write ticket while WiredTiger processes the write, updates indexes, and journals the change. At the ingestion rate of 5,000 writes/sec during batch import, with each write taking 3ms, the required concurrent write capacity is:

$$\text{requiredWriteTickets} = 5000 \times 0.003 = 15$$

Fifteen tickets should be sufficient. But the batch import uses bulkWrite with batches of 10,000 documents. Each bulkWrite holds a single write ticket for the entire batch duration (200ms). With 20 concurrent batch writers:

$$\text{ticketsInUse} = 20 \times 1 = 20$$

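Twenty tickets is still far below the 125 observed in use. Each bulkWrite call is processed internally in sub-batches, and internal WiredTiger work such as checkpoints and cache eviction can lengthen the time each ticket is held. The effective ticket consumption during batch import is therefore well above the simple estimate.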
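The two estimates above are applications of Little's law (average concurrency equals arrival rate times time in the system). A small sketch, with hypothetical function names, reproduces the arithmetic using the figures from the text:

```javascript
// Little's law: tickets needed = operations per second x seconds each ticket is held.
// Hold time is passed in milliseconds to keep the arithmetic in whole numbers.
function requiredTickets(opsPerSecond, holdTimeMs) {
  return Math.ceil((opsPerSecond * holdTimeMs) / 1000);
}

// Batch model: every concurrent writer holds a fixed number of tickets.
function batchTickets(concurrentWriters, ticketsPerWriter) {
  return concurrentWriters * ticketsPerWriter;
}

// Single-write model from the text: 5,000 writes/sec at 3 ms each.
console.log("single-write estimate:", requiredTickets(5000, 3));
// Batch model from the text: 20 writers, one ticket per bulkWrite.
console.log("batch estimate:", batchTickets(20, 1));
```

Both models describe client-issued writes only, which is one reason they sit below the observed ticket usage.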

The Benchmark

Test different ticket configurations under mixed workload:

// k6 test: mixed workload with varying ticket configurations
// Run separately against MongoDB instances configured with different ticket counts
// Note: dashboardRead and batchIngest must be exported functions in this script (not shown here)

export const options = {
  scenarios: {
    reads: {
      executor: 'constant-arrival-rate',
      rate: 500,
      timeUnit: '1s',
      duration: '3m',
      preAllocatedVUs: 50,
      exec: 'dashboardRead',
    },
    writes: {
      executor: 'constant-arrival-rate',
      rate: 5000,
      timeUnit: '1s',
      duration: '3m',
      preAllocatedVUs: 200,
      exec: 'batchIngest',
    },
  },
};

Results at different ticket configurations:

| Configuration | Read p99 | Write p99 | Write throughput |
|---|---|---|---|
| R:128 / W:128 (default) | 25ms | 180ms | 3,200 ops/s |
| R:128 / W:256 | 28ms | 45ms | 4,800 ops/s |
| R:64 / W:256 | 35ms | 38ms | 5,000 ops/s |
| R:256 / W:256 | 22ms | 42ms | 4,900 ops/s |

The Fix

For the write-heavy batch import scenario, increase write tickets:

# mongod.conf - asymmetric ticket configuration for write-heavy workload
setParameter:
  wiredTigerConcurrentReadTransactions: 128
  wiredTigerConcurrentWriteTransactions: 256

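This can also be changed at runtime without a restart. A runtime change applies only to the mongod that receives the command and does not survive a restart unless it is also set in the configuration file: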

db.adminCommand({ setParameter: 1, wiredTigerConcurrentWriteTransactions: 256 })

For mixed workloads, consider a dynamic approach: increase write tickets during batch import windows and revert after:

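// FAST: Dynamic ticket adjustment for batch import
// Note: setParameter only runs against the admin database, so `database`
// here must be obtained with mongoClient.getDatabase("admin").
public void startBatchImport() {
    // Raise the write ticket limit before the import starts
    database.runCommand(new Document("setParameter", 1)
        .append("wiredTigerConcurrentWriteTransactions", 256));
}

public void endBatchImport() {
    // Restore the default once the import finishes
    database.runCommand(new Document("setParameter", 1)
        .append("wiredTigerConcurrentWriteTransactions", 128));
}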
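One risk with a start/end pair is that a failed import never reaches the end call and leaves the raised limit in place. The sketch below shows one way to guard against that with try/finally. It is an illustration, not the platform's actual code: withWriteTickets and runAdminCommand are hypothetical names, and runAdminCommand is assumed to be an async function supplied by the caller that sends a command document to the admin database.

```javascript
// Raise the write ticket limit for the duration of importFn, then always restore it.
// runAdminCommand is a hypothetical caller-supplied async function that runs a
// command document against the admin database.
async function withWriteTickets(runAdminCommand, boosted, baseline, importFn) {
  await runAdminCommand({
    setParameter: 1,
    wiredTigerConcurrentWriteTransactions: boosted,
  });
  try {
    return await importFn();
  } finally {
    // Runs on success and on failure, so the baseline is restored either way.
    await runAdminCommand({
      setParameter: 1,
      wiredTigerConcurrentWriteTransactions: baseline,
    });
  }
}
```

A caller would pass the import as a function, for example `withWriteTickets(run, 256, 128, () => importBatch(files))`, where `run` and `importBatch` are likewise placeholders.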

The Proof

After increasing write tickets to 256 during batch import:

| Metric | W:128 | W:256 |
|---|---|---|
| Batch import duration | 45 min | 28 min |
| Write p99 during import | 180ms | 45ms |
| Read p99 during import | 25ms | 28ms |
| Write ticket exhaustion events | 2,300/min | 0/min |

The Trade-off

More write tickets mean more concurrent writes accessing WiredTiger simultaneously. This increases contention on internal WiredTiger structures: the B-tree page split lock, the eviction lock, and the cache management structures. On a 4-core server, 256 concurrent write threads cause excessive context switching. On a 16-core server with fast NVMe storage, 256 tickets are sustainable.

The guideline: total tickets (read + write) should not exceed 10x the CPU core count. On a 16-core server: max 160 tickets total. On a 32-core server: max 320. Beyond this, the overhead of context switching and lock contention outweighs the benefit of additional concurrency.
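The guideline is simple enough to check mechanically. A minimal sketch follows; the function name ticketBudget is hypothetical and the factor of 10 comes from the guideline above.

```javascript
// Guideline from the text: read + write tickets should not exceed 10x the CPU core count.
function ticketBudget(cpuCores, readTickets, writeTickets, factor = 10) {
  const maxTotal = cpuCores * factor;
  const total = readTickets + writeTickets;
  return { maxTotal, total, withinBudget: total <= maxTotal };
}

// A 16-core server with 64 read and 96 write tickets sits exactly at the limit.
console.log(ticketBudget(16, 64, 96));
// The 128/256 configuration on a 32-core server.
console.log(ticketBudget(32, 128, 256));
```

By this check the 128/256 configuration totals 384 tickets and exceeds the budget even on 32 cores, so the guideline is probably best treated as a starting point to validate with a benchmark on the target hardware.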

MongoDB 7.0 introduced an adaptive ticket mechanism (throughputProbing) that adjusts ticket counts up or down based on observed throughput. In 7.0 it is the default algorithm, and explicitly setting the read or write ticket parameters switches the server back to fixed ticket counts. It suits workloads with variable read/write ratios. To select it explicitly:

# mongod.conf - enable adaptive ticket control
setParameter:
  storageEngineConcurrencyAdjustmentAlgorithm: "throughputProbing"