When Your Database Goes Down for 25 Minutes: Building a Survival Cache

These articles are AI-generated summaries. Please check the original sources for full details.

The Problem Nobody Talks About

Most caching tutorials focus on performance during normal operation and neglect prolonged database outages. With a 5-minute TTL on cached data, every entry expires within the first 5 minutes of a 25-minute outage, so the service has nothing to serve for at least the remaining 20 minutes and availability drops to zero.
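
The arithmetic behind that claim can be sketched in a few lines. This is only an illustration: the class and method names are hypothetical and do not come from the original article. It assumes the most favorable case, where an entry was refreshed just before the outage began, so the result is a lower bound on the time with no cached data.

```java
import java.time.Duration;

final class OutageMath {
    /**
     * Lower bound on the time a TTL-only cache has nothing to serve.
     * Even an entry refreshed right before the outage expires after one TTL.
     */
    static Duration minUncovered(Duration outage, Duration ttl) {
        Duration gap = outage.minus(ttl);
        return gap.isNegative() ? Duration.ZERO : gap;
    }

    public static void main(String[] args) {
        Duration gap = minUncovered(Duration.ofMinutes(25), Duration.ofMinutes(5));
        System.out.println(gap.toMinutes() + " minutes without data"); // prints "20 minutes without data"
    }
}
```

Entries that were already close to expiry when the outage started leave an even longer gap, approaching the full 25 minutes.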

Why This Matters

Traditional caching solutions often fail during extended outages because they rely on a functioning database to repopulate the cache. This makes the database a single point of failure: any outage that outlasts the TTL leaves the service with nothing to serve, which can translate to significant financial losses and user disruption.

Key Insights

  • RocksDB compression: Achieves 5.6x compression of config data using LZ4, reducing disk space and I/O.
  • Cache eviction persistence: Writing evicted cache entries to disk allows serving stale data during outages, maintaining service availability.
  • Tiered cache architecture: Utilizing L1 (memory), L2 (database), and L3 (disk) provides a layered defense against database failures.
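
The eviction-persistence and tiering ideas above can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's implementation: `TieredCache` and `Loader` are hypothetical names, the disk tier is simulated with an in-memory map standing in for a RocksDB-backed store, and eviction is size-based only (no TTL handling).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the database; expected to throw while the database is down. */
interface Loader {
    String load(String key) throws Exception;
}

final class TieredCache {
    // Assumption: a plain map stands in for the on-disk store (RocksDB in the article).
    private final Map<String, String> disk = new ConcurrentHashMap<>();
    private final Map<String, String> memory;
    private final Loader database;

    TieredCache(int maxEntries, Loader database) {
        this.database = database;
        // Access-ordered LinkedHashMap acts as a small LRU; evicted entries are spilled to disk.
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > maxEntries) {
                    disk.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    synchronized Optional<String> get(String key) {
        String cached = memory.get(key);
        if (cached != null) {
            return Optional.of(cached);                 // L1: memory hit
        }
        try {
            String fresh = database.load(key);          // L2: database
            if (fresh != null) {
                memory.put(key, fresh);
            }
            return Optional.ofNullable(fresh);
        } catch (Exception outage) {
            return Optional.ofNullable(disk.get(key));  // L3: possibly stale copy from disk
        }
    }

    public static void main(String[] args) {
        boolean[] down = {false};
        TieredCache cache = new TieredCache(1, key -> {
            if (down[0]) throw new IllegalStateException("database down");
            return "v-" + key;
        });
        cache.get("a");
        cache.get("b");   // evicts "a" from memory and spills it to the disk tier
        down[0] = true;   // simulate the outage
        System.out.println(cache.get("a").orElse("unavailable")); // prints "v-a", served from disk
    }
}
```

The trade-off in this sketch is freshness for availability: during an outage the disk tier may return a value that is older than the TTL would normally allow, which suits slowly changing data such as configuration.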

Working Example

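import com.fasterxml.jackson.databind.ObjectMapper;
import org.rocksdb.CompressionType;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class RocksDBDiskStore implements AutoCloseable {
    private final Options options;
    private final RocksDB db;
    private final ObjectMapper mapper; // JSON (de)serialization of cached values

    public RocksDBDiskStore(String path) throws RocksDBException {
        RocksDB.loadLibrary(); // load the native RocksDB library once per JVM
        this.options = new Options()
                .setCreateIfMissing(true)
                .setCompressionType(CompressionType.LZ4_COMPRESSION)
                .setMaxOpenFiles(256)
                .setWriteBufferSize(8 * 1024 * 1024); // 8 MB write buffer
        this.db = RocksDB.open(options, path);
        this.mapper = new ObjectMapper();
    }

    @Override
    public void close() {
        db.close();      // release native handles, database first
        options.close();
    }
}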

Practical Applications

  • Config Service at Netflix: Uses a similar strategy of local caching with a disk persistence layer to maintain service availability during regional outages.
  • Pitfall: Relying solely on cache TTL without a disk persistence layer can lead to complete service failure during prolonged database outages.
