Capacity Estimation and Back-of-Envelope Calculations

Capacity estimation provides a quantitative foundation for scaling systems, transforming abstract requirements into concrete resource demands. This section defines core metrics and demonstrates rapid, approximate calculations to assess queries per second, storage, bandwidth, and latency, equipping engineers to validate architectural feasibility under realistic constraints.

Key Definitions and Metrics

Precise terminology anchors estimation. Daily Active Users (DAU) counts the unique users who engage with the system within 24 hours and serves as the baseline for request volume. Queries Per Second (QPS) derives from daily activity: QPS = (DAU × actions per user per day) / 86,400 seconds, with a peak factor of 2-3x applied to model traffic spikes. Storage estimation multiplies objects per day by size per object and retention period, then applies a replication factor for redundancy. Bandwidth splits into ingress (incoming) and egress (outgoing) flows: ingress ≈ QPS × average request size, egress ≈ QPS × average response size.

Latency numbers approximate access times to within an order of magnitude: RAM ~100 nanoseconds, SSD ~100 microseconds, network within a datacenter ~500 microseconds, cross-region network ~50 milliseconds. Power-of-two approximations round each binary unit to the nearest power of ten: 1KB = 2¹⁰ ≈ 10³ bytes, 1MB = 2²⁰ ≈ 10⁶ bytes, 1GB = 2³⁰ ≈ 10⁹ bytes, 1TB = 2⁴⁰ ≈ 10¹² bytes. Similarly, the 86,400 seconds in a day round to ≈ 100,000, which understates QPS by about 14%. Back-of-envelope calculation denotes this rough, quick estimation technique, trading precision for speed in initial sizing.
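To see how much the rounding costs, the following minimal sketch compares decimal and binary unit sizes. The class name UnitGap and its methods are hypothetical helpers introduced only for illustration.

```java
// Hypothetical helper: compares decimal (10^3n) and binary (2^10n) unit sizes.
final class UnitGap {
    // Bytes in the n-th decimal unit: n = 1 is KB, 2 is MB, 3 is GB, 4 is TB.
    static long decimalUnit(int n) {
        long bytes = 1;
        for (int i = 0; i < n; i++) {
            bytes *= 1000L;
        }
        return bytes;
    }

    // Bytes in the n-th binary unit: n = 1 is 2^10, 2 is 2^20, and so on.
    static long binaryUnit(int n) {
        return 1L << (10 * n);
    }

    // Relative error, in percent, of using the decimal unit in place of the binary one.
    static double gapPercent(int n) {
        return 100.0 * (binaryUnit(n) - decimalUnit(n)) / binaryUnit(n);
    }

    public static void main(String[] args) {
        String[] names = {"KB", "MB", "GB", "TB"};
        for (int n = 1; n <= 4; n++) {
            System.out.printf("%s: decimal=%d binary=%d gap=%.1f%%%n",
                    names[n - 1], decimalUnit(n), binaryUnit(n), gapPercent(n));
        }
    }
}
```

The gap grows from roughly 2% at the kilobyte level to roughly 9% at the terabyte level, which is small next to the uncertainty in the inputs but worth keeping consistent within one calculation.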

Back-of-Envelope Calculation Techniques

Implement estimation logic using Java 21+ Records for immutability and clarity. The following code encapsulates key formulas, with time and space complexity O(1) for all methods due to direct arithmetic operations.

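// Immutable holder for the traffic inputs; every method is plain O(1) arithmetic.
public record CapacityEstimation(long dau, int actionsPerDay, double peakFactor) {
    // Average QPS: total daily actions spread evenly over 86,400 seconds.
    public double calculateAverageQPS() {
        return dau * actionsPerDay / 86400.0;
    }
    // Peak QPS: average QPS scaled by the peak factor (typically 2-3x).
    public double calculatePeakQPS() {
        return calculateAverageQPS() * peakFactor;
    }
    // Total bytes stored: daily volume × object size × retention × replicas.
    // Note: the long product wraps silently if it exceeds Long.MAX_VALUE (~9.2 × 10^18).
    public long calculateStorage(long objectsPerDay, long sizePerObject, int retentionDays, int replicationFactor) {
        return objectsPerDay * sizePerObject * retentionDays * replicationFactor;
    }
    // Bytes per second in one direction: pass the request size for ingress, the response size for egress.
    public double calculateBandwidth(double qps, long averageResponseSize) {
        return qps * averageResponseSize;
    }
    // Time Complexity: O(1) for all methods, Space Complexity: O(1) auxiliary space.
}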

// Example usage for Instagram photo storage.
// Declared non-public so it can share a source file with the public record above.
class InstagramStorageEstimator {
    public static void main(String[] args) {
        // 1B DAU, 1 action per user per day, peak factor 2.5
        CapacityEstimation est = new CapacityEstimation(1_000_000_000L, 1, 2.5);
        double avgQPS = est.calculateAverageQPS();  // about 11,574
        double peakQPS = est.calculatePeakQPS();    // about 28,935
        // 100M photos/day, 1MB each, 5 years (1,825 days), replication factor 3
        long storage = est.calculateStorage(100_000_000L, 1_000_000L, 1825, 3);
        System.out.println("Average QPS: " + avgQPS);
        System.out.println("Peak QPS: " + peakQPS);
        System.out.println("Storage needed: " + storage + " bytes"); // 5.475 × 10^17 bytes, i.e. 547.5 PB
    }
}
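The storage product above is computed in long arithmetic, which wraps silently past roughly 9.2 × 10^18 bytes. The following sketch shows one way to fail loudly instead, using the standard Math.multiplyExact; the class name SafeStorage is hypothetical and not part of the listing above.

```java
// Hypothetical variant of calculateStorage that throws instead of wrapping on overflow.
final class SafeStorage {
    static long storageBytes(long objectsPerDay, long sizePerObject,
                             int retentionDays, int replicationFactor) {
        // Math.multiplyExact throws ArithmeticException when the long result overflows.
        long perDay = Math.multiplyExact(objectsPerDay, sizePerObject);
        long raw = Math.multiplyExact(perDay, (long) retentionDays);
        return Math.multiplyExact(raw, (long) replicationFactor);
    }

    public static void main(String[] args) {
        // Instagram inputs from the example: the result fits in a long.
        System.out.println(storageBytes(100_000_000L, 1_000_000L, 1825, 3));
        try {
            // Objects 100x larger over the same period exceed Long.MAX_VALUE.
            storageBytes(100_000_000L, 100_000_000L, 1825, 3);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

For estimates that may legitimately exceed the long range, switching the return type to double or BigInteger is a reasonable alternative, since back-of-envelope work rarely needs byte-exact results.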

Complexity analysis confirms efficiency, as shown in this table:

Operation	Time Complexity	Space Complexity	Notes
DAU to QPS Calculation	O(1)	O(1)	Simple arithmetic, constant time and space.
Storage Estimation	O(1)	O(1)	Multiplication of constants, no loops.
Bandwidth Calculation	O(1)	O(1)	Direct multiplication.
Latency-based Behavior Estimation	O(1) per lookup	O(1)	Lookup of predefined latency numbers, e.g. enum constants or a hash map.
Peak Factor Application	O(1)	O(1)	Multiplier applied to QPS.

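A CapacityEstimation Record has the same memory layout as an equivalent final class: Records guarantee shallow immutability and generate accessors, equals, hashCode, and toString, but they do not change how the JVM stores fields. The three components (long dau, int actionsPerDay, double peakFactor) occupy 20 bytes of field data after the object header, with no auxiliary arrays; on a typical 64-bit HotSpot JVM with compressed class pointers (a 12-byte header), that rounds to roughly 32 bytes per instance. When using virtual threads for concurrent estimations, such as in simulations from earlier chapters like ParallelFileProcessor, each thread's stack is stored on the heap and grows on demand, typically starting at a few kilobytes or less rather than the megabyte-scale reservation of a platform thread. That improves throughput for I/O-bound tasks without excessive memory cost; the arithmetic here is CPU-bound, so the benefit appears only when estimations wait on I/O such as fetching metrics.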
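As a rough illustration of the mechanics, the sketch below evaluates several scenarios on virtual threads using Executors.newVirtualThreadPerTaskExecutor (Java 21). The class ConcurrentScenarios and its Scenario record are hypothetical names; the tasks here are pure arithmetic, so the example shows the structure rather than a speedup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConcurrentScenarios {
    // Hypothetical scenario: one (DAU, actions per day, peak factor) combination.
    record Scenario(long dau, int actionsPerDay, double peakFactor) {
        double peakQps() {
            return dau * actionsPerDay / 86_400.0 * peakFactor;
        }
    }

    // Runs one virtual thread per scenario and returns results in input order.
    static List<Double> peakQpsForAll(List<Scenario> scenarios) {
        List<Callable<Double>> tasks = new ArrayList<>();
        for (Scenario s : scenarios) {
            tasks.add(s::peakQps);
        }
        // ExecutorService is AutoCloseable since Java 19; close() waits for submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Double> results = new ArrayList<>();
            for (Future<Double> future : executor.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while estimating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scenario failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        List<Scenario> scenarios = List.of(
                new Scenario(1_000_000_000L, 1, 2.5),
                new Scenario(100_000_000L, 10, 3.0));
        List<Double> peaks = peakQpsForAll(scenarios);
        for (int i = 0; i < scenarios.size(); i++) {
            System.out.printf("%s -> peak QPS %.0f%n", scenarios.get(i), peaks.get(i));
        }
    }
}
```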

Latency Benchmarks and System Behavior

Latency numbers set performance boundaries. RAM access at ~100ns enables rapid in-memory caches, while SSD reads at ~100µs suit warm storage layers. Network latency within a datacenter (~500µs) and across regions (~50ms) informs distributed system design, such as the partitioning strategies referenced in UrlShortenerAPI. The same benchmarks bound per-server throughput: a worker that waits ~100µs on each SSD read completes at most about 10,000 sequential reads per second, so no single server can approach 1 billion QPS. Horizontal scaling via load balancers and database shards becomes necessary, as discussed in the parent context on scaling trade-offs.
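The bound above can be sketched as a small calculation. The enum LatencyTier is a hypothetical construct that holds this section's approximate figures, and the model assumes one strictly sequential access per request, ignoring batching, pipelining, and parallel I/O, so it gives only an order-of-magnitude ceiling.

```java
// Hypothetical enum holding the section's approximate latency figures in nanoseconds.
enum LatencyTier {
    RAM(100L),                          // ~100 ns
    SSD(100_000L),                      // ~100 µs
    DATACENTER_NETWORK(500_000L),       // ~500 µs
    CROSS_REGION_NETWORK(50_000_000L);  // ~50 ms

    final long nanos;

    LatencyTier(long nanos) {
        this.nanos = nanos;
    }

    // Upper bound on strictly sequential operations per second for one worker.
    long sequentialOpsPerSecond() {
        return 1_000_000_000L / nanos;
    }

    // Workers needed for targetQps if each request performs one such access serially.
    long workersFor(long targetQps) {
        long perWorker = sequentialOpsPerSecond();
        return (targetQps + perWorker - 1) / perWorker; // ceiling division
    }

    public static void main(String[] args) {
        long targetQps = 1_000_000_000L; // the 1 billion QPS figure from the text
        for (LatencyTier tier : values()) {
            System.out.printf("%s: %d ops/s per worker, %d workers for %d QPS%n",
                    tier, tier.sequentialOpsPerSecond(), tier.workersFor(targetQps), targetQps);
        }
    }
}
```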

Example Calculations: Instagram Photo Storage and URL Shortener

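Concrete examples illustrate application. For Instagram photo storage, assume 1 billion DAU, 100 million photos per day, average size 1MB, retention period 5 years (1,825 days), and replication factor 3. Raw storage is 100 million × 1MB × 1,825 days ≈ 182.5 petabytes before replication and ≈ 547.5 petabytes with three replicas, which is the figure the code above prints in bytes. Upload traffic of 100 million photos per day averages ~1,157 QPS and peaks at ~2,893 QPS with a 2.5x factor. Counting one request per day from each of the 1 billion DAU, as the sample program does, gives ~11,574 QPS on average and ~28,935 QPS at peak.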

For a URL shortener, assume 100 million DAU creating 1 billion shortenings per day (10 per user). Average QPS ≈ 11,574, with a peak of ~34,722 QPS at a 3x factor. Storage: 1 billion shortenings per day × 100 bytes per record × 365 days of retention × replication factor 2 ≈ 73 TB. Bandwidth: at 11,574 QPS and 500 bytes per response, egress is ~5.8 MB/s. These numbers align with the interview template’s emphasis on deriving specific metrics.
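The URL shortener arithmetic can be reproduced with a few static helpers. This is a minimal sketch under the assumptions stated above; the class name UrlShortenerEstimate and its method names are hypothetical.

```java
// Hypothetical worked example: reproduces the URL shortener figures from the text.
final class UrlShortenerEstimate {
    static final double SECONDS_PER_DAY = 86_400.0;

    static double averageQps(long requestsPerDay) {
        return requestsPerDay / SECONDS_PER_DAY;
    }

    static double peakQps(long requestsPerDay, double peakFactor) {
        return averageQps(requestsPerDay) * peakFactor;
    }

    static long storageBytes(long recordsPerDay, long bytesPerRecord,
                             int retentionDays, int replicationFactor) {
        return recordsPerDay * bytesPerRecord * retentionDays * replicationFactor;
    }

    static double egressBytesPerSecond(double qps, long responseBytes) {
        return qps * responseBytes;
    }

    public static void main(String[] args) {
        long shorteningsPerDay = 1_000_000_000L;             // 100M DAU × 10 per user
        double avg = averageQps(shorteningsPerDay);          // about 11,574
        double peak = peakQps(shorteningsPerDay, 3.0);       // about 34,722
        long storage = storageBytes(shorteningsPerDay, 100L, 365, 2); // 7.3 × 10^13 bytes
        double egress = egressBytesPerSecond(avg, 500L);     // about 5.8 × 10^6 bytes/s

        System.out.printf("average QPS: %.0f%n", avg);
        System.out.printf("peak QPS: %.0f%n", peak);
        System.out.printf("storage: %.1f TB%n", storage / 1e12);
        System.out.printf("egress: %.1f MB/s%n", egress / 1e6);
    }
}
```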

Trade-Offs and Failure Modes

Explicit trade-offs guide decision-making. Compare scaling strategies and estimation techniques:

Aspect	Vertical Scaling	Horizontal Scaling
Complexity	Low (add resources to single server)	High (manage multiple servers, load balancing)
Cost	Higher per server, but simpler setup	Lower per server, but higher operational cost
Scalability	Limited by single server capacity	Highly scalable by adding more servers
Fault Tolerance	Single point of failure	Improved with redundancy
Best Use Case	Small to medium loads, simple systems	Large-scale, distributed systems

Estimation Technique	Pros	Cons
Power-of-two Approximations	Fast, simple, good for back-of-envelope	Less precise, may over/underestimate
Exact Calculations	Accurate, reliable for detailed planning	Slower, requires more data and computation
Using Latency Numbers	Realistic performance insights	Depends on specific hardware and network conditions

Common pitfalls necessitate vigilance. This checklist enumerates failure modes:

Forgetting replication factor in storage calculations.
Ignoring peak load factors, leading to under-provisioning.
Using unrealistic latency numbers (e.g., assuming all accesses are RAM-speed).
Not accounting for network bandwidth limits in distributed systems.
Overlooking retention period in storage estimates.
Applying power-of-two approximations inconsistently (e.g., using 1024 in one step and 1000 in the next).
Assuming linear scalability without considering bottlenecks like database locks.
Not validating assumptions with real-world data (e.g., user behavior changes).
Failing to include error margins in estimates.
Storing mutable objects (e.g., arrays or collections) in Record components, which undermines the shallow immutability that Records provide.

Mitigation involves validating assumptions against measured data and leveraging Java 21+ features, such as Records for immutability (as seen in LockFreeCounter), to enforce data integrity.
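Two of the pitfalls in the checklist, forgotten parameters and missing error margins, can be guarded against in the model itself. The following is a minimal sketch, assuming a hypothetical StorageEstimate record with a compact constructor for validation and a symmetric relative margin; neither the name nor the margin convention comes from the earlier listing.

```java
// Hypothetical record: validates inputs up front and reports a range, not a single point.
record StorageEstimate(long objectsPerDay, long bytesPerObject,
                       int retentionDays, int replicationFactor) {
    // Compact constructor: rejects values that usually signal a forgotten parameter.
    StorageEstimate {
        if (objectsPerDay <= 0 || bytesPerObject <= 0) {
            throw new IllegalArgumentException("object count and size must be positive");
        }
        if (retentionDays <= 0) {
            throw new IllegalArgumentException("retentionDays must be positive");
        }
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replicationFactor must be at least 1");
        }
    }

    // Total bytes as a double, which avoids long overflow at the cost of exactness.
    double totalBytes() {
        return (double) objectsPerDay * bytesPerObject * retentionDays * replicationFactor;
    }

    // Returns {low, high} for a symmetric relative margin, e.g. 0.2 for plus or minus 20%.
    double[] range(double margin) {
        double total = totalBytes();
        return new double[] {total * (1 - margin), total * (1 + margin)};
    }

    public static void main(String[] args) {
        StorageEstimate instagram = new StorageEstimate(100_000_000L, 1_000_000L, 1825, 3);
        double[] bounds = instagram.range(0.2);
        System.out.printf("storage between %.1f PB and %.1f PB%n",
                bounds[0] / 1e15, bounds[1] / 1e15);
    }
}
```

Reporting a range makes the uncertainty in the inputs visible to whoever consumes the estimate, which is usually more honest than a single figure with many significant digits.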

Integrating Estimation into System Design Interviews

A structured template systematizes capacity estimation within broader interview contexts, building on the methodology from the parent chapter:

Understand Requirements: Clarify DAU, actions per day, retention, replication, and performance goals.
Derive QPS: Use formula QPS = (DAU × actions) / 86400, apply peak factor.
Estimate Storage: Calculate objects per day × size × retention × replication.
Estimate Bandwidth: Compute QPS × average request size for ingress and QPS × average response size for egress.
Check Hardware Feasibility: Compare with latency numbers (RAM, SSD, network) and server limits.
Identify Trade-offs: State pros and cons of scaling strategies (vertical vs horizontal).
Test Edge Cases: Consider scenarios like traffic spikes, data growth, and failure modes.
Summarize: Provide concise estimates with assumptions and next steps for detailed design.
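The numeric steps of the template can be chained into one pass, as in the sketch below. The names InterviewEstimate, Requirements, and Summary are hypothetical, as is the qpsPerServer parameter; the sketch also assumes that every action stores exactly one object, which fits the URL shortener but not every system.

```java
// Hypothetical end-to-end pass over the numeric steps of the template.
final class InterviewEstimate {
    // Step 1: the clarified requirements.
    record Requirements(long dau, int actionsPerDay, double peakFactor,
                        long bytesPerObject, int retentionDays, int replicationFactor,
                        long responseBytes) {}

    // Final step: the figures to summarize.
    record Summary(double averageQps, double peakQps, double storageBytes,
                   double egressBytesPerSecond, long serversAtPeak) {}

    // qpsPerServer is an assumed per-server capacity supplied by the caller.
    static Summary estimate(Requirements r, long qpsPerServer) {
        // Derive QPS and apply the peak factor.
        double objectsPerDay = (double) r.dau() * r.actionsPerDay();
        double averageQps = objectsPerDay / 86_400.0;
        double peakQps = averageQps * r.peakFactor();
        // Estimate storage, assuming one stored object per action.
        double storage = objectsPerDay * r.bytesPerObject()
                * r.retentionDays() * r.replicationFactor();
        // Estimate egress bandwidth at average load.
        double egress = averageQps * r.responseBytes();
        // Check hardware feasibility: servers needed to absorb the peak.
        long servers = (long) Math.ceil(peakQps / qpsPerServer);
        return new Summary(averageQps, peakQps, storage, egress, servers);
    }

    public static void main(String[] args) {
        // URL shortener inputs from this section; 5,000 QPS per server is an assumption.
        Requirements urlShortener =
                new Requirements(100_000_000L, 10, 3.0, 100L, 365, 2, 500L);
        System.out.println(estimate(urlShortener, 5_000L));
    }
}
```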

Implement with Java 21+ features: Use Records for estimation models, virtual threads for concurrent simulations—inspired by ThreadBenchmark—and pattern matching for error handling, ensuring alignment with modern practices.
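One possible reading of "pattern matching for error handling" is a sealed result type consumed by a switch with record patterns (Java 21). The sketch below follows that reading; EstimateHandling, Result, Ok, and Invalid are hypothetical names, and the validation rules are illustrative assumptions.

```java
// Hypothetical sketch: invalid inputs become values, handled by an exhaustive switch.
final class EstimateHandling {
    sealed interface Result permits Ok, Invalid {}
    record Ok(double peakQps) implements Result {}
    record Invalid(String reason) implements Result {}

    // Returns Invalid instead of throwing when the inputs cannot yield an estimate.
    static Result peakQps(long dau, int actionsPerDay, double peakFactor) {
        if (dau <= 0 || actionsPerDay <= 0) {
            return new Invalid("dau and actionsPerDay must be positive");
        }
        if (peakFactor < 1.0) {
            return new Invalid("peakFactor must be at least 1.0");
        }
        return new Ok(dau * actionsPerDay / 86_400.0 * peakFactor);
    }

    // The sealed hierarchy lets the compiler verify that every case is covered.
    static String describe(Result result) {
        return switch (result) {
            case Ok(double qps) -> "peak QPS ~" + Math.round(qps);
            case Invalid(String reason) -> "cannot estimate: " + reason;
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(peakQps(1_000_000_000L, 1, 2.5)));
        System.out.println(describe(peakQps(0L, 1, 2.5)));
    }
}
```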

Verification Exercise

Apply these techniques to estimate Instagram photo storage within 5 minutes. Steps: define parameters (1B DAU, 100M photos/day, 1MB size, 5-year retention, replication 3), compute using the CapacityEstimation Record, and validate against latency benchmarks. For instance, raw storage ≈ 182.5 PB (≈ 547.5 PB with three replicas) implies distributed storage solutions, while peak upload QPS ~2,893 suggests load balancing across servers. This exercise reinforces rapid, evidence-based calculation, a core skill for system design interviews.

By mastering these definitions, techniques, and examples, engineers transform vague scalability concerns into actionable hardware requirements, ensuring resilient and efficient system architectures.