Part I — System Design at Scale
A structured approach to system design interviews: requirements gathering, capacity estimation, high-level design, deep dives, and bottleneck analysis.
Why System Design Interviews Exist
System design interviews test something algorithms rounds cannot: your ability to think at scale. When an interviewer asks you to “design Twitter” or “build a notification system,” they’re evaluating four things simultaneously:
- Structured thinking — Can you break an ambiguous problem into concrete, solvable pieces?
- Technical depth — Do you understand databases, caching, message queues, and networking well enough to make real trade-offs?
- Communication — Can you drive a conversation, explain your reasoning, and respond to pushback?
- Prioritization — Given limited time, do you focus on what matters most?
The biggest misconception about system design interviews is that they have a single “correct” answer. They do not. The interviewer cares far more about how you navigate the problem than whether your final architecture matches some reference solution. Two candidates can propose completely different designs and both receive strong-hire signals — as long as each candidate demonstrates clear reasoning about trade-offs.
The 5-Step Framework
Every system design problem, from a URL shortener to a distributed search engine, responds well to the same five-step structure. Internalize this framework, and you’ll never freeze in front of a whiteboard again.
Step 1: Requirements Clarification (3–5 minutes)
Before drawing a single box, ask questions. Lots of them. The interviewer deliberately leaves the problem vague to see whether you’ll charge ahead or seek clarity.
Split requirements into two buckets:
- Functional requirements — What does the system do? What are the core user-facing features?
- Non-functional requirements — What quality attributes matter? Latency targets? Availability SLAs? Consistency guarantees?
Write these down explicitly. This list becomes your north star for every decision that follows.
Pro tip: Interviewers often nod approvingly when you proactively ask about scale. “Are we designing for thousands of users or hundreds of millions?” instantly signals senior-level thinking.
Step 2: Capacity Estimation (3–5 minutes)
Back-of-the-envelope math transforms a vague problem into a concrete engineering challenge. You’re not looking for exact numbers — you’re establishing orders of magnitude that inform your architecture.
Target these three quantities:
- Requests per second (QPS) — determines how many servers you need and whether you need load balancing.
- Storage — determines your database strategy and whether sharding is necessary.
- Bandwidth — determines whether you need CDNs, compression, or chunked transfers.
The estimation section below provides a cheat sheet you can memorize.
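To make the first and third quantities concrete, here is a minimal Java sketch that turns a daily request count into average and peak QPS, and then into bandwidth. The peak factor and response size are hypothetical inputs you would agree on with the interviewer, not fixed rules.

```java
// Back-of-the-envelope traffic estimates. The peak factor and response size are
// hypothetical inputs, not fixed rules.
class TrafficEstimate {
    static final double SECONDS_PER_DAY = 86_400;

    // Average requests per second for a given daily request count.
    static double averageQps(double requestsPerDay) {
        return requestsPerDay / SECONDS_PER_DAY;
    }

    // Peak QPS, assuming the busiest period runs at a fixed multiple of the average.
    static double peakQps(double requestsPerDay, double peakFactor) {
        return averageQps(requestsPerDay) * peakFactor;
    }

    // Outbound bandwidth in megabytes per second (decimal: 1 MB = 10^6 bytes).
    static double bandwidthMBps(double qps, double bytesPerResponse) {
        return qps * bytesPerResponse / 1_000_000;
    }

    public static void main(String[] args) {
        double daily = 100_000_000;                 // 100 million requests/day
        double avg = averageQps(daily);             // ~1,157 QPS, in line with the cheat sheet's ~1,200
        double peak = peakQps(daily, 3);            // hypothetical 3x peak factor
        double mbps = bandwidthMBps(peak, 1_000);   // hypothetical ~1 KB per response
        System.out.printf("avg %.0f QPS, peak %.0f QPS, %.1f MB/s%n", avg, peak, mbps);
    }
}
```

With these inputs the peak works out to roughly 3,500 QPS and about 3.5 MB/s, small enough that bandwidth is clearly not the constraint; the same few lines show quickly when it would be.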
Step 3: High-Level Design (5–10 minutes)
Draw the 30,000-foot view. Boxes represent services. Arrows represent data flow. At this stage, you’re answering: “What are the major components, and how do they communicate?”
A typical high-level design includes:
- Clients (web, mobile)
- API Gateway or Load Balancer
- Application servers (stateless)
- Database(s) — SQL, NoSQL, or both
- Cache layer
- Message queue (if async processing is needed)
- Blob storage (if handling files or media)
Don’t over-engineer at this stage. Get the skeleton on the board, confirm with the interviewer that it looks reasonable, then dive deeper.
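Stateless application servers are what keep the load balancer box simple: because no server holds session state, any server can take any request. A minimal sketch of round-robin selection, with hypothetical server names (real load balancers also run health checks, which this omits):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin selection over stateless application servers (illustrative sketch).
class RoundRobinBalancer {
    private final List<String> servers;                  // hypothetical server addresses
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("need at least one server");
        this.servers = List.copyOf(servers);
    }

    // Any server can handle any request because none of them keeps per-user state.
    String pick() {
        // floorMod keeps the index valid even after the counter overflows to negative values.
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of("app-1", "app-2", "app-3"));
        for (int r = 0; r < 5; r++) {
            System.out.println("request " + r + " -> " + lb.pick()); // app-1, app-2, app-3, app-1, app-2
        }
    }
}
```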
Step 4: Deep Dive (10–15 minutes)
This is where you differentiate yourself. The interviewer will pick one or two components from your high-level design and ask you to go deeper. Alternatively, you can proactively identify the most interesting or challenging component and lead the conversation there.
Strong deep dives include:
- Database schema design and sharding strategy
- Key generation algorithms and collision handling
- Caching policies and invalidation strategies
- Consistency models (strong vs. eventual)
- Data replication and failover mechanisms
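For the sharding bullet, the simplest strategy to put on the board is modulo hashing: hash the key and take the result modulo the shard count. The sketch below assumes string keys and uses `java.util.zip.CRC32` only as a convenient deterministic hash from the JDK; the key format in `main` is hypothetical. Its weakness shows up in the demo: changing the shard count moves most keys, which is the problem consistent hashing is designed to address.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Modulo shard routing: shard = hash(key) mod shardCount.
class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) throw new IllegalArgumentException("shardCount must be positive");
        this.shardCount = shardCount;
    }

    // Any deterministic hash works; CRC32 is used only because it ships with the JDK.
    // Determinism matters: every app server must route a given key to the same shard.
    int shardFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % shardCount);   // getValue() is non-negative
    }

    public static void main(String[] args) {
        ShardRouter four = new ShardRouter(4);
        ShardRouter five = new ShardRouter(5);
        int total = 10_000, moved = 0;
        for (int i = 0; i < total; i++) {
            String key = "user:" + i;                 // hypothetical key format
            if (four.shardFor(key) != five.shardFor(key)) moved++;
        }
        System.out.println(moved + " of " + total + " keys move when going from 4 to 5 shards");
    }
}
```

With a roughly uniform hash, a key stays put only when its hash leaves the same remainder modulo 4 and modulo 5 (about one key in five), so expect roughly 80% of keys to move.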
Show that you understand the why behind your choices. “I chose a write-through cache because our read-to-write ratio is 100:1” is far more compelling than “I’d add Redis here.”
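To back the write-through example with something concrete, here is a minimal single-threaded sketch. Every write goes to the backing store and the cache in the same operation, so a read-heavy workload is served almost entirely from memory. A `HashMap` stands in for the database here (an assumption); a real cache would also need eviction, a size bound, and thread safety.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Write-through cache sketch: writes update the store and the cache together,
// so reads of keys written through this cache do not see stale values.
class WriteThroughCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Map<K, V> store;   // stand-in for the database (assumption)
    int storeReads = 0;              // how many reads fell through to the store

    WriteThroughCache(Map<K, V> store) {
        this.store = store;
    }

    void put(K key, V value) {
        store.put(key, value);       // write the source of truth first
        cache.put(key, value);       // then keep the cache in sync in the same call
    }

    Optional<V> get(K key) {
        V hit = cache.get(key);
        if (hit != null) return Optional.of(hit);
        storeReads++;                // cache miss: fall back to the store
        V fromStore = store.get(key);
        if (fromStore != null) cache.put(key, fromStore);
        return Optional.ofNullable(fromStore);
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        WriteThroughCache<String, String> cache = new WriteThroughCache<>(db);
        cache.put("user:42", "Ada");
        for (int i = 0; i < 100; i++) cache.get("user:42");   // 100 reads, all cache hits
        System.out.println("store reads: " + cache.storeReads); // prints "store reads: 0"
    }
}
```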
Step 5: Bottlenecks & Trade-offs (5 minutes)
Every system has failure modes. Proactively identifying them before the interviewer asks demonstrates engineering maturity.
Walk through these scenarios:
- What happens if the database goes down?
- What if a single service becomes a hot spot?
- How does the system handle a 10x traffic spike?
- Where are the single points of failure?
For each bottleneck, propose a mitigation: replication, circuit breakers, auto-scaling, rate limiting, or graceful degradation.
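A circuit breaker is easy to name and worth being able to describe. Below is a minimal, single-threaded sketch of the common closed, open, and half-open state machine; the thresholds are hypothetical, and the current time is passed in explicitly so the behavior is deterministic.

```java
// Minimal circuit breaker: opens after `failureThreshold` consecutive failures, rejects calls
// while open, and after `openMillis` moves to half-open, where calls are allowed again.
// In half-open, a success closes the breaker and a failure reopens it.
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    // Time is passed in rather than read from the clock so the behavior is deterministic.
    boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= openMillis) {
            state = State.HALF_OPEN;  // cool-down elapsed: let traffic probe the dependency
        }
        return state != State.OPEN;
    }

    void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = nowMillis;
        }
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3, 5_000); // hypothetical: 3 failures, 5 s cool-down
        for (int i = 0; i < 3; i++) breaker.recordFailure(0);
        System.out.println(breaker.allowRequest(1_000));  // false: still open, fail fast
        System.out.println(breaker.allowRequest(6_000));  // true: half-open probe
        breaker.recordSuccess();
        System.out.println(breaker.state());              // CLOSED
    }
}
```

Failing fast while the breaker is open keeps a struggling dependency from tying up threads across every caller, which is the graceful-degradation behavior the list above is after.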
Back-of-the-Envelope Estimation Cheat Sheet
Memorize these reference numbers. They’ll save you precious minutes during the interview and make your estimates feel credible.
Time References
| Operation | Latency |
|---|---|
| L1 cache reference | 1 ns |
| L2 cache reference | 4 ns |
| Main memory reference | 100 ns |
| SSD random read | 150 μs |
| HDD random read | 10 ms |
| Send 1 KB over 1 Gbps network | 10 μs |
| Read 1 MB sequentially from SSD | 1 ms |
| Round trip within same datacenter | 500 μs |
| Round trip across continents | 150 ms |
Scale References
| Metric | Value |
|---|---|
| Seconds in a day | ~86,400 ≈ 10^5 |
| Seconds in a month | ~2.6 million ≈ 2.5 × 10^6 |
| Seconds in a year | ~31.5 million ≈ 3 × 10^7 |
| 1 million requests/day | ~12 QPS |
| 100 million requests/day | ~1,200 QPS |
| 1 billion requests/day | ~12,000 QPS |
Storage Rules of Thumb
| Data Type | Typical Size |
|---|---|
| A single character (ASCII) | 1 byte |
| A URL | ~100 bytes |
| A tweet-sized text | ~300 bytes |
| A user profile record | ~1 KB |
| A high-res image | ~2 MB |
| A 1-minute video | ~50 MB |
| 1 million records × 1 KB each | ~1 GB |
Example calculation: Design a system that stores 100 million new URLs per year and retains them for 5 years.
- Storage per URL: ~1 KB (URL + metadata)
- Total: 100M × 1 KB = 100 GB per year → 500 GB over 5 years
- With replication factor of 3: ~1.5 TB total
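That number tells you a single well-provisioned database server could hold the data, and write throughput won't force sharding either: 100 million writes per year averages only about 3 writes per second (10^8 writes ÷ (3 × 10^7 seconds)). You'd replicate for availability and read scaling, and shard only if traffic later outgrows that setup.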
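The same arithmetic fits in a small Java helper that also derives the average write rate from the cheat sheet's seconds-per-year figure. It assumes, as the example does, 100 million new URLs per year at about 1 KB each, and uses decimal units (1 TB = 10^12 bytes).

```java
// Storage and write-rate estimate for the URL example (decimal units: 1 TB = 10^12 bytes).
class StorageEstimate {
    static final double SECONDS_PER_YEAR = 365 * 86_400.0;   // ~3.15 x 10^7

    // Raw bytes for `years` of data, multiplied by the number of replicas.
    static double totalBytes(double itemsPerYear, double bytesPerItem, int years, int replicationFactor) {
        return itemsPerYear * bytesPerItem * years * replicationFactor;
    }

    // Average write rate if items arrive evenly through the year.
    static double writesPerSecond(double itemsPerYear) {
        return itemsPerYear / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        double bytes = totalBytes(100e6, 1_000, 5, 3);     // 100M URLs/year, ~1 KB each, 5 years, 3 replicas
        System.out.println(bytes / 1e12 + " TB");           // prints "1.5 TB"
        System.out.printf("~%.1f writes/s%n", writesPerSecond(100e6));
    }
}
```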
Common Pitfalls
These mistakes derail more candidates than lack of technical knowledge:
Diving into details too early. You start talking about database indexes before establishing what the system even does. Always start with requirements.
Ignoring non-functional requirements. A design that handles 100 requests per second looks nothing like one that handles 100,000. Scale changes everything: your database choice, caching strategy, and deployment topology all shift.
Treating the interview as a monologue. System design is a collaborative conversation. Check in with the interviewer: “Does this direction make sense?” “Should I go deeper on the caching layer or move to the database?” This isn’t weakness — it’s the same skill you’d use leading a real design review.
Over-engineering. Adding Kafka, Kubernetes, and a service mesh to a problem that could be solved with a single server and a PostgreSQL database sends the wrong signal. Start with the simplest architecture that meets the requirements, then scale up as needed.
Not quantifying trade-offs. Saying “NoSQL is better for this” without explaining why — what consistency you’re sacrificing, what read/write pattern you’re optimizing for — leaves the interviewer unconvinced.
What the Chapters Ahead Cover
The system design section of this book walks through ten real interview problems, each following the five-step framework. Every chapter stands alone, so you can study them in any order, but they’re arranged from foundational to advanced:
- URL Shortener — Key generation, hashing, database sharding, and caching fundamentals.
- Rate Limiter — Token bucket, sliding window algorithms, and distributed coordination.
- Consistent Hashing — The partitioning technique behind many distributed caches, databases, and load balancers.
- Key-Value Store — Partitioning, replication, conflict resolution, and gossip protocols.
- Notification System — Push vs. pull, fan-out strategies, and delivery guarantees.
- News Feed — Fan-out on write vs. fan-out on read, ranking algorithms, and caching layers.
- Chat System — WebSockets, message ordering, presence detection, and group chat at scale.
- Search Autocomplete — Trie data structures, distributed ranking, and latency optimization.
- Web Crawler — Politeness policies, URL frontier management, and deduplication at scale.
- Video Streaming Platform — CDN architecture, adaptive bitrate, and transcoding pipelines.
Each chapter includes working Java 25 code examples that demonstrate key concepts using modern language features: records, pattern matching, sealed interfaces, virtual threads, and structured concurrency (still a preview API in Java 25). The code isn't production-ready (no system design interview expects that), but it's concrete enough to show the interviewer you can translate architecture into implementation.
Let’s start with the most classic system design question of them all: the URL shortener.