Design a URL Shortener
SummaryA complete URL shortener design addressing base62 encoding,...
A complete URL shortener design addressing base62 encoding,...
A complete URL shortener design addressing base62 encoding, collision-free key generation with Snowflake IDs, consistent hashing for sharding, Redis caching for hot URLs, and HTTP 301/302 redirect strategies.
A URL shortener is the “Hello, World” of system design interviews. It appears deceptively straightforward — take a long URL, return a short one, redirect when someone clicks it. But beneath that surface lie real engineering challenges: generating globally unique keys at massive scale, sharding a database that grows by millions of rows per day, caching hot URLs without stale redirects, and choosing redirect semantics that balance analytics with performance.
This chapter walks through the complete design, following the five-step framework from Part I.
Requirements
Functional Requirements
- Shorten URL — Given a long URL, produce a unique short URL (e.g.,
https://short.ly/aB3x7Kq). - Redirect — When a user visits a short URL, redirect them to the original long URL.
- Analytics — Track click counts, referrer, geographic location, and timestamp per short URL.
- Custom aliases — Allow users to choose a custom short code (e.g.,
short.ly/my-brand). - Expiration — URLs expire after a configurable TTL (default: 5 years). Expired URLs return 404.
Non-Functional Requirements
| Requirement | Target |
|---|---|
| Read latency (redirect) | < 100 ms p99 |
| Write latency (shorten) | < 200 ms p99 |
| Availability | 99.99% (52 minutes downtime/year) |
| Scale | 100 million new URLs per day |
| Read-to-write ratio | 10:1 (1 billion reads/day) |
| Durability | No URL lost once created |
The 10:1 read-to-write ratio tells us this is a read-heavy system. That insight drives two major decisions: aggressive caching and read replicas on the database tier.
Capacity Estimation
Start with the daily write volume and derive everything else.
Write QPS:
- 100 million URLs/day ÷ 86,400 seconds/day ≈ 1,160 writes/sec
- Peak (assume 3x average): ~3,500 writes/sec
Read QPS:
- 10:1 ratio → 1 billion reads/day ÷ 86,400 ≈ 11,600 reads/sec
- Peak: ~35,000 reads/sec
Storage (5-year horizon):
- Each URL record: short code (7 bytes) + long URL (avg 200 bytes) + metadata (created_at, expiry, user_id ≈ 100 bytes) = ~307 bytes → round to 500 bytes with indexes.
- Daily: 100M × 500 bytes = 50 GB/day
- 5 years: 50 GB × 365 × 5 = ~91 TB
- With replication factor 3: ~273 TB
Bandwidth:
- Write: 1,160 req/sec × 500 bytes ≈ 580 KB/s (negligible)
- Read: 11,600 req/sec × 500 bytes ≈ 5.8 MB/s (easily handled)
Takeaway: Storage is the dominant scaling concern, not bandwidth. We need a sharding strategy from day one.
High-Level Design
┌──────────┐ ┌──────────────┐ ┌───────────────┐ ┌─────────────┐
│ Client │────▶│ API Gateway │────▶│ URL Service │────▶│ Database │
│ (Browser) │ │ (Rate Limit) │ │ (Stateless) │ │ (Sharded) │
└──────────┘ └──────────────┘ └───────┬───────┘ └─────────────┘
│
▼
┌──────────────┐
│ Redis Cache │
│ (Hot URLs) │
└──────────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Analytics │ │ Key Gen │ │ Expiration │
│ Service │ │ Service │ │ Worker │
└─────────────┘ └─────────────┘ └─────────────┘
API Design
Create Short URL
POST /api/v1/shorten
Content-Type: application/json
{
"longUrl": "https://example.com/very/long/path?query=value",
"customAlias": "my-brand", // optional
"expiresAt": "2031-02-10T00:00:00Z" // optional
}
Response 201:
{
"shortUrl": "https://short.ly/aB3x7Kq",
"shortCode": "aB3x7Kq",
"expiresAt": "2031-02-10T00:00:00Z"
}
Redirect
GET /{shortCode}
Response 302:
Location: https://example.com/very/long/path?query=value
Get Analytics
GET /api/v1/analytics/{shortCode}
Response 200:
{
"totalClicks": 142857,
"clicksByDay": [...],
"topReferrers": [...]
}
Deep Dive
Key Generation Strategy
The short code is the heart of the system. It must be unique, compact, and fast to generate at scale. With a 7-character code using Base62 encoding (a-z, A-Z, 0-9), we get 62^7 = 3.52 trillion possible combinations — more than enough for 100M URLs/day over decades.
Here’s a clean Base62 encoder in Java 25:
public record Base62Encoder(String alphabet) {
private static final String DEFAULT_ALPHABET =
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
public Base62Encoder() {
this(DEFAULT_ALPHABET);
}
public String encode(long value) {
if (value == 0) return String.valueOf(alphabet.charAt(0));
var result = new StringBuilder();
long remaining = Math.abs(value);
while (remaining > 0) {
result.append(alphabet.charAt((int) (remaining % 62)));
remaining /= 62;
}
return result.reverse().toString();
}
public long decode(String encoded) {
long result = 0;
for (char c : encoded.toCharArray()) {
result = result * 62 + alphabet.indexOf(c);
}
return result;
}
}
The question is: what numeric value do we feed into this encoder? Three approaches exist, each with distinct trade-offs.
Approach 1: Hash and Truncate
Take the MD5 or SHA-256 hash of the long URL, then extract the first 7 Base62 characters.
- Pro: Deterministic — same URL always produces the same short code (natural deduplication).
- Con: Collision risk. MD5 produces 128 bits, but we’re truncating to ~41 bits (7 Base62 chars). By the birthday paradox, collisions become probable around √(62^7) ≈ 1.87 million URLs — far below our daily volume.
Approach 2: Pre-Generated Key Ranges (Key Generation Service)
A dedicated Key Generation Service (KGS) pre-generates batches of unique keys and stores them in a database. When the URL Service needs a key, it fetches a batch (e.g., 1,000 keys) from KGS and uses them locally until depleted.
- Pro: Zero collision risk. Keys are guaranteed unique at generation time.
- Con: Added infrastructure complexity. The KGS itself becomes a critical dependency that needs its own availability guarantees and coordination logic.
Approach 3: Snowflake ID → Base62
Generate a globally unique 64-bit Snowflake ID (timestamp + datacenter ID + machine ID + sequence number), then Base62-encode it.
- Pro: No coordination needed between servers. Each machine generates unique IDs independently. The timestamp component provides natural ordering.
- Con: The resulting short code is slightly longer (11 characters for a 64-bit number in Base62). You can truncate to 7, but that reintroduces collision risk.
Recommended approach: Use the KGS for maximum reliability. Pre-generate keys in the background, store them in a dedicated unused_keys table, and move them to a used_keys table upon assignment. The URL Service fetches keys in bulk, minimizing round trips to KGS.
Collision Handling
Regardless of the key generation strategy, you need a safety net. A Bloom filter provides a probabilistic, space-efficient way to check whether a short code already exists — catching the vast majority of collisions without hitting the database.
public record BloomFilter(BitSet bitSet, int size, int hashCount) {
public static BloomFilter create(int expectedInsertions, double falsePositiveRate) {
int size = optimalSize(expectedInsertions, falsePositiveRate);
int hashCount = optimalHashCount(size, expectedInsertions);
return new BloomFilter(new BitSet(size), size, hashCount);
}
public void add(String element) {
for (int i = 0; i < hashCount; i++) {
int hash = computeHash(element, i);
bitSet.set(Math.floorMod(hash, size));
}
}
public boolean mightContain(String element) {
for (int i = 0; i < hashCount; i++) {
int hash = computeHash(element, i);
if (!bitSet.get(Math.floorMod(hash, size))) {
return false; // Definitely not present
}
}
return true; // Possibly present — verify against DB
}
private int computeHash(String element, int seed) {
return element.hashCode() * 31 + seed * 17;
}
private static int optimalSize(int n, double p) {
return (int) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
}
private static int optimalHashCount(int m, int n) {
return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
}
}
The collision resolution flow:
- Generate a candidate short code.
- Check the Bloom filter. If
mightContainreturnsfalse, the code is guaranteed unique — insert it. - If
mightContainreturnstrue, query the database to confirm. The Bloom filter has a small false-positive rate (typically configured at 0.1%), so most “true” results are actual collisions. - On a confirmed collision, append a random salt to the original URL and regenerate the code. Repeat until unique.
Database Design & Sharding
Schema:
CREATE TABLE url_mappings (
short_code VARCHAR(7) PRIMARY KEY,
long_url TEXT NOT NULL,
user_id BIGINT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
expires_at TIMESTAMP,
click_count BIGINT DEFAULT 0
);
CREATE INDEX idx_long_url ON url_mappings(long_url);
CREATE INDEX idx_expires_at ON url_mappings(expires_at);
At 91 TB over 5 years, a single database server cannot hold all the data. Sharding is mandatory.
Range-based sharding partitions by the first character of the short code (e.g., shard A handles codes starting with 0-9, shard B handles a-m, etc.). The problem: uneven distribution. Some character ranges will be “hotter” than others, creating load imbalance.
Hash-based sharding applies a hash function to the short code and maps the result to a shard. Produces uniform distribution, but adding or removing shards forces a massive data migration — every key needs to be rehashed.
Consistent hashing solves the migration problem. Each shard occupies a position on a virtual ring. A key hashes to a position on the ring and maps to the nearest shard clockwise. Adding a shard only redistributes keys from its immediate neighbor, not from every shard.
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.List;
public record ConsistentHashRing<T>(
TreeMap<Integer, T> ring,
int virtualNodes
) {
public static <T> ConsistentHashRing<T> create(List<T> nodes, int virtualNodes) {
var ring = new TreeMap<Integer, T>();
for (T node : nodes) {
for (int i = 0; i < virtualNodes; i++) {
int hash = hash(node.toString() + "#" + i);
ring.put(hash, node);
}
}
return new ConsistentHashRing<>(ring, virtualNodes);
}
public T getNode(String key) {
if (ring.isEmpty()) {
throw new IllegalStateException("Hash ring is empty");
}
int hash = hash(key);
SortedMap<Integer, T> tailMap = ring.tailMap(hash);
int targetHash = tailMap.isEmpty() ? ring.firstKey() : tailMap.firstKey();
return ring.get(targetHash);
}
public ConsistentHashRing<T> addNode(T node) {
var newRing = new TreeMap<>(ring);
for (int i = 0; i < virtualNodes; i++) {
int hash = hash(node.toString() + "#" + i);
newRing.put(hash, node);
}
return new ConsistentHashRing<>(newRing, virtualNodes);
}
public ConsistentHashRing<T> removeNode(T node) {
var newRing = new TreeMap<>(ring);
for (int i = 0; i < virtualNodes; i++) {
int hash = hash(node.toString() + "#" + i);
newRing.remove(hash);
}
return new ConsistentHashRing<>(newRing, virtualNodes);
}
private static int hash(String key) {
// MurmurHash-inspired simple hash for demonstration
int h = key.hashCode();
h ^= (h >>> 16);
h *= 0x85ebca6b;
h ^= (h >>> 13);
return h & Integer.MAX_VALUE; // Ensure positive
}
}
Virtual nodes (typically 100–200 per physical shard) ensure even distribution. Without them, a small number of physical nodes create large gaps on the ring, leading to unbalanced load.
Caching
With a 10:1 read-to-write ratio and 35,000 peak reads/sec, caching is essential. Redis sits between the URL Service and the database, storing frequently accessed URL mappings.
Cache sizing: Following the 80/20 rule (20% of URLs generate 80% of traffic), cache the top 20% of daily URLs:
- Daily read requests: 1 billion
- Unique URLs accessed: ~200 million
- Top 20%: 40 million entries × 500 bytes = 20 GB — fits comfortably in a single Redis instance or a small cluster.
Pattern: Cache-Aside with Write-Through
On writes, store the mapping in both the database and the cache simultaneously. On reads, check the cache first; on a miss, fetch from the database and populate the cache.
import java.time.Duration;
import java.util.Optional;
public sealed interface CacheResult<T> {
record Hit<T>(T value) implements CacheResult<T> {}
record Miss<T>() implements CacheResult<T> {}
}
public record UrlCacheService(RedisClient redis, UrlRepository repository) {
private static final Duration CACHE_TTL = Duration.ofHours(24);
public Optional<String> resolveShortCode(String shortCode) {
// Step 1: Check cache
CacheResult<String> cached = getFromCache(shortCode);
return switch (cached) {
case CacheResult.Hit<String> hit -> Optional.of(hit.value());
case CacheResult.Miss<String> _ -> {
// Step 2: Cache miss — query database
Optional<String> longUrl = repository.findByShortCode(shortCode);
// Step 3: Populate cache on hit
longUrl.ifPresent(url -> putInCache(shortCode, url));
yield longUrl;
}
};
}
public void shortenAndCache(String shortCode, String longUrl) {
// Write-through: persist to DB and cache atomically
repository.save(shortCode, longUrl);
putInCache(shortCode, longUrl);
}
private CacheResult<String> getFromCache(String shortCode) {
String value = redis.get("url:" + shortCode);
return value != null
? new CacheResult.Hit<>(value)
: new CacheResult.Miss<>();
}
private void putInCache(String shortCode, String longUrl) {
redis.setex("url:" + shortCode, CACHE_TTL.toSeconds(), longUrl);
}
}
Eviction policy: Use LRU (Least Recently Used). Redis supports this natively with maxmemory-policy allkeys-lru. When the cache reaches its memory limit, the least-recently-accessed URLs get evicted first — exactly the behavior we want for a read-heavy workload.
Redirection
The redirect HTTP status code seems like a minor detail, but it has significant implications.
301 Moved Permanently:
- The browser caches the redirect. Subsequent visits to the same short URL never hit your servers — the browser redirects locally.
- Pro: Drastically reduces server load. For viral URLs clicked millions of times by the same users, this saves enormous bandwidth.
- Con: You lose analytics data. If the browser handles the redirect, your click counter never increments.
302 Found (Temporary Redirect):
- The browser does NOT cache the redirect. Every click hits your servers.
- Pro: Full analytics visibility. Every single click gets tracked.
- Con: Higher server load since no client-side caching occurs.
Recommendation: Default to 302 for most URL shortener products. Analytics is a core feature, and the infrastructure you’ve built (Redis cache, database sharding) can handle the load. Offer 301 as an opt-in for users who prioritize performance over analytics.
There’s a nuanced middle ground: return 302 for the redirect but set a short-lived Cache-Control header (e.g., max-age=300). This lets browsers cache the redirect for 5 minutes, batching rapid repeat clicks while still capturing analytics at a reasonable granularity.
Bottlenecks & Scaling
Database Write Bottleneck
At peak, the system processes 3,500 writes/sec. A single relational database handles this comfortably, but what happens during a traffic spike (e.g., a Super Bowl ad with a short URL)?
Solution: Introduce an asynchronous write path. The URL Service publishes write events to a message queue (Kafka or SQS). A pool of consumer workers drains the queue and writes to the database at a controlled rate. The short code is returned to the user immediately — the write is eventual but guaranteed by the durable queue.
This decouples write latency from database throughput. The user gets their short URL in milliseconds; the database catches up in the background.
Cache Stampede
When a hot URL’s cache entry expires, hundreds of concurrent requests simultaneously miss the cache and slam the database with identical queries — a cache stampede.
Solution: Probabilistic Early Recomputation. Before the cache entry expires, each request has a small probability of proactively refreshing it. The probability increases as the TTL approaches zero. This ensures that one “lucky” request refreshes the cache just before expiration, and subsequent requests find a warm cache.
An alternative is a mutex lock: the first request on a cache miss acquires a lock, queries the database, and populates the cache. All other requests wait on the lock (or return a stale value). This prevents thundering herd but adds latency for waiting requests.
Geographic Distribution
Users distributed globally experience high redirect latency if all servers sit in a single region.
Solution: Deploy read replicas and Redis caches in multiple regions (US-East, EU-West, AP-Southeast). Use DNS-based routing (e.g., AWS Route 53 latency-based routing) to direct users to the nearest region. Writes still flow to the primary region and replicate asynchronously to read replicas.
For the redirect path (the latency-critical operation), the local cache and read replica handle the vast majority of requests without cross-region hops.
Interviewer Tips
System design interviews for URL shorteners frequently go beyond the initial design. Here are the most common follow-up questions and how to address them:
“How do you handle URL deduplication?”
If the same long URL is submitted twice, should it return the same short code? This depends on the product. If yes, maintain a reverse index (long URL → short code) and check it before generating a new key. Store this in a separate hash-based lookup table or use the database’s unique constraint on long_url.
“How do you prevent abuse?” Rate limiting at the API Gateway layer — per-user and per-IP. Token bucket or sliding window algorithms prevent a single bad actor from exhausting your key space. For anonymous users, apply stricter limits and require CAPTCHA after a threshold.
“What if the Key Generation Service goes down?” Each URL Service instance pre-fetches a batch of keys (e.g., 10,000) and stores them in local memory. If KGS goes down, the service continues operating until its local key pool is depleted. This provides a buffer of several minutes to hours, depending on traffic. Deploy KGS with at least three replicas behind a load balancer for high availability.
“How do you delete or expire URLs?”
A background Expiration Worker runs periodically (every minute), querying the expires_at index for expired rows. It deletes them in batches and removes corresponding cache entries. For on-demand deletion, invalidate the cache entry immediately and soft-delete the database row (set a deleted flag) to avoid race conditions with in-flight redirects.
“How would you add link preview / Open Graph support?” When a URL is first shortened, enqueue an asynchronous job that fetches the target page, extracts Open Graph metadata (title, description, image), and stores it alongside the URL mapping. Serve this metadata on an API endpoint for chat apps and social media platforms that request link previews.
These follow-ups test whether you can extend your design without redesigning it from scratch. The modular architecture outlined above — stateless services, a cache layer, a message queue, and background workers — accommodates each extension with minimal changes.