Design a Video Streaming Service
SummaryCovers video upload and transcoding pipelines with DAG-based...
Covers video upload and transcoding pipelines with DAG-based...
Covers video upload and transcoding pipelines with DAG-based task execution, adaptive bitrate streaming with HLS/DASH, CDN edge caching strategies, and content recommendation at scale.
Requirements
Functional Requirements
- Upload video — Creators upload videos up to 10 GB with metadata (title, description, tags, thumbnail).
- Stream video — Viewers watch videos with adaptive quality based on their network conditions.
- Search — Full-text search across video titles, descriptions, tags, and auto-generated captions.
- Comments — Threaded comments on videos with likes and replies.
- Recommendations — Personalized homepage feed and “up next” sidebar based on watch history.
- Channels & subscriptions — Creators own channels; viewers subscribe and receive new upload notifications.
Non-Functional Requirements
| Requirement | Target |
|---|---|
| Upload processing time | < 1 hour from upload to playable (for a 1-hour 1080p video) |
| Playback start latency | < 2 seconds globally |
| Concurrent viewers | 100M+ DAU, 10M+ concurrent streams |
| Buffering ratio | < 1% of playback time |
| Availability | 99.99% for streaming, 99.9% for uploads |
| Storage durability | 99.999999999% (11 nines) via replicated object storage |
Capacity Estimation
Assume 100 million daily active users, 500K video uploads per day, and an average watch session of 30 minutes.
| Metric | Estimate |
|---|---|
| Videos uploaded/day | 500,000 |
| Average raw video size | 1 GB |
| Storage per video (all resolutions) | ~5 GB (360p + 480p + 720p + 1080p + 4K) |
| Daily new storage | 500K × 5 GB = 2.5 PB/day |
| Total storage (5 years) | ~4.5 EB (before dedup/compression) |
| Concurrent streams at peak | ~10M |
| Average bitrate served | 5 Mbps (1080p) |
| Peak egress bandwidth | 10M × 5 Mbps = 50 Tbps |
| CDN cache hit ratio (target) | > 95% |
The economics of video streaming are dominated by CDN egress costs and storage costs. A 95%+ cache hit ratio at CDN edges is critical — without it, origin bandwidth alone would be prohibitive.
High-Level Design
Two distinct flows operate independently: the upload pipeline (write path) and the streaming path (read path).
UPLOAD FLOW:
┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Creator │────►│ Upload │────►│ Object │────►│ Transcoding │
│ Client │ │ Service │ │ Storage │ │ Service │
└──────────┘ └──────────────┘ │ (Raw Video) │ │ (DAG Engine) │
└──────────────┘ └──────┬───────┘
│
┌──────────────┐ ┌──────▼───────┐
│ Metadata │◄────│ Post-Process │
│ Database │ │ (Thumbnails, │
└──────────────┘ │ DRM, Index) │
└──────┬───────┘
│
┌──────▼───────┐
│ CDN Origin │
│ (Processed │
│ Segments) │
└──────────────┘
STREAMING FLOW:
┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Viewer │────►│ CDN Edge │────►│ CDN Regional │────►│ CDN Origin │
│ Client │◄────│ PoP │◄────│ Cache │◄────│ / Object │
└──────────┘ └──────────────┘ └──────────────┘ │ Storage │
└──────────────┘
Core components:
- Upload Service — Handles chunked uploads with resume support. Generates pre-signed URLs for direct-to-storage uploads.
- Transcoding Service — A DAG-based pipeline that splits, encodes, and packages videos into multiple resolutions and formats.
- Object Storage — S3-compatible storage for raw uploads and processed video segments.
- Metadata Database — Video metadata, user info, view counts. Relational (PostgreSQL) for strong consistency on critical data; a denormalized view in Elasticsearch for search.
- CDN — Multi-tier edge network serving video segments to viewers worldwide.
- Recommendation Service — Generates personalized feeds from user activity signals.
Deep Dive
Video Upload Pipeline
Chunked upload with resumption:
Large video files (up to 10 GB) cannot be uploaded as a single HTTP request — network interruptions would force a full restart. The upload protocol splits files into 5 MB chunks.
- Client requests an upload session from the Upload Service, which returns a
sessionIdand a pre-signed URL per chunk. - Client uploads chunks in parallel (up to 4 concurrent) directly to Object Storage, skipping the application tier.
- Each chunk upload returns an ETag. The client tracks uploaded chunks locally.
- On network failure, the client queries the Upload Service for already-received chunks and resumes from the first missing chunk.
- After all chunks land, the client calls “complete upload,” and the Upload Service triggers the transcoding pipeline.
DAG-based transcoding pipeline:
Transcoding a single video involves many interdependent tasks. A directed acyclic graph (DAG) models task dependencies:
┌──────────┐
│ Validate │
│ Input │
└─────┬────┘
│
┌─────▼────┐
│ Split │
│ into │
│ segments │
└─────┬────┘
│
┌──────────────┼──────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ Encode │ │ Encode │ │ Encode │
│ 360p │ │ 720p │ │ 1080p │ ... (4K)
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
│ │ │
└──────────────┼──────────────┘
│
┌─────▼────┐
│ Generate │
│ Manifest │
└─────┬────┘
│
┌──────────────┼──────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ Generate │ │ Package │ │ Extract │
│ Thumbnails │ │ DRM │ │ Captions │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
│ │ │
└──────────────┼──────────────┘
│
┌─────▼────┐
│ Publish │
│ to CDN │
│ Origin │
└──────────┘
Each encoding resolution runs in parallel — this is where structured concurrency from Java 25 delivers significant value:
// DAG-based transcoding with structured concurrency
public record TranscodeResult(
Resolution resolution,
String outputPath,
long durationMs
) {}
public enum Resolution {
P360(640, 360, 800_000),
P720(1280, 720, 2_500_000),
P1080(1920, 1080, 5_000_000),
P4K(3840, 2160, 15_000_000);
private final int width, height, bitrate;
Resolution(int width, int height, int bitrate) {
this.width = width;
this.height = height;
this.bitrate = bitrate;
}
public int width() { return width; }
public int height() { return height; }
public int bitrate() { return bitrate; }
}
public List<TranscodeResult> transcodeAllResolutions(String inputPath, String jobId)
throws InterruptedException, ExecutionException {
// Split input into segments first
List<String> segments = splitIntoSegments(inputPath);
// Encode all resolutions in parallel using structured concurrency
try (var scope = StructuredTaskScope.open()) {
List<StructuredTaskScope.Subtask<TranscodeResult>> subtasks =
Arrays.stream(Resolution.values())
.map(res -> scope.fork(() -> {
long start = System.nanoTime();
String output = encodeSegments(segments, res, jobId);
long elapsed = (System.nanoTime() - start) / 1_000_000;
return new TranscodeResult(res, output, elapsed);
}))
.toList();
scope.join();
return subtasks.stream()
.map(StructuredTaskScope.Subtask::get)
.toList();
}
}
All four encoding tasks launch simultaneously. If any encoding fails, structured concurrency cancels the remaining tasks, preventing wasted compute on a job that will ultimately fail.
Adaptive Bitrate Streaming
HLS vs DASH:
| Feature | HLS (HTTP Live Streaming) | DASH (Dynamic Adaptive Streaming) |
|---|---|---|
| Developed by | Apple | MPEG consortium (open standard) |
| Manifest format | .m3u8 (playlist) | .mpd (XML) |
| Segment format | .ts or .fmp4 | .m4s (fragmented MP4) |
| Browser support | Native on Safari/iOS; polyfill elsewhere | Native on Chrome/Firefox/Edge via MSE |
| DRM | FairPlay | Widevine, PlayReady |
Most services support both protocols. A manifest file lists all available quality levels with their bitrate and resolution:
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=15000000,RESOLUTION=3840x2160
4k/playlist.m3u8
Client-side adaptive logic:
The video player measures download throughput for each segment. If the last three segments downloaded at 8 Mbps, the player can safely request 1080p (5 Mbps). If throughput drops to 1.5 Mbps, it switches down to 720p. The switching algorithm balances between:
- Avoiding quality oscillation — Hysteresis thresholds prevent rapid switching between quality levels.
- Buffer health — If the playback buffer drops below 5 seconds, the player immediately drops to the lowest available quality to refill the buffer.
Segment size trade-offs:
| Segment Duration | Pros | Cons |
|---|---|---|
| 2 seconds | Fast quality switching; lower latency for live | More HTTP requests; larger manifest; more CDN overhead |
| 6 seconds | Balanced | Standard choice for VOD |
| 10 seconds | Fewer requests; better compression | Slow to adapt to bandwidth changes |
CDN Architecture
Multi-tier caching:
Viewer Request Flow:
Cache Miss
┌────────┐ ┌───────────┐ ┌───────────────┐ ┌──────────────┐
│ Viewer │───►│ Edge PoP │───►│ Regional Cache │───►│ Origin Shield│───► Object Storage
│ │◄───│ (200+ loc)│◄───│ (20 locations) │◄───│ (2 locations)│
└────────┘ └───────────┘ └───────────────┘ └──────────────┘
Cache Hit Cache Hit Cache Hit
(90%+ of req) (5-8% of req) (1-2% of req)
- Edge PoPs (Points of Presence) — 200+ locations worldwide, closest to viewers. Handle 90%+ of requests for popular content. Each PoP stores the most popular video segments based on regional viewing patterns.
- Regional Caches — 20 locations per continent. Store a broader tail of content. Fill misses from Edge PoPs.
- Origin Shield — A caching layer in front of Object Storage that collapses concurrent cache-miss requests for the same segment into a single origin fetch. Protects Object Storage from thundering herd during viral events.
Cache warming:
When a popular creator uploads a new video, passive cache filling (wait for first viewer request) creates poor initial experience. Active cache warming pushes the first few minutes of transcoded video to Edge PoPs in the creator’s top-10 geographies before the video goes live. This ensures the first viewers get cache hits.
Geographic routing:
DNS-based Anycast routes viewers to the nearest Edge PoP. For finer control, HTTP 302 redirects steer viewers to a specific PoP based on real-time load and cache residency. If PoP-A is overloaded, redirect viewers to PoP-B even if it is marginally farther.
// Cache eviction policy simulator — LRU with size-weighted scoring
public class SizeWeightedLRUCache<K> {
private record CacheEntry<K>(K key, long sizeBytes, Instant lastAccessed) {}
private final long maxCapacityBytes;
private long currentSizeBytes;
private final Map<K, CacheEntry<K>> entries;
private final PriorityQueue<CacheEntry<K>> evictionQueue;
public SizeWeightedLRUCache(long maxCapacityBytes) {
this.maxCapacityBytes = maxCapacityBytes;
this.currentSizeBytes = 0;
this.entries = new ConcurrentHashMap<>();
// Evict the entry with lowest score: recency / size
// Large, stale entries get evicted before small, recent ones
this.evictionQueue = new PriorityQueue<>(
Comparator.comparingDouble(e ->
(double) e.lastAccessed().toEpochMilli() / e.sizeBytes()
)
);
}
public void put(K key, long sizeBytes) {
evictUntilFits(sizeBytes);
var entry = new CacheEntry<>(key, sizeBytes, Instant.now());
entries.put(key, entry);
evictionQueue.offer(entry);
currentSizeBytes += sizeBytes;
}
public boolean get(K key) {
CacheEntry<K> entry = entries.get(key);
if (entry == null) return false; // cache miss
// Refresh access time
evictionQueue.remove(entry);
var refreshed = new CacheEntry<>(key, entry.sizeBytes(), Instant.now());
entries.put(key, refreshed);
evictionQueue.offer(refreshed);
return true; // cache hit
}
private void evictUntilFits(long requiredBytes) {
while (currentSizeBytes + requiredBytes > maxCapacityBytes
&& !evictionQueue.isEmpty()) {
CacheEntry<K> victim = evictionQueue.poll();
entries.remove(victim.key());
currentSizeBytes -= victim.sizeBytes();
}
}
}
The eviction policy scores entries by lastAccessedTime / sizeBytes. A 500 MB video segment accessed 2 hours ago scores lower (evicted sooner) than a 5 MB thumbnail accessed 1 hour ago. This prevents large stale content from occupying cache space that many smaller popular assets could use.
Content Storage
Video segment storage:
All transcoded video segments (.ts or .fmp4 files, 2–6 seconds each) live in S3-compatible object storage. A 1-hour 1080p video at 6-second segments produces ~600 segments per resolution, or ~2,400 total objects across four quality levels. With 500K uploads/day, that adds ~1.2 billion new objects daily.
Metadata database schema:
videos: video_id (PK), channel_id, title, description, status,
duration_sec, upload_time, manifest_url
channels: channel_id (PK), owner_user_id, name, subscriber_count
view_events: video_id, user_id, watch_duration_sec, timestamp
(append-only event log → Kafka → analytics pipeline)
Separation of concerns:
- Hot data (video metadata, channel info) → PostgreSQL with read replicas and Redis caching.
- Analytics data (view events) → Append to Kafka, consumed by a Flink/Spark pipeline for aggregation.
- Search → Elasticsearch index built from video metadata + auto-generated captions.
View count aggregation:
Exact real-time view counts at YouTube scale are infeasible — a viral video can see millions of views per minute. The approach uses two layers:
- Approximate real-time counter — A Redis
INCRcounter per video, displayed as “12M views.” The counter uses probabilistic incrementing at high rates (at > 10K views/sec, only 1-in-10 view events increment Redis) to reduce write load. - Exact batch counter — A hourly batch job (Spark) processes the view event log from Kafka and writes the exact count to the metadata database. The frontend prefers the batch count when available, falling back to the approximate counter for recently viral videos.
Recommendation Engine
The recommendation system consists of two stages: candidate generation and ranking.
Candidate generation produces a broad set of videos the user might enjoy:
- Collaborative filtering — “Users who watched X also watched Y.” Builds a user-video interaction matrix and finds similar users via approximate nearest neighbor search (HNSW index). Generates ~500 candidates.
- Content-based filtering — Computes embeddings for each video (from title, tags, thumbnail, audio transcript) and retrieves videos similar to the user’s recent watches. Generates ~200 candidates.
- Subscription feed — New uploads from subscribed channels. Generates ~50 candidates.
Ranking takes the ~750 merged candidates and scores each with a lightweight ML model (gradient-boosted trees or a small neural net) using features like:
- User engagement history with this creator
- Video freshness (upload time)
- Video quality signals (watch-through rate, like ratio)
- Context (time of day, device type)
The top 20–50 ranked videos compose the homepage feed. The “up next” sidebar uses the same pipeline scoped to videos related to the currently playing video.
Real-time personalization:
User actions (watch, like, skip) flow into a Kafka topic. A Flink stream processor updates the user’s activity profile in real time. The next feed request incorporates these fresh signals — a user who watched 3 cooking videos in a row starts seeing cooking content within minutes, not hours.
Bottlenecks & Scaling
| Bottleneck | Mitigation |
|---|---|
| Upload spike handling — A major event causes 10× normal upload volume. | Auto-scaling transcoding workers on Kubernetes. Priority queue separates paid creators from free-tier. Deferred processing: accept the upload immediately, return a “processing” status, and transcode asynchronously. |
| Viral video thundering herd — A video goes viral; millions of requests hit the CDN simultaneously. | CDN edge absorbs the load — viral content has near-100% cache hit ratio. Origin Shield collapses concurrent misses. Pre-populate edge caches for trending content detected by the analytics pipeline. |
| Database hotspots for viral videos — View count writes and metadata reads concentrate on one row. | Cache metadata in Redis with a 60-second TTL. Use the approximate counter pattern for view counts. Shard the view event log by video ID hash. |
| Cost optimization — Storing every video at 4 resolutions forever is expensive. | Tiered storage: move videos with < 10 views/month to cold storage (Glacier-class). Re-transcode from cold raw footage if a cold video suddenly gets traffic. Delete raw uploads 30 days after confirmed transcoding success. |
| CDN cost per GB — CDN egress is the largest line item. | Negotiate committed-use pricing. Implement peer-to-peer (WebRTC-based) content delivery for popular live events to reduce CDN load by 30–40%. Serve lower default quality in cost-sensitive regions. |
| Transcoding compute cost — 4K transcoding uses significant GPU time. | Transcode 4K only if the source is 4K. Use hardware-accelerated encoding (NVENC/QSV). Cache popular encoding presets. Spot/preemptible instances for non-urgent transcoding jobs. |
Interviewer Tips
Common follow-up questions interviewers ask:
-
“How would you handle live streaming differently?” — Live streaming replaces the upload pipeline with a real-time ingest pipeline. The creator streams via RTMP to an ingest server, which transcodes in real time to HLS/DASH segments (2-second segments for low latency). Segments are pushed to CDN edges immediately. The manifest file is updated every 2 seconds. Latency target: 5–15 seconds for standard live, < 3 seconds for “ultra-low latency” (using CMAF chunked transfer or WebRTC).
-
“How do you handle copyright detection?” — A Content ID system computes audio/video fingerprints of uploaded content and matches them against a database of copyrighted works. This runs as a step in the transcoding DAG, before the “Publish to CDN” step. Matches trigger configurable policies: block, monetize for the rights holder, or allow with attribution.
-
“How would you design the comment system at scale?” — Comments are stored in a separate service with its own datastore spanned by
(videoId, commentId). Top-level comments and replies form a tree structure (adjacency list model). For videos with millions of comments, load the top-50 sorted by relevance (ML-scored), with “load more” pagination via cursor-based queries. -
“How do you guarantee no data loss for uploads?” — Chunked uploads with server-side checksums (SHA-256) per chunk. The Upload Service does not acknowledge the complete upload until all chunks are verified and replicated across 3 availability zones in object storage. The client retains the local file until the server confirms successful transcoding.
-
“How does the system handle different video formats and codecs?” — The transcoding pipeline normalizes all inputs. First step: probe the input with FFprobe to detect codec, resolution, frame rate, and audio format. The DAG adapts: if the input is already H.264 at 720p, skip the 720p H.264 encode. Output codecs typically include H.264 (broad compatibility) and VP9/AV1 (better compression, lower CDN cost for supported clients).
-
“How would you implement video chapters and timestamps?” — Creators provide chapter markers as metadata (timestamp + label). These are stored in the video metadata and embedded in the HLS/DASH manifest as cue points. The player UI renders a segmented progress bar with chapter labels on hover.