System Design Article
Design YouTube (Video Platform)
Difficulty: Medium
Design a video platform like YouTube with 2 billion users, 500 hours of video uploaded every minute, and 1 billion hours watched per day. The interview centerpiece is the video pipeline: chunked uploads, parallel transcoding to 8 resolutions and 3 codecs, HLS/DASH adaptive streaming over a global CDN, and the metadata service that ties it all together. We also cover recommendations (the secondary feed problem), comment scaling, view-counter accuracy, and how YouTube serves 200 Tbps of egress without melting the internet.
Requirements
Functional Requirements
- Upload a video (up to several hours, dozens of GB).
- Watch a video with adaptive quality (auto-switch between 144p and 4K based on bandwidth).
- Like, dislike, comment on videos.
- Subscribe to channels; get a subscription feed.
- Search videos by title, description, channel.
- Recommendations ('next' video, home feed).
- View counts (with anti-fraud).
Out of Scope (state explicitly)
- Live streaming (different problem; mention briefly).
- Monetization, ads, copyright/Content-ID system.
- YouTube Shorts (similar to TikTok; covered in TikTok case study).
- Premium subscription, music, kids.
Non-Functional Requirements
- Scale: 2B users, 500 hours uploaded per minute, 1B hours watched per day.
- Read-heavy: orders of magnitude more watch time than upload.
- Latency: video starts playing in < 2 seconds (time-to-first-frame); segments are fetched from a CDN edge at < 200 ms RTT so the playback buffer does not drain.
- Highly available: 99.99%. YouTube outages make global news.
- Durability: 11 nines. Lost videos = lost users.
- Eventually consistent counters (likes, views) within seconds.
Back-of-the-Envelope Estimation
Upload
---------- Upload volume ----------
Upload rate: 500 hours / minute = 8.3 hours/sec
Average video length: 10 min
Uploads per minute: 500 hr * 60 / 10 = 3000 videos
Uploads per second: ~50 videos/sec
Uploads per second peak: ~150 videos/sec (3x)
Bitrate at upload: ~5-10 Mbps for 1080p source
Ingest bandwidth: 500 hr * 60 min * 7 Mbps = 210,000 Mbps ~= 210 Gbps sustained
Transcoding output
Each uploaded video is transcoded to multiple bitrates and codecs:
---------- Per-video transcoding output ----------
Resolutions: 144p, 240p, 360p, 480p, 720p, 1080p, 1440p, 4K = 8 resolutions
Codecs: H.264 (universal), VP9, AV1 = 3 codecs
Variants: 8 * 3 = 24 outputs per video
Approximate sizes (10 min video, average bitrates):
144p: ~30 MB
360p: ~80 MB
720p: ~250 MB
1080p: ~500 MB
4K: ~2.5 GB
Sum across all variants: ~6 GB per 10 min source
Storage
---------- Storage growth ----------
Per day: 500 hr/min * 1440 min/day = 720,000 hours/day
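Per day: 720,000 hr * 6 = 4,320,000 ten-minute videos /day
Storage: 4,320,000 * 6 GB ~= 26 PB / day of finished video assets
Per year: ~9.5 EB / year
5 years: ~47 EB just for videos
With replication (3x): ~142 EB
(Upper bound: assumes every upload gets all 24 variants; long-tail videos encoded to H.264 only take far less.)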
Long tail observation: 95% of videos have < 1000 views.
Tiered storage:
- Hot (top 5%): S3 Standard (or equivalent) + CDN edge cache.
- Warm (next 20%): S3 IA, fetched on demand.
- Cold (bottom 75%): Glacier-like; fetch is slow but cheap.
Egress
---------- Egress bandwidth ----------
Watched hours/day: 1B
Average bitrate: ~3 Mbps (mix of mobile + desktop)
Daily egress bytes: 1B * 3600 sec * 3 Mbps / 8 = 1.35 EB / day
Sustained egress: ~125 Tbps
Peak (2x): ~250 Tbps
Essentially all user-facing egress is served from the CDN; origin sees only a tiny fraction (cache fills).
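The egress figures above can be reproduced in a few lines of Python. This is only a sanity check under the same assumptions (1B watch hours per day, ~3 Mbps average bitrate, 2x peak factor); the function name `egress_estimate` is illustrative and not part of any real API.

```python
def egress_estimate(watch_hours_per_day: float, avg_bitrate_mbps: float,
                    peak_factor: float = 2.0):
    """Return (bytes per day, sustained Tbps, peak Tbps)."""
    seconds_watched = watch_hours_per_day * 3600
    bits_per_day = seconds_watched * avg_bitrate_mbps * 1e6
    bytes_per_day = bits_per_day / 8
    # Spread the daily total evenly over 86,400 seconds for the sustained rate.
    sustained_tbps = bits_per_day / 86_400 / 1e12
    return bytes_per_day, sustained_tbps, sustained_tbps * peak_factor


daily_bytes, sustained, peak = egress_estimate(1e9, 3)
print(f"{daily_bytes / 1e18:.2f} EB/day, "
      f"{sustained:.0f} Tbps sustained, {peak:.0f} Tbps peak")
```

Changing the average bitrate is the quickest way to see how sensitive the CDN bill is to the mobile/desktop mix.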
High-Level Design
---------- High-level architecture ----------
          +----------+
          |  Client  |
          +----------+
               |
               v
        +-------------+
        | CloudFront  |  <- 99% of segment fetches served here
        +-------------+
          |          |
          v          v
    +----------+    +-----------------+
    | Origin   |    |  Metadata API   |
    | Storage  |    +-----------------+
    | (S3)     |             |
    +----------+      +------+------+
         ^            |             |
         |            v             v
    +----+-----+ +----------+  +----------+
    | Transcode| | Postgres |  | Search   |
    | Workers  | | (videos, |  | (Elastic)|
    +----------+ |  users)  |  +----------+
         ^       +----------+
         |            ^
         |            |
    +----+-----+    +----------+
    | Kafka    |<---| Recommend|
    | (events) |    | Service  |
    +----------+    +----------+
         |
         v
    +----------+
    | Upload   |
    | Service  |
    +----------+
         ^
         |
  Client (chunked PUTs)
API Design
// 1. Initiate upload
POST /api/v1/videos/upload-init
{ "file_size": 5000000000, "content_type": "video/mp4", "title": "My talk" }
// Response
{
"video_id": "01HW...",
"upload_session": "sess_abc",
"chunk_size": 8388608, // 8 MB
"upload_url_pattern": "https://s3.../uploads/<video_id>/chunk_{n}?X-Amz-Signature=..."
}
// 2. Client PUTs chunks in parallel to S3 directly
// 3. Finalize upload
POST /api/v1/videos/upload-complete
{ "video_id": "01HW...", "upload_session": "sess_abc", "chunks": 597 }
// Response
{ "video_id": "01HW...", "status": "processing" }
// 4. Watch a video (returns the master manifest)
GET /api/v1/videos/:id/manifest.m3u8
// Response: HLS master playlist with variants for each bitrate
// 5. Client picks a bitrate, fetches segment manifests, then segments
GET https://cdn.../v/01HW.../1080p/segment_001.ts
Watch Path (the hot read)
- Client GETs video metadata + master manifest.
- Client picks initial bitrate based on bandwidth heuristic, fetches variant manifest.
- Client streams 6-second segments from CDN, switching variants up/down based on real-time bandwidth measurement.
- View event sent fire-and-forget to View Counter Service after the user watches > 30 seconds (anti-fraud threshold).
- Player asynchronously fetches recommendations for 'up next' from Recommend Service.
---------- Time-to-first-frame budget (target < 2 sec) ----------
DNS + connect to CDN: 50 ms
Fetch master manifest: 80 ms
Fetch variant manifest: 80 ms
Fetch first segment: 500 ms (depends on bitrate)
Video decode + first frame: 100 ms
Total: ~810 ms (well under 2 sec)
Detailed Design
The two interesting components are the video upload + transcoding pipeline and the adaptive streaming + CDN strategy.
Upload Pipeline
Why chunked uploads?
Video files are huge (1 GB - 50 GB+). Single-PUT uploads:
- Time out on slow connections.
- Cannot resume on failure (start over from byte 0).
- Are capped by the throughput of a single TCP connection (limited by congestion control).
Chunked uploads (multipart upload in S3 terminology):
- 8-100 MB per chunk; client uploads chunks in parallel.
- Resumable: a failed chunk just retries.
- Faster: parallel TCP connections fill more bandwidth.
Resumability
Client persists upload_session and chunks_uploaded locally. On reconnect, asks server which chunks are missing and retries them. S3 multipart upload is built for this.
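A minimal sketch of the chunk bookkeeping behind resumable uploads, assuming fixed-size chunks numbered from 0. The helper names `plan_chunks` and `missing_chunks` are hypothetical; a real client would get the list of uploaded parts from the server or from the storage service's multipart listing.

```python
import math


def plan_chunks(file_size: int, chunk_size: int) -> list[tuple[int, int, int]]:
    """Return (chunk_number, start_byte, end_byte_exclusive) for every chunk."""
    count = math.ceil(file_size / chunk_size)
    return [
        (n, n * chunk_size, min((n + 1) * chunk_size, file_size))
        for n in range(count)
    ]


def missing_chunks(plan: list[tuple[int, int, int]], uploaded: set[int]) -> list[int]:
    """Chunk numbers that still have to be sent (or re-sent) after a reconnect."""
    return [n for n, _, _ in plan if n not in uploaded]


# A 50 MB file with 8 MiB chunks: five full chunks plus one short tail chunk.
plan = plan_chunks(50_000_000, 8 * 1024 * 1024)
print(len(plan), plan[-1])
print(missing_chunks(plan, uploaded={0, 1, 2, 4}))
```

Note the ceiling division: the last chunk is usually shorter than `chunk_size`, and forgetting it silently truncates the upload.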
Transcoding Pipeline
This is the most interesting part of YouTube's backend.
The naive approach (don't do this)
After upload completes, run a single ffmpeg job that produces all 24 variants. For a 60-min source, this takes hours per video on one machine. With 3000 uploads/min, you need an absurd amount of compute, and a single failure restarts the whole thing.
The chunked transcoding approach
---------- Transcoding pipeline ----------
1. Splitter: splits source video into 30-second GOP-aligned chunks.
- 60 min video -> 120 chunks.
2. Each chunk goes into Kafka topic transcode.chunk.
3. Transcode Workers (autoscaled fleet, ~10K instances):
- Each pulls a chunk, transcodes to ALL 24 variants in parallel ffmpeg processes.
- Writes outputs to S3 under deterministic keys.
- Emits transcode.chunk.done.
4. Stitcher: when all 120 chunks * 24 variants are done, assembles the manifests.
- Master playlist (.m3u8) lists variants.
- Variant playlist lists chunks (URLs in CDN).
5. Marks video status='ready' in metadata.
6. Emits video.ready event for fan-out (notify subscribers).
Key insight: transcode in parallel by chunks, not by video. A 60-min video uses 120 workers concurrently and finishes in the time of one chunk transcode (~1-2 min) instead of hours. The cost is the same total compute, but the wall clock is bounded.
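The stitcher's "all chunks done" check can be sketched as a small completion tracker. This is an illustration only: it assumes one completion event per (chunk, variant) pair, keeps state in memory rather than in a durable store, and the class name `StitchTracker` is hypothetical.

```python
class StitchTracker:
    """Tracks which (chunk, variant) outputs of one video have finished."""

    def __init__(self, total_chunks: int, variants: list[str]):
        self.expected = {(c, v) for c in range(total_chunks) for v in variants}
        self.done: set[tuple[int, str]] = set()

    def mark_done(self, chunk_index: int, variant: str) -> bool:
        """Record one completion event; return True once every output exists."""
        key = (chunk_index, variant)
        if key not in self.expected:
            raise ValueError(f"unexpected output {key}")
        # Adding to a set is idempotent, so redelivered events are harmless.
        self.done.add(key)
        return self.done == self.expected


tracker = StitchTracker(total_chunks=2, variants=["720p", "1080p"])
for chunk, variant in [(0, "720p"), (0, "720p"), (1, "720p"), (0, "1080p"), (1, "1080p")]:
    ready = tracker.mark_done(chunk, variant)
print("ready to write manifests:", ready)
```

Because the outputs are written under deterministic keys, a retried chunk overwrites its own previous output and the tracker never double-counts it.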
GOP alignment
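Each chunk must start on a keyframe, i.e. be aligned to a Group of Pictures (GOP) boundary, so it can be decoded independently of its neighbors. Where the source has no keyframe at a 30-second boundary, the splitter forces one, which means re-encoding only the affected GOP rather than the whole file (comparatively cheap).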
Codec selection
- H.264: universal compatibility, software-decode on every device. The default.
- VP9: ~30% smaller than H.264 at same quality. Supported on most modern browsers.
- AV1: ~50% smaller than H.264. Slow to encode (10x H.264). Saves bandwidth at scale; only worth it for popular videos.
Real YouTube transcodes only the most-watched percentile to AV1 because of the encoding cost. For long-tail videos (which are 95% of uploads but 5% of watch hours), H.264 only.
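The codec policy described above could be expressed as a simple percentile rule. The cutoffs below are assumptions chosen to match the text (long tail = bottom 95% gets H.264 only, top percentile gets AV1); the text does not say where VP9 starts, so `vp9_cutoff` in particular is a guess, and `codecs_for` is a hypothetical helper.

```python
def codecs_for(view_percentile: float,
               vp9_cutoff: float = 95.0,
               av1_cutoff: float = 99.0) -> list[str]:
    """Pick codecs to encode for a video ranked by watch-time percentile (0-100)."""
    codecs = ["h264"]            # universal baseline, always produced
    if view_percentile >= vp9_cutoff:
        codecs.append("vp9")     # assumed: worth it once a video leaves the long tail
    if view_percentile >= av1_cutoff:
        codecs.append("av1")     # expensive encode, reserved for the most-watched videos
    return codecs


for pct in (50.0, 96.0, 99.5):
    print(pct, codecs_for(pct))
```

In practice the percentile is not known at upload time, so a rule like this would run again as view data accumulates and trigger re-encodes for videos that become popular.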
Adaptive Bitrate Streaming
HLS (HTTP Live Streaming)
- Master playlist (.m3u8) lists variants:
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=256x144
v/01HW.../144p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
v/01HW.../720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=1920x1080
v/01HW.../1080p/playlist.m3u8
- Variant playlist lists segments (chunks):
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
segment_001.ts
#EXTINF:6.0,
segment_002.ts
- Client downloads a segment, measures throughput, decides whether to step up/down for the next segment.
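To make the playlist structure concrete, here is a small sketch of how a stitcher might render the master playlist from a list of variants. The `Variant` dataclass and `render_master_playlist` function are illustrative names, and the relative `<name>/playlist.m3u8` URI layout is an assumption; only the `#EXTM3U` and `#EXT-X-STREAM-INF` tags come from the HLS format itself.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str       # directory name, e.g. "720p"
    bandwidth: int  # peak bits per second advertised to the player
    width: int
    height: int


def render_master_playlist(variants: list[Variant]) -> str:
    """Render an HLS master playlist, lowest bandwidth first."""
    lines = ["#EXTM3U"]
    for v in sorted(variants, key=lambda v: v.bandwidth):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth},RESOLUTION={v.width}x{v.height}"
        )
        lines.append(f"{v.name}/playlist.m3u8")
    return "\n".join(lines) + "\n"


print(render_master_playlist([
    Variant("1080p", 8_000_000, 1920, 1080),
    Variant("720p", 2_500_000, 1280, 720),
]))
```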
MPEG-DASH
Same concept, different format (XML manifest, .mpd). DASH is codec-agnostic and more flexible around DRM and multiple audio tracks. YouTube actually uses DASH primarily; HLS is the iOS/Safari path.
Why both?
iOS Safari requires HLS. Chrome supports both, but DASH is more standardized for newer codecs. YouTube serves whichever the client requests. Both can reference the same underlying transcoded segments when a shared container (fragmented MP4/CMAF) is used; only the manifest format differs.
Client-side bitrate adaptation
class AbrSelector:
    """Throughput-based variant selection for adaptive bitrate playback."""

    def __init__(self, variants):
        # Sort ascending so index 0 is the lowest-bitrate (safest) variant.
        self.variants = sorted(variants, key=lambda v: v.bandwidth)
        self.current = 0
        self.bandwidth_estimate = 0  # bps

    def on_segment_complete(self, segment_size_bytes, download_time_sec):
        if download_time_sec <= 0:
            return  # no usable measurement; keep the current variant
        self.bandwidth_estimate = (segment_size_bytes * 8) / download_time_sec
        # Pick the highest variant whose bandwidth <= 0.8 * estimate (safety margin);
        # fall back to the lowest variant if none fits.
        self.current = max(
            (i for i, v in enumerate(self.variants)
             if v.bandwidth <= self.bandwidth_estimate * 0.8),
            default=0,
        )
CDN Strategy
Edge caching
Video segments are immutable (the URL includes a content hash or version), so set Cache-Control: public, max-age=31536000, immutable. The CDN can then keep a segment for up to a year without revalidating, which is effectively forever.
Tiered cache
- Edge (close to user): caches segments seen recently. Hit rate ~90% for popular content.
- Regional shield: catches edge misses. Hit rate ~99% combined.
- Origin (S3): caches at the regional shield ensure that even on cold edges, origin sees < 1% of traffic.
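The combined hit rate follows from multiplying the miss rates of each tier. The sketch below assumes the regional shield hits about 90% of the requests the edge missed, which is the value that reproduces the ~99% combined figure above; `origin_fraction` is an illustrative helper.

```python
def origin_fraction(tier_hit_rates: list[float]) -> float:
    """Fraction of requests that miss every cache tier and reach origin."""
    miss = 1.0
    for hit_rate in tier_hit_rates:
        miss *= 1.0 - hit_rate
    return miss


# Edge hits 90%; the shield (assumed) hits 90% of what the edge missed.
to_origin = origin_fraction([0.90, 0.90])
print(f"combined hit rate: {1 - to_origin:.2%}, origin share: {to_origin:.2%}")
```

At 125 Tbps of sustained egress, even a 1% origin share is more than a terabit per second, which is why the shield tier matters as much as the edge.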
Pre-warming
When a video is predicted to be popular (initial uploader has many subscribers, similar videos went viral), pre-push the most-requested variants (480p, 720p, 1080p H.264) to all edges. This is cheaper than serving the first million views as cache misses.
Cold tail strategy
Long-tail videos (< 1000 views/day) get NO pre-warming. First view is a cache miss; subsequent views in that region cache normally. Acceptable because cold-tail watch time is small.
View Counter Service
View counts at YouTube scale require:
- High write throughput (hundreds of thousands of view events/sec at peak).
- Anti-fraud (don't count refresh-spammers, bots).
- Eventually consistent display (within minutes is fine).
Design:
- Player sends view event after watching > 30 seconds.
- Event lands in Kafka.
- View Counter consumer:
  a. Dedupe per (video_id, user_id, day) using a Bloom filter or Redis SET.
  b. Anti-fraud: rate-limit per IP (more than 10 views/min from a single IP is suspicious).
  c. Increment a Redis counter; periodically flush to Postgres.
- Display 'about 1.2M views' (rounded) for counts > 1000 to absorb minor inaccuracies.
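The consumer logic above can be sketched in memory to show the order of checks. This is a simplified stand-in, not a production design: a Python set replaces the Bloom filter or Redis SET, a deque replaces a Redis-backed rate limiter, and the class name `ViewCounter` and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque


class ViewCounter:
    """In-memory stand-in for the view-event consumer."""

    def __init__(self, min_watch_sec: float = 30, max_per_ip_per_min: int = 10):
        self.min_watch_sec = min_watch_sec
        self.max_per_ip_per_min = max_per_ip_per_min
        self.seen = set()                     # (video_id, user_id, day) already counted
        self.ip_windows = defaultdict(deque)  # ip -> event timestamps in the last 60 s
        self.counts = defaultdict(int)        # video_id -> accepted views

    def record(self, video_id, user_id, ip, watched_sec, now=None) -> bool:
        """Return True if the event was counted as a view."""
        now = time.time() if now is None else now
        if watched_sec <= self.min_watch_sec:
            return False                      # below the watch-time threshold
        key = (video_id, user_id, int(now // 86_400))
        if key in self.seen:
            return False                      # duplicate for this user, video, and day
        window = self.ip_windows[ip]
        while window and now - window[0] >= 60:
            window.popleft()                  # drop events older than one minute
        window.append(now)
        if len(window) > self.max_per_ip_per_min:
            return False                      # suspicious burst from one IP
        self.seen.add(key)
        self.counts[video_id] += 1
        return True


vc = ViewCounter()
print(vc.record("v1", "u1", "203.0.113.7", watched_sec=45, now=1000.0))
print(vc.record("v1", "u1", "203.0.113.7", watched_sec=45, now=1001.0))
```

Swapping the set for a Bloom filter trades a small undercount (false positives look like duplicates) for bounded memory per video.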
Recommendation Service
The full ML pipeline is its own beast (covered in the Recommendation Systems advanced lesson). For YouTube design, the architectural notes:
- Recommendations are precomputed per user offline (batch ML pipeline).
- Stored as Redis lists keyed by user_id.
- Re-ranked at read time with online signals (recently watched, current session).
- Cold-start for new users: trending videos by region.
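A minimal sketch of the read path for recommendations, assuming the precomputed lists and regional trending lists are already loaded (plain dicts stand in for Redis here). The "re-ranking" is reduced to its simplest online signal, dropping videos the user has just watched; the function name and parameters are illustrative.

```python
def recommendations_for(user_id, precomputed, trending_by_region, region,
                        recently_watched, limit=10):
    """Return up to `limit` video ids for the user's feed."""
    candidates = precomputed.get(user_id)
    if not candidates:
        # Cold start: no offline recommendations yet, fall back to regional trending.
        candidates = trending_by_region.get(region, [])
    # Online signal: skip anything watched in the current session.
    fresh = [video_id for video_id in candidates if video_id not in recently_watched]
    return fresh[:limit]


precomputed = {"u1": ["a", "b", "c", "d"]}
trending = {"DE": ["t1", "t2"]}
print(recommendations_for("u1", precomputed, trending, "DE", {"b"}, limit=2))
print(recommendations_for("new_user", precomputed, trending, "DE", set()))
```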
Data Model
Postgres (sharded by video_id): video metadata
CREATE TABLE videos (
    id            BIGINT PRIMARY KEY,
    channel_id    BIGINT NOT NULL,
    title         VARCHAR(100) NOT NULL,
    description   TEXT,
    duration_sec  INTEGER,
    status        VARCHAR(16),          -- uploading | processing | ready | deleted | takedown
    visibility    VARCHAR(16),          -- public | unlisted | private
    upload_at     TIMESTAMPTZ,
    published_at  TIMESTAMPTZ,
    view_count    BIGINT DEFAULT 0,     -- denormalized; eventually consistent
    like_count    BIGINT DEFAULT 0,
    s3_master_key VARCHAR(255)          -- pointer to manifest in S3
);
Cassandra: comments, likes
CREATE TABLE video_comments (
    video_id   bigint,
    comment_id bigint,
    user_id    bigint,
    body       text,
    parent_id  bigint,
    created_at timestamp,
    PRIMARY KEY ((video_id), comment_id)
) WITH CLUSTERING ORDER BY (comment_id DESC);
CREATE TABLE video_likes (
    video_id bigint,
    user_id  bigint,
    value    tinyint,  -- +1 like, -1 dislike, 0 neutral
    PRIMARY KEY ((video_id), user_id)
);
S3: video segments and manifests
Bucket: youtube-videos-prod
videos/<video_id>/master.m3u8
videos/<video_id>/master.mpd
videos/<video_id>/144p/playlist.m3u8
videos/<video_id>/144p/segment_001.ts
videos/<video_id>/144p/segment_002.ts
videos/<video_id>/720p/segment_001.ts
videos/<video_id>/4k/segment_001.ts
...
Storage class lifecycle:
- 0-7 days: Standard (high access expected).
- 7-90 days: Standard-IA (access drops sharply).
- 90+ days for cold tail: Glacier Instant Retrieval (still servable on a cache miss, just slower).
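The lifecycle above amounts to a small decision function. The sketch below assumes that videos outside the cold tail stay in Standard-IA after 90 days (the list only specifies the cold-tail case); the returned strings follow the S3 storage-class names, and `storage_class` is a hypothetical helper rather than an S3 API.

```python
def storage_class(age_days: int, is_cold_tail: bool) -> str:
    """Pick a storage class for a video's segments from age and popularity."""
    if age_days < 7:
        return "STANDARD"      # fresh uploads: high access expected
    if age_days < 90 or not is_cold_tail:
        return "STANDARD_IA"   # access has dropped, but retrieval must stay cheap
    return "GLACIER_IR"        # old and rarely watched


for age, cold in [(3, True), (30, True), (120, True), (120, False)]:
    print(age, cold, storage_class(age, cold))
```

In a real deployment the age-based part would normally be a bucket lifecycle rule, with only the popularity-based exception handled by application code.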
Redis: hot counts, dedup, recommendations
- view_count:<video_id> -> integer (live, flushed to Postgres every 60 sec).
- view_dedup:<video_id>:<day> -> Bloom filter or SET of user_ids.
- recs:<user_id> -> LIST of recommended video_ids (TTL 24h).
Scaling and Bottlenecks
Viral video: 100M views in a day
- CDN absorbs essentially 100%. Each edge serves a small slice; segment hit rate is ~99.99% for hot content.
- View counter handles 100M events/day = 1,200/sec average. Trivial for Kafka + Redis.
- Comment writes: hot video can have 10K comments/min. Cassandra absorbs partition writes.
Live event: 100M concurrent viewers
(Live streaming is technically out of scope, but commonly asked.)
- Different ingest path: live transcoder generates segments in real time with very low latency (~3 sec).
- CDN edge caches segments for the chunk duration (6 sec). With a 6-sec TTL and 100M concurrent viewers, each segment is served ~100M times from edge caches.
- Use a hierarchical CDN: edges fan in to regional caches that fan in to origin.
Storage cost is the dominant cost
- Tiered storage: 95% of bytes go to IA or Glacier within 90 days.
- Aggressive deduplication: identical uploads (re-uploads of the same source) detected by perceptual hash and served from a single set of segments.
- Newer codecs (AV1) reduce bytes per video at the cost of encode CPU. Use for the popular 5%; H.264 only for the long tail.
Transcoding fleet sizing
Ingest: ~50 videos/sec average, 150/sec peak. Each video chunk transcodes in ~1-2 min. Average video has ~20 chunks (10 min / 30 sec per chunk).
---------- Transcode worker math ----------
50 videos/sec * 20 chunks * 24 variants = 24,000 chunk-variant transcodes/sec
Each takes ~30 core-seconds (one variant at 1x speed on a single core)
Needed concurrency: 24,000 * 30 = 720,000 core-seconds/sec, i.e. ~720K busy cores
Fleet size: 720K cores / 4 cores per box = 180,000 boxes (rough order)
That's a massive fleet, which is why YouTube spends serious money on transcoding. Mitigations: cheaper per-pixel encoders, hardware-accelerated transcoding (VP9 ASICs, AV1 ASICs), processing only popular content into expensive codecs.
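The fleet estimate can be parameterized so the effect of each mitigation is easy to see. This sketch assumes each chunk-variant transcode keeps one core busy for about 30 seconds; the function name `transcode_fleet` and its parameters are illustrative.

```python
def transcode_fleet(videos_per_sec: float, chunks_per_video: int, variants: int,
                    core_sec_per_job: float, cores_per_box: int):
    """Return (jobs per second, busy cores, boxes) for a steady-state fleet."""
    jobs_per_sec = videos_per_sec * chunks_per_video * variants
    # Work arriving per second, in core-seconds, equals the cores kept busy.
    busy_cores = jobs_per_sec * core_sec_per_job
    return jobs_per_sec, busy_cores, busy_cores / cores_per_box


jobs, cores, boxes = transcode_fleet(50, 20, 24, 30, 4)
print(f"{jobs:,.0f} jobs/sec, {cores:,.0f} cores, {boxes:,.0f} boxes")

# H.264-only long tail: 8 variants instead of 24 for the same upload rate.
print(f"{transcode_fleet(50, 20, 8, 30, 4)[2]:,.0f} boxes")
```

Dropping from 24 to 8 variants cuts the fleet to a third, which is the quantitative reason for encoding only popular content into VP9 and AV1.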
Trade-offs and Alternatives
Why HLS + DASH instead of progressive download?
Progressive download (a single MP4 file) doesn't adapt to bandwidth. Buffering on a slow connection means waiting indefinitely. Adaptive streaming switches down to 360p so the user keeps watching, even on a 1 Mbps connection.
Why so many resolutions?
More variants = better adaptation = fewer rebuffers. 8 resolutions is a lot but each adds ~12% to storage; bitrate ladder optimization is its own field. Real YouTube tunes ladders per content type (cartoon vs nature documentary).
Why chunked transcoding instead of GPU per video?
GPU transcoding is faster per machine but doesn't parallelize one video across multiple GPUs cleanly. Chunking lets you use thousands of CPU cores simultaneously, completing a 1-hour video in 1-2 minutes wall-clock. GPUs are used for AV1 (where they're 10x faster than CPU) but the architectural pattern is still chunked.
Comments at scale: Cassandra vs Postgres
Viral video comments hit thousands of writes/sec on a single video. Cassandra partitioned by video_id absorbs this, whereas a single Postgres shard would suffer lock contention under that write rate. The cost: no JOINs on comments, so we hydrate user info separately.
View count accuracy
We undercount slightly (Bloom filter false positives, anti-fraud filtering). For a 1B-view video, undercounting by 0.1% doesn't matter; we round display anyway. For monetization (per-view payout) we'd need exact counts via deduplicated event logs.
Why not BitTorrent / P2P delivery?
P2P video distribution exists (PeerTube uses WebTorrent) but:
- Mobile devices don't seed (battery, data plan).
- Users don't tolerate the latency variability.
- Operating costs of CDN are predictable; P2P quality of service isn't.
Real YouTube has explored P2P for live streaming (where redundancy matters) but stuck with CDN for VOD.
Single canonical bitrate ladder vs per-video tuning
A fixed ladder (144p, 240p, 360p, 480p, 720p, 1080p, 1440p, 4K at fixed bitrates) is operationally simple. Per-video ladders (using ML to pick the optimal bitrates per content type) save 20-30% bandwidth. Real YouTube does some per-video tuning; the canonical ladder is the fallback.
Real-World Examples
How real systems implement this in production
Netflix
Video-on-demand platform serving 250M+ subscribers with a similar pipeline: upload, transcode to many variants, distribute via CDN. Critical difference: Netflix uploads come from studios (not users), so per-title encoding can be ML-optimized for hours per title, not minutes.
Trade-off: Netflix optimizes per-title bitrate ladders to save 20-30% bandwidth, which justifies the longer encode time. YouTube cannot afford this for every upload because user uploads arrive too frequently. The underlying tension is encoder optimization vs throughput: Netflix wins on quality, YouTube wins on volume.
Twitch
Live video platform with very low latency (<3 sec stream-to-viewer for the lowest-latency mode). Uses HLS but with smaller segment sizes (2 sec) and aggressive prefetch. Transcoding happens in real time at ingest.
Trade-off: Lower latency means smaller segments and less buffer headroom. Twitch trades robustness on bad networks (more rebuffer events) for the live interactivity that's its core product. Live and VOD have different optimization targets.
Vimeo
Premium video host focused on creators and businesses. Same pipeline (chunked upload, multi-variant transcode, CDN distribution) but with extra emphasis on quality (better default ladders) and customization (custom players, no ads).
Trade-off: Vimeo trades scale (millions of users vs YouTube's billions) for quality and customization. Smaller fleet, fewer constraints, more polished UX. Same architecture, different optimization knobs.
TikTok
Short-form video (15-60 sec) with a different access pattern: most videos are watched many times in a short window, then forgotten. Transcoding is simpler (shorter clips, fewer variants). Recommendation is far more aggressive (the For You Page is the entire product).
Trade-off: TikTok's bounded video length simplifies the pipeline (no need to tier old long videos) but shifts complexity to the recommendation engine. Long-form video (YouTube) and short-form (TikTok) have different scaling pathologies.
Common Interview Questions
Questions you might be asked about this topic
(1) Client requests upload-init; gets a session and chunked upload URLs. (2) Client uploads ~600 chunks of 8 MB in parallel to S3 directly via presigned URLs. (3) Client posts upload-complete; status flips to 'processing'. (4) Splitter splits source into 30-sec GOP-aligned chunks; emits Kafka events. (5) Transcode workers transcode each chunk to 24 variants in parallel, write outputs to S3. (6) Stitcher assembles manifests after all chunks done. (7) Status flips to 'ready'. (8) Friend opens the video; metadata API returns the manifest URL; player streams adaptive segments from CDN. End-to-end: minutes for upload + 1-3 min for transcode + immediate playback once ready.
CDN absorbs essentially all traffic. Hit rate at edge is >99% for hot content; regional shield absorbs the rest; origin sees < 1%. Pre-warm popular variants (480p, 720p, 1080p H.264) to edges proactively when virality is detected. View counter handles 100M events/day = 1,200/sec - trivial for Kafka. Comments are sharded by video_id in Cassandra and absorb thousands of writes/sec. The metadata service is read-cached. The user experience is identical to a normal video; the CDN does all the lifting.
Player picks the lowest reasonable bitrate for the first segment (often 144p or 240p) so it loads fast - typically < 500 ms even on 1 Mbps. Player measures throughput on segment 1 and steps up for segment 2. CDN edge serves segment 1 in < 100 ms RTT for most users (POP-distance based). Master manifest is small (~1 KB) and cached aggressively. Total budget: ~800 ms even on slow networks; faster on good ones.
Multiple layers. (1) Watch-time threshold: only count a view after >30 sec of actual playback (player-reported). (2) Per-(user, video, day) dedup via Bloom filter or Redis SET. (3) IP rate-limit: > 10 views/min from same IP is suspicious. (4) ML anomaly detection on view-velocity, geographic clustering, user-agent patterns. Suspect views land in a separate audit bucket; the public count uses the cleaned stream. For monetization, payouts use only audited views.
Live ingest: streamer's encoder pushes to a regional ingest point via RTMP or WebRTC. Live transcoder generates HLS/DASH segments in real time with ~3-6 sec latency. Segments uploaded to S3 with TTL = 24 hours, served via the same CDN. Manifests are dynamic (.m3u8 with EXT-X-PLAYLIST-TYPE:EVENT, refreshed every few seconds). Anti-DDoS at ingest, lower-latency transport (LL-HLS or WebRTC) for sub-2-sec latency products. After live ends, segments are reassembled into a normal VOD.
Interview Tips
How to discuss this topic effectively
First sentence: 'Storage and egress are the two cost centers, and they're both dominated by long-tail behavior.' This frames the entire conversation around what actually matters at YouTube scale.
Always describe the chunked transcoding pipeline. It's the canonical example of horizontal-scale batch work and demonstrates you've thought about parallelism.
Mention HLS vs DASH explicitly and pick both (different clients need different formats). Saying just 'I'll stream the video' is an instant downgrade.
Bring up tiered storage for the cold tail. 95% of videos have < 1000 views; storing them in S3 Standard is wasteful. This shows cost awareness.
Decouple view count from playback. Saying 'the player increments a counter on play' fails the scale test. Always: fire-and-forget event, asynchronous counter.
Common Mistakes
Pitfalls to avoid in interviews
Doing all transcoding for a video on a single machine
A 60-minute video has 120 chunks; transcoded to 24 variants on one machine, that's hours per video. Split into 30-second chunks, transcode each in parallel across thousands of workers, then stitch the manifest. Wall-clock drops from hours to minutes.
Serving the full video file as a single download
Progressive downloads don't adapt to bandwidth changes. On a slow connection the user waits indefinitely or gives up. Adaptive bitrate streaming (HLS/DASH) lets the player switch resolutions per segment, keeping playback going.
Storing all videos in S3 Standard regardless of view count
95% of videos have < 1000 lifetime views but consume 75% of storage. Tier them to S3 IA after 7 days and Glacier after 90 days. Bandwidth cost when an old video gets viewed is small compared to constant storage cost for billions of cold videos.
Counting views synchronously by incrementing a database column
View counts hit thousands per second on viral videos. Synchronous counters suffer lock contention and limit throughput. Send fire-and-forget events to Kafka, dedupe and increment in Redis, periodically flush to durable storage. Display 'about 1.2M' to absorb eventual consistency.
Forgetting that the CDN does 99% of egress
At 100+ Tbps egress, your origin couldn't possibly serve directly. The CDN with edge + regional shield absorbs essentially all traffic. Origin sees < 1%. Designs that route playback through your application servers don't work at video scale.
