System Design Article
Back-of-the-Envelope Estimation & Capacity Planning
Difficulty: Medium
Back-of-the-envelope estimation is the math you do in three minutes to ground a system design in numbers. It is what tells you whether your single Postgres instance can handle the load (no), how much storage you need over five years (probably more than you think), and how much CDN bandwidth you are about to commit to (probably more than that). This lesson covers the standard latency / throughput / size / bandwidth numbers every engineer should have memorized, the unit conversions and order-of-magnitude reasoning that keep you fast, the templates for QPS, storage, and bandwidth estimation, capacity planning beyond steady state (peak vs average, headroom, growth, regional, seasonal), and the cost rough-arithmetic that turns 'we need more servers' into a defensible business case. The goal is to leave you able to walk into any interview or design review and produce useful numbers in three minutes flat.
Motivation
A candidate sketches a Twitter design. The interviewer asks 'how many servers?'. The candidate stalls. They start trying to multiply 200,000,000 by 100 by 24 by 3600 in their head, lose track, restart, lose track again. After 90 seconds they say 'a lot' and try to move on. The interviewer's confidence drops.
This is unnecessary. Back-of-envelope estimation is not arithmetic skill; it is technique. With three numbers memorized (seconds in a day, conversions between bytes / KB / MB / GB / TB, basic latency constants) and three templates (QPS, storage, bandwidth) almost any system design estimation runs in 90 seconds.
The disciplined alternative is the engineer who, mid-conversation, says: '200M DAU, each producing 2 tweets means 400M tweets per day; 400M divided by 100,000 seconds per day is roughly 4,000 writes per second average; multiply by 3 for peak gives 12,000 peak write QPS; we cannot serve 12,000 writes per second from a single Postgres so we need either sharding or write-optimized storage'.
In 30 seconds that engineer just (a) demonstrated estimation skill, (b) grounded the design in numbers, (c) drove an architectural choice. That is the senior signal the framework is set up to elicit. This lesson is the technique.
Why this matters beyond interviews: real production engineering uses back-of-envelope math constantly. Capacity reviews, cost forecasts, on-call sizing decisions, 'should we autoscale or pre-provision?' all reduce to the same arithmetic. The interview tests it because the job requires it.
Deep Dive
The numbers to memorize
Three categories of constants that every senior engineer should have at instant recall.
Latency numbers (Jeff Dean's 'numbers every programmer should know', updated):
| Operation | Approximate latency |
|---|---|
| L1 cache reference | ~1 ns |
| Branch mispredict | ~3 ns |
| L2 cache reference | ~4 ns |
| Mutex lock / unlock | ~17 ns |
| Main memory reference | ~100 ns |
| Compress 1 KB with Zippy | ~2 us (~2,000 ns) |
| Send 1 KB over 1 Gbps network | ~10 us |
| Read 4 KB from SSD | ~16 us |
| Read 1 MB sequentially from memory | ~100 us |
| Round trip within same datacenter | ~500 us |
| Read 1 MB sequentially from SSD | ~1 ms |
| Disk seek (HDD) | ~10 ms |
| Read 1 MB sequentially from disk (HDD) | ~30 ms |
| Round trip CA to Netherlands | ~150 ms |
Rules of thumb derived from these:
- A main-memory reference (~100 ns) is ~100x faster than a random SSD read (~16 us), which is in turn several hundred times faster than an HDD seek (~10 ms).
- Same-datacenter network is ~500us; cross-region is ~50-200ms.
- Anything cross-datacenter is at least 10x slower than within-datacenter.
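The ratios behind these rules of thumb can be checked directly against the latency table. The following is a minimal Python sketch; the dictionary keys and the `ratio` helper are illustrative names, and the values are the approximate figures listed in the table.

```python
# Approximate latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "main_memory_ref": 100,
    "ssd_read_4kb": 16_000,
    "same_dc_round_trip": 500_000,
    "hdd_seek": 10_000_000,
    "ca_to_netherlands_round_trip": 150_000_000,
}


def ratio(slow: str, fast: str) -> float:
    """Return how many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


if __name__ == "__main__":
    print("SSD read vs memory ref:", ratio("ssd_read_4kb", "main_memory_ref"))
    print("HDD seek vs SSD read:", ratio("hdd_seek", "ssd_read_4kb"))
    print("Cross-continent vs same-DC:",
          ratio("ca_to_netherlands_round_trip", "same_dc_round_trip"))
```

Only the order of magnitude of each ratio matters; the exact values shift as hardware changes.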
Throughput numbers:
| System | Approximate throughput |
|---|---|
| Single Postgres / MySQL on commodity hardware | ~1k-10k QPS reads, ~5k writes |
| Postgres with read replicas | ~100k QPS reads (with replicas) |
| Redis (single node) | ~100k-200k ops / sec |
| Redis Cluster | ~1M+ ops / sec |
| Cassandra / DynamoDB (well-modeled) | ~10k-100k QPS per node |
| Kafka (per partition) | ~10k-100k messages / sec |
| Nginx / Envoy as reverse proxy | ~50k-200k req / sec per node |
| Single modern server (HTTP API in Go / Rust) | ~10k-50k req / sec |
| Typical cloud VM network bandwidth | ~1-10 Gbps |
Size and time numbers:
| Quantity | Value |
|---|---|
| Seconds per day | 86,400 (~10^5; many engineers approximate as 100,000) |
| Seconds per year | ~3.15 * 10^7 (~30M) |
| 1 KB / 1 MB / 1 GB / 1 TB / 1 PB | 10^3 / 10^6 / 10^9 / 10^12 / 10^15 (decimal) |
| Bytes in a UUID | 16 |
| Bytes in a tweet | ~280 (text) |
| Bytes in a chat message | ~100 (text) |
| Bytes in a small image | ~100 KB |
| Bytes in an HD video minute | ~5-10 MB |
| Average web page weight | ~2 MB |
These numbers do not need to be exact. Memorize within an order of magnitude.
The QPS template
Given daily active users and per-user activity:
QPS_avg = DAU * actions_per_user_per_day / 86,400
QPS_peak = QPS_avg * peak_factor (typically 2-5x)

Use 100,000 instead of 86,400 for mental arithmetic; the error is ~15%, which is negligible at this stage.
Worked example: a chat app with 10M DAU, each sending 50 messages per day:
QPS_avg = 10,000,000 * 50 / 100,000 = 5,000 messages / sec
QPS_peak = 5,000 * 3 = 15,000 messages / sec at peak

Second example: a photo-sharing app with 500M DAU, each viewing 50 photos and uploading 0.5 per day:
Reads_QPS_avg = 500M * 50 / 100,000 = 250,000 reads / sec
Reads_QPS_peak = 250,000 * 3 = 750,000 reads / sec
Writes_QPS_avg = 500M * 0.5 / 100,000 = 2,500 writes / sec
Writes_QPS_peak = 2,500 * 5 = 12,500 writes / sec

Read-write ratio is 100:1, which immediately tells you 'cache aggressively; design the read path first'.
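The QPS template can also be written as two small functions. This is a sketch, not a prescribed implementation: the function names and the default peak factor of 3 are assumptions chosen to match the worked examples.

```python
SECONDS_PER_DAY = 86_400
ROUNDED_SECONDS_PER_DAY = 100_000  # mental-math approximation used in the text


def qps_avg(dau: int, actions_per_user_per_day: float,
            seconds_per_day: int = ROUNDED_SECONDS_PER_DAY) -> float:
    """Average requests per second from DAU and per-user daily activity."""
    return dau * actions_per_user_per_day / seconds_per_day


def qps_peak(avg: float, peak_factor: float = 3.0) -> float:
    """Peak requests per second; peak_factor is an assumption (typically 2-5x)."""
    return avg * peak_factor


if __name__ == "__main__":
    chat_avg = qps_avg(10_000_000, 50)
    print("chat avg:", chat_avg, "peak:", qps_peak(chat_avg))
    # Same estimate with the exact number of seconds, to see the rounding error.
    print("chat avg (exact seconds):", qps_avg(10_000_000, 50, SECONDS_PER_DAY))
```

Passing `SECONDS_PER_DAY` instead of the rounded value shows how little the approximation changes the conclusion.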
The storage template
Given object count and per-object size:
storage_per_day = objects_per_day * bytes_per_object * metadata_overhead
storage_per_year = storage_per_day * 365
storage_5_year = storage_per_year * 5

Metadata overhead is typically 2-5x for indexed structured data, 1.1-1.5x for blob storage with simple metadata.
Worked example: a Twitter-like service. 400M tweets per day, 280 bytes per tweet, 5x metadata overhead (id, timestamps, user_id, indexes):
storage_per_day = 400,000,000 * 280 * 5 = 560 GB / day
storage_per_year = 560 * 365 = 200 TB / year
storage_5_year = 200 * 5 = 1 PB

This tells you 'one very large database' is wrong; you need either sharding or a wide-column store designed for this.
With media (1 in 5 tweets has a 1 MB image):
media_per_day = 80,000,000 * 1 MB = 80 TB / day
media_5_year = 80 * 365 * 5 = 146 PB

Media dominates storage cost; this drives the architectural decision to put media in object storage (S3) and thumbnails in a CDN.
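The storage arithmetic is easy to get wrong by a factor of 1,000 when converting units, so a small helper can be useful for checking. The sketch below uses decimal units (1 KB = 10^3 bytes) as in the constants table; the function names are illustrative.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def storage_bytes_per_day(objects_per_day: int, bytes_per_object: int,
                          overhead: float = 1.0) -> float:
    """Daily storage growth in bytes, including a metadata overhead factor."""
    return objects_per_day * bytes_per_object * overhead


def project(bytes_per_day: float, years: int) -> float:
    """Total bytes accumulated over `years` at a constant daily rate."""
    return bytes_per_day * 365 * years


def humanize(n_bytes: float) -> str:
    """Format a byte count using decimal units (KB = 10^3, MB = 10^6, ...)."""
    value = float(n_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} {UNITS[-1]}"


if __name__ == "__main__":
    text = storage_bytes_per_day(400_000_000, 280, 5)
    media = storage_bytes_per_day(80_000_000, 1_000_000)
    print("text per day:", humanize(text), "| 5 years:", humanize(project(text, 5)))
    print("media per day:", humanize(media), "| 5 years:", humanize(project(media, 5)))
```

The projection assumes a flat daily rate; growth would have to be layered on top.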
The bandwidth template
Given QPS and per-request payload:
bandwidth = QPS * bytes_per_request

Convert to Gbps: divide by ~125,000,000 (since 1 Gbps ~= 125 MB/sec).
Worked example: 750k peak read QPS, 1 MB average response (image-heavy):
bandwidth_peak = 750,000 * 1 MB = 750 GB/sec ~= 6,000 Gbps

This is huge. Single-server NICs are ~10-25 Gbps; you cannot serve this from a small fleet. The architectural answer: CDN. CDN serves cached content at edge; origin sees only cache misses.
With CDN at 95% cache hit ratio:
origin_bandwidth = 6,000 Gbps * 0.05 = 300 Gbps

Still significant but manageable across a fleet. This is exactly why image / video products always use CDNs.
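The bandwidth template and the CDN adjustment fit in a few lines of Python. This is a sketch with illustrative function names; the 95% hit ratio is the assumption used in the worked example, not a measured value.

```python
BYTES_PER_SEC_PER_GBPS = 125_000_000  # 1 Gbps ~= 125 MB/sec


def bandwidth_gbps(qps: float, bytes_per_request: float) -> float:
    """Bandwidth in Gbps for a given request rate and payload size."""
    return qps * bytes_per_request / BYTES_PER_SEC_PER_GBPS


def origin_gbps(edge_gbps: float, cache_hit_ratio: float) -> float:
    """Bandwidth that reaches the origin after the CDN absorbs cache hits."""
    return edge_gbps * (1 - cache_hit_ratio)


if __name__ == "__main__":
    edge = bandwidth_gbps(750_000, 1_000_000)  # 750k QPS, 1 MB responses
    print("edge Gbps:", edge)
    print("origin Gbps at 95% hit ratio:", origin_gbps(edge, 0.95))
```

Varying `cache_hit_ratio` shows how sensitive origin load is to it: dropping from 95% to 90% doubles the origin bandwidth.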
Peak vs average
Real traffic is not uniform across 24 hours. Common peak-to-average ratios:
| Pattern | Peak / Average |
|---|---|
| B2B / SaaS in one timezone | ~3x (workday peak) |
| B2C global | ~2x (smoother due to multi-timezone) |
| Live event-driven (sports, election) | ~10-100x |
| Holiday shopping (Black Friday) | ~5-20x |
| Live streaming (concerts) | ~50-1000x |
Design for peak, not average. A system that handles average is in trouble half the time.
For really spiky workloads (live events, flash sales), the architecture has to be different: pre-provisioned capacity, queue absorption, degradation modes, edge caching. 'Auto-scale to peak' does not work when peak is 100x average and arrives in 30 seconds.
Headroom and growth
Never size for exactly your projected peak. Add:
- Headroom: 30-50% above projected peak so you do not page on a normal busy day. A system at 95% utilization is already in trouble.
- Growth: 1-2 years of projected user growth, depending on how fast you can re-architect.
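- Failure: enough capacity that the loss of one AZ / region does not page. For three AZs, each normally carries ~33% of the load and the two survivors carry 50% each when one fails, so every AZ must be sized for at least 50% of peak load (plus headroom), which means running at roughly 67% of that capacity or less on a normal peak.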
A system designed for 100k peak QPS today is realistically sized to handle 200-300k QPS. The 2-3x is not waste; it is operational sanity.
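To see how growth and headroom compound into that 2-3x figure, here is a minimal Python sketch. The defaults (40% headroom, 50% annual growth, one year) are assumptions picked from within the ranges listed, and the function names are illustrative.

```python
def sized_capacity(projected_peak_qps: float, headroom: float = 0.4,
                   annual_growth: float = 0.5, years: float = 1.0) -> float:
    """Capacity to provision: peak, grown for `years`, plus headroom."""
    grown_peak = projected_peak_qps * (1 + annual_growth) ** years
    return grown_peak * (1 + headroom)


def utilization(load_qps: float, capacity_qps: float) -> float:
    """Fraction of provisioned capacity in use at a given load."""
    return load_qps / capacity_qps


if __name__ == "__main__":
    capacity = sized_capacity(100_000)
    print("provision for:", round(capacity), "QPS")
    print("utilization at today's peak:", round(utilization(100_000, capacity), 2))
```

With more aggressive inputs (100% growth, 50% headroom) the same function lands at the top of the range, 3x today's peak.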
Regional distribution
For multi-region systems, capacity planning is per region, not global. The arithmetic:
global_qps = sum over regions of regional_qps
regional_capacity = regional_qps * peak_factor * (1 + headroom) * failover_multiplier

Failover multiplier (if the architecture is N+1 regions): each surviving region must absorb 1 / N of an absent region's traffic, so with equally sized regions the multiplier is 1 + 1 / N. For three regions (N = 2), each region needs spare capacity for ~50% extra traffic during a one-region outage.
For strongly-consistent multi-region writes, the geographic round-trip latency dominates: a write that needs quorum across continents will be ~100-300 ms regardless of how fast the database is. This often drives regional partitioning of the data ('users live in their home region').
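The regional capacity formula can be sketched in Python as follows. It assumes equally sized regions and a single-region failure; real traffic is rarely that even, so treat the multiplier as a starting point. Function names are illustrative.

```python
def failover_multiplier(region_count: int) -> float:
    """Extra capacity factor so survivors absorb one lost region's traffic.

    Assumes equally sized regions and at most one region down at a time.
    """
    if region_count < 2:
        raise ValueError("failover needs at least two regions")
    return 1 + 1 / (region_count - 1)


def regional_capacity(regional_qps: float, peak_factor: float,
                      headroom: float, region_count: int) -> float:
    """Capacity one region needs, following the formula in the text."""
    return (regional_qps * peak_factor * (1 + headroom)
            * failover_multiplier(region_count))


if __name__ == "__main__":
    print("multiplier for 3 regions:", failover_multiplier(3))
    print("capacity per region:", regional_capacity(10_000, 3, 0.5, 3))
```

The multiplier shrinks as regions are added (1.5 for three regions, 1.25 for five), which is one reason larger footprints waste less capacity on failover.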
Cost arithmetic
Rough cloud cost numbers (order-of-magnitude, 2024-2026):
| Item | Approximate cost |
|---|---|
| Cloud VM (4 vCPU, 16GB) | ~$100-200 / month on-demand; ~$30-70 with reserved instances |
| EC2 / GCP egress per GB | ~$0.05-0.12 |
| CloudFront / CDN egress per GB | ~$0.02-0.08 (volume discounts) |
| Inter-AZ network per GB | ~$0.01 |
| S3 standard storage per GB-month | ~$0.023 |
| S3 Glacier per GB-month | ~$0.004 |
| RDS Postgres db.r6g.large | ~$200-400 / month |
| DynamoDB on-demand per million writes | ~$1.25 |
| Lambda per million invocations | ~$0.20 (plus compute time) |
Worked example: serving 100 TB / month of CDN traffic at $0.05 / GB:
CDN cost = 100,000 GB * $0.05 = $5,000 / month

Very rough but enough to know whether you are talking about a $100/month design or a $1M/month design. The factor-of-10 distinction usually drives different conversations.
Note: cloud egress (data leaving the cloud) is the surprise line item that breaks startup budgets. A design that pulls 1 TB per day out of your cloud costs roughly $36k / year in egress alone (365 TB at ~$0.10 / GB).
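The egress arithmetic is simple enough to script. The sketch below takes the per-GB price as an input rather than hard-coding it, since the rates in the table are order-of-magnitude figures; the function names are illustrative.

```python
def monthly_transfer_cost(gb_per_month: float, usd_per_gb: float) -> float:
    """Monthly data-transfer cost in USD at a flat per-GB rate."""
    return gb_per_month * usd_per_gb


def annual_egress_cost(gb_per_day: float, usd_per_gb: float) -> float:
    """Yearly egress cost in USD for a constant daily volume."""
    return gb_per_day * 365 * usd_per_gb


if __name__ == "__main__":
    # 100 TB / month of CDN traffic at an assumed $0.05 / GB.
    print("CDN per month:", round(monthly_transfer_cost(100_000, 0.05)))
    # 1 TB / day of cloud egress at an assumed $0.10 / GB.
    print("egress per year:", round(annual_egress_cost(1_000, 0.10)))
```

Real pricing is tiered, so a flat rate overstates cost at high volume; for a first estimate that bias is usually acceptable.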
Memory sizing
For caches, common rule of thumb: keep 5-20% of the dataset in cache, depending on access skew (heavy long-tail = more, uniform = less, Pareto = sweet spot at ~10%).
Worked example: 1 TB hot dataset, 10% in cache:
cache_size = 100 GB
nodes_needed = 100 GB / 32 GB per node = ~4 Redis nodes

Four nodes plus replicas; this is a real Redis Cluster, not a single instance.
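The node count is a ceiling division, which is worth making explicit because rounding down leaves the cache short. A minimal Python sketch follows; it assumes all of a node's memory is usable for data, which is optimistic, and the function name is illustrative.

```python
import math


def cache_nodes(dataset_gb: float, hot_fraction: float,
                node_memory_gb: float) -> int:
    """Number of cache nodes needed to hold the hot fraction of a dataset.

    Assumes the full node memory is available for data; replicas are extra.
    """
    cache_gb = dataset_gb * hot_fraction
    return math.ceil(cache_gb / node_memory_gb)


if __name__ == "__main__":
    print("nodes at 10% hot:", cache_nodes(1_000, 0.10, 32))
    print("nodes at 20% hot:", cache_nodes(1_000, 0.20, 32))
```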
Connection pools and connection arithmetic
A frequent mistake: mis-sized DB connection pools cause request queueing long before the DB itself is saturated. Quick check: Postgres with max_connections = 100, app server pool of 20 connections per instance, 50 app instances = 1,000 connections trying to land on 100. The database can accept only one in ten; the rest are refused or wait in the pool, so most requests queue.
Rule of thumb: total connections from app fleet should be at most 75% of DB max_connections. Use connection pooling proxies (PgBouncer, RDS Proxy) when fleet is large.
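The connection check and the 75% rule can be combined into one small function. This is a sketch with illustrative names; it only does the arithmetic and does not query a real database.

```python
def pool_check(max_connections: int, pool_size_per_instance: int,
               instances: int, safe_fraction: float = 0.75) -> dict:
    """Compare fleet-wide connection demand against the DB connection limit."""
    demand = pool_size_per_instance * instances
    budget = int(max_connections * safe_fraction)
    return {
        "demand": demand,
        "budget": budget,
        "oversubscription": demand / max_connections,
        "within_budget": demand <= budget,
        # Largest per-instance pool that keeps the fleet within budget.
        "max_pool_size_per_instance": budget // instances,
    }


if __name__ == "__main__":
    print(pool_check(max_connections=100, pool_size_per_instance=20, instances=50))
```

For the example in the text, the largest safe pool is one connection per instance, which is the point at which a pooling proxy becomes the practical answer.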
When to use each architectural option (driven by numbers)
| Numbers | Architectural implication |
|---|---|
| Read QPS > 5k | Need read replicas or aggressive caching |
| Read QPS > 100k | Need CDN for static, multi-tier cache for dynamic |
| Write QPS > 5k on a single PK row range | Need sharding or write-optimized DB |
| Storage > ~1 TB | Need partitioned / sharded storage |
| Storage > ~100 TB | Object storage (S3) + lifecycle tiering |
| Bandwidth > ~10 Gbps origin | CDN required |
| Latency p99 < 50ms | Edge / regional architecture |
| Cross-region writes with strong consistency | Restructure to per-region partitioning, or accept 100-300ms latency |
| Peak / average > 10x | Pre-provisioned capacity + queue / degradation; auto-scaling alone is not enough |
This is the table you should be deriving in your head as you do the estimation. Numbers in -> architectural decisions out.
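The 'numbers in, decisions out' idea can be sketched as a function that applies some of the table's thresholds. This is a simplification: it ignores the latency and cross-region rows, treats the write threshold as fleet-wide rather than per key range, and the function name and return strings are illustrative.

```python
def implications(read_qps: float, write_qps: float, storage_tb: float,
                 origin_gbps: float, peak_to_avg: float) -> list[str]:
    """Map estimated numbers to architectural implications from the table."""
    out = []
    if read_qps > 100_000:
        out.append("CDN for static, multi-tier cache for dynamic")
    elif read_qps > 5_000:
        out.append("read replicas or aggressive caching")
    if write_qps > 5_000:
        out.append("sharding or write-optimized DB")
    if storage_tb > 100:
        out.append("object storage + lifecycle tiering")
    elif storage_tb > 1:
        out.append("partitioned / sharded storage")
    if origin_gbps > 10:
        out.append("CDN required")
    if peak_to_avg > 10:
        out.append("pre-provisioned capacity + queue / degradation")
    return out


if __name__ == "__main__":
    # Hypothetical service: 20k read QPS, 1k write QPS, 5 TB, 2 Gbps, 3x peak.
    for line in implications(20_000, 1_000, 5, 2, 3):
        print("-", line)
```

The thresholds are rules of thumb, so the output is a list of questions to raise, not a verdict.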
Implementation
A 90-second template you can run in any interview
Copy this structure mentally:
DAU: [number]
Actions per user per day: [reads, writes separately]
QPS avg: DAU * actions / 100,000
QPS peak: QPS_avg * peak_factor
Object size: [bytes per object]
Metadata overhead: [2-5x]
Storage per day: objects_per_day * size * overhead
Storage 5 year: daily * 365 * 5
Bandwidth peak: QPS_peak * payload_size
Gbps: bandwidth_bytes_per_sec / 125 MB/sec [if needed]
Architectural conclusions from above:
- Caching needed? (yes if QPS > 5k or hit ratio matters)
- Sharding needed? (yes if write QPS > 5k or storage > 1TB)
- CDN needed? (yes if media + bandwidth > 10 Gbps origin)
- Multi-region? (yes if global users + latency < 100ms or regional residency)

Three minutes; ten lines on the whiteboard; every architectural decision afterward is anchored to these numbers.
Worked example: a chat application
DAU 50,000,000
Messages per user per day 50
Messages per day 2,500,000,000
Writes per second avg 25,000
Writes per second peak 75,000 (3x)
Message size 100 bytes
Metadata overhead 3x (id, timestamps, sender, etc.)
Storage per day 750 GB
Storage 5 year 1.4 PB
Reads per user per day 150 (3x messages)
Reads per second avg 75,000
Reads per second peak 225,000
Read bandwidth peak 225k * 100B = 22 MB/sec ~= 180 Mbps
Write bandwidth peak 75k * 100B = 7 MB/sec ~= 60 Mbps
Architectural conclusions:
- 75k peak writes -> need sharding by chat_id or user_id
- 225k peak reads -> aggressive caching, maybe per-user inbox in Redis
- 1.4 PB over 5 years -> tiered storage; hot in fast store, old in cold
- Bandwidth modest (Mbps not Gbps) -> CDN unnecessary for messages themselves;
but media attachments would change this dramatically
- Real-time delivery at this scale -> WebSocket fleet sized for ~50M concurrent
connections; ~50-100k connections per server -> 500-1000 servers

Four minutes of arithmetic, twelve lines, design conclusions clear.
Worked example: a video streaming service
DAU 500,000,000
Minutes watched per user per day 60
Total minutes watched 30,000,000,000 / day
Video bitrate 5 Mbps (HD average)
Bandwidth bytes per minute 37.5 MB
Daily egress 1.1 EB / day (1100 PB)
This is enormous. Conclusions:
- CDN at 95%+ cache hit ratio is mandatory
- Edge POPs around the world; user routes to nearest
- Origin handles only cache misses; ~5% of 1.1 EB = 55 PB / day = 5 Tbps
- Storage of catalog: 100k videos * 1 GB average = 100 TB hot,
plus regional replicas, plus transcoded variants (5x for resolutions) = ~5 PB
- Encoding pipeline: any newly uploaded video goes through transcoding to
multiple resolutions / codecs; massively parallel batch job

Three minutes; the design is now anchored. Without this, 'we'll use a CDN' is hand-waving; with it, 'we'll use a CDN at 95% hit ratio because the origin would otherwise serve ~100 Tbps, and even the 5% of misses is ~5 Tbps, which is hundreds of servers' worth of network' is a senior answer.
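The egress figures for the streaming example can be reproduced with a short Python sketch. Function names are illustrative, and the conversion uses the exact 86,400 seconds per day to turn a daily volume into an average rate.

```python
def daily_egress_bytes(dau: int, minutes_per_user: float,
                       bitrate_mbps: float) -> float:
    """Bytes streamed per day for a given audience, watch time, and bitrate."""
    bytes_per_minute = bitrate_mbps * 1_000_000 / 8 * 60
    return dau * minutes_per_user * bytes_per_minute


def average_tbps(bytes_per_day: float, seconds_per_day: int = 86_400) -> float:
    """Average rate in Tbps for a daily byte volume."""
    return bytes_per_day * 8 / seconds_per_day / 1e12


if __name__ == "__main__":
    total = daily_egress_bytes(500_000_000, 60, 5)
    print("daily egress (EB):", total / 1e18)
    print("average edge Tbps:", round(average_tbps(total), 1))
    print("origin Tbps at 95% hit ratio:", round(average_tbps(total * 0.05), 1))
```

These are daily averages; peak-hour rates will be higher by the peak factor.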
Capacity planning beyond the interview
In real production, the same arithmetic feeds quarterly capacity reviews:
For each service:
current_peak_qps: [from monitoring]
growth_quarterly: [historical or forecast]
projected_peak_qps: current * (1 + growth)^quarters
capacity_today: [from autoscaling / load tests]
utilization_today: current_peak / capacity_today
capacity_needed: projected_peak * (1 + headroom) * failover_multiplier
delta: capacity_needed - capacity_today
cost_delta: delta * cost_per_unit

Every senior engineer should be running this for their services on a quarterly cadence. The math is the same as the interview, just continuous.
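The review template maps directly onto a function. In the sketch below the parameter names follow the template, while the example inputs (10% quarterly growth, $0.01 per unit of QPS capacity) are hypothetical values chosen only to exercise the arithmetic.

```python
def capacity_review(current_peak_qps: float, growth_quarterly: float,
                    quarters: int, capacity_today: float, headroom: float,
                    failover_multiplier: float, cost_per_unit: float) -> dict:
    """Project peak load forward and compute the capacity and cost gap."""
    projected_peak = current_peak_qps * (1 + growth_quarterly) ** quarters
    capacity_needed = projected_peak * (1 + headroom) * failover_multiplier
    delta = capacity_needed - capacity_today
    return {
        "projected_peak_qps": projected_peak,
        "utilization_today": current_peak_qps / capacity_today,
        "capacity_needed": capacity_needed,
        "delta": delta,
        "cost_delta": delta * cost_per_unit,
    }


if __name__ == "__main__":
    review = capacity_review(
        current_peak_qps=10_000, growth_quarterly=0.10, quarters=4,
        capacity_today=20_000, headroom=0.5, failover_multiplier=1.5,
        cost_per_unit=0.01,  # hypothetical cost per unit of QPS capacity
    )
    for key, value in review.items():
        print(f"{key}: {value:,.2f}")
```

A negative `delta` means the service is already over-provisioned for the forecast horizon.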
When to Use
Always estimate when
- Designing any system in an interview. No exceptions.
- Designing any new system in real work. Numbers ground decisions.
- Reviewing an existing design for scaling. Estimate before recommending changes.
Skip detailed estimation when
- The interviewer explicitly says 'do not bother with numbers; let's focus on design'. Honor the redirect.
- The system is small enough that any reasonable architecture will work. (A blog with 1000 readers does not need estimation.)
Use peak-driven sizing when
- The traffic is spiky and the cost of being undersized at peak is high (user-facing, revenue-generating).
- SLO compliance during peak matters.
Use average-driven sizing when
- The traffic is smooth and being briefly slow at peak is acceptable.
- The service is internal / batch and does not have hard latency requirements.
Use auto-scaling when
- Peak / average ratio is ~3-5x with predictable timing.
- Workload allows 30-60 second scaling delays.
- Cost optimization at scale matters.
Use pre-provisioned capacity when
- Peak / average ratio is >10x with sudden onset (live events, flash sales).
- Auto-scaling cannot react fast enough.
- The cost of underprovisioning (lost revenue, brand damage) exceeds the cost of pre-provisioning.
Use multi-region when
- User base is geographically distributed and latency matters.
- Regulatory data residency requires it.
- Single-region availability is insufficient (need to survive entire-region outages).
Case Studies
Jeff Dean's 'Numbers Every Programmer Should Know'
Google's Jeff Dean popularized a now-canonical list of latency numbers that engineers should have internalized: L1 cache, RAM, SSD, disk, network, transcontinental round-trip. The list (originally from a 2010 talk) has been republished and updated countless times. It exists because the relative magnitudes (a memory reference is ~100x faster than a random SSD read, which is hundreds of times faster than an HDD seek, which is ~10x faster than an intercontinental round trip) drive almost every architectural decision.
Lesson: internalizing relative magnitudes is more useful than memorizing exact numbers. The takeaway 'cache miss is 100x worse than cache hit; cross-region is 10,000x worse than in-process' is the real signal.
Engineering blog capacity write-ups
Many engineering teams have published capacity planning case studies: Stripe on quarterly capacity reviews, Discord on Cassandra capacity for messages, Uber on city-by-city capacity for ride matching, Cloudflare on edge capacity for DDoS absorption. The math in these posts is back-of-envelope plus refinements; the structure (current load, growth, headroom, cost) is identical to interview estimation.
Lesson: real production capacity planning is the same arithmetic as interview estimation, run continuously instead of once.
Black Friday / Cyber Monday at retail
Large retailers (Amazon, Target, Walmart, Shopify) publicly discuss the year-round preparation for Black Friday: traffic models, load tests at projected peak, pre-provisioned capacity that idles 51 weeks of the year, degradation modes for unexpected overload. The peak/average ratios at peak shopping events can exceed 50x, which is well past what auto-scaling can react to.
Lesson: estimation must include worst-case events, not just steady state. The architecture for a 50x spike is fundamentally different from the architecture for a 3x daily peak.
Live-streaming Super Bowl / cricket finals
Streaming platforms (Disney+ Hotstar, Twitch, YouTube Live) have published numbers from major live events: 30M+ concurrent viewers on a single match, peak bandwidth in the multi-Tbps range, peak QPS in the millions on a single live channel. The architectures these numbers force (massive CDN pre-provisioning, edge transcoding, queue absorption, degraded streaming modes) are directly visible in the public engineering write-ups.
Lesson: at the extreme high end of capacity planning, the math drives entirely different architectures (edge-first, pre-provisioned, multi-quality fallback) that cannot be reverse-engineered without doing the estimation first.
Quick Review
- Memorize the latency numbers (Jeff Dean's list), the throughput numbers per major system, and the size / time constants.
- The QPS template: DAU * actions / 100,000 for average, multiply by peak factor for peak.
- The storage template: objects_per_day * bytes_per_object * metadata_overhead, then multiply for years.
- The bandwidth template: QPS * bytes_per_request, divide by 125 MB/sec for Gbps.
- Always design for peak, not average. Peak / average is ~2x for global B2C, 3-5x for single-timezone B2B, and much higher for live events.
- Add 30-50% headroom and 1-2 years of growth on top of projected peak.
- For multi-region: per-region capacity must include failover absorption.
- Cost arithmetic is order-of-magnitude: $100/mo vs $1k/mo vs $10k/mo distinguishes the relevant conversations.
- Numbers in -> architectural decisions out. The estimation is what makes the design defensible.
Real-World Examples
How real systems implement this in production
Google's Jeff Dean popularized a now-canonical list of latency numbers in a 2010 talk: L1 cache, RAM, SSD, disk, network, transcontinental round-trip. The list and its successors have been republished, annotated, and updated countless times. Almost every senior engineer has internalized at least the relative magnitudes; the absolute numbers shift over time but the orders of magnitude are stable.
Trade-off: Memorizing the absolute numbers gives quick answers but the numbers themselves drift over years; what really matters is internalizing the relative magnitudes (memory 100x faster than SSD, etc.) which stay constant and drive every architectural decision.
Amazon, Target, Shopify, and other large retailers publicly discuss year-round preparation for Black Friday and Cyber Monday: traffic modeling that projects 5-50x peak, pre-provisioned capacity that sits idle 51 weeks, load tests in production-like environments at projected peak, prepared degradation modes (queue absorption, simplified checkout, async order confirmation) for unexpected overload. The estimation work begins months in advance.
Trade-off: Pre-provisioning for retail peaks costs significant idle capacity for most of the year, but the cost of being undersized at peak (lost revenue, brand damage, inability to recover) dwarfs the cost of idle hardware; cloud reserved instances and savings plans soften but do not eliminate the trade-off.
Disney+ Hotstar publicly discussed sustaining ~30M concurrent viewers during a single cricket match, peaking in the millions of QPS for chat and metadata services. Twitch and YouTube Live publish similar numbers for esports finals and major streamer events. Engineering write-ups describe the CDN pre-provisioning, the edge transcoding, the queue absorption architecture, and the multi-quality fallback that the estimation forces.
Trade-off: Live event scale requires architectures that are over-engineered for normal load (massive edge fleets, complex queue / fallback paths), but the math leaves no alternative; auto-scaling alone cannot handle 50-1000x spikes that arrive in seconds.
Stripe, Discord, Notion, Linear, and many other modern engineering organizations have publicly discussed quarterly capacity review processes: each service owner produces current load, projected growth, capacity headroom, cost forecast. The math is the same back-of-envelope arithmetic as interview estimation, applied continuously. Without it, services either over-provision (waste) or under-provision (incidents).
Trade-off: Continuous capacity reviews catch undersizing before it becomes an outage but require dedicated time from senior engineers and product managers; smaller companies often skip the formal cadence and react to incidents instead, which is cheaper short-term and more expensive long-term.
Common Interview Questions
Questions you might be asked about this topic
Start with DAU and per-user activity. Assume 200M DAU producing 2 tweets per day = 400M tweets daily. Storage: 280 bytes per tweet * 5x metadata overhead = 1.4 KB / tweet * 400M = ~560 GB / day; * 365 = ~200 TB / year; * 5 years = ~1 PB. Add media: assume 1 in 5 tweets has a 1 MB image, so 80M media items / day = 80 TB / day = ~150 PB / 5 years. Bandwidth: 700k peak read QPS * average response 5 KB (text + small thumbnails) = ~3.5 GB/sec = ~28 Gbps for text, plus image bandwidth which is much larger and CDN-served. Conclusions: text storage needs sharded wide-column store, media goes to S3 + CDN, origin bandwidth manageable only via CDN with high cache hit ratio.
QPS_avg = DAU * actions_per_user_per_day / seconds_per_day. Use 100,000 instead of 86,400 for mental math. QPS_peak = QPS_avg * peak_factor. Peak factor depends on traffic shape: 2x for global B2C (smoothed by timezones), 3-5x for single-timezone B2B (workday peak), 10-100x for live event-driven workloads. Worked: 50M DAU sending 50 messages = 2.5B daily / 100k = 25k QPS avg; * 3 = 75k QPS peak. Mention that for spiky workloads (live events, flash sales) the peak factor can be 50x or more, which forces a different architecture (pre-provisioned + queue absorption, not auto-scaling).
Raw object size (280 bytes for a tweet, 100 bytes for a chat message, 1 KB for an order) does not include the per-row overhead in the database: row headers, indexes (often the biggest part), timestamps, foreign keys, soft-delete flags, audit columns, replication overhead, write-ahead log. Typical multiplier: 3-5x for indexed structured data in a relational DB; 1.1-1.5x for blob storage with simple metadata; 5-10x for fully indexed search storage like Elasticsearch. Forgetting this overhead under-estimates real disk consumption by a large margin and undersizes capacity. Always state the overhead factor used.
Bandwidth = concurrent_streams * average_bitrate. Worked: 30M concurrent viewers * 5 Mbps HD = 150 Tbps egress at peak. Daily: DAU * minutes_per_user_per_day * 60 sec * bitrate / 8 (for bytes). Worked: 500M DAU * 60 min * 60 sec * 5 Mbps / 8 = ~1.1 EB / day egress. Conclusions: CDN absolutely required at 95%+ hit ratio; origin handles only cache misses (~5%); edge POPs around the world; multi-bitrate (adaptive) variants must be pre-encoded so client can downgrade; live events may need 50x baseline pre-provisioning. The math forces an entirely different architecture from text-heavy services.
Auto-scaling responds in 30-90 seconds (depending on instance type and warm-up). It works when peak / average is 3-5x and arrives gradually (workday peak, weekly cycle). Pre-provisioning is required when peak / average is 10-100x and arrives suddenly (live event start, flash sale countdown, news breaking) because auto-scaling cannot react in time and the first 60 seconds are catastrophic. Estimation tells you the ratio: a service with 10M DAU (assume well under 1M concurrent users on a normal day) that jumps to 30M concurrent for a 90-min match is seeing a ~50x spike; that is pre-provisioning territory. 1M DAU with daily peak 3x average is auto-scaling territory. Mention degradation modes (queue absorption, lower-quality fallback, rate limiting) as the safety net for both.
Interview Tips
How to discuss this topic effectively
Memorize the four key numbers: 100,000 (seconds in a day), 125 MB/sec (1 Gbps), peak factor 3-5x, metadata overhead 3-5x. With these you can derive almost any estimate in 90 seconds.
Always state the assumptions you used. 'Assuming 200M DAU and read:write of 100:1' lets the interviewer correct you if their model is different, before you bake the assumption into the design.
Round aggressively. 86,400 becomes 100,000; 280 bytes becomes 300; 3.15 * 10^7 becomes 30M. Three significant figures at most. The interviewer is grading reasoning, not arithmetic precision.
Tie every estimate to an architectural conclusion. '750k peak read QPS means we cannot serve from a single DB' is a senior tell; '750k QPS' on its own is just a number.
Mention peak vs average explicitly. 'Average is 250k, peak is 3x average so 750k, plus 50% headroom so I'd size for ~1.1M' shows capacity-planning maturity.
Common Mistakes
Pitfalls to avoid in interviews
Skipping estimation entirely or doing it after the design
Estimation drives the design; doing it after means the design was guessed. Spend three minutes on numbers before drawing components. The numbers tell you whether you need caching, sharding, CDN, multi-region.
Designing for average traffic and forgetting peak
Real traffic peaks 2-10x average. A system sized for average is in trouble half the time. Always state the peak factor explicitly and design for peak with headroom.
Forgetting metadata overhead in storage estimates
A 280-byte tweet does not occupy 280 bytes in a database. With indexes, timestamps, user_id, and replica overhead, multiply by 3-5x. Metadata overhead is the most commonly forgotten factor in storage math.
Treating cloud egress as free
Egress (data leaving the cloud) is one of the largest cloud bills for media-heavy services. Always include bandwidth costs, not just compute and storage. 100 TB/month of CDN traffic is a $5k-$10k monthly line item (at $0.05-0.10 / GB) that can dominate the bill and reshape the architecture.
Doing arithmetic in your head and stalling
Write the math on the whiteboard. Round to easy numbers (100k seconds in a day, not 86,400). Show the steps. The interviewer is grading reasoning, not mental arithmetic. Visible math is faster and more defensible.
