System Design Article

Back-of-the-Envelope Estimation & Capacity Planning

Difficulty: Medium

Back-of-the-envelope estimation is the math you do in three minutes to ground a system design in numbers. It is what tells you whether your single Postgres instance can handle the load (no), how much storage you need over five years (probably more than you think), and how much CDN bandwidth you are about to commit to (probably more than that). This lesson covers the standard latency / throughput / size / bandwidth numbers every engineer should have memorized, the unit conversions and order-of-magnitude reasoning that keep you fast, the templates for QPS, storage, and bandwidth estimation, capacity planning beyond steady state (peak vs average, headroom, growth, regional, seasonal), and the rough cost arithmetic that turns 'we need more servers' into a defensible business case. The goal is to leave you able to walk into any interview or design review and produce useful numbers in three minutes flat.



Motivation

A candidate sketches a Twitter design. The interviewer asks 'how many servers?'. The candidate stalls. They start trying to multiply 200,000,000 by 100 by 24 by 3600 in their head, lose track, restart, lose track again. After 90 seconds they say 'a lot' and try to move on. The interviewer's confidence drops.

This is unnecessary. Back-of-envelope estimation is not arithmetic skill; it is technique. With three numbers memorized (seconds in a day, conversions between bytes / KB / MB / GB / TB, basic latency constants) and three templates (QPS, storage, bandwidth) almost any system design estimation runs in 90 seconds.

The disciplined alternative is the engineer who, mid-conversation, says: '200M DAU, each producing 2 tweets means 400M tweets per day; 400M divided by 100,000 seconds per day is roughly 4,000 writes per second average; multiply by 3 for peak gives 12,000 peak write QPS; we cannot serve 12,000 writes per second from a single Postgres so we need either sharding or write-optimized storage'.

In 30 seconds that engineer just (a) demonstrated estimation skill, (b) grounded the design in numbers, (c) drove an architectural choice. That is the senior signal the interview is set up to elicit. This lesson is the technique.

Why this matters beyond interviews: real production engineering uses back-of-envelope math constantly. Capacity reviews, cost forecasts, on-call sizing decisions, 'should we autoscale or pre-provision?' all reduce to the same arithmetic. The interview tests it because the job requires it.

Deep Dive

The numbers to memorize

There are three categories of constants that every senior engineer should have at instant recall.

Latency numbers (Jeff Dean's 'numbers every programmer should know', updated):

Operation | Approximate latency
L1 cache reference | ~1 ns
Branch mispredict | ~3 ns
L2 cache reference | ~4 ns
Mutex lock / unlock | ~17 ns
Main memory reference | ~100 ns
Compress 1 KB with Zippy | ~2 us (~2,000 ns)
Send 1 KB over 1 Gbps network | ~10 us
Read 4 KB from SSD | ~16 us
Read 1 MB sequentially from memory | ~100 us
Round trip within same datacenter | ~500 us
Read 1 MB sequentially from SSD | ~1 ms
Disk seek (HDD) | ~10 ms
Read 1 MB sequentially from disk (HDD) | ~30 ms
Round trip CA to Netherlands | ~150 ms

Rules of thumb derived from these:

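  • A main-memory reference (~100 ns) is ~100x faster than a 4 KB SSD read (~16 us), and an SSD read is several hundred times faster than an HDD seek (~10 ms). For sequential 1 MB reads the gaps are smaller: ~10x from memory to SSD and ~30x from SSD to HDD.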
  • Same-datacenter network is ~500us; cross-region is ~50-200ms.
  • Anything cross-datacenter is at least 10x slower than within-datacenter.
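
The relative magnitudes can be checked directly from the latency table. The snippet below is a minimal sketch: the dictionary keys and the `ratio` helper are illustrative names, and the values are the approximate figures from the table, not measurements.

```python
# Approximate latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "main_memory_ref": 100,
    "ssd_read_4kb": 16_000,
    "same_dc_round_trip": 500_000,
    "hdd_seek": 10_000_000,
    "ca_to_netherlands_round_trip": 150_000_000,
}


def ratio(slower: str, faster: str) -> float:
    """Return how many times slower `slower` is than `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


print(ratio("ssd_read_4kb", "main_memory_ref"))                     # 160.0
print(ratio("hdd_seek", "ssd_read_4kb"))                            # 625.0
print(ratio("ca_to_netherlands_round_trip", "same_dc_round_trip"))  # 300.0
```

Only the order of magnitude of each ratio matters; the inputs themselves are rounded.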

Throughput numbers:

System | Approximate throughput
Single Postgres / MySQL on commodity hardware | ~1k-10k QPS reads, ~5k writes
Postgres with read replicas | ~100k QPS reads (with replicas)
Redis (single node) | ~100k-200k ops / sec
Redis Cluster | ~1M+ ops / sec
Cassandra / DynamoDB (well-modeled) | ~10k-100k QPS per node
Kafka (per partition) | ~10k-100k messages / sec
Nginx / Envoy as reverse proxy | ~50k-200k req / sec per node
Single modern server (HTTP API in Go / Rust) | ~10k-50k req / sec
Typical cloud VM network bandwidth | ~1-10 Gbps

Size and time numbers:

Quantity | Value
Seconds per day | 86,400 (~10^5; many engineers approximate as 100,000)
Seconds per year | ~3.15 * 10^7 (~30M)
1 KB / 1 MB / 1 GB / 1 TB / 1 PB | 10^3 / 10^6 / 10^9 / 10^12 / 10^15 bytes (decimal)
Bytes in a UUID | 16
Bytes in a tweet | ~280 (text)
Bytes in a chat message | ~100 (text)
Bytes in a small image | ~100 KB
Bytes in an HD video minute | ~40 MB (at ~5 Mbps)
Average web page weight | ~2 MB

These numbers do not need to be exact. Memorize within an order of magnitude.

The QPS template

Given daily active users and per-user activity:

Text
QPS_avg  = DAU * actions_per_user_per_day / 86,400
QPS_peak = QPS_avg * peak_factor    (typically 2-5x)

Use 100,000 instead of 86,400 for mental arithmetic; this understates QPS by ~14%, which is negligible at this stage.

Worked example: a chat app with 10M DAU, each sending 50 messages per day:

Text
QPS_avg  = 10,000,000 * 50 / 100,000 = 5,000 messages / sec
QPS_peak = 5,000 * 3 = 15,000 messages / sec at peak

Second example: a photo-sharing app with 500M DAU, each viewing 50 photos and uploading 0.5 per day:

Text
Reads_QPS_avg  = 500M * 50 / 100,000 = 250,000 reads / sec
Reads_QPS_peak = 250,000 * 3 = 750,000 reads / sec

Writes_QPS_avg  = 500M * 0.5 / 100,000 = 2,500 writes / sec
Writes_QPS_peak = 2,500 * 5 = 12,500 writes / sec

Read-write ratio is 100:1, which immediately tells you 'cache aggressively; design the read path first'.
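
The QPS template fits in a few lines of code. This is a minimal sketch; the function name `estimate_qps` and its parameters are illustrative, and the default divisor is the rounded 100,000 seconds per day used above.

```python
SECONDS_PER_DAY_ROUNDED = 100_000  # 86,400 rounded up for mental arithmetic


def estimate_qps(dau, actions_per_user_per_day, peak_factor=3.0,
                 seconds_per_day=SECONDS_PER_DAY_ROUNDED):
    """Return (average QPS, peak QPS) for one kind of action."""
    qps_avg = dau * actions_per_user_per_day / seconds_per_day
    return qps_avg, qps_avg * peak_factor


# Chat app: 10M DAU, 50 messages per user per day, 3x peak.
print(estimate_qps(10_000_000, 50))                    # (5000.0, 15000.0)

# Photo app: 500M DAU; reads at 3x peak, writes at 5x peak.
print(estimate_qps(500_000_000, 50))                   # (250000.0, 750000.0)
print(estimate_qps(500_000_000, 0.5, peak_factor=5))   # (2500.0, 12500.0)
```

Passing `seconds_per_day=86_400` gives the unrounded figure when more precision is wanted.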

The storage template

Given object count and per-object size:

Text
storage_per_day  = objects_per_day * bytes_per_object * metadata_overhead
storage_per_year = storage_per_day * 365
storage_5_year   = storage_per_year * 5

Metadata overhead is typically 2-5x for indexed structured data, 1.1-1.5x for blob storage with simple metadata.

Worked example: a Twitter-like service. 400M tweets per day, 280 bytes per tweet, 5x metadata overhead (id, timestamps, user_id, indexes):

Text
storage_per_day  = 400,000,000 * 280 * 5 = 560 GB / day
storage_per_year = 560 * 365            = 200 TB / year
storage_5_year   = 200 * 5              = 1 PB

This tells you 'one very large database' is wrong; you need either sharding or a wide-column store designed for this.

With media (1 in 5 tweets has a 1 MB image):

Text
media_per_day = 80,000,000 * 1 MB = 80 TB / day
media_5_year  = 80 * 365 * 5      = 146 PB

Media dominates storage cost; this drives the architectural decision to put media in object storage (S3) and thumbnails in a CDN.
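
The storage arithmetic above can be reproduced with a small helper. This is a sketch under the same assumptions as the worked example (decimal units, 365 days per year); `estimate_storage_bytes` and `human` are illustrative names.

```python
def estimate_storage_bytes(objects_per_day, bytes_per_object, overhead=1, years=5):
    """Return (bytes per day, bytes per year, bytes over `years`)."""
    per_day = objects_per_day * bytes_per_object * overhead
    per_year = per_day * 365
    return per_day, per_year, per_year * years


def human(n_bytes):
    """Format a byte count using decimal units (1 KB = 1000 bytes)."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n_bytes < 1000:
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000
    return f"{n_bytes:.1f} EB"


# Tweets: 400M per day, 280 bytes each, 5x metadata overhead.
day, year, five_years = estimate_storage_bytes(400_000_000, 280, overhead=5)
print(human(day), human(year), human(five_years))   # 560.0 GB 204.4 TB 1.0 PB

# Media: 80M images per day at 1 MB each, no extra overhead.
_, _, media_five_years = estimate_storage_bytes(80_000_000, 1_000_000)
print(human(media_five_years))                       # 146.0 PB
```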

The bandwidth template

Given QPS and per-request payload:

Text
bandwidth = QPS * bytes_per_request

Convert to Gbps: divide by ~125,000,000 (since 1 Gbps ~= 125 MB/sec).

Worked example: 750k peak read QPS, 1 MB average response (image-heavy):

Text
bandwidth_peak = 750,000 * 1 MB = 750 GB/sec ~= 6,000 Gbps

This is huge. Single-server NICs are ~10-25 Gbps; you cannot serve this from a small fleet. The architectural answer: CDN. CDN serves cached content at edge; origin sees only cache misses.

With CDN at 95% cache hit ratio:

Text
origin_bandwidth = 6,000 Gbps * 0.05 = 300 Gbps

Still significant but manageable across a fleet. This is exactly why image / video products always use CDNs.
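
As a sketch of the bandwidth template and the CDN adjustment, assuming decimal units (1 Gbps = 125,000,000 bytes per second); the function names are illustrative:

```python
BYTES_PER_SEC_PER_GBPS = 125_000_000  # 1 Gbps = 10^9 bits/sec = 125 MB/sec


def bandwidth_gbps(qps, bytes_per_request):
    """Total bandwidth in Gbps for a given request rate and payload size."""
    return qps * bytes_per_request / BYTES_PER_SEC_PER_GBPS


def origin_gbps(total_gbps, cache_hit_ratio):
    """Bandwidth that still reaches the origin after CDN cache hits."""
    return total_gbps * (1 - cache_hit_ratio)


total = bandwidth_gbps(750_000, 1_000_000)   # 750k peak QPS, 1 MB responses
print(total)                                 # 6000.0
print(round(origin_gbps(total, 0.95)))       # 300
```

The hit ratio is the sensitive input here: moving from 95% to 90% doubles the origin bandwidth.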

Peak vs average

Real traffic is not uniform across 24 hours. Common peak-to-average ratios:

Pattern | Peak / Average
B2B / SaaS in one timezone | ~3x (workday peak)
B2C global | ~2x (smoother due to multi-timezone)
Live event-driven (sports, election) | ~10-100x
Holiday shopping (Black Friday) | ~5-20x
Live streaming (concerts) | ~50-1000x

Design for peak, not average. A system that handles average is in trouble half the time.

For really spiky workloads (live events, flash sales), the architecture has to be different: pre-provisioned capacity, queue absorption, degradation modes, edge caching. 'Auto-scale to peak' does not work when peak is 100x average and arrives in 30 seconds.

Headroom and growth

Never size for exactly your projected peak. Add:

  • Headroom: 30-50% above projected peak so you do not page on a normal busy day. A system at 95% utilization is already in trouble.
  • Growth: 1-2 years of projected user growth, depending on how fast you can re-architect.
  • Failure: enough capacity that the loss of one AZ / region does not page. With three AZs, each normally carries ~33% of traffic; if one fails, the surviving two each carry 50%. Size each AZ for at least 50% of total peak (1.5x its normal share), then add headroom on top.

A system designed for 100k peak QPS today is realistically sized to handle 200-300k QPS. The 2-3x is not waste; it is operational sanity.
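
The 2-3x figure falls out of multiplying the three factors together. The sketch below assumes equal-sized AZs and the loss of exactly one of them; `growth_factor` is an illustrative parameter (1.3 means 30% growth over the planning horizon).

```python
def az_failover_multiplier(az_count):
    """Capacity multiplier so the surviving AZs carry full load after losing one."""
    if az_count < 2:
        raise ValueError("need at least two AZs to survive the loss of one")
    return az_count / (az_count - 1)


def sized_capacity(projected_peak_qps, headroom=0.3, growth_factor=1.0, az_count=3):
    """Total capacity to provision across all AZs."""
    return (projected_peak_qps * (1 + headroom) * growth_factor
            * az_failover_multiplier(az_count))


print(az_failover_multiplier(3))                            # 1.5
print(round(sized_capacity(100_000)))                       # 195000
print(round(sized_capacity(100_000, headroom=0.5,
                           growth_factor=1.3)))             # 292500
```

With 30-50% headroom, modest growth, and three AZs, 100k peak QPS lands in the 200-300k range quoted above.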

Regional distribution

For multi-region systems, capacity planning is per region, not global. The arithmetic:

Text
global_qps        = sum over regions of regional_qps
regional_capacity = regional_qps * peak_factor * (1 + headroom) * failover_multiplier

Failover multiplier (if the architecture is N+1 regions): each region must absorb (1 / N) of an absent region's traffic. For three regions, each region's spare capacity needs to handle ~50% extra during one-region outage.

For strongly-consistent multi-region writes, the geographic round-trip latency dominates: a write that needs quorum across continents will be ~100-300 ms regardless of how fast the database is. This often drives regional partitioning of the data ('users live in their home region').
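
The per-region formula can be sketched as follows. It assumes roughly equal-sized regions, one region lost at a time, and average QPS as input; the region names and traffic figures are made up for illustration.

```python
def regional_capacity(regional_qps, peak_factor, headroom):
    """Per-region capacity, given a dict of region name -> average QPS."""
    survivors = len(regional_qps) - 1
    if survivors < 1:
        raise ValueError("need at least two regions for failover")
    # Each surviving region absorbs 1/N of the lost region's traffic.
    failover_multiplier = 1 + 1 / survivors
    return {
        region: qps * peak_factor * (1 + headroom) * failover_multiplier
        for region, qps in regional_qps.items()
    }


plan = regional_capacity({"us": 40_000, "eu": 40_000, "ap": 40_000},
                         peak_factor=3, headroom=0.3)
print({region: round(cap) for region, cap in plan.items()})
# {'us': 234000, 'eu': 234000, 'ap': 234000}
```

If regions differ a lot in size, sizing each one to absorb its share of the largest other region is a safer variant than a single multiplier.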

Cost arithmetic

Rough cloud cost numbers (order-of-magnitude, 2024-2026):

Item | Approximate cost
Cloud VM (4 vCPU, 16GB) | ~$100-200 / month on-demand; ~$30-70 with reserved instances
EC2 / GCP egress per GB | ~$0.05-0.12
CloudFront / CDN egress per GB | ~$0.02-0.08 (volume discounts)
Inter-AZ network per GB | ~$0.01
S3 standard storage per GB-month | ~$0.023
S3 Glacier per GB-month | ~$0.004
RDS Postgres db.r6g.large | ~$200-400 / month
DynamoDB on-demand per million writes | ~$1.25
Lambda per million invocations | ~$0.20 (plus compute time)

Worked example: serving 100 TB / month of CDN traffic at $0.05 / GB:

Text
CDN cost = 100,000 GB * $0.05 = $5,000 / month

Very rough but enough to know whether you are talking about a $100/month design or a $1M/month design. The factor-of-10 distinction usually drives different conversations.

Note: cloud egress (data leaving the cloud) is the surprise line item that breaks startup budgets. A design that pulls a TB from your cloud per day is ~$36k / year just in egress at ~$0.10 / GB.
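
Both cost figures are a single multiplication each. A minimal sketch, using per-GB rates from the rough ranges in the table above (they are assumptions, not quoted prices):

```python
def transfer_cost_usd(gigabytes, usd_per_gb):
    """Cost of moving `gigabytes` of data at a flat per-GB rate."""
    return gigabytes * usd_per_gb


# 100 TB per month through a CDN at $0.05 / GB.
print(round(transfer_cost_usd(100_000, 0.05)))        # 5000

# 1 TB per day of cloud egress for a year at $0.10 / GB.
print(round(transfer_cost_usd(1_000 * 365, 0.10)))    # 36500
```

Real bills use tiered pricing, so this is only good for deciding which order of magnitude the design sits in.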

Memory sizing

For caches, common rule of thumb: keep 5-20% of the dataset in cache, depending on access skew (heavy long-tail = more, uniform = less, Pareto = sweet spot at ~10%).

Worked example: 1 TB hot dataset, 10% in cache:

Text
cache_size = 100 GB
nodes_needed = 100 GB / 32 GB per node = ~4 Redis nodes

Four nodes plus replicas; this is a real Redis Cluster, not a single instance.
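
The node count is a ceiling division, which is easy to get wrong by rounding down. A small sketch, with `cache_nodes_needed` as an illustrative name and 32 GB of usable memory per node as in the example:

```python
import math


def cache_nodes_needed(dataset_gb, cached_fraction, node_memory_gb):
    """Return (cache size in GB, primary nodes needed, rounded up)."""
    cache_gb = dataset_gb * cached_fraction
    return cache_gb, math.ceil(cache_gb / node_memory_gb)


cache_gb, nodes = cache_nodes_needed(1_000, 0.10, 32)
print(nodes)   # 4
```

Replicas come on top of this count: one replica per primary doubles it to eight nodes.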

Connection pools and connection arithmetic

A frequent mistake: mis-sized DB connection pools cause request queueing and connection errors long before the DB itself is saturated. Quick check: Postgres with max_connections = 100, app server pool of 20 connections per instance, 50 app instances = 1000 connections trying to land on 100. Nine out of ten cannot be opened, so requests queue or fail in the app tier.

Rule of thumb: total connections from app fleet should be at most 75% of DB max_connections. Use connection pooling proxies (PgBouncer, RDS Proxy) when fleet is large.
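
The connection check is simple enough to script into a deploy review. This sketch applies the 75% rule of thumb; the function and key names are illustrative.

```python
def connection_budget(max_connections, pool_size_per_instance, app_instances,
                      safe_fraction=0.75):
    """Compare connections the app fleet can open with what the DB should accept."""
    demanded = pool_size_per_instance * app_instances
    budget = int(max_connections * safe_fraction)
    return {
        "demanded": demanded,
        "budget": budget,
        "oversubscribed_by": max(0, demanded - budget),
        "max_pool_per_instance": budget // app_instances,
    }


print(connection_budget(100, 20, 50))
# {'demanded': 1000, 'budget': 75, 'oversubscribed_by': 925, 'max_pool_per_instance': 1}
```

A per-instance budget of one connection is the signal that a pooling proxy belongs between the fleet and the database.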

When to use each architectural option (driven by numbers)

Numbers | Architectural implication
Read QPS > 5k | Need read replicas or aggressive caching
Read QPS > 100k | Need CDN for static, multi-tier cache for dynamic
Write QPS > 5k on a single PK row range | Need sharding or write-optimized DB
Storage > ~1 TB | Need partitioned / sharded storage
Storage > ~100 TB | Object storage (S3) + lifecycle tiering
Bandwidth > ~10 Gbps origin | CDN required
Latency p99 < 50ms | Edge / regional architecture
Cross-region writes with strong consistency | Restructure to per-region partitioning, or accept 100-300ms latency
Peak / average > 10x | Pre-provisioned capacity + queue / degradation; auto-scaling alone is not enough

This is the table you should be deriving in your head as you do the estimation. Numbers in -> architectural decisions out.
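
To make "numbers in, decisions out" concrete, part of the table can be written as a function. This is a simplified sketch: it covers only the purely numeric rows, treats the thresholds as hard cutoffs (they are rules of thumb), and uses illustrative parameter names.

```python
def architectural_implications(read_qps=0, write_qps=0, storage_tb=0,
                               origin_gbps=0, peak_to_avg=1):
    """Map rough estimates to the implications listed in the table above."""
    notes = []
    if read_qps > 100_000:
        notes.append("CDN for static content, multi-tier cache for dynamic content")
    elif read_qps > 5_000:
        notes.append("read replicas or aggressive caching")
    if write_qps > 5_000:
        notes.append("sharding or a write-optimized DB")
    if storage_tb > 100:
        notes.append("object storage + lifecycle tiering")
    elif storage_tb > 1:
        notes.append("partitioned / sharded storage")
    if origin_gbps > 10:
        notes.append("CDN required")
    if peak_to_avg > 10:
        notes.append("pre-provisioned capacity + queue / degradation")
    return notes


for note in architectural_implications(read_qps=12_000, write_qps=800,
                                       storage_tb=40, peak_to_avg=3):
    print("-", note)
# - read replicas or aggressive caching
# - partitioned / sharded storage
```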

Implementation

A 90-second template you can run in any interview

Copy this structure mentally:

Text
DAU:                          [number]
Actions per user per day:     [reads, writes separately]
QPS avg:                      DAU * actions / 100,000
QPS peak:                     QPS_avg * peak_factor

Object size:                  [bytes per object]
Metadata overhead:            [2-5x]
Storage per day:              objects_per_day * size * overhead
Storage 5 year:               daily * 365 * 5

Bandwidth peak:               QPS_peak * payload_size
Gbps:                         bandwidth (bytes/sec) / 125 MB/sec

Architectural conclusions from above:
  - Caching needed?    (yes if QPS > 5k or hit ratio matters)
  - Sharding needed?   (yes if write QPS > 5k or storage > 1TB)
  - CDN needed?        (yes if media + bandwidth > 10 Gbps origin)
  - Multi-region?      (yes if global users + latency < 100ms or regional residency)

Three minutes; ten lines on the whiteboard; every architectural decision afterward is anchored to these numbers.

Worked example: a chat application

Text
DAU                              50,000,000
Messages per user per day                 50
Messages per day              2,500,000,000
Writes per second avg                25,000
Writes per second peak               75,000  (3x)

Message size                            100 bytes
Metadata overhead                       3x   (id, timestamps, sender, etc.)
Storage per day                       750 GB
Storage 5 year                       1.4 PB

Reads per user per day                  150  (3x messages)
Reads per second avg                 75,000
Reads per second peak               225,000

Read bandwidth peak                  225k * 100B = 22 MB/sec ~= 180 Mbps
Write bandwidth peak                  75k * 100B =  7 MB/sec ~=  60 Mbps

Architectural conclusions:
  - 75k peak writes -> need sharding by chat_id or user_id
  - 225k peak reads -> aggressive caching, maybe per-user inbox in Redis
  - 1.4 PB over 5 years -> tiered storage; hot in fast store, old in cold
  - Bandwidth modest (Mbps not Gbps) -> CDN unnecessary for messages themselves;
    but media attachments would change this dramatically
  - Real-time delivery at this scale -> WebSocket fleet sized for ~50M concurrent
    connections; ~50-100k connections per server -> 500-1000 servers

Four minutes of arithmetic, twelve lines, design conclusions clear.

Worked example: a video streaming service

Text
DAU                              500,000,000
Minutes watched per user per day         60
Total minutes watched           30,000,000,000 / day
Video bitrate                            5 Mbps  (HD average)
Bytes per minute of video                37.5 MB  (5 Mbps * 60 s / 8)
Daily egress                       1.1 EB / day  (1100 PB)

This is enormous. Conclusions:
  - CDN at 95%+ cache hit ratio is mandatory
  - Edge POPs around the world; user routes to nearest
  - Origin handles only cache misses; ~5% of 1.1 EB = 55 PB / day = ~5 Tbps average
  - Storage of catalog: 100k videos * 1 GB average = 100 TB hot; transcoded
    variants (5x for resolutions) = ~500 TB; regional replicas (assume ~10 copies) = ~5 PB
  - Encoding pipeline: any newly uploaded video goes through transcoding to
    multiple resolutions / codecs; massively parallel batch job

Three minutes; the design is now anchored. Without this, 'we'll use a CDN' is hand-waving; with it, 'we'll use a CDN at 95% hit ratio because the origin would otherwise serve ~100 Tbps, and even at 95% it still serves ~5 Tbps, which is 500 servers' worth of 10 Gbps NICs' is a senior answer.
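
The streaming arithmetic starts from bitrate instead of request size, so it deserves its own sketch. The function name and return shape are illustrative; it uses the exact 86,400 seconds per day, so the results come out slightly higher than the rounded figures.

```python
def streaming_egress(dau, minutes_per_user_per_day, bitrate_mbps, cache_hit_ratio):
    """Return (daily egress in bytes, average total Tbps, average origin Tbps)."""
    bytes_per_minute = bitrate_mbps * 1_000_000 / 8 * 60
    daily_bytes = dau * minutes_per_user_per_day * bytes_per_minute
    avg_tbps = daily_bytes * 8 / 86_400 / 1e12
    return daily_bytes, avg_tbps, avg_tbps * (1 - cache_hit_ratio)


daily, total_tbps, origin_tbps = streaming_egress(500_000_000, 60, 5, 0.95)
print(f"{daily / 1e18:.2f} EB/day")           # 1.12 EB/day
print(f"{total_tbps:.0f} Tbps total")         # 104 Tbps total
print(f"{origin_tbps:.1f} Tbps at origin")    # 5.2 Tbps at origin
```

These are daily averages; the evening peak would be a multiple of both figures.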

Capacity planning beyond the interview

In real production, the same arithmetic feeds quarterly capacity reviews:

Text
For each service:
  current_peak_qps:      [from monitoring]
  growth_quarterly:      [historical or forecast]
  projected_peak_qps:    current * (1 + growth)^quarters
  capacity_today:        [from autoscaling / load tests]
  utilization_today:     current_peak / capacity_today
  capacity_needed:       projected_peak * (1 + headroom) * failover_multiplier
  delta:                 capacity_needed - capacity_today
  cost_delta:            delta * cost_per_unit

Every senior engineer should be running this for their services on a quarterly cadence. The math is the same as the interview, just continuous.
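
The review template translates almost line for line into code. In this sketch the field names mirror the template; the input figures and the `cost_per_unit` value (dollars per unit of QPS capacity per month) are hypothetical.

```python
def capacity_review(current_peak_qps, growth_quarterly, quarters, capacity_today,
                    headroom, failover_multiplier, cost_per_unit):
    """Project peak load forward and compute the capacity and cost gap."""
    projected_peak = current_peak_qps * (1 + growth_quarterly) ** quarters
    capacity_needed = projected_peak * (1 + headroom) * failover_multiplier
    delta = capacity_needed - capacity_today
    return {
        "projected_peak_qps": projected_peak,
        "utilization_today": current_peak_qps / capacity_today,
        "capacity_needed": capacity_needed,
        "delta": delta,
        "cost_delta": delta * cost_per_unit,
    }


review = capacity_review(current_peak_qps=50_000, growth_quarterly=0.10, quarters=4,
                         capacity_today=100_000, headroom=0.3,
                         failover_multiplier=1.5, cost_per_unit=0.5)
for name, value in review.items():
    print(f"{name}: {value:,.2f}")
```

A negative `delta` means the service is already provisioned beyond the projected need, which is worth flagging as well.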

When to Use

Always estimate when

  • Designing any system in an interview. No exceptions.
  • Designing any new system in real work. Numbers ground decisions.
  • Reviewing an existing design for scaling. Estimate before recommending changes.

Skip detailed estimation when

  • The interviewer explicitly says 'do not bother with numbers; let's focus on design'. Honor the redirect.
  • The system is small enough that any reasonable architecture will work. (A blog with 1000 readers does not need estimation.)

Use peak-driven sizing when

  • The traffic is spiky and the cost of being undersized at peak is high (user-facing, revenue-generating).
  • SLO compliance during peak matters.

Use average-driven sizing when

  • The traffic is smooth and being briefly slow at peak is acceptable.
  • The service is internal / batch and does not have hard latency requirements.

Use auto-scaling when

  • Peak / average ratio is ~3-5x with predictable timing.
  • Workload allows 30-60 second scaling delays.
  • Cost optimization at scale matters.

Use pre-provisioned capacity when

  • Peak / average ratio is >10x with sudden onset (live events, flash sales).
  • Auto-scaling cannot react fast enough.
  • The cost of underprovisioning (lost revenue, brand damage) exceeds the cost of pre-provisioning.

Use multi-region when

  • User base is geographically distributed and latency matters.
  • Regulatory data residency requires it.
  • Single-region availability is insufficient (need to survive entire-region outages).

Case Studies

Jeff Dean's 'Numbers Every Programmer Should Know'

Google's Jeff Dean popularized a now-canonical list of latency numbers that engineers should have internalized: L1 cache, RAM, SSD, disk, network, transcontinental round-trip. The list (originally from a 2010 talk) has been republished and updated countless times. It exists because the relative magnitudes (a memory reference is ~100x faster than an SSD read, which is several hundred times faster than a disk seek, which is ~15x faster than a transcontinental round trip) drive almost every architectural decision.

Lesson: internalizing relative magnitudes is more useful than memorizing exact numbers. The takeaway 'a cache miss is 100x worse than a cache hit; a cross-region round trip (~150 ms) is about a million times slower than a main-memory reference (~100 ns)' is the real signal.

Engineering blog capacity write-ups

Many engineering teams have published capacity planning case studies: Stripe on quarterly capacity reviews, Discord on Cassandra capacity for messages, Uber on city-by-city capacity for ride matching, Cloudflare on edge capacity for DDoS absorption. The math in these posts is back-of-envelope plus refinements; the structure (current load, growth, headroom, cost) is identical to interview estimation.

Lesson: real production capacity planning is the same arithmetic as interview estimation, run continuously instead of once.

Black Friday / Cyber Monday at retail

Large retailers (Amazon, Target, Walmart, Shopify) publicly discuss the year-round preparation for Black Friday: traffic models, load tests at projected peak, pre-provisioned capacity that idles 51 weeks of the year, degradation modes for unexpected overload. The peak/average ratios at peak shopping events can exceed 50x, which is well past what auto-scaling can react to.

Lesson: estimation must include worst-case events, not just steady state. The architecture for a 50x spike is fundamentally different from the architecture for a 3x daily peak.

Live-streaming Super Bowl / cricket finals

Streaming platforms (Disney+ Hotstar, Twitch, YouTube Live) have published numbers from major live events: 30M+ concurrent viewers on a single match, peak bandwidth in the multi-Tbps range, peak QPS in the millions on a single live channel. The architectures these numbers force (massive CDN pre-provisioning, edge transcoding, queue absorption, degraded streaming modes) are directly visible in the public engineering write-ups.

Lesson: at the extreme high end of capacity planning, the math drives entirely different architectures (edge-first, pre-provisioned, multi-quality fallback) that cannot be reverse-engineered without doing the estimation first.

Quick Review

  • Memorize the latency numbers (Jeff Dean's list), the throughput numbers per major system, and the size / time constants.
  • The QPS template: DAU * actions / 100,000 for average, multiply by peak factor for peak.
  • The storage template: objects_per_day * bytes_per_object * metadata_overhead, then multiply for years.
  • The bandwidth template: QPS * bytes_per_request, divide by 125 MB/sec for Gbps.
  • Always design for peak, not average. Peak / average is ~2x for global B2C, ~3x for single-timezone B2B / SaaS, and 10x or more for live events.
  • Add 30-50% headroom and 1-2 years of growth on top of projected peak.
  • For multi-region: per-region capacity must include failover absorption.
  • Cost arithmetic is order-of-magnitude: $100/mo vs $1k/mo vs $10k/mo distinguishes the relevant conversations.
  • Numbers in -> architectural decisions out. The estimation is what makes the design defensible.

Real-World Examples

How real systems implement this in production

Jeff Dean's 'Numbers Every Programmer Should Know'

Google's Jeff Dean popularized a now-canonical list of latency numbers in a 2010 talk: L1 cache, RAM, SSD, disk, network, transcontinental round-trip. The list and its successors have been republished, annotated, and updated countless times. Almost every senior engineer has internalized at least the relative magnitudes; the absolute numbers shift over time but the orders of magnitude are stable.

Trade-off: Memorizing the absolute numbers gives quick answers but the numbers themselves drift over years; what really matters is internalizing the relative magnitudes (memory 100x faster than SSD, etc.) which stay constant and drive every architectural decision.

Black Friday capacity at large retailers

Amazon, Target, Shopify, and other large retailers publicly discuss year-round preparation for Black Friday and Cyber Monday: traffic modeling that projects 5-50x peak, pre-provisioned capacity that sits idle 51 weeks, load tests in production-like environments at projected peak, prepared degradation modes (queue absorption, simplified checkout, async order confirmation) for unexpected overload. The estimation work begins months in advance.

Trade-off: Pre-provisioning for retail peaks costs significant idle capacity for most of the year, but the cost of being undersized at peak (lost revenue, brand damage, inability to recover) dwarfs the cost of idle hardware; cloud reserved instances and savings plans soften but do not eliminate the trade-off.

Live streaming peaks (Hotstar, Twitch, YouTube Live)

Disney+ Hotstar publicly discussed sustaining ~30M concurrent viewers during a single cricket match, peaking in the millions of QPS for chat and metadata services. Twitch and YouTube Live publish similar numbers for esports finals and major streamer events. Engineering write-ups describe the CDN pre-provisioning, the edge transcoding, the queue absorption architecture, and the multi-quality fallback that the estimation forces.

Trade-off: Live event scale requires architectures that are over-engineered for normal load (massive edge fleets, complex queue / fallback paths), but the math leaves no alternative; auto-scaling alone cannot handle 50-1000x spikes that arrive in seconds.

Quarterly capacity reviews at scale-up companies

Stripe, Discord, Notion, Linear, and many other modern engineering organizations have publicly discussed quarterly capacity review processes: each service owner produces current load, projected growth, capacity headroom, cost forecast. The math is the same back-of-envelope arithmetic as interview estimation, applied continuously. Without it, services either over-provision (waste) or under-provision (incidents).

Trade-off: Continuous capacity reviews catch undersizing before it becomes an outage but require dedicated time from senior engineers and product managers; smaller companies often skip the formal cadence and react to incidents instead, which is cheaper short-term and more expensive long-term.

Quick Interview Phrases

Key terms to use in your answer

let me do back-of-envelope estimation
DAU times actions divided by 100,000
design for peak with headroom and growth
this drives the architecture choice
metadata overhead of 3 to 5x
CDN at 95 percent hit ratio

Common Interview Questions

Questions you might be asked about this topic

Start with DAU and per-user activity. Assume 200M DAU producing 2 tweets per day, which is 400M tweets daily. Storage: 280 bytes per tweet * 5x metadata overhead = 1.4 KB per tweet; 1.4 KB * 400M = ~560 GB / day; * 365 = ~200 TB / year; * 5 years = ~1 PB. Add media: assume 1 in 5 tweets has a 1 MB image, so 80M media items / day = 80 TB / day = ~150 PB over 5 years. Bandwidth: assuming ~700k peak read QPS at an average response of 5 KB (text + small thumbnails), text traffic is ~3.5 GB/sec = ~28 Gbps; image bandwidth is much larger and CDN-served. Conclusions: text storage needs a sharded wide-column store, media goes to S3 + CDN, and origin bandwidth is manageable only via a CDN with a high cache hit ratio.

Interview Tips

How to discuss this topic effectively

1. Memorize the four key numbers: 100,000 (seconds in a day), 125 MB/sec (1 Gbps), peak factor 3-5x, metadata overhead 3-5x. With these you can derive almost any estimate in 90 seconds.

2. Always state the assumptions you used. 'Assuming 200M DAU and read:write of 100:1' lets the interviewer correct you if their model is different, before you bake the assumption into the design.

3. Round aggressively. 86,400 becomes 100,000; 280 bytes becomes 300; 3.15 * 10^7 becomes 30M. One or two significant figures at most. The interviewer is grading reasoning, not arithmetic precision.

4. Tie every estimate to an architectural conclusion. '750k peak read QPS means we cannot serve from a single DB' is a senior tell; '750k QPS' on its own is just a number.

5. Mention peak vs average explicitly. 'Average is 250k, peak is 3x average so 750k, plus 50% headroom so I'd size for ~1.1M' shows capacity-planning maturity.

Common Mistakes

Pitfalls to avoid in interviews

Skipping estimation entirely or doing it after the design

Estimation drives the design; doing it after means the design was guessed. Spend three minutes on numbers before drawing components. The numbers tell you whether you need caching, sharding, CDN, multi-region.

Designing for average traffic and forgetting peak

Real traffic peaks 2-10x average. A system sized for average is in trouble half the time. Always state the peak factor explicitly and design for peak with headroom.

Forgetting metadata overhead in storage estimates

A 280-byte tweet does not occupy 280 bytes in a database. With indexes, timestamps, user_id, and replica overhead, multiply by 3-5x. Metadata overhead is the most commonly forgotten factor in storage math.

Treating cloud egress as free

Egress (data leaving the cloud) is one of the largest cloud bills for media-heavy services. Always include bandwidth costs, not just compute and storage. Serving 100 TB/month at $0.05-0.10 / GB is a $5k-$10k monthly line item that can dominate the bill.

Doing arithmetic in your head and stalling

Write the math on the whiteboard. Round to easy numbers (100k seconds in a day, not 86,400). Show the steps. The interviewer is grading reasoning, not mental arithmetic. Visible math is faster and more defensible.