The System Design Interview Framework I Use in Every Loop

The 45-minute structure I have used as a candidate and the rubric I now use as an interviewer. Sequencing, sample whiteboard, and the 6 common failure modes.

By @hannahdelgado

December 19, 2025

Updated May 18, 2026

Twelve minutes into the interview, the candidate had drawn 14 boxes on the whiteboard, none of them connected, and had not asked a single clarifying question. They were explaining to me how they would shard their database. I had asked them to design a URL shortener. By minute 30 they had three different storage systems on the board and we still had not established whether the system needed to handle one user or one billion. We did not get to the second half of the interview because they were not done with the first half. I gave them a mid-bar score, which is the polite way of saying they did not pass, and the calibration I wrote afterward boiled down to: they knew the components but they did not know the order.

That interview is one of about 80 system design interviews I have run as a hiring manager. It is also a near-replica of the interview I bombed myself, in 2017, when I was a senior engineer interviewing for a staff role and watched my own performance unravel because I had no framework. I bombed three of those interviews before I sat down and built one. The same framework, refined, is what I now use on either side of the whiteboard.

This article is the structure: how I sequence a 45-minute interview, what I aim to put on the whiteboard, the rubric I score against when I am the interviewer, and the six failure modes I most commonly see candidates fall into. None of this is unique to one company; the rubric I describe is close to the median of what the teams I have interviewed at use, which is also why teaching it works.

The 45-minute clock

The interview is almost always 45 minutes of design plus 5 minutes of candidate questions. Out of those 45, I aim to spend the time roughly as follows. The exact split flexes by problem, but the order is fixed.

45-minute system design interview clock

  0-5 min     Clarify requirements (functional, non-functional)
  5-10 min    Estimate scale (QPS, storage, bandwidth)
  10-15 min   API contract and data model
  15-25 min   High-level architecture (the boxes-and-arrows diagram)
  25-35 min   Deep dive on 1-2 components the interviewer pushes on
  35-42 min   Identify bottlenecks, propose mitigations
  42-45 min   Wrap-up: what would I do differently if this were real

The single biggest mistake I made as a candidate (and now see daily as an interviewer) is rushing the first ten minutes. The first ten minutes feel slow, like nothing is happening, like the interviewer wants to see code on the board. The opposite is true. Those ten minutes are where the frame of the rest of the interview is set, and if you skip them you are improvising the rest of the conversation against a moving target.

Phase 1 (0-5 min): Clarify requirements

The instinct is to start drawing. The right move is to ask questions, on the order of three to five, and write the answers in a corner of the whiteboard so they stay visible.

The questions I ask, paraphrased, in roughly this order:

  • "What does the system need to do? Can we list the user-visible features?"
  • "Who are the users, and roughly how many? Hundreds, thousands, millions?"
  • "What read/write ratio are we expecting? Mostly reads, mostly writes, even mix?"
  • "Are there latency or freshness targets we should hit? Real-time, sub-second, eventually consistent?"
  • "What are the non-functional requirements that matter? Durability, consistency, availability, security?"

The interviewer's answers shape every later decision. "Mostly reads, very high read volume" steers you to caching and read replicas. "Tight write consistency, 99.99% availability" steers you to a different beast entirely. Without those answers nailed down, you cannot defend any later choice because you do not know what you are optimizing for.

Write the answers somewhere on the board where you can point at them later. I have watched many candidates change their architecture mid-interview because they forgot the read/write ratio they had agreed on five minutes earlier.

Phase 2 (5-10 min): Estimate scale

This is the phase that separates strong candidates from competent ones, in my experience grading. A competent candidate sketches a plausible architecture. A strong candidate sketches one to fit a number.

A shape of estimation that has worked for me: pick the headline number, derive QPS, derive storage, sanity check against a real-world reference.

Estimation scratchwork for a URL shortener

  Headline: 100M new URLs per month. (You can get this number
            by asking, or proposing and getting confirmation.)
  Writes:   100M / (30 * 86,400) = ~40 writes/sec average
            Peak ~3-5x average = ~150 writes/sec
  Reads:    Assume 100:1 read:write ratio = ~4,000 reads/sec average
            Peak ~15,000 reads/sec
  Storage:  Each URL row ~500 bytes, 100M/month = ~50 GB/month
            Five years = ~3 TB. Fits on a single beefy DB or a
            small sharded cluster; not a Hadoop problem.
  Bandwidth: 4,000 reads/sec * 500 bytes = 2 MB/sec read
             Peak 15,000 * 500 = 7.5 MB/sec. Easy.

The numbers do not need to be precise. They need to be defensible. If the interviewer says "why 100:1?", you say "a typical link-sharing system has many people clicking each link that a few people post; I picked 100:1 as a round number we can tune later". That answer is good enough. If you guess 10:1 instead and the interviewer says "why 10:1?", you say the same thing, and the architecture probably ends up similar.
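
The scratchwork above is easy to turn into a few lines of arithmetic, which helps when practising with different headline numbers. This is a minimal sketch; the function name `estimate` and the default peak factor of 4 (a midpoint of the 3-5x range) are assumptions made for illustration.

```python
SECONDS_PER_MONTH = 30 * 86_400  # 30-day month, as in the scratchwork


def estimate(new_urls_per_month: int,
             read_write_ratio: int = 100,
             row_bytes: int = 500,
             peak_factor: float = 4.0,
             years: int = 5) -> dict:
    """Back-of-envelope numbers; every default is an assumption to confirm."""
    avg_writes = new_urls_per_month / SECONDS_PER_MONTH
    avg_reads = avg_writes * read_write_ratio
    storage_gb_per_month = new_urls_per_month * row_bytes / 1e9
    return {
        "avg_writes_per_sec": avg_writes,
        "peak_writes_per_sec": avg_writes * peak_factor,
        "avg_reads_per_sec": avg_reads,
        "peak_reads_per_sec": avg_reads * peak_factor,
        "storage_gb_per_month": storage_gb_per_month,
        "storage_tb_total": storage_gb_per_month * 12 * years / 1000,
        "avg_read_mb_per_sec": avg_reads * row_bytes / 1e6,
    }


if __name__ == "__main__":
    # Print each figure rounded to one decimal place.
    for name, value in estimate(100_000_000).items():
        print(f"{name:>22}: {value:,.1f}")
```

Changing `read_write_ratio` from 100 to 10 only moves the read figures, which is a quick way to see how much a guessed ratio actually matters to the design.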

The trap I see: candidates who skip estimation entirely, then halfway through the high-level design propose a Cassandra cluster for what is genuinely a single-Postgres workload. The cluster is fine but it is unmotivated, and the senior interviewer will probe and the candidate will not have a number to defend.

Phase 3 (10-15 min): API contract and data model

Five minutes, two artefacts on the board. The API contract is a list of endpoints with their inputs and outputs, no implementation. The data model is the schema (or the document shape) for the main entities. Both are simpler than candidates expect.

URL shortener API contract

  POST /shorten
    body: { longUrl: string, alias?: string, ttlDays?: number }
    returns: { shortCode: string, fullShortUrl: string }

  GET /:shortCode
    returns: 302 redirect to longUrl, or 404

  GET /:shortCode/stats
    returns: { clicks: int, lastAccessed: timestamp, ... }

URL shortener primary table (Postgres-shaped)

  urls
    short_code      VARCHAR(8) PRIMARY KEY
    long_url        TEXT
    user_id         UUID NULL
    created_at      TIMESTAMP
    expires_at      TIMESTAMP NULL
    custom_alias    BOOLEAN

Writing these out forces the conversation to be concrete. Once short_code is a VARCHAR(8), the storage estimate from phase 2 is now grounded. Once the API has a ttlDays, the data model has an expires_at, and the deep dive on storage will have to talk about expiring keys.
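
To make the link between the contract and the table concrete, here is a small sketch of how a POST /shorten body could be mapped to a row. The class and function names (`ShortenRequest`, `UrlRow`, `to_row`, `is_expired`) are hypothetical, and the code generator is passed in rather than implemented here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional
import uuid


@dataclass
class ShortenRequest:
    """Mirrors the POST /shorten body."""
    long_url: str
    alias: Optional[str] = None
    ttl_days: Optional[int] = None


@dataclass
class UrlRow:
    """Mirrors the urls table."""
    short_code: str
    long_url: str
    user_id: Optional[uuid.UUID]
    created_at: datetime
    expires_at: Optional[datetime]
    custom_alias: bool


def to_row(req: ShortenRequest, generated_code: str, now: datetime,
           user_id: Optional[uuid.UUID] = None) -> UrlRow:
    # A caller-supplied alias wins over the generated code.
    code = req.alias if req.alias is not None else generated_code
    if len(code) > 8:
        raise ValueError("short_code must fit in VARCHAR(8)")
    # ttlDays in the API becomes expires_at in the table; no TTL means NULL.
    expires_at = None
    if req.ttl_days is not None:
        expires_at = now + timedelta(days=req.ttl_days)
    return UrlRow(code, req.long_url, user_id, now, expires_at,
                  custom_alias=req.alias is not None)


def is_expired(row: UrlRow, now: datetime) -> bool:
    return row.expires_at is not None and now >= row.expires_at


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(to_row(ShortenRequest("https://example.com", ttl_days=30), "abc1234", now))
```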

Phase 4 (15-25 min): High-level architecture

This is the boxes-and-arrows diagram. Ten minutes. The board should have between six and twelve boxes by the end, no more.

The rough shape for a read-heavy service:

High-level architecture (read-heavy URL shortener)

  [Client] -> [CDN/Edge cache] -> [Load balancer] -> [API service]
                                                          |
                                                          v
                                                  [Cache layer (Redis)]
                                                          |
                                                          v
                                                  [Primary DB (Postgres)]
                                                          |
                                                          v
                                                  [Read replicas]

Five things make this diagram useful. First, every box has a reason that traces back to phases 1-3. The CDN is there because phase 1 mentioned read latency. The cache is there because phase 2's QPS exceeds what the DB can comfortably handle. The replicas are there because the read:write ratio is 100:1. None of those are guesses.

Second, the arrows are directional and labelled. "Cache miss falls through to DB" is an arrow with a label. "DB writes propagate to replicas" is another. Unlabelled arrows are where interviewer questions live.

Third, I name technologies on the boxes ("Redis", "Postgres", "NGINX") rather than leaving them generic ("cache", "DB", "LB"). Specific names give the interviewer a hook to ask follow-up questions; generic names ask them to do my work.

Fourth, I leave room on the right side of the board for the deep-dive sections that come next. If the diagram fills the whole board there is nowhere to expand on a single component.

Fifth, I narrate as I draw. "I am putting a CDN here because the read-side traffic is high and most of it is cacheable. The cache TTL is short, say 60 seconds, because the URL itself does not change but the click stats might want to be fresher. Does that sound right?" That sentence is a candidate-led deep dive in miniature, and a good interviewer will jump on it and steer.
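
The labelled arrow "cache miss falls through to DB" can be sketched in a few lines. This is an in-memory stand-in, not a Redis client: the class name `ReadThroughCache`, the injected `load_from_db` callable, and the 60-second default TTL are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ReadThroughCache:
    """In-memory stand-in for the cache box in the diagram."""

    def __init__(self,
                 load_from_db: Callable[[str], Optional[str]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        # short_code -> (expiry time, cached long_url or None)
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}
        self.db_reads = 0  # counts how often a request fell through

    def get(self, short_code: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(short_code)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        # Cache miss or expired entry: fall through to the database.
        self.db_reads += 1
        value = self._load(short_code)
        # Unknown codes are cached as None too, so repeated 404s
        # do not reach the database on every request.
        self._entries[short_code] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    db = {"abc1234": "https://example.com/a"}
    cache = ReadThroughCache(db.get)
    print(cache.get("abc1234"), cache.get("abc1234"), cache.db_reads)
```

Injecting the clock keeps the TTL behaviour testable without sleeping, which is also a reasonable thing to say out loud if the interviewer asks how you would test the cache layer.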

Phase 5 (25-35 min): Deep dive

The interviewer picks a component and asks me to go deep. Sometimes they tell me which one; usually they ask a leading question and I have to figure it out. The most common deep dives I have seen on this URL-shortener problem:

  • How is the short_code generated? Hash of the URL? Counter-based? UUID? How do collisions get resolved?
  • How does the cache stay consistent with the DB? Read-through? Write-through? TTL only?
  • How do you handle hot keys (a short URL that goes viral)?
  • How does the system handle URL expiration? TTL field plus a sweep? Per-row expiration index?
  • How do you scale the DB beyond a single Postgres? Read replicas? Sharding by short_code prefix?

I go deep on one or two of these for ten minutes. Going deep means: state the design, walk a request through it, name the failure modes, and propose a mitigation for each.

A worked deep dive on the short_code generation:

The simplest design is to generate a random 7-character base62 string per request and check the database for collision. With 62^7 ≈ 3.5 trillion possible codes and 100M codes already stored, the collision probability per write is around 30 in a million (100M / 3.5 trillion), and it grows linearly as the table fills. I would design for that: try the insert, catch a unique-constraint violation, retry up to three times. The 99.99th percentile insert latency is one DB round trip. The mean is one round trip. This is fine for our QPS.

An alternative is a counter-based scheme: assign each new URL the next integer ID, then base62-encode the ID. No collisions, no retries. The downside is that the codes are sequential and guessable, which leaks creation order and lets someone enumerate the URL space. For a public URL shortener I would not do this; for an internal one I would.
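
As a sketch of the counter-based scheme, the encoding step is a plain base conversion. The alphabet ordering below (digits, then lowercase, then uppercase) is an assumption; any fixed ordering of 62 symbols works.

```python
import string

# 62 symbols: 0-9, a-z, A-Z (ordering is an arbitrary choice)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


if __name__ == "__main__":
    # Consecutive IDs give visibly consecutive codes, which is the
    # enumeration weakness described above.
    for url_id in (1000, 1001, 1002):
        print(url_id, encode_base62(url_id))
```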

A hash-of-URL scheme (truncated SHA-256) is the third option. It dedupes automatically (same URL maps to same code, no extra row), but collision handling under truncation is fiddly and I do not think the dedup pays for the complexity. I would not pick it as a default.

I would go with random-base62 plus retry-on-collision.
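
A minimal sketch of random-base62 plus retry-on-collision follows. It uses an in-memory SQLite table as a stand-in for the Postgres urls table so that it runs anywhere; the function names and the simplified two-column table are assumptions for illustration.

```python
import secrets
import sqlite3
import string
from typing import Callable

ALPHABET = string.digits + string.ascii_letters  # 62 symbols
CODE_LENGTH = 7
MAX_ATTEMPTS = 3


def random_code(length: int = CODE_LENGTH) -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def collision_probability(stored_codes: int, length: int = CODE_LENGTH) -> float:
    """Chance that one fresh random code hits an existing row."""
    return stored_codes / 62 ** length


def open_demo_db() -> sqlite3.Connection:
    # SQLite stands in for Postgres; the PRIMARY KEY gives us the
    # unique constraint the retry loop relies on.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urls (short_code TEXT PRIMARY KEY, long_url TEXT NOT NULL)"
    )
    return conn


def insert_with_retry(conn: sqlite3.Connection, long_url: str,
                      make_code: Callable[[], str] = random_code) -> str:
    for _ in range(MAX_ATTEMPTS):
        code = make_code()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO urls (short_code, long_url) VALUES (?, ?)",
                    (code, long_url),
                )
            return code
        except sqlite3.IntegrityError:
            continue  # collision on short_code: try a fresh code
    raise RuntimeError("could not allocate a unique short code")


if __name__ == "__main__":
    conn = open_demo_db()
    print(insert_with_retry(conn, "https://example.com/some/long/path"))
    print(f"{collision_probability(100_000_000):.2e}")
```

The insert is attempted directly instead of doing a SELECT first, so the common case stays at one round trip and the unique constraint is the single source of truth under concurrent writers.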

That is the structure of a good deep dive. State the design. Walk through it. Compare to two alternatives. Pick one with a defended trade-off. Five minutes to ten minutes.

Phase 6 (35-42 min): Bottlenecks and mitigations

This is the phase a lot of candidates miss because they ran out of time. Five minutes to talk about what breaks first under growth.

The shape of this conversation is straightforward: which component will bottleneck first, what does the alert look like when it bottlenecks, and what is the next move?

For the URL shortener with the architecture above, my bottleneck order would be:

  1. The Postgres primary's writes (~150 writes/sec is fine, 1500/sec is not). When the headline number grows tenfold, sharding the urls table by short_code prefix is the move. Each shard handles a slice of writes; reads for a given code go to the shard owning that prefix.
  2. The cache becomes a single point of failure when usage grows. A Redis cluster (or a managed equivalent such as ElastiCache with replicas) instead of a single node fixes that.
  3. The CDN at the edge is the only thing that handles the absolute peak of a viral link. If the CDN goes down, even briefly, the origin gets multi-million-QPS spikes. Origin-shield and request coalescing at the CDN layer mitigate this.
  4. Not a bottleneck but a real risk: cache stampede when a hot key expires. Mitigation: probabilistic early refresh, or hold-and-serve-stale.
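
For the cache-stampede item, one common formulation of probabilistic early refresh (sometimes called XFetch) lets each reader decide, with a probability that rises as expiry approaches, to refresh the key before it actually expires. The sketch below shows only that decision; the function name and parameters are assumptions for illustration.

```python
import math
import random
from typing import Callable


def should_refresh_early(now: float,
                         expires_at: float,
                         recompute_seconds: float,
                         beta: float = 1.0,
                         rng: Callable[[], float] = random.random) -> bool:
    """Return True if this reader should refresh the cached value now.

    recompute_seconds: how long a refresh from the DB typically takes.
    beta: values above 1.0 refresh earlier, below 1.0 later.
    """
    # rng() is in [0, 1), so 1 - rng() is in (0, 1] and log() is defined.
    # -log(u) is an exponentially distributed head start, scaled by how
    # expensive the refresh is.
    head_start = -recompute_seconds * beta * math.log(1.0 - rng())
    return now + head_start >= expires_at


if __name__ == "__main__":
    # Fraction of readers that would refresh at various distances from expiry.
    for seconds_left in (10.0, 3.0, 1.0, 0.1):
        hits = sum(
            should_refresh_early(0.0, seconds_left, recompute_seconds=1.0)
            for _ in range(10_000)
        )
        print(seconds_left, hits / 10_000)
```

Because readers refresh at slightly different moments, a hot key is usually repopulated by one early reader rather than by every reader at the instant the TTL lapses.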

Not every candidate gets to all four of these in the time. I am happy when a candidate names two and explains the mitigation. I worry when a candidate names zero, because it tells me they have not thought about the system in motion, only in steady state.
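
For the first item in the list above, routing by short_code prefix can be as small as the sketch below. It assumes codes are drawn from the 62-character alphabet used for generated codes; custom aliases with other characters would need a different rule, such as hashing the whole code. The function name `shard_for` is hypothetical.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 symbols


def shard_for(short_code: str, shard_count: int) -> int:
    """Map a short code to a shard index using its first character."""
    if not short_code:
        raise ValueError("short_code must not be empty")
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # str.index raises ValueError for characters outside the alphabet.
    return ALPHABET.index(short_code[0]) % shard_count


if __name__ == "__main__":
    for code in ("0abcdef", "aBcDeFg", "Zzzzzzz"):
        print(code, shard_for(code, shard_count=4))
```

With random codes the first character is uniformly distributed, so writes spread roughly evenly; the spread is exact only when the shard count divides 62, which is worth mentioning if the interviewer asks about skew.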

Phase 7 (42-45 min): Name the limits of your own design

Three minutes. What would I do differently if this were a real production project? Usually I name two things: one piece I would prioritize earlier than the design implies ("I would set up structured logging and a tracing pipeline before scaling the DB; observability before scale"), and one thing the design above does not address that I would treat as out of scope but would want to flag ("this design does not consider geo-distribution; if we needed sub-50ms latency on three continents, the architecture changes substantially").

This is the polish phase. It tells the interviewer that I know my own design's limits. Strong signal.

The rubric I score against

When I am the interviewer, I am scoring on five axes. They are roughly in the order I form an opinion about them.

  1. Did the candidate clarify the problem before designing? Strong yes / partial / no.
  2. Were the choices motivated by numbers and requirements, not by familiarity? I.e., did they pick Cassandra because the workload genuinely needs it, or because they saw it on a blog?
  3. Could they go deep on at least one component when pushed? Or did the deep dive flatten into hand-waving?
  4. Did they name failure modes and how the system handles them? Cache eviction, DB failover, retry storms, and the like.
  5. Did they communicate while drawing? A silent candidate, even one drawing a perfect diagram, scores lower than a candidate who narrates the trade-offs as they go.

A passing candidate hits 4 or 5 of these solidly. A bar-raising candidate hits all 5 plus has the time to discuss something the rubric did not anticipate.

The six failure modes I see most

  1. Designing without numbers. The architecture floats free of the requirements. Phase 2 fixes this if you do not skip it.
  2. The big buzzword shop. Cassandra + Kafka + ZooKeeper + Spark + Hadoop. Maybe correct; usually unmotivated. Each component must trace to a phase 2 number.
  3. Going to depth too early. Sharding the database in minute 8 before the API contract is written. The deep dive belongs after the high-level design, not before it.
  4. Missing the failure conversation. Candidate happy-paths the design and runs out the clock without ever discussing what breaks. Phase 6 is the fix; budget for it.
  5. Silent drawing. Beautiful boxes-and-arrows on the board, no narration. The interviewer cannot grade the reasoning if the reasoning is not spoken aloud. Talk while you draw.
  6. Not asking for the question's constraints. "Design Twitter" is not a single problem; it is fifteen. Asking "what features are in scope, what are out, what is the scale, what is the latency target" turns it into a tractable problem.

Common variations and how the framework adapts

Not every interview is a URL shortener. The framework above is robust across categories because the order (clarify, estimate, contract, architecture, deep dive, bottlenecks, wrap) is the same. What changes is which phase consumes the most time. A few examples from interviews I have run or coached people through.

Design a chat application. Phase 1 expands. The functional clarification (group chat? read receipts? typing indicators? message edits? presence?) easily takes 8 minutes because the feature set decides everything downstream. Phase 4 also expands because you are dealing with persistent connections (WebSockets) and a fan-out problem. The deep dive almost always lands on "how do you fan out a message to a group of 5,000 users efficiently". The bottleneck conversation is connection state, not request rate.

Design a ride-sharing service. Phase 2 dominates. The headline number for trips per day implies QPS for matching, geographic density of drivers and riders, the size of the index for nearby-driver lookup. Phase 4 is where geospatial data structures (geohashes, S2 cells, R-trees) come up. The deep dive typically lands on the matcher itself: how do you efficiently find the nearest available driver for a given rider, with low latency, at scale.

Design a payment system. Phase 1 swells with non-functional questions (consistency guarantees, audit trail requirements, fraud detection latency). The data model phase becomes the load-bearing one because payments live and die on the schema. Phase 6 (bottlenecks and failures) is the meat of the interview because payments must handle partial failures, retries, and idempotency. A weak candidate ships an architecture that double-charges in a network split. A strong one names that failure mode in phase 5 and shows the idempotency-key design that prevents it.
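
The idempotency-key idea mentioned for the payment system can be sketched as follows. This is an in-memory illustration with hypothetical names (`PaymentService`, `Charge`); in a real system the key lookup and the charge would need to be made atomic, for example with a unique constraint in the same transaction.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Charge:
    charge_id: int
    account: str
    amount_cents: int


class PaymentService:
    """Remembers results by idempotency key so retries do not double-charge."""

    def __init__(self) -> None:
        self._by_key: Dict[str, Charge] = {}
        self._next_id = 1

    def charge(self, idempotency_key: str, account: str,
               amount_cents: int) -> Charge:
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            # A retry of the same request returns the original result.
            if (existing.account, existing.amount_cents) != (account, amount_cents):
                raise ValueError("idempotency key reused with different parameters")
            return existing
        charge = Charge(self._next_id, account, amount_cents)
        self._next_id += 1
        self._by_key[idempotency_key] = charge
        return charge


if __name__ == "__main__":
    service = PaymentService()
    first = service.charge("order-42-attempt", "acct-1", 1999)
    # The client timed out and retried with the same key.
    retry = service.charge("order-42-attempt", "acct-1", 1999)
    print(first == retry, first.charge_id)
```

The client generates the key once per logical payment and reuses it on every retry, which is what lets the server tell a retry apart from a second purchase.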

The phases stay; the time allocation flexes. If the interviewer's body language suggests they care about a specific phase, lean into it. The 10-minute high-level architecture in the URL shortener could be 5 minutes for a problem where the architecture is dominated by one component and 15 minutes for a payment system where there are six interacting services.

Practice the framework, not the problems

The last thing I want to leave you with is the practice approach that has worked for me. I do not memorize designs of specific systems. I memorize the framework and practice running it against unfamiliar problems. Forty-five minutes, a whiteboard or a Google Doc, a problem I have not seen, the seven phases above. Twenty problems run through the framework over a month is plenty. The framework is what gets graded; the specific problem is the canvas. The candidates I have helped most have all done the same: drilled the structure until the seven phases were automatic, and then walked into interviews with the structure on autopilot and their attention on the content.

The interview that opened this article failed because the candidate had no structure. The candidates I now hire have a structure that is not exactly the one above but is some structure they trust. Build yours, practice it twenty times, and your worst interview gets dramatically better. The best interview gets a little better too. Either way, you stop walking into the room hoping to be lucky.
