Cloudflare System Design: The Edge-Latency Question

A senior backend system design round at Cloudflare anchored on p99 latency at the edge, where the interviewer pushed past the obvious answers until I had to commit to a defensible number budget.

By @oliviafoster

January 15, 2026

Updated May 18, 2026

I sat the Cloudflare senior backend loop in late 2024 and one of the four onsite rounds was system design. It was 60 minutes, one prompt, and the interviewer was a principal engineer on the edge-data team. I had prepared for the standard topology questions (CDN, WAF, DDoS pipeline) and the prompt I got was different in a way that took me about ten minutes to fully appreciate.

I received an offer about a week after the loop and I declined for unrelated reasons (a competing offer at a company closer to my partner's job). I am writing the round up because the framing the interviewer was using is one I had not seen in any other loop, and it is a framing that, once you see it, you can practice for.

A round graded on whether you have a number budget

The prompt, paraphrased: "Design a feature flag service that serves flag values to clients. Targets: tens of millions of clients globally, p99 latency under 50 milliseconds end-to-end including TLS, flag updates propagate in under 30 seconds."

For the first ten minutes I treated this as a standard "design a config-distribution service" question. I drew the box diagram. Origin write path. Replication to regional caches. Edge POPs serving reads. Pull-vs-push for flag updates. I covered the standard tradeoffs cleanly. The interviewer was patient.

At minute eleven they asked, paraphrased: "What is the p50 and p99 latency budget for the read path you just drew, broken down by component?"

That was the question.

What "a number budget" actually meant

The rest of the round was anchored on producing and defending a per-component latency budget. Not in the abstract. In numbers. What follows is the table I produced after the third revision, with the interviewer pushing on each row:

Latency budget for a single flag read at the edge (target p99 = 50 ms)
  component                        p50        p99        notes
  TLS handshake (resumed)           1 ms       8 ms      session ticket assumed; cold = much worse
  TLS handshake (cold)             40 ms      80 ms      blows the budget; client must reuse session
  routing to nearest POP            5 ms      15 ms      anycast; depends on client geo
  edge worker dispatch              0.5 ms     2 ms      V8 isolate startup
  flag-cache lookup (in-isolate)    0.05 ms    0.2 ms    in-memory map
  flag-cache miss to regional       3 ms      12 ms      1% miss rate at steady state
  response serialization            0.2 ms     1 ms      protobuf
  egress + last-mile to client      8 ms      25 ms      varies hugely by geo and access network
  ----------------------------------------------------------------------
  total (warm path)                17.75 ms   63.2 ms    over by 13.2 ms; need to fix the long pole
  total (cold TLS path)            56.75 ms  135.2 ms    over budget; design implication below
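The totals are plain column sums, so they are easy to re-derive while the rows are being revised. Here is a minimal Python sketch of that arithmetic, using the per-component figures from the table. The names (`WARM_PATH`, `totals`, `long_pole`) are mine and purely illustrative. Adding p99s row by row is also a simplification: it treats every component as hitting its tail on the same request, so read the result as a rough budget, not a measured p99.

```python
# Per-component latency budget in milliseconds, copied from the table.
# Each row is (component, p50_ms, p99_ms). Names here are illustrative only.
WARM_PATH = [
    ("TLS handshake (resumed)", 1.0, 8.0),
    ("routing to nearest POP", 5.0, 15.0),
    ("edge worker dispatch", 0.5, 2.0),
    ("flag-cache lookup (in-isolate)", 0.05, 0.2),
    ("flag-cache miss to regional", 3.0, 12.0),
    ("response serialization", 0.2, 1.0),
    ("egress + last-mile to client", 8.0, 25.0),
]
COLD_TLS = ("TLS handshake (cold)", 40.0, 80.0)
TARGET_P99_MS = 50.0


def totals(rows):
    """Sum the p50 and p99 columns; rounding hides float noise."""
    p50 = round(sum(row[1] for row in rows), 2)
    p99 = round(sum(row[2] for row in rows), 2)
    return p50, p99


def long_pole(rows):
    """Return the row that contributes the most p99 latency."""
    return max(rows, key=lambda row: row[2])


# The cold path swaps the resumed handshake for the cold one.
COLD_PATH = [COLD_TLS] + WARM_PATH[1:]

if __name__ == "__main__":
    for label, rows in (("warm", WARM_PATH), ("cold TLS", COLD_PATH)):
        p50, p99 = totals(rows)
        over = round(p99 - TARGET_P99_MS, 2)
        name, _, pole_p99 = long_pole(rows)
        print(f"{label}: p50={p50} ms, p99={p99} ms, over by {over} ms; "
              f"long pole: {name} ({pole_p99} ms)")
```

Keeping the rows as data makes each interviewer push cheap to answer: change one number, re-run, and see whether the long pole moved.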

The interviewer's first push was on the cold-TLS row. "That row blows the budget. What do you do about it?" My honest answer was that the cold path is unavoidable for first-time clients, and the design implication is that the SDK has to enforce session reuse aggressively (longer ticket lifetime, opportunistic 0-RTT where safe, retry on session-loss). The interviewer accepted that with one caveat: "And the API contract has to say so. The latency target only holds for warm clients. That is a documentation problem, not a system problem." I wrote that down. It was a small thing that mattered more than I expected.

The second push was on the cache-miss row. "You said 1% miss rate. Why?" I had to defend the number. My answer was that flag values change rarely (the 30-second propagation target is for the long tail; most flags update once a day or less), the working set per region is small (tens of thousands of flags per tenant, low-millions across all tenants), and the eviction policy can be LFU-with-decay. The interviewer pushed: "What does the miss rate look like during a flag-update storm? Tenant pushes 10k flag changes in a minute." My revised answer was that the miss rate spikes to 30-40% during the storm window, the budget is blown, and the design has to either degrade gracefully (serve stale, async refresh) or push the storm to the regional layer with a circuit breaker. The interviewer wanted the explicit choice. I picked degrade-gracefully with a stale-acceptable header that the SDK could opt into.
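To make the degrade-gracefully choice concrete, here is a rough Python sketch of a cache that serves a stale value and queues a refresh instead of blocking the read. It is not what I drew in the round. The class name, the `accept_stale` flag (standing in for the stale-acceptable header), the TTL-based staleness rule, and the 30-second default are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Set, Tuple


class StaleServingCache:
    """Flag cache that prefers a stale value over blocking on a refresh."""

    def __init__(self, fetch: Callable[[str], str], ttl_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical call to the regional cache
        self._ttl_s = ttl_s          # assumed freshness window
        self._clock = clock          # injectable so the logic can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.pending_refresh: Set[str] = set()

    def get(self, key: str, accept_stale: bool = True) -> Tuple[str, bool]:
        """Return (value, is_stale)."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            value, stored_at = entry
            if now - stored_at <= self._ttl_s:
                return value, False
            if accept_stale:
                # Serve the old value now; refresh happens off the read path.
                self.pending_refresh.add(key)
                return value, True
        # No entry, or the caller refused stale data: pay the miss cost.
        value = self._fetch(key)
        self._entries[key] = (value, now)
        self.pending_refresh.discard(key)
        return value, False

    def refresh_pending(self) -> int:
        """Refresh queued keys; a real worker would run this in the background."""
        now = self._clock()
        refreshed = 0
        for key in list(self.pending_refresh):
            self._entries[key] = (self._fetch(key), now)
            self.pending_refresh.discard(key)
            refreshed += 1
        return refreshed
```

The point of the opt-in flag is that a client that cannot tolerate a stale flag still gets a synchronous fetch, and knowingly pays the miss latency for it.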

The artifact the interviewer asked me to draw

With about fifteen minutes left, the interviewer asked: "Draw the read path with the failure modes annotated." Not the topology. The failure modes. Here is what I drew, after some back-and-forth:

Read path with failure annotations
  client SDK
     | (cold TLS = 80ms p99, must use session resume)
     v
  anycast routing (5 ms p50, 15 ms p99; depends on client geo)
     |
     v
  edge POP, edge worker (isolate cold start = 2ms p99)
     | (in-isolate cache hit = 0.2ms; miss = 12ms p99)
     v
  regional cache (only on miss)
     | (miss-storm degrades to stale-serve, alarms at >5% miss for 30s)
     v
  origin read DB (only on regional miss; flag-update path)

The interviewer specifically called out that the alarm threshold ("5% miss for 30 seconds") was the kind of thing they wanted to see in a senior-level answer. "You can have the right system, but if you do not know what to alarm on, you will not catch a regression in the steady-state numbers."
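The alarm rule is simple enough to sketch. The Python below reads "5% miss for 30 seconds" as "miss rate over the trailing 30-second window exceeds 5%", which is one possible interpretation. The class name and the `min_samples` guard (to avoid firing on a handful of requests) are my own additions, not something from the round.

```python
from collections import deque


class MissRateAlarm:
    """Fires when the miss rate over a trailing time window exceeds a threshold."""

    def __init__(self, threshold: float = 0.05, window_s: float = 30.0,
                 min_samples: int = 100):
        self._threshold = threshold
        self._window_s = window_s
        self._min_samples = min_samples   # assumed guard against tiny samples
        self._events = deque()            # (timestamp_s, was_miss), oldest first
        self._misses = 0

    def record(self, timestamp_s: float, was_miss: bool) -> None:
        """Record one cache lookup; timestamps must be non-decreasing."""
        self._events.append((timestamp_s, was_miss))
        if was_miss:
            self._misses += 1
        # Drop lookups that have aged out of the trailing window.
        cutoff = timestamp_s - self._window_s
        while self._events and self._events[0][0] <= cutoff:
            _, old_miss = self._events.popleft()
            if old_miss:
                self._misses -= 1

    def miss_rate(self) -> float:
        return self._misses / len(self._events) if self._events else 0.0

    def firing(self) -> bool:
        enough = len(self._events) >= self._min_samples
        return enough and self.miss_rate() > self._threshold
```

At the 1% steady-state miss rate from the budget this stays quiet, and a storm in the 30-40% range trips it as soon as the window fills with storm traffic.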

The follow-up that closed the round

The last five minutes were a question I did not see coming: "You quoted p99 of 63ms in your warm-path total, which is over budget by 13ms. What do you do?"

I had three options: cut the long pole, relax the budget, or change the contract. I argued for the third. The long pole was the egress-plus-last-mile row at 25ms p99, which is a function of where the client is and what their access network looks like. We control neither. Cutting it would take either more edge presence, which the company has already optimized close to its limit, or a different transport such as QUIC, which the company already supports and which does not change the access-network tail. The honest answer was that the 50ms target is achievable for clients with a reasonable last-mile connection and is not achievable for the long tail. The contract should be "p99 = 50ms for clients within 30ms RTT of the nearest POP" rather than "p99 = 50ms."

The interviewer's response, paraphrased: "That is the answer I was waiting for. Most candidates either pretend they can hit the original number or hand-wave at edge expansion. The right answer at this level is to renegotiate the contract."

The framing the interviewer was actually using

The round was not graded on the topology. The topology was a precondition. The round was graded on whether I would commit to numbers under pressure, whether I would defend the numbers when they were challenged, and whether I would name the implication when a number did not fit (renegotiate the contract, change the SDK behavior, accept the failure mode and alarm on it). The framing the interviewer was using was: at this level, the system you draw is a load-bearing assertion about a number budget, and the candidate's job is to defend the assertion. I had practiced topology. I had not practiced number-budget defense. The next time I sit a senior infra round I will do this practice as a separate drill, with a stopwatch and a partner pushing back on every row.