Interview Experience

Designing a Feed in 45 Minutes at a Mid-Size SaaS

A senior system design round at a mid-size B2B SaaS where the prompt was a generic activity feed but 45 minutes forced me to commit to a fan-out strategy in the first ten minutes.

system-design
system-design-interview
news-feed
scalability
senior-interviews
liamsuzuki

By @liamsuzuki

April 25, 2026

Updated May 20, 2026

Most system design rounds I have sat are 60 minutes. This one was 45, on purpose, and the interviewer told me at the start that the abbreviated length was the test. The company was a mid-size B2B SaaS in NYC, around 800 engineers, the kind of place that does not show up in prep guides because it is not a public-facing brand name. I was interviewing for a senior full-stack role, and the design round was one of four onsite rounds.

I received the offer two weeks later and I declined for unrelated reasons (I took a competing offer at a different mid-size company with better infrastructure tenure). I am writing this round up because the 45-minute format taught me something the 60-minute format had not: which decisions are actually load-bearing and which decisions are decoration.

Why 45 minutes is a different round than 60

The interviewer said it directly at the top: "In 45 minutes, you cannot cover the full surface area. We are looking at how you triage. Which decisions did you spend time on, and which did you skip on purpose?"

That framing is the round. The standard 60-minute round rewards thoroughness. The 45-minute version rewards explicit triage, where you say out loud "I am skipping the auth layer because for this prompt it is not differentiating" and the interviewer either accepts the skip or pulls you back if your skip was wrong.

The prompt, paraphrased: "Design an in-product activity feed for a B2B collaboration tool. Per-tenant. Tens of thousands of tenants, low-thousands of users per tenant on the larger end. Feed events come from many sources inside the product. Reads dominate writes by roughly 100 to 1."

How I prepped specifically for the abbreviated format

The recruiter told me a week before onsite that the design round would be 45 minutes, not the 60 I had been practicing for. That single piece of information changed everything about how I prepped. I ran six 45-minute mocks in five days, with a kitchen timer and a friend playing interviewer. The first three were rough. I kept producing the same 60-minute design, just compressed, which meant I was racing the clock and never quite landing any single component. My rule for the second three was different: pick the one or two load-bearing decisions for the prompt, decide them in the first ten minutes, and accept that everything else gets a one-sentence skip with a defensible reason.

The mocks taught me a second thing I had not learned from 60-minute practice. The round that goes well in 45 minutes is the round where you and the interviewer agree on what the load-bearing call is within the first five minutes. If you are still discovering the load-bearing call at minute fifteen, you have already lost the round, because the back half of the round is supposed to be defending and refining a decision, not arriving at one. The kitchen-timer mocks made this visceral in a way that abstract advice never had.

My triage in the first six minutes

I started by writing a triage list on the whiteboard before drawing anything:

Triage list (minute 4 of 45)
  WILL spend time on:
    - fan-out strategy (push, pull, or hybrid) -- the load-bearing call
    - read-path latency budget (this is the user-visible feature)
    - per-tenant isolation (B2B, noisy neighbor matters)
    - permissioning at read-time (not all events are visible to all users)
  WILL skip on purpose, will name the skip:
    - auth layer (assume an existing tenant-scoped JWT)
    - generic CDN / TLS topology (assume standard)
    - schema migrations (assume mature enough to handle)
    - cost optimization (out of scope at this prompt)

The interviewer wrote that down and said, paraphrased: "Good. Two of those skips I would have called out. The other two are fine." I asked which two and they declined to say, which is fair ("go figure it out from the round"). In retrospect, the item the interviewer might have wanted me to defend sooner was per-tenant isolation, because at this scale the noisy-neighbor case is real. That one was on my spend-time list rather than a skip, and I did get to it, just later. Of the actual skips, the one I think they would have called out is "schema migrations", because feed schemas evolve and a poorly designed event envelope locks you in.

The fan-out decision and why I committed to hybrid in minute eight

The load-bearing call was push-vs-pull. With reads dominating writes 100:1 and tenants ranging from 5 to several thousand users, a pure-pull design is wasteful at read time and a pure-push design has fan-out amplification on the write side that gets expensive for active tenants. I committed to hybrid in minute eight, with the rule:

Fan-out rule (minute 8)
  if tenant.size < 200 users:        push at write time to per-user inboxes
  if tenant.size in [200, 2000):     push only to currently-active users (online in last 24h)
  if tenant.size >= 2000:            do not push; pull on read
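
As a rough sketch of how that rule might look in code (the thresholds are the illustrative whiteboard numbers, and the `tenant.users` shape with `id` and `lastSeenAt` fields is an assumption made for this example, not something from the round):

```javascript
// Illustrative thresholds from the whiteboard rule. Real values would come
// from load measurements, so treat these as placeholders.
const SMALL_TENANT_MAX = 200;
const LARGE_TENANT_MIN = 2000;
const ACTIVE_WINDOW_MS = 24 * 60 * 60 * 1000;

// Returns the user ids whose inboxes should receive a pushed copy of an event.
// `tenant.users` is a hypothetical array of { id, lastSeenAt } records,
// with lastSeenAt in epoch milliseconds.
function pushRecipients(tenant, nowMs) {
  const size = tenant.users.length;
  if (size >= LARGE_TENANT_MIN) {
    // Large tenants: no push at all, the feed is assembled by a pull on read.
    return [];
  }
  if (size < SMALL_TENANT_MAX) {
    // Small tenants: push to every user's inbox at write time.
    return tenant.users.map((u) => u.id);
  }
  // Middle band: push only to users seen within the last 24 hours.
  return tenant.users
    .filter((u) => nowMs - u.lastSeenAt <= ACTIVE_WINDOW_MS)
    .map((u) => u.id);
}
```

An empty result here means "do nothing at write time", which is what makes the large-tenant write path cheap.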

The interviewer pushed on the boundaries ("why 200?") and my answer was that the boundaries were illustrative and the real numbers would come from measuring read latency under load on a sample tenant of each size. They accepted that. They pushed harder on the middle band: "Active in last 24h is a moving target. What happens when a user comes back after 48h?" My answer was a small reconciliation step at session-resume time, where the client asks the server for any events newer than the last cursor it has, and the server runs a pull-style query to fill the gap. The interviewer liked this and pushed once more: "What if the gap is large? User has been gone for two weeks." My answer was that we cap the fill, show a banner, and the user can pull older history on scroll.

The small bit of code that anchored this:

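// `api` is the existing feed client; getEventsSince is assumed to return
// events newer than the cursor, newest first.
async function loadFeedOnSessionResume(userId, lastSeenCursor) {
  const fillCap = 200;
  // Ask for one event beyond the cap so that a gap of exactly fillCap
  // events is not mistaken for a truncated one.
  const fetched = await api.getEventsSince({
    userId,
    cursor: lastSeenCursor,
    limit: fillCap + 1,
  });
  // More than fillCap came back: history was truncated. Show the banner
  // and let the user pull more on demand.
  const hasOlderUnseen = fetched.length > fillCap;
  return { events: fetched.slice(0, fillCap), hasOlderUnseen };
}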
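
The client function above leans on an `api.getEventsSince` call that I did not draw. A minimal in-memory sketch of what the server side of that pull-style fill might do, assuming each event carries a monotonically increasing numeric `seq` that doubles as the cursor (the `seq` field and the `makeGetEventsSince` helper are hypothetical names for this example):

```javascript
// Hypothetical in-memory stand-in for the server side of getEventsSince.
// `eventsByUser` is a Map from user id to an array of { seq, ... } events.
function makeGetEventsSince(eventsByUser) {
  return function getEventsSince({ userId, cursor, limit }) {
    const all = eventsByUser.get(userId) || [];
    return all
      .filter((e) => e.seq > cursor) // only events newer than the cursor
      .sort((a, b) => b.seq - a.seq) // newest first
      .slice(0, limit); // bounded fill
  };
}
```

The stub is synchronous to keep it short. `await` on a plain value still works, so it can stand in for the real client in a local test.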

The read-path latency budget I produced

For the read path I gave a quick budget, in the same shape as the Cloudflare round I had practiced for:

Read-path latency target: p99 = 200 ms (in-product, modal-load)
  TLS resumed                 < 5 ms
  routing + auth verify       < 10 ms
  inbox lookup (push tenants) < 30 ms (indexed read)
  pull query (pull tenants)   < 100 ms (per-tenant scan, bounded)
  permission filter           < 20 ms (bitmap-style, in memory)
  serialization               < 5 ms
  egress                      < 30 ms
  ----------------------------------------
  push tenant total p99       ~100 ms (well under)
  pull tenant total p99       ~170 ms (close to the wire)
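
The two totals are just the per-stage ceilings added up. A quick arithmetic check, using only the numbers from the table (the variable names are mine):

```javascript
const TARGET_P99_MS = 200;

// Stages that both the push path and the pull path go through.
const sharedStagesMs = {
  tlsResumed: 5,
  routingAndAuth: 10,
  permissionFilter: 20,
  serialization: 5,
  egress: 30,
};

// Sums the per-stage ceilings for one path.
function totalMs(stages) {
  return Object.values(stages).reduce((sum, ms) => sum + ms, 0);
}

const pushTotalMs = totalMs({ ...sharedStagesMs, inboxLookup: 30 });
const pullTotalMs = totalMs({ ...sharedStagesMs, pullQuery: 100 });

console.log(`push: ${pushTotalMs} ms, headroom ${TARGET_P99_MS - pushTotalMs} ms`);
// push: 100 ms, headroom 100 ms
console.log(`pull: ${pullTotalMs} ms, headroom ${TARGET_P99_MS - pullTotalMs} ms`);
// pull: 170 ms, headroom 30 ms
```

The pull path has 30 ms of headroom, and the pull query alone accounts for 100 of its 170 ms, so that is the stage to watch.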

The interviewer's question: "Your pull-tenant total is close to the budget. What is the regression mode?" My answer was that the pull query is bounded by the per-tenant event-table size and the permission filter is bounded by the user's group membership, so the regression mode is a single tenant whose event volume grows out of band, not a global scaling problem. The mitigation is per-tenant rate limiting on the write path, with an alarm when a tenant's daily event count crosses a threshold. They wrote that down.
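
The alarm half of that mitigation could be as small as a per-tenant, per-day counter. This is a sketch under assumptions: the `TenantEventCounter` class and the `dayKey` string are hypothetical, the threshold is whatever the team picks, and a real deployment would keep the counts in a shared store, not in process memory.

```javascript
// Counts events per tenant per day and signals once when a tenant's
// daily count first exceeds the threshold.
class TenantEventCounter {
  constructor(threshold) {
    this.threshold = threshold;
    this.counts = new Map(); // key: "tenantId:dayKey" -> count
  }

  // Records one event. Returns true only on the event that pushes the
  // tenant's count for that day past the threshold, so the alarm fires once.
  record(tenantId, dayKey) {
    const key = `${tenantId}:${dayKey}`;
    const next = (this.counts.get(key) || 0) + 1;
    this.counts.set(key, next);
    return next === this.threshold + 1;
  }
}
```

Firing on the crossing itself, instead of on every event above it, keeps a runaway tenant from flooding the alerting channel.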

Permissioning was the round's hidden test

With twelve minutes left, the interviewer asked about permissioning at read time. "Not all events are visible to all users. How do you filter without blowing the budget?" This was the round's hidden test. The naive answer is to resolve each event's ACL with a separate lookup at read time, which works for small tenants and falls apart for the large ones. The better answer is to denormalize the per-event ACL into a compact bitmap stored on the event, so the read-time filter is a single bitwise-and against the user's group bitmap. I drew this:

Permission representation
  event row has: { tenant_id, event_id, payload, visible_to_groups: bitmap }
  user has:      { tenant_id, user_id, member_of_groups: bitmap }
  read filter:   (event.visible_to_groups & user.member_of_groups) != 0
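
A small sketch of that filter, assuming group ids are small non-negative integers mapped to one bit each and using `BigInt` so the bitmap is not limited to 32 bits (the helper names are mine, and the field names mirror the rows above):

```javascript
// Builds a bitmap with one bit set per group id.
function groupsToBitmap(groupIds) {
  let bitmap = 0n;
  for (const id of groupIds) {
    bitmap |= 1n << BigInt(id);
  }
  return bitmap;
}

// An event is visible when it shares at least one group with the user.
function isVisible(event, user) {
  return (event.visible_to_groups & user.member_of_groups) !== 0n;
}

// Applies the tenant check and the bitmap check to a page of events.
function filterFeed(events, user) {
  return events.filter(
    (e) => e.tenant_id === user.tenant_id && isVisible(e, user)
  );
}
```

The parentheses around the bitwise-and matter in JavaScript, where `!==` binds more tightly than `&`.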

The interviewer's pushback was on the size of the bitmap ("what if the tenant has thousands of groups?") and my answer was that for tenants above a threshold, the bitmap moves to a compressed representation (Roaring) and the filter becomes an intersection-non-empty check rather than a bitwise-and. They accepted this.
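
To show the shape of an intersection-non-empty check without depending on any particular Roaring library, here is a stand-in that uses plain sorted arrays of group ids. This is not the Roaring API, only the same question ("do these two sets share any member?") answered with an early exit:

```javascript
// Both inputs are ascending arrays of group ids with no duplicates.
// Returns true as soon as one shared id is found.
function intersects(sortedA, sortedB) {
  let i = 0;
  let j = 0;
  while (i < sortedA.length && j < sortedB.length) {
    if (sortedA[i] === sortedB[j]) {
      return true;
    }
    // Advance whichever side currently holds the smaller id.
    if (sortedA[i] < sortedB[j]) {
      i++;
    } else {
      j++;
    }
  }
  return false;
}
```

The point for the latency budget is that the check stops at the first match and never materializes the full intersection.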

The sentence the interviewer wrote down

With two minutes left I gave a one-sentence summary: "The thing I am most worried about in this design is the boundary between the push tenants and the pull tenants, because the boundary is a moving target as tenants grow, and the wrong move on a tenant straddling the boundary will spike either write cost or read latency."

The interviewer wrote that sentence down in their notes. The recruiter mentioned it later as the moment the round flipped from a hire to a strong hire. Triage early, name the load-bearing call, defend it with numbers, and end with the part of the design that scares you. Forty-five minutes is enough time for that. It is not enough time for anything else.