Community Question Bundle

SRE Incident Drill Questions From Our On-Call

Four incident-shaped questions our on-call rotation uses to interview SREs. Each one starts with a symptom, asks you to write the smallest diagnostic snippet, and the discussion is more important than the code.

Question Bundle · Python · 4 questions

Tags: reliability, on-call, monitoring, interview-prep

By @alexsaeed · March 30, 2026 · Updated May 20, 2026

Q1

Pager fires at 03:14: "p99 latency on /checkout doubled in the last 5 minutes." Walk me through your first 5 minutes. Include the one-liner you would actually run, and the wrong move I am watching for.

On the pager

At 03:14 I'd work through a fixed sequence: confirm the alert in the dashboard, check recent deploys, check dependency health, form a hypothesis, and only then decide on an action. As a snippet, first_five_minutes(alert) returns ['confirm_in_dashboard', 'check_recent_deploys', 'check_dependency_health', 'form_hypothesis', 'decide_action']. The wrong move I'm watching for is running kubectl rollout undo before a hypothesis is formed: if the upstream recovers on its own at the same moment, you cannot tell whether your rollback fixed it or the issue simply cleared.
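One way to make that ordering concrete is the minimal sketch below. Only the function name first_five_minutes and the five step names come from the answer above. The Alert dataclass, its fields, and the can_roll_back guard are hypothetical, added purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Step names taken from the answer above, in the order they should run.
STEPS = [
    "confirm_in_dashboard",
    "check_recent_deploys",
    "check_dependency_health",
    "form_hypothesis",
    "decide_action",
]


@dataclass
class Alert:
    """Hypothetical alert shape; the fields are assumptions for this sketch."""
    route: str
    metric: str
    fired_at: str


def first_five_minutes(alert: Alert) -> list[str]:
    """Return the triage checklist for an alert.

    The sequence is deliberately the same for every alert: the value is in
    following the order, not in tailoring it at 03:14.
    """
    return list(STEPS)


def can_roll_back(completed: list[str]) -> bool:
    """Hypothetical guard: allow a rollback only once every step up to and
    including 'form_hypothesis' has been completed."""
    gate = STEPS.index("form_hypothesis")
    return all(step in completed for step in STEPS[: gate + 1])
```

The guard encodes the point of the question: a rollback is an action, so it belongs under decide_action, after there is a hypothesis it can confirm or refute.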

Q2

Q3

Q4
