Community Article

Code Review: The Checklist I Wish I'd Had Day One

The five questions I ask on every PR, the comments I have learned not to leave, and the review style that keeps the team shipping without sacrificing quality.

Code Review: The Checklist I Wish I'd Had Day One

The five questions I ask on every PR, the comments I have learned not to leave, and the review style that keeps the team shipping without sacrificing quality.

craftsmanship

clean-code

code-organization

collaboration

feedback

By @arjunrivera

April 1, 2026

Updated May 20, 2026

508 views

4.4 (9)

Two years into my first staff job, I left a comment on a junior engineer's PR that I still think about. The change was a 600-line refactor of an authentication module. My comment was, "this naming feels off, please reconsider". The author, who had spent two days on the work, replied politely asking what I would prefer. I did not have a specific suggestion. I just had a vague sense of unease. The PR sat for four more days while we exchanged seven more comments about names that were not actually the problem.

The real issue, as I figured out on the call we eventually had, was that the new module quietly broadened a session expiry from one hour to twelve. None of my comments touched it. I had spent ninety minutes on the cosmetics and missed the actual security regression. That review is the one I keep returning to when I think about how to do this well.

This article is the checklist I wish I had taped to my monitor on day one. Five questions I run every PR through, in order, with the latency-killing pitfalls that tempted me away from each one.

Question 1: Does the change do what the description says?

I start every review by reading the PR description, then opening the diff with that description in mind. About 10% of the time, the diff and the description are subtly different stories. The description says "add retry on 5xx" and the code adds retry on every error. The description says "refactor for readability" and the diff also changes a default timeout from 5 seconds to 30. These mismatches are not bugs in the strict sense but they are the most common source of post-merge surprises on the teams I have worked on.

The comment shape that has worked for me here is concrete: "the description says X but the code also does Y. Is Y intentional? If so, can you call it out in the description so reviewers (and the author of the next PR that builds on this) know it is part of the change?" That phrasing accepts that Y might be deliberate, asks for a paper trail, and does not require the author to defend their judgement.

Question 2: What can break in production that isn't covered by the tests?

This is the question that catches the security regression I missed in the story above. I read the diff with one eye on the happy path and one eye on what happens when the world is wrong: a network call times out, the database returns zero rows, a list is empty, a string is unexpectedly Unicode, a clock is skewed, a user has 30,000 of something instead of 30.

My own bias is to test the happy path well and forget about the failure modes. I assume reviewers see this in my own PRs too. So in code review I deliberately invert it: I trust the happy-path tests are fine, and I spend my reading time on the cases the tests do not cover.

A shortlist of failure modes I check on almost every PR:

What happens if the input is empty? null? a single item?
What happens if the network call succeeds but the response shape is unexpected?
What happens if two requests arrive simultaneously and modify the same record?
What happens if a downstream service returns a 500 in the middle of a multi-step operation? Are we left in a half-applied state?
What is the behavior at the boundary (last day of the month, year rollover, leap day, off-by-one in a paginated response)?

Not every PR needs every check. I pick the two or three that are relevant given the surface area of the change.

Question 3: Will future-me hate this code six months from now?

Most code I review is correct on the day it lands. The question that matters more is whether anyone can change it safely six months later. Three things tend to predict that they cannot:

Implicit coupling. A function that reads from a global, calls a singleton, and depends on a module-level cache is not a function I can test in isolation. The author may not have noticed because they wrote it in one sitting. Six months from now, the next person to touch it will not know which globals matter.

Names that describe the implementation, not the intent. processData() tells me nothing. validateAndPersistInvoice() tells me what the function is for, which means the next developer can decide whether the name still matches once they have changed the body. Names that describe intent age well, names that describe mechanism rot.

No comments at the place where the reader will be confused. The general rule that "code should be self-documenting" is true for the obvious cases. It is wrong for the cases where the code is doing something subtle on purpose: a workaround for a downstream bug, a deliberate non-obvious ordering, a surprising default that exists because of a real customer ticket. Those decisions need a comment that explains why, not what. Otherwise the next person to read it will assume it is wrong and "fix" it.

Question 4: Is the change the right size?

The single largest predictor of review quality, in my experience, is PR size. A 50-line PR gets a careful reading. A 500-line PR gets a skim followed by an LGTM. A 2,000-line PR gets a sigh followed by an LGTM. The math is simple: my review attention is finite, and a big PR rations it across the whole diff.

The practical move in review is not to refuse big PRs (sometimes the work is genuinely atomic), but to push back when a big PR is splittable. Three patterns I propose often:

"The first half of this PR is a pure refactor with no behavior change. Can it land separately so the second half is small enough to read carefully?"
"This adds the new feature and changes how an existing one works. Can the existing one's change land first, get verified in production for a day, then add the new path on top?"
"There are three independent fixes here. Could they be three separate PRs?"

This is the one piece of feedback I have given the most often and seen the most return on. Authors who hear it once or twice usually internalize it. Their PRs land faster, with better reviews, for the rest of the time we work together.

Question 5: What would I do differently here?

This is the question I save for last because it is the one most likely to make the review feel like a rewrite. The trap is that I read a piece of code, notice that I would have structured it differently, and start sketching the structure I would have used. That is feedback about my preferences, not about the PR.

The filter I run before leaving a structural comment: would I block-merge a hypothetical version of this PR that did things my way but with the same bugs? If the answer is no ("my version would have the same problems"), the comment is taste, not quality, and I keep it to myself or share it as a non-blocking note. If the answer is yes ("my version would actually be more testable / less coupled / faster"), the comment is worth leaving, and I phrase it as a suggestion with a concrete example, not a vague "maybe consider".

The phrasing that has saved me the most pain is to label every comment with its severity. I borrowed this from Conventional Comments and it has changed how my reviews land:

Review comment severity legend
  blocking: must be fixed before merge
  question: I am not sure, please clarify
  suggestion: a non-blocking improvement, optional
  nit:        cosmetic, take or leave
  praise:     this is good, calling it out

The nit: and suggestion: labels are the magic. Authors learn to skim them quickly, address what they want to address, and move on. Without the label, every comment looks like a blocker and the PR sits.

Comments I have learned not to leave

A short list of comments I no longer write, with what I write instead.

"This naming feels off." → "Could processData be renamed to something that describes intent? validateAndPersistInvoice would tell me what to expect."
"Why are you doing it this way?" → "What was the alternative you considered? I want to understand the trade-off."
"This is wrong." → "I think this returns the wrong value when items is empty: line 42 hits the early return before the validation runs. Can you add a test for the empty case?"
"LGTM" on a 500-line PR. → If I cannot leave any specific comment, I have not actually reviewed the PR. I either spend more time or I ask someone else to be the second reviewer.

The review I aim for

My bar for my own reviews is now this: every comment is specific, every blocking comment names a concrete production scenario it prevents, and every non-blocking comment is labelled suggestion: or nit:. The reviews that hit that bar take 20-40 minutes for a typical PR. The author can address them in one sitting, often without a back-and-forth. Things that are wrong get caught. Things that are taste get noted but do not stall the merge. Things that are personal preferences I keep off the PR.

The security regression in my opening story would have been caught by question 2 ("what can break that isn't covered by the tests?") in five minutes. I now ask that question first, every time. The cosmetics can wait.

Stop reviewing the code, start reviewing the change

The shift that moved my reviews from picky to useful was reframing what I was reviewing. I was not reviewing the code as if I were grading an exam. I was reviewing a change against a system that already worked yesterday. The right question is not "is this code I would have written?" It is "does this change make the system better than it was?" That second question almost always has a clearer answer, and it cuts most of the drift toward bikeshedding before the comment is even written. Day-one me would have argued. Today-me writes the questions on a Post-it and works through them in order.

Back to Articles