AI Coding Assistants: Where They Help and Where They Hurt

Two years of using AI coding assistants daily, the four tasks where they have made me measurably faster, the three places they have actively cost me time, and the workflow I have settled on.

By @owentanaka

May 17, 2026 · Updated May 18, 2026

I have used AI coding assistants daily for the better part of two years across a few different products. The marketing has been consistent ("10x your productivity"); the actual experience has been more textured. They have made me faster on a specific subset of tasks, slower on a few, and roughly neutral on most of the work that fills my calendar. Knowing which is which has been the single most useful thing I have learned about working with these tools.

This is the writeup I would have wanted before my first month of AI-assisted coding. Four tasks where the assistant has measurably helped me, three where it has actively cost me time, and the workflow I have settled on so I get the wins without the recurring losses. I will hedge on specific products and benchmarks (the field changes weekly); the patterns I am describing have been consistent across the assistants I have used and the codebases I have worked in.

Why "productivity" is the wrong frame

The vendor framing for coding assistants is productivity: keystrokes saved, lines generated, tasks completed in less wall time. The frame I have ended up using instead is more granular. Every coding task I do has three phases: thinking about the problem, writing the code, and verifying that it works. AI assistants compress phase two dramatically and have negligible (or sometimes negative) effect on phases one and three.

For tasks where phase two dominates (boilerplate, well-understood translation, common patterns), the assistant is a clear win. For tasks where phases one and three dominate (debugging, novel design, integrating with an unfamiliar system), the win is much smaller and sometimes negative because the assistant's plausible-looking output adds verification work that did not previously exist.

The useful question is not "is the assistant making me faster?" The useful question is "on this specific task, is phase two the bottleneck?" If yes, the assistant is probably going to help. If no, it is probably going to cost more in verification than it saves in typing.

Four tasks where it has measurably helped me

Boilerplate I already know how to write

The canonical case. I need a TypeScript file that exports a Zod schema, a TypeScript type derived from it, and a CRUD service object with five methods. I know exactly how I want it to look. The assistant produces 95% of it on the first try, and I edit the 5%.

This is my single biggest time saver. In a typical week of feature work, I save what feels like half a day of typing. The boilerplate was never the interesting part of the work; offloading it has been a clean win.

The key phrase in "boilerplate I already know how to write" is "I already know." When the assistant produces a slightly wrong version, I notice instantly because it differs from the version I had in my head. I have a clear opinion about what the boilerplate should look like; the assistant is a typing accelerator, not a decision-maker.
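To make "boilerplate I already know how to write" concrete, here is a minimal sketch of that kind of file. Everything in it is an illustration rather than the author's code: the `Note` entity, its fields, and the in-memory store are hypothetical, and to keep the block dependency-free the validation is hand-written where the author's version uses a Zod schema (with Zod, a `z.object({ ... })` schema plus `z.infer<typeof schema>` would replace `NoteInput` and `parseNoteInput`).

```typescript
// Hypothetical entity for illustration. In a Zod codebase this type would be
// derived with z.infer<typeof noteInputSchema> instead of written by hand.
interface NoteInput {
  title: string;
  body: string;
}

interface Note extends NoteInput {
  id: number;
}

// Hand-rolled stand-in for schema.parse(): returns typed input or throws.
function parseNoteInput(value: unknown): NoteInput {
  if (typeof value !== "object" || value === null) {
    throw new Error("expected an object");
  }
  const { title, body } = value as Record<string, unknown>;
  if (typeof title !== "string" || title.length === 0) {
    throw new Error("title must be a non-empty string");
  }
  if (typeof body !== "string") {
    throw new Error("body must be a string");
  }
  return { title, body };
}

// The five-method CRUD service object, backed by an in-memory Map.
function createNoteService() {
  const notes = new Map<number, Note>();
  let nextId = 1;

  return {
    create(input: unknown): Note {
      const note: Note = { id: nextId++, ...parseNoteInput(input) };
      notes.set(note.id, note);
      return note;
    },
    get(id: number): Note | undefined {
      return notes.get(id);
    },
    list(): Note[] {
      return Array.from(notes.values());
    },
    update(id: number, input: unknown): Note | undefined {
      if (!notes.has(id)) return undefined;
      const note: Note = { id, ...parseNoteInput(input) };
      notes.set(id, note);
      return note;
    },
    remove(id: number): boolean {
      return notes.delete(id);
    },
  };
}
```

None of it is interesting, which is the point: every line is predictable enough that a wrong one stands out to someone who already knows what the file should look like.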

Tests for code I just wrote

I write the function. I ask the assistant to write a test file that covers the happy path, the error path, and the edge cases I list. It produces a Jest file with a dozen reasonable test cases. I review them, edit two or three, delete one that is testing the wrong thing, and add the one it missed.

The assistant is decent at the rote work of test scaffolding (mock setup, describe blocks, the arrange/act/assert boilerplate) and merely competent at thinking of edge cases. The combined output is faster than writing the file from scratch, and reviewing the cases is itself a useful exercise (sometimes the assistant flags an edge case I had not considered).

The rule that has worked: I write the assertion list; the assistant writes the test file. "Generate tests for this function" without guidance produces a watered-down test file. "Generate tests covering: empty input, single item, three items, items with the same key (should preserve order), input with non-ASCII characters" produces something close to what I would have written.
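A sketch of that handoff, under stated assumptions: `groupByKey` is a hypothetical stand-in for "the function I just wrote," and the assertion list from the prompt above becomes a table of cases. To keep the block runnable without Jest, the cases are checked with a plain loop; in a Jest file each row would become an `it(...)` block using `expect(...).toEqual(...)`.

```typescript
interface Item {
  key: string;
  value: number;
}

// Hypothetical function under test: groups values by key, keeping the
// original relative order of items within each group.
function groupByKey(items: Item[]): Map<string, number[]> {
  const groups = new Map<string, number[]>();
  for (const item of items) {
    const group = groups.get(item.key);
    if (group) {
      group.push(item.value);
    } else {
      groups.set(item.key, [item.value]);
    }
  }
  return groups;
}

// The assertion list I write; the assistant's job is the scaffolding around it.
const cases: { name: string; input: Item[]; expected: [string, number[]][] }[] = [
  { name: "empty input", input: [], expected: [] },
  { name: "single item", input: [{ key: "a", value: 1 }], expected: [["a", [1]]] },
  {
    name: "three items",
    input: [
      { key: "a", value: 1 },
      { key: "b", value: 2 },
      { key: "c", value: 3 },
    ],
    expected: [["a", [1]], ["b", [2]], ["c", [3]]],
  },
  {
    name: "same key preserves order",
    input: [
      { key: "a", value: 3 },
      { key: "a", value: 1 },
      { key: "a", value: 2 },
    ],
    expected: [["a", [3, 1, 2]]],
  },
  {
    name: "non-ASCII keys",
    input: [
      { key: "café", value: 1 },
      { key: "日本", value: 2 },
      { key: "café", value: 3 },
    ],
    expected: [["café", [1, 3]], ["日本", [2]]],
  },
];

// Runs every case and collects a message for each mismatch.
function runCases(): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = JSON.stringify(Array.from(groupByKey(c.input).entries()));
    const expected = JSON.stringify(c.expected);
    if (actual !== expected) {
      failures.push(`${c.name}: expected ${expected}, got ${actual}`);
    }
  }
  return failures;
}

console.log(runCases()); // [] when every case passes
```

Keeping the list as data makes review faster: a missing or wrong case is a missing or wrong row.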

Language and framework translation

I know how to do something in Python and need it in TypeScript. The structure is the same; the syntax is what I do not have at my fingertips. The assistant translates with high fidelity. The wins compound across small tasks (regex syntax, list comprehension shape, async patterns).

This is where the assistant is most clearly behaving as a sophisticated lookup tool, not a pattern-generator. It "knows" the translation because the equivalent code exists across the training data many times. The output is reliable for exactly that reason.
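A few of the small Python-to-TypeScript mappings meant here, as a sketch. The Python originals are in comments and the function names are hypothetical. Each translation is the kind that is quick to verify by running it.

```typescript
// Python: [x * 2 for x in xs if x > 0]
function positiveDoubled(xs: number[]): number[] {
  return xs.filter((x) => x > 0).map((x) => x * 2);
}

// Python: re.findall(r"\d+", s)
// String.prototype.match with the g flag returns every match, or null if none.
function findNumbers(s: string): string[] {
  return s.match(/\d+/g) ?? [];
}

// Python: re.sub(r"\s+", " ", s)
// re.sub replaces every match by default; String.prototype.replace needs the g flag.
function collapseWhitespace(s: string): string {
  return s.replace(/\s+/g, " ");
}
```

The regex cases are where a translation most often slips quietly: dropping the g flag turns "every match" into "first match" without any error, and a single run with a sample input catches it.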

The limitation: the assistant can be confidently wrong about API details and idioms that have changed since its training data was collected. It will happily generate fs.readFile with the callback signature where modern code uses the promise-based form from fs/promises. The verification step (run the code, see what fails) is short, but it is necessary; the model's confidence does not correlate with version-correctness.

Explaining code I have not seen before

I open a 600-line file in a new codebase and ask the assistant for a paragraph explaining what it does. The summary is a useful starting point: it tells me the rough shape, names the dependencies, and points out the entry function. I still read the file critically, but the summary saves me about five minutes of orienting.

The assistant is good at this because the task is essentially a sophisticated paraphrase: take this code, describe its structure in prose. The cases where it goes wrong are usually subtle (it summarizes what the code looks like it does, not what it actually does, when the code has a non-obvious side effect or off-by-one bug). The summary is a starting point, not a substitute for actually reading.
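Two hypothetical functions illustrate the gap between what code looks like it does and what it actually does. A one-line summary of either would be accurate for typical inputs and wrong at the edges: `median` reorders its caller's array because `Array.prototype.sort` sorts in place, and `lastN` returns every item for `n = 0` because `slice(-0)` is the same as `slice(0)`.

```typescript
// Looks like: "returns the median of xs".
// Also does: reorders the caller's array, because sort() mutates in place.
function median(xs: number[]): number {
  const sorted = xs.sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Looks like: "returns the last n items".
// Actually: for n = 0, slice(-0) is slice(0), so it returns every item.
function lastN<T>(items: T[], n: number): T[] {
  return items.slice(-n);
}

const data = [3, 1, 2];
console.log(median(data)); // 2
console.log(data); // data is now [1, 2, 3]: the input array was reordered
console.log(lastN(["a", "b", "c"], 0)); // all three items, not an empty array
```

Both fixes are small (`xs.slice().sort(...)` and an explicit guard for `n <= 0`), but neither problem shows up in a summary of the function's apparent intent.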

Three places it has actively cost me time

Debugging an unfamiliar bug

The assistant's instinct on debugging is to suggest changes that look reasonable. Sometimes one of them is right. More often, the suggestion is plausible but does not address the actual cause. Following an incorrect suggestion costs me real time: I edit the code, run the tests, watch them still fail, undo, and try the next suggestion. By the time I give up and read the code carefully myself, I have spent 30 minutes on suggestions, and the actual bug takes 5 minutes to find once I focus.

The failure mode here is that the assistant generates suggestions optimized for plausibility, not for diagnosis. Diagnosis is the slow, careful step the assistant cannot do for me, and asking the assistant to do it is a way of not doing it.

The rule I have settled on: when I am debugging, I do not invoke the assistant for fixes. I sometimes use it for explanations of unfamiliar code I encounter while reading. But fixes come from understanding the bug, and understanding is the part the assistant cannot accelerate.

Novel design decisions

I need to choose between two architectural approaches for a new feature. Both have trade-offs. The assistant, asked which is better, produces a confident-sounding analysis that often misses the local context (the team's skill set, the existing codebase's conventions, the specific failure modes the team has seen before). The recommendation is fluent and superficial.

The failure mode: the assistant has read a lot about software design but does not know the specific situation. Its output reads like a senior engineer's recommendation but is generic. Following it without skepticism produces decisions that look reasonable in isolation and do not fit the codebase.

The pattern that has worked: I describe both options to the assistant and ask it to list the trade-offs of each. That output is useful as a checklist (it surfaces concerns I had not enumerated). Then I make the decision myself, weighing the trade-offs against the local context. The assistant is a brainstorming partner, not a decision-maker.

Security-critical code

Authentication, authorization, cryptographic operations: anything with a security-critical correctness criterion. The assistant produces code that compiles and looks reasonable; the failure mode is a subtle vulnerability (a missing constant-time comparison, inadequate input validation, a weak hash, a confused-deputy permission check) that takes expert review to spot.

I have caught these in my own work and in PRs I have reviewed. The frequency is low (the assistant usually gets these right), but the cost of one wrong instance is high. "Cost-weighted accuracy" is the framing: a 95% correctness rate on security code is unacceptable when the remaining 5% can include a serious vulnerability.

The rule: I do not accept assistant-generated security code without explicit, careful review against an authoritative source (the OWASP cheat sheet, the framework docs, a known-good reference implementation). The assistant accelerates the typing; the review is the same amount of work it always was.
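To make "a missing constant-time comparison" concrete, the hypothetical token check below shows the shape a reviewer should flag and the shape to expect instead. This is a sketch, not a vetted primitive: JavaScript engines make no timing guarantees, so in a Node service the reviewed choice is `crypto.timingSafeEqual` from `node:crypto`, which compares two equal-length buffers and throws if their lengths differ.

```typescript
// Flag this in review: === can return as soon as the first character differs,
// so response time can leak how much of a guessed token was correct.
function tokensMatchUnsafe(provided: string, expected: string): boolean {
  return provided === expected;
}

// The shape to expect instead: examine every character and fold the differences
// together, with no early exit inside the loop. Assumes fixed-length ASCII tokens
// (e.g. hex), so rejecting a length mismatch up front leaks nothing secret.
function tokensMatch(provided: string, expected: string): boolean {
  if (provided.length !== expected.length) return false;
  let diff = 0;
  for (let i = 0; i < expected.length; i++) {
    diff |= provided.charCodeAt(i) ^ expected.charCodeAt(i);
  }
  return diff === 0;
}
```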

A side-by-side that lives in my notes

Where AI coding assistants help me, where they hurt, and where they are neutral
  Task                                Net effect    Why
  Boilerplate I know cold             Strong help   Phase 2 dominant; I notice errors fast
  Test scaffolding                    Help          I write the assertion list, assistant fills in
  Language/framework translation      Help          Mostly lookup; easy to verify by running it
  Explaining unfamiliar code          Help          Useful starting point for orientation
  ----
  Debugging unfamiliar bugs           Hurt          Plausible suggestions distract from diagnosis
  Novel architectural decisions       Hurt          Fluent but generic; lacks local context
  Security-critical code              Hurt          Failure modes are high-cost, low-rate
  ----
  Refactoring                         Mixed         Good for mechanical renames; weak on intent
  Code review                         Mixed         Catches obvious issues; misses subtle ones
  Reading large unfamiliar codebases  Slight help   Useful index; not a substitute for reading

The workflow I have settled on

After two years of trial and error, the workflow that gets me the wins without the recurring losses:

1. I plan in my head, not in the assistant. Before I open the chat, I have a clear idea of what I am building. The assistant is invoked once I know the shape, to accelerate the typing. Inverting this (asking the assistant what to build before I have thought about it) leads to the novel-design failure mode.

2. I review every diff line by line. The assistant produces 200 lines; I read 200 lines. The temptation to skim a generated file because it looks reasonable is exactly the temptation that ships bugs. Reading is slower than generating, and the read time is the actual time cost; the time savings come from not having to type, not from skipping the review.

3. I do not use the assistant for diagnosis. Bugs come from understanding. Asking the assistant to suggest fixes is a way of avoiding the understanding. I read the code, I form a hypothesis, I verify the hypothesis, I write the fix. The assistant only re-enters when I know the fix and want it typed faster.

4. I keep a list of "do not autocomplete" zones. Authentication code, database migrations, anything with a security-critical correctness criterion. In those zones, I write the code by hand and use the assistant only for documentation or explanation. The discipline is mechanical: a list, applied without exception.

5. I treat assistant suggestions as a starting point, not an answer. Every suggestion is a hypothesis I either accept (with edits) or reject. The assistant is a tool, not a colleague; it does not have skin in the game and its confidence is not calibrated to correctness.
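One way to keep rule 4 mechanical is to store the zone list as data and check paths against it, for example in a pre-commit reminder or a review-checklist script. The sketch below is hypothetical: the zone prefixes and the `inNoAutocompleteZone` and `partitionChangedFiles` helpers are illustrations, not a feature of any particular assistant.

```typescript
// Hypothetical "do not autocomplete" zones, kept as plain path prefixes.
const NO_AUTOCOMPLETE_ZONES: string[] = [
  "src/auth/",
  "src/crypto/",
  "db/migrations/",
];

// Normalizes Windows separators, then checks the path against every zone prefix.
function inNoAutocompleteZone(path: string): boolean {
  const normalized = path.replace(/\\/g, "/");
  return NO_AUTOCOMPLETE_ZONES.some((zone) => normalized.startsWith(zone));
}

// Splits a change set into files that must be hand-written and files that may be assisted.
function partitionChangedFiles(paths: string[]): { handWritten: string[]; assisted: string[] } {
  const handWritten = paths.filter((p) => inNoAutocompleteZone(p));
  const assisted = paths.filter((p) => !inNoAutocompleteZone(p));
  return { handWritten, assisted };
}
```

The value is less in the code than in the list being written down: "applied without exception" is easier when the exceptions are not a judgment call made mid-task.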

Phase two is what they actually compress

The single sentence I would put on the team wiki: AI coding assistants compress phase two of every coding task (the typing) and have negligible or negative effect on phases one (thinking) and three (verifying). Every workflow rule I have settled on flows from that. Use them on tasks where phase two dominates: boilerplate, tests, translations, explanations of unfamiliar code. Do not use them on tasks where phases one and three dominate: diagnosis, novel design, security review. The discipline of knowing which is which, applied across a workday, is what turns these tools from a 5% productivity gain dressed up as 50% into a real, durable improvement on the part of the job that was never the interesting part anyway. The cost of using them well is the discipline. The cost of using them poorly is the verification work you did not previously need. Choose deliberately.
