I once spent two days debugging a checkout flow that double-charged a small percentage of customers. The code looked fine. The tests passed. The bug only happened when two threads of the same user clicked the buy button within milliseconds of each other. The fix was a one-line change to the transaction isolation level. The lesson, which I still carry, is that you cannot reason about correctness of concurrent code without understanding what isolation level your transactions actually run at, what anomalies that level allows, and how the database's locking and MVCC machinery enforces it.
This article walks through the four standard isolation levels (read uncommitted, read committed, repeatable read, serializable) with a specific anomaly each one prevents and a runnable Postgres example showing the failure. The opinion this article defends: read committed (the default in most databases) is wrong for state-changing flows where the same row is read and then conditionally updated. For those flows, you need either repeatable read with explicit retry on serialization failure, or serializable, or row-level locks. "Read committed plus hope" is how you get production bugs that look like the one I opened with.
The setup
Every example below uses two psql sessions, labelled T1 and T2. Both connect to the same Postgres instance. Both work on a tiny accounts table:
I run each example on a fresh table by truncating and reinserting between trials. Postgres defaults to read committed; for the higher levels I set the isolation level explicitly with BEGIN TRANSACTION ISOLATION LEVEL ....
Read uncommitted: dirty reads (and why Postgres does not have them)
The lowest standard isolation level is read uncommitted, which permits dirty reads. A dirty read is when one transaction reads data that another transaction has written but not yet committed.
The textbook example:
T2 read a value that disappeared. If T2 acted on it (sent a notification, kicked off a workflow), it acted on a phantom write. This is the canonical "do not let me see writes that did not happen" anomaly.
Postgres specifically does not implement read uncommitted as a different level. SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED is silently treated as read committed. The reason is Postgres's MVCC implementation: every transaction sees a consistent snapshot of committed data, and the cost of letting it see uncommitted data would be higher than the cost of just running at read committed. So in Postgres specifically, dirty reads cannot happen. In MySQL with InnoDB they can, if you ask for them. SQL Server allows them via WITH (NOLOCK), which is a hint many people overuse without understanding what it costs.
If you need to actually demonstrate a dirty read, use MySQL/InnoDB and switch session 1 to read uncommitted. Either way, the anomaly is real, and almost no one wants it.
Watching the dirty read in MySQL/InnoDB
Postgres will not let you observe a dirty read, but MySQL with InnoDB will. The demo is worth running once because the failure mode is more disorienting than the textbook description sounds.
Open two mysql clients against an InnoDB database with the same table:
Run the two sessions in lockstep. Session 2 must explicitly request read uncommitted; InnoDB's default is repeatable read.
The first time I ran this against an InnoDB instance I had used for years, my reaction was "wait, MySQL really does that". It does, if you ask for it. Any code path at read uncommitted in MySQL can act on writes that vanish on rollback: notification jobs, audit log inserts, downstream queue messages, all of them potentially firing on data the originating transaction undid.
The reason WITH (NOLOCK) in SQL Server is so popular and so dangerous is the same: it is a per-query escape hatch that drops the reader to read uncommitted for that one statement. People reach for it when a query is slow and a coworker says locks are the cause. The query gets faster. The correctness drops. I have audited codebases where every reporting query had WITH (NOLOCK) sprinkled in, and the team was surprised when totals came out wrong on a busy day.
Read committed: non-repeatable reads
Read committed is the default in Postgres and MySQL. Each statement sees a snapshot of the data that was committed before the statement started. Different statements within the same transaction can see different states, because the second statement sees commits that landed between the first and the second.
The anomaly this allows is the non-repeatable read: the same SELECT returns different values within one transaction.
Two reads, two different answers, both in the same transaction. This is non-deterministic from T1's point of view: any logic that reads a value, does some computation, and then reads it again expecting consistency is broken at this level.
The bug I opened the article with was a read-committed bug. The flow was approximately:
Two concurrent runs of this flow, with the same user clicking buy twice fast, both read balance = 100, both write the charge, both update the balance to 50. The user is charged twice but only debited once on the balance side, depending on the exact ordering. Read committed allowed both transactions to see the pre-decrement state.
Repeatable read: phantom reads (in some databases)
Repeatable read fixes the non-repeatable read. Within a transaction at repeatable read, every statement sees the same snapshot, taken at the start of the transaction. The same SELECT returns the same answer every time.
Repeatable read still permits one anomaly in the SQL standard: the phantom read. A range query like SELECT * FROM accounts WHERE balance > 50 could return a different number of rows on a second call because another transaction inserted a new row matching the predicate. In standard repeatable read, individual rows are stable but the set of rows is not.
Postgres's repeatable read implementation is stronger than the standard: it uses a snapshot per transaction, which means even range queries are consistent. So in Postgres specifically, phantom reads do not happen at repeatable read. In MySQL/InnoDB, repeatable read is closer to the standard, and phantoms can happen unless you use locking reads (SELECT ... FOR UPDATE).
What repeatable read still does not prevent, even in Postgres, is the lost update anomaly. Two transactions read the same row, both compute a new value based on what they read, and both try to write. With Postgres's snapshot isolation at repeatable read, the second transaction's UPDATE will fail with a serialization error if and only if the same row was modified by the first transaction's commit. That is the "could not serialize access due to concurrent update" error you have probably seen.
The application must be ready to handle this error and retry. Code that runs at repeatable read without a retry loop is not actually safe; it just looks safe until the first concurrent collision.
Serializable: full ordering
Serializable is the strongest level. The promise is that any concurrent execution produces the same result as some serial (one-at-a-time) execution. No anomalies. Period.
Postgres implements serializable using SSI (Serializable Snapshot Isolation), which adds predicate-conflict detection on top of repeatable read. The cost is more transactions get aborted with serialization failures, and your application must retry them.
The classic example of an anomaly that repeatable read allows but serializable does not is the write skew problem.
Each transaction read a stable snapshot, made a decision based on that snapshot, and updated a different row. No row was concurrently modified, so repeatable read sees no conflict. But the invariant the read was checking ("at least one doctor on call") was violated by the combined effect.
Serializable in Postgres detects this: SSI tracks the predicate each transaction read, and if two transactions' reads and writes form a dependency cycle that would not exist in any serial execution, one of them is aborted with a serialization failure.
The application sees a serialization failure and retries. On retry, T2 reads count = 1 (Alice is no longer on call), and the application logic refuses Bob's update. Invariant preserved.
How the same level differs across Postgres, MySQL, and SQL Server
The SQL standard names the levels but does not pin down the implementation, so the same ISOLATION LEVEL REPEATABLE READ clause does meaningfully different things depending on which engine reads it. The differences I have run into often enough to remember:
Postgres. Repeatable read is snapshot isolation: the transaction sees a consistent snapshot taken at the first statement. Phantoms cannot happen. Lost updates raise a serialization error. Serializable adds SSI on top, with predicate-conflict detection. There is no native read-uncommitted level.
MySQL/InnoDB. Repeatable read is the default and is closer to the SQL standard. Phantoms can occur on range queries unless you add SELECT ... FOR UPDATE or LOCK IN SHARE MODE, which take next-key locks (a row lock plus the gap before it). Serializable promotes every plain SELECT to LOCK IN SHARE MODE, which is more lock-heavy than Postgres's SSI and tends to cause deadlocks under contention.
SQL Server. Repeatable read uses shared locks held until commit, which prevents non-repeatable reads but blocks writers aggressively. SQL Server also has a separate SNAPSHOT isolation level (opt-in via ALLOW_SNAPSHOT_ISOLATION) that is closer to Postgres's repeatable read. Serializable holds range locks and is even more blocking.
The lesson is that "set the isolation level to repeatable read" is not a portable instruction. If your team supports both Postgres and MySQL, the same clause produces different anomalies in production. The fix is usually explicit row locks (SELECT ... FOR UPDATE) plus an idempotency key, which gives you a correctness floor that does not depend on the engine's interpretation of the level.
SELECT FOR UPDATE: the explicit-lock alternative
An alternative to bumping the isolation level is to take an explicit row-level lock with SELECT ... FOR UPDATE. This locks the row at read time and prevents any other transaction from updating it until the current one commits or rolls back.
The trade-off compared to serializable: locks force concurrent transactions to wait, while serializable lets them proceed and then aborts on conflict. Under low contention, serializable is faster (no waiting). Under high contention on the same rows, locks are faster (no abort-and-retry overhead). I have used both; the right choice depends on your contention profile.
One pragmatic rule: if your read-then-update flow always touches the same row by primary key, SELECT ... FOR UPDATE is simpler. If it touches different rows but checks an invariant across them (the doctors example), serializable is the right tool because the lock-the-row approach does not catch cross-row invariants.
The optimistic-locking pattern with a version column
A third option, separate from raising the isolation level and from SELECT ... FOR UPDATE, is optimistic locking with a version column. The pattern is old, well-tested, and sits outside the engine's isolation machinery entirely. It works at any isolation level, including read committed, which is why I reach for it when I cannot or do not want to bump the level.
The schema gets one extra column:
Reads include the version. Writes use it as a guard, and bump it:
If another transaction has bumped the version since we read it, the WHERE version = 7 clause matches zero rows, the UPDATE does nothing, and the application sees a row count of zero. That is the signal to retry: re-read, recompute, re-write.
The application loop:
The advantage over SELECT ... FOR UPDATE is that no lock is held between read and write, so other readers and unrelated writers are not blocked. The disadvantage is that under high contention on the same row you spend more time retrying than working. I default to optimistic locking for low-contention rows (per-user balances, per-document edits) and SELECT ... FOR UPDATE for hot rows (a global counter, a single inventory row).
Quick reference: what each level prevents
| Level | Dirty read | Non-repeatable read | Phantom read | Lost update | Write skew |
|---|---|---|---|---|---|
| Read uncommitted | allowed | allowed | allowed | allowed | allowed |
| Read committed | prevented | allowed | allowed | allowed | allowed |
| Repeatable read (Postgres) | prevented | prevented | prevented | prevented | allowed |
| Serializable | prevented | prevented | prevented | prevented | prevented |
The Postgres-specific note: "prevented" at repeatable read for phantom and lost update is stronger than the SQL standard guarantees. Other databases at "repeatable read" may still permit those.
What I run at, in production
My defaults across the last few systems:
- Read committed for read-only / report queries. Snapshots are not needed; the latest committed data is fine, and you avoid the abort-and-retry overhead.
- Repeatable read with explicit retry for read-modify-write flows on a single row. Plus
SELECT ... FOR UPDATEif the contention is high. Plus an idempotency key on the request, so retries from above (HTTP retries, queue redeliveries) do not double-charge even if the database somehow lets a transaction repeat. - Serializable for cross-row invariants. When the correctness of the operation depends on a count or sum across multiple rows, serializable is the only level that can guarantee it without explicit table-level locking. The retry overhead is real but bounded if contention is low.
Choosing the right level is not a one-time decision. I revisit it whenever a new flow is added that reads-then-writes, or whenever a postmortem reveals a concurrency bug.
Why this matters more than people admit
Most production database bugs I have shipped were not algorithm bugs. They were isolation-level bugs. They look like "my code is correct, but two of these run at the same time and the result is wrong". The root cause is always the same: the developer assumed an isolation level that the database is not actually providing.
I have come to believe that any developer writing code that reads-then-writes the same row should know which isolation level their transaction runs at, what anomalies that level allows, and what the application does when a serialization failure occurs. Not because they will need it every day, but because when they do need it, the absence of that knowledge is the difference between a one-line fix and a two-day debugging session.
A retry helper that has held up
Every time I bump a flow to repeatable read or serializable, I wrap the call in a retry helper. The shape that has worked across a few codebases:
Three details worth calling out. The exponential backoff plus jitter prevents two retrying transactions from colliding on every retry. The 40001 code is the Postgres SQL state for serialization failure; 40P01 is deadlock. Both are retryable. And the retry function takes a callback that runs inside the transaction, so the user code does not have to remember to begin/commit; it just expresses the work to do, and the helper handles the boilerplate plus the retry semantics.
What to take from the failing examples
If the only thing you take from this article is the habit of running concurrent flows in two psql windows before you ship them, I will count it a win. The two-window setup is the cheapest test you can run for concurrency correctness, and it catches more bugs than any unit test ever will. Pick a flow in your codebase that reads then updates a row. Open two sessions. Run the flow in both, interleaving the steps. If the result is wrong at your current isolation level, raise the level or add a lock. The bug will not survive the change. Production will be quieter for it.
