The first time I designed an idempotent API, I got the easy part right and the hard part wrong. The easy part is the key: a string the client sends with each request, and if you see it twice, you return the cached response from the first call. The hard part is everything around it. How long do you store the key? What if the original request is still running when the retry arrives? What if the client sends the same key with a different body? What about partial failures where the database write succeeded but the response was lost?
This article is the version of "how to do idempotency keys right" I wish I had read before I shipped my first version. The Stripe blog post on this pattern is widely cited, and rightly so; it remains the canonical reference. My stance here is that the key itself is the trivial part of the design, and most production failures with idempotency are storage and lifecycle bugs, not key-collision bugs.
The narrow problem idempotency keys solve
A client makes a POST /charges request. The network drops the response. The client retries. Without idempotency, the server processes both, charges the customer twice, and the customer files a chargeback. With idempotency, the second request is recognized as a duplicate, the cached response is returned, and the customer is charged once.
That is the entire goal: turn a non-idempotent operation into one safe to retry. "Safe to retry" is a strong property and you should not give it away cheaply.
A minimal implementation:
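A sketch of what such a naive handler can look like, in Python. The names `store`, `charge_customer`, and `handle_charge` are hypothetical, and the in-memory dict stands in for whatever cache the service actually uses:

```python
store = {}  # hypothetical in-memory cache: idempotency key -> response

def charge_customer(body):  # stand-in for the real, non-idempotent charge
    return {"status": 201, "charged": body["amount"]}

def handle_charge(idempotency_key, body):
    cached = store.get(idempotency_key)
    if cached is not None:
        return cached  # key seen before: replay the first response
    response = charge_customer(body)
    store[idempotency_key] = response  # remember it for future retries
    return response
```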
Twelve lines. It looks correct. It is wrong in five ways, which I will walk through.
Bug 1: the request body is not part of the key
What happens when the client sends Idempotency-Key: abc-123 with one body, and then sends abc-123 with a different body? The naive implementation returns the cached response from the first call. From the client's perspective, the second request looks like it succeeded; from the database's perspective, the second request was never processed.
The fix is to record a fingerprint of the request body alongside the key, and reject the second request if the fingerprint does not match. The convention is to return 409 Conflict or 422 Unprocessable Entity.
This protects you from client bugs (the client mutated the request between retries) and from accidental collisions when two different operations happen to use the same key.
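A minimal sketch of the fingerprint check, assuming JSON request bodies. Hashing a canonical serialization with SHA-256 is one reasonable choice, not the only one; `store`, `charges`, `charge_customer`, and `handle_charge` are hypothetical names, and 422 is picked here from the two status codes mentioned above:

```python
import hashlib
import json

store = {}    # hypothetical: idempotency key -> (fingerprint, response)
charges = []  # records every real charge, to make double-processing visible

def fingerprint(body: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal bodies hash equally.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def charge_customer(body: dict) -> dict:
    # Stand-in for the real, non-idempotent charge.
    charges.append(body)
    return {"status": 201, "charged": body["amount"]}

def handle_charge(key: str, body: dict) -> dict:
    fp = fingerprint(body)
    record = store.get(key)
    if record is not None:
        saved_fp, saved_response = record
        if saved_fp != fp:
            # Same key, different body: reject instead of replaying.
            return {"status": 422, "error": "idempotency key reused with a different body"}
        return saved_response
    response = charge_customer(body)
    store[key] = (fp, response)
    return response
```

Sorting the keys before hashing means a retry that serializes the same fields in a different order still matches. This sketch still has the concurrency problem described next.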
Bug 2: concurrent retries
Two requests with the same key arrive at the same time. Both check the store, both find no cached response, both proceed to charge the customer. You have just created the bug idempotency was supposed to prevent.
The fix is to make the "check and lock" atomic. The pattern I have used most:
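One way to sketch the claim step in Python, using a lock-protected dict as a stand-in for the real store. The names `claim`, `complete`, and `handle_charge`, and the choice of 409 for an in-flight duplicate, are assumptions for illustration:

```python
import threading

_lock = threading.Lock()
_store = {}  # hypothetical: key -> {"status": "pending" | "completed", "response": ...}

def claim(key):
    """Atomically insert a pending placeholder.

    Returns (True, None) if this caller now owns the key, or
    (False, snapshot_of_existing_record) if someone else got there first.
    """
    with _lock:
        record = _store.get(key)
        if record is None:
            _store[key] = {"status": "pending", "response": None}
            return True, None
        return False, dict(record)

def complete(key, response):
    # Flip the placeholder to completed and save the response in one step.
    with _lock:
        _store[key] = {"status": "completed", "response": response}

def handle_charge(key, body, do_charge):
    owned, existing = claim(key)
    if not owned:
        if existing["status"] == "pending":
            return {"status": 409, "error": "a request with this key is still in progress"}
        return existing["response"]  # finished earlier: replay the saved response
    response = do_charge(body)
    complete(key, response)
    return response
```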
The placeholder is the lock. If the placeholder insert succeeds, this request owns the key and proceeds. If it fails, another request is either still working on this key or has already finished. The status field on the row is what tells you which.
A SQL implementation looks something like this:
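As a sketch, here is one way to express it with SQLite through Python's `sqlite3` module. The column names and the helpers `try_claim` and `mark_completed` are assumptions; only the table name `idempotency_keys` and the `created_at` column come from the cleanup query later in this article:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE idempotency_keys (
        idempotency_key TEXT PRIMARY KEY,
        fingerprint     TEXT NOT NULL,
        status          TEXT NOT NULL CHECK (status IN ('pending', 'completed')),
        response        TEXT,
        created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")

def try_claim(key: str, fingerprint: str) -> bool:
    """Insert the pending placeholder; True means this request owns the key."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO idempotency_keys (idempotency_key, fingerprint, status) "
                "VALUES (?, ?, 'pending')",
                (key, fingerprint),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary key violation: another request already inserted this key.
        return False

def mark_completed(key: str, response_json: str) -> None:
    with conn:
        conn.execute(
            "UPDATE idempotency_keys SET status = 'completed', response = ? "
            "WHERE idempotency_key = ?",
            (response_json, key),
        )
```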
The PRIMARY KEY constraint gives you the atomic insert-or-fail. After the request completes, an UPDATE flips status to completed and writes the response.
Bug 3: the work succeeded but the response was not saved
The charge went through. The downstream call to Stripe returned a success response. Then the server crashed before writing the response back to the idempotency store. The client retries with the same key. The store says "pending". What now?
This is the genuinely hard case. Two bad answers:
- "Process the charge again." Now you have charged the customer twice. The whole point of idempotency keys was to prevent this.
- "Refuse the request as pending." Now the client is stuck. The request is in limbo, the server cannot tell whether the original processing finished, and there is no way to make progress.
The right answer needs cooperation from the downstream system. If the downstream call (the Stripe charge, the database insert, whatever) is itself idempotent and accepts your idempotency key, you can retry the downstream call safely; the second call will be recognized as a duplicate and return the original result. This is why payment processors, ours included, support idempotency keys at their layer too. The pattern composes.
Without that downstream support, you are left writing custom recovery logic: "check whether the charge actually went through by querying the downstream by some other identifier we recorded before the call." That is fragile. The pattern depends on every layer of the stack supporting idempotency for it to compose end-to-end.
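A small simulation of the composed case, as a sketch. `FakeProcessor` is a hypothetical downstream that deduplicates on the caller's key, and `crash_before_save` simulates the crash between the charge and the save. The sketch deliberately ignores concurrency: it treats any "pending" row as left over from a crash, which a real handler would have to distinguish from a request that is still in flight.

```python
class FakeProcessor:
    """Hypothetical downstream that deduplicates on the caller's idempotency key."""

    def __init__(self):
        self.charges = {}      # idempotency key -> charge result
        self.charge_count = 0  # number of charges actually made

    def create_charge(self, amount, idempotency_key):
        if idempotency_key in self.charges:
            return self.charges[idempotency_key]  # duplicate: return original result
        self.charge_count += 1
        result = {"id": f"ch_{self.charge_count}", "amount": amount}
        self.charges[idempotency_key] = result
        return result

store = {}  # hypothetical local store: key -> {"status": ..., "response": ...}

def handle_charge(key, amount, processor, crash_before_save=False):
    record = store.get(key)
    if record and record["status"] == "completed":
        return record["response"]
    # New key, or a "pending" row left behind by a crash. Either way, calling
    # downstream with the same key is safe because downstream deduplicates.
    store[key] = {"status": "pending", "response": None}
    response = processor.create_charge(amount, idempotency_key=key)
    if crash_before_save:
        raise RuntimeError("simulated crash after the charge, before the save")
    store[key] = {"status": "completed", "response": response}
    return response
```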
Bug 4: TTL too short or too long
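How long do you keep an idempotency key? The two failure modes are symmetric: too short, and a late retry arrives after its key has expired and is processed as a new request; too long, and you pay to store keys that no client will ever send again.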
The figure I have seen quoted in the literature is 24 hours. That covers most retry windows; clients that are still retrying after 24 hours are probably broken in some other way. For a high-volume API, 24 hours of keys is a measurable amount of storage. A back-of-the-envelope: at 10,000 requests per second sustained, 24 hours is 10000 * 86400 = 864 million keys. At 200 bytes per key (key + fingerprint + small response), that is roughly 170 GB. Not crippling, but plan for it.
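The same estimate as a few lines of Python, so the inputs are easy to swap for your own traffic. The function name is hypothetical, and the 200 bytes per key is the rough figure assumed above, not a measurement:

```python
def key_storage_bytes(requests_per_second: int, retention_seconds: int, bytes_per_key: int) -> int:
    # Every request stores one key for the whole retention window.
    return requests_per_second * retention_seconds * bytes_per_key

SECONDS_PER_DAY = 86_400

keys_per_day = 10_000 * SECONDS_PER_DAY                      # 864,000,000 keys
day_bytes = key_storage_bytes(10_000, SECONDS_PER_DAY, 200)  # 172,800,000,000 bytes

print(f"{keys_per_day:,} keys, {day_bytes / 1e9:.1f} GB")
```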
I have settled on 24 hours as the default and longer (up to a week) for low-volume APIs where storage is cheap and clients are unpredictable. Shorter (one to two hours) for very high-volume APIs where storage is the bottleneck.
The cleanup job is a thing to actually build and run. A DELETE FROM idempotency_keys WHERE created_at < now() - interval '24 hours' running on a schedule, or a TTL index in a database that supports them. I have seen teams forget the cleanup job and watch the table grow without bound.
Bug 5: the cleanup races the retry
Suppose your TTL is 24 hours and the cleanup runs shortly after a key crosses that age. A retry that arrives right around the 24-hour mark will hit either the cached response (if the cleanup has not run yet) or a missing key (if it has). The behavior is timing-dependent. If your TTL is exactly 24 hours, expect a small fraction of retries to land on the wrong side of the cleanup.
Two mitigations I use:
- Set the TTL slightly longer than the published retry window. Tell clients "retries will be deduplicated for up to 24 hours" and run the cleanup at 25 or 26 hours.
- Make the cleanup gradual. Delete in batches across the day rather than in one big sweep at midnight, so the timing of any individual key's expiration is fuzzed.
Neither of these is exotic; both are about acknowledging that "24 hours" is not a sharp boundary in a real system.
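Both mitigations fit in a small batched cleanup. The sketch below uses SQLite through Python's `sqlite3` module and assumes `created_at` is stored as epoch seconds; the column names, the 25-hour retention, the batch size, and the helper names are all illustrative choices:

```python
import sqlite3
import time

RETENTION_SECONDS = 25 * 3600  # published window is 24 hours; keep one extra hour
BATCH_SIZE = 1000

def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal schema; a real table would also hold status, response, etc.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS idempotency_keys ("
        "idempotency_key TEXT PRIMARY KEY, created_at REAL NOT NULL)"
    )

def cleanup_batch(conn: sqlite3.Connection, now: float = None, batch_size: int = BATCH_SIZE) -> int:
    """Delete at most batch_size expired keys; returns how many were deleted."""
    now = time.time() if now is None else now
    cutoff = now - RETENTION_SECONDS
    with conn:  # one short transaction per batch
        cur = conn.execute(
            """
            DELETE FROM idempotency_keys
            WHERE idempotency_key IN (
                SELECT idempotency_key FROM idempotency_keys
                WHERE created_at < ?
                LIMIT ?
            )
            """,
            (cutoff, batch_size),
        )
    return cur.rowcount
```

Calling `cleanup_batch` every minute or so from a scheduler, rather than once a day, keeps each delete small and spreads expirations over time.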
What the key itself should look like
Three rules:
- The client generates the key, not the server. The whole point is that every retry of a request carries the same key, so the client must create the key once and reuse it for each retry. A UUID v4 generated once and stored alongside the request is the standard.
- The key is opaque to the server. Do not encode meaning into the key. Do not make it user-42-charge-2026-01-01. The server just compares strings.
- The key is per-operation, not per-session. Two unrelated charge requests in the same session should have different keys. Reusing a key across operations is a client bug.
The conventional header name is Idempotency-Key, which Stripe popularized and which is now specified in an IETF Internet-Draft (a draft, not a finished standard). Using that name means clients written for one API can mostly retry against another with the same retry library.
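On the client side, the rules above come down to "generate once, reuse on every retry". A minimal sketch, where `send` is a hypothetical transport callable that raises `ConnectionError` when the response is lost; a real client would also persist the key with the request and back off between attempts:

```python
import uuid

def send_with_retries(send, body, max_attempts=3):
    """Call send(headers, body), retrying lost responses with the same key."""
    key = str(uuid.uuid4())  # generated once per operation, before the first attempt
    headers = {"Idempotency-Key": key}
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers, body)
        except ConnectionError as exc:
            last_error = exc  # response lost: retry with the SAME key
    raise last_error
```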
Where idempotency keys do not help
Three kinds of operation where idempotency keys are unnecessary or not enough on their own, and what I do instead:
For idempotent operations by construction (PUT, DELETE, GET), the operation itself is already idempotent at the protocol level. You do not need a key. You might still want one for audit purposes, but the safety is built in.
For long-running operations (an export job, a billing run), the right pattern is a separate "job" resource with its own ID, polled by the client. The client can retry the "create job" call with an idempotency key; the underlying job is started once and the client polls until it completes.
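A sketch of the job-resource shape, with in-memory dicts standing in for real storage. All names here (`create_job`, `get_job`, `finish_job`, the status strings) are hypothetical:

```python
import uuid

jobs = {}         # job_id -> {"status": ..., "params": ..., "result": ...}
jobs_by_key = {}  # idempotency key -> job_id

def create_job(idempotency_key: str, params: dict) -> str:
    """Start a job once per key; a retried create call gets the same job ID back."""
    if idempotency_key in jobs_by_key:
        return jobs_by_key[idempotency_key]
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "params": params, "result": None}
    jobs_by_key[idempotency_key] = job_id
    return job_id

def get_job(job_id: str) -> dict:
    # What the client polls: a plain read, safe to repeat.
    return jobs[job_id]

def finish_job(job_id: str, result) -> None:
    # Called by the worker when the export or billing run completes.
    jobs[job_id].update(status="completed", result=result)
```

Polling is a read, so only the create call needs the key.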
For operations with externally-visible side effects you cannot undo (sending a marketing email, posting to a third-party that does not support idempotency), the key only protects you from your own duplicate processing. It does not help if the third party processes your request twice. You still need to design for at-most-once at the API layer or accept at-least-once and live with the consequences.
A failure I would not repeat
A previous team I was on stored idempotency keys in Redis with a 24-hour TTL. The TTL was set per-key. We did not realize that a TTL is only an upper bound on a key's lifetime: once Redis reaches its maxmemory limit, it evicts keys according to the configured eviction policy (volatile-lru and allkeys-lru, for example, both consider keys that still have time left on their TTL). Under memory pressure, an idempotency key could therefore disappear before its TTL ran out. A duplicate request sent during the gap would be processed as new. We saw this on a Black Friday traffic spike: a small percentage of requests were double-processed because the keys had been evicted ahead of schedule.
The fix was to move the storage to Postgres with an explicit cleanup job, accepting the higher write latency for the durability guarantee. Redis is fast; it is not a system of record for things you cannot afford to lose. If the idempotency key is the difference between charging a customer once and twice, treat it like the financial record it is.
What I tell teams designing their first idempotent API
Three things, in this order:
- Use the standard header name (Idempotency-Key), accept any opaque string, and require it on every state-changing endpoint that needs retry safety.
- Store the keys in a system that does not evict under pressure, with an explicit fingerprint of the request body, an explicit pending/completed status, and a 24-hour TTL with a separate cleanup job.
- Design downstream calls to be idempotent under the same key. The pattern composes if every layer supports it; it falls apart at the first layer that does not.
The reason the Stripe write-up became the canonical reference is that they explained the lifecycle (the pending status, the body fingerprint, the storage choice) clearly enough that it could be copied. Most teams that get idempotency wrong have implemented the cache-the-response part and skipped the lifecycle part.
What earns the complexity
Idempotency keys add a meaningful amount of state and latency to every state-changing endpoint. For low-stakes APIs (analytics ingestion, cache warm-ups), that overhead might not be justified. For high-stakes APIs (payments, inventory commitments, anything that triggers an irreversible side effect), the overhead is justified the first time it prevents a double-charge in production. The cost of one debugging session for a customer who saw their card charged twice almost always exceeds the engineering cost of doing this right the first time, which is why every payment API I have seen, including ones not run by Stripe, has converged on roughly this pattern. The shape is not unique to one company. It is the shape this problem has when you take it seriously.
