System Design Article
Cache Invalidation Strategies & Consistency
Difficulty: Medium
There are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors. This lesson tackles the first one. We cover TTL-based, write-driven, and event-driven invalidation; the canonical race conditions (lost-update, double-write inconsistency, stale-after-failover); the consistency models a cache can offer; and the patterns that real systems (Facebook, Stripe, AWS) use to keep cached data trustworthy. By the end you can pick an invalidation strategy, defend it under interviewer pressure, and explain exactly why your cache will not silently serve yesterday's data.
What 'Invalidation' Actually Means
A cache holds a copy. The source of truth (database, upstream API, file system) holds the original. The moment the original changes, the copy is wrong. Invalidation is the act of removing or refreshing that copy so future reads see the new value.
There are only three things you can do when the source changes:
- Delete the cached entry. Next read misses, repopulates from the source.
- Overwrite the cached entry with the new value.
- Wait for the TTL to expire and let the entry rot until then.
Every production caching system uses some combination of these three. The interesting design decisions are when to trigger which, and how to make the triggers reliable across failures and concurrent writers.
The Three Invalidation Modes
Mode 1: TTL-based (passive)
Every cached entry has an expiration. After it passes, the entry is treated as a miss. No coordination with the source is required.
Pros: zero coupling; cache and database know nothing about each other. Bounded staleness: data is at most TTL seconds old.
Cons: until the TTL expires, every reader sees the stale value. For a profile photo with TTL 5 minutes, the user sees their old picture for 5 minutes after upload - unacceptable.
When TTL alone is fine: data that ages naturally and where stale-by-TTL is tolerable (top stories of the hour, weather, exchange rates with daily granularity).
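To make the passive mode concrete, here is a minimal sketch of a TTL cache. The class name `TtlCache` and the injectable `now` clock are assumptions made for illustration; a real deployment would normally rely on the cache server's own expiry (for example a Redis key TTL) rather than application code.

```javascript
// Minimal in-memory TTL cache. `now` is injectable so expiry can be shown
// without real waiting; by default it uses the wall clock.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns undefined on a miss or on an expired entry.
  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict on read
      return undefined;
    }
    return entry.value;
  }
}

// Usage: a fake clock makes the staleness window visible.
let fakeNow = 0;
const weather = new TtlCache(60000, () => fakeNow);
weather.set("weather:paris", "sunny");
fakeNow = 59999;
console.log(weather.get("weather:paris")); // still served from the cache
fakeNow = 60000;
console.log(weather.get("weather:paris")); // expired, so treated as a miss
```

Nothing in this sketch knows about the source of truth, which is exactly the zero-coupling property, and also why a changed value stays invisible until `expiresAt` passes.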
Mode 2: Write-driven (active)
Whenever the source is updated, the application explicitly invalidates the cache key.
---------- Write-driven invalidation ----------
client
|
v
[ app server ] - 1. UPDATE ----> [ database ]
<- 2. ack
[ app server ] - 3. DEL key ---> [ cache ]
<- 4. ack
|
v
return success
Pros: cache becomes consistent within milliseconds of a write. Bounded staleness window is the time between steps 1 and 3.
Cons: requires every code path that writes to also invalidate. A new endpoint that forgets to invalidate silently serves stale data. The race conditions in step 1-3 are the source of most cache bugs (covered below).
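One way to reduce the "forgot to invalidate" risk is to funnel every write through a single function that performs both steps. The sketch below uses plain in-memory Maps as stand-ins for the database and the cache; `readUser` and `writeUser` are hypothetical names, not part of any library.

```javascript
// Stand-ins for a real database and cache (both are hypothetical in-memory Maps).
const db = new Map([["user:42", { name: "Alice" }]]);
const cache = new Map();

// Cache-aside read: try the cache, fall back to the source, then populate.
function readUser(key) {
  if (cache.has(key)) return cache.get(key);
  const row = db.get(key);
  if (row !== undefined) cache.set(key, row);
  return row;
}

// Single write path: update the source first, then delete the cached copy.
function writeUser(key, row) {
  db.set(key, row);  // step 1 in the diagram: UPDATE
  cache.delete(key); // step 3 in the diagram: DEL key
}

console.log(readUser("user:42").name); // first read populates the cache
writeUser("user:42", { name: "Bob" });
console.log(cache.has("user:42"));     // the write removed the cached entry
console.log(readUser("user:42").name); // miss, then repopulated from the source
```

This does not address failures between the two steps or concurrent readers; it only shows the ordering that the write-driven mode relies on.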
Mode 3: Event-driven (reactive)
The application does not invalidate directly. Instead, the database emits change events (via change-data-capture or transactional outbox), and a worker consumes them and invalidates the cache.
---------- Event-driven invalidation ----------
client - write --> [ app ] - write --> [ database ]
|
| (CDC stream: Debezium, MySQL binlog)
v
[ event bus (Kafka) ]
|
v
[ invalidator worker ]
|
v
[ cache ] DEL key
Pros: invalidation is decoupled from application code. Schema changes (new fields, new tables) automatically propagate. Works across language boundaries (a Java service writes; a Python worker invalidates a Redis cluster used by Go services).
Cons: more moving parts; CDC infrastructure (Debezium, Maxwell) must be operated. Higher latency between write and invalidation (typically 100 ms to 1 s).
When event-driven alone is fine: large microservice architectures where many services share a cache and you cannot trust every team to invalidate manually.
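The invalidator worker's core job is mapping a row-level change to the cache keys it affects. The sketch below assumes a simplified event shape `{ table, op, before, after }`, loosely modeled on CDC row events; the `keyRules` table, the key formats, and the function names are all hypothetical.

```javascript
// Hypothetical mapping from table name to the cache keys derived from a row.
const keyRules = {
  users: (row) => [`user:${row.id}`],
  orders: (row) => [`order:${row.id}`, `user:${row.user_id}:orders`],
};

// Collect every key touched by a change event.
function keysForChange(event) {
  const rule = keyRules[event.table];
  if (!rule) return []; // this table is not cached
  const keys = new Set();
  // Look at both row images: an update can move a row from one derived key to another.
  for (const row of [event.before, event.after]) {
    if (row) rule(row).forEach((key) => keys.add(key));
  }
  return [...keys];
}

// Deleting is idempotent, so redelivered events are harmless.
function applyChange(event, cache) {
  for (const key of keysForChange(event)) cache.delete(key);
}

const moved = {
  table: "orders",
  op: "u",
  before: { id: 7, user_id: 1 },
  after: { id: 7, user_id: 2 },
};
console.log(keysForChange(moved)); // keys for the order and for both users' order lists
```

Keeping the key derivation in one table like this is a design choice, not a requirement; it simply makes it obvious which tables are cached and how.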
The Canonical Race: Update-Then-Invalidate
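The most common cache pattern is 'cache-aside read, update database then delete cache'. With two concurrent writers, each doing a DB write followed by a cache operation, there are six possible interleavings (each writer's own two steps stay in order). Deleting the key is safe in all of them; setting the key is not, and at least one interleaving leaves stale data in the cache until something evicts it.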
Consider two clients writing different values for the same key, with two operations each (DB write, cache delete):
---------- Concurrent writers with DEL-after-UPDATE (safe) ----------
T0 Client A: UPDATE users SET name='Alice' --> DB
T1 Client B: UPDATE users SET name='Bob' --> DB (DB now has 'Bob')
T2 Client B: DEL user:42 --> cache (cache now empty)
T3 Reader R: GET user:42 --> miss -> SELECT --> 'Bob' --> SET cache='Bob' (cache has 'Bob', correct)
T4 Client A: DEL user:42 --> cache (cache empty again)
T5 Reader R: GET user:42 --> miss -> SELECT --> 'Bob' --> SET cache='Bob' (still correct)
The DEL-after-UPDATE pattern is safe because deletes are idempotent. The next reader fetches from the source.
Now consider the broken pattern - 'update database, then SET cache':
---------- The bug: update-then-SET ----------
T0 Client A: UPDATE users SET name='Alice' --> DB
T1 Client B: UPDATE users SET name='Bob' --> DB (DB now has 'Bob')
T2 Client B: SET user:42 = 'Bob' --> cache (cache is 'Bob', correct)
T3 Client A: SET user:42 = 'Alice' --> cache (cache is 'Alice', WRONG!)
The cache now disagrees with the database, and nothing will correct it on its own. Until the TTL expires (or someone manually deletes the key), every reader sees 'Alice' even though the source says 'Bob'.
The rule: on a write, always DELETE the cache, never SET it. Let the next reader repopulate.
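The two timelines can be replayed deterministically. The sketch below runs the same interleaving twice, once with SET and once with DEL as the cache operation, using in-memory Maps as stand-ins for the database and cache; `runInterleaving`, `setOp`, and `delOp` are hypothetical helper names.

```javascript
// Replays one interleaving: A and B write the DB in one order,
// then touch the cache in the opposite order.
function runInterleaving(cacheOp) {
  const db = new Map();
  const cache = new Map();
  const key = "user:42";

  db.set(key, "Alice");         // T0 client A: UPDATE
  db.set(key, "Bob");           // T1 client B: UPDATE (DB now has 'Bob')
  cacheOp(cache, key, "Bob");   // T2 client B touches the cache
  cacheOp(cache, key, "Alice"); // T3 client A touches the cache, late

  // A cache-aside reader arrives afterwards.
  let seen = cache.get(key);
  if (seen === undefined) {
    seen = db.get(key);
    cache.set(key, seen);
  }
  return { db: db.get(key), seen };
}

const setOp = (cache, key, value) => cache.set(key, value);
const delOp = (cache, key) => cache.delete(key);

console.log(runInterleaving(setOp)); // reader sees 'Alice' while the DB holds 'Bob'
console.log(runInterleaving(delOp)); // reader sees 'Bob', matching the DB
```

The only difference between the two runs is the cache primitive, which is the whole argument for deleting on write.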
A second race: read populates an old value
Even with DELETE on write, there is a subtler race:
---------- Read-populates-stale race ----------
T0 Reader R: GET user:42 --> miss
T1 Reader R: SELECT user where id=42 --> 'old' (from leader or replica)
T2 Client W: UPDATE users SET name='new' --> DB
T3 Client W: DEL user:42 --> cache (was already empty)
T4 Reader R: SET user:42 = 'old' --> cache now holds STALE 'old'
The writer's invalidation came between the reader's miss and the reader's populate. The cache is now stale until the next write or TTL.
Mitigations:
- Short TTL on populated entries so the staleness is bounded (typically 5 to 60 seconds for sensitive data).
- Versioned reads: the reader includes a 'last-seen DB version' (LSN, oplog timestamp) and only writes the cache if the version is fresh. In Redis this can be done as a compare-and-set with WATCH/MULTI/EXEC.
- Read-your-writes routing: writers tag their session and read from the database for a short window after a write to verify the invalidation took.
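The versioned-read mitigation can be sketched as a cache that refuses populates older than the last invalidation it saw. This is an in-memory illustration under two assumptions: every row carries a monotonically increasing version, and the writer records that version when invalidating instead of issuing a bare DEL. The class `VersionCheckedCache` is hypothetical; against Redis the compare and the write would have to be made atomic (for example with WATCH/MULTI/EXEC), and the tombstones would need their own TTL.

```javascript
class VersionCheckedCache {
  constructor() {
    this.entries = new Map(); // key -> { version, value, tombstone }
  }

  // Writer side: remember which version made older data stale.
  invalidate(key, version) {
    const current = this.entries.get(key);
    if (current === undefined || version > current.version) {
      this.entries.set(key, { version, value: undefined, tombstone: true });
    }
  }

  // Reader side: populate only if the row is at least as new as what the cache knows.
  populate(key, version, value) {
    const current = this.entries.get(key);
    if (current !== undefined && version < current.version) return false;
    this.entries.set(key, { version, value, tombstone: false });
    return true;
  }

  get(key) {
    const entry = this.entries.get(key);
    return entry === undefined || entry.tombstone ? undefined : entry.value;
  }
}

// Replay of the read-populates-stale timeline.
const cache = new VersionCheckedCache();
const staleRow = { version: 1, name: "old" };   // T1: the reader's SELECT result
cache.invalidate("user:42", 2);                  // T2-T3: writer commits version 2 and invalidates
const accepted = cache.populate("user:42", staleRow.version, staleRow.name); // T4
console.log(accepted, cache.get("user:42"));     // populate refused; the key is still a miss
```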
DELETE vs SET vs Versioned Keys
Three primitives, three trade-offs.
| Primitive | When to use | Why |
|---|---|---|
| DEL key | Almost always on writes | Idempotent; concurrent invalidations can't reorder badly. Forces re-fetch from source. |
| SET key value | Only when the writer is the single source of the new value AND ordering is guaranteed (rare) | Bug-prone under concurrent writes. Saves one read on populate. |
| Versioned key (user:42:v17) | Rapidly-changing data where stale entries must be unreachable | Each write produces a new key; old keys age out via TTL; readers always look up current_version. Costs an extra lookup but bypasses the race entirely. |
The versioned-key pattern is the safest choice when you cannot afford even brief stale reads: a stale entry can still be written, but no reader will look it up once the version has advanced. The price is that current_version must itself come from an authoritative, up-to-date source on every read.
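A minimal sketch of versioned keys follows. It assumes the version pointer is kept somewhere authoritative (a version column or an atomic counter); here it is just a Map named `versions`, and `physicalKey`, `onWrite`, and `read` are hypothetical helpers.

```javascript
// Authoritative version pointer per logical key (an assumption of this sketch).
const versions = new Map([["user:42", 17]]);
const cache = new Map();

// The physical cache key embeds the current version, e.g. "user:42:v17".
function physicalKey(base) {
  return `${base}:v${versions.get(base)}`;
}

// A write bumps the pointer; it never touches the old cache entry.
function onWrite(base) {
  versions.set(base, versions.get(base) + 1);
}

// Cache-aside read against the versioned key.
function read(base, loadFromSource) {
  const key = physicalKey(base);
  if (!cache.has(key)) cache.set(key, loadFromSource());
  return cache.get(key);
}

// Replay of a late, stale populate from a slow reader.
let source = "old";
read("user:42", () => source);    // caches 'old' under user:42:v17
source = "new";
onWrite("user:42");               // pointer moves to 18
cache.set("user:42:v17", "old");  // the slow reader's late write lands on the old key
console.log(read("user:42", () => source)); // looks up user:42:v18, so the late write is unreachable
```

The stale entry still exists in the cache, which is why the table notes that old keys must age out via TTL.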
The Dual-Write Problem
A write must touch two systems: the database (source of truth) and the cache (or the event bus). What happens if the second write fails?
---------- Dual-write failure ----------
app - UPDATE DB --> ok
app - DEL cache --> network timeout
?? cache now permanently stale, no retry, no error to caller
Naive code retries inline, but a process crash between the two calls leaves the cache stale forever.
Solution 1: Transactional Outbox
Write the database change and the invalidation event in the same transaction by writing the event to an outbox table in the same database. A separate poller reads the outbox and emits invalidation events to the cache (or to a queue). If the application crashes, the next poll cycle picks up the unsent event.
BEGIN;
UPDATE users SET name = 'Alice' WHERE id = 42;
-- Enqueue the invalidation in the same transaction as the data change.
INSERT INTO outbox (event_type, key, created_at)
VALUES ('invalidate', 'user:42', NOW());
COMMIT;
A worker tails the outbox table (SELECT ... WHERE sent = false ORDER BY id LIMIT 100), publishes each event, and marks it sent only after the publish succeeds. Delivery is at-least-once, so consumers (the cache invalidator) must be idempotent, which DEL already is.
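The poller's loop can be sketched as follows. The outbox is modeled as an in-memory array of rows with `id`, `key`, and `sent` fields, and `publish` is a hypothetical callback standing in for a queue client or a direct cache DEL; neither is a real library API.

```javascript
// Drain up to batchSize unsent rows in id order. A row is marked sent only
// after publish() returns, so a failure leaves it for the next poll cycle.
function drainOutbox(outbox, publish, batchSize = 100) {
  const batch = outbox
    .filter((row) => !row.sent)
    .sort((a, b) => a.id - b.id)
    .slice(0, batchSize);

  let delivered = 0;
  for (const row of batch) {
    try {
      publish(row);
    } catch (err) {
      break; // stop here; this row and later ones stay unsent and are retried
    }
    row.sent = true;
    delivered++;
  }
  return delivered;
}

// Usage: the consumer is a cache DEL, which is safe to repeat.
const cache = new Map([["user:42", "stale"], ["user:43", "stale"]]);
const outbox = [
  { id: 1, event_type: "invalidate", key: "user:42", sent: false },
  { id: 2, event_type: "invalidate", key: "user:43", sent: false },
];
drainOutbox(outbox, (row) => cache.delete(row.key));
console.log(cache.size); // both stale entries have been removed
```

If the process crashes after `publish` but before `row.sent = true` is persisted, the event is published again on the next cycle, which is the at-least-once behavior described above.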
Solution 2: Change Data Capture (CDC)
Let the database itself be the event source. Tools like Debezium tail the binary log (MySQL binlog, Postgres WAL, MongoDB oplog) and publish row-level change events to Kafka. The cache invalidator consumes these and deletes affected keys.
No application code changes needed; every UPDATE/INSERT/DELETE is captured automatically. The cost is operating Debezium and a Kafka cluster, but for organizations that already run them, CDC is the cleanest answer to the dual-write problem.
Consistency Models a Cache Can Offer
State the model, do not hope your interviewer infers it.
| Model | Definition | When to claim it |
|---|---|---|
| Strong | Every read sees the latest committed write | Only achievable by reading from source on every request (i.e., no cache) |
| Read-your-writes | A client always sees its own most recent write | Route the writer's reads to the source for a short window after a write |
| Monotonic reads | A client never sees a value go backward in time | Pin a session to a single replica, or track per-session version |
| Eventual | Reads converge to the latest value once writes stop, with no guaranteed bound on how long that takes | Default for cache-aside with asynchronous invalidation and no TTL backstop |
| Bounded staleness | Reads are at most T seconds out of date | Combine event-driven invalidation with a TTL of T |
Most caches sit at 'eventually consistent within ~100 ms of a write' or 'bounded staleness of 30 seconds via TTL'. Both are acceptable; the failure mode is when the team thinks they have strong consistency and they actually don't.
Implementing read-your-writes with a session marker
// On write, stamp the session with a timestamp.
async function updateProfile(userId, patch, session) {
  await db.update('users', userId, patch);
  await cache.del(`user:${userId}`);
  session.lastWriteAt = Date.now();
}

// On read, if recent write, bypass the cache.
async function getProfile(userId, session) {
  const recentWrite = session.lastWriteAt && Date.now() - session.lastWriteAt < 5000;
  if (recentWrite) {
    return await db.findOne('users', userId);
  }
  return await cacheAsideGet(`user:${userId}`, () => db.findOne('users', userId));
}
Cross-Region Invalidation
With caches in multiple regions, an invalidation must reach every region. Two approaches.
Pub/sub bus
Every region subscribes to a global topic. On write, the writer publishes an invalidation event. Each region's subscriber processes it and deletes the local cache key.
Pros: simple; works with any cache. Latency: 100 to 500 ms cross-region. Cons: pub/sub bus is a single point of failure; partition between regions delays invalidation; need a dead-letter queue for missed messages.
Per-region CDC + replication of the bus
Each region runs its own CDC pipeline tailing the local replica of the database. The pipeline publishes invalidation events to a regional Kafka cluster. A cache invalidator in each region consumes only its own region's events.
Pros: no cross-region coupling on the hot path. Survives region partitions; each region invalidates from its local CDC stream once the database replica catches up. Cons: requires database replication (which you usually have anyway) and per-region CDC infrastructure.
At scale, the second pattern usually wins: a partition or a slow region delays invalidation only in that region instead of stalling a shared bus.
Real-World Examples
How real systems implement this in production
Facebook's memcache tier (described in the 'Scaling Memcache at Facebook' paper) uses a lease mechanism to avoid the read-populates-stale race. When a reader misses, the cache returns a short-lived lease (a token bound to that key) along with permission to populate. If a write invalidates the key while the lease is outstanding, the token is revoked and the reader's SET is rejected at populate time. This eliminates the race without versioned keys.
Trade-off: At extreme scale, the cache itself becomes a coordinator, not just a key-value store.
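The lease idea described above can be sketched in a few lines. This is a simplified in-memory illustration, not the production design: the class `LeasingCache` and its method names are hypothetical, lease expiry is omitted, and a second miss simply replaces the outstanding token.

```javascript
class LeasingCache {
  constructor() {
    this.values = new Map();
    this.leases = new Map(); // key -> outstanding lease token
    this.nextToken = 1;
  }

  // On a miss, hand out a lease token bound to the key.
  get(key) {
    if (this.values.has(key)) return { hit: true, value: this.values.get(key) };
    const token = this.nextToken++;
    this.leases.set(key, token);
    return { hit: false, lease: token };
  }

  // An invalidation drops the value and revokes any outstanding lease.
  del(key) {
    this.values.delete(key);
    this.leases.delete(key);
  }

  // Populate succeeds only if the presented token is still the live lease.
  setWithLease(key, token, value) {
    if (this.leases.get(key) !== token) return false;
    this.leases.delete(key);
    this.values.set(key, value);
    return true;
  }
}

// Replay of the read-populates-stale race.
const cache = new LeasingCache();
const miss = cache.get("user:42");                                 // reader misses and receives a lease
cache.del("user:42");                                              // writer invalidates while the lease is outstanding
const stored = cache.setWithLease("user:42", miss.lease, "old");   // reader's late populate
console.log(stored, cache.get("user:42").hit);                     // populate rejected; the key is still a miss
```

The cache has to track state per key beyond the value itself, which is the coordinator role the trade-off above refers to.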
Stripe caches account, customer, and product data aggressively. They use a write-through pattern with a short TTL plus a Kafka-based event bus that broadcasts changes across regions. Reads of a customer's own data are routed to the source for 30 seconds after any write to that customer.
Trade-off: Read-your-writes is the consistency model users actually notice, and it is cheap to implement at the routing layer.
Wikipedia caches every article render in Varnish. On article edit, MediaWiki sends an HTTP PURGE to every Varnish node in every datacenter. The fan-out is massive (hundreds of nodes globally) but the ops are tiny (one request per node).
Trade-off: Explicit per-node invalidation is feasible when each invalidation is cheap; do not prematurely build a complex pub/sub system if a simple loop will do.
CloudFront serves an object from the edge until its TTL expires; to evict it sooner you submit an invalidation request for specific paths via the API, and AWS propagates it to every edge POP (typically within 10 to 60 seconds). The trade-off is explicit: cheap reads, expensive invalidations.
Trade-off: CDN invalidation is engineered to be infrequent; design content URLs (with hashes or version params) so that invalidation is rare.
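Designing URLs so that invalidation is rare usually means embedding a fingerprint of the content in the path, so a changed file gets a new URL and the old one simply ages out. The sketch below is illustrative only: `fingerprintedUrl` is a hypothetical helper, and the FNV-1a hash (computed here over UTF-16 code units) stands in for whatever digest a real build tool would use.

```javascript
// 32-bit FNV-1a over the string's UTF-16 code units; non-cryptographic, for illustration.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Insert the content fingerprint before the file extension:
// "/static/app.js" becomes "/static/app.<hash>.js".
function fingerprintedUrl(path, contents) {
  const digest = fnv1a(contents);
  const dot = path.lastIndexOf(".");
  const slash = path.lastIndexOf("/");
  if (dot <= slash) return `${path}.${digest}`; // no extension in the file name
  return `${path.slice(0, dot)}.${digest}${path.slice(dot)}`;
}

const before = fingerprintedUrl("/static/app.js", "console.log(1)");
const after = fingerprintedUrl("/static/app.js", "console.log(2)");
console.log(before !== after); // changed content yields a new URL, so nothing needs purging
```

With URLs like these the asset itself can carry a very long TTL; only the document that references it needs a short TTL or an explicit invalidation.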
Common Interview Questions
Questions you might be asked about this topic
Two writers can update the DB in order A then B (DB ends with B), then update the cache in order B then A (cache ends with A) - cache permanently disagrees with DB. Fix: always DELETE the cache key on a write, never SET. DEL is idempotent so reordering is harmless; the next reader repopulates from the source of truth. Mention also the 'read populates stale' race and mitigations (short TTL, versioned keys, compare-and-set).
Cache-aside in regional Redis clusters. Writes update Postgres and insert into an outbox table in the same transaction. An outbox poller publishes invalidation events to Kafka; each region's invalidator consumes and deletes affected keys. TTL of 5 minutes as a safety net. Hot products get an in-process LRU. For external API consumers, use HTTP Cache-Control headers with `must-revalidate` and ETag-based revalidation. Mention the consistency model: eventually consistent within ~100 ms typical, ~5 min worst case.
TTL: zero coupling, bounded staleness, but every reader sees stale data until expiry. Use for naturally-aging data (top stories, weather). Write-driven: app explicitly DELs on write, fast invalidation but easy to forget. Use in monoliths or simple service architectures. Event-driven (CDC or pub/sub): decoupled invalidation, schema-agnostic, but more infra. Use in large microservice architectures where many services share a cache. Real systems usually combine: write-driven for the happy path, TTL as a safety net, CDC as the long-term answer for cross-team consistency.
A single logical operation must update two systems (database + cache, or database + queue). Without coordination, the second update can fail (network, crash) and leave the systems inconsistent with no error to the caller. Solutions: (1) Transactional outbox - write the second action as a row in the same DB transaction; a worker publishes it later, retried until success. (2) CDC - skip the explicit second write entirely; tail the DB log and let a downstream consumer derive the cache invalidation. Both convert dual-write into single-write-plus-async-delivery.
First, was DEL called? Check application logs; Redis SLOWLOG only records commands that exceed the slow threshold, so a normal DEL will not show up there (MONITOR on a non-production instance will). Second, did the DEL hit the right shard? Check cluster topology and key routing. Third, is there a read-populates-stale race - look for high read concurrency around the write timestamp. Fourth, is the outbox poller backed up - check the outbox table size. Fifth, cross-region: did the invalidation propagate to the region serving the stale read? Most often it is the application path - someone added a new write endpoint and forgot to invalidate. Long-term fix: switch to CDC so invalidation cannot be forgotten.
Interview Tips
How to discuss this topic effectively
State the consistency model out loud. 'Eventually consistent within 100 ms via outbox-driven invalidation, bounded by 60-second TTL' is a complete answer that signals operational experience.
Always say DELETE, not SET, when invalidating. If the interviewer asks why, walk through the two-writer SET-reorder race - it is the question they actually wanted you to answer.
Mention the dual-write problem before they do. The moment you say 'invalidate the cache', follow with 'using a transactional outbox to make it atomic with the database write'.
For cross-region, default to per-region CDC over a global pub/sub bus, and say why: there is no cross-region coupling on the hot path, and each region converges as soon as its own replica catches up.
When asked about a cache bug, your first three checks should be: (1) was DEL called at all, (2) is there a read-populates-stale race, (3) is replication lag fooling the reader. Saying these in order shows you have debugged this in production.
Common Mistakes
Pitfalls to avoid in interviews
Using SET instead of DEL on write
SET is not safe under concurrent writers - two writers can SET in the wrong order and leave the cache permanently inconsistent with the database. Always DELETE the key on a write and let the next reader repopulate from the source. DEL is idempotent and immune to ordering.
Treating the invalidation as part of the request and ignoring failures
The application can crash between updating the database and invalidating the cache, leaving the cache stale forever. Use a transactional outbox or CDC so the invalidation is durably enqueued in the same transaction as the database write.
Forgetting the read-populates-stale race after invalidation
A reader that missed the cache before a writer's invalidation can populate the cache with the old value after the invalidation. Mitigate with a short TTL on populated entries, versioned keys, or compare-and-set semantics on the populate.
Claiming strong consistency for a cached system
Any cache with asynchronous invalidation is eventually consistent at best. Be explicit: 'eventually consistent within X ms in the happy path, bounded by TTL otherwise'. Never imply strong consistency unless you read from the source on every request.
Synchronous cross-region invalidation on the write path
Blocking a write until every region's cache acknowledges is fragile - a slow region or partition causes write timeouts. Decouple via a per-region CDC pipeline or a pub/sub bus with at-least-once delivery; the write returns as soon as the local store is durable.
