System Design Article

Cache Invalidation Strategies & Consistency

Difficulty: Medium

There are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors. This lesson tackles the first one. We cover TTL-based, write-driven, and event-driven invalidation; the canonical race conditions (lost-update, double-write inconsistency, stale-after-failover); the consistency models a cache can offer; and the patterns that real systems (Facebook, Stripe, AWS) use to keep cached data trustworthy. By the end you can pick an invalidation strategy, defend it under interviewer pressure, and explain exactly why your cache will not silently serve yesterday's data.

What 'Invalidation' Actually Means

A cache holds a copy. The source of truth (database, upstream API, file system) holds the original. The moment the original changes, the copy is wrong. Invalidation is the act of removing or refreshing that copy so future reads see the new value.

There are only three things you can do when the source changes:

Delete the cached entry. Next read misses, repopulates from the source.
Overwrite the cached entry with the new value.
Wait for the TTL to expire and let the entry rot until then.

Every production caching system uses some combination of these three. The interesting design decisions are when to trigger which, and how to make the triggers reliable across failures and concurrent writers.

The Three Invalidation Modes

Mode 1: TTL-based (passive)

Every cached entry has an expiration. After it passes, the entry is treated as a miss. No coordination with the source is required.

Pros: zero coupling; cache and database know nothing about each other. Bounded staleness: data is at most TTL seconds old.

Cons: until the TTL expires, every reader sees the stale value. For a profile photo with TTL 5 minutes, the user sees their old picture for 5 minutes after upload - unacceptable.

When TTL alone is fine: data that ages naturally and where stale-by-TTL is tolerable (top stories of the hour, weather, exchange rates with daily granularity).
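As a rough illustration of passive expiry, here is a minimal in-memory sketch. The `TtlCache` class and its injectable `now` clock are assumptions made for this example (the clock only exists so the behavior can be checked without waiting); a real deployment would rely on the cache server's own expiry.

```javascript
// Minimal TTL cache sketch. `now` is an injectable clock (hypothetical,
// added so expiry can be exercised without real waiting).
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns undefined on a miss; an expired entry counts as a miss.
  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazy expiry: evict when read
      return undefined;
    }
    return entry.value;
  }
}
```

Note that nothing here talks to the source: a value written one millisecond before the source changes is still served for the full TTL.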

Mode 2: Write-driven (active)

Whenever the source is updated, the application explicitly invalidates the cache key.

Text

---------- Write-driven invalidation ----------
   client                                                   
     |                                                      
     v                                                      
  [ app server ] - 1. UPDATE ----> [ database ]            
                <- 2. ack          
  [ app server ] - 3. DEL key ---> [ cache ]               
                <- 4. ack                                   
     |                                                      
     v                                                      
   return success

Pros: the cache becomes consistent within milliseconds of a write. The staleness window is the time between steps 1 and 3.

Cons: requires every code path that writes to also invalidate. A new endpoint that forgets to invalidate silently serves stale data. The race conditions in steps 1-3 are the source of most cache bugs (covered below).
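The flow in the diagram can be sketched with in-memory stand-ins. The `db` and `cache` maps and the `readUser` / `updateUser` names are hypothetical; the point is only the ordering of step 1 (update the source) and step 3 (delete the key).

```javascript
// In-memory stand-ins for the database and the cache (hypothetical).
const db = new Map();
const cache = new Map();

// Cache-aside read: try the cache, fall back to the source, then populate.
function readUser(id) {
  const key = `user:${id}`;
  if (cache.has(key)) return cache.get(key);
  const row = db.get(id);
  if (row !== undefined) cache.set(key, row);
  return row;
}

// Write path: update the source first, then delete the cached copy.
function updateUser(id, row) {
  db.set(id, row);            // 1. UPDATE
  cache.delete(`user:${id}`); // 3. DEL key
}
```

Every function that writes to `db` has to repeat the `cache.delete` call, which is exactly the coupling listed under Cons.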

Mode 3: Event-driven (reactive)

The application does not invalidate directly. Instead, the database emits change events (via change-data-capture or transactional outbox), and a worker consumes them and invalidates the cache.

Text

---------- Event-driven invalidation ----------
  client - write --> [ app ] - write --> [ database ]
                                                |
                                                | (CDC stream: Debezium, MySQL binlog)
                                                v
                                          [ event bus (Kafka) ]
                                                |
                                                v
                                          [ invalidator worker ]
                                                |
                                                v
                                          [ cache ] DEL key

Pros: invalidation is decoupled from application code. Every committed write is captured, including writes from new endpoints, batch scripts, and manual fixes, so no code path can forget to invalidate. Works across language boundaries (a Java service writes; a Python worker invalidates a Redis cluster used by Go services).

Cons: more moving parts; CDC infrastructure (Debezium, Maxwell) must be operated. Higher latency between write and invalidation (typically 100 ms to 1 s).

When event-driven alone is fine: large microservice architectures where many services share a cache and you cannot trust every team to invalidate manually.
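A sketch of the invalidator worker's core loop might look like the following. The event shape (`{ table, row }`) is a simplification assumed for this example, not the envelope any particular CDC tool emits, and the `keyBuilders` mapping and key names are hypothetical.

```javascript
// Hypothetical mapping from table name to the cache keys a row change affects.
const keyBuilders = new Map([
  ["users", (row) => [`user:${row.id}`]],
  ["orders", (row) => [`order:${row.id}`, `user:${row.user_id}:orders`]],
]);

// Consumes a batch of change events and deletes the affected keys.
// Returns the tables that had no mapping, so gaps are visible to operators.
function applyChangeEvents(events, cache) {
  const unmapped = new Set();
  for (const event of events) {
    const build = keyBuilders.get(event.table);
    if (build === undefined) {
      unmapped.add(event.table);
      continue;
    }
    for (const key of build(event.row)) cache.delete(key);
  }
  return [...unmapped];
}
```

Because the only cache operation is a delete, replaying the same batch after a worker restart is harmless.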

The Canonical Race: Update-Then-Invalidate

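The most common cache pattern is 'cache-aside read, update database then delete cache'. Two concurrent writers, each doing a database write followed by a cache operation, can interleave in six ways. When the cache operation is a DELETE, every interleaving is safe; when it is a SET, two of the six silently leave stale data in the cache.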

Consider two clients writing different values for the same key, with two operations each (DB write, cache delete):

Text

---------- Update-then-DEL with two writers (safe) ----------
T0  Client A: UPDATE users SET name='Alice'   --> DB
T1  Client B: UPDATE users SET name='Bob'     --> DB     (DB now has 'Bob')
T2  Client B: DEL user:42                     --> cache  (cache now empty)
T3  Reader R: GET user:42 --> miss            -> SELECT --> 'Bob' --> SET cache='Bob'  (cache has 'Bob', correct)
T4  Client A: DEL user:42                     --> cache  (cache empty again)
T5  Reader R: GET user:42 --> miss            -> SELECT --> 'Bob' --> SET cache='Bob'  (still correct)

The DEL-after-UPDATE pattern is safe under concurrent writers because deletes are idempotent and commute: whatever order they arrive in, the key ends up absent and the next reader fetches from the source.

Now consider the broken pattern - 'update database, then SET cache':

Text

---------- The bug: update-then-SET ----------
T0  Client A: UPDATE users SET name='Alice'   --> DB
T1  Client B: UPDATE users SET name='Bob'     --> DB     (DB now has 'Bob')
T2  Client B: SET user:42 = 'Bob'             --> cache  (cache is 'Bob', correct)
T3  Client A: SET user:42 = 'Alice'           --> cache  (cache is 'Alice', WRONG!)

The cache now disagrees with the database, and no later step corrects it. Until the TTL expires (or someone manually deletes the key), every reader sees 'Alice' even though the source says 'Bob'.

The rule: on a write, always DELETE the cache key, never SET it. Let the next reader repopulate.
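The difference between the two patterns can be checked by brute force. The sketch below enumerates every interleaving of two writers and counts the schedules that end with a cached value different from the database. It models writers only (no readers), and the `interleavings` and `countStale` helpers are names invented for this example.

```javascript
// All ways to merge two operation lists while keeping each list's own order.
function interleavings(a, b) {
  if (a.length === 0) return [b];
  if (b.length === 0) return [a];
  return [
    ...interleavings(a.slice(1), b).map((rest) => [a[0], ...rest]),
    ...interleavings(a, b.slice(1)).map((rest) => [b[0], ...rest]),
  ];
}

// cacheOp is "set" or "del": the cache operation each writer runs after its DB write.
function countStale(cacheOp) {
  const writerA = [{ kind: "db", value: "Alice" }, { kind: cacheOp, value: "Alice" }];
  const writerB = [{ kind: "db", value: "Bob" }, { kind: cacheOp, value: "Bob" }];
  const schedules = interleavings(writerA, writerB);
  let stale = 0;
  for (const schedule of schedules) {
    let dbValue;
    let cached; // undefined means the key is absent
    for (const op of schedule) {
      if (op.kind === "db") dbValue = op.value;
      else if (op.kind === "set") cached = op.value;
      else cached = undefined; // "del"
    }
    if (cached !== undefined && cached !== dbValue) stale += 1;
  }
  return { total: schedules.length, stale };
}

console.log(countStale("set")); // { total: 6, stale: 2 }
console.log(countStale("del")); // { total: 6, stale: 0 }
```

The two stale SET schedules are the ones where the database writes land in one order and the cache writes land in the other.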

A second race: read populates an old value

Even with DELETE on write, there is a subtler race:

Text

---------- Read-populates-stale race ----------
T0  Reader R:  GET user:42 --> miss
T1  Reader R:  SELECT user where id=42 --> 'old' (from leader or replica)
T2  Client W:  UPDATE users SET name='new' --> DB
T3  Client W:  DEL user:42 --> cache (was already empty)
T4  Reader R:  SET user:42 = 'old'         --> cache now holds STALE 'old'

The writer's invalidation came between the reader's miss and the reader's populate. The cache is now stale until the next write or TTL.

Mitigations:

Short TTL on populated entries so the staleness is bounded (typically 5 to 60 seconds for sensitive data).
Versioned reads: the reader records the database version it read (LSN, oplog timestamp) and populates the cache only if no newer version has been seen for that key. In Redis this needs a compare-and-set, for example WATCH/MULTI/EXEC.
Read-your-writes routing: writers tag their session and read from the database for a short window after a write. This hides the stale entry from the writer; other readers still depend on the TTL.
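One way to picture the versioned-read mitigation is a cache that remembers, per key, the lowest version it will still accept. The `VersionedCache` class below is a hypothetical in-memory sketch: it assumes the writer passes the new database version along with its invalidation, and it keeps the version floor forever, where a real system would expire it.

```javascript
class VersionedCache {
  constructor() {
    this.values = new Map();     // key -> { value, version }
    this.minVersion = new Map(); // key -> lowest version still acceptable
  }

  // Called by the writer after its DB update commits at `newVersion`.
  invalidate(key, newVersion) {
    this.values.delete(key);
    const floor = this.minVersion.get(key) ?? 0;
    this.minVersion.set(key, Math.max(floor, newVersion));
  }

  // Called by a reader after a miss. Returns false if the value is too old.
  populate(key, value, version) {
    const floor = this.minVersion.get(key) ?? 0;
    if (version < floor) return false;
    const current = this.values.get(key);
    if (current !== undefined && current.version > version) return false;
    this.values.set(key, { value, version });
    return true;
  }

  get(key) {
    return this.values.get(key)?.value;
  }
}
```

Replaying the race above: the reader holds 'old' at version 16, the writer commits version 17 and invalidates, and the reader's late `populate(key, 'old', 16)` returns false instead of poisoning the cache.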

DELETE vs SET vs Versioned Keys

Three primitives, three trade-offs.

Primitive	When to use	Why
`DEL key`	Almost always on writes	Idempotent; concurrent invalidations can't reorder badly. Forces re-fetch from source.
`SET key value`	Only when the writer is the single source of the new value AND ordering is guaranteed (rare)	Bug-prone under concurrent writes. Saves one read on populate.
Versioned key (`user:42:v17`)	Rapidly-changing data where stale entries must be unreachable	Each write produces a new key; old keys age out via TTL; readers always look up `current_version`. Costs an extra lookup but bypasses the race entirely.

The versioned-key pattern applies the same idea as content-hashed asset URLs: because every write changes the key, a stale entry can never be reached through the current version. It is the safest choice when you cannot afford even brief stale reads, provided the `current_version` lookup itself is read from an authoritative place.
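A minimal sketch of versioned keys follows. It assumes the `current_version` pointer lives in the same store as the values and uses a plain `Map` as a stand-in; the `writeUser` / `readUser` names are hypothetical, and with concurrent writers the version bump would need to be atomic (for example Redis `INCR`).

```javascript
const store = new Map(); // stand-in for the cache

// Each write creates a new immutable entry, then flips the pointer to it.
function writeUser(id, value) {
  const pointer = `user:${id}:current_version`;
  const next = (store.get(pointer) ?? 0) + 1;
  store.set(`user:${id}:v${next}`, value); // new key; old versions are left to expire
  store.set(pointer, next);                // readers switch over only now
  return next;
}

// Two lookups per read: the pointer, then the versioned entry.
function readUser(id) {
  const version = store.get(`user:${id}:current_version`);
  if (version === undefined) return undefined;
  return store.get(`user:${id}:v${version}`);
}
```

The old entry (`user:42:v1`) is never overwritten or deleted; it simply becomes unreachable once the pointer moves.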

The Dual-Write Problem

A write must touch two systems: the database (source of truth) and the cache (or the event bus). What happens if the second write fails?

Text

---------- Dual-write failure ----------
  app - UPDATE DB --> ok
  app - DEL cache --> network timeout
  ?? cache now permanently stale, no retry, no error to caller

Naive code retries inline, but a process crash between the two calls means the retry never runs, and the cache stays stale until the TTL expires (forever, if the entry has none).

Solution 1: Transactional Outbox

Write the database change and the invalidation event in the same transaction by writing the event to an outbox table in the same database. A separate poller reads the outbox and emits invalidation events to the cache (or to a queue). If the application crashes, the next poll cycle picks up the unsent event.

SQL

BEGIN;
UPDATE users SET name = 'Alice' WHERE id = 42;
INSERT INTO outbox (event_type, key, created_at)
    VALUES ('invalidate', 'user:42', NOW());
COMMIT;

A worker tails the outbox table (SELECT ... WHERE sent = false ORDER BY id LIMIT 100), publishes each event, and marks it sent. Delivery is at-least-once, so consumers (the cache invalidator) must be idempotent, which DEL already is.
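The worker's loop can be sketched against in-memory stand-ins. Here `outbox` is an array of rows shaped like `{ id, key, sent }` and `cache` is a `Map`; both are assumptions for illustration, and a real worker would run the SELECT above and an UPDATE to mark rows sent.

```javascript
// Processes up to batchSize unsent outbox rows in id order.
// Returns how many rows were handled in this cycle.
function drainOutbox(outbox, cache, batchSize = 100) {
  const pending = outbox
    .filter((row) => !row.sent)
    .sort((x, y) => x.id - y.id)
    .slice(0, batchSize);
  for (const row of pending) {
    cache.delete(row.key); // idempotent, so a repeat after a crash is harmless
    row.sent = true;       // mark only after the delete succeeded
  }
  return pending.length;
}
```

If the worker dies between the delete and the mark, the row stays unsent and the next cycle deletes the key again, which is the at-least-once behavior described above.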

Solution 2: Change Data Capture (CDC)

Let the database itself be the event source. Tools like Debezium tail the binary log (MySQL binlog, Postgres WAL, MongoDB oplog) and publish row-level change events to Kafka. The cache invalidator consumes these and deletes affected keys.

No application code changes needed; every UPDATE/INSERT/DELETE is captured automatically. The cost is operating Debezium and a Kafka cluster, but for organizations that already run them, CDC is the cleanest answer to the dual-write problem.

Consistency Models a Cache Can Offer

State the model; do not hope your interviewer infers it.

Model	Definition	When to claim it
Strong	Every read sees the latest committed write	Only achievable by reading from source on every request (i.e., no cache)
Read-your-writes	A client always sees its own most recent write	Route the writer's reads to the source for a short window after a write
Monotonic reads	A client never sees a value go backward in time	Pin a session to a single replica, or track per-session version
Eventual	Reads converge to the latest value once writes stop; no hard time bound is promised	Default for cache-aside with asynchronous invalidation
Bounded staleness	Reads are at most T seconds out of date	Combine event-driven invalidation with a TTL of T

Most caches sit at 'eventually consistent within ~100 ms of a write' or 'bounded staleness of 30 seconds via TTL'. Both are acceptable; the failure mode is when the team thinks they have strong consistency and they actually don't.

Implementing read-your-writes with a session marker

JavaScript

// On write, stamp the session with a timestamp.
async function updateProfile(userId, patch, session) {
    await db.update('users', userId, patch);
    await cache.del(`user:${userId}`);
    session.lastWriteAt = Date.now();
}

// On read, if recent write, bypass the cache.
async function getProfile(userId, session) {
    const recentWrite = session.lastWriteAt && Date.now() - session.lastWriteAt < 5000;
    if (recentWrite) {
        return await db.findOne('users', userId);
    }
    return await cacheAsideGet(`user:${userId}`, () => db.findOne('users', userId));
}

Cross-Region Invalidation

With caches in multiple regions, an invalidation must reach every region. Two approaches.

Pub/sub bus

Every region subscribes to a global topic. On write, the writer publishes an invalidation event. Each region's subscriber processes it and deletes the local cache key.

Pros: simple; works with any cache. Typical cross-region latency is 100 to 500 ms.

Cons: the pub/sub bus is a single point of failure; a partition between regions delays invalidation; missed messages need a dead-letter queue.
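The fan-out and the dead-letter handling can be sketched in a few lines. `InvalidationBus` is a hypothetical in-process stand-in for a real message bus: each region registers a handler, and a handler that throws represents a region that could not be reached.

```javascript
class InvalidationBus {
  constructor() {
    this.subscribers = new Map(); // region -> handler(key)
    this.deadLetters = [];        // deliveries that failed and await retry
  }

  subscribe(region, handler) {
    this.subscribers.set(region, handler);
  }

  // Deliver to every region; park failures instead of blocking the writer.
  publish(key) {
    for (const [region, handler] of this.subscribers) {
      try {
        handler(key);
      } catch (err) {
        this.deadLetters.push({ region, key });
      }
    }
  }

  // Re-attempt parked deliveries; returns how many are still failing.
  retryDeadLetters() {
    const pending = this.deadLetters;
    this.deadLetters = [];
    for (const { region, key } of pending) {
      try {
        this.subscribers.get(region)(key);
      } catch (err) {
        this.deadLetters.push({ region, key });
      }
    }
    return this.deadLetters.length;
  }
}
```

Until the retry succeeds, the unreachable region keeps serving the old entry, which is why a TTL backstop still matters with this approach.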

Per-region CDC on the replicated database

Each region runs its own CDC pipeline tailing the local replica of the database. The pipeline publishes invalidation events to a regional Kafka cluster. A cache invalidator in each region consumes only its own region's events.

Pros: no cross-region coupling on the hot path. Survives region partitions; each region invalidates from its local CDC stream once the database replica catches up.

Cons: requires database replication (which you usually have anyway) and per-region CDC infrastructure.

At scale, the second pattern usually wins: invalidation follows database replication, so a region's cache is never invalidated ahead of the replica it repopulates from.

Real-World Examples

How real systems implement this in production

Facebook memcache (lease-on-miss)

Facebook's memcache tier uses a lease mechanism to avoid the read-populates-stale race. When a reader misses, the cache hands back a short-lived lease (a token) that grants permission to populate that key. If a write invalidates the key while the lease is outstanding, the token is voided and the reader's SET is rejected at populate time. This eliminates the race without versioned keys.

Trade-off: At extreme scale, the cache itself becomes a coordinator, not just a key-value store.
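The lease idea can be sketched as follows. `LeaseCache` is a simplified, hypothetical model rather than Facebook's implementation: it issues a fresh token on every miss (so a newer miss replaces an older lease) and ignores lease expiry and rate limiting.

```javascript
class LeaseCache {
  constructor() {
    this.values = new Map(); // key -> value
    this.leases = new Map(); // key -> token of the outstanding lease
    this.nextToken = 1;
  }

  // A miss returns a lease token that must accompany the later populate.
  get(key) {
    if (this.values.has(key)) return { hit: true, value: this.values.get(key) };
    const token = this.nextToken++;
    this.leases.set(key, token);
    return { hit: false, lease: token };
  }

  // Invalidation removes the value and voids any outstanding lease.
  del(key) {
    this.values.delete(key);
    this.leases.delete(key);
  }

  // Populate succeeds only if the lease is still the valid one for this key.
  setWithLease(key, value, token) {
    if (this.leases.get(key) !== token) return false;
    this.leases.delete(key);
    this.values.set(key, value);
    return true;
  }
}
```

In the read-populates-stale timeline, the writer's `del` at T3 voids the reader's token, so the reader's populate at T4 returns false and the next reader fetches the new value.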

Stripe idempotency & cross-region invalidation

Stripe caches account, customer, and product data aggressively. They use a write-through pattern with a short TTL plus a Kafka-based event bus that broadcasts changes across regions. Reads of a customer's own data are routed to the source for 30 seconds after any write to that customer.

Trade-off: Read-your-writes is the consistency model users actually notice, and it is cheap to implement at the routing layer.

Wikipedia + Varnish PURGE

Wikipedia caches every article render in Varnish. On article edit, MediaWiki sends an HTTP PURGE to every Varnish node in every datacenter. The fan-out is massive (hundreds of nodes globally) but the ops are tiny (one request per node).

Trade-off: Explicit per-node invalidation is feasible when each invalidation is cheap; do not prematurely build a complex pub/sub system if a simple loop will do.

AWS CloudFront CreateInvalidation

CloudFront has no cheap inline delete for a single cached object; instead, you submit an invalidation request for one or more paths through the CreateInvalidation API, and AWS propagates it to every edge POP (typically within 10 to 60 seconds). The trade-off is explicit: cheap reads, expensive invalidations.

Trade-off: CDN invalidation is engineered to be infrequent; design content URLs (with hashes or version params) so that invalidation is rare.
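One common way to make invalidation rare is to derive the URL from the content, so a changed file gets a new URL and the old cached object is simply never requested again. The sketch below uses Node's built-in `crypto` module; the `versionedAssetUrl` name, the `v` query parameter, and the 12-character digest length are arbitrary choices for this example.

```javascript
const { createHash } = require("node:crypto");

// Appends a short content hash as a version parameter.
// Same content -> same URL (cacheable with a long TTL);
// changed content -> new URL (no CDN invalidation needed).
function versionedAssetUrl(path, content) {
  const digest = createHash("sha256").update(content).digest("hex").slice(0, 12);
  return `${path}?v=${digest}`;
}
```

For this to work, the CDN has to include the query string in its cache key, and the HTML that references the asset must itself be short-lived or invalidated on deploy.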

Quick Interview Phrases

Key terms to use in your answer

DELETE on write

transactional outbox

change data capture

read-your-writes consistency

bounded staleness

versioned keys

Common Interview Questions

Questions you might be asked about this topic

Walk me through the race condition in 'update DB then update cache' and how you fix it.

Two writers can update the DB in order A then B (DB ends with B), then update the cache in order B then A (cache ends with A) - cache permanently disagrees with DB. Fix: always DELETE the cache key on a write, never SET. DEL is idempotent so reordering is harmless; the next reader repopulates from the source of truth. Mention also the 'read populates stale' race and mitigations (short TTL, versioned keys, compare-and-set).

Design cache invalidation for a product catalog used by web, mobile, and 3rd-party APIs.

Compare TTL-based, write-driven, and event-driven invalidation. When would you use each?

What is the dual-write problem and how do you solve it?

Your cache shows old data 30 minutes after a write, even though invalidation should be ~100 ms. Walk me through the debug.

Interview Tips

How to discuss this topic effectively

State the consistency model out loud. 'Eventually consistent within 100 ms via outbox-driven invalidation, bounded by 60-second TTL' is a complete answer that signals operational experience.

Always say DELETE, not SET, when invalidating. If the interviewer asks why, walk through the two-writer SET-reorder race - it is the question they actually wanted you to answer.

Mention the dual-write problem before they do. The moment you say 'invalidate the cache', follow with 'using a transactional outbox to make it atomic with the database write'.

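For cross-region, default to per-region CDC over a global pub/sub bus, and say why: no invalidation has to cross a region boundary on the write path, and each region invalidates only after its own replica has the new data.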

When asked about a cache bug, your first three checks should be: (1) was DEL called at all, (2) is there a read-populates-stale race, (3) is replication lag fooling the reader. Saying these in order shows you have debugged this in production.

Common Mistakes

Pitfalls to avoid in interviews

Using SET instead of DEL on write

SET is not safe under concurrent writers - two writers can SET in the wrong order and leave the cache permanently inconsistent with the database. Always DELETE the key on a write and let the next reader repopulate from the source. DEL is idempotent and immune to ordering.

Treating the invalidation as part of the request and ignoring failures

The application can crash between updating the database and invalidating the cache, leaving the cache stale forever. Use a transactional outbox or CDC so the invalidation is durably enqueued in the same transaction as the database write.

Forgetting the read-populates-stale race after invalidation

A reader that missed the cache before a writer's invalidation can populate the cache with the old value after the invalidation. Mitigate with a short TTL on populated entries, versioned keys, or compare-and-set semantics on the populate.

Claiming strong consistency for a cached system

Any cache with asynchronous invalidation is eventually consistent at best. Be explicit: 'eventually consistent within X ms in the happy path, bounded by TTL otherwise'. Never imply strong consistency unless you read from the source on every request.

Synchronous cross-region invalidation on the write path

Blocking a write until every region's cache acknowledges is fragile - a slow region or partition causes write timeouts. Decouple via a per-region CDC pipeline or a pub/sub bus with at-least-once delivery; the write returns as soon as the local store is durable.
