System Design Article
Caching Fundamentals (Write-Through, Write-Back, Write-Around)
Difficulty: Easy
A cache is a small, fast store that holds copies of data so the next request does not pay the cost of fetching it from the source of truth. This lesson covers what a cache is, where it lives in a stack, the five read and write patterns you will be asked about (cache-aside, read-through, write-through, write-back, write-around), eviction policies, and the failure modes (stampedes, hot keys, stale data) that bite real systems. By the end you can pick a caching strategy and defend it in an interview.
What is a Cache?
A cache is a smaller, faster copy of data placed close to whoever asks for it, so the next request can skip the expensive trip to the source of truth.
A handful of numbers explain why caches exist at all (Jeff Dean's classic latency table, rounded for memory):
| Operation | Latency |
|---|---|
| L1 CPU cache reference | 1 ns |
| Main memory reference | 100 ns |
| Read 1 KB from local SSD | 150 us |
| Round trip in same datacenter | 500 us |
| Read 1 MB sequentially from SSD | 1 ms |
| Cross-region round trip | 50 to 150 ms |
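The big jumps span orders of magnitude: main memory is about 100 times slower than L1, a local SSD read is more than 1,000 times slower than main memory, and a cross-region round trip is 100 or more times slower than one inside the datacenter. Caching is the act of remembering data at one level so future reads do not have to drop down to the next.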
Hit, miss, eviction, TTL
Four words you must know cold:
- Cache hit: the requested key is in the cache; the slow source is not touched.
- Cache miss: the key is absent; the system must fetch from the source and (usually) populate the cache.
- Eviction: a key is removed to make room for newer ones (driven by an eviction policy).
- TTL (time-to-live): the lifetime assigned to an entry when it is written. Once it elapses, the entry is treated as a miss even if it is still in memory.
The single most important metric for any cache is hit rate: hits divided by (hits + misses). A 95% hit rate means only 5 of every 100 requests reach the database. A 50% hit rate means your cache is barely helping and may even be hurting (extra hop + double the writes).
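The four terms and the hit-rate formula can be sketched as a tiny in-memory cache. This is an illustration only: the `TtlCache` class, its injectable `now` clock, and the key names are assumptions made for the example, not part of any real cache client.

```javascript
// Minimal in-memory cache with per-entry TTL and hit/miss counters.
class TtlCache {
  // `now` is injectable so expiry can be exercised without real waiting.
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
    this.hits = 0;
    this.misses = 0;
  }

  set(key, value, ttlMs) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > this.now()) {
      this.hits += 1;
      return entry.value;
    }
    // Absent or expired: both count as a miss.
    if (entry) this.entries.delete(key);
    this.misses += 1;
    return undefined;
  }

  // hits / (hits + misses), or 0 before any lookups.
  hitRate() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

// Usage with a fake clock (milliseconds).
let clock = 0;
const cache = new TtlCache(() => clock);
cache.set('user:42', { name: 'Ada' }, 300_000); // 5 minute TTL
cache.get('user:42'); // hit
clock = 300_001;      // just past the TTL
cache.get('user:42'); // miss: the entry has expired
console.log(cache.hitRate()); // 0.5
```

One hit and one miss give a hit rate of 0.5, which is exactly the "barely helping" regime described above.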
How a Cache Sits in a Request
A typical read against a cache-aside cache (the most common pattern):
---------- Cache-aside read flow ----------

    client
      |
      v
    [ app server ] - 1. GET key --> [ cache (Redis) ]
                   <-- 2a. hit ----- (return value, done)
                   <-- 2b. miss ----
    [ app server ] - 3. SELECT --> [ database (Postgres) ]
                   <-- 4. row -----
    [ app server ] - 5. SET key --> [ cache (Redis) ]
      |
      v
    return to client

Notice the two important details:
- The application code (not the cache) decides when to read from and write to the cache.
- On a miss, the application is responsible for populating the cache.
Where Caches Live
A real system stacks several caches, each closer to the user than the last. Understanding the stack matters because each layer has different invalidation, capacity, and consistency rules.
---------- Cache hierarchy ----------

    user device
        |
        v
    [ browser cache ]          ETag / Cache-Control headers, ~hundreds of MB
        |
        v
    [ CDN edge cache ]         Cloudflare / Fastly / CloudFront, ~hundreds of GB per POP
        |
        v
    [ reverse proxy cache ]    NGINX / Varnish, in your datacenter
        |
        v
    [ application cache ]      in-process LRU map, microsecond access
        |
        v
    [ remote cache ]           Redis / Memcached cluster, single-digit ms
        |
        v
    [ database cache ]         Postgres shared_buffers, MySQL InnoDB buffer pool
        |
        v
    [ disk / source of truth ]

A request that misses every layer pays the full cost. The art is to make as few requests as possible reach the bottom.
Read Patterns
There are two ways application code can interact with a cache for reads.
1. Cache-Aside (Lazy Loading)
The application is in charge. It checks the cache first; on a miss, it reads from the database and writes the result back into the cache.
    // Assumes `redis` is an ioredis-style client and `db` is a node-postgres-style pool.
    async function getUser(userId) {
      const key = `user:${userId}`;
      const cached = await redis.get(key);
      if (cached) return JSON.parse(cached); // hit: the database is not touched

      // Miss: read the source of truth. pg-style clients resolve to { rows }.
      const { rows } = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
      const user = rows[0] ?? null;
      if (user) {
        await redis.set(key, JSON.stringify(user), 'EX', 300); // 5 min TTL
      }
      return user;
    }

Pros: simple, only caches data that is actually requested, resilient to cache failures (a down cache means slower reads, not broken reads).

Cons: each new key takes a full database round trip on first access (cold cache problem); the application carries the caching logic.
This is the default pattern for most web apps and the pattern Memcached and Redis are usually used with.
2. Read-Through
The cache is in charge. The application asks the cache for a key; if missing, the cache itself fetches from the database and stores the result. The application never talks to the database directly for cached data.
---------- Read-through ----------

    app - GET key --> [ cache library ]
                            |  (on miss)
                            v
                       [ database ]
                            |
                            v
    app <-- value ---- [ cache library ]  (now populated)

Pros: application code is clean, no boilerplate cache lookups.

Cons: requires a cache library that knows how to talk to your database (Hibernate L2, AWS DAX for DynamoDB, Apollo Client for GraphQL). A cache outage is also a database-access outage because the app does not know how to bypass it.
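A minimal sketch of the read-through idea, assuming a hypothetical `ReadThroughCache` class that is handed a `loader` function at construction time. The `fakeDb` map stands in for a real database; none of these names come from a real library.

```javascript
// Read-through sketch: the cache owns the loader, callers never touch the source.
class ReadThroughCache {
  // `loader` is a hypothetical async function (key) => value that reads the source of truth.
  constructor(loader) {
    this.loader = loader;
    this.store = new Map();
  }

  async get(key) {
    if (this.store.has(key)) return this.store.get(key); // hit
    const value = await this.loader(key);                // miss: the cache fetches
    if (value !== undefined) this.store.set(key, value); // populate for next time
    return value;
  }
}

// Stand-in for a database table, with a counter to show how often it is read.
const fakeDb = new Map([['user:1', { id: 1, name: 'Ada' }]]);
let dbReads = 0;
const users = new ReadThroughCache(async (key) => {
  dbReads += 1;
  return fakeDb.get(key);
});

async function demo() {
  const first = await users.get('user:1');  // miss, loader runs
  const second = await users.get('user:1'); // hit, loader is not called
  return { first, second, dbReads };
}
```

The calling code only ever sees `users.get(...)`. That is the clean part, and also the reason a broken cache layer leaves the caller with no path to the data.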
Write Patterns
The interesting design decisions live on the write side. There are three names every interviewer expects you to know.
Write-Through
Every write goes to both the cache and the database synchronously. The write is acknowledged only after both succeed.
---------- Write-through ----------

    client - write --> [ cache ] - write --> [ database ]
                           |                      |
                           v                      v
             (both updated; ack only after both succeed)

Pros: the cache stays in step with the database for every key written through this path, so subsequent reads see fresh data.

Cons: writes are slower because they pay two round trips. Unused data is cached unnecessarily (every write populates the cache, even for keys that will never be read again).
Use when: read-heavy workload over data that is updated occasionally and read often. Examples: user profile fields, product catalog, configuration.
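A hedged sketch of a write-through write. The `db` and `cache` objects are hypothetical async clients backed by in-memory maps, and the ordering (database first, then cache) is one reasonable choice rather than the only one; it means a failed database write never leaves an unbacked value in the cache.

```javascript
// Write-through sketch. `db` and `cache` are hypothetical async clients.
async function writeThrough(db, cache, key, value) {
  await db.write(key, value);  // 1. source of truth first; a failure stops here
  await cache.set(key, value); // 2. then the cache
  return 'ack';                // 3. acknowledge only after both succeed
}

// In-memory stand-ins so the sketch runs without real services.
const dbStore = new Map();
const cacheStore = new Map();
const db = { write: async (k, v) => { dbStore.set(k, v); } };
const cache = { set: async (k, v) => { cacheStore.set(k, v); } };
const brokenDb = { write: async () => { throw new Error('db down'); } };

async function demo() {
  const ack = await writeThrough(db, cache, 'user:1', { name: 'Ada' });
  // A failed database write must leave the cache untouched.
  const failed = await writeThrough(brokenDb, cache, 'user:2', { name: 'Bob' })
    .then(() => false, () => true);
  return { ack, failed, cachedUser2: cacheStore.has('user:2') };
}
```

The two awaits are the two round trips that make write-through slower than a plain database write.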
Write-Back (also called Write-Behind)
The write goes to the cache only; the cache acknowledges immediately and asynchronously flushes the change to the database in the background.
---------- Write-back ----------

    client - write --> [ cache ]   (ack immediately)
                          |        (async batch flush)
                          v
                       [ database ]

Pros: very fast writes; the cache batches updates so the database sees fewer, larger writes. Great for write-heavy workloads (analytics counters, view counts, leaderboards).

Cons: data loss risk - if the cache crashes before the flush, recent writes are gone. The cache is now part of your durability story; you need replication or a write-ahead log on the cache itself.
Use when: high write throughput, some data loss is acceptable, or the cache is durable (Redis with AOF persistence, for example).
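The batching behaviour can be sketched with a dirty-key set. `WriteBackCache` and its `flushBatch` callback are hypothetical names for this example, and `flush()` is called by hand here where a real system would run it from a timer or background worker.

```javascript
// Write-back sketch: writes are acknowledged from memory and flushed later in batches.
class WriteBackCache {
  // `flushBatch` is a hypothetical function that persists an array of [key, value] pairs.
  constructor(flushBatch) {
    this.flushBatch = flushBatch;
    this.store = new Map();
    this.dirty = new Set(); // keys written since the last flush
  }

  write(key, value) {
    this.store.set(key, value);
    this.dirty.add(key);
    return 'ack'; // acknowledged before the database sees the write
  }

  // Persist the latest value of every dirty key; returns how many keys were flushed.
  flush() {
    const batch = [...this.dirty].map((k) => [k, this.store.get(k)]);
    if (batch.length === 0) return 0;
    this.flushBatch(batch);
    this.dirty.clear();
    return batch.length;
  }
}

const database = new Map();
const counters = new WriteBackCache((batch) => {
  for (const [k, v] of batch) database.set(k, v);
});

counters.write('views:video:7', 1);
counters.write('views:video:7', 2);
counters.write('views:video:7', 3);   // three writes, one dirty key
const pendingBeforeFlush = database.size; // 0: a crash here would lose all three
const flushed = counters.flush();         // one database write carrying the latest value
```

Three client writes collapse into a single database write. Everything still in `dirty` at crash time is what the data-loss risk refers to.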
Write-Around
Writes skip the cache and go straight to the database. The cache is populated only when a subsequent read misses.
---------- Write-around ----------

    client - write --> [ database ]
                       (cache untouched on write)
    client - read  --> [ cache ]   (miss)
                          |
                          v
                       [ database ] - value --> populate cache

Pros: avoids polluting the cache with write-once data (logs, audit events, sensor readings).

Cons: a read immediately after a write misses the cache, and a key that was already cached keeps its old value until it expires or is invalidated. Pair write-around with a short TTL (or a delete on write) to avoid serving that old copy.
Use when: write-heavy data that is rarely re-read. Examples: append-only logs, audit trails, IoT sensor batches.
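A small sketch of write-around using two in-memory maps as stand-ins for the cache and the database. The `cacheStore.delete` call on write is an added assumption: strict write-around leaves the cache untouched, and the delete is one way to keep a previously cached key from going stale.

```javascript
// Write-around sketch with in-memory stand-ins.
const cacheStore = new Map();
const dbStore = new Map();

function writeAround(key, value) {
  dbStore.set(key, value); // the write goes to the database only
  cacheStore.delete(key);  // assumption: drop any older cached copy so reads cannot go stale
}

function read(key) {
  if (cacheStore.has(key)) return { value: cacheStore.get(key), from: 'cache' };
  const value = dbStore.get(key);
  if (value !== undefined) cacheStore.set(key, value); // populate on miss
  return { value, from: 'database' };
}

writeAround('audit:1', 'login');
const first = read('audit:1');  // served from the database, then cached
const second = read('audit:1'); // served from the cache
```

The first read after the write pays the database round trip, which is the cost this pattern accepts in exchange for not caching data nobody reads.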
Decision matrix
| Pattern | Latency | Consistency | Risk | When to use |
|---|---|---|---|---|
| Cache-aside | Slow first read, fast after | App-controlled, can serve stale | Cold-cache misses on every new key | Default for most web apps |
| Read-through | Same as cache-aside, hidden | Same as cache-aside | Cache outage breaks reads | When cache library can fetch from DB |
| Write-through | Slower writes | Strong (cache always fresh) | Wastes space on never-read keys | Read-heavy data that changes occasionally |
| Write-back | Fastest writes | Eventual | Data loss if cache crashes | Write-heavy counters, analytics |
| Write-around | Normal writes | Risk of stale read after write | None unique | Write-once data rarely read |
Eviction Policies
A cache has finite memory. When it fills up, it must evict something to make room. The choice of policy directly drives hit rate.
- LRU (Least Recently Used): evict the entry that was accessed longest ago. Available in Redis as allkeys-lru (an approximation, and it must be configured: the Redis default policy is noeviction); some variant is also used by Memcached, OS page caches, and browsers. Works well for skewed workloads where some keys are read repeatedly.
- LFU (Least Frequently Used): evict the entry with the fewest accesses. Better than LRU for keys whose popularity outlasts short gaps in access (e.g., a flash sale item that should stay cached even if it has not been read for a few minutes). Available as allkeys-lfu in Redis 4.0+.
- FIFO (First In, First Out): evict the oldest insertion regardless of recent access. Simple but rarely the best fit; mostly seen in queue-like caches.
- TTL-based (expiration): not strictly an eviction policy but combined with the above. In Redis, EXPIRE key 300 says 'this entry self-deletes after 5 minutes'.
- Random: evict a random key. Surprisingly competitive when paired with TTL; cheap to implement.
Rule of thumb: start with LRU + a TTL. Move to LFU only if profiling shows certain keys keep getting evicted before their next access.
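LRU is short enough to sketch in full. The version below leans on the fact that a JavaScript `Map` iterates in insertion order; the `LruCache` class is illustrative and is not how Redis or Memcached implement their (approximate) LRU.

```javascript
// LRU sketch: the Map's insertion order doubles as the recency order.
class LruCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map(); // oldest (least recently used) key comes first
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark the key as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
  }
}

const lru = new LruCache(2);
lru.set('a', 1);
lru.set('b', 2);
lru.get('a');    // 'a' is now the most recently used key
lru.set('c', 3); // over capacity: evicts 'b'
console.log([...lru.map.keys()]); // [ 'a', 'c' ]
```

Reading 'a' before inserting 'c' is what saves it; under FIFO the same sequence would have evicted 'a' instead.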
Failure Modes (the part interviewers love)
Cache Stampede (Thundering Herd)
A popular cache entry expires. A thousand requests arrive in the same second, all miss, all hammer the database with the same query. The database falls over, the cache stays empty, the system collapses.
---------- Cache stampede ----------

    T0     key 'top-products' expires in cache
    T0     1000 concurrent requests arrive
    T0     1000 misses --> 1000 SELECTs against the same row
    T0+1   database overloaded, all requests time out

Mitigations (use one or more):
- Locking / single-flight: only one request is allowed to recompute the value; the rest wait for it. Available for Go in the singleflight package (golang.org/x/sync); across processes, a Redis lock taken with SET key value NX plus an expiry (the modern form of SETNX) does the same job.
- Stale-while-revalidate: serve the expired value to most callers while one background worker refreshes it. Used by browsers, NGINX, and the stale-while-revalidate Cache-Control directive.
- Probabilistic early expiration: each requester refreshes the value with small probability before the TTL expires, so refreshes are spread out instead of synchronized.
- Pre-warm: refresh the entry before it expires (a cron job that recomputes the homepage every minute).
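The single-flight mitigation can be sketched in a few lines by sharing one in-flight promise per key. The `singleFlight` helper and `loadTopProducts` loader are hypothetical, and the sketch only deduplicates within one process; coordinating several app servers would still need a shared lock.

```javascript
// Single-flight sketch: concurrent callers for the same key share one load.
const inFlight = new Map(); // key -> Promise of the value being loaded

function singleFlight(key, loader) {
  if (inFlight.has(key)) return inFlight.get(key); // join the existing load
  const promise = Promise.resolve()
    .then(() => loader(key))
    .finally(() => inFlight.delete(key)); // allow a fresh load once this one settles
  inFlight.set(key, promise);
  return promise;
}

// Simulated slow database query with a call counter.
let dbQueries = 0;
async function loadTopProducts() {
  dbQueries += 1;
  await new Promise((resolve) => setTimeout(resolve, 10));
  return ['p1', 'p2'];
}

async function demo() {
  // 1000 concurrent misses for the same key.
  const calls = Array.from({ length: 1000 }, () =>
    singleFlight('top-products', loadTopProducts));
  const results = await Promise.all(calls);
  return { dbQueries, resultCount: results.length };
}
```

All 1000 callers receive the same result while the loader runs once, which turns the T0 burst in the timeline above into a single SELECT.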
Hot Keys
One key receives a huge fraction of traffic (the homepage of a viral video, the product page of a flash sale). A single cache shard becomes a bottleneck.
Mitigations:
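- Replicate the hot key across multiple cache nodes: store several copies under suffixed keys (for example video:1#0 to video:1#7) so they hash to different shards, and have each client read a randomly chosen copy.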
- Local in-process cache in front of the remote cache for the top-N keys, with a short TTL (10 to 60 seconds).
- Read-through CDN for content that can be served from the edge.
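One way to picture hot-key replication is with suffixed key names. The `#i` suffix scheme, the helper names, and the in-memory `store` below are all assumptions for illustration; a real deployment would rely on the cache cluster's own sharding to place the copies on different nodes.

```javascript
// Hot-key replication sketch: N suffixed copies of one logical key.
function replicaKeys(key, replicas) {
  return Array.from({ length: replicas }, (_, i) => `${key}#${i}`);
}

// `random` is injectable for testing; it defaults to Math.random.
function pickReplica(key, replicas, random = Math.random) {
  const i = Math.floor(random() * replicas);
  return `${key}#${i}`;
}

// In-memory stand-in for the sharded cache.
const store = new Map();

// Writers update every copy; readers pick one copy at random.
function writeHot(key, value, replicas) {
  for (const k of replicaKeys(key, replicas)) store.set(k, value);
}

function readHot(key, replicas, random = Math.random) {
  return store.get(pickReplica(key, replicas, random));
}

writeHot('video:1', { title: 'viral clip' }, 4);
const copy = readHot('video:1', 4);
```

Reads spread roughly evenly over the four copies. The price is a write fan-out of four and a short window in which the copies can disagree.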
Stale Reads
When a cache relies only on TTL expiry (cache-aside or write-around with no invalidation), the cache lags the source. A user updates their profile picture and refreshes; the old picture appears for the next 5 minutes.
Mitigations:
- Invalidate on write: after updating the database, delete the key from the cache (redis.del('user:42')). On the next read, the cache repopulates.
- Write-through for user-visible mutable data; trade write latency for freshness.
- Versioned keys: include a version or timestamp in the key (user:42:v17); writing increments the version, old keys age out via TTL.
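Invalidate-on-write can be sketched with two maps standing in for the database and the cache. The function names and the `avatar` field are hypothetical; the point is only the order of operations in `updateAvatar`.

```javascript
// Invalidate-on-write sketch with in-memory stand-ins.
const dbStore = new Map([['user:42', { avatar: 'old.png' }]]);
const cacheStore = new Map();

// Cache-aside read: check the cache, fall back to the database, populate.
function getUser(id) {
  const key = `user:${id}`;
  if (cacheStore.has(key)) return cacheStore.get(key);
  const row = dbStore.get(key);
  if (row) cacheStore.set(key, row);
  return row;
}

function updateAvatar(id, avatar) {
  const key = `user:${id}`;
  dbStore.set(key, { ...dbStore.get(key), avatar }); // 1. update the source of truth
  cacheStore.delete(key);                            // 2. delete the cached copy, do not set it
}

const before = getUser(42);     // caches the old row
updateAvatar(42, 'new.png');    // database updated, cache entry dropped
const after = getUser(42);      // miss, so the fresh row is read and cached
```

Without the delete in step 2, the second `getUser` call would keep returning `old.png` until the TTL ran out.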
Real-World Examples
How real systems implement this in production
Facebook's social graph runs on TAO, a distributed cache that fronts MySQL. The large majority of reads are served from cache; writes go through TAO (write-through to the cache) and to MySQL. TAO serves more than a billion reads per second across thousands of servers, with a hit rate in the high 90s (the 2013 TAO paper reports 96.4%).
Trade-off: At extreme read-heaviness, the cache becomes the system and the database becomes a backup of the cache, not the other way around.
Twitter caches the most recent ~800 tweets per user's home timeline in Redis. On read, Redis is hit first; on miss, the timeline is recomputed by fan-out from the user's follow graph. New tweets are pushed (write-through) into the timeline caches of online followers, but for high-follower-count celebrities the system falls back to read-time merging to avoid a write fan-out storm.
Trade-off: Choose your write pattern based on the cardinality of fan-out.
When you visit a site behind Cloudflare, the response can be cached at the nearest edge POP and served to subsequent visitors without ever reaching the origin. Cache-Control and ETag headers tell the edge how long to keep an entry and how to revalidate. Origin shielding adds a tier-2 cache so multiple POPs share a refresh request, mitigating stampedes against the origin.
Trade-off: HTTP caching is a real, programmable cache that you should treat as the first line of defense.
Netflix runs EVCache (a distributed Memcached) as a cache-aside layer in front of Cassandra. Each region has its own EVCache cluster, replicated across availability zones. Total: trillions of operations per day, ~30 ms p99 across regions.
Trade-off: Caches in microservice architectures are usually regional; the cost of cross-region cache invalidation often exceeds the benefit of consistency.
Common Interview Questions
Questions you might be asked about this topic
Write-through: every write goes to cache and database synchronously. Use for read-heavy data that changes occasionally (user profiles, configuration). Write-back: write to cache only, async flush to database. Use for write-heavy counters, analytics, leaderboards where the cache itself can be made durable. Write-around: writes skip the cache, only populated on read miss. Use for write-once data rarely re-read (logs, audit events). Mention the durability trade-off for write-back and the cold-cache penalty for write-around.
Cache-aside: application code checks cache first, populates on miss. Read-through: cache library handles the miss and fetches from the source; application never sees the database directly. Cache-aside is more common because it is resilient to cache failures (just slower reads) and works with any data source. Read-through is cleaner but couples the cache library to the database. For most web apps, cache-aside is the default.
Estimate working set: say 10M active users x 1 KB per profile = 10 GB. Add 2x headroom for index/expiration overhead -> 20 GB. Pick replication factor 2 for availability -> 40 GB total. Choose 2 to 4 shards (Redis executes commands on a single thread, so multiple shards parallelize CPU). At 10K QPS read with 95% hit rate, that is 9.5K cache hits + 500 database queries per second - well within both. Mention monitoring: hit rate, eviction count, p99 latency, memory usage.
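The sizing arithmetic above can be checked with a few lines of code. The function and parameter names are made up for this sketch, and 1 KB is treated as 1,000 bytes (decimal units) to match the round numbers in the estimate.

```javascript
// Back-of-envelope cache sizing, using decimal units (1 GB = 1e9 bytes).
function sizeCache({ activeUsers, bytesPerEntry, overheadFactor, replicationFactor }) {
  const workingSetGb = (activeUsers * bytesPerEntry) / 1e9;
  const withOverheadGb = workingSetGb * overheadFactor;
  const totalGb = withOverheadGb * replicationFactor;
  return { workingSetGb, withOverheadGb, totalGb };
}

// Split read traffic into cache hits and database queries for a given hit rate.
function splitTraffic(readQps, hitRate) {
  const cacheHits = Math.round(readQps * hitRate);
  return { cacheHits, dbQueries: readQps - cacheHits };
}

const size = sizeCache({
  activeUsers: 10e6,
  bytesPerEntry: 1000,
  overheadFactor: 2,
  replicationFactor: 2,
});
const traffic = splitTraffic(10000, 0.95);

console.log(size);    // { workingSetGb: 10, withOverheadGb: 20, totalGb: 40 }
console.log(traffic); // { cacheHits: 9500, dbQueries: 500 }
```

Changing the hit rate to 0.5 in `splitTraffic` sends 5,000 queries per second to the database, a tenfold increase over the 95% case.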
First check whether the cache key changed - a refactor often renames keys and invalidates the entire cache. Second, check whether the workload changed: a feature that scans many users instead of repeated lookups will tank the hit rate. Third, check eviction count: a memory increase or a TTL change can push out hot keys. Fourth, check for a cache flush or restart in deploy logs. Fix is usually deploy-related: revert, fix the key naming, or pre-warm the cache before reopening traffic.
Two-step pattern: (1) UPDATE the database row; (2) DELETE the cache key (not SET) so the next read repopulates with the fresh data. Why DELETE not SET? Because two concurrent updates can SET in the wrong order, leaving the cache with the older write. DELETE is idempotent. For multi-region setups, publish an invalidation event so every regional cache deletes the key. Add a short TTL (60s) as a safety net in case the invalidation message is lost.
Interview Tips
How to discuss this topic effectively
Always state the read pattern AND the write pattern in the same sentence: 'cache-aside reads with write-through invalidation on the user record'. Saying both signals you have actually shipped a cached system.
Quote a hit rate target. 'I would aim for above 95% hit rate; below that, the extra hop hurts more than it helps' is the kind of number senior engineers throw out without thinking.
Bring up cache stampede before the interviewer does. The moment you mention TTL, mention single-flight locking or stale-while-revalidate as the mitigation. Stampede is a favorite follow-up question.
Pick LRU as your default eviction policy and explain when you would switch to LFU (seasonal hot keys, e.g., a flash sale item). Naming the policy by acronym is a quick credibility win.
For any caching answer, end with 'and we would invalidate by deleting the key on write'. Invalidation is what separates a real design from a textbook answer.
Common Mistakes
Pitfalls to avoid in interviews
Treating cache and database writes as a single atomic operation
Write-through and cache invalidation are NOT atomic across the cache and the database. Always update the database first and then invalidate or update the cache. If you do it the other way, a failed database write leaves the cache holding fictional data.
Picking write-back for user-visible mutable data
Write-back optimizes for write throughput and accepts data loss if the cache crashes. For data the user will see immediately (profile updates, comments), use write-through or cache-aside with explicit invalidation so reads after writes are correct.
Setting a long TTL and forgetting about invalidation
TTL is a safety net, not a freshness strategy. Active invalidation on write keeps the cache correct; TTL only bounds how long an undetected stale entry can live. Combine both: short TTL for safety, deletes for correctness.
Caching everything by default
A low-hit-rate key wastes memory and pays a network round trip on every read for nothing. Profile first; cache only the keys with concentrated read traffic. A 30% hit rate cache is often slower than no cache at all.
Ignoring cache stampedes until they happen in production
The first time a popular cache entry expires under load, your database melts. Build single-flight locking or stale-while-revalidate into the cache layer from day one - it costs almost nothing to add and is painful to retrofit.
