System Design Article

Caching Fundamentals (Write-Through, Write-Back, Write-Around)

Difficulty: Easy

A cache is a small, fast store that holds copies of data so the next request does not pay the cost of fetching it from the source of truth. This lesson covers what a cache is, where it lives in a stack, the five read and write patterns you will be asked about (cache-aside, read-through, write-through, write-back, write-around), eviction policies, and the failure modes (stampedes, hot keys, stale data) that bite real systems. By the end you should be able to pick a caching strategy and defend it in an interview.

What is a Cache?

A cache is a smaller, faster copy of data placed close to whoever asks for it, so the next request can skip the expensive trip to the source of truth.

A handful of numbers explain why caches exist at all (adapted from Jeff Dean's classic latency table and its later updates, rounded for memory):

Operation	Latency
L1 CPU cache reference	~1 ns
Main memory reference	100 ns
Read 4 KB randomly from SSD	150 us
Round trip in same datacenter	500 us
Read 1 MB sequentially from disk	~1 ms
Cross-region round trip	50 to 150 ms

Adjacent rows differ by anywhere from about 2x to over 1,000x, and the table spans roughly eight orders of magnitude from top to bottom. Caching is the act of remembering data at a faster level so future reads do not have to drop down to a slower one.

Hit, miss, eviction, TTL

Four words you must know cold:

Cache hit: the requested key is in the cache; the slow source is not touched.
Cache miss: the key is absent; the system must fetch from the source and (usually) populate the cache.
Eviction: a key is removed to make room for newer ones (driven by an eviction policy).
TTL (time-to-live): an expiration timestamp. Once it passes, the entry is treated as a miss even if it is still in memory.

The single most important metric for any cache is hit rate: hits divided by (hits + misses). A 95% hit rate means only 5 of every 100 requests reach the database. A 50% hit rate means half of all reads still pay the full database cost on top of the extra cache hop (plus a cache SET to repopulate the key), so the cache may be barely helping or, when the database read is fast, actively hurting.
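To make the hit-rate arithmetic concrete, here is a small sketch of expected read latency under cache-aside: a hit costs one cache lookup, and a miss costs the cache lookup plus a database read (the repopulating SET is ignored to keep the formula simple). The function names and the latency figures in the usage lines are illustrative assumptions, not measurements.

```javascript
// Hit rate = hits / (hits + misses); defined as 0 before any traffic.
function hitRate(hits, misses) {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}

// Expected cache-aside read latency: every read pays the cache lookup,
// and the fraction (1 - h) that miss also pays the database read.
// Simplification: the SET that repopulates the cache on a miss is ignored.
function expectedLatencyMs(h, cacheMs, dbMs) {
  return cacheMs + (1 - h) * dbMs;
}

// Hit rate at which the cache only breaks even with going straight to the database:
// cacheMs + (1 - h) * dbMs = dbMs  =>  h = cacheMs / dbMs
function breakEvenHitRate(cacheMs, dbMs) {
  return cacheMs / dbMs;
}

// Illustrative (assumed) numbers: 1 ms cache hop, 10 ms database query.
const h = hitRate(95, 5);
console.log(h);                             // 0.95
console.log(expectedLatencyMs(h, 1, 10));   // about 1.5 ms, versus 10 ms with no cache
console.log(expectedLatencyMs(0.5, 1, 10)); // 6 ms
console.log(breakEvenHitRate(1, 2));        // 0.5: against a 2 ms query, 50% only breaks even
```

The break-even line is why the same 50% hit rate can be a large win in front of a slow query yet worthless in front of a fast, indexed one.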

How a Cache Sits in a Request

A typical read under the cache-aside pattern, the most common one:

Text

---------- Cache-aside read flow ----------
   client                                                      
     |                                                         
     v                                                         
  [ app server ] - 1. GET key --> [ cache (Redis) ]           
                  <-- 2a. hit ----- (return value, done)       
                  <-- 2b. miss ----                            
                                                               
  [ app server ] - 3. SELECT --> [ database (Postgres) ]      
                  <-- 4. row -----                             
  [ app server ] - 5. SET key --> [ cache (Redis) ]           
     |                                                         
     v                                                         
  return to client

Notice the two important details:

The application code (not the cache) decides when to read from and write to the cache.
On a miss, the application is responsible for populating the cache.

Where Caches Live

A real system stacks several caches, each closer to the user than the last. Understanding the stack matters because each layer has different invalidation, capacity, and consistency rules.

Text

---------- Cache hierarchy ----------
   user device
     | 
     v
  [ browser cache ]      ETag / Cache-Control headers, ~hundreds of MB
     |
     v
  [ CDN edge cache ]     Cloudflare / Fastly / CloudFront, ~hundreds of GB per POP
     |
     v
  [ reverse proxy cache ] NGINX / Varnish, in your datacenter
     |
     v
  [ application cache ]  in-process LRU map, microsecond access
     |
     v
  [ remote cache ]       Redis / Memcached cluster, single-digit ms
     |
     v
  [ database cache ]     Postgres shared_buffers, MySQL InnoDB buffer pool
     |
     v
  [ disk / source of truth ]

A request that misses every layer pays the full cost. The art is to make as few requests as possible reach the bottom.
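The lookup order through the hierarchy can be sketched as a loop over tiers: check each level from fastest to slowest, and on a hit, backfill the faster levels so the next request stops higher up. This is a minimal in-memory sketch; the Map-based tiers and the `loadFromSource` function are hypothetical stand-ins for, say, an in-process cache, a Redis cluster, and the database.

```javascript
// Each tier is anything with get/set; plain Maps stand in for real caches here.
// `undefined` from get() is treated as a miss.
async function tieredGet(key, tiers, loadFromSource) {
  for (let i = 0; i < tiers.length; i++) {
    const value = tiers[i].get(key);
    if (value !== undefined) {
      // Hit at tier i: backfill every faster tier above it.
      for (let j = 0; j < i; j++) tiers[j].set(key, value);
      return { value, servedBy: i };
    }
  }
  // Missed every tier: pay the full cost, then populate all tiers.
  const value = await loadFromSource(key);
  for (const tier of tiers) tier.set(key, value);
  return { value, servedBy: tiers.length };
}
```

`servedBy` reports which level answered (`tiers.length` means the source of truth), which is the number you would track per tier to see where requests are actually being absorbed.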

Read Patterns

There are two ways application code can interact with a cache for reads.

1. Cache-Aside (Lazy Loading)

The application is in charge. It checks the cache first; on a miss, it reads from the database and writes the result back into the cache.

JavaScript

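// Assumes `redis` is an ioredis client and `db` is a node-postgres (pg) Pool.
async function getUser(userId) {
    const key = `user:${userId}`;

    // 1. Check the cache first; ioredis returns null on a miss.
    const cached = await redis.get(key);
    if (cached !== null) return JSON.parse(cached);

    // 2. Miss: read the source of truth. pg resolves to a result object, so take rows[0].
    const { rows } = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
    const user = rows[0] ?? null;

    // 3. Populate the cache so the next read is a hit.
    if (user) {
        await redis.set(key, JSON.stringify(user), 'EX', 300); // 5 min TTL
    }
    return user;
}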

Pros: simple, only caches data that is actually requested, resilient to cache failures (a down cache means slower reads, not broken reads). Cons: each new key takes a full database round trip on first access (cold cache problem); the application carries the caching logic.

This is the default pattern for most web apps and the pattern Memcached and Redis are usually used with.

2. Read-Through

The cache is in charge. The application asks the cache for a key; if missing, the cache itself fetches from the database and stores the result. The application never talks to the database directly for cached data.

Text

---------- Read-through ----------
  app - GET key --> [ cache library ]
                       | (on miss)
                       v
                     [ database ]
                       |
                       v
  app <-- value ---- [ cache library ] (now populated)

Pros: application code is clean, no boilerplate cache lookups. Cons: requires a cache library that knows how to talk to your database (Hibernate L2, AWS DAX for DynamoDB, Apollo Client for GraphQL). A cache outage is also a database-access outage because the app does not know how to bypass it.
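A minimal read-through sketch, assuming an in-process cache: the application only ever calls `get()`, and the cache owns the loader that knows how to reach the database. The class name and loader signature are illustrative assumptions, not the API of Hibernate, DAX, or any other library, and this version does not deduplicate concurrent misses (see the stampede mitigations under Failure Modes).

```javascript
// Read-through: callers ask the cache; on a miss the cache itself calls the loader.
class ReadThroughCache {
  constructor(loader, ttlMs) {
    this.loader = loader;     // (key) => Promise<value>, e.g. a database query
    this.ttlMs = ttlMs;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  async get(key) {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > Date.now()) return entry.value; // hit
    // Miss or expired: the cache, not the caller, fetches from the source.
    const value = await this.loader(key);
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}

// Usage (fetchUserFromDb is hypothetical); application code never queries the database directly:
// const users = new ReadThroughCache((id) => fetchUserFromDb(id), 300_000);
// const user = await users.get(42);
```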

Write Patterns

The interesting design decisions live on the write side. There are three names every interviewer expects you to know.

Write-Through

Every write goes to both the cache and the database synchronously. The write is acknowledged only after both succeed.

Text

---------- Write-through ----------
  client - write --> [ cache ] - write --> [ database ]
                          |                       |
                          v                       v
                       (both updated; ack only after both succeed)

Pros: on the happy path the cache stays in sync with the database, so subsequent reads see fresh data. Cons: writes are slower because they pay two round trips. The two writes are not atomic, so a failure between them can still leave cache and database out of step. Every write populates the cache, even for keys that will never be read again, which wastes memory.

Use when: read-heavy workload over data that is updated occasionally and read often. Examples: user profile fields, product catalog, configuration.
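A write-through sketch with hypothetical in-memory stand-ins for the database and the cache. One deliberate choice: although the diagram draws the cache first, this sketch writes the database first, following the ordering advice under Common Mistakes, so a rejected database write never leaves fictional data in the cache.

```javascript
// Write-through: acknowledge a write only after both the database and the cache accept it.
class WriteThroughStore {
  constructor(db, cache) {
    this.db = db;       // hypothetical async interface: get(key), set(key, value)
    this.cache = cache; // Map-like: get/set; undefined means "not cached"
  }

  async write(key, value) {
    await this.db.set(key, value);    // 1. source of truth first; a throw here skips the cache
    await this.cache.set(key, value); // 2. then the cache
    return "ack";                     // 3. acknowledge only after both succeeded
  }

  async read(key) {
    const cached = await this.cache.get(key);
    if (cached !== undefined) return cached;
    const value = await this.db.get(key); // rare with write-through, e.g. after an eviction
    if (value !== undefined) await this.cache.set(key, value);
    return value;
  }
}
```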

Write-Back (also called Write-Behind)

The write goes to the cache only; the cache acknowledges immediately and asynchronously flushes the change to the database in the background.

Text

---------- Write-back ----------
  client - write --> [ cache ] (ack immediately)
                          | (async batch flush)
                          v
                      [ database ]

Pros: very fast writes; the cache batches updates so the database sees fewer, larger writes. Great for write-heavy workloads (analytics counters, view counts, leaderboards). Cons: data loss risk - if the cache crashes before the flush, recent writes are gone. The cache is now part of your durability story; you need replication or a write-ahead log on the cache itself.

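Use when: write throughput is high and either some data loss is acceptable or the cache itself is durable (Redis with AOF persistence, for example, though with the default appendfsync everysec setting a crash can still lose about a second of writes).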
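A write-back sketch, assuming a hypothetical database client with an async `writeBatch(entries)` method. Writes are acknowledged from memory; a periodic flush sends each dirty key's latest value in one batch, which is where the coalescing benefit comes from. Anything not yet flushed is lost if the process dies.

```javascript
// Write-back (write-behind): writes land in memory and are acknowledged
// immediately; dirty keys are flushed to the database in batches.
class WriteBackCache {
  constructor(db) {
    this.db = db;            // hypothetical: async writeBatch([[key, value], ...])
    this.values = new Map(); // key -> latest value
    this.dirty = new Set();  // keys written since the last successful flush
  }

  write(key, value) {
    this.values.set(key, value);
    this.dirty.add(key); // repeated writes to one key coalesce into one DB write
    return "ack";        // no database round trip on the write path
  }

  read(key) {
    return this.values.get(key);
  }

  // Call on a timer or when the dirty set grows large.
  async flush() {
    if (this.dirty.size === 0) return 0;
    const batch = [...this.dirty].map((key) => [key, this.values.get(key)]);
    this.dirty.clear(); // writes arriving during the await mark keys dirty again
    try {
      await this.db.writeBatch(batch);
    } catch (err) {
      for (const [key] of batch) this.dirty.add(key); // retry on the next flush
      throw err;
    }
    return batch.length;
  }
}

// Usage: 1,000 increments to one counter become a single database write.
// const views = new WriteBackCache(db);
// for (let i = 1; i <= 1000; i++) views.write("views:video42", i);
// setInterval(() => views.flush().catch(console.error), 1000);
```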

Write-Around

Writes skip the cache and go straight to the database. The cache is populated only when a subsequent read misses.

Text

---------- Write-around ----------
  client - write --> [ database ]
                          (cache untouched on write)
  client - read  --> [ cache ] (miss)
                          |
                          v
                      [ database ] - value --> populate cache

Pros: avoids polluting the cache with write-once data (logs, audit events, sensor readings). Cons: a read right after a write either misses the cache (if the key was not cached) or, worse, returns the old cached copy (if it was). Pair write-around with a short TTL, or delete the key on write, to bound that staleness.

Use when: write-heavy data that is rarely re-read. Examples: append-only logs, audit trails, IoT sensor batches.
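A small write-around sketch using two Maps as stand-ins for the database and the cache. The `invalidateOnWrite` option is an assumption added for comparison (delete-on-write is not part of plain write-around); the usage lines show the stale read that plain write-around allows when the key was already cached.

```javascript
// Write-around: writes go to the database only; the cache fills on read misses.
function makeWriteAround(db, cache, { invalidateOnWrite = false } = {}) {
  return {
    write(key, value) {
      db.set(key, value);                        // cache untouched...
      if (invalidateOnWrite) cache.delete(key);  // ...unless invalidation is opted in
    },
    read(key) {
      if (cache.has(key)) return cache.get(key); // hit (possibly stale)
      const value = db.get(key);                 // miss: read the source, then populate
      cache.set(key, value);
      return value;
    },
  };
}

const db = new Map();
const cache = new Map();
const plain = makeWriteAround(db, cache);
plain.write("sensor:7", 20);
plain.read("sensor:7");              // miss: caches 20
plain.write("sensor:7", 25);         // database now holds 25, cache still holds 20
console.log(plain.read("sensor:7")); // 20 (stale)
```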

Decision matrix

Pattern	Latency	Consistency	Risk	When to use
Cache-aside	Slow first read, fast after	App-controlled, can serve stale	Cold-cache misses on every new key	Default for most web apps
Read-through	Same as cache-aside, hidden	Same as cache-aside	Cache outage breaks reads	When cache library can fetch from DB
Write-through	Slower writes	Strong (cache always fresh)	Wastes space on never-read keys	Read-heavy data that changes occasionally
Write-back	Fastest writes	Eventual	Data loss if cache crashes	Write-heavy counters, analytics
Write-around	Normal writes	Risk of stale read after write	None unique	Write-once data rarely read

Eviction Policies

A cache has finite memory. When it fills up, it must evict something to make room. The choice of policy directly drives hit rate.

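LRU (Least Recently Used): evict the entry that was accessed longest ago. Memcached uses an LRU variant, OS page caches and browsers use approximations of it, and Redis offers it as allkeys-lru (Redis approximates LRU by sampling keys, and its default maxmemory-policy is noeviction, so you must opt in). Works well for skewed workloads where some keys are read repeatedly.
LFU (Least Frequently Used): evict the entry with the fewest accesses. Better than LRU when a burst of one-off reads (a crawler, a batch job scanning many keys) would otherwise push out keys that are popular over the long run, such as a flash-sale item read thousands of times an hour. Available as allkeys-lfu in Redis 4.0+, which decays access counters over time so keys that stop being popular eventually become evictable.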
FIFO (First In, First Out): evict the oldest insertion regardless of recent access. Simple but rarely the best fit; mostly seen in queue-like caches.
TTL-based (expiration): not strictly an eviction policy but combined with the above. Redis EXPIRE key 300 says 'this entry self-deletes after 5 minutes'.
Random: evict a random key. Surprisingly competitive when paired with TTL; cheap to implement.

Rule of thumb: start with LRU + a TTL. Move to LFU only if profiling shows certain keys keep getting evicted before their next access.
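LRU plus a TTL fits in a few lines because a JavaScript Map iterates in insertion order: re-inserting a key on every access keeps the least recently used entry at the front. This is a minimal sketch of exact LRU (Redis, by contrast, samples keys to approximate it); the class name and the injectable clock are illustrative choices, not a library API.

```javascript
// LRU cache with per-entry TTL. The first key in the Map is always the
// least recently used one, because every access re-inserts its key at the end.
class LruTtlCache {
  constructor(capacity, defaultTtlMs, now = () => Date.now()) {
    this.capacity = capacity;
    this.defaultTtlMs = defaultTtlMs;
    this.now = now;        // injectable clock, handy for tests
    this.map = new Map();  // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.map.get(key);
    if (entry === undefined) return undefined; // miss
    if (entry.expiresAt <= this.now()) {       // expired: treat as a miss
      this.map.delete(key);
      return undefined;
    }
    this.map.delete(key);                      // re-insert to mark as most recently used
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value, ttlMs = this.defaultTtlMs) {
    this.map.delete(key);
    this.map.set(key, { value, expiresAt: this.now() + ttlMs });
    if (this.map.size > this.capacity) {
      // Evict the least recently used entry (first in iteration order).
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
  }
}
```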

Failure Modes (the part interviewers love)

Cache Stampede (Thundering Herd)

A popular cache entry expires. A thousand requests arrive in the same second, all miss, all hammer the database with the same query. The database falls over, the cache stays empty, the system collapses.

Text

---------- Cache stampede ----------
  T0   key 'top-products' expires in cache
  T0   1000 concurrent requests arrive
  T0   1000 misses --> 1000 SELECTs against the same row
  T0+1 database overloaded, all requests time out

Mitigations (use one or more):

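Locking / single-flight: only one request is allowed to recompute the value; the rest wait for its result. In-process, Go's golang.org/x/sync/singleflight package does this; across servers, a short-lived Redis lock taken with SET key value NX PX <ms> (the expiring successor to SETNX) does the same job.
Stale-while-revalidate: serve the expired value to most callers while one background worker refreshes it. HTTP caching supports this through the stale-while-revalidate Cache-Control directive (RFC 5861), which many browsers and CDNs honor, and NGINX offers it via proxy_cache_use_stale updating together with proxy_cache_background_update.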
Probabilistic early expiration: each requester refreshes the value with small probability before the TTL expires, so refreshes are spread out instead of synchronized.
Pre-warm: refresh the entry before it expires (a cron job that recomputes the homepage every minute).
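Two of these mitigations can be sketched in-process. The single-flight helper shares one in-flight promise per key among concurrent callers, similar in spirit to Go's singleflight package; the early-expiration check uses the "XFetch" rule (Vattani, Chierichetti and Lowenstein, VLDB 2015), where `deltaMs` is how long a recompute takes and `beta` tunes how eagerly to refresh. All names here are hypothetical, and this only deduplicates within one process.

```javascript
// Single-flight: concurrent callers asking for the same key share one
// in-flight computation instead of each querying the database.
class SingleFlight {
  constructor() {
    this.inFlight = new Map(); // key -> Promise for the value being computed
  }

  run(key, fn) {
    const existing = this.inFlight.get(key);
    if (existing) return existing; // join the computation already running
    const promise = Promise.resolve()
      .then(fn)
      .finally(() => this.inFlight.delete(key)); // once settled, the next miss starts fresh
    this.inFlight.set(key, promise);
    return promise;
  }
}

// Cache-aside read with single-flight on the miss path.
// `cache` is Map-like; `loadFromDb` is a hypothetical async loader.
async function getWithSingleFlight(key, cache, flights, loadFromDb) {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  return flights.run(key, async () => {
    const value = await loadFromDb(key);
    cache.set(key, value);
    return value;
  });
}

// Probabilistic early expiration (XFetch): refresh before expiry with a
// probability that rises as expiry approaches. -Math.log(rand()) is an
// exponentially distributed head start; a larger beta means earlier refreshes.
function shouldRefreshEarly(nowMs, expiresAtMs, deltaMs, beta = 1, rand = Math.random) {
  return nowMs - deltaMs * beta * Math.log(rand()) >= expiresAtMs;
}
```

With single-flight in place, the stampede timeline above collapses from 1,000 SELECTs to one per process.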

Hot Keys

One key receives a huge fraction of traffic (the homepage of a viral video, the product page of a flash sale). A single cache shard becomes a bottleneck.

Mitigations:

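Replicate the hot key across multiple cache nodes: store copies under salted keys (key#0 through key#N-1) so the hash spreads them over different shards, and have each client read a randomly chosen copy. Writes and invalidations must then touch every copy.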
Local in-process cache in front of the remote cache for the top-N keys, with a short TTL (10 to 60 seconds).
Read-through CDN for content that can be served from the edge.
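The salted-key replication idea can be sketched as a few pure functions. This is a minimal sketch under stated assumptions: the `key#salt` naming is hypothetical, and a 32-bit FNV-1a hash with modulo stands in for whatever key-to-shard mapping your client or cluster actually uses (consistent hashing in most Memcached clients, CRC16 hash slots in Redis Cluster).

```javascript
// 32-bit FNV-1a: a simple, stable string hash used here as a stand-in
// for the real key-to-shard mapping.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 unsigned bits
  }
  return h;
}

function shardFor(key, shardCount) {
  return fnv1a(key) % shardCount;
}

const saltedKey = (key, salt) => `${key}#${salt}`;

// Writes and invalidations must touch every copy.
function replicaKeys(key, replicas) {
  return Array.from({ length: replicas }, (_, salt) => saltedKey(key, salt));
}

// Each read picks one copy at random, spreading load across the replicas.
function pickReadKey(key, replicas, rand = Math.random) {
  return saltedKey(key, Math.floor(rand() * replicas));
}
```

With 8 salted copies, reads that used to pin one shard are spread over up to 8; the price is that every write or invalidation now fans out to all 8 keys.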

Stale Reads

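With cache-aside or write-around caches that rely on a TTL alone, the cache lags the source: a user updates their profile picture, refreshes, and sees the old picture until the 5-minute TTL expires. (Write-back lags in the opposite direction: the database trails the cache, so anything reading the database directly sees old data.)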

Mitigations:

Invalidate on write: after updating the database, delete the key from the cache (redis.del('user:42')). On the next read, the cache repopulates.
Write-through for user-visible mutable data; trade write latency for freshness.
Versioned keys: include a version or timestamp in the key (user:42:v17); writing increments the version, old keys age out via TTL.
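The first and third mitigations, sketched against Maps standing in for the database and the cache (the function names are hypothetical). The versioned variant assumes readers can cheaply learn the current version, for example from the record itself or a small version key; that lookup is outside the sketch.

```javascript
// 1. Invalidate on write: update the source of truth first, then delete the key.
function updateUserInvalidate(db, cache, userId, fields) {
  const row = { ...db.get(userId), ...fields };
  db.set(userId, row);            // database first...
  cache.delete(`user:${userId}`); // ...then drop the cached copy; the next read repopulates it
  return row;
}

// 2. Versioned keys: every write bumps the version stored with the record, so
// readers build a new key and never see the old entry (which ages out via its TTL).
function versionedKey(userId, version) {
  return `user:${userId}:v${version}`;
}

function updateUserVersioned(db, userId, fields) {
  const current = db.get(userId);
  const row = { ...current, ...fields, version: (current?.version ?? 0) + 1 };
  db.set(userId, row);
  return versionedKey(userId, row.version); // readers now look up this key
}
```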

Real-World Examples

How real systems implement this in production

Facebook TAO + Memcached

Facebook's social graph runs on TAO, a distributed, graph-aware cache that fronts MySQL. The vast majority of reads are served from cache; writes go through TAO, which writes to MySQL and updates the cache (write-through). The 2013 TAO paper reports about a billion reads per second across thousands of servers, with an overall read hit rate of roughly 96%.

Trade-off: At extreme read-heaviness, the cache becomes the system and the database becomes a backup of the cache, not the other way around.

Twitter timeline cache

Twitter caches the most recent ~800 tweets per user's home timeline in Redis. On read, Redis is hit first; on miss, the timeline is recomputed by fan-out from the user's follow graph. New tweets are pushed (write-through) into the timeline caches of online followers, but for high-follower-count celebrities the system falls back to read-time merging to avoid a write fan-out storm.

Trade-off: Choose your write pattern based on the cardinality of fan-out.

Cloudflare CDN edge cache

When you visit a site behind Cloudflare, the response can be cached at the nearest edge POP and served to subsequent visitors without ever reaching the origin. Cache-Control and ETag headers tell the edge how long to keep an entry and how to revalidate. Origin shielding adds a tier-2 cache so multiple POPs share a refresh request, mitigating stampedes against the origin.

Trade-off: HTTP caching is a real, programmable cache that you should treat as the first line of defense.

Netflix EVCache

Netflix runs EVCache (a distributed Memcached) as a cache-aside layer in front of Cassandra. Each region has its own EVCache cluster, replicated across availability zones. Total: trillions of operations per day, ~30 ms p99 across regions.

Trade-off: Caches in microservice architectures are usually regional; the cost of cross-region cache invalidation often exceeds the benefit of consistency.

Quick Interview Phrases

Key terms to use in your answer

cache hit rate

cache-aside pattern

write-through vs write-back

TTL and eviction

cache stampede

stale-while-revalidate

Common Interview Questions

Questions you might be asked about this topic

Compare write-through, write-back, and write-around. When would you use each?

Write-through: every write goes to cache and database synchronously. Use for read-heavy data that changes occasionally (user profiles, configuration). Write-back: write to cache only, async flush to database. Use for write-heavy counters, analytics, leaderboards where the cache itself can be made durable. Write-around: writes skip the cache, only populated on read miss. Use for write-once data rarely re-read (logs, audit events). Mention the durability trade-off for write-back and the cold-cache penalty for write-around.

Explain cache-aside vs read-through. Which would you use and why?

Walk me through how you would size a Redis cluster for a 10K-QPS user-profile workload.

Your cache hit rate just dropped from 98% to 60% after a deploy. How do you debug it?

How do you handle cache invalidation when a user updates their profile?

Interview Tips

How to discuss this topic effectively

Always state the read pattern AND the write pattern in the same sentence: 'cache-aside reads, with the user record's key deleted on every write'. Saying both signals you have actually shipped a cached system.

Quote a hit rate target. 'I would aim for a hit rate above 95%; much lower and the extra hop eats most of the savings' is the kind of concrete number senior engineers state without hesitation.

Bring up cache stampede before the interviewer does. The moment you mention TTL, mention single-flight locking or stale-while-revalidate as the mitigation. Stampede is a favorite follow-up question.

Pick LRU as your default eviction policy and explain when you would switch to LFU (seasonal hot keys, e.g., a flash sale item). Naming the policy by acronym is a quick credibility win.

For any caching answer, end with 'and we would invalidate by deleting the key on write'. Invalidation is what separates a real design from a textbook answer.

Common Mistakes

Pitfalls to avoid in interviews

Treating cache and database writes as a single atomic operation

Write-through and cache invalidation are NOT atomic across the cache and the database. Always update the database first and then invalidate or update the cache. If you do it the other way, a failed database write leaves the cache holding fictional data.

Picking write-back for user-visible mutable data

Write-back optimizes for write throughput and accepts data loss if the cache crashes. For data the user will see immediately (profile updates, comments), use write-through or cache-aside with explicit invalidation so reads after writes are correct.

Setting a long TTL and forgetting about invalidation

TTL is a safety net, not a freshness strategy. Active invalidation on write keeps the cache correct; TTL only bounds how long an undetected stale entry can live. Combine both: short TTL for safety, deletes for correctness.

Caching everything by default

A low-hit-rate key wastes memory, and every miss pays an extra network round trip for nothing. Profile first; cache only the keys with concentrated read traffic. A cache with a 30% hit rate can easily be slower than no cache at all when the underlying query is fast.

Ignoring cache stampedes until they happen in production

The first time a popular cache entry expires under load, your database melts. Build single-flight locking or stale-while-revalidate into the cache layer from day one - it costs almost nothing to add and is painful to retrofit.
