Community Article

LRU Cache From Scratch in 50 Lines

A working LRU cache built from a hash map plus a doubly linked list, the part interview answers leave out (concurrency, eviction callbacks, byte budgets), and when LRU is the wrong eviction policy.

LRU Cache From Scratch in 50 Lines

A working LRU cache built from a hash map plus a doubly linked list, the part interview answers leave out (concurrency, eviction callbacks, byte budgets), and when LRU is the wrong eviction policy.

lru-cache
doubly-linked-list
hash-map
cache-invalidation
data-structures
vikramokoro

By @vikramokoro

January 30, 2026

·

Updated May 20, 2026

346 views

1

4.3 (14)

The LRU cache is the canonical "two data structures, one operation" problem. It shows up in interviews, in standard libraries, and in roughly every backend system that has ever cached anything. The interview version usually constrains you to O(1) get and O(1) put. The production version usually constrains you to "do not let the cache eat all the memory" and "do not corrupt anything under concurrent access". Both versions, at their core, are the same fifty lines of code. Let me actually write those fifty lines instead of waving my hands at them.

The contract

An LRU (Least Recently Used) cache holds at most capacity key-value pairs. When the cache is at capacity and a new key is inserted, the least recently used key is evicted. "Used" means accessed by get or written by put. The two operations are:

  • get(key): return the value, or a sentinel if the key is absent. Counts as a use.
  • put(key, value): insert or overwrite. Counts as a use. May trigger eviction.

Both must run in O(1) time. That last constraint is what forces the data structure choice.

Why the obvious answer is too slow

The first instinct is "use a hash map keyed by key, where the value is the cached value plus a timestamp". On put, when the cache is full, walk the map and evict the entry with the oldest timestamp. That works, but the eviction step is O(n) in the cache size, which means put is O(n) in the worst case. For a cache hit serving a CDN at 100K QPS, O(n) per write at n = 10000 is the difference between "fits in the budget" and "page on the on-call".

The second instinct is "use a sorted structure keyed by access time". A balanced BST gives O(log n) updates, which is closer but still not what we want. The interview asks for O(1), and there is a reason: it is achievable, and the structural insight is genuinely useful.

The trick is to combine two structures. A hash map gives you O(1) lookup from key to a node. A doubly-linked list gives you O(1) move-to-front and O(1) evict-tail. Together, every cache operation reduces to: hash-map lookup to find the node, then constant-cost list manipulation to update its position.

The implementation, in fifty lines

This is the version I have written four times across four different languages, and the structure is identical every time. I will use Python for clarity. The Java/JS versions are a near-mechanical translation.

class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.cache = {}                    # key -> _Node
        # Sentinels for head and tail simplify the list edge cases.
        self.head = _Node(None, None)      # most recently used end
        self.tail = _Node(None, None)      # least recently used end
        self.head.next = self.tail
        self.tail.prev = self.head

    def get(self, key):
        node = self.cache.get(key)
        if node is None:
            return None
        self._move_to_front(node)
        return node.value

    def put(self, key, value):
        node = self.cache.get(key)
        if node is not None:
            node.value = value
            self._move_to_front(node)
            return
        if len(self.cache) >= self.capacity:
            self._evict_lru()
        node = _Node(key, value)
        self.cache[key] = node
        self._add_to_front(node)

    # --- list helpers ---

    def _add_to_front(self, node):
        node.prev = self.head
        node.next = self.head.next
        self.head.next.prev = node
        self.head.next = node

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _move_to_front(self, node):
        self._remove(node)
        self._add_to_front(node)

    def _evict_lru(self):
        lru = self.tail.prev
        self._remove(lru)
        del self.cache[lru.key]

That is 51 lines, give or take, including the blank ones. Cut the comments and the __slots__ line and you are at 47. The implementation is small enough to recite from memory, and large enough that the moving parts have to be right.

Why every line is there

The two sentinels (head and tail) are the trick that keeps the list helpers small. Without them, every _remove would need to check whether the node is the first or last in the list and special-case the pointer updates. With them, every real node always has a non-null prev and next, so the helpers are five lines of straight pointer surgery.

__slots__ = ("key", "value", "prev", "next") is a micro-optimization that tells Python not to create a per-instance __dict__ for _Node. For a million-entry cache, that saves enough memory to matter. In other languages, a plain struct or class is already the equivalent.

self.cache.get(key) is the only hash lookup in either operation. Everything after it is constant-cost pointer work. That is what makes both get and put O(1).

The key on the node (node.key) duplicates the key in the hash map, and that duplication is necessary. When we evict the LRU node, we need to know which key to remove from the hash map. Without the key on the node, we would have to scan the hash map to find the entry pointing to the evicted node, which would drag eviction back to O(n).

A worked trace

Walk through put(1, "a"), put(2, "b"), put(3, "c"), get(1), put(4, "d") on a capacity-3 cache.

After put(1, "a"):
  HEAD <-> [1, "a"] <-> TAIL
  cache = {1: node1}

After put(2, "b"):
  HEAD <-> [2, "b"] <-> [1, "a"] <-> TAIL
  cache = {1: node1, 2: node2}

After put(3, "c"):
  HEAD <-> [3, "c"] <-> [2, "b"] <-> [1, "a"] <-> TAIL
  cache = {1: node1, 2: node2, 3: node3}

After get(1) returns "a":
  HEAD <-> [1, "a"] <-> [3, "c"] <-> [2, "b"] <-> TAIL
  cache = {1: node1, 2: node2, 3: node3}
  (node 1 moved to the front; 2 is now the LRU)

After put(4, "d") (cache full, evicts key 2):
  HEAD <-> [4, "d"] <-> [1, "a"] <-> [3, "c"] <-> TAIL
  cache = {1: node1, 3: node3, 4: node4}

Each operation touched at most a handful of pointers and one hash entry. Nothing scaled with the cache size. The wall-clock cost per call is dominated by the hash function, not the list manipulation.

What the interview answer leaves out

Three things that I would never ship the textbook implementation without.

Concurrency. The version above is not thread-safe. A second thread reading self.cache while another thread is in the middle of _move_to_front will see partially updated pointers. The standard fixes are: (1) wrap every operation in a single mutex (simplest, but turns into a contention hot spot under load), (2) shard the cache into N independent LRU instances by hash(key) % N and lock each shard separately (Caffeine's approach in Java), or (3) accept that strict LRU under concurrency is expensive and use an approximation like CLOCK or W-TinyLFU. I have shipped option 2 several times; option 3 once, in a CDN edge cache where the throughput requirements made strict LRU impossible.

Eviction without callback is a bug. When you evict an entry, the caller often needs to know. If the cached object holds a file handle, a database connection, or any external resource, eviction must release it. The textbook version above leaks every external resource the caller put in. Production versions take a callback or fire an event on eviction.

Capacity by bytes, not by count. "Cache up to 10,000 entries" is rarely the actual constraint; "cache up to 256 MB" is. Per-entry size estimation is the part that gets ugly: in any garbage-collected language, true memory usage of a value is hard to know without sampling or tagging. The hack I have shipped is "estimate per-entry size at insertion time, sum it, evict when the running sum exceeds the budget". The estimation is approximate, and the bound is approximate, but it is closer to the actual constraint than counting entries.

When LRU is the wrong eviction policy

LRU sounds universal, and people reach for it by reflex. It is not always right.

For a cache where access patterns are roughly uniform (every key has the same probability of being requested), random eviction is just as good as LRU and significantly cheaper to implement. Memcached has supported pure LRU forever, but the slab allocator that backs it does eviction by walking a list and approximating LRU rather than tracking it strictly.

For a cache where a small "hot set" is hit repeatedly and a long tail of one-shot requests would otherwise pollute the cache, LRU is bad. A scan of a million unique keys evicts the entire hot set even though it will not be requested again. The fix is LFU (Least Frequently Used) or W-TinyLFU, which weighs frequency in addition to recency. Caffeine, the JVM-world LRU library, defaults to W-TinyLFU for this exact reason.

For a cache with TTL semantics, LRU is incomplete. Items have an expiration time, and you want to evict expired items even if they are recent. The right structure is LRU plus a separate timer wheel or expiration heap, and the eviction loop becomes "expire first, then LRU".

Build it, profile it, then decide

The takeaway I want to leave with is action-shaped: write the fifty-line version once, in your language of choice, from scratch. Then measure it. The measurement is the part most engineers skip, and it is the part that decides whether the textbook LRU is what you actually want to ship. On a synthetic benchmark with uniform random keys, the implementation above does about 5 million operations per second per thread on commodity hardware in Python, and roughly 20 million per second in JavaScript or Java. That is fast enough for almost every application cache I have ever built. The cases where it is not fast enough are the cases where you should reach for a library (Caffeine on the JVM, lru-cache on Node, cachetools in Python), and at that point you want to know what their default eviction policy is, what their concurrency story is, and how they handle byte-budgeted capacity. The fifty lines are not the destination. They are the lens that lets you read every other LRU implementation and understand exactly what trade-offs each one has made. Spend an afternoon writing it, and the next time you need a cache, you will know whether to roll your own or pick the right library.

Back to Articles