That is twenty-something lines of Python, and it covers insert, exact lookup, and prefix lookup. It is enough to ship 80% of trie use cases that come up in real backend code, and the remaining 20% is small variations: storing a payload at end-of-word, reference-counting for delete, compressing single-child paths for memory, walking with a Levenshtein row for fuzzy match. Before reading that code carefully, look at the picture it draws.
Reading a trie: what the path actually means
A trie is a tree where each edge is labeled with one character, and a path from the root spells out a string. Insert four words, car, cart, cat, and dog, and the structure looks like this.
Each * marks a node where is_end is true: that path spells a complete stored word. The path to the unstarred r represents the prefix car, but car is also a stored word, so that node has its own *. The path continues past it to cart, also marked. cat shares the same prefix ca as car and cart, then branches at the t. dog lives in its own subtree because no other word starts with d.
The structural rule is that all words sharing a prefix share the same path from the root through that prefix. The cost of looking up a word of length k is O(k), regardless of how many words are in the trie. A hash map gets O(1) average for lookup but charges O(k) to hash the key first, and it cannot answer the question "give me everything starting with car" without scanning every key.
The end-of-word marker is the part the picture makes obvious. Skip it and the structure cannot tell car (a stored word) from car (a prefix of cart and not stored). Every interesting trie variant keeps that bit somewhere.
Hash sets cannot answer prefix questions
The reason to know this structure is not that it beats a hash set on lookup. It does not, on average. A hash set answers "is carpet in the set" in O(1) average; a trie answers it in O(k). For pure containment, the hash set is shorter to write, smaller in memory per entry, and faster.
The reason to know the trie is the questions a hash set cannot answer at all. Are there any words starting with car? A hash set has to scan every entry, which is O(n). A trie walks k edges and looks at the resulting node, which is O(k). Give me every word starting with car. Same answer: the trie walks to the node at the end of the prefix, then DFS the subtree, which is O(k + r) for r results. What is the longest stored prefix of carpenter? The trie walks character by character, remembering the last is_end it passed; the hash set has to enumerate prefixes and check each one.
There is also a memory story, and it cuts both ways. A million 8-character random hex strings have no shared structure, and a trie pays per-node overhead that the hash set does not. Memory loses. A million URL paths, dotted-namespace flag keys, or English words have heavy prefix sharing, and the trie can be smaller than the underlying string list because shared prefixes are stored exactly once. Memory wins, sometimes by a factor of three or four.
The decision rule that has worked for me: if the only operation is exact containment, use a hash set. If the operation list ever includes the word "prefix", reach for the trie. That rule has not pointed wrong yet.
Where prefix matching beats the alternatives
Three concrete shapes where the trie is the answer, in roughly the order of how often they show up in real backend code.
Hierarchical key routing. A feature flag service that grows to a few thousand distinct flag-key patterns with hierarchical namespaces like payments.fraud.us.high-risk-merchants needs to answer "for an incoming key, find the most specific stored pattern that is a prefix of it, return its rules". A list-of-strings loop on roughly 8000 patterns is about 12ms p95, which is bearable for a config tool and unbearable on every API call. A trie keyed on dot-separated tokens (one token per edge, not one character) brings the same query to roughly 0.4ms p95. The structural rewrite pays for itself in a quarter. URL routers in HTTP frameworks do the same thing with path segments instead of dotted tokens; the data structure is identical.
Autocomplete with a top-K precomputed at every node. A search box that fires on every keystroke needs suggestions in well under 100ms total round-trip, which leaves a server roughly 10ms to find the top suggestions. A trie keyed on product name, with a small heap of "top suggestions" stored at each node and updated at insert time, lets the lookup walk be O(k) for the typed prefix length plus a constant for reading the precomputed top-K. The whole catalog fits in memory; per-keystroke server cost stays flat as the catalog grows. The trick is the precomputation: a vanilla trie scan would be O(k + r) where r is the number of words under the prefix, and on a popular short prefix that is too many words.
Fuzzy matching, which is where the trie is irreplaceable. Pure containment is easy, but suggesting fixes ("did you mean: receive?") needs Levenshtein-distance walking. The standard approach maintains a row of edit-distance values per node during a DFS, pruning subtrees whose minimum possible distance already exceeds the threshold. The trie is essential here because subtree pruning eliminates millions of dictionary words with a single comparison; a linear scan over a 100K-word list is millions of distance computations, while the trie walk with pruning is usually under 10K. This is the only way to get sub-100ms typo-tolerant suggestions on a real dictionary in pure Python without dropping to a native extension.
There is also a more exotic case: IP routing. The Linux kernel routes packets through an LC-trie (level-compressed trie), not a plain trie or a Patricia trie, since version 2.6.13. The same prefix-matching insight powers it: for an incoming destination address, find the longest stored network prefix that matches. Naming the structure differently does not change the shape; it is still a tree where the path encodes the key.
Three trie bugs I have shipped
The end-of-word marker is mandatory, and the bug is silent when you forget. A feature flag rule for payments.fraud was matching payments.fraud_audit in production for two days before anyone noticed, because the token-trie at that level had no is_end check and treated every walked node as a stored pattern. The fix was three lines. The bug is the kind code review will not catch unless reviewers already know to look for it, because a trie that does not distinguish "the prefix ends here" from "the path continues" still passes most unit tests.
Delete leaks memory unless the cleanup walks back up. Removing a word means clearing its is_end and then deleting any nodes on its path that have no children and no is_end. Skip the cleanup pass and the trie grows monotonically forever. One production trie I shipped leaked memory for two months because the delete path was implemented as just node.is_end = False. The catch in code review: any delete that is shorter than insert, in a tree structure, deserves a second look. Reference counting is the alternative, and it is also more code.
Concurrent insert is a footgun on a tree of nested dicts. Two threads that both observe a missing child for the same character can both create a new node and clobber each other's writes. The atomic-dict-op intuition that works for a hash map does not transfer; the trie has nested mutation that the language runtime does not protect. The fix is a lock around the whole insert path, a copy-on-write trie that swaps a new root atomically, or rebuild-on-update if the rule set changes infrequently. The flag-routing trie described earlier was rebuilt from a config push roughly every hour, which made rebuild-on-update the cleanest answer for that workload.
Where the trie loses to a hash set
Reaching for the trie is over-engineering in three cases worth naming.
When the keys are random or near-random (UUIDs, hex digests, hashed identifiers), there is no prefix structure to exploit; a trie pays per-node overhead for nothing and loses on both memory and speed compared to a hash set. When the only operation is exact containment and prefix queries are not on the roadmap, the hash set is shorter and faster. When the dataset is small (under a few thousand keys), the constant factor of trie traversal can be larger than the cost of just looping a sorted list, especially in dynamic languages where dict access is already heavily optimized.
The recognition rule is the inverse of the earlier one. If "prefix" never appears in the requirement, use a hash set. If "longest matching prefix" or "all things starting with" appears, the trie is the right shape, and the simple twenty-line version (no Patricia compression, no LC-trie tricks) handles the vast majority of catalog sizes that show up in real backend work. The compressed variants exist for the cases where the simple version's memory is genuinely too high, and that bar is higher than people assume. Crossing it requires measuring first; reaching for it by reflex is how a routing layer ends up taking three weeks to write instead of three days, with no measurable performance gain at the end.
