String Matching
string-matching
Data Structures
Suffix Array / Suffix Tree
Genome alignment tools like BWA and Bowtie find a 100-character read inside a 3-billion-base human reference in milliseconds, not by scanning the genome but by querying an index built once over all of its suffixes. **Suffix Array / Suffix Tree** is the family of data structures behind that workflow, and the same machinery powers full-text search, plagiarism detection, and the BWT-based compression in bzip2. This lesson covers the suffix array (a sorted array of starting positions of every suffix), the binary-search pattern match in `O(m log n)` for a pattern of length `m`, the LCP (Longest Common Prefix) array that captures shared prefixes between adjacent sorted suffixes, and the suite of substring problems the pair solves: longest repeated substring, number of distinct substrings, and longest common substring. You will also meet the suffix tree, a compressed trie of all suffixes that pushes pattern matching down to `O(m)` independent of text length, and weigh the suffix array's smaller memory footprint against the suffix tree's faster query. In **Arrays & Strings**, sorting and binary search worked over numbers; here the same primitives extend to strings, with a comparator that compares characters position by position. **Trie (Prefix Tree)** introduced the prefix-matching tree shape, and a suffix tree is a compressed trie built from every suffix of the input. Next, the curriculum explores data structures organized around a different axis entirely: preserving every historical version of themselves through time.
Not Started
0%
Algorithms
Pattern Matching Algorithms
Boyer-Moore is the algorithm `grep` actually uses, and it has a counterintuitive property: longer patterns make it _faster_, not slower, because the bad-character rule lets it skip whole chunks of the text without inspecting them. The best case is `O(n / m)`, sublinear in the text length. The same insight (start from the right, jump aggressively on mismatch) defines a different family of pattern-matching algorithms from the linear-scan family of KMP and Z. **Pattern Matching Algorithms** rounds out the string-search toolkit. You will implement Boyer-Moore with both the bad-character rule and the good-suffix rule, build a deterministic finite automaton (DFA) from a pattern and use the resulting state-transition table for `O(n)` matching after `O(m * |alphabet|)` preprocessing, and extend pattern search into two dimensions row by row. The lesson closes with a head-to-head comparison of every string-matching algorithm you have seen so far (KMP, Z, Rabin-Karp, Manacher, Aho-Corasick, Boyer-Moore, DFA), so you can pick the right tool for any matching task: long patterns, multi-pattern, streaming, or 2D. In **String Algorithms**, you mastered the failure-function family of linear-time matchers. This lesson adds the right-to-left family and the automaton-construction family. From here, **Advanced Greedy & Data Structures** turns to monotonic stacks.
Not Started
0%
String Algorithms
Naive substring search compares the pattern against every position in the text, restarting from scratch on each mismatch, for `O(n * m)` worst-case time. KMP, by precomputing where to resume after a mismatch, never restarts. The text pointer moves forward strictly monotonically, the pattern pointer is corrected by a failure function, and the whole search runs in `O(n + m)`. That single insight defines a family of linear-time string algorithms that power editors, search engines, and bioinformatics pipelines. **String Algorithms** covers that family. KMP teaches you to build the failure function (prefix function) and use it to skip redundant comparisons. The Z algorithm offers an alternative `O(n)` framework based on the Z-array. Rabin-Karp introduces rolling hashes for multi-pattern search and is the backbone of plagiarism detection. Manacher's algorithm finds the longest palindromic substring in linear time using a clever transform with separators. Aho-Corasick generalizes KMP to multiple patterns simultaneously by adding failure links to a trie, producing a finite automaton that scans the text once. In **Arrays & Strings**, you treated strings as arrays of characters with `O(1)` indexed access. **Hash Map (Dictionary) Basics** gave you the `O(1)` lookup used by Rabin-Karp. This lesson layers algorithmic structure on top of those primitives. Next, **Pattern Matching Algorithms** extends the toolkit with Boyer-Moore, DFA-based matching, and 2D pattern search.
Not Started
0%
Question Banks
String Pattern Matching
Six deeper prompts on KMP failure functions, Rabin-Karp rolling hashes, and the naive matcher's worst case. Compares preprocessing vs search cost across the three approaches.
Community
Tries: The Data Structure I Keep Rediscovering
When a hash set is wrong and a prefix tree is right: autocomplete, namespace routing, and fuzzy spell-check, with the memory and concurrency traps I keep falling into.
KMP and Rabin-Karp Without the Pain
The failure function explained as a memo, rolling hash explained as a sliding window, and the test-case patterns where each algorithm earns its keep.
