Algorithms
Pattern Matching Algorithms
Difficulty: Advanced
Boyer-Moore is the algorithm grep actually uses, and it has a counterintuitive property: longer patterns make it faster, not slower, because the bad-character rule lets it skip whole chunks of the text without inspecting them. The best...
Pattern Matching Algorithms
Boyer-Moore is the algorithm grep actually uses, and it has a counterintuitive property: longer patterns make it faster, not slower, because the bad-character rule lets it skip whole chunks of the text without inspecting them. The best case is O(n / m), sublinear in the text length. The same insight (start from the right, jump aggressively on mismatch) defines a different family of pattern-matching algorithms from the linear-scan family of KMP and Z.
Pattern Matching Algorithms rounds out the string-search toolkit. You will implement Boyer-Moore with both the bad-character rule and the good-suffix rule, build a deterministic finite automaton (DFA) from a pattern and use the resulting state-transition table for O(n) matching after O(m * |alphabet|) preprocessing, and extend pattern search into two dimensions row by row. The lesson closes with a head-to-head comparison of every string-matching algorithm you have seen so far (KMP, Z, Rabin-Karp, Manacher, Aho-Corasick, Boyer-Moore, DFA), so you can pick the right tool for any matching task: long patterns, multi-pattern, streaming, or 2D.
In String Algorithms, you mastered the failure-function family of linear-time matchers. This lesson adds the right-to-left family and the automaton-construction family.
From here, Advanced Greedy & Data Structures turns to monotonic stacks.
Prerequisites:
String AlgorithmsTopics:
What You'll Learn
By the end of this lesson, you will be able to:
Implement Boyer-Moore's bad character rule and explain how it achieves sublinear O(n/m) best-case matching.
Describe the good suffix rule and explain how it complements the bad character rule in Boyer-Moore.
Construct a DFA (deterministic finite automaton) for a given pattern and explain how it matches text in O(n) after O(m * |alphabet|) preprocessing.
Explain two-dimensional pattern matching using row-by-row 1D matching as a building block.
Compare Boyer-Moore, KMP, Z-algorithm, Rabin-Karp, and DFA-based matching across key criteria.
Why This Matters
01
Boyer-Moore is the fastest practical string search algorithm — its bad character rule skips large portions of text, achieving O(n/m) best case, which is why most text editors use it.
02
Finite automata formalize pattern matching as state machines, providing the theoretical foundation for regex engines and lexical analyzers in compilers.
03
Understanding the full spectrum of pattern matching algorithms lets you make informed choices: Boyer-Moore for long patterns, KMP for streaming, Rabin-Karp for multi-pattern.
04
Two-dimensional pattern matching extends 1D techniques to image processing and matrix search, demonstrating how to compose algorithms for higher-dimensional problems.
Key Terms
6 terms you'll encounter in this lesson
Bad Character Rule
In Boyer-Moore, when a mismatch occurs at text[i] and pattern[j], shift the pattern so the rightmost occurrence of text[i] in the pattern aligns with position i. If text[i] doesn't appear in the pattern, shift past the entire mismatch point.
Good Suffix Rule
In Boyer-Moore, when a mismatch occurs after matching a suffix S of the pattern, shift the pattern to align the next occurrence of S in the pattern (or a prefix matching a suffix of S) with the matched text.
DFA (Deterministic Finite Automaton)
A state machine with a fixed number of states, where each state has exactly one transition for each input character. For pattern matching, the DFA has m+1 states and accepts when reaching the final state.
State Transition Table
A 2D table delta[state][char] that defines the next state for each (state, character) pair in a DFA. For pattern matching, it encodes where to continue matching after seeing each possible character.
Sublinear Search
A search algorithm that examines fewer than n characters in the best case. Boyer-Moore achieves O(n/m) by skipping characters that cannot start a match, which is sublinear when m is large relative to n.
Two-Dimensional Pattern Matching
Finding all occurrences of a small matrix (pattern) within a larger matrix (text). Typically solved by matching rows individually with a 1D algorithm, then combining row matches across columns.
Heads Up
Common misconceptions to watch out for
"Boyer-Moore is always faster than KMP"
Boyer-Moore is faster in practice for typical text (natural language, long patterns) due to its sublinear best case. However, its worst case is O(nm) without the good suffix rule, while KMP guarantees O(n+m). For short patterns or adversarial inputs, KMP can be better.
"The bad character rule alone makes Boyer-Moore optimal"
The bad character rule alone gives O(nm) worst case. The good suffix rule is needed to guarantee O(n+m) worst case (though the full good suffix implementation is complex). In practice, the bad character rule handles most cases well.
"DFA-based matching is impractical due to memory"
A DFA for pattern matching uses O(m * |alphabet|) space, which is fine for small alphabets (DNA: 4 characters) or short patterns. For large alphabets, the bad character table (O(|alphabet|)) or KMP's failure function (O(m)) use less space.
