Part XII — String Algorithms

Strings show up everywhere. They drive search engines, power DNA sequencing pipelines, fuel log parsing systems, and anchor the text editors you use every day. Yet many engineers treat strings as “solved” territory — slap on a .contains() call and move on. That mindset crumbles the moment an interviewer asks: “Find all occurrences of a pattern in a 10-million-character text. Now do it fast.”

Why String Algorithms Deserve Your Attention

Consider the simplest possible approach to pattern matching: slide a pattern of length m across a text of length n, comparing character by character. When a mismatch occurs, shift the pattern one position forward and restart the comparison. This naive strategy runs in O(n × m) time — and for inputs where n and m are both large (a genome sequence at 3 billion characters, for instance), that cost becomes catastrophic.

The core issue is redundant work. When the naive algorithm encounters a mismatch, it throws away everything it already learned about the characters it matched. It rewinds the text pointer and starts fresh. Every character in the text gets examined over and over.

Efficient string algorithms eliminate this waste. They extract structural information from the pattern before the search begins, then use that information to skip impossible alignments during matching. The result: algorithms that scan the text exactly once, achieving O(n + m) time regardless of input.

What This Part Covers

Knuth-Morris-Pratt (KMP) — the gold standard of pattern matching algorithms for interviews. KMP builds a failure function (also called a partial match table) that encodes the pattern’s internal structure. When a mismatch occurs, the failure function tells the algorithm exactly where to resume — no backtracking in the text, ever. You’ll build the failure function from scratch, implement the full matching algorithm, and understand why it guarantees O(n + m) performance.

Sieve of Eratosthenes — a shift from strings to numbers, but the algorithmic thinking carries over. The sieve finds all prime numbers up to n by systematically eliminating composites, achieving O(n log log n) time. You’ll implement the basic sieve, optimize it with a BitSet for memory efficiency, and extend it to a segmented sieve that handles ranges too large to fit in memory. Prime-related problems appear frequently in interviews disguised as counting, factorization, or number theory questions.

The Common Thread

Both KMP and the Sieve of Eratosthenes share a crucial design principle: precompute structure to avoid redundant work. KMP precomputes the failure function so the matcher never re-examines a character. The sieve precomputes primality so factorization becomes a table lookup. Recognizing this pattern — invest upfront to save repeatedly — is one of the most transferable skills you’ll develop for interviews.

Turn the page and start building these algorithms piece by piece.