Regular Expression Matching
SummaryBuilds from recursive brute force to memoized top-down...
Builds from recursive brute force to memoized top-down...
Builds from recursive brute force to memoized top-down and bottom-up DP solutions, handling '.' (any single character) and '*' (zero or more of preceding) patterns.
Problem Statement
Given a string s and a pattern p, implement regular expression matching with support for two special characters:
'.'— matches any single character'*'— matches zero or more of the preceding element
The match must cover the entire input string, not just a partial substring.
Examples:
s | p | Result | Explanation |
|---|---|---|---|
"aa" | "a" | false | "a" does not match the entire string "aa" |
"aa" | "a*" | true | '*' means zero or more of 'a', so "aa" matches |
"ab" | ".*" | true | ".*" means zero or more of any character |
"aab" | "c*a*b" | true | "c*" matches zero 'c's, "a*" matches two 'a's, "b" matches 'b' |
"mississippi" | "mis*is*p*." | false | The final "p*." can only match one 'p' plus one char, but we need "ppi" |
Constraints:
1 <= s.length <= 201 <= p.length <= 20pcontains only lowercase letters,'.', and'*'- Each
'*'is guaranteed to have a valid preceding character
Understanding the Rules
The '*' operator is the crux of this problem. Unlike shell globbing where * means “match anything,” here '*' is a quantifier that modifies the character immediately before it:
"a*"→ matches"","a","aa","aaa", … (zero or more'a's)".*"→ matches"","x","xy","xyz", … (zero or more of any character)"ab*"→ matches"a","ab","abb","abbb", …
A '*' can never appear as the first character of a pattern. It always pairs with its predecessor to form a unit. When processing the pattern, we should think in terms of these units:
Pattern "c*a*b" breaks down as:
[c*] [a*] [b]
↓ ↓ ↓
unit unit unit
At each decision point, we ask: does the current pattern unit match the current character in the string? If the unit contains '*', we have a choice: consume a character or skip the unit entirely.
Approach 1: Recursive Brute Force — O(2^(m+n))
The recursive approach mirrors how we naturally reason about pattern matching. At each step we look at the current position in both the string and the pattern and decide what to do.
Key observations:
- If the next character in the pattern is
'*', we have two choices: skip the(char, *)pair entirely (zero occurrences), or if the current characters match, consume one character fromsand keep the pattern in place (allowing more matches). - If the next character is NOT
'*', the current characters must match (either same letter or'.'), and we advance both pointers by one.
boolean isMatch(String s, String p) {
return match(s, p, 0, 0);
}
boolean match(String s, String p, int i, int j) {
// Base case: pattern exhausted
if (j == p.length()) {
return i == s.length();
}
// Check if current characters match
boolean firstMatch = i < s.length()
&& (s.charAt(i) == p.charAt(j) || p.charAt(j) == '.');
// If next pattern char is '*', we have a choice
if (j + 1 < p.length() && p.charAt(j + 1) == '*') {
// Option 1: skip the "x*" pair (zero occurrences)
// Option 2: consume one char (if first chars match) and keep pattern
return match(s, p, i, j + 2)
|| (firstMatch && match(s, p, i + 1, j));
}
// No '*' ahead: first chars must match, advance both
return firstMatch && match(s, p, i + 1, j + 1);
}
Why O(2^(m+n))? Each '*' in the pattern creates a branching point. In the worst case (e.g., s = "aaaa...", p = "a*a*a*a*..."), the recursion tree branches exponentially because each '*' can consume varying numbers of characters.
Approach 2: Top-Down DP with Memoization — O(m×n)
The recursive solution recomputes the same (i, j) subproblems many times. For example, matching positions (3, 4) might be reached through different paths of consuming or skipping characters. Memoization eliminates this redundancy.
We use a HashMap with a record key to cache results:
boolean isMatch(String s, String p) {
record Key(int i, int j) {}
var memo = new HashMap<Key, Boolean>();
return match(s, p, 0, 0, memo);
}
boolean match(String s, String p, int i, int j, Map<?, Boolean> memo) {
var key = new Key(i, j); // Key is the record declared above
if (memo.containsKey(key)) {
return memo.get(key);
}
boolean result;
if (j == p.length()) {
result = i == s.length();
} else {
boolean firstMatch = i < s.length()
&& (s.charAt(i) == p.charAt(j) || p.charAt(j) == '.');
if (j + 1 < p.length() && p.charAt(j + 1) == '*') {
result = match(s, p, i, j + 2, memo)
|| (firstMatch && match(s, p, i + 1, j, memo));
} else {
result = firstMatch && match(s, p, i + 1, j + 1, memo);
}
}
memo.put(key, result);
return result;
}
Complexity: There are at most m × n unique (i, j) states, where m = s.length() and n = p.length(). Each state is computed once in O(1) work, giving us O(m×n) time and O(m×n) space.
Note: Java’s record keyword (stable since Java 16) gives us a clean, immutable key with automatic equals() and hashCode() — perfect for memoization keys.
Approach 3: Bottom-Up DP — O(m×n) time, O(m×n) space
Instead of recursion, we fill a 2D table iteratively. Define:
dp[i][j]=trueifs[0..i-1]matchesp[0..j-1]
We use 1-based indexing so that dp[0][0] represents the empty string matching the empty pattern.
Base Cases
dp[0][0] = true— empty string matches empty patterndp[0][j]— empty string can match patterns like"a*b*c*"where every element is consumed zero times. Specifically,dp[0][j] = trueifp[j-1] == '*'anddp[0][j-2] == true
Transitions
For each cell dp[i][j]:
-
If
p[j-1]is a letter or'.':- Characters match if
p[j-1] == s[i-1]orp[j-1] == '.' - Then
dp[i][j] = dp[i-1][j-1]
- Characters match if
-
If
p[j-1] == '*':- Zero occurrences: skip the
(char, *)pair →dp[i][j] = dp[i][j-2] - One+ occurrences: if
p[j-2] == s[i-1]orp[j-2] == '.', thendp[i][j] = dp[i-1][j] - Combine with OR:
dp[i][j] = dp[i][j-2] || (charsMatch && dp[i-1][j])
- Zero occurrences: skip the
boolean isMatch(String s, String p) {
int m = s.length(), n = p.length();
boolean[][] dp = new boolean[m + 1][n + 1];
// Base case: empty matches empty
dp[0][0] = true;
// Base case: patterns like "a*", "a*b*", "a*b*c*" can match empty string
for (int j = 1; j <= n; j++) {
if (p.charAt(j - 1) == '*') {
dp[0][j] = dp[0][j - 2];
}
}
// Fill the DP table
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
char sc = s.charAt(i - 1);
char pc = p.charAt(j - 1);
if (pc == '.' || pc == sc) {
// Direct character match
dp[i][j] = dp[i - 1][j - 1];
} else if (pc == '*') {
char preceding = p.charAt(j - 2);
// Zero occurrences of preceding char
dp[i][j] = dp[i][j - 2];
// One or more occurrences (if preceding matches current)
if (preceding == '.' || preceding == sc) {
dp[i][j] = dp[i][j] || dp[i - 1][j];
}
}
// else: mismatch, dp[i][j] stays false
}
}
return dp[m][n];
}
ASCII Visualization of DP Table
For s = "aab", p = "c*a*b":
"" c * a * b
"" [ T ] [ F ] [ T ] [ F ] [ T ] [ F ]
a [ F ] [ F ] [ F ] [ T ] [ T ] [ F ]
a [ F ] [ F ] [ F ] [ F ] [ T ] [ F ]
b [ F ] [ F ] [ F ] [ F ] [ F ] [ T ]
The answer is dp[3][5] = true.
Dry Run
Let’s trace through s = "aab", p = "c*a*b" step by step.
Initialization: dp[0][0] = true
Row 0 (empty string):
j=1:p[0]='c'→ not'*', skipj=2:p[1]='*'→dp[0][2] = dp[0][0] = true(skip"c*")j=3:p[2]='a'→ not'*', skipj=4:p[3]='*'→dp[0][4] = dp[0][2] = true(skip"a*"too)j=5:p[4]='b'→ not'*', skip
Row 1 (s[0]='a'):
j=1:'a'vs'c'→ mismatch →falsej=2:p[1]='*', preceding='c'. Zero occ:dp[1][0]=false. Char match?'c'≠'a'→falsej=3:'a'vs'a'→ match →dp[1][3] = dp[0][2] = truej=4:p[3]='*', preceding='a'. Zero occ:dp[1][2]=false. Char match?'a'=='a'→dp[0][4]=true→dp[1][4] = truej=5:'a'vs'b'→ mismatch →false
Row 2 (s[1]='a'):
j=4:p[3]='*', preceding='a'. Zero occ:dp[2][2]=false. Char match?'a'=='a'→dp[1][4]=true→dp[2][4] = true
Row 3 (s[2]='b'):
j=5:'b'vs'b'→ match →dp[3][5] = dp[2][4] = true✓
Space Optimization
Since each row only depends on the current row and the previous row, we can reduce space from O(m×n) to O(n) using two 1D arrays:
boolean isMatch(String s, String p) {
int m = s.length(), n = p.length();
boolean[] prev = new boolean[n + 1];
boolean[] curr = new boolean[n + 1];
// Base case
prev[0] = true;
for (int j = 1; j <= n; j++) {
if (p.charAt(j - 1) == '*') {
prev[j] = prev[j - 2];
}
}
for (int i = 1; i <= m; i++) {
curr[0] = false;
for (int j = 1; j <= n; j++) {
char sc = s.charAt(i - 1);
char pc = p.charAt(j - 1);
if (pc == '.' || pc == sc) {
curr[j] = prev[j - 1];
} else if (pc == '*') {
char preceding = p.charAt(j - 2);
curr[j] = curr[j - 2];
if (preceding == '.' || preceding == sc) {
curr[j] = curr[j] || prev[j];
}
} else {
curr[j] = false;
}
}
// Swap rows
boolean[] temp = prev;
prev = curr;
curr = temp;
}
return prev[n];
}
Complexity Comparison
| Approach | Time | Space | Notes |
|---|---|---|---|
| Recursive Brute Force | O(2^(m+n)) | O(m+n) stack | Exponential branching at each '*' |
| Top-Down Memoization | O(m×n) | O(m×n) | HashMap + recursion stack |
| Bottom-Up DP | O(m×n) | O(m×n) | Iterative, no stack overhead |
| Space-Optimized DP | O(m×n) | O(n) | Two rolling arrays |
Where m = s.length() and n = p.length().
Edge Cases
- Empty string with
'*'pattern:s = "",p = "a*b*"→true(every'*'unit takes zero occurrences) - Consecutive wildcards:
s = "abcdef",p = ".*.*"→true(first".*"can consume everything, second takes zero) - Single character:
s = "a",p = "."→true;s = "a",p = "b"→false - All wildcards:
s = "anything",p = ".*"→true - Tricky false case:
s = "mississippi",p = "mis*is*p*."→false(the"p*."unit cannot cover"ppi") - Pattern longer than string:
s = "a",p = "a*a*a*"→true(all'*'units take zero except one)
Interviewer Tips
Start by clarifying the '*' rule. Many candidates confuse this with glob matching where * independently means “any sequence.” Here, '*' is a quantifier for the preceding character. State this explicitly before coding.
Draw the DP table. Interviewers respond well to candidates who visualize their approach. Sketch a small table on the whiteboard and fill in a few cells to demonstrate understanding before writing code.
Explain the two choices at '*'. The entire algorithm hinges on this branching logic:
- Zero occurrences → skip two positions in the pattern
- One or more occurrences → consume one character, keep the pattern position
Mention the recursive solution first even if you plan to write the DP version. It shows you understand the problem structure before optimizing.
Common follow-up: Wildcard Matching (LeetCode 44). In that variant, '?' matches any single character (like '.' here) and '*' matches any sequence of characters independently (not tied to a preceding element). The DP recurrence changes because '*' in wildcard matching doesn’t need a preceding character — it acts as a standalone “match anything” unit.