Regular Expression Matching - cracking the tech interview system design and algorithms in java 25 • Dev|Journal

Problem Statement

Given a string s and a pattern p, implement regular expression matching with support for two special characters:

'.' — matches any single character
'*' — matches zero or more of the preceding element

The match must cover the entire input string, not just a partial substring.

Examples:

`s`	`p`	Result	Explanation
`"aa"`	`"a"`	`false`	`"a"` does not match the entire string `"aa"`
`"aa"`	`"a*"`	`true`	`'*'` means zero or more of `'a'`, so `"aa"` matches
`"ab"`	`".*"`	`true`	`".*"` means zero or more of any character
`"aab"`	`"cab"`	`true`	`"c"` matches zero `'c'`s, `"a"` matches two `'a'`s, `"b"` matches `'b'`
`"mississippi"`	`"misisp*."`	`false`	The final `"p*."` can only match one `'p'` plus one char, but we need `"ppi"`

Constraints:

1 <= s.length <= 20
1 <= p.length <= 20
p contains only lowercase letters, '.', and '*'
Each '*' is guaranteed to have a valid preceding character

Understanding the Rules

The '*' operator is the crux of this problem. Unlike shell globbing where * means “match anything,” here '*' is a quantifier that modifies the character immediately before it:

"a*" → matches "", "a", "aa", "aaa", … (zero or more 'a's)
".*" → matches "", "x", "xy", "xyz", … (zero or more of any character)
"ab*" → matches "a", "ab", "abb", "abbb", …

A '*' can never appear as the first character of a pattern. It always pairs with its predecessor to form a unit. When processing the pattern, we should think in terms of these units:

Pattern "c*a*b" breaks down as:
  [c*]  [a*]  [b]
   ↓      ↓     ↓
  unit   unit  unit

At each decision point, we ask: does the current pattern unit match the current character in the string? If the unit contains '*', we have a choice: consume a character or skip the unit entirely.

Approach 1: Recursive Brute Force — O(2^(m+n))

The recursive approach mirrors how we naturally reason about pattern matching. At each step we look at the current position in both the string and the pattern and decide what to do.

Key observations:

If the next character in the pattern is '*', we have two choices: skip the (char, *) pair entirely (zero occurrences), or if the current characters match, consume one character from s and keep the pattern in place (allowing more matches).
If the next character is NOT '*', the current characters must match (either same letter or '.'), and we advance both pointers by one.

boolean isMatch(String s, String p) {
    return match(s, p, 0, 0);
}

boolean match(String s, String p, int i, int j) {
    // Base case: pattern exhausted
    if (j == p.length()) {
        return i == s.length();
    }

    // Check if current characters match
    boolean firstMatch = i < s.length()
            && (s.charAt(i) == p.charAt(j) || p.charAt(j) == '.');

    // If next pattern char is '*', we have a choice
    if (j + 1 < p.length() && p.charAt(j + 1) == '*') {
        // Option 1: skip the "x*" pair (zero occurrences)
        // Option 2: consume one char (if first chars match) and keep pattern
        return match(s, p, i, j + 2)
                || (firstMatch && match(s, p, i + 1, j));
    }

    // No '*' ahead: first chars must match, advance both
    return firstMatch && match(s, p, i + 1, j + 1);
}

Why O(2^(m+n))? Each '*' in the pattern creates a branching point. In the worst case (e.g., s = "aaaa...", p = "a*a*a*a*..."), the recursion tree branches exponentially because each '*' can consume varying numbers of characters.

Approach 2: Top-Down DP with Memoization — O(m×n)

The recursive solution recomputes the same (i, j) subproblems many times. For example, matching positions (3, 4) might be reached through different paths of consuming or skipping characters. Memoization eliminates this redundancy.

We use a HashMap with a record key to cache results:

boolean isMatch(String s, String p) {
    record Key(int i, int j) {}
    var memo = new HashMap<Key, Boolean>();
    return match(s, p, 0, 0, memo);
}

boolean match(String s, String p, int i, int j, Map<?, Boolean> memo) {
    var key = new Key(i, j);  // Key is the record declared above
    if (memo.containsKey(key)) {
        return memo.get(key);
    }

    boolean result;
    if (j == p.length()) {
        result = i == s.length();
    } else {
        boolean firstMatch = i < s.length()
                && (s.charAt(i) == p.charAt(j) || p.charAt(j) == '.');

        if (j + 1 < p.length() && p.charAt(j + 1) == '*') {
            result = match(s, p, i, j + 2, memo)
                    || (firstMatch && match(s, p, i + 1, j, memo));
        } else {
            result = firstMatch && match(s, p, i + 1, j + 1, memo);
        }
    }

    memo.put(key, result);
    return result;
}

Complexity: There are at most m × n unique (i, j) states, where m = s.length() and n = p.length(). Each state is computed once in O(1) work, giving us O(m×n) time and O(m×n) space.

Note: Java’s record keyword (stable since Java 16) gives us a clean, immutable key with automatic equals() and hashCode() — perfect for memoization keys.

Approach 3: Bottom-Up DP — O(m×n) time, O(m×n) space

Instead of recursion, we fill a 2D table iteratively. Define:

dp[i][j] = true if s[0..i-1] matches p[0..j-1]

We use 1-based indexing so that dp[0][0] represents the empty string matching the empty pattern.

Base Cases

dp[0][0] = true — empty string matches empty pattern
dp[0][j] — empty string can match patterns like "a*b*c*" where every element is consumed zero times. Specifically, dp[0][j] = true if p[j-1] == '*' and dp[0][j-2] == true

Transitions

For each cell dp[i][j]:

If p[j-1] is a letter or '.':
- Characters match if p[j-1] == s[i-1] or p[j-1] == '.'
- Then dp[i][j] = dp[i-1][j-1]
If p[j-1] == '*':
- Zero occurrences: skip the (char, *) pair → dp[i][j] = dp[i][j-2]
- One+ occurrences: if p[j-2] == s[i-1] or p[j-2] == '.', then dp[i][j] = dp[i-1][j]
- Combine with OR: dp[i][j] = dp[i][j-2] || (charsMatch && dp[i-1][j])

boolean isMatch(String s, String p) {
    int m = s.length(), n = p.length();
    boolean[][] dp = new boolean[m + 1][n + 1];

    // Base case: empty matches empty
    dp[0][0] = true;

    // Base case: patterns like "a*", "a*b*", "a*b*c*" can match empty string
    for (int j = 1; j <= n; j++) {
        if (p.charAt(j - 1) == '*') {
            dp[0][j] = dp[0][j - 2];
        }
    }

    // Fill the DP table
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            char sc = s.charAt(i - 1);
            char pc = p.charAt(j - 1);

            if (pc == '.' || pc == sc) {
                // Direct character match
                dp[i][j] = dp[i - 1][j - 1];
            } else if (pc == '*') {
                char preceding = p.charAt(j - 2);

                // Zero occurrences of preceding char
                dp[i][j] = dp[i][j - 2];

                // One or more occurrences (if preceding matches current)
                if (preceding == '.' || preceding == sc) {
                    dp[i][j] = dp[i][j] || dp[i - 1][j];
                }
            }
            // else: mismatch, dp[i][j] stays false
        }
    }

    return dp[m][n];
}

ASCII Visualization of DP Table

For s = "aab", p = "c*a*b":

        ""    c     *     a     *     b
  ""  [ T ] [ F ] [ T ] [ F ] [ T ] [ F ]
   a  [ F ] [ F ] [ F ] [ T ] [ T ] [ F ]
   a  [ F ] [ F ] [ F ] [ F ] [ T ] [ F ]
   b  [ F ] [ F ] [ F ] [ F ] [ F ] [ T ]

The answer is dp[3][5] = true.

Dry Run

Let’s trace through s = "aab", p = "c*a*b" step by step.

Initialization: dp[0][0] = true

Row 0 (empty string):

j=1: p[0]='c' → not '*', skip
j=2: p[1]='*' → dp[0][2] = dp[0][0] = true (skip "c*")
j=3: p[2]='a' → not '*', skip
j=4: p[3]='*' → dp[0][4] = dp[0][2] = true (skip "a*" too)
j=5: p[4]='b' → not '*', skip

Row 1 (s[0]='a'):

j=1: 'a' vs 'c' → mismatch → false
j=2: p[1]='*', preceding='c'. Zero occ: dp[1][0]=false. Char match? 'c'≠'a' → false
j=3: 'a' vs 'a' → match → dp[1][3] = dp[0][2] = true
j=4: p[3]='*', preceding='a'. Zero occ: dp[1][2]=false. Char match? 'a'=='a' → dp[0][4]=true → dp[1][4] = true
j=5: 'a' vs 'b' → mismatch → false

Row 2 (s[1]='a'):

j=4: p[3]='*', preceding='a'. Zero occ: dp[2][2]=false. Char match? 'a'=='a' → dp[1][4]=true → dp[2][4] = true

Row 3 (s[2]='b'):

j=5: 'b' vs 'b' → match → dp[3][5] = dp[2][4] = true ✓

Space Optimization

Since each row only depends on the current row and the previous row, we can reduce space from O(m×n) to O(n) using two 1D arrays:

boolean isMatch(String s, String p) {
    int m = s.length(), n = p.length();
    boolean[] prev = new boolean[n + 1];
    boolean[] curr = new boolean[n + 1];

    // Base case
    prev[0] = true;
    for (int j = 1; j <= n; j++) {
        if (p.charAt(j - 1) == '*') {
            prev[j] = prev[j - 2];
        }
    }

    for (int i = 1; i <= m; i++) {
        curr[0] = false;
        for (int j = 1; j <= n; j++) {
            char sc = s.charAt(i - 1);
            char pc = p.charAt(j - 1);

            if (pc == '.' || pc == sc) {
                curr[j] = prev[j - 1];
            } else if (pc == '*') {
                char preceding = p.charAt(j - 2);
                curr[j] = curr[j - 2];
                if (preceding == '.' || preceding == sc) {
                    curr[j] = curr[j] || prev[j];
                }
            } else {
                curr[j] = false;
            }
        }

        // Swap rows
        boolean[] temp = prev;
        prev = curr;
        curr = temp;
    }

    return prev[n];
}

Complexity Comparison

Approach	Time	Space	Notes
Recursive Brute Force	O(2^(m+n))	O(m+n) stack	Exponential branching at each `'*'`
Top-Down Memoization	O(m×n)	O(m×n)	HashMap + recursion stack
Bottom-Up DP	O(m×n)	O(m×n)	Iterative, no stack overhead
Space-Optimized DP	O(m×n)	O(n)	Two rolling arrays

Where m = s.length() and n = p.length().

Edge Cases

Empty string with '*' pattern: s = "", p = "a*b*" → true (every '*' unit takes zero occurrences)
Consecutive wildcards: s = "abcdef", p = ".*.*" → true (first ".*" can consume everything, second takes zero)
Single character: s = "a", p = "." → true; s = "a", p = "b" → false
All wildcards: s = "anything", p = ".*" → true
Tricky false case: s = "mississippi", p = "mis*is*p*." → false (the "p*." unit cannot cover "ppi")
Pattern longer than string: s = "a", p = "a*a*a*" → true (all '*' units take zero except one)

Interviewer Tips

Start by clarifying the '*' rule. Many candidates confuse this with glob matching where * independently means “any sequence.” Here, '*' is a quantifier for the preceding character. State this explicitly before coding.

Draw the DP table. Interviewers respond well to candidates who visualize their approach. Sketch a small table on the whiteboard and fill in a few cells to demonstrate understanding before writing code.

Explain the two choices at '*'. The entire algorithm hinges on this branching logic:

Zero occurrences → skip two positions in the pattern
One or more occurrences → consume one character, keep the pattern position

Mention the recursive solution first even if you plan to write the DP version. It shows you understand the problem structure before optimizing.

Common follow-up: Wildcard Matching (LeetCode 44). In that variant, '?' matches any single character (like '.' here) and '*' matches any sequence of characters independently (not tied to a preceding element). The DP recurrence changes because '*' in wildcard matching doesn’t need a preceding character — it acts as a standalone “match anything” unit.