Dynamic programming on strings

Learn why every dynamic programming strings problem uses the same 2D table structure. Walkthrough Edit Distance derivation and the three pattern families.

10 minutes

Advanced

What you will learn

Why every string DP problem reduces to a 2D table with string positions as axes

How to construct dp[i][j] base cases and recurrence from scratch

A complete Edit Distance derivation with three-operation logic

Which string DP pattern family a problem belongs to and how to tell

You've solved Coin Change. You've built a 1D table for Longest Increasing Subsequence. Then you open Edit Distance and nothing looks familiar. Dynamic programming strings problems throw you into a 2D table with three recurrence options per cell. It feels like a different subject entirely.

It's not. Every string DP problem follows the same underlying structure, and once you see it, the whole category opens up.

⚡TL;DR

All string DP problems build a 2D table where dp[i][j] represents a property of the first i characters of one string and the first j characters of another. LCS, Edit Distance, and pattern matching are variations on the same foundation. Learn the framework once, and you can derive the recurrence for any string DP problem from scratch.

What makes string DP different

Dynamic programming on strings works by building a 2D table where dp[i][j] represents a property of the first i characters of one string and the first j characters of another. Most explanations jump straight to specific problems without saying this directly.

In 1D DP problems like Coin Change or Climbing Stairs, you've got a single variable tracking progress through one dimension. String DP tracks progress through two strings at the same time. That second dimension is what makes the table 2D, and it's why the recurrence feels more complex.

The complexity is spatial, though, not conceptual. You're still asking the same question at every cell: "Given what I know about shorter prefixes, what's the answer for this prefix pair?"

Why does this category feel harder than 1D DP? Visualising a 1D array is natural. Visualising a 2D table where both axes represent growing prefixes of different strings means holding more state in your head. The actual logic at each cell is usually simpler than something like Knapsack. It's the bookkeeping that trips people up.

ℹ️ Info

When you see a DP problem involving two strings (or one string compared against itself), the form is almost always a 2D table with string positions as axes. This is the first identification trigger for the string DP category.

Building the 2D table from scratch

The construction process is the same across every string DP problem. It takes three steps, same order every time.

Start by defining what dp[i][j] means. This is the hardest part. For Longest Common Subsequence, dp[i][j] is the length of the LCS of the first i characters of string A and the first j characters of string B. For Edit Distance, dp[i][j] is the minimum operations to convert the first i characters of A into the first j characters of B. The definition changes per problem, but the shape doesn't.

Next, set the base cases by filling row 0 and column 0. These always have meaning. dp[0][j] answers "what's the result when the first string is empty?" For LCS, both boundaries are 0 (no common subsequence with an empty string). For Edit Distance, dp[0][j] = j (insert j characters) and dp[i][0] = i (delete i characters). Base cases follow directly from the definition. They're never filler.

Then write the recurrence by asking at each cell (i, j) whether the characters at position i and j match. If they do, one thing happens. If they don't, something else happens. The specifics depend on the problem, but the branching structure is consistent.

Python

That skeleton covers LCS, Edit Distance, Longest Common Substring, Interleaving Check, and Wildcard Pattern Matching. The only things that change are the base case logic, the match handler, and the mismatch handler.

Edit distance: A complete walkthrough

Edit Distance is worth learning first because the recurrence has three branches. You can't just memorise the formula. You have to understand why each branch exists.

Given two strings, find the minimum number of operations (insert, delete, replace) to convert one into the other.

dp[i][j] represents the minimum operations to convert the first i characters of word1 into the first j characters of word2.

For base cases, dp[0][j] = j because converting an empty string into word2[0..j] requires j insertions. dp[i][0] = i because converting word1[0..i] into an empty string requires i deletions.

The recurrence depends on whether word1[i-1] and word2[j-1] match.

When the characters match, no operation is needed. dp[i][j] = dp[i-1][j-1]. You've already handled the first i-1 and j-1 characters optimally, and this pair matches for free.

When they don't match, you pick the cheapest of three operations. Replace changes word1[i-1] to word2[j-1], costing dp[i-1][j-1] + 1. Delete removes word1[i-1], costing dp[i-1][j] + 1. Insert adds word2[j-1] after position i, costing dp[i][j-1] + 1.

What makes this click is seeing why each operation maps to a specific adjacent cell. Replace consumes one character from both strings (diagonal). Delete consumes one from word1 only (move up). Insert adds one to match word2 (move left). The three directions in the table are the three operations. The geometry does the work.

Python

Trace it on word1 = "horse", word2 = "ros". The table fills left-to-right, top-to-bottom. dp[5][3] = 3, achieved by replacing h with r, deleting r at position 2, and deleting e at position 4. You can verify this by tracing backwards through the table from dp[5][3] to dp[0][0], following the minimum-cost path at each step.

“Replace, delete, and insert map to diagonal, up, and left in the 2D table. The table geometry is the algorithm.”

String DP insight

Three dynamic programming strings patterns that show up in interviews

String DP problems in interviews fall into three families. They all share the 2D table layout but differ in what dp[i][j] represents and how the recurrence branches.

The subsequence comparison family (LCS and relatives) covers problems where dp[i][j] counts or measures something about subsequences of both strings. Longest Common Subsequence, Shortest Common Supersequence, and Longest Repeated Subsequence all belong here. You'll recognise these when a problem asks about a non-contiguous selection from two sequences.

The transformation family (Edit Distance and relatives) covers problems where dp[i][j] counts the cost of converting one prefix into another. Edit Distance, Interleaving Check, and deletion-based variants belong here. The tell is when a problem asks about converting, matching, or merging two strings through operations.

The pattern matching family is where dp[i][j] represents whether a prefix of the string matches a prefix of the pattern. Wildcard Pattern Matching and Regular Expression Matching live here. If one of the two "strings" contains special characters (*, ?, .) that change matching rules, you're in this family.

String DP pattern families

Subsequence (LCS)

Non-contiguous selection from two sequences. Recurrence branches on match vs skip.

Transformation (Edit Distance)

Converting one prefix into another. Recurrence branches on operations.

Pattern Matching

Matching string against pattern with wildcards. Recurrence branches on special characters.

These families overlap in places. Interleaving Check has transformation-like recurrence but subsequence-like selection logic. Knowing the three families won't eliminate thinking, but it narrows your search space from "any possible recurrence" to "one of three known frameworks."

Common mistakes in dynamic programming strings

Off-by-one indexing is the most common bug: The table has dimensions (m+1) x (n+1), not m x n. Row 0 and column 0 represent empty prefixes. When you access s1[i-1] and s2[j-1] inside the loop starting at i=1, j=1, you're correctly mapping table indices to string positions. The classic mistake is accessing s1[i] instead of s1[i-1], which shifts every comparison by one character. That error is subtle because the table still fills without crashing. You just get wrong answers.
Wrong base cases will silently corrupt the entire table: Setting dp[0][j] = 0 in Edit Distance (instead of j) breaks everything downstream because every cell depends on the values below and to the left. Before filling any table, ask yourself what it means when one string is empty. The answer to that question is your base case.
Premature space optimisation causes subtle bugs too: Many dynamic programming strings problems can be reduced from O(mn) space to O(min(m,n)) by keeping only two rows. But this only works when the recurrence depends on the three adjacent cells: dp[i-1][j], dp[i][j-1], and dp[i-1][j-1]. If the recurrence reaches further back (like Longest Palindromic Subsequence when formulated with a single string), the two-row trick breaks. Optimise after the solution works, not during derivation.

⚠️ Warning

Space optimisation in string DP discards previous rows. If you need to reconstruct the actual solution (not just the optimal value), you'll need the full table or a separate backtracking pass.

One branch of a larger DP space

String DP is one branch of a larger DP problem space. The same derivation process (define the state, set base cases, write the recurrence) applies to every DP family. What changes is the table shape and what the axes represent.

If you haven't worked through the derivation process for 1D problems yet, the how to identify DP problems guide covers the identification triggers that tell you a problem requires DP in the first place. The complete DP interview guide covers all five pattern families, including string DP, with derivation walkthroughs for each. For the formal definition of optimal substructure that underpins every DP problem, the Wikipedia entry is a solid reference.

Codeintuition's Dynamic Programming course covers string DP problems including Edit Distance, LCS, and Wildcard Pattern Matching, with the recurrence derivation taught before each problem's solution. The Edit Distance lesson traces variable state frame by frame through the 2D table, so you can see exactly how each cell's value gets computed from its neighbours.

The free Arrays course uses the same derivation-first model on simpler patterns like sliding window and two pointers. Sixty-three lessons and 85 problems, permanently free, so you can test whether the approach works before committing to the full DP path at $79.99/year.

Six months from now, you'll open a string DP problem you've never seen. Two strings in the input, some transformation or comparison in the output. You won't search for the specific problem's solution. You'll define dp[i][j], fill in row 0 and column 0, and write the recurrence from the match/mismatch branching logic you've already internalised. The specific problem won't matter. The derivation process will already be familiar.

Do you want to master data structures?

Try our data structures learning path made of highly visual and interactive courses. Get hands on experience by solving real problems in a structured manner. All resources you would ever need in one place for FREE

Start Learning

Dynamic programming on strings is a technique for solving problems that involve comparing, transforming, or matching two strings. It works by building a 2D table where dp[i][j] represents a property of the first i characters of one string and the first j characters of another.

The core principle is identical. You break a problem into overlapping subproblems and store results. The difference is that regular DP often uses a 1D table indexed by a single variable like amount or position. String DP uses a 2D table where both axes represent positions in strings, and the recurrence branches based on whether characters at those positions match. That extra dimension is what makes string DP feel harder, even though the cell-level logic is often simpler.

Edit Distance, Longest Common Subsequence, and Longest Common Substring appear most frequently across FAANG companies. Wildcard Pattern Matching and Regular Expression Matching show up at Google and Amazon specifically. Interleaving String is less common but tests deeper understanding of the 2D table mechanics. If you're short on time, Edit Distance alone covers more interview ground than any other single string DP problem.

Yes. Every string DP problem can be solved top-down with memoization or bottom-up with tabulation. Memoization can feel more natural when the recurrence is complex because you write the recursive logic directly. Tabulation is better when you need space optimisation, since you can reduce the table to two rows.

Look for two signals. First, the input involves two strings (or one string compared against itself, like palindrome problems). Second, the problem asks for an optimal value, a count, or a yes/no answer that depends on comparing characters across the strings. If both signals are present, you're almost certainly looking at a 2D string DP problem.