Dynamic programming on strings
Learn why every dynamic programming strings problem uses the same 2D table structure. Walkthrough Edit Distance derivation and the three pattern families.
Why every string DP problem reduces to a 2D table with string positions as axes
How to construct dp[i][j] base cases and recurrence from scratch
A complete Edit Distance derivation with three-operation logic
Which string DP pattern family a problem belongs to and how to tell
You've solved Coin Change. You've built a 1D table for Longest Increasing Subsequence. Then you open Edit Distance and nothing looks familiar. Dynamic programming strings problems throw you into a 2D table with three recurrence options per cell. It feels like a different subject entirely.
It's not. Every string DP problem follows the same underlying structure, and once you see it, the whole category opens up.
What makes string DP different
Dynamic programming on strings works by building a 2D table where dp[i][j] represents a property of the first i characters of one string and the first j characters of another. Most explanations jump straight to specific problems without saying this directly.
In 1D DP problems like Coin Change or Climbing Stairs, you've got a single variable tracking progress through one dimension. String DP tracks progress through two strings at the same time. That second dimension is what makes the table 2D, and it's why the recurrence feels more complex.
The complexity is spatial, though, not conceptual. You're still asking the same question at every cell: "Given what I know about shorter prefixes, what's the answer for this prefix pair?"
Why does this category feel harder than 1D DP? Visualising a 1D array is natural. Visualising a 2D table where both axes represent growing prefixes of different strings means holding more state in your head. The actual logic at each cell is usually simpler than something like Knapsack. It's the bookkeeping that trips people up.
Building the 2D table from scratch
The construction process is the same across every string DP problem. It takes three steps, same order every time.
Start by defining what dp[i][j] means. This is the hardest part. For Longest Common Subsequence, dp[i][j] is the length of the LCS of the first i characters of string A and the first j characters of string B. For Edit Distance, dp[i][j] is the minimum operations to convert the first i characters of A into the first j characters of B. The definition changes per problem, but the shape doesn't.
Next, set the base cases by filling row 0 and column 0. These always have meaning. dp[0][j] answers "what's the result when the first string is empty?" For LCS, both boundaries are 0 (no common subsequence with an empty string). For Edit Distance, dp[0][j] = j (insert j characters) and dp[i][0] = i (delete i characters). Base cases follow directly from the definition. They're never filler.
Then write the recurrence by asking at each cell (i, j) whether the characters at position i and j match. If they do, one thing happens. If they don't, something else happens. The specifics depend on the problem, but the branching structure is consistent.
Python
That skeleton covers LCS, Edit Distance, Longest Common Substring, Interleaving Check, and Wildcard Pattern Matching. The only things that change are the base case logic, the match handler, and the mismatch handler.
Edit distance: A complete walkthrough
Edit Distance is worth learning first because the recurrence has three branches. You can't just memorise the formula. You have to understand why each branch exists.
dp[i][j] represents the minimum operations to convert the first i characters of word1 into the first j characters of word2.
For base cases, dp[0][j] = j because converting an empty string into word2[0..j] requires j insertions. dp[i][0] = i because converting word1[0..i] into an empty string requires i deletions.
The recurrence depends on whether word1[i-1] and word2[j-1] match.
When the characters match, no operation is needed. dp[i][j] = dp[i-1][j-1]. You've already handled the first i-1 and j-1 characters optimally, and this pair matches for free.
When they don't match, you pick the cheapest of three operations. Replace changes word1[i-1] to word2[j-1], costing dp[i-1][j-1] + 1. Delete removes word1[i-1], costing dp[i-1][j] + 1. Insert adds word2[j-1] after position i, costing dp[i][j-1] + 1.
What makes this click is seeing why each operation maps to a specific adjacent cell. Replace consumes one character from both strings (diagonal). Delete consumes one from word1 only (move up). Insert adds one to match word2 (move left). The three directions in the table are the three operations. The geometry does the work.
Python
Trace it on word1 = "horse", word2 = "ros". The table fills left-to-right, top-to-bottom. dp[5][3] = 3, achieved by replacing h with r, deleting r at position 2, and deleting e at position 4. You can verify this by tracing backwards through the table from dp[5][3] to dp[0][0], following the minimum-cost path at each step.
“Replace, delete, and insert map to diagonal, up, and left in the 2D table. The table geometry is the algorithm.”
Three dynamic programming strings patterns that show up in interviews
String DP problems in interviews fall into three families. They all share the 2D table layout but differ in what dp[i][j] represents and how the recurrence branches.
The subsequence comparison family (LCS and relatives) covers problems where dp[i][j] counts or measures something about subsequences of both strings. Longest Common Subsequence, Shortest Common Supersequence, and Longest Repeated Subsequence all belong here. You'll recognise these when a problem asks about a non-contiguous selection from two sequences.
The transformation family (Edit Distance and relatives) covers problems where dp[i][j] counts the cost of converting one prefix into another. Edit Distance, Interleaving Check, and deletion-based variants belong here. The tell is when a problem asks about converting, matching, or merging two strings through operations.
The pattern matching family is where dp[i][j] represents whether a prefix of the string matches a prefix of the pattern. Wildcard Pattern Matching and Regular Expression Matching live here. If one of the two "strings" contains special characters (*, ?, .) that change matching rules, you're in this family.
These families overlap in places. Interleaving Check has transformation-like recurrence but subsequence-like selection logic. Knowing the three families won't eliminate thinking, but it narrows your search space from "any possible recurrence" to "one of three known frameworks."
Common mistakes in dynamic programming strings
- Off-by-one indexing is the most common bug: The table has dimensions (m+1) x (n+1), not m x n. Row 0 and column 0 represent empty prefixes. When you access s1[i-1] and s2[j-1] inside the loop starting at i=1, j=1, you're correctly mapping table indices to string positions. The classic mistake is accessing
s1[i]instead ofs1[i-1], which shifts every comparison by one character. That error is subtle because the table still fills without crashing. You just get wrong answers. - Wrong base cases will silently corrupt the entire table: Setting dp[0][j] = 0 in Edit Distance (instead of j) breaks everything downstream because every cell depends on the values below and to the left. Before filling any table, ask yourself what it means when one string is empty. The answer to that question is your base case.
- Premature space optimisation causes subtle bugs too: Many dynamic programming strings problems can be reduced from O(mn) space to O(min(m,n)) by keeping only two rows. But this only works when the recurrence depends on the three adjacent cells: dp[i-1][j], dp[i][j-1], and dp[i-1][j-1]. If the recurrence reaches further back (like Longest Palindromic Subsequence when formulated with a single string), the two-row trick breaks. Optimise after the solution works, not during derivation.
One branch of a larger DP space
String DP is one branch of a larger DP problem space. The same derivation process (define the state, set base cases, write the recurrence) applies to every DP family. What changes is the table shape and what the axes represent.
If you haven't worked through the derivation process for 1D problems yet, the how to identify DP problems guide covers the identification triggers that tell you a problem requires DP in the first place. The complete DP interview guide covers all five pattern families, including string DP, with derivation walkthroughs for each. For the formal definition of optimal substructure that underpins every DP problem, the Wikipedia entry is a solid reference.
Codeintuition's Dynamic Programming course covers string DP problems including Edit Distance, LCS, and Wildcard Pattern Matching, with the recurrence derivation taught before each problem's solution. The Edit Distance lesson traces variable state frame by frame through the 2D table, so you can see exactly how each cell's value gets computed from its neighbours.
The free Arrays course uses the same derivation-first model on simpler patterns like sliding window and two pointers. Sixty-three lessons and 85 problems, permanently free, so you can test whether the approach works before committing to the full DP path at $79.99/year.
Six months from now, you'll open a string DP problem you've never seen. Two strings in the input, some transformation or comparison in the output. You won't search for the specific problem's solution. You'll define dp[i][j], fill in row 0 and column 0, and write the recurrence from the match/mismatch branching logic you've already internalised. The specific problem won't matter. The derivation process will already be familiar.
Do you want to master data structures?
Try our data structures learning path made of highly visual and interactive courses. Get hands on experience by solving real problems in a structured manner. All resources you would ever need in one place for FREE