Understanding the bottom-up solution to the longest common substring problem


To solve the longest common substring problem using a bottom-up dynamic programming approach, we build solutions iteratively starting from the smallest subproblems and working up to the final answer, storing results in a table as we progress. As with any bottom-up solution, we process subproblems in an order that guarantees every result we depend on has already been computed and stored before we need it.

The bottom-up solution

When implementing the recurrence relation using a top-down algorithm, i and j are indices into s1 and s2, and the base case occurs when either index drops below 0.

For the recurrence relations, if i < 0 or j < 0, the solution is 0, and this serves as the base case.

However, arrays cannot have negative indices. To handle this cleanly in the bottom-up table, we shift the meaning of i and j so that they represent the number of characters considered from each string rather than indices. This way, the base case becomes i == 0 or j == 0, which maps naturally to row 0 and column 0 of the table.

How does this shift change the recurrence?

The recurrence relation itself does not change, we still check if the last characters match and either extend or terminate. The only difference is that when i represents the number of characters considered, the last character of the prefix is at index i - 1 in the string (instead of index i). This is purely a shift in how we index into the table and the strings, not a change in logic.

We can use i and j to denote the number of items considered from the start, instead of the index of the last item.

We create a 2D array lcs of dimensions (m + 1) × (n + 1), where lcs[i][j] stores the length of the longest common substring that ends at the last character of the length-i prefix of s1 and the length-j prefix of s2. All entries are initialized to 0.

Liking the course? Check our discounted plans to continue learning.