The Bellman optimality equation


In the previous module we learned what a state is and how to discover and verify it. With the nodes of the state space DAG now identified, the natural next question is how those nodes are connected to one another, and how the value at any node is computed from the values at its smaller neighbours. That recurrence relation is the second pillar of a dynamic programming algorithm and is the one that defines these relationships.

The recurrence relation is the second pillar of a dynamic programming algorithm.

The recurrence encodes the problem's optimal substructure, the legal decisions available at each state, and the way local choices combine into global solutions. Once the state is correct, deriving the recurrence is generally a well defined process. In this lesson we will learn about the Bellman optimality equation which is template that applies to all dynamic programming algorithms.

Bellman optimality equation is the generic template for all DP algorithms.

What is the Bellman optimality equation?

The recurrence relation for all dynamic programming problems follows the same template, called the Bellman optimality equation. It is very easy to derive it for a generic dynamic programming problem. Consider that for the state space tree of a dynamic programming problem, we are at some state s, and we want to know dp(s), the optimal value (solution) at this state.

A generic state s in a state space tree of a dynamic programming problem.

For that state s, the set A(s) lists the actions we can take to move to a new (smaller) state.

Liking the course? Check our discounted plans to continue learning.