Dijkstra Algorithm Explained: Why It Actually Works
Dijkstra algorithm explained clearly: the greedy invariant, step by step trace, and when to choose it over Bellman-Ford or BFS.
- What makes Dijkstra's algorithm correct (the greedy settlement invariant)
- How to trace the algorithm step by step on a weighted graph
- Why negative edge weights break the algorithm's guarantee
- When to choose Dijkstra over Bellman-Ford, Floyd-Warshall, or BFS
Search for "Dijkstra algorithm explained" and you'll find the same recipe everywhere. Pick the node with the smallest distance, update its neighbours, repeat. You can memorise those steps in twenty minutes. But when an interview problem changes the graph structure or adds a constraint you haven't seen, memorised steps fall apart.
The piece most explanations skip isn't the procedure. It's the invariant that makes the procedure correct. Without that, you're following a recipe you can't adapt, can't prove, and can't debug when it gives wrong answers.
What Dijkstra's algorithm actually solves
Dijkstra's algorithm finds the shortest path from a single source node to every other node in a weighted graph with nonnegative edge weights. It works by repeatedly selecting the unvisited node with the smallest tentative distance and permanently settling it. This greedy approach is correct because nonnegative weights guarantee that no future path through an unsettled node can produce a shorter route.
The problem it solves is single source shortest path on weighted graphs. Not all pairs shortest path (Floyd-Warshall's territory) and not shortest path with negative weights (Bellman-Ford). One specific variant, solved efficiently.
If you've studied BFS and DFS, there's a direct connection. BFS finds shortest paths in unweighted graphs because every edge has equal weight. The first time you reach a node through BFS, that's the shortest path. Dijkstra extends this to weighted graphs by replacing the FIFO queue with a priority queue ordered by tentative distance. Instead of processing nodes in discovery order, you process them in order of accumulated weight.
That substitution is the entire algorithm.
If a problem gives you an unweighted graph, BFS is simpler and equally correct. If edges carry different weights, BFS gives wrong answers and Dijkstra is what you need.
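A compact sketch of that substitution, assuming the unweighted graph maps each node to a list of neighbours and the weighted graph maps each node to (neighbour, weight) pairs:

```python
from collections import deque
import heapq

def bfs_distances(graph, source):
    # Unweighted graph: FIFO queue, so nodes are settled in discovery order.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def dijkstra_distances(graph, source):
    # Weighted graph: priority queue, so nodes are settled in order of accumulated weight.
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                                  # stale entry, ignore
        for v, w in graph[u]:
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```

The loop structure is identical; only the queue discipline changes.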
Why Dijkstra's algorithm works: The invariant explained
The claim is simple. When you pop a node from the priority queue, its shortest distance is final. No path discovered later will produce a shorter route to that node.
Why? Because every edge weight is nonnegative. When node u is popped with tentative distance d(u), every other node still in the queue has a tentative distance >= d(u). Any alternative path to u would need to go through one of those unsettled nodes first, adding at least their distance (already >= d(u)) plus the connecting edges (>= 0). The result can't be smaller than d(u).
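Spelled out: if an alternative route to u leaves the settled set through some unsettled node x, its total cost is at least d(x) plus a sum of nonnegative edge weights, so cost(alternative) >= d(x) >= d(u).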
That's the greedy settlement invariant. Not complicated, but it's the reason the whole algorithm works. It also immediately reveals the limitation: the proof depends on nonnegative edges. Remove that constraint, and the proof collapses.
Why negative weights break everything
Consider three nodes (A, B, C) with edges A→B (weight 1), A→C (weight 3), and C→B (weight -5).
Starting from A, Dijkstra pops B first (distance 1) and settles it. But there's a path A→C→B with total weight 3 + (-5) = -2, which is shorter than 1. Dijkstra never discovers this because B was already settled. The invariant assumed no future path could beat distance 1. With a negative edge, that assumption fails.
This isn't a theoretical edge case. Interview problems sometimes include negative weights specifically to test whether you understand why Dijkstra doesn't work there, not just that it doesn't. The invariant gives you the answer in one sentence. The greedy settlement guarantee requires nonnegative edges because a path through an unsettled node with a negative edge can produce a shorter total distance than the already settled value.
“The algorithm's correctness isn't a property of its steps. It's a property of the weight constraints.”
Dijkstra's algorithm explained step by step
Consider a weighted graph with 5 nodes, labeled A through E.
A→B (4), A→C (1), B→E (4), C→B (2), C→D (5), D→E (1)
Starting from node A, we want the Dijkstra shortest path to every other node.
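Here is a minimal Python sketch of the approach, assuming the graph is stored as an adjacency dict mapping each node to its (neighbour, weight) pairs (the function name dijkstra and the graph literal below are illustrative):

```python
import heapq

# The graph from above, as an adjacency dict: node -> list of (neighbour, weight).
graph = {
    'A': [('B', 4), ('C', 1)],
    'B': [('E', 4)],
    'C': [('B', 2), ('D', 5)],
    'D': [('E', 1)],
    'E': [],
}

def dijkstra(graph, source):
    dist = {node: float('inf') for node in graph}
    dist[source] = 0
    settled = set()
    pq = [(0, source)]                         # (tentative distance, node)
    while pq:
        d, u = heapq.heappop(pq)
        if u in settled:
            continue                           # stale entry: u was already settled at a smaller distance
        settled.add(u)                         # d is now final for u
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))   # push a new entry instead of decrease-key
    return dist

print(dijkstra(graph, 'A'))  # {'A': 0, 'B': 3, 'C': 1, 'D': 6, 'E': 7}
```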
Walk through each step.
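- Step 1: Initialise dist(A) = 0 and every other distance to infinity. Queue: (0, A).
- Step 2: Pop A at distance 0 and settle it. Relax B to 4 and C to 1. Queue: (1, C), (4, B).
- Step 3: Pop C at distance 1 and settle it. Relax B to 3 (via A→C→B) and D to 6. Queue: (3, B), (4, B), (6, D).
- Step 4: Pop B at distance 3 and settle it. Relax E to 7. Queue: (4, B), (6, D), (7, E).
- Step 5: Pop the stale (4, B) entry. B is already settled, so skip it.
- Step 6: Pop D at distance 6 and settle it. D→E offers 6 + 1 = 7, which doesn't improve E.
- Step 7: Pop E at distance 7 and settle it. The queue empties with every node settled: A = 0, C = 1, B = 3, D = 6, E = 7.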
Notice step 3. B's tentative distance dropped from 4 to 3 because the path A→C→B (total weight 3) is shorter than the direct edge A→B (weight 4). The priority queue naturally surfaces the shorter path first.
That's the invariant in action. By the time B is popped at distance 3, no remaining path can beat it.
The if u in settled: continue check handles stale entries. When a node's distance improves, the implementation pushes a new entry rather than updating the existing one. The old entry gets skipped when it's eventually popped.

Dijkstra time complexity
With a binary heap (Python's heapq), each push and pop takes O(log V). Every edge triggers at most one push, and every node is popped at most once due to the settled check. That gives O((V + E) log V) total, where V is the number of vertices and E is the number of edges.
With a Fibonacci heap, this improves to O(V log V + E), but the constant factors make Fibonacci heaps rarely practical outside textbooks. In interviews, the binary heap version is what you'll implement and what interviewers expect you to analyse.
Reconstructing the actual shortest path
The implementation above returns distances, but interview problems frequently ask for the path itself. You can't recover the route from distances alone. You need a predecessor map that records which node caused each update during relaxation.
The idea is straightforward. Every time dist[v] improves through node u, store prev[v] = u. When the algorithm finishes, backtrack from the destination through the prev pointers until you reach the source.
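A minimal sketch of that idea, assuming the same adjacency-dict representation as the earlier snippet:

```python
import heapq

def dijkstra_with_path(graph, source, target):
    dist = {node: float('inf') for node in graph}
    dist[source] = 0
    prev = {}                          # prev[v] = node that last improved dist[v]
    settled = set()
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in settled:
            continue                   # stale entry
        settled.add(u)
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u            # remember which node caused the update
                heapq.heappush(pq, (dist[v], v))
    # Backtrack from the target through the predecessor map until the source.
    path = [target]
    while path[-1] != source and path[-1] in prev:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path
```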
Using the graph from the earlier trace, calling dijkstra_with_path(graph, 'A', 'E') returns (7, ['A', 'C', 'B', 'E']). The path goes A→C→B→E with total weight 1 + 2 + 4 = 7, which matches the settled distances from the trace.
One thing to watch for: if the target isn't reachable from the source, dist[target] stays at float('inf') and the prev backtrack produces a path containing only the target node. Check for this before returning the path, especially in problems with disconnected components.
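A small guard along those lines, assuming a hypothetical variant of the graph that also contains an isolated node 'F':

```python
# Hypothetical: the same graph as above plus an isolated node 'F' with no incoming edges.
graph_with_island = {**graph, 'F': []}

distance, path = dijkstra_with_path(graph_with_island, 'A', 'F')
if distance == float('inf'):
    path = []          # no route exists; don't return the bogus single-node path ['F']
```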
The time and space complexity doesn't change. You're adding one dictionary lookup per relaxation, which is O(1) amortised. The backtrack itself takes O(V) in the worst case, which is dominated by the main loop's O((V + E) log V).
Dijkstra variations you'll encounter in interviews
Textbook Dijkstra runs on an explicit adjacency list with a single source. Interview problems rarely present it that cleanly. They'll disguise the graph, add constraints, or twist the objective. Recognising these variations saves you from reinventing the algorithm each time.
Weighted grids as implicit graphs
Grid problems where each cell has a cost (like a terrain penalty or toll) are weighted graphs in disguise. Each cell connects to its neighbours with edges weighted by the destination cell's cost. You don't need to build an adjacency list first. Run Dijkstra directly on the grid using (row, col) tuples as node identifiers. The priority queue holds (distance, row, col) entries, and you relax in the four cardinal directions.
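A sketch of that setup for a hypothetical "minimum cost path" style grid problem, where entering a cell costs grid[r][c] and the goal is the cheapest route from the top-left corner to the bottom-right one:

```python
import heapq

def min_cost_path(grid):
    rows, cols = len(grid), len(grid[0])
    dist = [[float('inf')] * cols for _ in range(rows)]
    dist[0][0] = 0                      # some problems also charge the starting cell; add grid[0][0] if so
    pq = [(0, 0, 0)]                    # (distance, row, col)
    settled = set()
    while pq:
        d, r, c = heapq.heappop(pq)
        if (r, c) in settled:
            continue
        settled.add((r, c))
        if (r, c) == (rows - 1, cols - 1):
            return d                    # goal settled: its distance is final
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # four cardinal directions
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + grid[nr][nc]   # edge weight = destination cell's cost
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(pq, (nd, nr, nc))
    return dist[rows - 1][cols - 1]
```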
The trap here is reaching for BFS out of habit. BFS works when every move costs the same. The moment cell costs vary, you need the priority queue to settle cells in the right order.
Multiple sources
Some problems ask for the shortest distance from any of several sources to every other node. You don't need to run Dijkstra once per source. Push all source nodes into the priority queue at distance 0 before the loop starts. The algorithm processes them as if they're all connected to a virtual super-source with zero weight edges. The rest of the logic stays identical.
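A sketch of the multi-source variant, reusing the adjacency-dict representation from before; the only change from the single-source version is the seeding loop:

```python
import heapq

def multi_source_dijkstra(graph, sources):
    dist = {node: float('inf') for node in graph}
    pq = []
    for s in sources:
        dist[s] = 0
        heapq.heappush(pq, (0, s))     # every source starts at distance 0
    settled = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in settled:
            continue
        settled.add(u)
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```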
Modified relaxation conditions
Problems like "cheapest flight with at most K stops" add a constraint to the relaxation step. You can't just check whether dist[v] improves. You also need to track how many edges the current path used. The state in your priority queue expands from (distance, node) to (distance, node, stops_remaining). The core invariant still holds within each layer of the expanded state space, but the settled check needs adjustment because the same node can be settled at different stop counts.
This is where understanding the invariant pays off. You're not memorising a new algorithm for each variation. You're adjusting the state definition while preserving the same greedy guarantee. If the new state space still has nonnegative transitions, Dijkstra's logic applies.
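A sketch of that expanded-state idea for a "cheapest flights with at most k stops" style problem; the function signature and flights format below are assumptions, not a fixed API:

```python
import heapq
from collections import defaultdict

def cheapest_flight(n, flights, src, dst, k):
    # flights: list of (u, v, price) edges; k: maximum number of intermediate stops.
    graph = defaultdict(list)
    for u, v, price in flights:
        graph[u].append((v, price))

    # State = (cost so far, node, edges taken). Cost still orders the queue,
    # so the greedy settlement argument applies per (node, edges) state.
    pq = [(0, src, 0)]
    best_edges = {}   # fewest edges with which each node has already been settled
    while pq:
        cost, node, edges = heapq.heappop(pq)
        if node == dst:
            return cost
        if edges > k or (node in best_edges and best_edges[node] <= edges):
            continue   # can't extend further, or a cheaper-and-shorter state already covered this node
        best_edges[node] = edges
        for nxt, price in graph[node]:
            heapq.heappush(pq, (cost + price, nxt, edges + 1))
    return -1          # no route within the stop limit
```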
When to use Dijkstra (and when not to)
The decision framework for shortest path algorithms is smaller than you'd expect. Three questions narrow it down: Are there negative weights? Do you need single source or all pairs? Is the graph weighted at all?
If the answers are no negative weights, a single source, and a weighted graph, Dijkstra is the pick: O((V + E) log V) over O(VE) from Bellman-Ford.

One thing that trips people up: some interview problems involve weighted grids where each cell has a traversal penalty, and you need the minimum weight path from one corner to the other. That's Dijkstra. If you've only practised BFS on unweighted grids, the instinct is to force BFS here, and it gives wrong answers. The grid cells carry different weights, so the underlying structure is a weighted graph. Once you recognise that "varying cell weights" maps to "weighted edges," the right algorithm becomes obvious.
Most tutorials don't train that recognition. You learn Dijkstra's steps on an abstract graph but never practise spotting it in disguise. On Codeintuition, the identification lesson for Dijkstra focuses on exactly this. Before you solve any Dijkstra problem, you learn the structural triggers that tell you the pattern applies. The problems that follow (Minimum Cost Path, Cheapest Flights, Teleporter Grid) reinforce those triggers under increasing difficulty.
Common mistakes and how to avoid them
- Stale queue entries: Without the settled check, you'll process nodes multiple times. The algorithm still produces correct distances, but time complexity degrades from O((V + E) log V) to potentially O(V^2 log V) on dense graphs.
- Negative weights: Dijkstra won't crash with negative edges. It won't throw an error. It'll silently produce wrong answers for some nodes, and you won't know unless you test against a known solution. If you can't guarantee nonnegative weights, switch to Bellman-Ford.
- Confusing Dijkstra with BFS: Both use a queue, both process nodes by distance. But BFS uses a FIFO queue (correct for unweighted graphs) and Dijkstra uses a priority queue (correct for weighted graphs). Wrong queue type, wrong results.
- Wrong initialisation: The source starts at 0, everything else at infinity. Initialise all nodes to 0 and the relaxation step won't update anything, because no path "improves" on a distance of zero.
- Distances vs paths: Dijkstra computes shortest distances. If the problem wants the actual path (the sequence of nodes), you need a predecessor map built during relaxation. Track which node caused each update, then backtrack from the destination to reconstruct the route.
If you want frame by frame visual walkthroughs across different graph configurations, the understanding lesson on Codeintuition traces every variable at every step. The Graph course covers Dijkstra alongside Bellman-Ford and Floyd-Warshall as part of Codeintuition's learning path ($6.67/month with the annual plan), building the full shortest path decision framework in sequence.
Before reading this, you probably knew the steps. Now you know why each step is correct, where the proof breaks, and which problems are Dijkstra in disguise. Next time you see a problem that doesn't look like a textbook shortest path graph, check for nonnegative weighted edges. If they're there, you already know the algorithm.
Ready to derive shortest path algorithms, not memorize them?
Learn Dijkstra's invariant, Bellman-Ford, and the full shortest path decision framework through step by step visual traces. Start with foundational graph patterns for FREE
- Dijkstra's time complexity: O((V + E) log V), where V is the number of vertices and E is the number of edges. Each node gets popped at most once and each edge triggers at most one push, giving the V log V + E log V bound. A Fibonacci heap reduces this to O(V log V + E), but binary heaps remain the interview standard because of lower constant factors and simpler implementation. For dense graphs where E approaches V^2, the difference between heap types matters more, but interviewers rarely test that distinction.
- Bellman-Ford's time complexity: O(VE), but it correctly handles negative edges and detects negative cycles.
- Dijkstra versus Bellman-Ford: O((V + E) log V) against O(VE). Switch to Bellman-Ford only when the graph has negative weights or when you need to detect negative cycles.