Three algorithms, three different runtime profiles, three different things they tolerate. Dijkstra runs in O((V + E) log V) with a binary-heap priority queue. Bellman-Ford runs in O(V * E). Floyd-Warshall runs in O(V^3) and gives you all-pairs shortest paths in one shot. On a graph with V = 10,000 nodes and E = 50,000 edges, those are roughly 700,000 operations, 500,000,000 operations, and 1,000,000,000,000 operations respectively. Six orders of magnitude separate the slowest from the fastest. Picking the wrong one is the kind of mistake that turns a 50-millisecond endpoint into a 5-minute one, and the only reason it does not bite more often is that we usually pick Dijkstra by default and get away with it.
The stance of this article is that Dijkstra is the right default, and the other two are escape hatches for specific problems. Bellman-Ford is the escape hatch when you have negative edge weights. Floyd-Warshall is the escape hatch when you genuinely need all-pairs distances and the graph is small enough that V^3 is cheaper than running Dijkstra from every vertex. There is no fourth case where one of these is "better than Dijkstra in general". If you ever feel like you are choosing between them at random, the rest of this article is the decision tree I run in my head.
Algorithm 1: Dijkstra (the default)
Dijkstra solves single-source shortest paths on a graph with non-negative edge weights. Its core invariant is that when a node is dequeued from the priority queue, its distance is final. The proof relies on the non-negativity assumption: if you popped node u with distance d, then any unprocessed path to u would have to pass through a node currently in the queue with a distance at least d, plus a non-negative edge, so its total would be at least d. Negative edges break that argument.
With a binary-heap priority queue, the runtime is O((V + E) log V). The log V comes from the heap operations: each node is inserted and extracted once (V log V), and each edge can trigger a decreaseKey (E log V). With a Fibonacci heap, decreaseKey is amortised O(1) and the total drops to O(E + V log V), but Fibonacci heaps have a constant-factor overhead that makes them slower than binary heaps in practice for any graph that fits in memory. I have shipped Dijkstra in production maybe a dozen times. I have used a binary heap every single time.
Note two implementation details that matter in practice. First, the lazy-deletion pattern (if (d > dist.get(u)) continue) is faster than implementing a real decreaseKey; we just push duplicates into the heap and skip stale entries when they pop out. Second, the algorithm produces final distances when nodes are dequeued, not when they are first relaxed. If you stop early because you only care about the distance to a target node, stop on dequeue, not on relax.
The failure mode of Dijkstra is silent on graphs with negative edges. It will produce an answer; the answer will be wrong. There is no exception, no panic, no NaN. The output looks plausible. This is why "are there negative edges?" is the first question I ask before writing Dijkstra, and if the answer is "maybe, depending on user input", I either validate the inputs or switch to Bellman-Ford.
Algorithm 2: Bellman-Ford (when negatives are real)
Bellman-Ford solves the same problem as Dijkstra (single-source shortest paths) but tolerates negative edge weights. It can also detect negative cycles, which is the real reason to use it: a graph with a negative cycle has no well-defined shortest path because you can loop the cycle infinitely many times to drive the distance to negative infinity.
The algorithm is brutally simple: relax every edge, repeat V - 1 times. After V - 1 iterations, every shortest path of at most V - 1 edges has been found (and shortest paths cannot have more than V - 1 edges in a graph without negative cycles). A V-th iteration that still finds a shorter distance proves a negative cycle exists.
The runtime is O(V * E). On the example graph from the open (V = 10,000, E = 50,000), that is 500 million operations, which is plausibly a few seconds. Compared to Dijkstra's roughly 700,000, Bellman-Ford is about 700x slower. So you only reach for it when negative edges or negative-cycle detection are actually required.
Real cases where I have wanted Bellman-Ford: arbitrage detection on a currency-exchange graph (negative weights model logarithmic exchange rates that compound to a profit), some flow-with-cost problems where the cost function has a residual-graph component that can go negative, and certain constraint-satisfaction problems that reduce to shortest paths with mixed-sign weights. I have not used it in everyday product engineering. The cases that need it are real but specialised.
The variant called SPFA (Shortest Path Faster Algorithm) speeds Bellman-Ford up in practice by maintaining a queue of nodes whose distances changed. Average-case it is closer to O(k * E) for some small k, often 2-3. Worst-case it is still O(V * E). The reason SPFA is not the default everywhere is that competitive-programming judges have exploited its worst-case to embarrass solutions, so using it in interview or production code is a small risk.
Algorithm 3: Floyd-Warshall (all-pairs, small V)
Floyd-Warshall solves all-pairs shortest paths: for every pair of vertices, what is the shortest distance between them? The classical formulation runs in O(V^3) time and O(V^2) space, and is shockingly simple: a triple-nested loop with a one-line relaxation.
The outer loop iterates over an "intermediate vertex" k. After iteration k, dist[i][j] holds the length of the shortest path from i to j that uses only vertices 0..k as intermediates. After all n iterations, every vertex is allowed as an intermediate, so dist[i][j] is the unrestricted shortest path. The recurrence is dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]), which is one of the cleanest DP recurrences in graph algorithms.
The O(V^3) cost is the wall. On V = 1,000, that is a billion operations, which is roughly a second of CPU on a modern machine. On V = 5,000, it is 125 billion, several minutes. On V = 10,000, it is a trillion, hours. The practical limit for Floyd-Warshall is around V = 1,000-2,000 in JavaScript or Python, a bit higher in Java or C++ with a tight loop.
If the graph is denser (E close to V^2), then V Dijkstra runs from each source give you all-pairs in O(V * (V + E) log V) = O(V * V^2 log V) = O(V^3 log V), which is worse than Floyd-Warshall. If the graph is sparse (E ~ V), V Dijkstras give O(V * V log V) = O(V^2 log V), which is better than Floyd-Warshall's V^3. The crossover is around graph density ~V^2 / log V. In practice, I use Floyd-Warshall when V <= 500 and the implementation simplicity is worth the constant factor. Above 500 nodes, I run Dijkstra from each source.
A comparison table I keep on the inside of a notebook
| Dijkstra | Bellman-Ford | Floyd-Warshall | |
|---|---|---|---|
| Problem solved | Single-source shortest path | Single-source shortest path | All-pairs shortest path |
| Time | O((V+E) log V) | O(V * E) | O(V^3) |
| Space | O(V) | O(V) | O(V^2) |
| Negative edges? | No (silent failure) | Yes | Yes |
| Negative cycles? | No | Detects | Detects |
| Practical V limit | Millions | Tens of thousands | Hundreds (low thousands) |
| Implementation complexity | Medium (needs heap) | Low | Lowest (triple loop) |
| Reaches for | Routing, navigation, network distance | Arbitrage, constraint systems | Game-map distance tables, small graphs |
The table is the cheat sheet. Negative weights collapse the choice to two algorithms; small-and-dense graphs collapse to Floyd-Warshall; everything else is Dijkstra.
The decision tree I run in my head
When I face a shortest-path problem in code review or in an interview, I run this in order:
- Are weights non-negative? If yes, Dijkstra. Done.
- Are there negative weights, but no negative cycles, and I need single-source? Bellman-Ford.
- Do I need all-pairs distances, and is V small (under ~500-1000)? Floyd-Warshall.
- All-pairs but V is large? Run Dijkstra from each source. Cache the matrix if reused.
- Edges are unweighted (all weights equal)? BFS. Don't reach for any of these three; BFS is
O(V + E)and beats them all. - The graph is a DAG? Do topological-sort + relax.
O(V + E)regardless of negative edges, because there are no cycles to worry about.
The most common honest mistake I see is reaching for Dijkstra when BFS suffices. Equal-weight graphs are everywhere (social networks, simple road graphs, level-graph problems), and BFS is faster, simpler, and uses less memory. The second most common mistake is reaching for Floyd-Warshall "because it's simple" on a graph with 50,000 nodes; it will hang for hours.
When the right answer is "none of these"
Real navigation and routing systems do not use Dijkstra naively. Google Maps uses a hierarchical road graph with contraction hierarchies, which precomputes shortcuts so query time is sub-millisecond. OSRM and other open-source routers use the same family. The classical algorithms in this article are the textbook foundation; production systems layer caching, hierarchical decomposition, and bidirectional search on top.
A-star is the relevant variant for graphs where you have a heuristic estimate of distance to the target. It is structurally Dijkstra with a different priority key: instead of g(n) (cost from source), you use g(n) + h(n) where h is a heuristic. If h is admissible (never overestimates the true remaining cost), A-star is optimal and typically much faster than Dijkstra. Game pathfinding, robotics planning, and route-optimisation systems lean heavily on A-star.
For extremely large graphs where a single-source Dijkstra is too slow, look at landmark-based algorithms (ALT, contraction hierarchies, transit-node routing). These are research-grade techniques; you do not implement them from scratch in an interview, but knowing they exist is the difference between someone who has read about shortest paths and someone who has actually shipped them.
Default to Dijkstra. Escape only when forced.
The one-sentence summary, restated as a claim: Dijkstra is the right answer for almost every shortest-path problem you will see, and Bellman-Ford and Floyd-Warshall are escape hatches that you use when the problem statement forces you, not when you feel like switching. Negative weights force Bellman-Ford. Genuinely small all-pairs problems force Floyd-Warshall. Everything else is Dijkstra, and if I see a candidate write Floyd-Warshall on a 10,000-node graph or run Dijkstra on a graph with negative edges, I know the decision tree above hasn't been internalised. The runtime gap between these algorithms is huge; picking right is more important than picking elegant. Pick Dijkstra by default, escape only when forced, and you will be correct on 95% of the problems you ever touch.
