Community Article

Graph Algorithms 101

Why most production graph work is modeling, not algorithm; the two traversals you actually use; and the representation table that drives every memory tradeoff.

Graph Algorithms 101

Why most production graph work is modeling, not algorithm; the two traversals you actually use; and the representation table that drives every memory tradeoff.

graph-algorithms

bfs

dfs

graph-representation

algorithms

By @antonmorgan

April 4, 2026

Updated May 18, 2026

739 views

4.4 (9)

Most engineers learn the word "graph" before they learn what counts as one. Then a year into the job they ship a feature that needs to find connected users, or trace a permission chain, or answer "can A reach B through these dependencies", and they realize they have been working with graphs for months without naming them. That gap, between the abstract definition and the moment you recognize a graph in your own code, is what this article is about.

My claim, stated up front so you can argue with it: in product work, picking the algorithm is the easy part. The work that actually decides whether your feature ships in a week or three is modeling. What are the nodes? What are the edges? Are the edges directed? Weighted? Do they live in memory or behind a database query? Get that part right and the algorithm is mostly mechanical. Get it wrong and the cleanest BFS in the world will not save you.

A graph in two sentences

A graph is a set of nodes (vertices) and a set of edges connecting pairs of those nodes. Edges can be directed (a one-way relationship like Twitter follows) or undirected (a two-way relationship like Facebook friendships), and they can carry weights (cost, distance, capacity) or be unweighted.

If you draw users connected by friendships, web pages connected by hyperlinks, cities connected by roads, packages connected by dependencies, or git commits connected by parent pointers, you are drawing a graph. The mathematics underneath is the same in every case. Only the modeling changes.

The two representations and when each one wins

Before you write any algorithm, you have to decide how the graph lives in memory. There are two real options.

Adjacency list. A map from each node to the list of its neighbors. Memory is O(V + E) where V is the vertex count and E is the edge count. Iterating the neighbors of a node is O(degree(node)), which is exactly the work you usually want to do.

Adjacency matrix. A V-by-V grid where matrix[i][j] is non-zero if there is an edge from i to j. Memory is O(V^2). Checking whether a specific edge exists is O(1).

# Adjacency list (Python dict of sets)
graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B"}
}

# Same graph as adjacency matrix (rows/cols indexed A=0,B=1,C=2,D=3)
matrix = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0]
]

The choice between them is the first real decision and most teams get it right by accident. The table below is the one I keep open when designing.

Property	Adjacency List	Adjacency Matrix
Memory	O(V + E)	O(V^2)
Iterate neighbors of v	O(degree(v))	O(V)
Test edge (u,v) exists	O(degree(u))	O(1)
Add an edge	O(1)	O(1)
Add a vertex	O(1)	O(V) (resize)
Best fit	Sparse graphs (E << V^2)	Dense graphs, frequent edge tests

Real product graphs are almost always sparse. A social network with a million users has billions of possible edges and tens of millions of actual ones. An adjacency matrix at that size is 1 terabyte. An adjacency list is a few hundred megabytes. The matrix only makes sense for small dense graphs (say, V under a few thousand) or when you are doing matrix-shaped operations like all-pairs shortest paths.

BFS and DFS, the two workhorses

Ninety percent of the graph code I write in product work is one of two traversals. Breadth-First Search (BFS) explores neighbors before deeper nodes, level by level. Depth-First Search (DFS) explores as far as possible down each branch before backing up. Same input, different visit order, different things they are good at.

from collections import deque

def bfs(graph, start):
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # do something with node
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)

def dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    # do something with start
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)

Both run in O(V + E) time. Both use O(V) extra memory for the visited set. The structural difference is one line: BFS uses a queue, DFS uses a stack (or recursion, which is an implicit stack). That single change is the entire reason to pick one over the other.

Use BFS when you need shortest path in an unweighted graph (the first time you reach a node is via the fewest edges), or when you want to explore by distance from the start. Use DFS when you need to detect cycles, find connected components, do topological sort, or solve any backtracking-shaped problem. In a binary tree, in-order traversal is a DFS variant; level-order is a BFS.

A real grid problem walkthrough

The graph problem I see most often in real code does not look like a graph. Take this one: count the number of islands in a 2D grid where 1 means land and 0 means water, and four-directional adjacency counts as connected.

grid = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1]
]

The grid is the graph. Each cell is a node. Each cell has up to four edges to its orthogonal neighbors. "Counting islands" is counting connected components. The algorithm is: for each unvisited land cell, run a DFS or BFS to mark its whole component, and increment a counter once per traversal kicked off.

def count_islands(grid):
    if not grid: return 0
    rows, cols = len(grid), len(grid[0])
    visited = [[False] * cols for _ in range(rows)]

    def dfs(r, c):
        if r < 0 or r >= rows or c < 0 or c >= cols:
            return
        if visited[r][c] or grid[r][c] == 0:
            return
        visited[r][c] = True
        dfs(r+1, c); dfs(r-1, c); dfs(r, c+1); dfs(r, c-1)

    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not visited[r][c]:
                dfs(r, c)
                count += 1
    return count

For the grid above, the answer is 5. This pattern (treat the grid as a graph, run a connectivity traversal, count the components) is exactly how you solve flood-fill, region-coloring, image segmentation, and a dozen related problems. Once you see it once, you see it everywhere.

When the algorithm is not BFS or DFS

Three cases I run into where the workhorse traversals are not enough.

Shortest path with weights. Use Dijkstra's algorithm if all edge weights are non-negative, or Bellman-Ford if you have negative weights and need to detect negative cycles. Both build on BFS but use a priority queue (Dijkstra) or relaxation passes (Bellman-Ford) instead of a plain queue. Memory and runtime are higher; do not reach for them unless weights actually matter.

Topological sort. A DAG (directed acyclic graph) ordering where every edge points from earlier in the list to later. Build systems use this for compile order. Package managers use it for install order. The standard algorithm is Kahn's BFS-based version or a DFS post-order version. Either works; I use Kahn's because it makes cycle detection a side effect.

Minimum spanning tree. Connect all nodes with the lowest total edge weight. Kruskal's (sort edges, use union-find) and Prim's (Dijkstra-shaped, but on edges) are the two standard algorithms. This shows up in network design, clustering, and Boruvka-style parallel algorithms.

Most product engineers go years without writing any of these by hand. The library or the database does it for you. But you should recognize the shape of the problem when you see it, because picking up the wrong algorithm (running BFS on a weighted graph and pretending the weights do not exist) is the kind of bug that ships and looks correct on the test data.

The modeling step that decides everything

This is the section I want to spend the longest on, because it is the part that most graph-algorithm articles skip and the part that decides whether your feature ships.

When a product problem looks like a graph, the questions I work through, in order, are:

What are the nodes? Users, files, commits, components, geographic locations, anything else? Pick the noun that the rest of the model will hang off.
What are the edges? What relationship between two nodes counts as an edge? Be specific. "Friendship" is an edge type. "Mutual friend of" is a different edge type. "Has interacted with in the last 30 days" is yet another, and you might need all three for different features.
Are the edges directed? Twitter follows are directed (A follows B does not imply B follows A). Facebook friendships are not. Mixing this up is the source of half the bugs I see.
Are the edges weighted? A weight is any number associated with the edge. Distance, latency, cost, capacity, strength of relationship. If you do not have weights, pretend you do (every edge has weight 1) and use the unweighted algorithms.
Where does the graph live? In memory? In a database? Across a network? The algorithm choice changes drastically. An in-memory BFS over 100k nodes is milliseconds. A BFS that hits a database for each neighbor lookup is seconds at best.

Spending an hour on those five questions before writing any algorithm code has saved me weeks of rework. The temptation is to start coding immediately because the algorithms are well-known. Resist it. The algorithms are well-known given the model. The model is the part that is yours.

A story I keep coming back to

A few years ago I was on a team that needed to detect circular ownership in a permissions system. The setup was: groups can own other groups, users can own groups, groups can own users, and the rules said no cycles allowed. The first version someone wrote was a recursive function that walked from each entity and looked for itself, which was O(V^2) worst case and timed out on the production graph after about ninety seconds.

I rewrote it as a single DFS-based cycle detection over the whole ownership graph, O(V + E). It dropped the runtime from ninety seconds to about two hundred milliseconds. The change in code was maybe forty lines. The win was not picking a fancier algorithm, it was recognizing that the problem was "detect a cycle in a directed graph" rather than "for each node, check if you can reach yourself". Once the modeling shifted, the implementation was a textbook DFS.

The lesson I took away has nothing to do with graphs specifically. It is that most algorithm problems in product code are problems of recognition. The hard part is naming what kind of problem you have. The easy part, the part the algorithms textbook covers, is what you do once you know.

Back to Articles