Community Article

Topological Sort, Three Ways I Have Actually Used It

Build-system DAGs, course-prereq UI ordering, microservice startup choreography. Three production stories from the same Kahn's algorithm boilerplate.

Topological Sort, Three Ways I Have Actually Used It

Build-system DAGs, course-prereq UI ordering, microservice startup choreography. Three production stories from the same Kahn's algorithm boilerplate.

topological-sort
graphs
directed-graphs
algorithms
graph-algorithms
vikramokoro

By @vikramokoro

January 1, 2026

·

Updated May 18, 2026

1,013 views

30

4.3 (13)

The first time I shipped topological sort to production was a Monday morning at a startup, and the trigger was that our build system was deadlocking. We had a graph of about 40 internal packages, written in TypeScript and Python and Rust, and the in-house build orchestrator was naively trying to compile them in alphabetical order. That worked until we added a circular dependency between two utility packages by accident, the build hung on npm run build for 28 minutes, and the on-call engineer (me, that day) had to figure out why. The fix was a five-minute topological sort. The lesson was that this algorithm is not just for textbook problems; it is the load-bearing primitive in three or four production systems I have worked on, and once you have written it once you reach for it constantly.

This article is three of those stories, in the order I encountered them in my career, with the actual code shape we ended up with each time. The algorithm is the same in all three. The interesting part is the surrounding code: how the graph is built, how cycles are detected, and what "sorted" means in each context.

A reminder of what topological sort does

Given a directed acyclic graph, topological sort produces a linear ordering of the vertices such that for every directed edge u -> v, u appears before v in the ordering. There are two standard implementations, and I use them in different contexts:

  • Kahn's algorithm (BFS-based). Maintain in-degree counts. Repeatedly take a vertex with in-degree 0, output it, and decrement the in-degree of its successors. This naturally detects cycles: if you cannot empty the graph, the remaining vertices form one or more cycles.
  • DFS-based. Run DFS, push vertices onto a stack as they finish (post-order). The reversed stack is the topological order.

Both are O(V + E). I default to Kahn's because the cycle detection is built in and the iterative implementation is easier to debug. The DFS version is shorter when you already have DFS infrastructure in place. The three production stories that follow all use Kahn's.

function topoSortKahn(adjacency) {
    const inDegree = new Map();
    for (const node of adjacency.keys()) inDegree.set(node, 0);
    for (const [, neighbors] of adjacency) {
        for (const n of neighbors) inDegree.set(n, (inDegree.get(n) || 0) + 1);
    }
    const queue = [];
    for (const [node, deg] of inDegree) if (deg === 0) queue.push(node);
    const order = [];
    while (queue.length) {
        const node = queue.shift();
        order.push(node);
        for (const next of adjacency.get(node) || []) {
            inDegree.set(next, inDegree.get(next) - 1);
            if (inDegree.get(next) === 0) queue.push(next);
        }
    }
    if (order.length !== adjacency.size) {
        // cycle detected; the unsorted nodes are the culprits
        const cycleNodes = [...inDegree.entries()].filter(([, d]) => d > 0).map(([n]) => n);
        throw new Error(`Cycle detected involving: ${cycleNodes.join(', ')}`);
    }
    return order;
}

This is roughly the version I have copied between four codebases now. The cycle-detection error message is the part that earns its keep at 2am: it tells you which packages, courses, or services are tangled, not just "it broke".

Story 1: building 40 packages in dependency order

The build orchestrator I mentioned at the open. The graph was packages, the edges were "package A imports package B", and the topological order was the build order. Naive build was alphabetical; correct build is topological. The fix was a 50-line addition to the orchestrator that read each package.json, built the dependency graph, and ran topoSortKahn over it.

The interesting part was the cycle-detection error message. We deliberately threw a hard error with the cycle members listed, because the temptation when you find a cycle is to break it silently with some heuristic, and silent cycle-breaking is exactly how you end up with packages that import stale versions of each other. The error message reads like Build cycle detected: utils-shared <-> utils-format. Resolve before continuing. and the engineer who introduced the cycle gets to fix it, not future-you.

A detail I learned the hard way: parallel builds also benefit from the topological order, but you do not strictly need a sorted list, you need a sequence of "frontiers". The frontier is the set of all packages with no remaining unbuilt dependencies; you build the entire frontier in parallel, then advance to the next. The boilerplate is identical to Kahn's, except instead of queue.shift() you drain the queue into a parallel-build worker pool, await all of them, then enqueue the next frontier. We saw a 4x speedup on the 40-package graph by parallelising frontier-by-frontier instead of sequentially walking the topological order.

async function buildInDependencyOrder(adjacency, buildOne) {
    const inDegree = new Map();
    for (const node of adjacency.keys()) inDegree.set(node, 0);
    for (const [, neighbors] of adjacency) {
        for (const n of neighbors) inDegree.set(n, (inDegree.get(n) || 0) + 1);
    }
    let frontier = [...inDegree.entries()].filter(([, d]) => d === 0).map(([n]) => n);
    let built = 0;
    while (frontier.length) {
        await Promise.all(frontier.map(buildOne));
        built += frontier.length;
        const next = [];
        for (const node of frontier) {
            for (const succ of adjacency.get(node) || []) {
                inDegree.set(succ, inDegree.get(succ) - 1);
                if (inDegree.get(succ) === 0) next.push(succ);
            }
        }
        frontier = next;
    }
    if (built !== adjacency.size) throw new Error('Cycle detected in build graph');
}

Same algorithm. Different consumption pattern. This frontier-parallel variant is the version I copy when the work-per-node is expensive (compilation, image processing, document rendering) and the dependency graph is wider than it is deep.

Story 2: course prerequisites in a learning-platform UI

The second time I reached for the algorithm was building a course-listing page where the user sees prerequisites laid out before each course. Mathematics II has Mathematics I as a prerequisite. Linear Algebra has Mathematics II. Discrete Math has Mathematics I. The platform had about 200 courses and the prereq graph was a forest of small DAGs, deepest depth maybe six.

The naive UI was a flat alphabetical list of courses. Users complained because they could not tell which courses they could start with vs which ones required other courses. The fix was to render the courses in topological order, so a course never appeared before its prerequisites in the listing. The algorithm was identical to the build-system case; the only differences were the consumer (a React component) and the data source (a Postgres table joined to a course-prereq join table).

What made this story interesting was a second use of the same sort: "what courses can I take given the courses I have already completed?". This is again a topological sort, but starting from the set of completed courses as roots and walking forward. The frontier of "unlockable" courses is exactly the courses whose prereqs are all in the completed set, and that is the in-degree-0 set of the residual graph. Same algorithm, two different views:

function unlockableCourses(allCourses, prereqEdges, completed) {
    const completedSet = new Set(completed);
    const inDegree = new Map();
    for (const c of allCourses) {
        if (completedSet.has(c)) continue;
        inDegree.set(c, 0);
    }
    for (const [course, prereq] of prereqEdges) {
        if (completedSet.has(course)) continue;
        if (completedSet.has(prereq)) continue;
        inDegree.set(course, (inDegree.get(course) || 0) + 1);
    }
    return [...inDegree.entries()].filter(([, d]) => d === 0).map(([c]) => c);
}

This is just the in-degree-0 frontier of the topological graph after removing the completed nodes. It runs in O(V + E) and the platform recomputed it on every course-listing page-load, which was fine because V + E was a few thousand. If the graph were 100,000 courses we would memoise per user.

The lesson here was that topological sort gives you more than an order; it gives you a partition into levels (the frontiers from the parallel-build version) and that partition is itself useful information for the UI. The flat ordering goes into the listing page; the level structure goes into a tree-shaped "learning path" diagram.

Story 3: starting microservices in dependency order

The third time was a year ago. Our staging environment was deploying about 30 microservices via a custom orchestrator on top of Kubernetes, and the startup order was simply "alphabetical, with retries". This worked because every service had health checks and would retry connecting to its dependencies until they were ready. It worked badly because half the services would log connection errors for 90 seconds during startup, polluting our logs and tripping our alerting on "unusual error rate".

The fix was, again, a topological sort over the service-dependency graph. We declared dependencies in each service's deployment manifest (dependsOn: [auth-service, redis]) and the orchestrator built the graph, ran Kahn's, and started services in batches. The first batch was the in-degree-0 services (databases, caches, the auth service, basically everything stateful or foundational). After they reported healthy, the orchestrator advanced to the next frontier (services that depended only on the first batch), and so on.

The interesting wrinkle in this case was that "healthy" is not the same as "started". A service can be started but still warming up (loading caches, running migrations). The orchestrator needed a definition of "ready" before advancing the frontier. We settled on "the service's /healthz endpoint returns 200 AND the service has been up for at least 5 seconds", and that was enough to make startup feel choreographed instead of chaotic. Logs went from 14,000 connection errors per deployment to 12.

This case also exposed something subtle about cycles. Microservices in production sometimes have circular dependencies that cannot be removed (service A's auth checks call service B, which uses service A's user-lookup endpoint). The fix is not to topologically sort, because there is no valid order; the fix is to bring those services up in a relaxed mode (connection retries enabled, requests buffered) before advancing the frontier. The orchestrator detected the cycle, logged it as a warning, and switched to a fallback startup mode for those services. We marked the cycle in the service registry so the platform team could revisit it later.

Where I would not use a topological sort

When the graph has cycles by design, topological sort fails by definition; you want strongly-connected components instead, which Tarjan's or Kosaraju's algorithm gives you in linear time, and then you topologically sort the SCCs as a meta-DAG. The microservice case above is an example where I could have gone further with SCCs but did not, because three SCCs of size 2 was not worth the complexity over a hardcoded fallback list.

When the order needs to be optimal in some weighted sense (job scheduling with deadlines, critical-path analysis), topological sort gives you a feasible order but not the optimal one. You combine topological sort with DP over the DAG, which is the classic shape for longest-path-in-DAG and earliest-completion-time problems. Critical-path scheduling in project management is exactly this combination, and I have used it once for an ETL pipeline where the goal was "finish the whole batch as fast as possible" given parallelism and varying task durations.

When the problem looks like topological sort but actually needs constraint propagation (sudoku, register allocation), you are in the wrong family. Those want backtracking or constraint solvers.

Three follow-up reads

If you want to go deeper on any of the three stories, I keep these three references in my notes:

  • For the build-system case, the Bazel build system is the production-grade reference. Its query and rdeps commands are essentially user-exposed topological-sort tooling on a project-scale dependency graph, and reading its design doc gives you the vocabulary for incremental builds, action graphs, and "hermetic" execution.
  • For course-prereq UIs, I would point at any of the structured-curriculum learning platforms; Khan Academy's mastery-challenge ordering and Coursera's specialisation-prereqs both expose topological-sort behavior in their UI even if they do not call it that. The interesting detail is how they handle partial completion (level-frontier per user, recomputed on each visit).
  • For microservice choreography, look at the helm chart dependencies field and at Kubernetes operator patterns. Helm doesn't strictly topo-sort installations across charts, but operators in production almost always have a startup-ordering primitive that is Kahn's algorithm dressed up as a controller-reconciliation loop.

The algorithm is fifty lines. The systems built on it are anything but, and what makes them useful is recognising that the build orchestrator, the course UI, and the microservice startup are the same problem in different costumes.

Back to Articles