Community Article

Bitmask DP: The Trick for Subset Problems

Encoding subsets as integers, iterating subsets in O(2^n), and the TSP example that demonstrates the 2^n * n^2 state space. When n is at most 20, bitmask DP is a cheat code.

Bitmask DP: The Trick for Subset Problems

Encoding subsets as integers, iterating subsets in O(2^n), and the TSP example that demonstrates the 2^n * n^2 state space. When n is at most 20, bitmask DP is a cheat code.

bitmask-dp

dynamic-programming

bit-manipulation

combinatorics

algorithms

By @antonmorgan

December 13, 2025

Updated May 18, 2026

1,126 views

Rate

Q: When you say bitmask DP, do you actually mean dynamic programming?

Yes. It's just DP where the state includes a subset of items, and the subset is encoded as an integer with one bit per item. The DP transitions and recurrence are identical to other DP styles; the only twist is that "the set of items I have committed to so far" lives in a single 32-bit number instead of an array of booleans, and that lets you index your memo table by an int instead of a Set.

Q: Why bother with bits when a Set<Integer> would also work?

Three reasons. First, an int is much smaller than a Set and lives on the stack; you can have a million of them in a memo table without hitting GC pressure. Second, set operations like "add this element", "remove this element", "is this element in the set", "iterate over all subsets" are all bit operations, which are roughly 100x faster than HashSet operations. Third, hashing an int is the cheapest possible hash, so memo-table lookups are basically free. For problems where the subset is the whole point, bitmask DP is not a stylistic choice; it is the only representation fast enough to fit in time limits.

Q: When does it stop working?

When n > 30, more or less. The state space has 2^n entries, and 2^30 is a billion, which fits in memory if each entry is a single byte but is too slow to actually populate in time-limit problems. In practice the rule of thumb is n <= 20 for problems that also have a per-state factor (like n or n^2), and n <= 25 for pure subset enumeration. Past that, you need a different DP structure.

This Q&A intro is the working-engineer version of "why bitmask DP exists". The rest of the article is the canonical examples (subset sum, TSP) and the bit tricks I have actually used in production and in competitive programming.

The four bit tricks you must internalise

Before any DP, learn these four operations. They are the basis of every bitmask algorithm.

// Set bit i (add element i to the subset)
mask | (1 << i)

// Clear bit i (remove element i from the subset)
mask & ~(1 << i)

// Test bit i (is element i in the subset?)
(mask >> i) & 1   // or: (mask & (1 << i)) !== 0

// Iterate over all 2^n subsets of an n-element ground set
for (let mask = 0; mask < (1 << n); mask++) { ... }

Three more advanced ones that come up:

// Count set bits (population count)
function popcount(x) {
    let c = 0;
    while (x) { c += x & 1; x >>>= 1; }
    return c;
}
// Modern engines: use the built-in if available; in V8 this is fast.

// Iterate over all subsets of a given mask (this is the subtle one)
for (let sub = mask; sub > 0; sub = (sub - 1) & mask) { ... }
// And don't forget to handle sub = 0 separately if needed.

// Lowest set bit (the position of the rightmost 1)
Math.log2(mask & -mask) | 0

The submask enumeration (sub - 1) & mask is the most surprising one. It walks all 2^k submasks of a mask with k set bits, in O(2^k) time. The proof is short but unintuitive; trust the formula, run it once on mask = 0b1011 and watch it produce 1011, 1010, 1001, 1000, 0011, 0010, 0001. Useful for problems where you need to consider all ways to partition a set.

The gateway problem: subset sum via bitmask

Given a set of n integers, can you pick a subset that sums to a target T? The textbook DP is O(n * T); the bitmask version is O(2^n). For small n the bitmask version is competitive and pedagogically clearer.

function canFormSum(nums, target) {
    const n = nums.length;
    for (let mask = 0; mask < (1 << n); mask++) {
        let sum = 0;
        for (let i = 0; i < n; i++) {
            if (mask & (1 << i)) sum += nums[i];
        }
        if (sum === target) return true;
    }
    return false;
}

This is brute-force enumeration of all 2^n subsets, computing each sum in O(n). Total work O(n * 2^n). For n = 20, that is about 20 million operations, which is fine. For n = 25, 800 million, still feasible. For n = 30, 30 billion, too slow.

The pure DP version (dp[i][s] = true if first i items can sum to s) is O(n * T), which can be much smaller if T is bounded. For typical problem sizes (T up to 10,000), the DP wins. Bitmask wins when T is huge but n is small. Knowing which one to reach for is the practical skill.

The canonical bitmask DP: travelling salesman

TSP is the textbook bitmask DP. Given n cities and an n x n distance matrix, find the shortest tour that visits every city exactly once and returns to the start. The naive enumeration of n! tours is O(n!), which is infeasible for n > 11. The bitmask DP brings it to O(2^n * n^2), which is feasible for n up to about 18-20.

The state is (visited, last) where visited is a bitmask of cities already on the path and last is the most recently visited city. The recurrence: dp[visited][last] = min over (prev) of dp[visited \ {last}][prev] + dist[prev][last].

function tsp(dist) {
    const n = dist.length;
    const FULL = (1 << n) - 1;
    // dp[mask][last] = shortest path that visits the cities in `mask`, ending at `last`
    const dp = Array.from({ length: 1 << n }, () => new Array(n).fill(Infinity));
    dp[1][0] = 0;   // start at city 0; only city 0 is visited

    for (let mask = 1; mask < (1 << n); mask++) {
        if (!(mask & 1)) continue;   // we always start at city 0
        for (let last = 0; last < n; last++) {
            if (!((mask >> last) & 1)) continue;       // last must be in mask
            if (dp[mask][last] === Infinity) continue;
            for (let next = 0; next < n; next++) {
                if ((mask >> next) & 1) continue;       // next not yet visited
                const newMask = mask | (1 << next);
                const candidate = dp[mask][last] + dist[last][next];
                if (candidate < dp[newMask][next]) {
                    dp[newMask][next] = candidate;
                }
            }
        }
    }

    // close the tour: return from any last city to 0
    let best = Infinity;
    for (let last = 1; last < n; last++) {
        best = Math.min(best, dp[FULL][last] + dist[last][0]);
    }
    return best;
}

The state space is 2^n * n. The transition for each state considers up to n next-cities. Total work O(2^n * n * n) = O(2^n * n^2).

For n = 18: 2^18 * 18^2 ~ 85 million operations, runs in roughly a second. For n = 20: 2^20 * 20^2 ~ 420 million operations, runs in 4-5 seconds. For n = 22: 2^22 * 22^2 ~ 2 billion operations, infeasible without heavy constant optimisation.

Note that the state count grows by a factor of 2 for every additional city while the per-state work grows quadratically. The 2x dominates fast. This is why n = 20 is the practical ceiling.

TSP bitmask DP state evolution (n = 4, sketch)

mask = 0001 (only city 0):  dp[0001][0] = 0
mask = 0011 (cities 0,1):   dp[0011][1] = dist[0][1]
mask = 0111 (cities 0,1,2): dp[0111][1] = dist[0][2] + dist[2][1]
                           dp[0111][2] = dp[0011][1] + dist[1][2]
mask = 1111 (all visited):  dp[1111][3] = min over prev paths + dist[prev][3]
close tour:                 best = min over k of dp[1111][k] + dist[k][0]

A comparison table I keep on bitmask vs alternatives

Problem family	Bitmask DP feasible?	Alternative	When alternative wins
Subset sum, n <= 20	Yes (O(2^n) or O(n * 2^n))	DP O(n*T)	T is small
Subset sum, n <= 30	Marginal (meet-in-the-middle: O(2^(n/2)))	DP O(n*T)	Always, if T is bounded
TSP, n <= 18	Yes (O(2^n * n^2))	Branch-and-bound, approximation	n is large or approximation acceptable
Assignment problem (n people to n jobs)	Yes for n <= 20	Hungarian algorithm O(n^3)	n is large
Hamiltonian path in DAG, n <= 20	Yes (O(2^n * n^2))	None better for general graphs	None
Graph coloring, k colors, small n	Sometimes	Backtracking with constraint propagation	Larger n
SteinerTree (small terminal set)	Yes (Dreyfus-Wagner)	Approximation	Approximation acceptable
Dominating set (NP-hard generally)	For n <= 20	None polynomial	Larger n forces approximation

The pattern: bitmask DP is the polynomial-time exact algorithm for problems that are generally NP-hard, IF the relevant set size is small. Once n exceeds 25 or so, you fall back to branch-and-bound, approximation, or heuristics.

Three optimisations that have actually helped me

When the basic bitmask DP is close to the time limit but not quite under it, these three transformations have saved me:

Iterate masks in order of popcount. Process all masks with 1 bit, then all masks with 2 bits, etc. This lets you stream through the DP in stages, which can help cache locality. Not always faster, but sometimes a 30% win on tight problems.
Skip masks that violate problem constraints early. If your problem requires "city 0 is always in the mask", prune masks where bit 0 is unset (if (!(mask & 1)) continue; in the TSP code above is exactly this). The savings are usually a factor of 2 per pruned bit.
Profile-guided constant elimination. The TSP loop above has three nested loops, with the innermost over next. Reorder them, hoist dp[mask][last] out of the inner loop, and use typed arrays (Int32Array instead of regular arrays). On large n, these constant-factor optimisations can be a 5-10x improvement, which is the difference between fitting and not fitting.

A fourth move: meet-in-the-middle for subset enumeration when n is in the 30-40 range. Split the items into two halves of size n/2, enumerate all subsets of each half (2^(n/2) per side), and combine. Total work O(2^(n/2) * f(n)) for some polynomial f, which is dramatically less than O(2^n). This is more of a different technique than an optimisation, but it lives in the same family of bitmask-driven exact algorithms.

Where bitmask DP is heading

The class of problems solvable by bitmask DP (subset-style problems with n small enough) is well-defined and has not grown in 30 years. What has changed is the size of n we can comfortably handle, which has crept from n = 12 (1980s) to n = 22 (modern hardware with careful constant optimisation). The next step probably comes from quantum and parallel hardware: the inherent parallelism of subset enumeration maps well onto SIMD and GPU, and there are research papers showing 50-100x speedups on TSP-style DP using GPU implementations. Whether that translates to common interview or production work is another question; for now, bitmask DP is a CPU-bound, single-threaded technique, and the practical ceiling stays around n = 20-22.

The other forward direction is in heuristic-augmented bitmask DP: combining the exact bitmask transition with a learned cutoff or a learned ordering of the bits. Modern combinatorial optimisation papers occasionally explore reinforcement-learning-augmented branch-and-bound that trims the bitmask state space by orders of magnitude. The exact algorithm is still the foundation; the heuristic is the speedup. If you internalise the bitmask DP cleanly, the research papers on top of it become readable, and that is the surface area I want my future self to be able to engage with. The four bit tricks at the top of this article are what make all of that possible. Learn them once, and the rest of bitmask DP collapses into a familiar recurrence on a small mask space.

Back to Articles