Q: When you say bitmask DP, do you actually mean dynamic programming?
Yes. It's just DP where the state includes a subset of items, and the subset is encoded as an integer with one bit per item. The DP transitions and recurrence are identical to other DP styles; the only twist is that "the set of items I have committed to so far" lives in a single 32-bit number instead of an array of booleans, and that lets you index your memo table by an int instead of a Set.
Q: Why bother with bits when a Set<Integer> would also work?
Three reasons. First, an int is much smaller than a Set and lives on the stack; you can have a million of them in a memo table without hitting GC pressure. Second, set operations like "add this element", "remove this element", "is this element in the set", "iterate over all subsets" are all bit operations, which are roughly 100x faster than HashSet operations. Third, hashing an int is the cheapest possible hash, so memo-table lookups are basically free. For problems where the subset is the whole point, bitmask DP is not a stylistic choice; it is the only representation fast enough to fit in time limits.
Q: When does it stop working?
When n > 30, more or less. The state space has 2^n entries, and 2^30 is a billion, which fits in memory if each entry is a single byte but is too slow to actually populate in time-limit problems. In practice the rule of thumb is n <= 20 for problems that also have a per-state factor (like n or n^2), and n <= 25 for pure subset enumeration. Past that, you need a different DP structure.
This Q&A intro is the working-engineer version of "why bitmask DP exists". The rest of the article is the canonical examples (subset sum, TSP) and the bit tricks I have actually used in production and in competitive programming.
The four bit tricks you must internalise
Before any DP, learn these four operations. They are the basis of every bitmask algorithm.
Three more advanced ones that come up:
The submask enumeration (sub - 1) & mask is the most surprising one. It walks all 2^k submasks of a mask with k set bits, in O(2^k) time. The proof is short but unintuitive; trust the formula, run it once on mask = 0b1011 and watch it produce 1011, 1010, 1001, 1000, 0011, 0010, 0001. Useful for problems where you need to consider all ways to partition a set.
The gateway problem: subset sum via bitmask
Given a set of n integers, can you pick a subset that sums to a target T? The textbook DP is O(n * T); the bitmask version is O(2^n). For small n the bitmask version is competitive and pedagogically clearer.
This is brute-force enumeration of all 2^n subsets, computing each sum in O(n). Total work O(n * 2^n). For n = 20, that is about 20 million operations, which is fine. For n = 25, 800 million, still feasible. For n = 30, 30 billion, too slow.
The pure DP version (dp[i][s] = true if first i items can sum to s) is O(n * T), which can be much smaller if T is bounded. For typical problem sizes (T up to 10,000), the DP wins. Bitmask wins when T is huge but n is small. Knowing which one to reach for is the practical skill.
The canonical bitmask DP: travelling salesman
TSP is the textbook bitmask DP. Given n cities and an n x n distance matrix, find the shortest tour that visits every city exactly once and returns to the start. The naive enumeration of n! tours is O(n!), which is infeasible for n > 11. The bitmask DP brings it to O(2^n * n^2), which is feasible for n up to about 18-20.
The state is (visited, last) where visited is a bitmask of cities already on the path and last is the most recently visited city. The recurrence: dp[visited][last] = min over (prev) of dp[visited \ {last}][prev] + dist[prev][last].
The state space is 2^n * n. The transition for each state considers up to n next-cities. Total work O(2^n * n * n) = O(2^n * n^2).
For n = 18: 2^18 * 18^2 ~ 85 million operations, runs in roughly a second.
For n = 20: 2^20 * 20^2 ~ 420 million operations, runs in 4-5 seconds.
For n = 22: 2^22 * 22^2 ~ 2 billion operations, infeasible without heavy constant optimisation.
Note that the state count grows by a factor of 2 for every additional city while the per-state work grows quadratically. The 2x dominates fast. This is why n = 20 is the practical ceiling.
A comparison table I keep on bitmask vs alternatives
| Problem family | Bitmask DP feasible? | Alternative | When alternative wins |
|---|---|---|---|
| Subset sum, n <= 20 | Yes (O(2^n) or O(n * 2^n)) | DP O(n*T) | T is small |
| Subset sum, n <= 30 | Marginal (meet-in-the-middle: O(2^(n/2))) | DP O(n*T) | Always, if T is bounded |
| TSP, n <= 18 | Yes (O(2^n * n^2)) | Branch-and-bound, approximation | n is large or approximation acceptable |
| Assignment problem (n people to n jobs) | Yes for n <= 20 | Hungarian algorithm O(n^3) | n is large |
| Hamiltonian path in DAG, n <= 20 | Yes (O(2^n * n^2)) | None better for general graphs | None |
| Graph coloring, k colors, small n | Sometimes | Backtracking with constraint propagation | Larger n |
| SteinerTree (small terminal set) | Yes (Dreyfus-Wagner) | Approximation | Approximation acceptable |
| Dominating set (NP-hard generally) | For n <= 20 | None polynomial | Larger n forces approximation |
The pattern: bitmask DP is the polynomial-time exact algorithm for problems that are generally NP-hard, IF the relevant set size is small. Once n exceeds 25 or so, you fall back to branch-and-bound, approximation, or heuristics.
Three optimisations that have actually helped me
When the basic bitmask DP is close to the time limit but not quite under it, these three transformations have saved me:
Iterate masks in order of
popcount. Process all masks with 1 bit, then all masks with 2 bits, etc. This lets you stream through the DP in stages, which can help cache locality. Not always faster, but sometimes a 30% win on tight problems.Skip masks that violate problem constraints early. If your problem requires "city 0 is always in the mask", prune masks where bit 0 is unset (
if (!(mask & 1)) continue;in the TSP code above is exactly this). The savings are usually a factor of 2 per pruned bit.Profile-guided constant elimination. The TSP loop above has three nested loops, with the innermost over
next. Reorder them, hoistdp[mask][last]out of the inner loop, and use typed arrays (Int32Arrayinstead of regular arrays). On largen, these constant-factor optimisations can be a 5-10x improvement, which is the difference between fitting and not fitting.
A fourth move: meet-in-the-middle for subset enumeration when n is in the 30-40 range. Split the items into two halves of size n/2, enumerate all subsets of each half (2^(n/2) per side), and combine. Total work O(2^(n/2) * f(n)) for some polynomial f, which is dramatically less than O(2^n). This is more of a different technique than an optimisation, but it lives in the same family of bitmask-driven exact algorithms.
Where bitmask DP is heading
The class of problems solvable by bitmask DP (subset-style problems with n small enough) is well-defined and has not grown in 30 years. What has changed is the size of n we can comfortably handle, which has crept from n = 12 (1980s) to n = 22 (modern hardware with careful constant optimisation). The next step probably comes from quantum and parallel hardware: the inherent parallelism of subset enumeration maps well onto SIMD and GPU, and there are research papers showing 50-100x speedups on TSP-style DP using GPU implementations. Whether that translates to common interview or production work is another question; for now, bitmask DP is a CPU-bound, single-threaded technique, and the practical ceiling stays around n = 20-22.
The other forward direction is in heuristic-augmented bitmask DP: combining the exact bitmask transition with a learned cutoff or a learned ordering of the bits. Modern combinatorial optimisation papers occasionally explore reinforcement-learning-augmented branch-and-bound that trims the bitmask state space by orders of magnitude. The exact algorithm is still the foundation; the heuristic is the speedup. If you internalise the bitmask DP cleanly, the research papers on top of it become readable, and that is the surface area I want my future self to be able to engage with. The four bit tricks at the top of this article are what make all of that possible. Learn them once, and the rest of bitmask DP collapses into a familiar recurrence on a small mask space.
