Community Problem
Minimum Genetic Mutation
Difficulty: Medium
Transform a gene string to a target one mutation at a time, where each intermediate must be in the bank.
Minimum Genetic Mutation
Transform a gene string to a target one mutation at a time, where each intermediate must be in the bank.
By @lucasmoreau
December 13, 2025
·
Updated May 20, 2026
592 views
19
4.4 (11)
I had this on a Google L4 phone screen, and the candidate flailed because they were trying to model the search as a tree of all 4^8 strings instead of treating the bank as a graph and BFS-ing from start to end. The algorithm collapses the moment you reframe "each mutation" as "each edge in a graph where nodes are gene strings and edges connect those that differ by one character". The catalog covers BFS on grids, but it skipped this string-state BFS twin.
Minimum Genetic Mutation
A gene string can be represented by an 8-character long string, with choices from 'A', 'C', 'G', and 'T'.
Suppose we need to investigate a mutation from a gene string startGene to a gene string endGene where one mutation is defined as one single character changed in the gene string.
For example, "AACCGGTT" --> "AACCGGTA" is one mutation.
There is also a gene bank that records all the valid gene mutations. A gene must be in bank to make it a valid gene string.
Given the two gene strings startGene and endGene and the gene bank, return the minimum number of mutations needed to mutate from startGene to endGene. If there is no such a mutation, return -1.
Note: the starting point is assumed to be valid, so it might not be included in the bank.
Examples
Example 1:
- Input:
startGene = "AACCGGTT",endGene = "AACCGGTA",bank = ["AACCGGTA"] - Output:
1 - Explanation: One mutation transforms the start to the end and the result is in the bank.
Example 2:
- Input:
startGene = "AACCGGTT",endGene = "AAACGGTA",bank = ["AACCGGTA", "AACCGCTA", "AAACGGTA"] - Output:
2 - Explanation: One BFS path:
AACCGGTT -> AACCGGTA -> AAACGGTA. Two mutations.
Example 3:
- Input:
startGene = "AAAAACCC",endGene = "AACCCCCC",bank = ["AAAACCCC", "AAACCCCC", "AACCCCCC"] - Output:
3 - Explanation: BFS finds
AAAAACCC -> AAAACCCC -> AAACCCCC -> AACCCCCC. Three mutations.
Example 4:
- Input:
startGene = "AACCGGTT",endGene = "AACCGGTA",bank = [] - Output:
-1 - Explanation: With no entries in the bank, no mutation is valid.
Constraints
0 <= bank.length <= 10.startGene.length == endGene.length == bank[i].length == 8.startGene,endGene, andbank[i]consist of only the characters'A','C','G', and'T'.
Follow-up
The 4 * 8 = 32 possible single-character mutations from any gene make this BFS very tight: each level expands at most 32 candidates, so the algorithm runs in O(L * 32 * |bank|) where L is the number of mutations needed. For a length-8 alphabet-of-4 string, the state space is bounded at 4^8 = 65536, so even the worst case is fast.
Solution
Hints
Solution Walkthrough
Approach: BFS Over Gene Strings (O(N * L * 4) per level)
Model the problem as a graph: nodes are gene strings, and there is an edge between two strings iff they differ by exactly one character AND both are in the bank (or one is the start). BFS from startGene finds the shortest path (minimum mutations) to endGene.
Algorithm
- If
startGene == endGene, return0. - Put
bankinto a hash set; reject up front ifendGeneis not in it. - Initialize
visited = {startGene}andqueue = [startGene]. - BFS level by level:
- For each
curin the current level, try mutating every position to each ofA,C,G,T. - If the candidate is in
bankand not visited, enqueue it. If it equalsendGene, return the current step count + 1.
- If BFS exhausts, return
-1.
Why It Works
Each BFS level corresponds to one mutation step. The first time endGene is dequeued (or generated), the level number is the minimum mutations. Confining neighbor expansion to bank enforces the validity constraint at every intermediate step. Visited-tracking guarantees each gene string is enqueued at most once, bounding total work.
Complexity
Sample
start=AACCGGTT, end=AAACGGTA, bank=[AACCGGTA, AACCGCTA, AAACGGTA]
Level 1 from AACCGGTT:
Try every position * 3 mutations -> 24 candidates
Of those, AACCGGTA, AACCGCTA are in bank -> enqueue both
Level 2 from {AACCGGTA, AACCGCTA}:
AACCGGTA -> AAACGGTA in bank, target! return 2Metric Value
Time O(|bank| * L * 4) where L is the gene length, since each gene generates 3L mutations
Space O(|bank|) for the hash set and visitedPitfalls
- Forgetting the bank constraint. Without checking
bank_set, BFS would walk through arbitrary intermediate strings. The problem requires every step to be in the bank. - Using BFS on a tree of all 4^8 strings. That state space is large enough to matter on weak hardware, and irrelevant: only the bank-valid strings are reachable.
- Returning the depth instead of mutations. The mutation count is
level - 0if the start counts as level 0. Make sure to return0whenstartGene == endGene(no mutations needed).
Solution Code
