Community JavaScript Snippet
CSV With Quoted Commas: The 30-Line Parser
split(',') gets you fired on the first row that contains a comma inside quotes. Here is a state-machine CSV parser in 30 lines that handles quoted commas, escaped quotes, and CRLF endings.
By @gracechoi
December 14, 2025
Updated May 20, 2026
A state machine is the correct way to parse CSV, because whether a comma or newline ends a field depends on whether you are inside quotes. The two states are 'in quotes' and 'not in quotes', and every character either switches state, ends a field or row, or is appended to the current field. The trickiest case is the doubled-quote escape ("" inside a quoted field becomes a literal "), which a regex-based parser usually botches. I deliberately do NOT support backslash escapes because RFC 4180 does not, and supporting both produces ambiguous parses. Treating CRLF as a single row break is required for any CSV that came from a Windows tool or from Excel; a parser that only breaks rows on \n leaves a stray \r at the end of every row's last field.
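The snippet's own code isn't reproduced on this page, so here is a minimal sketch of a parser built the way the paragraph above describes, written from that description rather than copied from the author. The name parseCSV and the pending flag (which decides whether a final row without a trailing newline gets flushed) are my assumptions. It uses one character of lookahead for the "" escape and for CRLF, and it is lenient: a quote that appears mid-field simply switches into quoted mode instead of raising an error.

```javascript
// Sketch of an RFC 4180-style CSV parser as a two-state machine (inQuotes true/false).
// Handles quoted commas, "" escapes inside quoted fields, and CRLF or LF row endings.
// Backslash escapes are intentionally NOT supported.
function parseCSV(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;
  let pending = false; // has anything been read since the last row break?

  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c !== '"') field += c;                           // commas and newlines are literal here
      else if (text[i + 1] === '"') { field += '"'; i++; } // "" -> literal "
      else inQuotes = false;                               // closing quote
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ',') {
      row.push(field);
      field = '';
    } else if (c === '\r' || c === '\n') {
      if (c === '\r' && text[i + 1] === '\n') i++;         // CRLF counts as one row break
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
      pending = false;
      continue;
    } else {
      field += c;
    }
    pending = true;
  }
  if (pending) { row.push(field); rows.push(row); }        // last row had no trailing newline
  return rows;
}
```

For example, parseCSV('a,"b,c"\r\n1,2') returns [['a', 'b,c'], ['1', '2']]: the comma inside quotes stays in the field, and the CRLF produces exactly one row break.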
The wrapper splits responsibilities: low-level cell extraction stays a pure state machine, and the type-coercion layer is a per-column dictionary that the caller controls. Explicit, opt-in coercion beats automatic type inference (what PapaParse does when you turn on its dynamicTyping option), which you regret the day a column with values '01' and '02' becomes the numbers 1 and 2. The ?? '' for missing trailing cells matters: short rows happen all the time when an Excel user deletes the trailing comma, and treating the missing cells as empty strings is friendlier than throwing.
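A sketch of that coercion layer, assuming the parser hands back an array of string arrays with the header as the first row. toRecords and the shape of the coerce dictionary (column name mapped to a function from string to value) are hypothetical names, not the author's code. Columns without an entry stay strings, and a short row's missing trailing cells become '' before any coercer runs.

```javascript
// Hypothetical coercion layer over already-parsed rows (header row first).
// coerce maps a column name to a function string -> value; unlisted columns stay strings.
function toRecords(rows, coerce = {}) {
  const [header = [], ...body] = rows;
  return body.map((cells) => {
    const record = {};
    header.forEach((name, i) => {
      const raw = cells[i] ?? ''; // short row: a missing trailing cell becomes ''
      // Own-property check so a column named e.g. "constructor" isn't matched via the prototype.
      const hasCoercer = Object.prototype.hasOwnProperty.call(coerce, name);
      record[name] = hasCoercer ? coerce[name](raw) : raw;
    });
    return record;
  });
}

// Usage: only id and qty are coerced; zip keeps its leading zero as a string.
const records = toRecords(
  [['id', 'zip', 'qty'], ['7', '01234', '3'], ['8', '02139']],
  { id: Number, qty: (s) => (s === '' ? null : Number(s)) },
);
// records[1] → { id: 8, zip: '02139', qty: null }
```

Passing Number for zip as well would turn '01234' into 1234, which is exactly the kind of silent conversion the opt-in dictionary is meant to prevent.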
The streaming version keeps the same state machine, and the parser state itself (row, field, inQuotes) IS the buffer: an unterminated tail just lives in this.field until the next chunk arrives. That removes the manual slice-the-buffer dance most handwritten streaming parsers fumble. I split the demo input mid-field and mid-CRLF because those are the two boundary cases that trip up handwritten parsers; if your tests only feed chunks that end exactly at a row boundary, the parser passes them and breaks in production. For files larger than ~1GB I do reach for a real CSV library, but knowing this state machine is what lets me audit which one to trust.
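The demo isn't reproduced on this page either, so here is a sketch of a streaming parser built the way the paragraph describes, with row, field and inQuotes kept on the instance between chunks. The class name CSVStream and its push/end/onRow interface are hypothetical. So are the two extra flags, quoteSeen and skipLF: they replace the one-character lookahead that a whole-string parser gets for free, since a chunk can end right after a " (escape or closing quote?) or right after a \r (is a \n coming?).

```javascript
// Hypothetical streaming variant: parse state lives on the instance, so an
// unfinished field or row simply waits in this.field / this.row for the next chunk.
class CSVStream {
  constructor(onRow) {
    this.onRow = onRow;     // callback for each completed row
    this.row = [];
    this.field = '';
    this.inQuotes = false;
    this.quoteSeen = false; // saw " inside quotes; the next char decides "" escape vs. closing quote
    this.skipLF = false;    // last row ended on \r; drop an immediately following \n, even across chunks
    this.pending = false;   // anything read since the last row break?
  }

  push(chunk) {
    for (const c of chunk) {
      if (this.skipLF) {
        this.skipLF = false;
        if (c === '\n') continue;                        // second half of a CRLF
      }
      if (this.quoteSeen) {
        this.quoteSeen = false;
        if (c === '"') { this.field += '"'; continue; }  // "" -> literal "
        this.inQuotes = false;                           // previous " closed the field
      }
      if (this.inQuotes) {
        if (c === '"') this.quoteSeen = true;
        else this.field += c;                            // commas and newlines are literal
      } else if (c === '"') {
        this.inQuotes = true;
      } else if (c === ',') {
        this.row.push(this.field);
        this.field = '';
      } else if (c === '\r' || c === '\n') {
        this.skipLF = c === '\r';
        this.endRow();
        continue;
      } else {
        this.field += c;
      }
      this.pending = true;
    }
  }

  endRow() {
    this.row.push(this.field);
    this.onRow(this.row);
    this.row = [];
    this.field = '';
    this.pending = false;
  }

  // Call once after the last chunk; flushes a final row that had no trailing newline.
  end() {
    this.quoteSeen = false;
    this.inQuotes = false;
    if (this.pending) this.endRow();
  }
}

// Demo: chunk 1 ends mid-field inside quotes, chunk 2 ends between \r and \n.
const rows = [];
const p = new CSVStream((r) => rows.push(r));
p.push('id,note\r\n1,"Smith, ');
p.push('J."\r');
p.push('\n2,"say ""hi"""');
p.end();
// rows → [['id', 'note'], ['1', 'Smith, J.'], ['2', 'say "hi"']]
```

A third boundary worth testing is a chunk that ends between the two quotes of a "" escape; the quoteSeen flag exists for exactly that case.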
