Community JavaScript Snippet
Streaming LLM Response Consumer With Cancel
When a user navigates away mid-completion, we still get billed for the remaining tokens. This is the SSE-style consumer I wrote: it decodes JSON deltas, exposes a `cancel()` that aborts the request, and never leaks a reader on errors.
By @oliviadelgado
May 15, 2026 · Updated May 20, 2026
The consumer is an async generator wrapped in an object that exposes `cancel()`. The generator body owns the buffer, the SSE frame parser, and the reader; the outer object owns the `AbortController` so the caller can stop the request from outside the loop. The `try`/`finally` is what guarantees we never leak a reader: if the consumer throws or breaks out of `for await`, the `finally` block releases the lock and the runtime can close the socket. I keep the malformed-frame path as a `console.warn` rather than a `throw` because OpenAI and Anthropic both occasionally emit a partial frame near the end of long completions, and killing the stream over a single bad frame is worse than dropping that frame.
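A minimal sketch of that shape, not the exact code from the original snippet. The names `streamCompletion` and `fetchImpl`, the `data: [DONE]` sentinel, and the `\n\n` frame delimiter are assumptions for illustration; a real endpoint may use `\r\n` line endings or a different terminator.

```javascript
// Sketch only: streamCompletion and the fetchImpl option are hypothetical names.
function streamCompletion(url, body, { fetchImpl = fetch } = {}) {
  // The outer object owns the controller so cancel() works from outside the loop.
  const controller = new AbortController();

  async function* deltas() {
    const res = await fetchImpl(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    if (!res.ok || !res.body) throw new Error(`stream failed: HTTP ${res.status}`);

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buf = "";
    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buf += decoder.decode(value, { stream: true });

        // Assumption: frames are separated by a blank line ("\n\n").
        let end;
        while ((end = buf.indexOf("\n\n")) !== -1) {
          const frame = buf.slice(0, end);
          buf = buf.slice(end + 2);
          for (const line of frame.split("\n")) {
            if (!line.startsWith("data:")) continue;
            const data = line.slice(5).trim();
            if (data === "[DONE]") return; // assumed end-of-stream sentinel
            let delta;
            try {
              delta = JSON.parse(data);
            } catch (err) {
              // Drop a single bad frame instead of killing the stream.
              console.warn("dropping malformed frame", err);
              continue;
            }
            yield delta;
          }
        }
      }
    } finally {
      // Runs on normal completion, on a throw, and on early exit from for await.
      reader.releaseLock();
      controller.abort();
    }
  }

  return {
    [Symbol.asyncIterator]: () => deltas(),
    cancel: () => controller.abort(),
  };
}
```

Usage would look like `for await (const delta of stream) { ... }`, with `stream.cancel()` wired to whatever should stop the request. Parsing before the `yield`, rather than yielding inside the `try`/`catch`, keeps an error thrown by the consumer from being mistaken for a malformed frame.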
The hook is the shape I always end up shipping in the dashboard: render incremental tokens into a `useState` string, expose a manual cancel for a Stop button, and tear down on unmount. The `cancelled` ref guards `setText` against firing after unmount, and `stream.cancel()` in the cleanup function is what stops the OpenAI bill from accruing if the user navigates away. I serialize `body` into the dependency array because `body` is an object, and we want a new stream on a real prompt change, not on every render. The smoke test only prints the hook's shape because the validator sandbox does not render React; the actual integration runs in the browser.
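A rough sketch of such a hook, under several assumptions. The React primitives are injected through a hypothetical `createUseCompletion` factory so the block runs outside a bundler; in an app you would import `useEffect`, `useRef`, and `useState` from `react` directly. `openStream` stands in for whatever returns the cancellable async iterable, and `delta.text` is an assumed delta shape. The sketch also uses an effect-local `cancelled` flag where the text above describes a ref.

```javascript
// Sketch only: createUseCompletion, openStream, and delta.text are hypothetical.
function createUseCompletion({ useEffect, useRef, useState }, openStream) {
  return function useCompletion(url, body) {
    const [text, setText] = useState("");
    const streamRef = useRef(null);
    // Serialize the object so the effect re-runs on a real change, not every render.
    const bodyKey = JSON.stringify(body);

    useEffect(() => {
      let cancelled = false; // guards setText after cleanup
      const stream = openStream(url, JSON.parse(bodyKey));
      streamRef.current = stream;
      setText("");

      (async () => {
        try {
          for await (const delta of stream) {
            if (cancelled) break;
            setText((prev) => prev + (delta.text ?? ""));
          }
        } catch (err) {
          // An abort is the expected result of cancel(); anything else is worth a log.
          if (!cancelled && err?.name !== "AbortError") {
            console.warn("stream ended with error", err);
          }
        }
      })();

      return () => {
        cancelled = true;
        stream.cancel(); // stop the request on unmount or prompt change
      };
    }, [url, bodyKey]);

    // Manual cancel for a Stop button.
    const cancel = () => streamRef.current?.cancel();
    return { text, cancel };
  };
}
```

A component would call the returned `useCompletion(url, body)` and render `text`, passing `cancel` to the Stop button's click handler.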
Backpressure is the failure mode I missed in the first version we shipped. If the React render loop falls behind (slow paint on a heavy page, dev-tools open, throttled CPU), the `for await` does not consume frames as fast as the network pushes them, and `buf` grows without bound. A 64 KB cap is enough headroom for the largest single frame I have seen from OpenAI (well under 16 KB even on a long completion) but small enough that a stuck consumer trips it within a few hundred milliseconds. Throwing inside the generator triggers our `finally` plus the outer `controller.abort()`, which is the same path the user-initiated `cancel()` takes.
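One way the cap could be sketched, assuming it is applied to the unparsed remainder after complete frames have been pulled out. `takeFrames` and `MAX_BUFFER_CHARS` are hypothetical names, the `\n\n` delimiter is an assumption, and the limit here counts UTF-16 code units of the decoded string rather than raw bytes.

```javascript
// Sketch only: takeFrames and MAX_BUFFER_CHARS are hypothetical names.
const MAX_BUFFER_CHARS = 64 * 1024; // the 64 KB cap, measured in string length here

// Appends a decoded chunk, splits off complete frames, and enforces the cap
// on whatever is left over.
function takeFrames(buf, chunk, limit = MAX_BUFFER_CHARS) {
  let rest = buf + chunk;
  const frames = [];
  let end;
  while ((end = rest.indexOf("\n\n")) !== -1) {
    frames.push(rest.slice(0, end));
    rest = rest.slice(end + 2);
  }
  if (rest.length > limit) {
    // Thrown inside the generator's try block, this would run its finally.
    throw new RangeError(`SSE buffer exceeded ${limit} chars without a complete frame`);
  }
  return { frames, rest };
}
```

Inside the read loop, the generator would call `takeFrames(buf, decoder.decode(value, { stream: true }))`, assign `rest` back to `buf`, and parse each entry of `frames`.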
