Community JavaScript Snippet
Streaming Aggregations With a Single Pass (JS)
Welford's online algorithm for mean and variance, plus a 30-line streaming p99 estimator. The version I use when the data does not fit in memory or arrives over WebSocket.
Streaming Aggregations With a Single Pass (JS)
Welford's online algorithm for mean and variance, plus a 30-line streaming p99 estimator. The version I use when the data does not fit in memory or arrives over WebSocket.
By @ryancastillo
May 9, 2026
·
Updated May 18, 2026
851 views
26
4.2 (12)
The traditional formula variance = mean(x^2) - mean(x)^2 is mathematically correct and numerically unstable: when mean(x^2) and mean(x)^2 are close in magnitude, you lose most of your precision to subtractive cancellation. Welford's update sidesteps that by tracking m2, the sum of squared deltas from the running mean, which never has to compute the dangerous difference. Memory is O(1) regardless of sample size, which is what makes it suitable for an unbounded stream. I keep this class in every metrics-collection layer; the summary() method is what gets pushed to the dashboard once a second.
Algorithm R is the classic reservoir-sampling rule: replace at a random slot with probability size / seen, which keeps every observation's chance of survival uniform. The trade-off vs an exact percentile is real but small in practice; 500 samples is enough to estimate p99 within roughly 1% on most production traffic. For tighter SLOs you bump the size; the linear-in-size sort cost in quantile is fine because you call it once per metric flush, not per push. I have used this in a websocket-fed dashboard where the full latency stream was ~50k events per second; storing all of them was infeasible, but a 500-slot reservoir gave a perfectly readable graph.
The wrapper turns 'one stream of events, one global stat' into 'many sub-streams keyed by a function, with per-key stats'. Memory is O(K * sizeof(RunningStats)) where K is the number of distinct keys, so you want a bounded key space (route names, region codes) not an unbounded one (user ids). When the key space is unbounded you switch to a sketch like Count-Min or Misra-Gries; that pattern is far more code than fits in this snippet. The topK shape is a compact dashboard payload; I emit it once a second from the metrics worker and the front-end renders the table without any client-side aggregation.
