Community JavaScript Snippet
Dedupe by Key With Last-Write-Wins (JS)
dedupeBy(rows, 'id') is the function I copy into every ETL script. Last-write-wins is the right policy 90% of the time; this version also exposes a configurable resolver for the other 10%.
By @camilarao
April 15, 2026 · Updated May 20, 2026
Six lines, but the trick is the choice of Map over a plain object. Map.set on an existing key overwrites the value in place while the key keeps its original insertion position in iteration order, which is exactly the last-write-wins-but-keep-arrival-order semantics that ETL scripts want. A plain object would coerce every key to a string and iterate integer-like keys in ascending numeric order first, so arrival order is lost as soon as the ids look like numbers. The keyOf argument accepts either a string ('id') or a function ((r) => r.user.id), so it handles nested keys without forcing the caller to pre-flatten. I have shipped this verbatim in three different services; the plain last-write-wins version covers almost every dedupe case.
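The Map behavior described above can be sketched as follows. This is a reconstruction under assumptions rather than the exact function from the post: the sample rows and the names byKey, getKey, and logins are made up for illustration.

```javascript
// Sketch only: a reconstruction of plain last-write-wins, not the author's exact code.
function dedupeBy(rows, keyOf) {
  // keyOf may be a property name ('id') or an accessor ((r) => r.user.id).
  const getKey = typeof keyOf === 'function' ? keyOf : (row) => row[keyOf];
  const byKey = new Map();
  // set() on an existing key replaces the value but keeps the key's original slot.
  for (const row of rows) byKey.set(getKey(row), row);
  return [...byKey.values()];
}

// Illustrative data (hypothetical).
const rows = [
  { id: 1, name: 'Ada' },
  { id: 2, name: 'Grace' },
  { id: 1, name: 'Ada L.' },
];
const deduped = dedupeBy(rows, 'id');
// deduped is [{ id: 1, name: 'Ada L.' }, { id: 2, name: 'Grace' }]

// Nested key via an accessor function, no pre-flattening needed.
const logins = [
  { user: { id: 'u1' }, ip: '10.0.0.1' },
  { user: { id: 'u1' }, ip: '10.0.0.2' },
];
const latestLogins = dedupeBy(logins, (r) => r.user.id);
// latestLogins is [{ user: { id: 'u1' }, ip: '10.0.0.2' }]
```

Note that id 1 stays in first position even though its winning row arrived last, which is the arrival-order property the paragraph relies on.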
The flaw of last-write-wins shows up the day a backfill emits sparse rows: a fresh event with name: null should NOT erase the existing name. The resolver makes the policy explicit instead of pretending naive overwriting is correct. The merge function I ship most often treats null and undefined as 'no opinion' and lets the value with the highest ts win per field. I keep merge functions tiny and named so the dedupe call site reads as dedupeBy(rows, 'id', mergeProfiles), which is far easier to grep than an inline arrow.
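A minimal sketch of that resolver policy, under stated assumptions: each row carries a numeric row-level ts field, the resolver is called as resolve(previous, incoming), and the body of mergeProfiles is a guess at the policy described rather than the author's code. Only the names dedupeBy and mergeProfiles come from the post.

```javascript
// Assumed policy: null/undefined mean "no opinion"; the row with the higher ts
// wins for every field where it actually has a value.
function mergeProfiles(prev, next) {
  const nextIsNewer = (next.ts ?? 0) >= (prev.ts ?? 0);
  const older = nextIsNewer ? prev : next;
  const newer = nextIsNewer ? next : prev;
  const merged = { ...older };
  for (const [field, value] of Object.entries(newer)) {
    if (value !== null && value !== undefined) merged[field] = value;
  }
  return merged;
}

// Same Map-based dedupe, with an optional resolver (defaults to last-write-wins).
function dedupeBy(rows, keyOf, resolve = (_prev, next) => next) {
  const getKey = typeof keyOf === 'function' ? keyOf : (row) => row[keyOf];
  const byKey = new Map();
  for (const row of rows) {
    const key = getKey(row);
    byKey.set(key, byKey.has(key) ? resolve(byKey.get(key), row) : row);
  }
  return [...byKey.values()];
}

// Illustrative sparse backfill (hypothetical data): the later row has name: null.
const events = [
  { id: 7, name: 'Camila', email: null, ts: 100 },
  { id: 7, name: null, email: 'c@example.com', ts: 200 },
];
const profiles = dedupeBy(events, 'id', mergeProfiles);
// profiles is [{ id: 7, name: 'Camila', email: 'c@example.com', ts: 200 }]
```

Comparing ts inside the resolver, rather than trusting arrival order, means a late-arriving older row cannot overwrite newer values either.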
The Map-based version stores every distinct key in memory, which is fine until the input does not fit. The streaming version holds only the current group's merged row, so memory stays O(1) in the number of distinct keys regardless of input size, but it requires the input to be sorted by key. In production I get sorted input by piping sort -k1,1 between stages or by reading from a key-ordered store; the dedupe step becomes a pure transform. A common sharp edge is forgetting the trailing yield current after the loop ends, which silently drops the final group. Always test with a single-element input: it never triggers a key change, so the trailing yield is the only thing that emits it.
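A sketch of the streaming variant as a generator, assuming the input is any iterable already sorted by key. The name dedupeSorted and the hasCurrent flag are hypothetical; only current and the trailing yield are taken from the description above.

```javascript
// Sketch: streaming dedupe. Precondition: rows are grouped (sorted) by key.
function* dedupeSorted(rows, keyOf, resolve = (_prev, next) => next) {
  const getKey = typeof keyOf === 'function' ? keyOf : (row) => row[keyOf];
  let hasCurrent = false; // distinguishes "no row yet" from a falsy row or key
  let currentKey;
  let current;
  for (const row of rows) {
    const key = getKey(row);
    if (hasCurrent && key === currentKey) {
      // Same run: fold the incoming row into the one being held.
      current = resolve(current, row);
    } else {
      // Key changed: the previous run is complete, so emit it.
      if (hasCurrent) yield current;
      current = row;
      currentKey = key;
      hasCurrent = true;
    }
  }
  // Trailing yield: without it the final group is silently dropped.
  if (hasCurrent) yield current;
}

// Illustrative sorted input (hypothetical data).
const sortedRows = [
  { id: 'a', v: 1 },
  { id: 'a', v: 2 },
  { id: 'b', v: 3 },
];
const streamed = [...dedupeSorted(sortedRows, 'id')];
// streamed is [{ id: 'a', v: 2 }, { id: 'b', v: 3 }]

const single = [...dedupeSorted([{ id: 'z', v: 9 }], 'id')];
// single is [{ id: 'z', v: 9 }]
```

If the input is not actually sorted, the same key can be emitted more than once, so the sort precondition is worth asserting upstream.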
