Speed up .sng header parsing (~3.3×) and add readSongIni for metadata-only scans#16
Open
elicwhite wants to merge 6 commits intoGeomitron:masterfrom
Open
Speed up .sng header parsing (~3.3×) and add readSongIni for metadata-only scans#16elicwhite wants to merge 6 commits intoGeomitron:masterfrom
elicwhite wants to merge 6 commits intoGeomitron:masterfrom
Conversation
Before, in-flight header bytes were kept as an array of chunks, and every header-length probe (3 per file) rebuilt a fresh Uint8Array by scanning every byte of every chunk. `mergeUint8Arrays` then reassembled the full buffer with an O(n^2) `buffers.slice(0, i).reduce(...)` inside a forEach. CPU profiling on 204 .sng files showed these two functions accounting for 57ms of self-time — more than the rest of the header parse combined. This commit stores the header as a single `Uint8Array` that grows geometrically as chunks arrive. Length probes become a `byteCount` comparison + `DataView.getBigUint64`, the full-header parse is a zero-copy `subarray`, and leftover bytes for the next file are a single `slice`. The old `getHeaderBuffer`, `mergeUint8Arrays`, and free-function `readBigUint64LE` helpers are gone. Benchmark (204 .sng files, concurrency=32, node 24.11): before: mean 11.51 ms/file, 2540 files/s after: mean 5.98 ms/file, 4785 files/s delta: -48% latency, +88% throughput Verified end-to-end extraction still produces byte-identical output (MIDI/JPEG/Opus magic bytes intact; generated song.ini well-formed).
`parseSngHeader` was constructing three `binary-parser.Parser` instances on every call. `binary-parser` generates its parse function via `new Function()` on first `.parse()` invocation and caches it on the Parser instance, so rebuilding the Parser per call threw away that cache and paid allocation for the method-chain state each time. Move the three Parser definitions to module scope so they're built (and their parse functions compiled) exactly once per process. Benchmark (204 .sng files, concurrency=32, node 24.11): before: mean 5.98 ms/file, 4785 files/s after: mean 5.51 ms/file, 5197 files/s delta: -8% latency, +9% throughput The bulk of the remaining header-parse cost is inside `binary-parser`'s generated code, not in Parser construction; this is a small-but-free incremental win on top of the header-buffer rewrite.
`generateIniFileText` was called on every parsed header, even when
`config.generateSongIni` was false — in which case its output was only
used inside branches gated by the same flag and otherwise discarded.
This meant every header parse allocated a `TextEncoder`, built the ini
text string, and encoded it into a Uint8Array for nothing on the common
scanner path (browser/CLI tools that only read metadata).
Move the call inside the `if (this.config.generateSongIni)` branch, and
restructure the surrounding `emit('header')` / `readFile` dispatch so
the disabled path no longer runs any ini-related work.
Benchmark (204 .sng files with generateSongIni: false, concurrency=32,
node 24.11):
before: mean 5.51 ms/file, 5197 files/s
after: mean 4.97 ms/file, 5648 files/s
delta: -10% latency, +9% throughput
Behavior is unchanged when `generateSongIni: true`; the ini text is
still generated and the song.ini file is still emitted first.
When the header's final read() overshoots into the first file's bytes, `readFile` stored those bytes in `leftoverFileChunk` and the ReadableStream's `start` handler unmasked (XOR'd) them synchronously at construction time. Metadata-only consumers (browser/CLI scanners that only read the `header` event and then `fileStream.cancel()` each file) still paid the full XOR cost on ~40–60 KB of leftover per file before their cancel had a chance to run. CPU profiling on 204 .sng files showed this unmasker loop as the top self-time hotspot at 13ms — 35% of wall time. Move the leftover enqueue into `pull` and short-circuit before any source read. If the consumer cancels before pulling, the leftover is never unmasked. Full-extraction consumers see identical behavior: pull is called immediately on a non-cancelled stream, so the leftover is still processed first. Benchmark (204 .sng files, concurrency=32, node 24.11): before: mean 4.97 ms/file, 5648 files/s after: mean 3.32 ms/file, 8303 files/s delta: -33% latency, +47% throughput Verified end-to-end extraction still produces byte-identical output for a multi-file .sng (MIDI/JPEG/Opus magic bytes intact, sizes unchanged, generated song.ini well-formed).
Add a single new export that reads a .sng's header from a stream,
cancels the source, and returns the song.ini key/value data as a plain
`{ [key: string]: string }` — the same object that lives at
`SngHeader.metadata` on the full header.
This is the right shape for library indexers / song.ini scanners: no
`EventEmitter`, no file-content `ReadableStream`, no chunk-unmasker
allocation, no `readFile` dispatch. The function awaits `reader.read()`
in a tight loop until the header is complete, parses it, and cancels
the source.
Benchmark (204 .sng files, concurrency=32, warm page cache, node 24.11):
mean throughput
SngStream events 2.23 ms 13,510 files/s
readSongIni 1.63 ms 18,820 files/s
delta -27% +39%
Wrapping a direct `fs.open`+`fd.read(1MB)` into a single-chunk
ReadableStream fed into `readSongIni` lands in the same range after the
cold allocation settles — the Web-stream adapter is not the bottleneck,
so no Node-specific helper is needed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A series of focused, independently-reviewable perf improvements to the
.sngheader parse path, plus one small additive API (readSongIni) for the common "I only want the song.ini data" use case.Each commit is scoped to one change, with profiling rationale and before/after benchmarks in the commit body. Nothing breaks existing behavior —
SngStream's public contract is unchanged.Cumulative impact (204
.sngfiles, concurrency=32, node 24.11):SngStream(cold cache)readSongIni)Commits, in review order
Replace chunked header buffer with single growing buffer— the biggest win. The oldgetHeaderBufferscanned every byte of every chunk on each of three length probes, andmergeUint8Arraysused an O(n²).slice(0, i).reduce(...)inside.forEach. Replaced with a single geometrically-growingUint8Array. −48% latency, +88% throughput.Hoist binary-parser Parser definitions to module scope—parseSngHeaderwas constructing threebinary-parser.Parserinstances on every call, throwing awaybinary-parser's internal compiled-function cache. Moved to module scope. −8% latency, +9% throughput.Skip generateIniFileText when generateSongIni is disabled—generateIniFileTextwas building + encoding ini text on every header parse even when the result was discarded. Moved inside thegenerateSongIni: truebranch. −10% latency, +9% throughput.Defer leftover-chunk unmask from start() to pull()— when the header's finalread()overshoots into the first file's bytes,readFile'sReadableStream.startwas synchronously XOR-unmasking those leftover bytes for ~40–60 KB per file, before a metadata-only consumer'scancel()could land. Deferred to the firstpull()so cancel-before-pull skips the XOR entirely. −33% latency, +47% throughput for metadata-only scanners; identical behavior for full-extraction consumers.Add readSongIni() for metadata-only scanners— a single new top-level export. Reads just enough of a stream to parse the header, cancels the source, returns{ [key: string]: string }(same shape asSngHeader.metadata). SkipsEventEmitter, fileReadableStreamconstruction, chunk-unmasker allocation, andreadFiledispatch — all unnecessary when the caller only wants the header. −27% latency, +39% throughput vs. usingSngStream+ cancel for the same workload.Add changeset for header-parse perf improvements— bumps the package minor (new public API).Testing
tsc --noEmitcleantsupbuild succeeds.sngwithgenerateSongIni: trueproduces byte-identical output at each commit (MIDI/JPEG/Opus magic bytes intact, sizes match, generated song.ini well-formed)readSongInireturns the samemetadataobject shape asSngHeader.metadata