feat: Add YouTube caption import and export #303

Open
yunrongy424-oss wants to merge 2 commits into hyperaudio:main from yunrongy424-oss:codex/youtube-caption-import-export

Conversation

@yunrongy424-oss

Add YouTube automatic captions import/export support for issue #160.

The editor now parses word-timestamped YouTube WebVTT and SRV3/TimedText XML into the existing timed transcript spans, preserving per-word timing instead of distributing full cue timings across words.

YouTube Caption Formats

Adds a pure converter for inline VTT timestamps and YouTube XML <p>/<s> and <text> shapes, plus export helpers that write word-timestamped VTT and XML from the current transcript.
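
To illustrate the inline-timestamp shape this refers to, here is a minimal sketch and not the converter's actual code. It assumes a YouTube-style cue whose text carries `<HH:MM:SS.mmm>` markers between words; `vttTimeToMs` and `parseCueWords` are hypothetical names used only for this example.

```javascript
// Convert a WebVTT "HH:MM:SS.mmm" timestamp to integer milliseconds.
function vttTimeToMs(stamp) {
  const [h, m, s] = stamp.split(":").map(Number);
  return Math.round((h * 3600 + m * 60 + s) * 1000);
}

// Hypothetical helper: split one cue's text into timed chunks.
// Each chunk starts at the preceding inline timestamp (or the cue start)
// and ends at the following inline timestamp (or the cue end).
function parseCueWords(cueStartMs, cueEndMs, cueText) {
  // Splitting on a capturing group keeps the timestamps in the result:
  // even indices hold text, odd indices hold "HH:MM:SS.mmm".
  const parts = cueText.split(/<(\d{2}:\d{2}:\d{2}\.\d{3})>/);
  const words = [];
  for (let i = 0; i < parts.length; i += 2) {
    const text = parts[i].replace(/<[^>]*>/g, "").trim(); // drop <c> tags
    if (!text) continue; // skip empty or whitespace-only chunks
    const start = i === 0 ? cueStartMs : vttTimeToMs(parts[i - 1]);
    const end = i + 1 < parts.length ? vttTimeToMs(parts[i + 1]) : cueEndMs;
    words.push({ text, start, end });
  }
  return words;
}

// Example usage with a cue spanning 0.5 s to 2.5 s.
const sampleWords = parseCueWords(
  500,
  2500,
  "hello<00:00:01.200><c> world</c><00:00:01.800><c> again</c>"
);
console.log(sampleWords);
```

Each chunk keeps its own start and end, so timing is not spread evenly across the cue.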

Editor Menu

Adds File menu entries for YouTube VTT/XML export and YouTube caption import alongside the existing Hyperaudio, SRT, VTT, and Deepgram actions.

Validated locally with the converter unit test, JavaScript syntax checks, diff whitespace check, and a browser smoke check.

Refs #160

Add YouTube TimedText XML and word-timestamped VTT conversion for the editor so automatic caption files preserve per-word timing when imported and can be exported from the transcript view.

Refs hyperaudio#160

Co-Authored-By: Codex <noreply@openai.com>
yunrongy424-oss marked this pull request as ready for review May 28, 2026 03:41
@MyTH-zyxeon

Nice coverage for the YouTube captions half of #160. One small import-path robustness issue I noticed:

getHyperaudioJsonForExport() guards the export buttons when #hypertranscript is missing, but ImportYoutubeCaptions.confirmYoutubeCaptions() writes to hypertranscript.innerHTML unconditionally and then immediately updates #hyperplayer-vtt. If the menu is opened from a view/state where either element is not mounted yet, the import path will throw after the user confirms a file instead of showing the same kind of friendly guard used by export.

Could you add a guard before mutating hypertranscript / #hyperplayer-vtt and cover that branch in a small test or browser-smoke note? That would make the new YouTube import action safer without changing the converter scope.
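
Roughly the shape of guard meant here, as a sketch only. It assumes the import writes `innerHTML` on #hypertranscript and a data URL to `src` on #hyperplayer-vtt; `applyYoutubeImport`, its parameters, and the alert wording are hypothetical and not taken from the PR. The document is passed in so the missing-element branch can be tested without a browser.

```javascript
// Hypothetical guard: check both mount points before mutating either one.
// `doc` is any object with querySelector (the real `document` in the editor).
// `notify` is whatever shows the friendly alert (e.g. window.alert).
function applyYoutubeImport(doc, transcriptHtml, vttDataUrl, notify) {
  const transcript = doc.querySelector("#hypertranscript");
  const track = doc.querySelector("#hyperplayer-vtt");

  if (!transcript || !track) {
    // Bail out before touching the DOM so nothing is half-written.
    notify("Open the transcript view before importing captions.");
    return false;
  }

  transcript.innerHTML = transcriptHtml;
  track.src = vttDataUrl; // assumption: the track is fed via its src attribute
  return true;
}
```

Returning a boolean lets the caller skip any follow-up work (such as re-initialising the player) when the guard fires.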

Show the transcript-view alert when the YouTube caption import controls are used without the transcript or VTT track mounted. This avoids throwing after a caption file is selected while keeping the normal import path unchanged.

Refs hyperaudio#160

Co-Authored-By: Codex <noreply@openai.com>
@yunrongy424-oss
Author

Thanks for the review. I pushed 26ee958 to add the transcript-view guard before the YouTube caption import mutates #hypertranscript or #hyperplayer-vtt.

Validation run:

  • node --check js\hyperaudio-lite-editor-export.js
  • node test\youtube-caption-converter.test.js
  • browser smoke: selected a VTT file after removing #hypertranscript; import showed the transcript-view alert and did not write a VTT data URL
  • browser smoke: normal VTT import still populated the transcript and #hyperplayer-vtt data URL
  • git diff --check

@MyTH-zyxeon

Follow-up review-assist pass for #160 / PR #303 after the 26ee958 guard fix.

Thanks for adding the transcript-view guard and smoke notes. I re-read the current diff and the remaining risk looks less like UI wiring now and more like converter round-trip correctness across YouTube caption variants:

  1. It would be useful to add a round-trip fixture for imported YouTube VTT/XML -> Hyperaudio spans -> exported YouTube VTT/XML. Right now the tests assert selected parser/export snippets, but not that a realistic import can be exported without shifting word timings, paragraph boundaries, or escaped text.

  2. The XML path supports both SRV3-style <p>/<s> and legacy <text start dur> formats. A regression fixture with mixed entity text (&amp;, &lt;, quotes), punctuation, empty/whitespace-only segments, and a paragraph with no explicit duration would help prove the regex parser does not silently drop or flatten real YouTube payloads.

  3. The VTT word-timestamp path should probably cover multi-line cue text and cues with settings / identifiers. The current parser skips cue metadata well enough, but a fixture with a cue id line, cue settings, two text lines, and word timestamps would lock that behavior down.

  4. Export should document and test its timestamp units at the Hyperaudio JSON boundary. normalizeJsonWords() assumes word.start / word.end are seconds, while the generated spans store data-m / data-d in milliseconds. A small test using the actual htmlToJson() output shape would catch any seconds-vs-ms mismatch before this lands.

  5. For XML export, escapeHtml() covers text nodes, but the test should include words containing &, <, >, and quotes so the exported XML can be parsed back safely.

No live service call is needed here; these can stay as offline fixture tests around youtube-caption-converter.js plus the existing browser smoke notes.
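
As a sketch of what offline checks for points 4 and 5 could look like: the helpers below are stand-ins written for illustration, not the PR's escapeHtml() or normalizeJsonWords(), and the `{ start, end }` word shape in seconds is an assumption based on the description above.

```javascript
// Stand-in escaper covering the five characters that matter in XML text
// and attribute values. "&" must be replaced first.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Inverse of escapeXml. "&amp;" is decoded last so that an escaped
// literal "&lt;" ("&amp;lt;") comes back as "&lt;" rather than "<".
function unescapeXml(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Seconds at the JSON boundary to the millisecond values stored in
// data-m (start) and data-d (duration). Rounding both endpoints before
// subtracting avoids float drift in the duration.
function wordToSpanTiming(word) {
  const m = Math.round(word.start * 1000);
  const d = Math.round(word.end * 1000) - m;
  return { m, d };
}

// Round-trip check over awkward words.
const trickyWords = ["R&D", "<b>", 'say "hi"', "it's", "&lt;", "a > b"];
for (const word of trickyWords) {
  if (unescapeXml(escapeXml(word)) !== word) {
    throw new Error("round trip failed for " + word);
  }
}
```

A fixture built this way fails loudly if the export ever treats millisecond values as seconds, or if an entity is escaped twice.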
