
feat(md, cli): Stream top-level paragraphs incrementally #805

Merged: JeanMertz merged 2 commits into main from md-long-paragraph on Jun 29, 2026

JeanMertz (Collaborator) commented Jun 28, 2026

Long assistant paragraphs — a single prose line with no internal newlines — previously stalled the terminal until the whole paragraph arrived, sometimes over a minute for a reasoning block. The buffer now streams a top-level paragraph incrementally as its content grows, so output appears as it is generated rather than only when the block terminates.

Streamed output is byte-for-byte identical to whole-paragraph rendering: the renderer accumulates the paragraph's source, re-renders the growing buffer on each chunk, and prints only the stable committed prefix (everything up to the last wrapped newline), holding the in-progress visual line until a later chunk or the terminator commits it. The concatenation of all printed deltas equals format_terminal_with(full_paragraph, opts) exactly.
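The commit-prefix idea above can be sketched in a few lines. This is an illustration only, not the code from this PR: `render` is a hypothetical greedy word-wrapper standing in for `format_terminal_with`, and `ParagraphStreamer` and its methods are invented names. It assumes the renderer is prefix-stable up to the last wrapped newline, which holds for this toy wrapper.

```rust
/// Hypothetical stand-in for the real renderer: greedy word wrap at `width` columns.
fn render(source: &str, width: usize) -> String {
    let mut out = String::new();
    let mut col = 0;
    for word in source.split_whitespace() {
        let len = word.chars().count();
        if col > 0 && col + 1 + len > width {
            // The word does not fit on the current visual line: wrap.
            out.push('\n');
            col = 0;
        } else if col > 0 {
            out.push(' ');
            col += 1;
        }
        out.push_str(word);
        col += len;
    }
    out
}

/// Accumulates paragraph source and emits only the committed rendered prefix.
struct ParagraphStreamer {
    source: String, // all paragraph source received so far
    printed: usize, // bytes of rendered output already emitted
    width: usize,
}

impl ParagraphStreamer {
    fn new(width: usize) -> Self {
        Self { source: String::new(), printed: 0, width }
    }

    /// Appends a chunk, re-renders the whole buffer, and returns the newly
    /// committed delta: everything up to and including the last wrapped newline.
    fn push(&mut self, chunk: &str) -> String {
        self.source.push_str(chunk);
        let rendered = render(&self.source, self.width);
        let committed = rendered.rfind('\n').map_or(0, |i| i + 1);
        self.emit(&rendered, committed)
    }

    /// Called when the paragraph terminates: commits the held visual line.
    fn finish(&mut self) -> String {
        let rendered = render(&self.source, self.width);
        let end = rendered.len();
        self.emit(&rendered, end)
    }

    fn emit(&mut self, rendered: &str, upto: usize) -> String {
        if upto <= self.printed {
            return String::new(); // nothing new is stable yet
        }
        let delta = rendered[self.printed..upto].to_string();
        self.printed = upto;
        delta
    }
}
```

Concatenating every `push` result with the final `finish` result should reproduce `render(full_source, width)`, which is the same shape of check the byte-identity guarantee describes.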

Several guards manage the ambiguities that would otherwise break that guarantee:

  • Block-start: the partial-line classifier enters paragraph mode as soon as the line's first non-space character rules out every block starter (header, fence, list, HTML, thematic break, reference, table). A line that opens with a possible starter (#, `, |, a digit, and so on) stays buffered until its newline arrives.
  • Setext threshold: a source-byte threshold (~128) holds back the start of the paragraph, typically its first line or two, so a short setext underline can still turn a run into a heading.
  • Inline ground state: the committed prefix never ends inside an open inline construct (emphasis, code span, link, angle bracket).
  • Tables: a block whose header line begins with a pipe is kept whole and never streamed, because a table's column widths depend on its later rows and so its rendering is not prefix-stable.
  • Wrap-in-progress: the renderer holds the in-progress visual line until a later chunk or the terminator commits it.
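As a rough illustration of the block-start guard, the sketch below decides from a partial line (no newline seen yet) whether it is already safe to treat as paragraph text. The names `PartialLine` and `classify_partial_line` are hypothetical, the character set is a simplification of the block starters listed above, and the block-quote marker is an extra assumption not taken from this PR. Indented code blocks are ignored.

```rust
#[derive(Debug, PartialEq)]
enum PartialLine {
    /// Too little input, or the line could still open a non-paragraph block.
    Undecided,
    /// No block starter can match: safe to stream as paragraph text.
    Paragraph,
}

/// Classifies a partial line by its first non-space character only.
fn classify_partial_line(partial: &str) -> PartialLine {
    match partial.chars().find(|c| *c != ' ' && *c != '\t') {
        // Empty or whitespace-only so far: nothing to decide on.
        None => PartialLine::Undecided,
        // A digit may open an ordered list item.
        Some(c) if c.is_ascii_digit() => PartialLine::Undecided,
        // '#' header; '`' and '~' fence; '-', '*', '+' list or thematic break;
        // '_' thematic break; '<' HTML; '[' reference; '|' table;
        // '>' block quote (assumption, not in the PR's list).
        Some('#' | '`' | '~' | '-' | '*' | '+' | '_' | '<' | '[' | '|' | '>') => {
            PartialLine::Undecided
        }
        // Anything else rules out every starter above.
        Some(_) => PartialLine::Paragraph,
    }
}
```

For example, `classify_partial_line("The")` yields `Paragraph`, while `classify_partial_line("| a")` stays `Undecided` until the newline arrives and the full line can be classified.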

Byte-identity has two documented exceptions, both of which LLM output almost never produces. First, a setext heading whose content exceeds the threshold streams as prose instead of becoming a heading. Second, a GFM table whose header has no leading pipe (legal per the GFM spec, though none of its examples show one) is not detected and may stream with mis-padded columns.

Buffer::with_streaming_paragraphs(false) opts out, restoring whole-paragraph Event::Block emission. Event gains #[non_exhaustive] alongside the new Event::ParagraphChunk variant, so this is the last forced match-site breakage. OrphanedFenceFixup derives its embedded-fence flag from the accumulated paragraph source rather than a single Event::Block, since the inline scanner can split an embedded fence across chunk boundaries.
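For downstream match sites, the practical effect of `#[non_exhaustive]` is that a wildcard arm becomes mandatory outside the defining crate, so later variants no longer break the build. The sketch below uses a hypothetical, simplified `Event` with `String` payloads; the real payload types are not shown in this PR description and are assumed here.

```rust
/// Hypothetical, simplified mirror of the event type; payloads are assumptions.
#[non_exhaustive]
#[derive(Debug)]
enum Event {
    /// A whole block, as emitted before this change (and when streaming is off).
    Block(String),
    /// An incremental piece of a top-level paragraph.
    ParagraphChunk(String),
}

/// Concatenates the text of every event a consumer knows how to handle.
// Inside the defining crate the wildcard arm is redundant, hence the allow;
// in a downstream crate the compiler requires it for a non_exhaustive enum.
#[allow(unreachable_patterns)]
fn collect_text(events: &[Event]) -> String {
    let mut out = String::new();
    for event in events {
        match event {
            Event::Block(text) | Event::ParagraphChunk(text) => out.push_str(text),
            // Future variants are ignored rather than failing to compile.
            _ => {}
        }
    }
    out
}
```

A consumer written this way treats chunked and whole-block emission uniformly, which is one plausible way to stay indifferent to whether streaming is enabled.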

Tested by a byte-identity harness (streamed vs. whole, with per-shape latency assertions) over an adversarial corpus, plus an expanded set of flowing-markdown fixtures cross-validated against comrak; the buffer-half of the guarantee is checked over the fixture corpus in jp_md.

The design and guards are documented in RFD 089.

Long assistant paragraphs — a single prose line with no internal
newlines — previously stalled the terminal until the whole paragraph
arrived, sometimes over a minute for a reasoning block. The buffer now
streams a top-level paragraph incrementally as its content grows, so
output appears as it is generated rather than only when the block
terminates.

The output is byte-for-byte identical to today's whole-paragraph
rendering: the renderer accumulates the paragraph's source, re-renders
the growing buffer on each chunk, and emits only the stable committed
prefix (everything up to the last wrapped newline), holding the
in-progress visual line until a later chunk or the terminator commits
it. The concatenation of all printed deltas equals
`format_terminal_with(full_paragraph, opts)` exactly.

Four guards manage the known ambiguities: the partial-line classifier
enters paragraph mode as soon as the line's opening token rules out
every block starter (header, fence, list, HTML, thematic break,
reference); a source-byte setext threshold holds the first ~128 bytes
so a short setext underline can still turn a run into a heading; an
inline ground-state scan stops the committed prefix before any open
inline construct (emphasis, code span, link, angle bracket); and the
renderer holds the in-progress visual line. The one accepted exception:
a setext heading whose content exceeds the threshold has already begun
streaming as prose and never becomes a heading — realistic short
headings are unaffected.

`Buffer::with_streaming_paragraphs(false)` opts out, restoring
whole-paragraph `Event::Block` emission. `Event` gains `#[non_exhaustive]`
alongside the new `Event::ParagraphChunk` variant so this is the last
forced match-site breakage. `OrphanedFenceFixup` is updated to derive
its embedded-fence flag from the accumulated paragraph chunks rather
than a single `Event::Block`, since the inline scanner can split an
embedded fence across chunk boundaries.

The design and all four guards are documented in RFD 089.

Signed-off-by: Jean Mertz <git@jeanmertz.com>
@JeanMertz JeanMertz force-pushed the md-long-paragraph branch from fcec6fc to aaa9664 Compare June 29, 2026 14:45
@JeanMertz JeanMertz merged commit 860884c into main Jun 29, 2026
16 checks passed
@JeanMertz JeanMertz deleted the md-long-paragraph branch June 29, 2026 15:13