docs: add note on filtering prompt-elicited inline tags (e.g. <thinking>) before TTS#976
Conversation
- 📝 docs(learn): add 'Removing Custom Inline Tags' section to text-to-speech page with PatternPairAggregator + MatchAction.REMOVE snippet and a Tip preferring native extended thinking - 📝 docs(api-reference): cross-link the new section from the Anthropic service Notes Addresses pipecat-ai/pipecat#4901
|
|
||
| - **Prompt caching**: When `enable_prompt_caching` is enabled, Anthropic caches repeated context to reduce costs. Cache control markers are automatically added to the most recent user messages. This is most effective for conversations with large system prompts or long conversation histories. | ||
| - **Extended thinking**: Enabling thinking increases response quality for complex tasks but adds latency. When `type="enabled"`, you must provide a `budget_tokens` value (minimum 1024 with current models). Extended thinking is disabled by default. | ||
| - **Prompt-elicited `<thinking>` tags**: If your system prompt asks the model to reason inside inline tags rather than enabling extended thinking, that reasoning is ordinary text and will be spoken by TTS. Prefer the `thinking` parameter; for inline tags you deliberately keep, see [Removing Custom Inline Tags](/pipecat/learn/text-to-speech#removing-custom-inline-tags). |
There was a problem hiding this comment.
This note makes sense.
Rather than adding a new subsection to the learning guides, it might make sense to just point the developer directly to the PatternPairAggregator. In the PatternPairAggregator, we can add a new, generic section about removing tags, which we can link to. Something like this would do the trick:
### Removing Tagged Content
To drop content from the text stream entirely, register a pattern with `MatchAction.REMOVE`. The tags and everything between them are removed before reaching downstream processors — nothing is spoken by TTS and nothing lands in the conversation context. This is useful when your prompt elicits inline tags whose content is not meant for the user, such as reasoning tags (e.g., `<thinking>...</thinking>`) or annotations intended for other processors:
```python
from pipecat.processors.aggregators.llm_text_processor import LLMTextProcessor
from pipecat.utils.text.pattern_pair_aggregator import MatchAction, PatternPairAggregator
pattern_aggregator = PatternPairAggregator()
pattern_aggregator.add_pattern(
type="thinking",
start_pattern="<thinking>",
end_pattern="</thinking>",
action=MatchAction.REMOVE,
)
# Set the aggregator on an LLMTextProcessor
llm_text_processor = LLMTextProcessor(text_aggregator=pattern_aggregator)
# add the llm_text_processor to your pipeline after the llm and before the tts
# llm -> llm_text_processor -> tts
Because this filters the text stream itself, it works with any LLM provider and any custom inline tag.
There was a problem hiding this comment.
Done in b117685. Used your text as a new "Removing Tagged Content" example on the PatternPairAggregator page, dropped the learn-guide section, and pointed the anthropic note at the new anchor.
| # llm -> llm_text_processor -> tts | ||
| ``` | ||
|
|
||
| ### Removing Custom Inline Tags |
There was a problem hiding this comment.
From the other comment, I think we'll want to remove this.
- 🗑️ remove(learn): drop the new text-to-speech section per review - 📝 docs(api-reference): add 'Removing Tagged Content' usage example to pattern-pair-aggregator - 🔄 refactor(api-reference): point the Anthropic note at the new anchor Review feedback from pipecat-ai#976
markbackman
left a comment
There was a problem hiding this comment.
LGTM! Thanks for taking care of this 🙇
Summary
Follow-up to pipecat-ai/pipecat#4901. When a system prompt asks the LLM to reason inside inline
<thinking>...</thinking>tags with extended thinking off, the reasoning streams back as plain text and gets spoken by TTS. @markbackman's investigation there concluded this belongs in docs rather than provider code: prefer native extended thinking, and strip deliberately-elicited inline tags at the text layer. This adds that note.What changed
pipecat/learn/text-to-speech.mdxwith thePatternPairAggregator+MatchAction.REMOVEsnippet from the issue, plus a Tip pointing at native extended thinking as the preferred path when the goal is genuine reasoning.api-reference/server/services/llm/anthropic.mdxcross-linking the new section, since that's where someone debugging spoken thinking text with Anthropic looks first.Verification
LLMTextProcessor(text_aggregator=...),add_pattern(..., action=MatchAction.REMOVE), andLLMThoughtTextFramerouting inAnthropicLLMService.npx prettierclean on both files.