pipecat-ai · markbackman · Jul 3, 2026 · Jul 2, 2026 · Jul 3, 2026
diff --git a/api-reference/server/services/llm/anthropic.mdx b/api-reference/server/services/llm/anthropic.mdx
@@ -201,6 +201,7 @@ await worker.queue_frame(
 
 - **Prompt caching**: When `enable_prompt_caching` is enabled, Anthropic caches repeated context to reduce costs. Cache control markers are automatically added to the most recent user messages. This is most effective for conversations with large system prompts or long conversation histories.
 - **Extended thinking**: Enabling thinking increases response quality for complex tasks but adds latency. When `type="enabled"`, you must provide a `budget_tokens` value (minimum 1024 with current models). Extended thinking is disabled by default.
+- **Prompt-elicited `<thinking>` tags**: If your system prompt asks the model to reason inside inline tags rather than enabling extended thinking, that reasoning is ordinary text and will be spoken by TTS. Prefer the `thinking` parameter; for inline tags you deliberately keep, see [Removing Tagged Content](/api-reference/server/utilities/text/pattern-pair-aggregator#removing-tagged-content).
 - **Custom clients**: You can pass custom Anthropic client instances (e.g., `AsyncAnthropicBedrock` or `AsyncAnthropicVertex`) via the `client` parameter to use Anthropic models through other cloud providers.
 - **Retry behavior**: When `retry_on_timeout=True`, the first attempt uses the `retry_timeout_secs` timeout. If it times out, a second attempt is made with no timeout limit.
 - **System instruction precedence**: If both `system_instruction` (from the constructor) and a system message in the context are set, the constructor's `system_instruction` takes precedence and a warning is logged.

diff --git a/api-reference/server/utilities/text/pattern-pair-aggregator.mdx b/api-reference/server/utilities/text/pattern-pair-aggregator.mdx
@@ -130,6 +130,31 @@ When a pattern is matched, the handler function receives a `PatternMatch` object
 
 ## Usage Examples
 
+### Removing Tagged Content
+
+To drop content from the text stream entirely, register a pattern with `MatchAction.REMOVE`. The tags and everything between them are removed before reaching downstream processors — nothing is spoken by TTS and nothing lands in the conversation context. This is useful when your prompt elicits inline tags whose content is not meant for the user, such as reasoning tags (e.g., `<thinking>...</thinking>`) or annotations intended for other processors:
+
+```python
+from pipecat.processors.aggregators.llm_text_processor import LLMTextProcessor
+from pipecat.utils.text.pattern_pair_aggregator import MatchAction, PatternPairAggregator
+
+pattern_aggregator = PatternPairAggregator()
+pattern_aggregator.add_pattern(
+    type="thinking",
+    start_pattern="<thinking>",
+    end_pattern="</thinking>",
+    action=MatchAction.REMOVE,
+)
+
+# Set the aggregator on an LLMTextProcessor
+llm_text_processor = LLMTextProcessor(text_aggregator=pattern_aggregator)
+
+# add the llm_text_processor to your pipeline after the llm and before the tts
+# llm -> llm_text_processor -> tts
+```
+
+Because this filters the text stream itself, it works with any LLM provider and any custom inline tag.
+
 ### Voice Switching in TTS
 
 This example demonstrates finding custom `<voice>` tags in streaming text to switch voices dynamically in a TTS service like Cartesia. It removes the tags and the content between them, such that the content is treated as if it does not exist. It will not be spoken by the TTS, it will not be added to the context, and it will not be sent to clients via RTVI. Instead, it simply triggers a voice switch side effect.