docs: add HuggingFace tokenizer-based token text splitter for RAG chunking#1219

Draft
kela4 wants to merge 2 commits into open-webui:dev from kela4:feat/token_transformers-text-splitter

Conversation

kela4 commented Apr 25, 2026

Summary

This is related documentation for this pending PR: open-webui/open-webui#24139

Adds documentation for the token_transformers text splitter mode.

Checklist

  • Branch was created from the fork's dev branch
  • Target branch is dev because it is related to a new, unreleased feature

Changes

  • env-configuration.mdx: Adds token_transformers as a valid option for RAG_TEXT_SPLITTER, and documents three new
    related environment variables: RAG_TOKENIZER_MODEL, RAG_TOKENIZER_MODEL_AUTO_UPDATE and RAG_TOKENIZER_MODEL_TRUST_REMOTE_CODE.
  • rag/index.md: Adds a "Text Splitter Options" section explaining when to use each mode (Character, Token/Tiktoken,
    Token/Transformers), with guidance on configuring the tokenizer model for local vs. external embedding APIs and a tip
    on setting chunk size relative to the model's maximum sequence length.
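
For reviewers who want a feel for the chunk-size tip, here is a minimal Python sketch of token-window chunking. It is not the implementation from open-webui/open-webui#24139. The helper names (`read_splitter_config`, `chunk_tokens`, `fits_model`) and the `reserved` allowance for special tokens are hypothetical, and the token list is a stand-in for whatever the configured tokenizer would produce. Only the environment variable names come from this PR.

```python
import os


def read_splitter_config(env=os.environ):
    """Collect two of the splitter-related variables named in this PR.

    Hypothetical helper: returns None for anything unset rather than
    guessing at the project's real defaults.
    """
    return {
        "splitter": env.get("RAG_TEXT_SPLITTER"),
        "tokenizer_model": env.get("RAG_TOKENIZER_MODEL"),
    }


def chunk_tokens(tokens, chunk_size, overlap=0):
    """Split a token sequence into windows of at most chunk_size tokens.

    Consecutive windows share `overlap` tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the sequence.
        if start + chunk_size >= len(tokens):
            break
    return chunks


def fits_model(chunk_size, max_seq_length, reserved=2):
    """Check that a chunk plus reserved special tokens fits the model limit.

    `reserved` is an assumed allowance for tokens the embedding model may
    add around each chunk; the right value depends on the model.
    """
    return chunk_size + reserved <= max_seq_length
```

With a hypothetical maximum sequence length of 512, `fits_model(512, 512)` is false while `fits_model(510, 512)` is true, which is the kind of headroom the chunk-size tip is about.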

@pr-validator-bot

ℹ️ Documentation PR Guidelines

👋 Welcome! This is an automated message posted on all new documentation PRs to help guide our contributors. Just because this comment appeared doesn't mean you have done anything wrong!

Please ensure you're using the correct branches:

Target branch (where you're merging TO):

  • dev branch: For documentation related to upcoming Open WebUI releases (new features, new environment variables, anything dependent on unreleased versions and unreleased features/fixes/changes)
  • main branch: For content that can go live immediately (fixes, tutorials, documentation not dependent on unreleased features)

Source branch (where you're merging FROM):

  • If targeting dev, create your branch from your fork's dev branch
  • If targeting main, create your branch from your fork's main branch
  • ⚠️ Mismatched branches can and will result in unwanted file changes being included in your PR!

If your docs PR depends on a pending PR in open-webui/open-webui:

  • Convert this PR to DRAFT mode!
  • Link to the related main repo PR in your description for clarity
  • We'll review both together once the PR on the main repo is merged

Please adjust your PR target branch, source branch, and/or draft status accordingly if needed.
