feat(vision): add auxiliary vision model for image description fallback by Johell1NS · Pull Request #157 · grinev/opencode-telegram-bot

Johell1NS · 2026-06-16T09:45:45Z

What

Adds an auxiliary vision model fallback. When the user sends an image to a model that does not support image input, the bot:

Detects that the current model lacks vision capability
Sends the image to a vision model configured by the user (via /setvision)
Has the vision model describe the image using a dynamic prompt: if a caption is provided, the description is framed in the context of the user's question
Forwards the textual description to the main model, which continues processing

Everything is automatic — no user confirmation required. If no vision model is configured, the bot behaves exactly as before (error message + text-only fallback).
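
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `chooseImageRoute`, `composeForwardedText`, the route names, and the `[Image description]` label are all hypothetical, and the way the caption is combined with the description is an assumption.

```typescript
// Hypothetical route names; the PR description does not show the real handler code.
type ImageRoute = "send-image" | "vision-fallback" | "text-only-error";

// Decide what to do with an incoming image.
function chooseImageRoute(
  mainModelSupportsImage: boolean,
  visionModel: string | undefined,
): ImageRoute {
  // The main model can read images itself: no fallback needed.
  if (mainModelSupportsImage) return "send-image";
  // No vision model configured: keep the previous behaviour
  // (error message plus text-only fallback).
  return visionModel ? "vision-fallback" : "text-only-error";
}

// Build the text forwarded to the main model once the vision model has answered.
// Assumption: the caption (if any) is kept and the description is appended to it.
function composeForwardedText(description: string, caption?: string): string {
  const block = `[Image description]\n${description.trim()}`;
  const question = caption?.trim();
  return question ? `${question}\n\n${block}` : block;
}
```

Keeping the decision separate from the call to the vision model makes the three outcomes easy to test without a running bot.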

Closes #151

Commands

/setvision — shows a menu to select a vision model. If already set, shows the current model with "Change" and "Clear" options
The vision model selection persists in settings.json across restarts
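
As a rough sketch of the "Change" and "Clear" actions and their persistence, assuming the settings are a flat JSON object. Only the `currentVisionModel` field name comes from this PR; its string shape, the `currentModel` neighbour field, and all function names are hypothetical. File I/O for settings.json is omitted.

```typescript
// Assumed settings shape; only `currentVisionModel` is named in the PR.
interface Settings {
  currentModel?: string;       // hypothetical neighbouring field
  currentVisionModel?: string; // assumed to be a "provider/model" string
}

// "Change": return a copy of the settings with the vision model set.
function setVisionModel(settings: Settings, model: string): Settings {
  return { ...settings, currentVisionModel: model };
}

// "Clear": return a copy of the settings without the vision model.
function clearVisionModel(settings: Settings): Settings {
  const next: Settings = { ...settings };
  delete next.currentVisionModel;
  return next;
}

// Serialise to and parse from the JSON text stored in settings.json.
function serializeSettings(settings: Settings): string {
  return JSON.stringify(settings, null, 2);
}

function parseSettings(json: string): Settings {
  return JSON.parse(json) as Settings;
}
```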

Technical changes

3 new files:

src/app/services/vision-model-service.ts — service: describeImage() creates a temporary session, calls session.prompt() synchronously with the vision model, extracts the text description, deletes the session
src/bot/commands/setvision-command.ts — /setvision command handler
src/bot/callbacks/vision-model-callback-handler.ts — inline menu callback for vision model selection
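
The lifecycle of `describeImage()` (create a temporary session, prompt the vision model, extract the text, delete the session) might look roughly like the sketch below. `SessionClient` is a simplified, hypothetical interface written for this example and is not the real SDK surface; the parameter names and the `finally`-based cleanup are assumptions as well.

```typescript
// Simplified, hypothetical client interface; not the actual SDK types.
interface ImageInput {
  mimeType: string;
  url: string; // e.g. a data: URL holding the downloaded image
}

interface ResponsePart {
  type: string;
  text?: string;
}

interface SessionClient {
  create(opts: { parentID: string | undefined }): Promise<{ id: string }>;
  prompt(opts: {
    sessionID: string;
    model: string;
    text: string;
    image: ImageInput;
  }): Promise<{ parts: ResponsePart[] }>;
  delete(opts: { sessionID: string }): Promise<void>;
}

async function describeImage(
  client: SessionClient,
  image: ImageInput,
  visionModel: string,
  promptText: string,
  foregroundSessionID?: string,
): Promise<string> {
  // Temporary session, created as a child of the foreground session.
  const session = await client.create({ parentID: foregroundSessionID });
  try {
    // Wait for the full response: the description is needed before continuing.
    const response = await client.prompt({
      sessionID: session.id,
      model: visionModel,
      text: promptText,
      image,
    });
    // Keep only the text parts of the response.
    const description = response.parts
      .filter((part) => part.type === "text" && typeof part.text === "string")
      .map((part) => part.text as string)
      .join("\n")
      .trim();
    if (!description) throw new Error("Vision model returned no text");
    return description;
  } finally {
    // Always remove the temporary session; a cleanup failure is swallowed
    // so that it does not mask the result or the original error.
    await client.delete({ sessionID: session.id }).catch(() => undefined);
  }
}
```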

17 modified files:

src/app/types/settings.ts / src/app/stores/settings-store.ts — new currentVisionModel field
src/bot/handlers/photo-handler.ts — vision fallback in !supportsImage branch
src/bot/handlers/document-handler.ts — fallback for image documents
src/bot/handlers/media-group-handler.ts — fallback for media groups with images
src/bot/routers/command-router.ts / callback-router.ts — command and callback registration
src/bot/menus/inline-menu.ts — added "vision" menu kind
src/bot/commands/definitions.ts — command list registration
src/i18n/*.ts — 11 new i18n keys across 7 languages

Notes

The temporary vision session is created as a child of the foreground session to suppress background session notifications
The vision model call uses session.prompt() (synchronous) rather than promptAsync(), since the full response is needed to continue the flow
The image description prompt is dynamic: if the user wrote a caption, the prompt is "The user asked: [caption]. Please describe the image focusing on the most relevant aspects." — otherwise "Please describe this image in detail."
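
The dynamic prompt can be illustrated with a small helper. The two prompt strings are the ones quoted above; the function name `buildVisionPrompt` and the trimming of whitespace-only captions are assumptions made for this sketch.

```typescript
// Build the prompt sent to the vision model.
// Assumption: a caption that is empty or only whitespace counts as "no caption".
function buildVisionPrompt(caption?: string): string {
  const trimmed = caption?.trim();
  if (trimmed) {
    return `The user asked: ${trimmed}. Please describe the image focusing on the most relevant aspects.`;
  }
  return "Please describe this image in detail.";
}
```

For example, `buildVisionPrompt("Which cable is damaged")` yields the caption-aware prompt, while `buildVisionPrompt()` and `buildVisionPrompt("   ")` both yield the generic one.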

When the selected model lacks image input capability, the bot now automatically uses a user-configurable auxiliary vision model to describe images. The textual description is then fed to the main model, enabling non-vision models to process image-based queries.

Changes:
- Add /setvision command to select a vision-capable model via inline menu
- Add vision-model-service with synchronous describeImage() using session.prompt() for temporary vision sessions
- Modify photo, document, and media-group handlers to fall back to vision model when main model lacks image support
- Vision sessions are created as children of the foreground session to suppress background session notifications
- Add 11 new i18n keys across all 7 supported languages
- Persist vision model selection in settings.json

Closes grinev#151

Johell1NS pushed a commit to Johell1NS/opencode-telegram-bot that referenced this pull request Jun 16, 2026

test: combine PR grinev#156 and grinev#157 for testing

442ddd0

Johell1NS wants to merge 1 commit into grinev:main from Johell1NS:feat/vision-model-fallback
