Add Gemini CLI integration via OpenAI-compatible proxy#88

Open
wenze2527 wants to merge 1 commit into lm-sys:main from wenze2527:gemini-cli-integration
Conversation

@wenze2527

Summary

  • gemini_proxy.py: FastAPI server that wraps gemini -p (Google Gemini CLI) as an OpenAI-compatible endpoint on :8080. Enables RouteLLM to use a personal OAuth Gemini CLI session as the strong model without needing a direct API key.
  • start.py: Orchestration launcher that boots the Gemini proxy, waits for its health check, then starts RouteLLM with the mf router on :6060. Supports --threshold, --weak, and --port arguments.
  • start.bat: Windows batch launcher using a local venv Python.
  • start_and_test.bat: One-click Windows launcher that starts RouteLLM (or detects it's already running) then runs the smoke test automatically.
  • test_routing.py: Smoke test that sends easy / medium / hard prompts and prints which model each query was routed to.
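
The core translation step in the proxy is collapsing an OpenAI-style `messages` array into the single prompt string that `gemini -p` accepts. A minimal sketch of that flattening (the helper name and role labels are assumptions for illustration, not the PR's actual code):

```python
from typing import Dict, List


def flatten_messages(messages: List[Dict[str, str]]) -> str:
    """Collapse an OpenAI-style chat history into one prompt string
    suitable for passing to `gemini -p "<prompt>"`.

    The role labels below are illustrative assumptions.
    """
    parts = []
    for msg in messages:
        role = msg.get("role", "user")
        content = msg.get("content", "")
        if role == "system":
            parts.append(f"[System instructions]\n{content}")
        elif role == "assistant":
            parts.append(f"Assistant: {content}")
        else:
            parts.append(f"User: {content}")
    return "\n\n".join(parts)
```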

Architecture

Your app
  └─► RouteLLM :6060  (OpenAI-compatible router, mf router)
        ├─► Gemini CLI proxy :8080  (strong — wraps `gemini -p`)
        └─► Ollama :11434           (weak  — ollama_chat/*)

Key design decisions

  • Gemini CLI OAuth: The Gemini CLI uses personal OAuth tokens that lack the scope required for direct generativelanguage.googleapis.com access, making a subprocess wrapper around gemini -p the only viable integration path.
  • Windows subprocess: Used asyncio.to_thread(subprocess.run, ...) instead of asyncio.create_subprocess_exec because the latter cannot resolve .cmd wrapper files on Windows.
  • shutil.which("gemini"): Resolves the full path to the Gemini CLI executable at startup, avoiding PATH lookup failures inside subprocesses.
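
The two subprocess decisions above combine into a pattern along these lines (a sketch under the stated design, not the PR's exact code; the `run_cli` helper and the timeout value are illustrative):

```python
import asyncio
import shutil
import subprocess

# Resolve the CLI's full path once at startup so subprocess calls don't
# depend on PATH lookup (on Windows this also locates the .cmd wrapper).
GEMINI_BIN = shutil.which("gemini")


async def run_cli(binary: str, *args: str, timeout: float = 120.0) -> str:
    """Run a CLI command in a worker thread and return its stdout.

    asyncio.to_thread keeps the event loop responsive while avoiding
    asyncio.create_subprocess_exec, which cannot resolve .cmd wrapper
    files on Windows.
    """
    result = await asyncio.to_thread(
        subprocess.run,
        [binary, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    result.check_returncode()  # raise if the CLI exited non-zero
    return result.stdout
```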

Test plan

  • Start with start.bat (or start_and_test.bat for automatic testing)
  • Verify proxy health: curl http://localhost:8080/health
  • Verify RouteLLM: curl http://localhost:6060/v1/models
  • Run smoke test: python test_routing.py — confirm easy queries route to Ollama, hard queries route to Gemini
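
For reference, each smoke-test request is an ordinary OpenAI chat-completions call against the router. Building one with only the standard library might look like this (a sketch: the `router-mf-...` model string follows RouteLLM's router-plus-threshold naming convention, but treat the exact value as an assumption):

```python
import json
import urllib.request

# RouteLLM's OpenAI-compatible endpoint from the architecture above.
ROUTER_URL = "http://localhost:6060/v1/chat/completions"


def build_chat_request(prompt: str,
                       model: str = "router-mf-0.5") -> urllib.request.Request:
    """Build an OpenAI-compatible chat request for the RouteLLM router.

    The model string encodes the router name and threshold (assumed
    format; adjust to match the --threshold passed to start.py).
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires the router to be running on :6060 (e.g. via start.bat).
    for difficulty, prompt in [
        ("easy", "What is 2 + 2?"),
        ("hard", "Prove that there are infinitely many primes."),
    ]:
        with urllib.request.urlopen(build_chat_request(prompt)) as resp:
            body = json.loads(resp.read())
        # The response's model field reveals which backend handled it.
        print(difficulty, "->", body.get("model"))
```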

🤖 Generated with Claude Code

Enables RouteLLM to use Google Gemini CLI (personal OAuth) as the
strong model alongside local Ollama models as the weak model.

- gemini_proxy.py: FastAPI server wrapping `gemini -p` as an
  OpenAI-compatible endpoint on :8080; handles message flattening,
  async subprocess via asyncio.to_thread, and timeout handling
- start.py: Orchestration launcher that boots the Gemini proxy, waits
  for health check, then starts RouteLLM with mf router on :6060;
  supports --threshold, --weak, --port args
- start.bat: Windows batch launcher using hermes venv Python
- start_and_test.bat: One-click launcher that starts RouteLLM (or
  detects it's already running) then runs the smoke test
- test_routing.py: Smoke test that sends easy/medium/hard prompts and
  prints which model each was routed to

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>