Add Gemini CLI integration via OpenAI-compatible proxy#88

Open
wenze2527 wants to merge 1 commit into lm-sys:main from wenze2527:gemini-cli-integration
Conversation

@wenze2527

Summary

  • gemini_proxy.py: FastAPI server that wraps gemini -p (Google Gemini CLI) as an OpenAI-compatible endpoint on :8080. Enables RouteLLM to use a personal OAuth Gemini CLI session as the strong model without needing a direct API key.
  • start.py: Orchestration launcher that boots the Gemini proxy, waits for its health check, then starts RouteLLM with the mf router on :6060. Supports --threshold, --weak, and --port arguments.
  • start.bat: Windows batch launcher using a local venv Python.
  • start_and_test.bat: One-click Windows launcher that starts RouteLLM (or detects it's already running) then runs the smoke test automatically.
  • test_routing.py: Smoke test that sends easy / medium / hard prompts and prints which model each query was routed to.
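
The core translation step in the proxy is collapsing an OpenAI-style `messages` array into the single prompt string that `gemini -p` accepts. A minimal sketch of that flattening (the helper name and role labels are assumptions for illustration, not the PR's actual code):

```python
from typing import Dict, List


def flatten_messages(messages: List[Dict[str, str]]) -> str:
    """Collapse an OpenAI-style chat history into one prompt string
    suitable for passing to `gemini -p "<prompt>"`.

    The role labels below are illustrative assumptions.
    """
    parts = []
    for msg in messages:
        role = msg.get("role", "user")
        content = msg.get("content", "")
        if role == "system":
            parts.append(f"[System instructions]\n{content}")
        elif role == "assistant":
            parts.append(f"Assistant: {content}")
        else:
            parts.append(f"User: {content}")
    return "\n\n".join(parts)
```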

Architecture

Your app
  └─► RouteLLM :6060  (OpenAI-compatible router, mf router)
        ├─► Gemini CLI proxy :8080  (strong — wraps `gemini -p`)
        └─► Ollama :11434           (weak  — ollama_chat/*)

Key design decisions

  • Gemini CLI OAuth: The Gemini CLI uses personal OAuth tokens that lack the scope required for direct generativelanguage.googleapis.com access, making a subprocess wrapper around gemini -p the only viable integration path.
  • Windows subprocess: Used asyncio.to_thread(subprocess.run, ...) instead of asyncio.create_subprocess_exec because the latter cannot resolve .cmd wrapper files on Windows.
  • shutil.which("gemini"): Resolves the full path to the Gemini CLI executable at startup, avoiding PATH lookup failures inside subprocesses.
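
The two subprocess decisions above combine into a pattern along these lines (a sketch under the stated design, not the PR's exact code; the `run_cli` helper and the timeout value are illustrative):

```python
import asyncio
import shutil
import subprocess

# Resolve the CLI's full path once at startup so subprocess calls don't
# depend on PATH lookup (on Windows this also locates the .cmd wrapper).
GEMINI_BIN = shutil.which("gemini")


async def run_cli(binary: str, *args: str, timeout: float = 120.0) -> str:
    """Run a CLI command in a worker thread and return its stdout.

    asyncio.to_thread keeps the event loop responsive while avoiding
    asyncio.create_subprocess_exec, which cannot resolve .cmd wrapper
    files on Windows.
    """
    result = await asyncio.to_thread(
        subprocess.run,
        [binary, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    result.check_returncode()  # raise if the CLI exited non-zero
    return result.stdout
```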

Test plan

  • Start with start.bat (or start_and_test.bat for automatic testing)
  • Verify proxy health: curl http://localhost:8080/health
  • Verify RouteLLM: curl http://localhost:6060/v1/models
  • Run smoke test: python test_routing.py — confirm easy queries route to Ollama, hard queries route to Gemini
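
For reference, each smoke-test request is an ordinary OpenAI chat-completions call against the router. Building one with only the standard library might look like this (a sketch: the `router-mf-...` model string follows RouteLLM's router-plus-threshold naming convention, but treat the exact value as an assumption):

```python
import json
import urllib.request

# RouteLLM's OpenAI-compatible endpoint from the architecture above.
ROUTER_URL = "http://localhost:6060/v1/chat/completions"


def build_chat_request(prompt: str,
                       model: str = "router-mf-0.5") -> urllib.request.Request:
    """Build an OpenAI-compatible chat request for the RouteLLM router.

    The model string encodes the router name and threshold (assumed
    format; adjust to match the --threshold passed to start.py).
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires the router to be running on :6060 (e.g. via start.bat).
    for difficulty, prompt in [
        ("easy", "What is 2 + 2?"),
        ("hard", "Prove that there are infinitely many primes."),
    ]:
        with urllib.request.urlopen(build_chat_request(prompt)) as resp:
            body = json.loads(resp.read())
        # The response's model field reveals which backend handled it.
        print(difficulty, "->", body.get("model"))
```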

🤖 Generated with Claude Code

Enables RouteLLM to use Google Gemini CLI (personal OAuth) as the
strong model alongside local Ollama models as the weak model.

- gemini_proxy.py: FastAPI server wrapping `gemini -p` as an
  OpenAI-compatible endpoint on :8080; handles message flattening,
  async subprocess via asyncio.to_thread, and timeout handling
- start.py: Orchestration launcher that boots the Gemini proxy, waits
  for health check, then starts RouteLLM with mf router on :6060;
  supports --threshold, --weak, --port args
- start.bat: Windows batch launcher using hermes venv Python
- start_and_test.bat: One-click launcher that starts RouteLLM (or
  detects it's already running) then runs the smoke test
- test_routing.py: Smoke test that sends easy/medium/hard prompts and
  prints which model each was routed to

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>