CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Version Management

Automated Version Bumping (CI/CD)

Version bumping is fully automated via GitHub Actions. On every push to main, the workflow:

1. Analyzes commits using Conventional Commits:
   - feat: → Minor bump (0.1.x → 0.2.0)
   - fix:, perf:, refactor: → Patch bump (0.1.3 → 0.1.4)
   - feat!:, BREAKING CHANGE: → Major bump (0.x.x → 1.0.0)
2. Auto-updates:
   - Cargo.toml version
   - CHANGELOG.md with categorized changes
   - Git tag (e.g., v0.1.4)
3. Triggers the release pipeline → builds binaries, Docker images, publishes to crates.io
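
The bump rules above can be sketched in a few lines of Rust. This is an illustration only: the actual CI tooling is not shown in this file, and the names `Bump`, `classify`, `apply`, and `next_version` are hypothetical. It assumes the most significant bump among the pushed commits wins.

```rust
/// Bump levels, ordered so that `max` picks the most significant one.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Bump {
    Patch,
    Minor,
    Major,
}

/// Classify one commit message; `None` means the commit triggers no bump.
pub fn classify(message: &str) -> Option<Bump> {
    if message.contains("BREAKING CHANGE:") {
        return Some(Bump::Major);
    }
    let header = message.lines().next()?;
    // The text before the first ':' is "<type>(<scope>)" with an optional '!'.
    let (prefix, _description) = header.split_once(':')?;
    if prefix.ends_with('!') {
        return Some(Bump::Major);
    }
    // Drop the optional "(scope)" to get the bare type.
    let kind = prefix.split('(').next().unwrap_or(prefix).trim();
    match kind {
        "feat" => Some(Bump::Minor),
        "fix" | "perf" | "refactor" => Some(Bump::Patch),
        _ => None,
    }
}

/// Apply a bump to a (major, minor, patch) triple, resetting lower fields.
pub fn apply(version: (u64, u64, u64), bump: Bump) -> (u64, u64, u64) {
    let (major, minor, patch) = version;
    match bump {
        Bump::Major => (major + 1, 0, 0),
        Bump::Minor => (major, minor + 1, 0),
        Bump::Patch => (major, minor, patch + 1),
    }
}

/// Assumption: the highest bump among all commits decides the next version.
pub fn next_version(current: (u64, u64, u64), commits: &[&str]) -> (u64, u64, u64) {
    match commits.iter().filter_map(|m| classify(m)).max() {
        Some(bump) => apply(current, bump),
        None => current,
    }
}
```

Under these assumptions, a push containing only docs: or chore: commits leaves the version unchanged.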

Commit Message Format

<type>(<scope>): <description>

[optional body]

[optional footer]

Types: feat, fix, docs, style, refactor, perf, test, chore

Examples:

git commit -m "feat(router): add weighted load balancing"     # → minor
git commit -m "fix(auth): resolve JWT validation issue"       # → patch
git commit -m "feat!: redesign provider interface"            # → major
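
To check a header against the format above before committing, a small parser along these lines may help. It is a sketch under stated assumptions: `Header` and `parse_header` are hypothetical names, only the eight types listed above are accepted, and the body and footer are ignored.

```rust
/// Parsed form of "<type>(<scope>): <description>".
#[derive(Debug, PartialEq, Eq)]
pub struct Header<'a> {
    pub kind: &'a str,
    pub scope: Option<&'a str>,
    pub breaking: bool,
    pub description: &'a str,
}

/// Parse the first line of a commit message; `None` means it does not match.
pub fn parse_header(line: &str) -> Option<Header<'_>> {
    const TYPES: [&str; 8] = [
        "feat", "fix", "docs", "style", "refactor", "perf", "test", "chore",
    ];

    let (prefix, description) = line.split_once(": ")?;
    // A trailing '!' marks a breaking change.
    let (prefix, breaking) = match prefix.strip_suffix('!') {
        Some(rest) => (rest, true),
        None => (prefix, false),
    };
    // The scope is optional and must be closed by ')'.
    let (kind, scope) = match prefix.split_once('(') {
        Some((kind, rest)) => (kind, Some(rest.strip_suffix(')')?)),
        None => (prefix, None),
    };
    if !TYPES.contains(&kind) || description.is_empty() {
        return None;
    }
    Some(Header { kind, scope, breaking, description })
}
```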

Version Info in Code

Access build information programmatically:

use litellm_rs::{VERSION, GIT_HASH, full_version, build_info};

println!("Version: {}", VERSION);           // "0.1.4"
println!("Full: {}", full_version());       // "0.1.4-a1b2c3d"
println!("Info: {}", build_info());         // "0.1.4-a1b2c3d (built 1704067200 with rustc 1.87)"

Manual Release (if needed)

# 1. Update version
cargo set-version 0.1.4

# 2. Update CHANGELOG.md

# 3. Commit and tag
git add Cargo.toml Cargo.lock CHANGELOG.md
git commit -m "chore(release): bump version to 0.1.4"
git tag -a v0.1.4 -m "Release v0.1.4"
git push && git push --tags

Quick Commands

make version          # Show current version info
cargo pkgid           # Show package identifier with version

Essential Commands

Development Commands

Start development: make dev or cargo run (auto-loads config/gateway.yaml)
Build: cargo build --all-features
Test: cargo test --all-features
Lint: cargo clippy --all-targets --all-features -- -D warnings
Format: cargo fmt --all
Quick start: make start (fastest way to start the gateway)

Testing Commands

All tests: make test
Unit tests only: make test-unit
Integration tests: make test-integration
Test coverage: make test-coverage
Single test: cargo test <test_name> --all-features

Development Services

Start dev services: make dev-services (starts PostgreSQL, Redis)
Stop dev services: make dev-stop
Database migration: make db-migrate
Reset database: make db-reset

Architecture Overview

This is a high-performance AI Gateway written in Rust that provides OpenAI-compatible APIs with intelligent routing across 100+ AI providers. It's a Rust implementation of the Python LiteLLM library, designed for production environments requiring maximum throughput and minimal latency.

Core Components

Gateway Architecture: Modular, trait-based design with dependency injection

src/core/ - Central orchestrator and business logic
src/server/ - Actix-web HTTP server with middleware pipeline
src/auth/ - Multi-layered authentication (JWT, API keys, RBAC)
src/core/providers/ - Pluggable provider system (OpenAI, Anthropic, Azure, Google, etc.)
src/core/router/ - Intelligent routing with multiple strategies
src/core/mcp/ - MCP Gateway for external tool integration (90 tests)
src/core/a2a/ - A2A Protocol for agent-to-agent communication (48 tests)
src/storage/ - Multi-backend storage (PostgreSQL, Redis, S3, Vector DB)
src/monitoring/ - Observability (Prometheus, tracing, health checks)

Key Design Patterns

Async-first: All I/O is non-blocking using Tokio
Trait-based abstractions: Pluggable components via traits
Error handling: Comprehensive error types with context preservation
Configuration: Type-safe config models with Default implementations
No backward compatibility: Break old formats freely - prioritize clean architecture over legacy support
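
As an illustration of the "type-safe config models with Default implementations" pattern, a config struct might look like the sketch below. `ServerConfig` and its fields and default values are hypothetical; the real models live in src/config/models/ and may differ.

```rust
use std::time::Duration;

/// Hypothetical config model; field names and defaults are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ServerConfig {
    pub host: String,
    pub port: u16,
    pub request_timeout: Duration,
}

impl Default for ServerConfig {
    fn default() -> Self {
        Self {
            host: "127.0.0.1".to_string(),
            port: 8000,
            request_timeout: Duration::from_secs(30),
        }
    }
}
```

Callers can then override a single field with struct update syntax, for example `ServerConfig { port: 9000, ..Default::default() }`, and keep every other default.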

Provider Integration

Unified Provider trait: Common interface for all AI providers
Format conversion: Automatic translation between OpenAI and provider-specific APIs
Health monitoring: Per-provider health checks and failover
Cost calculation: Built-in token counting and cost estimation
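
Cost estimation typically reduces to token counts multiplied by per-token prices. The sketch below shows that arithmetic only; `Usage`, `Pricing`, and `estimate_cost` are hypothetical names, and pricing per million tokens is an assumption rather than the gateway's actual model.

```rust
/// Token counts for one request (hypothetical type).
pub struct Usage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
}

/// Prices in USD per one million tokens (assumed unit).
pub struct Pricing {
    pub input_per_million: f64,
    pub output_per_million: f64,
}

/// cost = (prompt * input_price + completion * output_price) / 1_000_000
pub fn estimate_cost(usage: &Usage, pricing: &Pricing) -> f64 {
    let input = usage.prompt_tokens as f64 * pricing.input_per_million;
    let output = usage.completion_tokens as f64 * pricing.output_per_million;
    (input + output) / 1_000_000.0
}
```

With made-up prices of 3.0 and 15.0 USD per million tokens, 1,000 prompt tokens and 500 completion tokens come to (3,000 + 7,500) / 1,000,000 = 0.0105 USD.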

Request Flow

HTTP Request → Authentication → Authorization → Router → Provider → Response
Middleware pipeline handles auth, logging, metrics, and transformations
Intelligent routing selects optimal provider based on health, latency, cost
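
One way to picture selection by health, latency, and cost is a weighted score over the healthy candidates. This is a minimal sketch, not the router in src/core/router/: `Candidate`, `Weights`, and the linear scoring formula are all assumptions made for illustration.

```rust
/// A routable provider with its current measurements (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub name: &'static str,
    pub healthy: bool,
    pub latency_ms: f64,
    pub cost_per_1k_tokens: f64,
}

/// Relative importance of latency and cost (assumed tuning knobs).
pub struct Weights {
    pub latency: f64,
    pub cost: f64,
}

/// Lower is better: a simple linear combination of latency and cost.
fn score(c: &Candidate, w: &Weights) -> f64 {
    w.latency * c.latency_ms + w.cost * c.cost_per_1k_tokens
}

/// Skip unhealthy providers, then pick the lowest-scoring one.
pub fn select<'a>(candidates: &'a [Candidate], weights: &Weights) -> Option<&'a Candidate> {
    candidates
        .iter()
        .filter(|c| c.healthy)
        .min_by(|a, b| score(a, weights).total_cmp(&score(b, weights)))
}
```

Returning `None` when no candidate is healthy leaves the failover decision to the caller.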

Configuration

Main config: config/gateway.yaml (auto-loaded by default)
Example config: config/gateway.yaml.example
Environment variables: Override config values with ${ENV_VAR} syntax
Config validation: make config-validate
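
The `${ENV_VAR}` substitution can be sketched as a single pass over the config text. The function below is illustrative, not the gateway's loader: `expand_env` is a hypothetical name, and treating a missing variable or an unterminated placeholder as an error is an assumption.

```rust
/// Replace every `${NAME}` in `input` using `lookup`.
/// Assumption: unset variables and unterminated placeholders are errors.
pub fn expand_env<F>(input: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find('}')
            .ok_or_else(|| format!("unterminated placeholder in {input:?}"))?;
        let name = &after[..end];
        let value = lookup(name)
            .ok_or_else(|| format!("environment variable {name} is not set"))?;
        out.push_str(&value);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

In real use the lookup would be `|name| std::env::var(name).ok()`; passing it as a closure keeps the function easy to test without touching the process environment.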

Important Files

src/main.rs - Application entry point
src/lib.rs - Library entry point with core Gateway struct and Python LiteLLM compatible exports
Cargo.toml - Dependencies and features (use --all-features for development)
Makefile - All development commands and workflows
config/gateway.yaml - Main configuration file

Binaries

gateway (default) - Main gateway server
google-gateway - Specialized Google API gateway

Features

The codebase uses Cargo features extensively:

Storage: postgres, sqlite, redis, s3
Monitoring: metrics, tracing
Advanced: vector-db, websockets, analytics, enterprise
Development: Use --all-features flag for full functionality

Database & Storage

Primary DB: PostgreSQL with Sea-ORM migrations
Cache: Redis for high-speed operations
File storage: S3-compatible object storage
Vector DB: Optional Qdrant integration for semantic caching

Testing Architecture

Unit tests live inline in each module's source file (#[cfg(test)])
Postman collections for API testing (tests/*.postman_collection.json)
Mock implementations for external services

Provider Tiers

Providers are split into two tiers based on whether they need custom Rust code.

Tier 1 — Catalog-only (zero code)

A provider belongs in Tier 1 when all of the following are true:

The remote API is OpenAI-compatible (/v1/chat/completions, standard request/response shape)
No custom request transformation is needed (no special headers, param filtering, or model-name mangling)
No custom streaming logic is needed (standard SSE with data: [DONE])
No provider-specific model metadata is required at runtime

How to add a Tier 1 provider: add a single def() entry in src/core/providers/registry/catalog.rs and a comment annotation in src/core/providers/mod.rs:

// in catalog.rs
def("myprovider", "My Provider", "https://api.myprovider.com/v1", "MYPROVIDER_API_KEY"),

// in mod.rs
// myprovider: Tier 1 -> registry/catalog.rs

No other files need to change. The factory in src/core/providers/factory/mod.rs automatically routes Tier 1 names through OpenAILikeProvider.
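
To make the catalog idea concrete, the sketch below models a def() entry as plain data and looks it up by name. It only mirrors the four arguments shown in the example above; the `ProviderDef` struct, its field names, `lookup`, and `chat_completions_url` are hypothetical and may not match catalog.rs or the factory.

```rust
/// One catalog row: everything a Tier 1 provider needs (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ProviderDef {
    pub name: &'static str,
    pub display_name: &'static str,
    pub base_url: &'static str,
    pub api_key_env: &'static str,
}

/// Mirrors the argument order of the def() call shown above.
pub const fn def(
    name: &'static str,
    display_name: &'static str,
    base_url: &'static str,
    api_key_env: &'static str,
) -> ProviderDef {
    ProviderDef { name, display_name, base_url, api_key_env }
}

pub static CATALOG: &[ProviderDef] = &[
    def("myprovider", "My Provider", "https://api.myprovider.com/v1", "MYPROVIDER_API_KEY"),
];

/// A factory could resolve a provider name to its catalog row like this.
pub fn lookup(name: &str) -> Option<&'static ProviderDef> {
    CATALOG.iter().find(|d| d.name == name)
}

/// OpenAI-compatible chat endpoint derived from the base URL.
pub fn chat_completions_url(def: &ProviderDef) -> String {
    format!("{}/chat/completions", def.base_url.trim_end_matches('/'))
}
```

Because a row is pure data, adding a provider changes no control flow, which is what keeps Tier 1 "zero code".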

Tier 2 — Code-based (custom implementation)

A provider requires Tier 2 treatment when any of the following apply:

Non-OpenAI request/response format (e.g., Anthropic, Gemini, Cohere, Bedrock)
Custom HTTP client with auth signing (e.g., AWS SigV4 for Bedrock, SageMaker)
Unique streaming protocol (e.g., non-SSE, multipart, proprietary framing)
Provider-specific model info or capability metadata
Special parameter handling (e.g., tool-call transformation, response_format mapping)
Rerank, embed, image-generation, or audio endpoints with diverging schemas
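
The two criteria lists amount to one rule: Tier 1 requires every Tier 1 condition to hold, and any single Tier 2 trigger forces Tier 2. A minimal sketch of that decision, using a hypothetical `ProviderProfile` checklist that does not exist in the codebase:

```rust
/// Checklist answers for a candidate provider (hypothetical type).
#[derive(Debug, Default, Clone, Copy)]
pub struct ProviderProfile {
    pub openai_compatible: bool,
    pub custom_request_transform: bool,
    pub custom_auth_signing: bool,
    pub custom_streaming: bool,
    pub runtime_model_metadata: bool,
    pub diverging_extra_endpoints: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Tier {
    /// Tier 1: a catalog entry is enough.
    CatalogOnly,
    /// Tier 2: needs a provider directory with custom code.
    CodeBased,
}

pub fn tier_for(p: &ProviderProfile) -> Tier {
    // Any single customization need rules out the catalog-only path.
    let needs_code = p.custom_request_transform
        || p.custom_auth_signing
        || p.custom_streaming
        || p.runtime_model_metadata
        || p.diverging_extra_endpoints;
    if p.openai_compatible && !needs_code {
        Tier::CatalogOnly
    } else {
        Tier::CodeBased
    }
}
```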

How to add a Tier 2 provider: create a directory under src/core/providers/<name>/ containing at minimum mod.rs, then add a variant to ProviderType and implement the relevant trait methods. Also add the pub mod <name>; declaration in src/core/providers/mod.rs (guarded by the appropriate feature flag).

Resolving half-migrated providers

If git status shows DU (deleted-by-us, unresolved) files under src/core/providers/:

Decide the tier using the criteria above.
Tier 1: delete the directory and add a catalog entry + mod.rs comment.
Tier 2: restore the directory (git checkout HEAD -- <path>) and complete the implementation, or add stub methods that return ProviderError::not_implemented.
Verify with cargo check --all-features, then re-run git status: no remaining DU entries means every path is resolved.
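
For a restored Tier 2 directory that is not finished yet, a stub can return a "not implemented" error so the crate still compiles. The sketch below uses stand-in definitions: this `ProviderError`, its `not_implemented` signature, and the `ChatProvider` trait are assumptions for illustration and may differ from the real types in src/core/providers/.

```rust
use std::fmt;

/// Stand-in for the crate's provider error type (assumed shape).
#[derive(Debug, PartialEq, Eq)]
pub enum ProviderError {
    NotImplemented { provider: &'static str, feature: &'static str },
}

impl ProviderError {
    /// Assumed constructor signature; check the real type before copying.
    pub fn not_implemented(provider: &'static str, feature: &'static str) -> Self {
        ProviderError::NotImplemented { provider, feature }
    }
}

impl fmt::Display for ProviderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProviderError::NotImplemented { provider, feature } => {
                write!(f, "{provider}: {feature} is not implemented")
            }
        }
    }
}

/// Hypothetical, heavily reduced provider trait.
pub trait ChatProvider {
    fn name(&self) -> &'static str;
    fn chat(&self, prompt: &str) -> Result<String, ProviderError>;
}

/// A restored provider whose methods are still stubs.
pub struct RestoredProvider;

impl ChatProvider for RestoredProvider {
    fn name(&self) -> &'static str {
        "restored"
    }

    fn chat(&self, _prompt: &str) -> Result<String, ProviderError> {
        // Compiles and fails loudly at call time instead of at build time.
        Err(ProviderError::not_implemented(self.name(), "chat"))
    }
}
```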

Common Development Patterns

Adding a Tier 1 provider: add a def() entry in src/core/providers/registry/catalog.rs
Adding a Tier 2 provider: create a provider directory in src/core/providers/<name>/
New API endpoints: add routes in src/server/routes/
Authentication: extend auth modules in src/auth/
Configuration: update models in src/config/models/
Monitoring: add metrics in respective modules
MCP servers: add server configs in src/core/mcp/config.rs
A2A agents: add agent configs in src/core/a2a/config.rs

Protocol Gateways

MCP Gateway (`src/core/mcp/`)

Model Context Protocol for connecting LLMs to external tools:

config.rs - Server configuration, authentication (Bearer, API Key, OAuth 2.0)
transport.rs - HTTP, SSE, WebSocket, stdio transports
protocol.rs - JSON-RPC 2.0 implementation
tools.rs - Tool definitions and invocation
server.rs - Individual server connection management
gateway.rs - Main gateway aggregating servers
permissions.rs - Fine-grained access control
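
For orientation, an MCP tool invocation is a JSON-RPC 2.0 request with method "tools/call" and a params object holding the tool name and its arguments. The sketch below builds such a request by hand with the standard library only; `tools_call_request` and `escape_json` are hypothetical helpers, and protocol.rs presumably uses proper serialization instead.

```rust
/// Minimal JSON string escaping for the tool name.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC 2.0 "tools/call" request.
/// `arguments_json` must already be a valid JSON object; it is inserted as-is.
pub fn tools_call_request(id: u64, tool: &str, arguments_json: &str) -> String {
    let name = escape_json(tool);
    format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{{"name":"{name}","arguments":{arguments_json}}}}}"#
    )
}
```

The `id` lets the caller match the eventual response to this request, which matters once several calls are in flight on one transport.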

A2A Protocol (`src/core/a2a/`)

Agent-to-Agent communication with multi-provider support:

config.rs - Agent configuration, provider types
message.rs - JSON-RPC 2.0 message format, task states
provider.rs - Provider adapters (LangGraph, Vertex AI, Azure, Bedrock, Pydantic AI)
registry.rs - Agent discovery and health monitoring
gateway.rs - Main gateway for agent management

Docker & Deployment

Docker build: make docker
Development stack: make docker-compose-dev
Production: make docker-compose
Kubernetes: make k8s-apply

Performance Characteristics

Throughput: 10,000+ requests/second
Latency: <10ms routing overhead
Memory: ~50MB base footprint
Architecture: Fully async, connection pooling, zero-copy where possible

Python LiteLLM Compatibility

This Rust implementation maintains API compatibility with the original Python LiteLLM:

Core completion API exposed via src/core/completion.rs
Helper functions: completion(), user_message(), system_message(), assistant_message()
Unified interface for 100+ providers with automatic routing

Agent / Multi-PR Rules

When using AI agents (Claude, Codex, Copilot) to create PRs:

Branch Rules

One issue → one branch → one PR. Never bundle unrelated fixes.
Always branch from latest main. Never fork from another feature branch.
Max 10 files / 500 lines per PR (excluding Cargo.lock, docs). Use scripts/guards/check_pr_scope.sh to verify.
Run overlap check before pushing: scripts/guards/check_pr_overlap.sh detects file conflicts with open PRs.
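
The scope limit can be expressed as a small check over the changed files. This sketch only illustrates the rule; the authoritative check is scripts/guards/check_pr_scope.sh. `FileChange` and `check_scope` are hypothetical, and reading "docs" as paths under docs/ plus Markdown files is an assumption.

```rust
pub const MAX_FILES: usize = 10;
pub const MAX_LINES: usize = 500;

/// One changed file in a PR (hypothetical type).
pub struct FileChange<'a> {
    pub path: &'a str,
    pub lines_changed: usize,
}

/// Assumption: "docs" means anything under docs/ plus Markdown files.
fn is_excluded(path: &str) -> bool {
    path == "Cargo.lock" || path.starts_with("docs/") || path.ends_with(".md")
}

/// Enforce "max 10 files / 500 lines", ignoring excluded paths.
pub fn check_scope(changes: &[FileChange<'_>]) -> Result<(), String> {
    let counted: Vec<&FileChange<'_>> =
        changes.iter().filter(|c| !is_excluded(c.path)).collect();
    let files = counted.len();
    let lines: usize = counted.iter().map(|c| c.lines_changed).sum();
    if files > MAX_FILES {
        Err(format!("{files} files changed, limit is {MAX_FILES}"))
    } else if lines > MAX_LINES {
        Err(format!("{lines} lines changed, limit is {MAX_LINES}"))
    } else {
        Ok(())
    }
}
```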

Agent Isolation

Parallel agents must use git worktree for isolation:

git worktree add /tmp/agent-task-{id} -b fix/issue-{id} main

Two agents must never modify the same file concurrently.

Before Creating PR

cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-features
bash scripts/guards/check_pr_scope.sh
bash scripts/guards/check_pr_overlap.sh

Toolchain

Rust version is pinned in rust-toolchain.toml. CI uses the same version.
Never use @stable in CI — always reference the pinned version.

Known Issues & Solutions

docs.rs Build Issue

The vector-db feature (which includes qdrant-client) fails to build on docs.rs due to its read-only filesystem. The qdrant-client build script attempts to write files during compilation.

Solution:

The [package.metadata.docs.rs] section in Cargo.toml:
- Sets all-features = false to prevent docs.rs from using --all-features
- Lists the specific features to build, excluding vector-db

With these settings, documentation builds successfully on docs.rs.

Testing docs.rs compatibility locally:

env DOCS_RS=1 cargo doc --no-deps --features "postgres sqlite redis s3 metrics tracing websockets analytics"
