
LiteLLM-RS Documentation

A high-performance AI Gateway written in Rust that provides unified access to 100+ AI providers through OpenAI-compatible APIs.

📚 Documentation Structure

Architecture & Design

Implementation Guides

Provider Documentation

Protocol Gateways

Examples & Tutorials

🚀 Quick Start

```rust
use litellm_rs::{completion, user_message, system_message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let response = completion(
        "gpt-4",
        vec![
            system_message("You are a helpful assistant."),
            user_message("Hello, how are you?"),
        ],
        None,
    ).await?;

    println!("Response: {}", response.choices[0].message.content);
    Ok(())
}
```

🏗️ Architecture Highlights

  • High Performance: Built with Rust and Tokio for maximum throughput (10,000+ req/s)
  • OpenAI Compatible: Drop-in replacement for OpenAI API
  • 100+ Providers: Unified interface to all major AI providers
  • Intelligent Routing: Smart load balancing and failover
  • Enterprise Ready: Authentication, monitoring, cost tracking
  • Type Safety: Compile-time guarantees and zero-cost abstractions
  • MCP Gateway: Model Context Protocol for external tool integration
  • A2A Protocol: Agent-to-Agent communication with multi-provider support

📊 Performance Benchmarks

Real benchmark results from our unified router (run with cargo bench):

Single Operation Performance

| Operation | Time | Description |
|---|---|---|
| Router Creation | 39.4 ns | Create empty router instance |
| Add Deployment | 1.04 µs | Insert single deployment |
| Alias Resolution | 31.9 ns | Model name alias lookup |
| Record Success | 47.3 ns | Atomic counter update (lock-free) |
| Record Failure | 65.5 ns | Atomic failure counter update |

Routing Strategy Performance (10 deployments)

| Strategy | Time | Use Case |
|---|---|---|
| RoundRobin | 1.24 µs | Equal distribution |
| LatencyBased | 1.81 µs | Lowest latency first |
| SimpleShuffle | 1.85 µs | Random selection |
| LeastBusy | 2.04 µs | Fewest active requests |

Get Healthy Deployments (by count)

| Deployments | Time | Throughput |
|---|---|---|
| 1 | 130 ns | ~7.7M ops/s |
| 5 | 388 ns | ~2.6M ops/s |
| 10 | 694 ns | ~1.4M ops/s |
| 50 | 3.2 µs | ~312K ops/s |
| 100 | 6.3 µs | ~159K ops/s |

Concurrent Performance (lock-free operations)

| Concurrent Tasks | Time | Throughput |
|---|---|---|
| 10 | 37.3 µs | ~268K ops/s |
| 50 | 97.7 µs | ~512K ops/s |
| 100 | 172 µs | ~581K ops/s |
| 500 | 721 µs | ~693K ops/s |

Key Performance Characteristics

  • Lock-free design: Uses DashMap and atomic operations for zero-lock concurrent access
  • Static dispatch: Provider enum avoids vtable overhead
  • Nanosecond-level atomic ops: Record success/failure in ~50ns
  • Scales with concurrency: Aggregate throughput rises from ~268K to ~693K ops/s as concurrent tasks grow from 10 to 500
  • Sub-microsecond routing: Most strategies complete under 2µs
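
The ~50 ns success/failure recording is possible because health tracking is a single atomic increment, not a mutex-guarded update. A minimal sketch of the pattern, assuming hypothetical names (`DeploymentStats` is illustrative, not the crate's actual type):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative stand-in for per-deployment health counters; the real
// router keeps structures like this in a concurrent map (e.g. DashMap).
struct DeploymentStats {
    successes: AtomicU64,
    failures: AtomicU64,
}

impl DeploymentStats {
    fn new() -> Self {
        Self { successes: AtomicU64::new(0), failures: AtomicU64::new(0) }
    }

    // Lock-free: one atomic add, safe to call from many threads at once.
    fn record_success(&self) {
        self.successes.fetch_add(1, Ordering::Relaxed);
    }

    fn record_failure(&self) {
        self.failures.fetch_add(1, Ordering::Relaxed);
    }

    fn success_rate(&self) -> f64 {
        let ok = self.successes.load(Ordering::Relaxed) as f64;
        let err = self.failures.load(Ordering::Relaxed) as f64;
        if ok + err == 0.0 { 1.0 } else { ok / (ok + err) }
    }
}

fn main() {
    let stats = DeploymentStats::new();
    for _ in 0..3 { stats.record_success(); }
    stats.record_failure();
    println!("{}", stats.success_rate()); // 3 successes, 1 failure -> 0.75
}
```

Because no lock is held, many Tokio tasks can record outcomes concurrently without contention, which is what the concurrent benchmark above measures.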

Running Benchmarks

```bash
# Run all benchmarks
cargo bench

# Run specific benchmark groups
cargo bench -- unified_router      # Router operations
cargo bench -- concurrent_router   # Concurrent performance
cargo bench -- cache_operations    # Cache benchmarks

# Skip plot generation for faster runs
cargo bench -- --noplot
```

Benchmark results are generated using Criterion.rs and saved to target/criterion/.

📖 Key Concepts

Provider System

LiteLLM-RS uses a trait-based provider system that ensures consistency across all AI providers while allowing for provider-specific optimizations.
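
A sketch of how the two ideas mentioned in this document fit together: a shared trait for consistency, plus an enum for static dispatch (avoiding vtable overhead). All names here are illustrative assumptions, not the crate's actual API:

```rust
// Hypothetical provider trait: every backend answers the same questions.
trait ChatProvider {
    fn endpoint(&self) -> &'static str;
}

struct OpenAi;
struct Anthropic;

impl ChatProvider for OpenAi {
    fn endpoint(&self) -> &'static str { "https://api.openai.com/v1/chat/completions" }
}
impl ChatProvider for Anthropic {
    fn endpoint(&self) -> &'static str { "https://api.anthropic.com/v1/messages" }
}

// Enum dispatch: each match arm resolves to a direct (monomorphized)
// call, so there is no dynamic-dispatch vtable lookup at runtime.
enum Provider {
    OpenAi(OpenAi),
    Anthropic(Anthropic),
}

impl Provider {
    fn endpoint(&self) -> &'static str {
        match self {
            Provider::OpenAi(p) => p.endpoint(),
            Provider::Anthropic(p) => p.endpoint(),
        }
    }
}

fn main() {
    let p = Provider::Anthropic(Anthropic);
    println!("{}", p.endpoint());
}
```

The trade-off of enum dispatch is that adding a provider means touching the enum, but in exchange calls are statically resolved and the compiler can inline them.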

Routing Engine

Sophisticated routing with multiple strategies:

  • Round Robin
  • Least Latency
  • Cost Optimized
  • Health-Based
  • Custom Weighted
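
To make the strategy idea concrete, here is a minimal round-robin selector in the lock-free style the benchmarks describe. This is a sketch under assumed names (`RoundRobin`, `pick`), not the router's real types:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative round-robin strategy: rotate through healthy deployments
// using a single atomic counter instead of a locked cursor.
struct RoundRobin {
    next: AtomicUsize,
}

impl RoundRobin {
    fn new() -> Self { Self { next: AtomicUsize::new(0) } }

    // Pick the next deployment in rotation; fetch_add keeps this lock-free.
    fn pick<'a>(&self, deployments: &'a [&'a str]) -> Option<&'a str> {
        if deployments.is_empty() {
            return None;
        }
        let i = self.next.fetch_add(1, Ordering::Relaxed) % deployments.len();
        Some(deployments[i])
    }
}

fn main() {
    let rr = RoundRobin::new();
    let pool = ["gpt-4-east", "gpt-4-west", "gpt-4-eu"];
    for _ in 0..4 {
        println!("{}", rr.pick(&pool).unwrap());
    }
}
```

The other strategies differ only in the selection step: latency-based picks the deployment with the lowest recorded latency, least-busy the one with the fewest in-flight requests, and so on.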

Unified Error Handling

All provider-specific errors are mapped to a unified error system for consistent error handling across the entire system.
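
The shape of such a mapping might look like the following sketch. The variant names and the `map_openai_status` helper are assumptions for illustration, not the crate's actual error type:

```rust
use std::fmt;

// Hypothetical unified error: callers match on these variants regardless
// of which provider produced the underlying failure.
#[derive(Debug, PartialEq)]
enum GatewayError {
    RateLimited { provider: String },
    AuthFailed { provider: String },
    Upstream { provider: String, status: u16 },
}

impl fmt::Display for GatewayError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GatewayError::RateLimited { provider } => write!(f, "{provider}: rate limited"),
            GatewayError::AuthFailed { provider } => write!(f, "{provider}: auth failed"),
            GatewayError::Upstream { provider, status } => {
                write!(f, "{provider}: upstream error {status}")
            }
        }
    }
}

// One mapping function per provider keeps callers provider-agnostic:
// retry logic can match on RateLimited without knowing about HTTP 429.
fn map_openai_status(status: u16) -> GatewayError {
    match status {
        401 => GatewayError::AuthFailed { provider: "openai".into() },
        429 => GatewayError::RateLimited { provider: "openai".into() },
        s => GatewayError::Upstream { provider: "openai".into(), status: s },
    }
}

fn main() {
    println!("{}", map_openai_status(429));
}
```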

🛠️ Development

Prerequisites

  • Rust 1.70+
  • PostgreSQL (optional)
  • Redis (optional)

Essential Commands

```bash
# Development
make dev                     # Start development server
cargo test --all-features    # Run tests
cargo clippy --all-features  # Lint code

# Production
make build            # Build release binary
make docker           # Build Docker image
```

🤝 Contributing

  1. Read the Provider Implementation Guide
  2. Check existing issues
  3. Follow the development setup
  4. Submit PRs with tests and documentation

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.