
LiteLLM-RS Documentation

A high-performance AI Gateway written in Rust that provides unified access to 100+ AI providers through OpenAI-compatible APIs.

📚 Documentation Structure

Architecture & Design

Implementation Guides

Provider Documentation

Protocol Gateways

Examples & Tutorials

🚀 Quick Start

```rust
use litellm_rs::{completion, user_message, system_message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let response = completion(
        "gpt-4",
        vec![
            system_message("You are a helpful assistant."),
            user_message("Hello, how are you?"),
        ],
        None,
    ).await?;

    println!("Response: {}", response.choices[0].message.content);
    Ok(())
}
```

🏗️ Architecture Highlights

  • High Performance: Built with Rust and Tokio for maximum throughput (10,000+ req/s)
  • OpenAI Compatible: Drop-in replacement for OpenAI API
  • 100+ Providers: Unified interface to all major AI providers
  • Intelligent Routing: Smart load balancing and failover
  • Enterprise Ready: Authentication, monitoring, cost tracking
  • Type Safety: Compile-time guarantees and zero-cost abstractions
  • MCP Gateway: Model Context Protocol for external tool integration
  • A2A Protocol: Agent-to-Agent communication with multi-provider support

📊 Performance Benchmarks

Real benchmark results from our unified router (run with cargo bench):

Single Operation Performance

| Operation | Time | Description |
|---|---|---|
| Router Creation | 39.4 ns | Create empty router instance |
| Add Deployment | 1.04 µs | Insert single deployment |
| Alias Resolution | 31.9 ns | Model name alias lookup |
| Record Success | 47.3 ns | Atomic counter update (lock-free) |
| Record Failure | 65.5 ns | Atomic failure counter update |

Routing Strategy Performance (10 deployments)

| Strategy | Time | Use Case |
|---|---|---|
| RoundRobin | 1.24 µs | Equal distribution |
| LatencyBased | 1.81 µs | Lowest latency first |
| SimpleShuffle | 1.85 µs | Random selection |
| LeastBusy | 2.04 µs | Fewest active requests |

Get Healthy Deployments (by count)

| Deployments | Time | Throughput |
|---|---|---|
| 1 | 130 ns | ~7.7M ops/s |
| 5 | 388 ns | ~2.6M ops/s |
| 10 | 694 ns | ~1.4M ops/s |
| 50 | 3.2 µs | ~312K ops/s |
| 100 | 6.3 µs | ~159K ops/s |

Concurrent Performance (lock-free operations)

| Concurrent Tasks | Time | Throughput |
|---|---|---|
| 10 | 37.3 µs | ~268K ops/s |
| 50 | 97.7 µs | ~512K ops/s |
| 100 | 172 µs | ~581K ops/s |
| 500 | 721 µs | ~693K ops/s |

Key Performance Characteristics

  • Lock-free design: Uses DashMap and atomic operations for zero-lock concurrent access
  • Static dispatch: Provider enum avoids vtable overhead
  • Nanosecond-level atomic ops: Record success/failure in ~50ns
  • Scales with concurrency: Aggregate throughput rises from ~268K to ~693K ops/s as concurrent tasks grow from 10 to 500
  • Sub-microsecond routing: Most strategies complete under 2µs
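
The ~50 ns success/failure recording is possible because health tracking is a single atomic increment, not a mutex-guarded update. A minimal sketch of the pattern, assuming hypothetical names (`DeploymentStats` is illustrative, not the crate's actual type):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative stand-in for per-deployment health counters; the real
// router keeps structures like this in a concurrent map (e.g. DashMap).
struct DeploymentStats {
    successes: AtomicU64,
    failures: AtomicU64,
}

impl DeploymentStats {
    fn new() -> Self {
        Self { successes: AtomicU64::new(0), failures: AtomicU64::new(0) }
    }

    // Lock-free: one atomic add, safe to call from many threads at once.
    fn record_success(&self) {
        self.successes.fetch_add(1, Ordering::Relaxed);
    }

    fn record_failure(&self) {
        self.failures.fetch_add(1, Ordering::Relaxed);
    }

    fn success_rate(&self) -> f64 {
        let ok = self.successes.load(Ordering::Relaxed) as f64;
        let err = self.failures.load(Ordering::Relaxed) as f64;
        if ok + err == 0.0 { 1.0 } else { ok / (ok + err) }
    }
}

fn main() {
    let stats = DeploymentStats::new();
    for _ in 0..3 { stats.record_success(); }
    stats.record_failure();
    println!("{}", stats.success_rate()); // 3 successes, 1 failure -> 0.75
}
```

Because no lock is held, many Tokio tasks can record outcomes concurrently without contention, which is what the concurrent benchmark above measures.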

Running Benchmarks

```bash
# Run all benchmarks
cargo bench

# Run specific benchmark groups
cargo bench -- unified_router      # Router operations
cargo bench -- concurrent_router   # Concurrent performance
cargo bench -- cache_operations    # Cache benchmarks

# Skip plot generation for faster runs
cargo bench -- --noplot
```

Benchmark results are generated using Criterion.rs and saved to target/criterion/.

📖 Key Concepts

Provider System

LiteLLM-RS uses a trait-based provider system that ensures consistency across all AI providers while allowing for provider-specific optimizations.
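
A sketch of how the two ideas mentioned in this document fit together: a shared trait for consistency, plus an enum for static dispatch (avoiding vtable overhead). All names here are illustrative assumptions, not the crate's actual API:

```rust
// Hypothetical provider trait: every backend answers the same questions.
trait ChatProvider {
    fn endpoint(&self) -> &'static str;
}

struct OpenAi;
struct Anthropic;

impl ChatProvider for OpenAi {
    fn endpoint(&self) -> &'static str { "https://api.openai.com/v1/chat/completions" }
}
impl ChatProvider for Anthropic {
    fn endpoint(&self) -> &'static str { "https://api.anthropic.com/v1/messages" }
}

// Enum dispatch: each match arm resolves to a direct (monomorphized)
// call, so there is no dynamic-dispatch vtable lookup at runtime.
enum Provider {
    OpenAi(OpenAi),
    Anthropic(Anthropic),
}

impl Provider {
    fn endpoint(&self) -> &'static str {
        match self {
            Provider::OpenAi(p) => p.endpoint(),
            Provider::Anthropic(p) => p.endpoint(),
        }
    }
}

fn main() {
    let p = Provider::Anthropic(Anthropic);
    println!("{}", p.endpoint());
}
```

The trade-off of enum dispatch is that adding a provider means touching the enum, but in exchange calls are statically resolved and the compiler can inline them.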

Routing Engine

Sophisticated routing with multiple strategies:

  • Round Robin
  • Least Latency
  • Cost Optimized
  • Health-Based
  • Custom Weighted
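
To make the strategy idea concrete, here is a minimal round-robin selector in the lock-free style the benchmarks describe. This is a sketch under assumed names (`RoundRobin`, `pick`), not the router's real types:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative round-robin strategy: rotate through healthy deployments
// using a single atomic counter instead of a locked cursor.
struct RoundRobin {
    next: AtomicUsize,
}

impl RoundRobin {
    fn new() -> Self { Self { next: AtomicUsize::new(0) } }

    // Pick the next deployment in rotation; fetch_add keeps this lock-free.
    fn pick<'a>(&self, deployments: &'a [&'a str]) -> Option<&'a str> {
        if deployments.is_empty() {
            return None;
        }
        let i = self.next.fetch_add(1, Ordering::Relaxed) % deployments.len();
        Some(deployments[i])
    }
}

fn main() {
    let rr = RoundRobin::new();
    let pool = ["gpt-4-east", "gpt-4-west", "gpt-4-eu"];
    for _ in 0..4 {
        println!("{}", rr.pick(&pool).unwrap());
    }
}
```

The other strategies differ only in the selection step: latency-based picks the deployment with the lowest recorded latency, least-busy the one with the fewest in-flight requests, and so on.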

Unified Error Handling

All provider-specific errors are mapped to a unified error system for consistent error handling across the entire system.
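
The shape of such a mapping might look like the following sketch. The variant names and the `map_openai_status` helper are assumptions for illustration, not the crate's actual error type:

```rust
use std::fmt;

// Hypothetical unified error: callers match on these variants regardless
// of which provider produced the underlying failure.
#[derive(Debug, PartialEq)]
enum GatewayError {
    RateLimited { provider: String },
    AuthFailed { provider: String },
    Upstream { provider: String, status: u16 },
}

impl fmt::Display for GatewayError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            GatewayError::RateLimited { provider } => write!(f, "{provider}: rate limited"),
            GatewayError::AuthFailed { provider } => write!(f, "{provider}: auth failed"),
            GatewayError::Upstream { provider, status } => {
                write!(f, "{provider}: upstream error {status}")
            }
        }
    }
}

// One mapping function per provider keeps callers provider-agnostic:
// retry logic can match on RateLimited without knowing about HTTP 429.
fn map_openai_status(status: u16) -> GatewayError {
    match status {
        401 => GatewayError::AuthFailed { provider: "openai".into() },
        429 => GatewayError::RateLimited { provider: "openai".into() },
        s => GatewayError::Upstream { provider: "openai".into(), status: s },
    }
}

fn main() {
    println!("{}", map_openai_status(429));
}
```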

🛠️ Development

Prerequisites

  • Rust 1.70+
  • PostgreSQL (optional)
  • Redis (optional)

Essential Commands

```bash
# Development
make dev                     # Start development server
cargo test --all-features    # Run tests
cargo clippy --all-features  # Lint code

# Production
make build            # Build release binary
make docker           # Build Docker image
```

🤝 Contributing

  1. Read the Provider Implementation Guide
  2. Check existing issues
  3. Follow the development setup
  4. Submit PRs with tests and documentation

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.