# tgi

Here are 11 public repositories matching this topic...

Bench360 is a modular benchmarking suite for local LLM deployments. It offers a full-stack, extensible pipeline to evaluate the latency, throughput, quality, and cost of LLM inference on consumer and enterprise GPUs. Bench360 supports flexible backends, tasks, and scenarios, enabling fair and reproducible comparisons for researchers and practitioners.

  • Updated Feb 18, 2026
  • Python
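The core of any such benchmark is timing inference calls and normalizing by token count. The sketch below is not Bench360's actual API (which is not shown on this page); it is a minimal illustration, assuming a backend exposed as a callable that returns the number of tokens it generated:

```python
import time

def measure(generate, prompts):
    """Time each call and report mean latency (s) and throughput (tokens/s).

    `generate` is a stand-in for any backend's inference call and must
    return the number of tokens produced for a prompt. A real suite
    would also track quality and cost; this covers only the two core
    speed metrics.
    """
    latencies, total_tokens = [], 0
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        total_tokens += generate(prompt)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "mean_latency_s": sum(latencies) / len(latencies),
        "throughput_tok_s": total_tokens / elapsed,
    }

# Stub backend: pretend every prompt yields 32 tokens.
stats = measure(lambda p: 32, ["hello", "world"])
```

Swapping the stub for a call into vLLM, TGI, or llama.cpp is what makes the comparison backend-agnostic: the harness only sees a callable and a token count.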

Self-hosted FastAPI gateway exposing the OpenAI and Anthropic Messages APIs in front of any open-source LLM runtime (vLLM, Ollama, llama.cpp, TGI, SGLang, LocalAI, LM Studio). Supports streaming, embeddings, metrics, authentication, and rate limiting.

  • Updated Apr 22, 2026
  • Python
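Because the gateway speaks the OpenAI wire format, any OpenAI-compatible client can talk to it by changing the base URL. The snippet below builds (but does not send) such a request with only the standard library; the gateway address, model name, and API key are hypothetical placeholders, not values documented by this project:

```python
import json
import urllib.request

# Hypothetical local gateway address and model name; substitute your own.
GATEWAY_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "llama-3-8b-instruct",  # whatever model the backing runtime serves
    "messages": [{"role": "user", "content": "Say hello."}],
    "stream": False,  # set True to receive server-sent-event chunks
}

# Construct an OpenAI-style chat-completion POST; calling
# urllib.request.urlopen(req) would dispatch it to the gateway.
req = urllib.request.Request(
    GATEWAY_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # if the gateway enforces auth
    },
    method="POST",
)
```

The same payload works unchanged against OpenAI itself, which is the point of such a gateway: client code stays portable while the runtime behind it is swapped freely.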
