agent-coderag

The API Knowledge Bridge for AI Coding Agents.
Local, fast, and token-efficient semantic search that eliminates LLM hallucinations by providing real-time local context.


Why agent-coderag?

In 2026, AI coding agents are limited by stale training data. They hallucinate library calls because they don't know your specific environment.

The Pain: Your agent writes code for Pydantic v1 while you have v2 installed. You waste 5000+ tokens in a "Fail-Fix-Fail" loop.
The Cure: agent-coderag extracts live API signatures and technical intent from your local environment. It feeds the LLM exactly what it needs to see—no more, no less.

Key Features

Instant Startup: Built on onnxruntime and Rust-based tokenizers. Zero PyTorch overhead.
Context Compression: Replace 10,000 lines of raw code with a 200-token semantic summary.
Universal Tree-Sitter Parser: Supports 25+ languages (Python, JS/TS, Rust, Java, C++, Go, Ruby, etc.) with high precision.
API Discovery: On-the-fly extraction of public signatures for 6 core ecosystems (Python, Java, Go, TypeScript, Rust, C#) with build-system awareness.
Local First: All embeddings and data stay on your machine in a high-performance DuckDB VSS index.

Quick Start

Installation

pip install agent-coderag
# Install tree-sitter grammars for your languages on-demand
pip install tree-sitter-python tree-sitter-javascript

Initial Setup

# Download pre-trained multilingual embedding models (~130MB)
agent-coderag setup

# (Optional) Connect your preferred LLM for semantic distillation
# Using Ollama (Local)
agent-coderag config --url "http://localhost:11434" --provider "ollama" --model "qwen2.5-coder"

# Using OpenAI-compatible API (e.g. Groq, OpenRouter, DeepSeek)
agent-coderag config --url "https://api.deepseek.com" --key "your-api-key" --model "deepseek-chat"

Offline Mode (No Provider)

If you don't configure an LLM provider, agent-coderag works in 100% Offline Mode:

Parsing & API Discovery: Both keep working, using local Tree-Sitter grammars and javap.
Search: Unaffected, because embeddings are computed and stored locally.
Distillation: Instead of AI-generated summaries, the system uses code signatures and entity names as fallback metadata. No data ever leaves your machine.
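
To illustrate the fallback described above, here is a minimal sketch that builds metadata from entity names and signatures alone. It uses Python's standard `ast` module as a stand-in for Tree-Sitter, and the `fallback_metadata` helper and its output format are hypothetical, not agent-coderag's actual implementation.

```python
import ast


def fallback_metadata(source: str) -> list[str]:
    """Return one descriptive line per class or function found in source."""
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Positional parameter names are enough for a signature-level summary.
            args = ", ".join(a.arg for a in node.args.args)
            entries.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            entries.append(f"class {node.name}")
    return entries


# Hypothetical sample input, used only for illustration.
SAMPLE = '''
class AuthMiddleware:
    def authenticate(self, request, token):
        return token == request.expected
'''

for line in fallback_metadata(SAMPLE):
    print(line)
```

Metadata of this kind carries no description of intent, so search quality in offline mode depends on how descriptive the entity names are.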

First Sync & Search

# Index your entire project (respects .gitignore automatically)
agent-coderag sync --all

# Perform a semantic search
agent-coderag search "how does the authentication middleware work?"

API Discovery

Verify external library signatures without leaving the CLI:

# Explicit language selection (Recommended for multi-language repos)
agent-coderag api requests --lang python
agent-coderag api lodash --lang typescript
agent-coderag api serde --lang rust

# Built-in auto-detection for common project types (Cargo.toml, package.json, etc.)
agent-coderag api fmt
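
Auto-detection of this kind is commonly done by looking for build-system marker files. The sketch below shows the idea under stated assumptions: `detect_language` and the `MARKERS` table are hypothetical, and only Cargo.toml, package.json, and pom.xml are named in this README.

```python
from pathlib import Path
from typing import Optional

# Marker file -> ecosystem. Entries beyond Cargo.toml, package.json and
# pom.xml are illustrative assumptions, not confirmed tool behavior.
MARKERS = {
    "Cargo.toml": "rust",
    "package.json": "typescript",
    "pom.xml": "java",
    "go.mod": "go",
    "pyproject.toml": "python",
}


def detect_language(project_root: str) -> Optional[str]:
    """Return the first ecosystem whose marker file exists, else None."""
    root = Path(project_root)
    for marker, lang in MARKERS.items():
        if (root / marker).is_file():
            return lang
    return None
```

A repository can contain several marker files at once, which is one reason explicit `--lang` selection is recommended for multi-language repos.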

Supported Ecosystems (Discovery)

Language	Method	Discovery Source
Python	3-Stage Probe	`.pyi` stubs, static source, or runtime `inspect`
Java	Bytecode Reflection	JARs resolved via Maven (`pom.xml`) or Gradle
Go	Standard Tooling	Native `go doc -all` integration
TypeScript/JS	Declaration Maps	`.d.ts` files from `node_modules` or `@types`
Rust	Registry Analysis	Source code from Cargo registry via `cargo metadata`
C#	Assembly Metadata	DLL metadata via `dnfile` and XML documentation
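
As an illustration of the runtime `inspect` stage listed for Python, the following sketch collects public signatures from an installed module using only the standard library. The `public_signatures` helper is hypothetical, and the real probe may order or filter results differently.

```python
import importlib
import inspect


def public_signatures(module_name: str) -> dict[str, str]:
    """Map each public callable in a module to its signature string."""
    module = importlib.import_module(module_name)
    # Prefer the module's declared public API; otherwise skip underscore names.
    names = getattr(module, "__all__", None) or [
        n for n in dir(module) if not n.startswith("_")
    ]
    signatures = {}
    for name in names:
        obj = getattr(module, name, None)
        if not callable(obj):
            continue
        try:
            signatures[name] = str(inspect.signature(obj))
        except (TypeError, ValueError):
            # Some C-implemented callables expose no introspectable signature.
            signatures[name] = "(...)"
    return signatures


for name, sig in sorted(public_signatures("json").items()):
    print(f"{name}{sig}")
```

Because this imports the module, it reflects the version actually installed in the environment, at the cost of executing the module's import-time code.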

How It Works

agent-coderag creates a semantic map of your codebase using a multi-stage pipeline:

graph LR
    Code[Local Codebase] --> Parser[Multi-Language Parser]
    Parser --> Delta[Delta-Sync SHA-256]
    Delta -- New/Changed --> Distill[LLM Distiller]
    Delta -- Unchanged --> Cache[Local Cache]
    Distill --> Embed[ONNX Embedder]
    Cache --> Embed
    Embed --> DuckDB[(DuckDB VSS)]
    DuckDB --> Agent[AI Agent Response]
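
The Delta-Sync step in the diagram can be sketched as a hash comparison. This is a minimal illustration, assuming one SHA-256 digest per code unit and a simple id-to-hash cache; the `split_delta` helper, the unit ids, and the cache layout are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return the SHA-256 hex digest of a code unit's source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def split_delta(units: dict[str, str], cache: dict[str, str]):
    """Partition unit ids into (changed, unchanged).

    units maps a unit id to its current source text;
    cache maps a unit id to the hash recorded at the previous sync.
    """
    changed, unchanged = [], []
    for unit_id, text in units.items():
        if cache.get(unit_id) == content_hash(text):
            unchanged.append(unit_id)
        else:
            # New units (missing from the cache) also land here.
            changed.append(unit_id)
    return changed, unchanged


# Hypothetical example: one unit is already cached, one is new.
cache = {"a.py::f": content_hash("def f(): pass")}
units = {"a.py::f": "def f(): pass", "b.py::g": "def g(): return 1"}
changed, unchanged = split_delta(units, cache)
print("distill:", changed)
print("reuse cache:", unchanged)
```

Only the changed units would need to pass through the LLM distiller, which is where the token savings on repeated syncs come from.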

Structural Parsing: Identifies classes, methods, and relations (imports).
Technical Distillation: Generates a concise "intent summary" of each code unit.
Vectorization: Local ONNX model creates 384-dimensional embeddings.
VSS Storage: DuckDB enables sub-millisecond similarity search.
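
To make the similarity-search stage concrete, here is a brute-force cosine-similarity ranking in plain Python. It is a sketch of the underlying idea only: DuckDB VSS uses an index rather than a linear scan, the vectors here are 3-dimensional rather than 384-dimensional for readability, and `top_k` and the sample index are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k=1):
    """Return the k (unit_id, score) pairs most similar to query."""
    scored = [(uid, cosine_similarity(query, vec)) for uid, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy index: unit id -> embedding.
INDEX = {
    "auth.middleware": [0.9, 0.1, 0.0],
    "db.init": [0.0, 0.2, 0.9],
}

print(top_k([1.0, 0.0, 0.0], INDEX))
```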

Agent-Native Usage

agent-coderag is designed to be the primary tool for your AI agents.

The Protocol:

Search First: Instead of reading files, the agent runs agent-coderag --json search.
Verify Signatures: The agent runs agent-coderag api to get real signatures.
Read Summaries: The agent uses the summary field to decide which files are actually relevant.

Programmatic Output:

agent-coderag --json search "database init" --limit 1
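
An agent harness could wrap the command above roughly as follows. This is a sketch under assumptions: the README confirms only the `--json` flag, the `--limit` option, and a `summary` field, so the list-of-objects payload shape and the helper names are hypothetical.

```python
import json
import subprocess


def build_search_command(query: str, limit: int = 1) -> list[str]:
    """Build the argument list mirroring the CLI invocation in this README."""
    return ["agent-coderag", "--json", "search", query, "--limit", str(limit)]


def run_search(query: str, limit: int = 1):
    """Run the CLI and decode its JSON output (requires agent-coderag installed)."""
    proc = subprocess.run(
        build_search_command(query, limit),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


def summaries(payload) -> list[str]:
    """Collect 'summary' fields, assuming the payload is a list of objects."""
    return [item["summary"] for item in payload if "summary" in item]
```

Passing the command as an argument list rather than a shell string avoids quoting problems when the query contains spaces or punctuation.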

Development & Testing

We maintain a strict quality bar.

# Install development dependencies
make install

# Run full test suite with coverage
make test

# Run linters (Prospector, MyPy, Bandit)
make lint

Contributing

Contributions make the open source community an amazing place to learn, inspire, and create.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'feat: add AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Built for agents. Driven by humans.
