AgentBond AI — Multi-Agent Investigator Engine

title	AgentBond API
colorFrom	indigo
colorTo	purple
sdk	docker
app_port	7860
pinned	false


A production-grade, full-stack multi-agent system that accepts an open-ended problem statement, decomposes it into structured hypotheses, investigates each hypothesis against live web data, and verifies the findings for hallucination and context drift.

The investigation use-case is the demonstration surface. The architecture beneath it is a general-purpose, asynchronous agent orchestration runtime.

Live Frontend: agent-bond-ai.vercel.app
Live Backend API: karan6124-agentbond-api.hf.space
API Documentation: karan6124-agentbond-api.hf.space/docs

1. What This Is

AgentBond AI is not a chatbot wrapper. It is an agent orchestration system with the following properties:

Structured inter-agent communication via a shared context store that is authoritative for the entire investigation session.
Asynchronous task execution using Celery workers backed by Upstash Redis as the message broker in production.
A Verifier Agent that independently scores each piece of investigator output for hallucination and context alignment before the result is committed.
Full observability via Prometheus multi-process metrics exported from the FastAPI backend and visualized in Grafana dashboards.
Google OAuth 2.0 authentication with a JWT-secured API, supporting both email/password registration and Google Sign-In.
A React frontend deployed to Vercel, communicating securely with the backend on Hugging Face Spaces via a Vite environment variable.

2. System Architecture

Browser (Vercel — agent-bond-ai.vercel.app)
    |
    | HTTPS REST + JWT
    |
FastAPI API Gateway (Hugging Face Spaces — karan6124-agentbond-api.hf.space)
    |
    |--- Google OAuth 2.0 (accounts.google.com)
    |--- JWT Authentication Middleware
    |
Case Manager Agent  (Google Gemini 2.5 Flash)
    |
    | Celery Task Dispatch
    |
Upstash Redis (Message Broker + Result Backend)
    |
Investigator Agents  (parallel Celery workers)
    |
Shared Context Store  (PostgreSQL via Neon — per-case authoritative state)
    |
Verifier Agent  (hallucination and context drift scoring)
    |
Prometheus Metrics Exporter  (/metrics/ endpoint)
    |
Grafana Cloud  (remote scraping of production HF Space over HTTPS)

3. Agent Roles

Case Manager Agent

Receives the raw problem statement from the user and decomposes it into a structured set of investigable hypotheses. Each hypothesis represents a discrete, testable claim that downstream investigators can act on.

Input:

"Why are Nvidia's stock gains decelerating in Q2 2025?"

Output:

```
{
  "hypotheses": [
    "Increased competition from AMD and Intel in the data center GPU segment",
    "Slowdown in hyperscaler capital expenditure growth",
    "Export control restrictions limiting China revenue",
    "Market saturation in the consumer GPU segment"
  ]
}
```
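
A minimal sketch of how a caller might validate this JSON before dispatching investigators. The helper name `parse_hypotheses` is hypothetical and not taken from the repository, which defines its data contracts as Pydantic models in app/models/schemas.py; this standard-library version only illustrates the shape check.

```python
import json


def parse_hypotheses(raw: str) -> list[str]:
    """Decode the Case Manager reply and return the non-empty hypothesis strings.

    Hypothetical helper for illustration only.
    """
    payload = json.loads(raw)
    if not isinstance(payload, dict) or not isinstance(payload.get("hypotheses"), list):
        raise ValueError("expected an object with a 'hypotheses' list")
    # Drop non-string entries and blank strings, and trim surrounding whitespace.
    cleaned = [h.strip() for h in payload["hypotheses"] if isinstance(h, str) and h.strip()]
    if not cleaned:
        raise ValueError("no usable hypotheses in reply")
    return cleaned
```

Rejecting a malformed reply at this point keeps an empty or mistyped hypothesis list from ever reaching the task queue.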

Investigator Agent

Receives a single hypothesis and executes a structured web search using DuckDuckGo. It retrieves relevant evidence, summarises the findings, and writes the result back to the shared context store.

Investigators run as asynchronous Celery tasks, allowing multiple hypotheses to be investigated concurrently within the constraints of the worker pool.
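
The fan-out described above can be sketched as follows. This is not the project's Celery code: a `ThreadPoolExecutor` from the standard library stands in for the Celery worker pool so the example runs without a broker, and `investigate` is a hypothetical placeholder for the search-and-summarise step. The default `pool_size=2` mirrors the `--concurrency=2` worker setting used in production.

```python
from concurrent.futures import ThreadPoolExecutor


def investigate(hypothesis: str) -> dict:
    """Hypothetical placeholder for the real search-and-summarise step."""
    return {"hypothesis": hypothesis, "summary": f"evidence for: {hypothesis}"}


def investigate_all(hypotheses: list[str], pool_size: int = 2) -> list[dict]:
    """Run one task per hypothesis, with at most `pool_size` running at once."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() returns results in input order even if tasks finish out of order.
        return list(pool.map(investigate, hypotheses))
```

With Celery, the same fan-out is usually expressed with `celery.group`, which sends one task signature per hypothesis to the broker.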

Verifier Agent

The Verifier Agent is the most critical component of the system. After each investigator reports a finding, it independently evaluates the output for:

Context alignment: Did the agent stay within the defined scope of the investigation?
Hallucination detection: Are the claims grounded in the retrieved evidence, or are they fabricated?
Confidence scoring: A numeric confidence value between 0.0 and 1.0.

Output:

```
{
  "verdict": "SUPPORTED",
  "confidence": 0.84,
  "reason": "The retrieved evidence from three independent sources confirms a 15% decline in China-region GPU export approvals under the updated BIS regulations."
}
```
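
A minimal sketch of how this output could be validated on receipt, assuming only the three fields shown above. The `Verification` class and `parse_verification` helper are hypothetical names used for illustration; the repository's own models may differ.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Verification:
    """One Verifier Agent result; field names follow the JSON shown above."""

    verdict: str
    confidence: float
    reason: str

    def __post_init__(self) -> None:
        # The confidence score is documented as a value between 0.0 and 1.0.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def parse_verification(raw: str) -> Verification:
    """Hypothetical helper: decode the verifier's JSON reply and validate it."""
    data = json.loads(raw)
    return Verification(
        verdict=str(data["verdict"]),
        confidence=float(data["confidence"]),
        reason=str(data["reason"]),
    )
```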

4. Shared Context Store

All agents read from and write to a single authoritative context object scoped to each investigation case. This prevents the context drift that occurs in multi-agent systems where individual agents maintain separate memory representations of the same problem.

Context schema:

```
{
  "case_id": "cec768e8d57e4b7db5ed9a8248f65fd4",
  "problem": "Why are Nvidia's stock gains decelerating in Q2 2025?",
  "constraints": ["Focus on macroeconomic and regulatory factors only"],
  "facts": [],
  "hypotheses": [],
  "evidence": [],
  "verifications": []
}
```

The context is persisted in PostgreSQL (Neon, production) and updated transactionally after each agent writes a result. All agents receive the full current state of the context before executing, ensuring coherent multi-step reasoning.
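
The read-modify-write cycle described above can be sketched as follows. To keep the example runnable without a database server, it uses an in-memory SQLite database as a stand-in for PostgreSQL, and it stores the context as a single JSON column. The table layout and the function names (`open_store`, `create_case`, `append_to_context`) are assumptions for illustration, not the project's SQLAlchemy models.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with one row per investigation case."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (case_id TEXT PRIMARY KEY, context TEXT NOT NULL)")
    return conn


def create_case(conn: sqlite3.Connection, case_id: str, problem: str) -> None:
    """Insert an empty context that follows the schema shown above."""
    context = {
        "case_id": case_id, "problem": problem, "constraints": [],
        "facts": [], "hypotheses": [], "evidence": [], "verifications": [],
    }
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO cases VALUES (?, ?)", (case_id, json.dumps(context)))


def append_to_context(conn: sqlite3.Connection, case_id: str, field: str, item) -> dict:
    """Append one agent result to a list field and persist the whole context."""
    with conn:  # the UPDATE is committed only if every step succeeds
        row = conn.execute(
            "SELECT context FROM cases WHERE case_id = ?", (case_id,)
        ).fetchone()
        if row is None:
            raise KeyError(case_id)
        context = json.loads(row[0])
        context[field].append(item)
        conn.execute(
            "UPDATE cases SET context = ? WHERE case_id = ?",
            (json.dumps(context), case_id),
        )
    return context
```

On PostgreSQL with concurrent workers, a row lock such as `SELECT ... FOR UPDATE` would typically be added so that two agents cannot overwrite each other's writes; that detail is omitted here.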

5. Authentication System

AgentBond AI uses a dual-mode authentication system:

Email and Password

Standard registration and login backed by bcrypt password hashing. Successful authentication returns a signed JWT with a 7-day expiry. The frontend stores the token in localStorage and attaches it to every API request in the Authorization: Bearer header.

Google OAuth 2.0

Login with Google is initiated via a popup window opened by the React frontend. The flow is:

The popup navigates to /api/auth/google on the backend.
The backend redirects to Google's authorization endpoint with a random state value (used for CSRF protection) that is also stored in the Starlette session.
Google redirects to /api/auth/google/callback with an authorization code.
The backend exchanges the code for tokens using the Google OAuth 2.0 token endpoint.
User profile data is retrieved from Google's userinfo API.
A JWT is generated and returned to the frontend via window.postMessage.
The popup closes automatically and the main window completes the login.

Cross-Domain Cookie Handling: Because the frontend is on vercel.app and the backend is on hf.space, and because hf.space is on the Public Suffix List, modern browsers isolate session cookies between these origins. A fallback mechanism in the callback handler reconstructs the session state from the URL query parameters before Authlib validates it, resolving the mismatching_state error that occurs in cross-domain OAuth flows.

6. Observability Stack

Prometheus

The FastAPI backend exposes a /metrics/ endpoint compatible with the Prometheus exposition format. Because the container runs two processes (FastAPI and Celery) under supervisord, the prometheus_client library is configured in multi-process mode. Metric files are written to /code/prometheus_multiproc_dir (created automatically on startup) and aggregated at scrape time.

The following custom metrics are tracked:

Metric	Type	Description
`llm_tokens_input_total`	Counter	Total input tokens consumed across all LLM calls
`llm_tokens_output_total`	Counter	Total output tokens generated across all LLM calls
`agent_latency_seconds`	Histogram	End-to-end latency per agent type
`agent_runs_total`	Counter	Total agent invocations by type and outcome
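
For readers unfamiliar with the Histogram type, the sketch below shows how latency observations are turned into cumulative buckets plus a running sum and count, which is the data a dashboard needs to compute an average. It is a standard-library illustration, not the prometheus_client implementation, and the bucket bounds are assumed values rather than the ones configured in app/observability/metrics.py.

```python
import bisect


class LatencyHistogram:
    """Illustrative histogram with 'less than or equal' bucket bounds."""

    def __init__(self, bounds=(0.5, 1.0, 2.5, 5.0, 10.0)):
        self.bounds = list(bounds)
        # One slot per bound, plus a final slot for the +Inf bucket.
        self.bucket_counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0
        self.count = 0

    def observe(self, seconds: float) -> None:
        # bisect_left returns the index of the first bound >= seconds.
        self.bucket_counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.total += seconds
        self.count += 1

    def cumulative(self) -> list[int]:
        """Counts as exposed to a scraper: each bucket includes all smaller ones."""
        running, out = 0, []
        for c in self.bucket_counts:
            running += c
            out.append(running)
        return out

    def average(self) -> float:
        """Mean latency, computed as sum divided by count."""
        return self.total / self.count if self.count else 0.0
```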

Grafana Cloud

Metrics are visualised on Grafana Cloud — a fully managed, hosted Grafana instance. A Grafana Cloud-managed Prometheus scrape job pulls directly from the production backend over HTTPS using a configured scrape target:

```
scrape_configs:
  - job_name: "agent-engine-api"
    metrics_path: "/metrics/"
    scheme: "https"
    static_configs:
      - targets: ["karan6124-agentbond-api.hf.space"]
```

This scrape job is configured inside the Grafana Cloud UI under Connections → Add new connection → Prometheus. No local Prometheus or Grafana process is required.

Dashboards are published inside the AgentBond Analytics folder on Grafana Cloud:

LLM Token Consumption Rate — input and output token throughput over time (tokens/second)
Average Agent Latency — rolling average latency per agent type in seconds

7. Project Structure

AgentBond-AI/
|
├── app/
│   ├── api/
│   │   ├── routes.py              # Case and hypothesis REST endpoints
│   │   └── auth.py                # Google OAuth + JWT authentication routes
│   |
│   ├── agents/
│   │   ├── case_manager.py        # Decomposes problem into hypotheses via Gemini
│   │   ├── investigator.py        # Investigates a single hypothesis via web search
│   │   └── verifier.py            # Scores investigator output for hallucination
│   |
│   ├── services/
│   │   ├── llm.py                 # Gemini API client abstraction
│   │   ├── database.py            # SQLAlchemy session factory and engine
│   │   └── context_manager.py     # Shared context read/write operations
│   |
│   ├── workers/
│   │   └── celery_worker.py       # Celery app configuration and task definitions
│   |
│   ├── observability/
│   │   └── metrics.py             # Prometheus metric definitions
│   |
│   ├── models/
│   │   ├── schemas.py             # Pydantic models for all API data contracts
│   │   └── database_models.py     # SQLAlchemy ORM models
│   |
│   └── main.py                    # FastAPI application entry point, middleware, metrics
|
├── frontend/
│   ├── public/
│   │   └── favicon.svg            # Custom AgentBond investigator emblem
│   ├── src/
│   │   ├── components/            # Navbar, AuthModal, ConfirmModal, UI components
│   │   ├── context/
│   │   │   └── AuthContext.jsx    # JWT state, Google OAuth popup flow
│   │   ├── sections/              # WorkspaceSection, LandingSection
│   │   └── main.jsx               # React application entry point
│   ├── index.html                 # Page title and font imports
│   └── vite.config.js             # Vite build configuration
|
├── alembic/                       # Database migration scripts
├── tests/                         # Unit tests for agents and context manager
├── Dockerfile                     # Production container image definition
├── supervisord.conf               # Process manager: FastAPI + Celery in one container
├── docker-compose.yml             # Local infrastructure: Redis, Prometheus
├── prometheus.yml                 # Prometheus scrape configuration
├── pyproject.toml                 # Python project metadata and dependencies
└── .env.example                   # Environment variable template

8. Tech Stack

Layer	Technology
Frontend Framework	React 18 with Vite
Frontend Hosting	Vercel
API Framework	FastAPI
Task Queue	Celery 5
Message Broker (Production)	Upstash Redis (TLS)
Message Broker (Local)	Redis 7 via Docker
Database	PostgreSQL via Neon (serverless, production)
ORM	SQLAlchemy 2 with psycopg2
Data Validation	Pydantic v2
LLM Provider	Google Gemini 2.5 Flash
OAuth Library	Authlib (Starlette integration)
Authentication	JWT (python-jose) + bcrypt
Metrics	Prometheus Client (multi-process mode)
Dashboards	Grafana Cloud (hosted)
Container Runtime	Docker
Process Manager	Supervisord
Backend Hosting	Hugging Face Spaces (Docker SDK)
Package Manager	uv
Migrations	Alembic

No LangChain or agent framework dependency. Orchestration, context propagation, and inter-agent communication are implemented directly.

9. Production Deployment

The production system uses two separate hosting platforms connected via environment variables.

Backend — Hugging Face Spaces

The backend runs as a Docker container on a Hugging Face Space at karan6124-agentbond-api.hf.space.

The Dockerfile produces a python:3.12-slim image that installs all Python dependencies, copies the application code, creates the Prometheus metrics directory, and boots supervisord as PID 1.

supervisord manages two concurrent processes inside the single container:

uvicorn app.main:app --host 0.0.0.0 --port 7860 — the FastAPI web server on the port Hugging Face exposes publicly.
celery -A app.workers.celery_worker worker --loglevel=info --concurrency=2 — the Celery task worker connected to Upstash Redis.

All secrets (database credentials, Redis URL, Google OAuth credentials, Gemini API key, JWT secret) are configured as Secrets under the Space's Settings tab, never in the committed codebase.

To deploy a new version of the backend:

```
git push hf main
```

Hugging Face automatically detects the push, rebuilds the Docker image, and redeploys the container.

Frontend — Vercel

The React frontend is deployed to Vercel from the frontend/ subdirectory. The only environment variable required is:

```
VITE_API_URL=https://karan6124-agentbond-api.hf.space
```

This is configured in the Vercel project settings under Environment Variables. Vite bakes this value into the production build at compile time, and all API requests in the frontend target this URL.

To deploy a new version of the frontend:

```
git push origin main
```

Vercel detects the push and automatically rebuilds and redeploys the frontend.

Git Remotes

This repository is connected to two separate git remotes:

Remote	URL	Purpose
`origin`	`https://github.com/Edge-Explorer/AgentBond-AI.git`	Source of truth, triggers Vercel deploys
`hf`	`https://huggingface.co/spaces/Karan6124/agentbond-api`	Triggers Hugging Face backend deploys

10. Local Development Setup

Prerequisites

Docker Desktop for running local Redis and Prometheus
Python 3.12
uv for Python dependency management
Node.js 18+ for the frontend
A Google Cloud project with an OAuth 2.0 Web Application client
A Google Gemini API key from Google AI Studio

Backend Setup

Clone the repository:
```
git clone https://github.com/Edge-Explorer/AgentBond-AI.git
cd AgentBond-AI
```
Copy the environment template and populate all values:
```
cp .env.example .env
```
Create the Python virtual environment and install all dependencies:
```
uv sync
```
Activate the virtual environment:
```
# Windows
.venv\Scripts\activate

# macOS / Linux
source .venv/bin/activate
```
Start the local infrastructure services (Redis and Prometheus):
```
docker compose up -d
```
Run database migrations:
```
alembic upgrade head
```
Start the API server:
```
uvicorn app.main:app --reload --port 8000
```
Start the Celery worker in a separate terminal:
```
celery -A app.workers.celery_worker worker --loglevel=info
```

The API will be available at http://localhost:8000. Interactive API documentation is at http://localhost:8000/docs.

Frontend Setup

Navigate to the frontend directory:
```
cd frontend
```
Install Node dependencies:
```
npm install
```
Create a .env.local file for local development:
```
VITE_API_URL=http://localhost:8000
```
Start the development server:
```
npm run dev
```

The frontend will be available at http://localhost:5173.

Observability (Production — Grafana Cloud)

Production metrics are visualised on Grafana Cloud. The hosted Grafana Cloud Prometheus agent is configured to scrape the live Hugging Face Space directly:

Grafana Cloud Dashboard: grafana.com/orgs/your-org — log in to view the AgentBond Analytics folder.
No local Prometheus or Grafana process is needed to view production metrics.

Observability (Local)

For local development, `docker compose up -d` starts a lightweight Prometheus container that scrapes the local backend:

Prometheus: http://localhost:9090 — scrapes metrics from host.docker.internal:8000

To view local metrics in Grafana Cloud, you can temporarily point your Grafana Cloud scrape configuration at your local backend (requires a public tunnel such as ngrok) or use the Grafana Agent in push mode.

11. Environment Variables

Variable	Required	Description
`DATABASE_URL`	Yes	PostgreSQL connection string with `sslmode=require` for Neon
`REDIS_URL`	Yes	Redis or Upstash Redis connection string (`rediss://` for TLS)
`GEMINI_API_KEY`	Yes	API key from Google AI Studio
`GEMINI_MODEL`	No	Model name, defaults to `gemini-2.5-flash`
`GOOGLE_CLIENT_ID`	Yes	OAuth 2.0 client ID from Google Cloud Console
`GOOGLE_CLIENT_SECRET`	Yes	OAuth 2.0 client secret from Google Cloud Console
`SESSION_SECRET_KEY`	Yes	Random secret for Starlette session cookie signing
`JWT_SECRET_KEY`	Yes	Secret key for signing JWT access tokens
`FRONTEND_URL`	Yes	The deployed frontend URL for OAuth popup postMessage targeting
`APP_ENV`	No	`development` or `production`
`LOG_LEVEL`	No	Logging verbosity, defaults to `info`

The FRONTEND_URL variable controls where the OAuth callback sends the postMessage after a successful Google Sign-In. In production this is https://agent-bond-ai.vercel.app.
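
A minimal sketch of how the table above could be enforced at startup, so that a missing secret fails fast instead of surfacing later as a runtime error. The `load_settings` function is hypothetical; the variable names and the two defaults come from the table, and `APP_ENV` is passed through without a default because none is documented.

```python
import os
from collections.abc import Mapping

REQUIRED = (
    "DATABASE_URL", "REDIS_URL", "GEMINI_API_KEY", "GOOGLE_CLIENT_ID",
    "GOOGLE_CLIENT_SECRET", "SESSION_SECRET_KEY", "JWT_SECRET_KEY", "FRONTEND_URL",
)
DEFAULTS = {"GEMINI_MODEL": "gemini-2.5-flash", "LOG_LEVEL": "info"}


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect settings from `env`, raising if any required variable is unset."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name, default)
    if "APP_ENV" in env:
        settings["APP_ENV"] = env["APP_ENV"]
    return settings
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.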

12. Design Decisions

Why no LangChain or agent framework?

Building orchestration, context propagation, state management, and inter-agent communication from scratch provides complete transparency into what the system is actually doing at every step. It also produces a codebase that is significantly easier to reason about, debug, and extend. LangChain and similar frameworks introduce layers of abstraction that obscure failures and make debugging multi-agent flows substantially harder.

Why a Verifier Agent?

In a multi-agent pipeline, errors from upstream agents compound downstream. An investigator that generates a partially hallucinated claim feeds that claim into subsequent reasoning steps, and the final output inherits the error with increased confidence. A dedicated verification step with an explicit confidence score makes the system self-auditing. Cases where the verifier marks a hypothesis as unsupported are visible to the user rather than silently propagated.

Why a shared context store instead of individual agent memory?

Individual agent memory leads to context drift — each agent develops a slightly different internal model of the problem based on the subset of information it has processed. A single authoritative context object, shared and updated by all agents and persisted in PostgreSQL, forces every agent to operate on the same ground truth. It also makes the full state of an investigation inspectable at any point in time.

Why Celery with Redis rather than async background tasks in FastAPI?

FastAPI's background tasks run in the same process as the web server. A long-running LLM call made from an async background task blocks the event loop whenever the call itself is blocking, and a synchronous background task occupies a thread from the server's shared thread pool; either way, API responsiveness degrades as investigations pile up. Celery workers are separate processes with independent memory and CPU allocation. Combined with Redis as a message broker, this decouples the API from the workers, so the API remains responsive regardless of how many investigation tasks are in flight.
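
The blocking effect can be demonstrated with a small standard-library experiment. It counts how many times a background "ticker" coroutine gets to run while a slow call is in progress, first with the call made directly on the event loop and then with it moved off the loop. `slow_call` is a hypothetical stand-in for a blocking LLM request, and a thread is used here only to keep the example self-contained; Celery goes further by moving the work into separate processes.

```python
import asyncio
import time


def slow_call(seconds: float) -> None:
    """Stand-in for a blocking LLM or HTTP request."""
    time.sleep(seconds)


async def ticks_during(work) -> int:
    """Count how often the event loop gets to run while `work()` is awaited."""
    ticks = 0
    stop = asyncio.Event()

    async def ticker() -> None:
        nonlocal ticks
        while not stop.is_set():
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    await asyncio.sleep(0)  # give the ticker a chance to start
    await work()
    seen = ticks  # ticks that happened while the work was in progress
    stop.set()
    await task
    return seen


async def blocking_work() -> None:
    slow_call(0.2)  # runs on the event loop thread, so nothing else can run


async def offloaded_work() -> None:
    await asyncio.to_thread(slow_call, 0.2)  # the event loop stays free
```

Running `asyncio.run(ticks_during(blocking_work))` and then `asyncio.run(ticks_during(offloaded_work))` shows the ticker starved in the first case and running normally in the second.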

Why Hugging Face Spaces for the backend?

Hugging Face Spaces provides free persistent Docker container hosting with a public HTTPS endpoint and git-based deployment. The Docker SDK allows a completely custom container image, which is necessary to run both FastAPI and Celery under supervisord in a single container — a deployment pattern that is otherwise impractical on platforms like Render or Railway's free tiers.

Why Upstash Redis instead of a local Redis container in production?

A local Redis container cannot run alongside the application on Hugging Face Spaces without significantly increasing container complexity. Upstash provides a managed, serverless Redis instance with TLS support (rediss://) and a free tier sufficient for the throughput of this application. The connection is shared between the FastAPI process and the Celery worker without any additional configuration.

Why Grafana Cloud instead of a local Grafana instance?

The production backend runs on Hugging Face Spaces, which is a remote host. A local Grafana instance can scrape the production /metrics/ endpoint over HTTPS, but this approach requires keeping a local machine permanently on and connected to the internet. Grafana Cloud eliminates this dependency entirely: the hosted scrape agent runs 24/7 from Grafana's infrastructure and persists metric history across sessions. Dashboards are accessible from any browser without needing a local Docker environment, which is the appropriate setup for a production observability stack.

License

MIT
