A full-stack local AI chat application. Chat in real time with a locally running LLM through a streaming .NET 9 API and an Angular 21 frontend. All conversations stay on your machine — no data reaches any external AI service.
- Real-time streaming — Token-by-token rendering via Server-Sent Events with requestAnimationFrame batching at 60 fps
- Agentic tool use — Models with tool-calling support (qwen2.5, llama3.1) can browse the web to answer questions using real content
- Domain allowlist — Per-user control over which external domains the AI can fetch. 23 defaults pre-seeded (MDN, Angular, React, Stack Overflow, GitHub, Wikipedia, etc.)
- Per-conversation model selection — Switch Ollama models per chat session without restarting the API
- JWT authentication — Signup/login with bcrypt-hashed passwords
- Conversation history — Persistent sessions with full message history in MongoDB
- SSRF protection — All outbound web fetches go through a multi-stage security validation pipeline (scheme check → private-IP rejection → allowlist → content-type → size cap)
- Internal documentation — Built-in `/docs` page with live Mermaid architecture diagrams
| Layer | Technology |
|---|---|
| Frontend | Angular 21 (standalone components + signals) |
| Backend | .NET 9 Web API (C#, Clean Architecture) |
| Database | MongoDB 7+ |
| AI Engine | Ollama — qwen2.5:7b (default) |
| Auth | JWT Bearer tokens |
| Styling | Tailwind CSS v4 |
- .NET 9 SDK
- Node.js 20+ and npm 10+
- MongoDB Community Server — running on `localhost:27017`
- Ollama — with at least one model pulled (see the Models & Hardware section for recommendations)
OpenChatAi runs entirely with local LLMs via Ollama. Choosing the right model matters: too small and answers are weak; too big and your machine can't load it. This section is here to save you the trial-and-error.
LLMs need to fit in memory (RAM or VRAM). A rough guide:
| Model size | RAM needed | What runs it | Use case |
|---|---|---|---|
| 1-3B | 2-4 GB | Any laptop | Toy / experimentation |
| 7-8B | 6-8 GB | Mid-range laptop (16 GB) | Sweet spot for this project |
| 14B | 10-12 GB | High-end laptop (32 GB) | Better reasoning, slower |
| 32B | ~20 GB | Workstation | Genuinely strong local model |
| 70B | ~45 GB | Multi-GPU rig | Near GPT-3.5 quality |
| 400B+ | ~250 GB+ | GPU cluster | Frontier-tier, not for personal use |
For OpenChatAi, the 7-8B range is recommended. Smaller models struggle with tool calling; larger ones may not fit your machine.
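As a back-of-the-envelope check (a rough heuristic, not an exact formula), RAM needs at Ollama's default 4-bit quantization work out to roughly 0.6 GB per billion parameters plus overhead:

```ts
// Rough heuristic for 4-bit quantized (Q4) GGUF models — actual usage
// varies with context length, KV cache size, and quantization scheme.
function estimateRamGb(paramsBillions: number): number {
  const weightsGb = paramsBillions * 0.6; // ~0.6 GB per billion params at Q4
  return weightsGb * 1.1 + 1;            // +10% loading overhead, +1 GB runtime
}

console.log(estimateRamGb(8).toFixed(1));  // ≈ 6.3 — matches the 6-8 GB row
console.log(estimateRamGb(14).toFixed(1)); // ≈ 10.2 — matches the 10-12 GB row
```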
These work well with OpenChatAi's agentic tool calling system:
| Model | Size | Why |
|---|---|---|
| `llama3.1:8b` | 4.9 GB | Most obedient for tool calling. Best default. |
| `qwen2.5:7b` | 4.7 GB | Good general quality, decent tool calling. |
| `mistral` | 4.1 GB | Fast, lighter alternative. |
| `deepseek-r1:8b` | 5.2 GB | Reasoning model — "thinks before answering". |
```bash
# Pull the recommended default
ollama pull llama3.1:8b
```

These are still useful for basic chat, but the agentic web fetching feature is auto-disabled when they're selected (a sketch of that check follows the table):
| Model | Size | Note |
|---|---|---|
| `llama3:latest` | 4.7 GB | Older — predecessor of llama3.1 |
| `phi3:mini` | 2.3 GB | Lightweight, for resource-constrained machines |
| `gemma2` | 5.4 GB | Google's model |
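A hypothetical sketch of that gating logic — the model list and helper name are illustrative, not the project's actual code, which may detect capability differently:

```ts
// Illustrative only: model families known to support Ollama tool calling.
const TOOL_CAPABLE_PREFIXES = ['llama3.1', 'qwen2.5', 'mistral'];

function supportsToolCalling(model: string): boolean {
  return TOOL_CAPABLE_PREFIXES.some((prefix) => model.startsWith(prefix));
}

// The UI would hide or disable the web-fetch feature accordingly:
console.log(supportsToolCalling('llama3.1:8b')); // true
console.log(supportsToolCalling('phi3:mini'));   // false
```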
The model `deepseek-r1` in Ollama is not the same as the full DeepSeek-R1
that you may have heard about. The Ollama versions are distilled: smaller
models (Qwen, Llama) trained to imitate parts of the reasoning behavior of the
real DeepSeek-R1.
| Variant | Size | What it actually is |
|---|---|---|
| `deepseek-r1:1.5b` | 1.1 GB | Qwen2.5-1.5B distilled |
| `deepseek-r1:7b` | 4.7 GB | Qwen2.5-7B distilled |
| `deepseek-r1:8b` | 5.2 GB | Llama3.1-8B distilled |
| `deepseek-r1:14b` | 9.0 GB | Qwen2.5-14B distilled |
| `deepseek-r1:32b` | 20 GB | Qwen2.5-32B distilled |
| `deepseek-r1:70b` | 43 GB | Llama3.3-70B distilled |
| `deepseek-r1:671b` | 404 GB | The actual full DeepSeek-R1 |
Only the 671B variant is the original model. The smaller ones are useful and they inherit some reasoning style, but they are not the same beast.
See: https://ollama.com/library/deepseek-r1
You may come across announcements of impressive open-source models on Hugging Face like DeepSeek-V4-Flash (284B params) or DeepSeek-V4-Pro (1.6T params). These match closed-source models like Claude Opus and GPT-5 on many benchmarks.
They do not run on personal hardware:
- DeepSeek-V4-Flash requires ~170 GB of VRAM (typically 2× H200 GPUs)
- DeepSeek-V4-Pro requires ~860 GB of VRAM (a real GPU cluster)
- Even with aggressive quantization, you need a Mac Studio with 192 GB unified memory ($6,000+) at minimum
If you want to use these models without buying datacenter hardware, your options are the official DeepSeek API, OpenRouter, or other hosted inference providers — but that means giving up local execution, which defeats the privacy point of this project.
For a learning project on a normal laptop, stick with 7-8B models in the recommended list above. The architecture lessons in this codebase apply identically regardless of model size.
- Hugging Face is "the GitHub of AI models". Millions of models in their original formats. Powerful but requires more setup.
- Ollama packages selected models in an optimized format (GGUF), with one-command install. Curated and easy.
For this project, Ollama is the right tool. Hugging Face is worth exploring if you outgrow the Ollama catalog or want to fine-tune models yourself.
You can also pull GGUF-format models from Hugging Face directly into Ollama:
```bash
ollama pull hf.co/<author>/<model>
```

A PowerShell script at the project root starts the API and UI together:

```powershell
.\dev.ps1
```

| Key | Action |
|---|---|
| `R` | Restart API |
| `U` | Restart UI |
| `Q` | Quit all |
```bash
# Windows — run as a service after install
net start MongoDB

# Or run directly
mongod --dbpath C:\data\db
```

```bash
cd backend/OpenChat.API
dotnet run --launch-profile http
```

Runs on http://localhost:5124 — Swagger UI available at `/swagger` in Development mode.
```bash
cd frontend/openchat-ui
npm install   # first time only
npm start
```

Runs on http://localhost:4200.
`backend/OpenChat.API/appsettings.json`:

```json
{
  "MongoDb": {
    "ConnectionString": "mongodb://localhost:27017",
    "DatabaseName": "OpenChatAi"
  },
  "Ollama": {
    "BaseUrl": "http://localhost:11434",
    "Model": "qwen2.5:7b"
  },
  "Jwt": {
    "Secret": "ch4ng3-th1s-t0-a-str0ng-r4nd0m-64-char-s3cr3t-k3y-b3f0r3-g01ng-pr0d!!",
    "ExpiryDays": "7"
  }
}
```

Security: `Jwt.Secret` is a placeholder. Replace it with a strong random 64-character key before any non-local deployment. For machine-specific overrides, create `appsettings.local.json` (gitignored).
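One quick way to generate such a key (a sketch using Node's built-in `crypto` module — Node is already required for the frontend):

```ts
// generate-secret.ts — prints a random key suitable for Jwt.Secret
import { randomBytes } from 'node:crypto';

// 48 random bytes → exactly 64 base64 characters
console.log(randomBytes(48).toString('base64'));
```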
| Method | Route | Description |
|---|---|---|
| POST | `/auth/signup` | Register a new user |
| POST | `/auth/login` | Authenticate and receive a JWT |
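A minimal sketch of calling the login endpoint from TypeScript — the request and response field names (`email`, `password`, `token`) are assumptions for illustration; check the Swagger UI at `/swagger` for the actual DTOs:

```ts
const API = 'http://localhost:5124';

// Field names are illustrative — verify against /swagger.
async function login(email: string, password: string): Promise<string> {
  const res = await fetch(`${API}/auth/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ email, password }),
  });
  if (!res.ok) throw new Error(`Login failed: ${res.status}`);
  const { token } = await res.json(); // assumed response shape
  return token; // send as "Authorization: Bearer <token>" on later calls
}
```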
| Method | Route | Description |
|---|---|---|
| POST | `/chat/stream` | Stream response via Server-Sent Events |
| POST | `/chat` | Single-shot response (no streaming) |
| GET | `/chat/conversation/:id/messages` | Load last 100 messages |
| Method | Route | Description |
|---|---|---|
| GET | `/conversation/:userId` | List all conversations for a user |
| PUT | `/conversation/:id/model` | Update the model for a conversation |
| DELETE | `/conversation/:id` | Delete conversation and all its messages |
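For example, switching a conversation to another pulled model might look like this (the `{ model: ... }` body shape is an assumption — confirm via `/swagger`):

```ts
// Assumed request shape — the conversation keeps its history;
// only subsequent responses use the new model.
async function switchModel(conversationId: string, model: string, token: string) {
  await fetch(`http://localhost:5124/conversation/${conversationId}/model`, {
    method: 'PUT',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ model }),
  });
}
```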
| Method | Route | Description |
|---|---|---|
| GET | `/models` | List available Ollama models |
| Method | Route | Description |
|---|---|---|
| GET | `/api/allowlist` | Get all allowed domains |
| POST | `/api/allowlist` | Add a domain |
| PUT | `/api/allowlist/:id` | Update a domain |
| DELETE | `/api/allowlist/:id` | Remove a domain |
| PATCH | `/api/allowlist/:id/toggle` | Enable or disable a domain |
| GET | `/api/allowlist/test?url=` | Test whether a URL passes the allowlist |
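The test endpoint is handy when a web fetch is silently denied. A sketch (the response shape shown in the comment is an assumption):

```ts
async function testAllowlist(url: string, token: string) {
  const res = await fetch(
    `http://localhost:5124/api/allowlist/test?url=${encodeURIComponent(url)}`,
    { headers: { Authorization: `Bearer ${token}` } },
  );
  return res.json(); // e.g. { allowed: boolean, reason?: string } — assumed shape
}
```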
The `/chat/stream` endpoint emits Server-Sent Events:
| Event | Payload | Description |
|---|---|---|
| `token` | `string` | A single text token |
| `tool_start` | `{ tool, args }` | Tool call started |
| `tool_end` | `{ tool, ok, sourceUrl, preview }` | Tool call completed |
| `done` | `{ conversationId, tokensUsed, toolCallsUsed }` | Response complete |
| `error` | `{ message }` | Fatal error |
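A sketch of consuming the stream in the browser — since the endpoint is a POST, `EventSource` cannot be used, so this reads the response body directly. The request body shape is an assumption, the SSE framing is simplified, and the `requestAnimationFrame` batching mirrors what the feature list describes for the Angular client:

```ts
async function streamChat(message: string, conversationId: string, token: string) {
  // Request body shape is illustrative — check /swagger for the real DTO.
  const res = await fetch('http://localhost:5124/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ message, conversationId }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();

  // Batch incoming tokens and flush once per animation frame (~60 fps)
  // instead of touching the DOM on every token.
  let pending = '';
  let scheduled = false;
  const flush = () => {
    // '#answer' is a hypothetical output element for this sketch.
    document.querySelector('#answer')!.textContent += pending;
    pending = '';
    scheduled = false;
  };

  let buffer = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Naive SSE framing: events are separated by a blank line.
    const events = buffer.split('\n\n');
    buffer = events.pop()!; // keep any incomplete trailing event
    for (const evt of events) {
      const type = /^event: (.*)$/m.exec(evt)?.[1] ?? 'message';
      const data = /^data: (.*)$/m.exec(evt)?.[1] ?? '';
      if (type === 'token') {
        pending += JSON.parse(data); // assumes the token is JSON-encoded
        if (!scheduled) {
          scheduled = true;
          requestAnimationFrame(flush);
        }
      } else if (type === 'done' || type === 'error') {
        console.log(type, data);
      }
    }
  }
}
```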
Models that support tool calling can invoke `fetch_url` to retrieve web content. Each request is validated through:

- Scheme check — only `http://` and `https://` allowed
- SSRF protection — private IP ranges and reserved hostnames rejected
- Domain allowlist — per-user, memory-cached
- 10 s timeout / 2 MB response cap
- HTML → Markdown extraction via AngleSharp + ReverseMarkdown
- Truncation to 8,000 characters
The model is limited to 3 tool calls per response turn.
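An illustrative TypeScript rendering of that validation order — the real pipeline lives in the C# `WebFetcherService`, and details such as subdomain matching and DNS resolution are assumptions here:

```ts
// Illustrative only — stages run in order; the first failure rejects the fetch.
import { isIP } from 'node:net';

function validateOutboundUrl(raw: string, allowlist: Set<string>): string | null {
  const url = new URL(raw);

  // 1. Scheme check
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return 'bad scheme';

  // 2. SSRF: reject literal private/reserved IPs (a real check would also
  //    resolve DNS before connecting, to catch rebinding tricks).
  if (isIP(url.hostname)) {
    if (/^(10\.|127\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.|169\.254\.)/.test(url.hostname))
      return 'private IP';
  }
  if (url.hostname === 'localhost') return 'reserved hostname';

  // 3. Per-user domain allowlist (exact-host match assumed for this sketch)
  if (!allowlist.has(url.hostname)) return 'domain not allowed';

  // 4-5. Content-type and the 2 MB size cap are enforced on the response.
  return null; // passes pre-flight validation
}
```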
| Collection | Index | Description |
|---|---|---|
| `users` | — | Accounts with bcrypt-hashed passwords |
| `conversations` | `(userId ASC, updatedAt DESC)` | Chat sessions with model and token count |
| `chatmessages` | `(conversationId ASC, timestamp ASC)` | Messages including tool call records |
| `logs` | — | Token usage per assistant response |
| `allowed_domains` | — | Per-user domain allowlist |
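The compound indexes correspond to these definitions (a sketch with the Node `mongodb` driver — the API manages its own indexes on the C# side; this just shows the equivalent shape):

```ts
import { MongoClient } from 'mongodb';

const client = await MongoClient.connect('mongodb://localhost:27017');
const db = client.db('OpenChatAi');

// Newest-first conversation list per user
await db.collection('conversations').createIndex({ userId: 1, updatedAt: -1 });
// Chronological message loads per conversation
await db.collection('chatmessages').createIndex({ conversationId: 1, timestamp: 1 });

await client.close();
```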
```
OpenChatAi/
├── backend/
│   └── OpenChat.API/
│       ├── Controllers/      # HTTP entry points (Chat, Auth, Conversation, Models, Allowlist)
│       ├── Services/         # Business logic (ChatService, AgenticChatService, OllamaService, ...)
│       ├── Repositories/     # MongoDB access (Chat, Conversation, Log, User, AllowedDomain)
│       ├── Tools/            # Tool definitions (fetch_url, WebFetcherService, ToolRegistry)
│       ├── Models/           # Domain entities and DTOs
│       └── appsettings.json
├── frontend/
│   └── openchat-ui/
│       └── src/app/
│           ├── features/     # auth/, docs/, settings/ (with allowed-domains/)
│           ├── components/   # chat, sidebar, model-selector, tool-indicator, sources-footer, ...
│           ├── services/     # chat, conversation, model, allowlist, auth, excel, confirm
│           └── models/       # TypeScript interfaces
├── dev.ps1                   # Quick-start script (starts API + UI)
└── README.md
```
| Problem | Fix |
|---|---|
| AI not responding | Run `ollama serve` and confirm `ollama list` shows your model |
| No web tool use | Model must support tool calling — use `qwen2.5:7b` or `llama3.1:8b` |
| MongoDB error | Ensure `mongod` is running on port 27017 |
| CORS error | Ensure backend is on port 5124 |
| Slow responses | Normal for local LLMs on CPU — GPU strongly recommended for 7B+ models |
Open `/docs` in the running app for interactive architecture diagrams (Mermaid), data model ER diagram, agentic flow sequence, and full API reference.



