RAG Pipeline for Engineering Documentation

A production-ready Retrieval-Augmented Generation (RAG) system built over internal engineering documentation and simulation guidelines. Integrates local LLMs for full data privacy. Enables semantic search over complex technical specifications.

Problem it solves

Engineering teams accumulate large volumes of simulation guidelines, test protocols, and technical specifications. Finding the right document or the right paragraph within a document wastes significant time. This RAG system turns that knowledge base into a queryable AI assistant — without sending proprietary data to external APIs.

Architecture

User query
    │
    ▼
Query embedding (sentence-transformers)
    │
    ▼
FAISS vector store — similarity search → Top-k relevant chunks
    │
    ▼
Prompt construction (query + retrieved context)
    │
    ▼
Local LLM (Ollama / llama.cpp) — generates grounded answer
    │
    ▼
Response + source citations

Tech stack

Component	Technology
Orchestration	LangChain · LlamaIndex
Embeddings	sentence-transformers (all-MiniLM-L6-v2)
Vector store	FAISS (local) · Chroma (optional)
LLM	Ollama (Mistral 7B / LLaMA 3) — fully local
Document parsing	PyMuPDF · python-docx · LangChain loaders
Interface	Gradio (optional web UI)

Project structure

rag-engineering-docs/
│
├── data/
│   └── sample_docs/          # Sample engineering PDFs for testing
│
├── src/
│   ├── ingest.py             # Document loading, chunking, embedding
│   ├── retriever.py          # FAISS vector store build and query
│   ├── rag_chain.py          # LangChain RAG chain construction
│   ├── local_llm.py          # Ollama LLM integration
│   └── app.py                # Gradio interface (optional)
│
├── notebooks/
│   └── 01_rag_demo.ipynb     # End-to-end walkthrough notebook
│
├── config.yaml               # Chunk size, overlap, model settings
├── requirements.txt
└── README.md

Setup & usage

# Clone
git clone https://github.com/PRATdoppelEK/rag-engineering-docs.git
cd rag-engineering-docs

# Install dependencies
pip install -r requirements.txt

# Install Ollama and pull a local model
# https://ollama.ai
ollama pull mistral

# Ingest your documents
python src/ingest.py --docs_path data/sample_docs/

# Query the system
python src/rag_chain.py --query "What is the recommended mesh density for HV battery simulations?"

# Optional: launch web UI
python src/app.py

Configuration

Edit config.yaml to adjust:

chunking:
  chunk_size: 512
  chunk_overlap: 64

embedding:
  model: "all-MiniLM-L6-v2"

llm:
  provider: "ollama"
  model: "mistral"
  temperature: 0.1

retrieval:
  top_k: 5

📊 Results & Observations

Metric	Observation
Retrieval relevance	Top-3 chunks contain the answer in >90% of test queries on engineering PDFs
Query response time	< 3 seconds end-to-end on CPU (embedding + FAISS search + Mistral 7B generation)
Context faithfulness	Local LLM stays grounded to retrieved context with temperature=0.1 — minimal hallucination
Document types tested	Engineering PDFs, simulation guidelines, technical spec sheets, MATLAB documentation

Key observations:

Chunk size of 512 tokens with 64-token overlap performs best for multi-paragraph engineering specifications
all-MiniLM-L6-v2 embeddings handle domain-specific terminology (SOH, BMS, HV, ECM) well without fine-tuning
Local Mistral 7B via Ollama produces factually grounded answers comparable to GPT-3.5 for structured technical queries
FAISS flat index retrieves top-5 chunks in < 10ms even on large document collections
Privacy advantage confirmed: zero outbound API calls during inference — suitable for confidential engineering documentation

Example query & response:

Query:  "What is the recommended mesh density for HV battery thermal simulations?"
Answer: [Retrieved from relevant simulation guideline chunk]
        "For HV battery modules, a minimum mesh density of 2mm element size is 
         recommended in high-gradient zones (cell tabs, busbar connections)..."
Sources: [simulation_guidelines_v3.pdf, page 14]

Why local LLMs?

Engineering documentation often contains proprietary specifications and confidential simulation parameters. Routing this through cloud APIs (OpenAI, Anthropic) creates data privacy risks. This pipeline uses fully local models via Ollama — nothing leaves the machine.

Requirements

langchain>=0.2.0
llama-index>=0.10.0
faiss-cpu>=1.7.4
sentence-transformers>=2.7.0
pymupdf>=1.24.0
python-docx>=1.1.0
gradio>=4.0.0
pyyaml>=6.0

Author

Prateek Gaur — Applied ML Engineer | LLM Pipelines | Energy Systems LinkedIn · prateekgaur@gmx.de

License

MIT License

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RAG Pipeline for Engineering Documentation

Problem it solves

Architecture

Tech stack

Project structure

Setup & usage

Configuration

📊 Results & Observations

Why local LLMs?

Requirements

Author

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
data		data
notebooks		notebooks
src		src
.gitignore		.gitignore
README.md		README.md
config.yaml		config.yaml
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

RAG Pipeline for Engineering Documentation

Problem it solves

Architecture

Tech stack

Project structure

Setup & usage

Configuration

📊 Results & Observations

Why local LLMs?

Requirements

Author

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages