Transaction Parser is an AI-powered add-on for ERPNext that automatically extracts data from PDFs and creates draft documents (Sales Order / Purchase Invoice). It supports multiple document types and regions, making it easier to digitize and process business documents.
- AI-Powered Extraction: Uses AI models from OpenAI, DeepSeek, Google Gemini, and Anthropic to extract structured data from PDFs
- Multi-Document Support: Handles Sales Orders and Purchase Invoices (Expenses)
- Regional Support: Special handling for India-specific requirements (GSTIN, PAN, HSN codes)
- Email Integration: Automatically processes documents from incoming emails
- Customizable Schemas: Flexible field mapping and custom schema support
- Smart Item Matching: Automatically matches items from previous invoices
1. Enable Transaction Parser: Navigate to Transaction Parser Settings and configure the following.
2. Default AI Model: Select from available models:
   - DeepSeek Chat
   - DeepSeek Reasoner
   - OpenAI gpt-4o
   - OpenAI gpt-4o-mini
   - OpenAI gpt-5
   - OpenAI gpt-5-mini
   - Google Gemini Pro-2.5
   - Google Gemini Flash-2.5
   - Claude Haiku-4.5
3. API Keys Setup
Add your API keys for the AI services:
| Service Provider | Models Supported |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-5, gpt-5-mini |
| DeepSeek | deepseek-chat, deepseek-reasoner |
| Google | gemini-2.5-pro, gemini-2.5-flash |
| Anthropic | claude-haiku-4-5 |
4. Email Configuration (Optional): To automatically process documents from emails:
   - Parse Incoming Emails: Enable email processing
   - Incoming Email Accounts: Configure which email accounts to monitor
   - Party Emails: Map email addresses to specific customers/suppliers
5. Transaction Configuration
- Navigate to Sales Order or Purchase Invoice list view
- Click on Actions → Parse Sales Order/Expense Invoice
- Upload your PDF file
- Select:
  - AI Model: Choose the AI model to use
  - Country: Select India or Other
  - Page Limit: (Optional) Limit the number of pages to process
- Click Submit
The system will:
- Extract text from the PDF
- Send it to the AI model for processing
- Create a draft document with extracted data
- Attach the original PDF to the created document
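The four steps above can be sketched as a small pipeline. This is only an illustration of the flow, not the app's actual code: `DraftDocument`, `parse_transaction`, `extract_text`, and `ask_model` are hypothetical names, and the text extractor and AI call are passed in as plain callables so the sketch stays independent of any particular PDF processor or provider.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DraftDocument:
    """Hypothetical stand-in for a draft Sales Order / Purchase Invoice."""
    doctype: str
    data: dict
    attachments: list = field(default_factory=list)


def parse_transaction(
    pdf_path: str,
    doctype: str,
    extract_text: Callable[[str, Optional[int]], str],
    ask_model: Callable[[str], dict],
    page_limit: Optional[int] = None,
) -> DraftDocument:
    # 1. Extract text from the PDF (optionally limited to the first pages).
    text = extract_text(pdf_path, page_limit)
    # 2. Send the text to the AI model and get structured data back.
    data = ask_model(text)
    # 3. Create a draft document from the extracted data.
    draft = DraftDocument(doctype=doctype, data=data)
    # 4. Attach the original PDF to the created document.
    draft.attachments.append(pdf_path)
    return draft
```

Keeping the extractor and the model call as parameters mirrors the fact that both the PDF processor and the AI model are selectable; the real integration points inside the app may look different.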
When email processing is enabled, the system automatically:
- Monitors configured email accounts
- Extracts PDF attachments from emails
- Processes them based on sender and configuration
- Creates draft documents
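To make the email flow above more concrete, here is a minimal sketch using only Python's standard `email` package. It shows how PDF attachments could be pulled out of a message and how a sender address could be mapped to a party. The function names and the `party_emails` dictionary are hypothetical and do not reflect how the app stores or reads its Party Emails setting.

```python
from email.message import EmailMessage
from email.utils import parseaddr
from typing import Optional


def pdf_attachments(msg: EmailMessage) -> list:
    """Return (filename, bytes) pairs for every PDF attached to the message."""
    found = []
    for part in msg.iter_attachments():
        if part.get_content_type() == "application/pdf":
            filename = part.get_filename() or "attachment.pdf"
            # decode=True undoes the transfer encoding (e.g. base64).
            found.append((filename, part.get_payload(decode=True)))
    return found


def party_for_sender(msg: EmailMessage, party_emails: dict) -> Optional[str]:
    """Look up the customer/supplier mapped to the sender address, if any."""
    _, address = parseaddr(str(msg.get("From", "")))
    return party_emails.get(address.lower())
```

A message whose sender has no mapping simply returns `None` here; how the app itself treats unmapped senders is not covered by this sketch.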
| Model | Provider | Best For | Speed | Cost |
|---|---|---|---|---|
| gpt-5 | OpenAI | State-of-the-art accuracy, complex multi-page documents | Medium | High |
| gpt-5-mini | OpenAI | Efficient reasoning, cost-effective | Fast | Medium |
| gpt-4o | OpenAI | Complex documents, high accuracy | Medium | Medium-High |
| gpt-4o-mini | OpenAI | Cost-effective, good accuracy | Fast | Low |
| gemini-2.5-pro | Google | Advanced reasoning, large context window | Medium | Medium |
| gemini-2.5-flash | Google | Fast processing, bulk documents | Very Fast | Low |
| deepseek-chat | DeepSeek | General purpose extraction | Fast | Low |
| deepseek-reasoner | DeepSeek | Complex reasoning tasks | Slow | Medium |
| claude-haiku-4-5 | Anthropic | Fast, lightweight tasks | Fast | Low |
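As a rough aid for choosing a model, the comparison above can be encoded as data and filtered. The values below are copied from the table; `ModelInfo` and `models_matching` are illustrative names, not part of the app.

```python
from typing import NamedTuple, Optional


class ModelInfo(NamedTuple):
    name: str
    provider: str
    speed: str
    cost: str


# Rows taken from the model comparison table.
MODELS = [
    ModelInfo("gpt-5", "OpenAI", "Medium", "High"),
    ModelInfo("gpt-5-mini", "OpenAI", "Fast", "Medium"),
    ModelInfo("gpt-4o", "OpenAI", "Medium", "Medium-High"),
    ModelInfo("gpt-4o-mini", "OpenAI", "Fast", "Low"),
    ModelInfo("gemini-2.5-pro", "Google", "Medium", "Medium"),
    ModelInfo("gemini-2.5-flash", "Google", "Very Fast", "Low"),
    ModelInfo("deepseek-chat", "DeepSeek", "Fast", "Low"),
    ModelInfo("deepseek-reasoner", "DeepSeek", "Slow", "Medium"),
    ModelInfo("claude-haiku-4-5", "Anthropic", "Fast", "Low"),
]


def models_matching(cost: Optional[str] = None, provider: Optional[str] = None) -> list:
    """Filter the table by cost tier and/or provider (None means 'any')."""
    return [
        m for m in MODELS
        if (cost is None or m.cost == cost)
        and (provider is None or m.provider == provider)
    ]
```

For example, `models_matching(cost="Low")` lists the low-cost options, which is a reasonable starting point for bulk processing of simple documents.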
The Transaction Parser app includes robust support for Indian business requirements through integration with the India Compliance app. These features enable automatic handling of GST regulations, Indian business identifiers, and region-specific validation requirements.
- India Compliance App: Must be installed for India-specific features to work
Enhanced Data Extraction - When processing documents with the India region selected, the AI models are enhanced to extract:
- GST Identification Numbers (GSTIN)
- Permanent Account Numbers (PAN)
- HSN/SAC Codes
- Tax Components
GSTIN-Based Supplier Creation - When enabled in settings, the system can automatically create suppliers:
- Configuration
  - Enable "Auto Create Supplier" in Transaction Parser Settings
  - Requires a valid GSTIN in the invoice
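Since supplier creation depends on a valid GSTIN, a quick format check can help when reviewing extracted values. The sketch below only checks the general 15-character GSTIN layout (2-digit state code, 10-character PAN, entity code, the letter `Z`, and a check character) and derives the embedded PAN. It does not verify the check character or confirm that the GSTIN is registered, and it is not the validation performed by the app or by India Compliance; the function names are illustrative.

```python
import re
from typing import Optional

# Layout: 2-digit state code + PAN (5 letters, 4 digits, 1 letter)
# + entity code + literal 'Z' + check character.
GSTIN_PATTERN = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")


def looks_like_gstin(value: str) -> bool:
    """Format-only check; the check character is not verified."""
    return GSTIN_PATTERN.fullmatch(value.strip().upper()) is not None


def pan_from_gstin(gstin: str) -> Optional[str]:
    """Return the PAN embedded in characters 3-12, or None if the format is off."""
    cleaned = gstin.strip().upper()
    if not looks_like_gstin(cleaned):
        return None
    return cleaned[2:12]
```

A value that fails this check is almost certainly a misread by the AI model or the text extractor and is worth correcting in the draft before submitting.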
- Transaction Parser supports three PDF processors for text extraction.
- Only PDFtoText (the default) is installed as a required dependency.
- The other two are optional.
# Install OCRMyPDF
env/bin/pip install -e "apps/transaction_parser[ocrmypdf]"
# Install Docling
env/bin/pip install -e "apps/transaction_parser[docling]"
# Install all optional processors
env/bin/pip install -e "apps/transaction_parser[all]"

Layout-preserving text extraction using pdftotext.
Important
Install OS dependencies before running bench setup requirements or pip install, otherwise the pdftotext Python package will fail to build.
OS Dependencies (Debian/Ubuntu):
sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev

For other operating systems, see pdftotext OS dependencies.
OCR-based text extraction using OCRmyPDF. Useful for scanned or image-based PDFs.
OS Dependencies (Debian/Ubuntu):
sudo apt-get install -y tesseract-ocr ghostscript

Advanced document understanding using Docling with EasyOCR for OCR support.
See Docling OCR engines for more details.
Post-install fix for headless servers:
After installing the docling extra, replace opencv-python with the headless variant:
bench pip uninstall opencv-python
bench pip install opencv-python-headless

This is required because opencv-python depends on libGL.so.1, which is unavailable on headless servers:
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

| Processor | Dependency Type | OS Packages Required | OCR |
|---|---|---|---|
| PDFtoText | Required | build-essential libpoppler-cpp-dev pkg-config python3-dev | No |
| OCRMyPDF | Optional | tesseract-ocr ghostscript | Yes |
| Docling | Optional | None | Yes |
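To see which of the processors in the table are usable in a given environment, a small check like the following can be run with the bench's Python interpreter. It is a sketch under a few assumptions: the import names `pdftotext`, `ocrmypdf`, and `docling` match the installed packages, and the Debian/Ubuntu packages `tesseract-ocr` and `ghostscript` provide the `tesseract` and `gs` executables. The build-time packages for PDFtoText are not checked, since they are only needed when the package is compiled.

```python
import importlib.util
import shutil

# Import name and runtime executables assumed for each processor.
PROCESSORS = {
    "PDFtoText": {"module": "pdftotext", "binaries": []},
    "OCRMyPDF": {"module": "ocrmypdf", "binaries": ["tesseract", "gs"]},
    "Docling": {"module": "docling", "binaries": []},
}


def processor_status() -> dict:
    """Report, per processor, whether its module and executables are present."""
    status = {}
    for name, req in PROCESSORS.items():
        module_installed = importlib.util.find_spec(req["module"]) is not None
        missing = [b for b in req["binaries"] if shutil.which(b) is None]
        status[name] = {
            "module_installed": module_installed,
            "missing_binaries": missing,
        }
    return status


if __name__ == "__main__":
    for name, info in processor_status().items():
        print(name, info)
```

A processor is likely ready to use when `module_installed` is true and `missing_binaries` is empty. This check does not detect the `libGL.so.1` issue on headless servers, which only surfaces when OpenCV is actually imported.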

