Transaction Parser

Overview

Transaction Parser is an AI-powered add-on for ERPNext that automatically extracts data from PDFs and creates draft documents (Sales Orders and Purchase Invoices). It supports multiple document types and regions, making it easier to digitize and process business documents.

Features

AI-Powered Extraction: Uses AI models from OpenAI, DeepSeek, Google (Gemini), and Anthropic to extract structured data from PDFs
Multi-Document Support: Handles Sales Orders and Purchase Invoices (Expenses)
Regional Support: Special handling for India-specific requirements (GSTIN, PAN, HSN codes)
Email Integration: Automatically processes documents from incoming emails
Customizable Schemas: Flexible field mapping and custom schema support
Smart Item Matching: Automatically matches items from previous invoices

Configuration

1. Enable Transaction Parser

Navigate to Transaction Parser Settings and configure:

Enable: Check to activate the app

2. Default AI Model: Select from available models:

DeepSeek Chat
DeepSeek Reasoner
OpenAI gpt-4o
OpenAI gpt-4o-mini
OpenAI gpt-5
OpenAI gpt-5-mini
Google Gemini Pro-2.5
Google Gemini Flash-2.5
Claude Haiku-4.5

3. API Keys Setup

Add your API keys for the AI services:

Service Provider	Models Supported
OpenAI	gpt-4o, gpt-4o-mini, gpt-5, gpt-5-mini
DeepSeek	deepseek-chat, deepseek-reasoner
Google	gemini-2.5-pro, gemini-2.5-flash
Anthropic	claude-haiku-4-5
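
The table above can be read as a lookup from model identifier to provider, which tells you which API key a given model needs. A minimal sketch in Python; the names `MODELS_BY_PROVIDER` and `provider_for` are illustrative and are not taken from the app's code:

```python
# Illustrative lookup built from the table above; not the app's own code.
MODELS_BY_PROVIDER = {
    "OpenAI": ["gpt-4o", "gpt-4o-mini", "gpt-5", "gpt-5-mini"],
    "DeepSeek": ["deepseek-chat", "deepseek-reasoner"],
    "Google": ["gemini-2.5-pro", "gemini-2.5-flash"],
    "Anthropic": ["claude-haiku-4-5"],
}


def provider_for(model: str) -> str:
    """Return the provider whose API key is needed for `model`."""
    for provider, models in MODELS_BY_PROVIDER.items():
        if model in models:
            return provider
    raise ValueError(f"Unsupported model: {model}")


print(provider_for("gemini-2.5-flash"))  # Google
```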

4. Email Configuration (Optional)

To automatically process documents from emails:

Parse Incoming Emails: Enable email processing
Incoming Email Accounts: Configure which email accounts to monitor
Party Emails: Map email addresses to specific customers/suppliers

5. Transaction Configuration

Invoice Lookback Count: Number of past invoices to consider for item matching (default: 5)
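
This README does not describe how item matching works internally. As a rough illustration of what a lookback window could mean, the sketch below builds a description-to-item-code index from the most recent invoices and matches a new description against it with fuzzy string matching. The data shape (`items`, `description`, `item_code`), the function names, and the use of `difflib` are all assumptions made for this example.

```python
from difflib import get_close_matches


def build_item_index(past_invoices, lookback=5):
    """Map lowercased descriptions to item codes from the newest `lookback` invoices.

    `past_invoices` is assumed to be ordered newest first.
    """
    index = {}
    for invoice in past_invoices[:lookback]:
        for row in invoice["items"]:
            # setdefault keeps the mapping from the most recent invoice.
            index.setdefault(row["description"].lower(), row["item_code"])
    return index


def match_item(description, index, cutoff=0.6):
    """Return the item code of the closest known description, or None."""
    hits = get_close_matches(description.lower(), list(index), n=1, cutoff=cutoff)
    return index[hits[0]] if hits else None


history = [
    {"items": [{"description": "A4 Copier Paper 80gsm", "item_code": "PAPER-A4"}]},
    {"items": [{"description": "Blue Ballpoint Pen", "item_code": "PEN-BLUE"}]},
]
index = build_item_index(history, lookback=5)
print(match_item("A4 copier paper 80 gsm", index))
```

A larger lookback gives the matcher more history to draw on, at the cost of considering older and possibly outdated item mappings.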

Usage

Manual Document Processing

1. Navigate to the Sales Order or Purchase Invoice list view
2. Click on Actions → Parse Sales Order/Expense Invoice
3. Upload your PDF file
4. Select:
   - AI Model: Choose the AI model to use
   - Country: Select India or Other
   - Page Limit: (Optional) Limit pages to process
5. Click Submit


The system will:

Extract text from the PDF
Send it to the AI model for processing
Create a draft document with extracted data
Attach the original PDF to the created document
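
The steps above can be sketched as a small pipeline. This is not the app's implementation: `parse_transaction`, the prompt wording, and the `call_model` callable (a stand-in for a real provider client) are assumptions for illustration. The only ERPNext-specific detail used is that `docstatus` 0 denotes a draft document.

```python
import json
from typing import Callable, Optional


def parse_transaction(
    pages: list[str],
    call_model: Callable[[str], str],
    page_limit: Optional[int] = None,
) -> dict:
    """Turn extracted PDF page text into a draft-document dict."""
    if page_limit:
        # Honor the optional page limit before building the prompt.
        pages = pages[:page_limit]
    prompt = "Extract the transaction as JSON.\n\n" + "\n\n".join(pages)
    data = json.loads(call_model(prompt))
    if not isinstance(data, dict):
        raise ValueError("Model did not return a JSON object")
    data["docstatus"] = 0  # 0 marks a draft in Frappe/ERPNext
    return data


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; returns a fixed JSON payload."""
    return json.dumps({"supplier": "Acme", "items": [{"description": "Widget", "qty": 2}]})


draft = parse_transaction(["page 1 text", "page 2 text"], fake_model, page_limit=1)
print(draft["supplier"], draft["docstatus"])
```

Passing the model as a callable keeps the sketch independent of any one provider's SDK.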

Automatic Email Processing

When enabled, the system automatically:

Monitors configured email accounts
Extracts PDF attachments from emails
Processes them based on sender and configuration
Creates draft documents
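
As an illustration of the attachment and sender steps, the sketch below uses Python's standard `email` package to pull PDF attachments out of a message and look up the sender in a party mapping. The function names and the shape of `party_emails` (a plain dict of address to party name) are assumptions; the app's own handling through ERPNext email accounts may differ.

```python
from email.message import EmailMessage
from email.utils import parseaddr


def pdf_attachments(msg: EmailMessage) -> list[tuple[str, bytes]]:
    """Return (filename, content) for every PDF attachment in `msg`."""
    found = []
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        if part.get_content_type() == "application/pdf" or name.lower().endswith(".pdf"):
            found.append((name, part.get_payload(decode=True)))
    return found


def resolve_party(msg: EmailMessage, party_emails: dict):
    """Map the sender address to a configured party, or None if unknown."""
    _, address = parseaddr(str(msg.get("From", "")))
    return party_emails.get(address.lower())


# Build a small message with one PDF and one non-PDF attachment.
msg = EmailMessage()
msg["From"] = "Billing <billing@supplier.example>"
msg.set_content("Invoice attached.")
msg.add_attachment(b"%PDF-1.4 demo", maintype="application", subtype="pdf", filename="invoice.pdf")
msg.add_attachment(b"\x89PNG", maintype="image", subtype="png", filename="logo.png")

print([name for name, _ in pdf_attachments(msg)])
print(resolve_party(msg, {"billing@supplier.example": "Supplier A"}))
```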

Model Comparison

Model	Provider	Best For	Speed	Cost
gpt-5	OpenAI	State-of-the-art accuracy, complex multi-page documents	Medium	High
gpt-5-mini	OpenAI	Efficient reasoning, cost-effective	Fast	Medium
gpt-4o	OpenAI	Complex documents, high accuracy	Medium	Medium-High
gpt-4o-mini	OpenAI	Cost-effective, good accuracy	Fast	Low
gemini-2.5-pro	Google	Advanced reasoning, large context window	Medium	Medium
gemini-2.5-flash	Google	Fast processing, bulk documents	Very Fast	Low
deepseek-chat	DeepSeek	General purpose extraction	Fast	Low
deepseek-reasoner	DeepSeek	Complex reasoning tasks	Slow	Medium
claude-haiku-4-5	Anthropic	Fast, lightweight tasks	Fast	Low

India-Specific Features

The Transaction Parser app includes robust support for Indian business requirements through integration with the India Compliance app. These features enable automatic handling of GST regulations, Indian business identifiers, and region-specific validation requirements.

Prerequisites

India Compliance App: Must be installed for India-specific features to work

India-Specific AI Model Enhancements

Enhanced Data Extraction: When India is selected as the country, the AI models also extract:

GST Identification Numbers (GSTIN)
Permanent Account Numbers (PAN)
HSN/SAC Codes
Tax Components
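
Extracted identifiers can be sanity-checked by shape before use. The sketch below is not part of the app: it checks only the common structure of a PAN (five letters, four digits, one letter) and of a regular taxpayer's GSTIN (two-digit state code, the ten-character PAN, an entity character, `Z`, and a check character). It does not verify the check character or the state code, and the function names are illustrative.

```python
import re

# Structural patterns only; no checksum or state-code validation.
PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")
GSTIN_RE = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")


def looks_like_pan(value: str) -> bool:
    """True if `value` has the shape of a PAN."""
    return bool(PAN_RE.fullmatch(value.strip().upper()))


def looks_like_gstin(value: str) -> bool:
    """True if `value` has the shape of a regular taxpayer's GSTIN."""
    return bool(GSTIN_RE.fullmatch(value.strip().upper()))


def pan_from_gstin(gstin: str) -> str:
    """Return the PAN embedded in characters 3 to 12 of a GSTIN."""
    cleaned = gstin.strip().upper()
    if not looks_like_gstin(cleaned):
        raise ValueError(f"Not a GSTIN-shaped value: {gstin!r}")
    return cleaned[2:12]


print(pan_from_gstin("27AAPFU0939F1ZV"))
```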

Automatic Supplier Creation

GSTIN-Based Supplier Creation: When enabled in settings, the system can automatically create suppliers.

Configuration
- Enable "Auto Create Supplier" in Transaction Parser Settings
- Requires valid GSTIN in the invoice

PDF Processor Setup

Transaction Parser supports three PDF processors for text extraction: PDFtoText, OCRMyPDF, and Docling.
Only PDFtoText (the default) is installed as a required dependency.
The other two are optional.

Installing Optional PDF Processors

# Install OCRMyPDF
env/bin/pip install -e "apps/transaction_parser[ocrmypdf]"

# Install Docling
env/bin/pip install -e "apps/transaction_parser[docling]"

# Install all optional processors
env/bin/pip install -e "apps/transaction_parser[all]"

1. PDFtoText (Default)

Layout-preserving text extraction using pdftotext.

Important

Install OS dependencies before running bench setup requirements or pip install, otherwise the pdftotext Python package will fail to build.

OS Dependencies (Debian/Ubuntu):

sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev

For other operating systems, see pdftotext OS dependencies.

2. OCRMyPDF (Optional)

OCR-based text extraction using OCRmyPDF. Useful for scanned or image-based PDFs.

OS Dependencies (Debian/Ubuntu):

sudo apt-get install -y tesseract-ocr ghostscript
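
To confirm that the OS packages above are visible on the PATH, a quick check along these lines may help. It is a sketch, not part of the app; it assumes the executables are named `tesseract` and `gs`, which is how the Debian/Ubuntu packages install them.

```python
import shutil


def missing_binaries(names=("tesseract", "gs")):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_binaries()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All OCRmyPDF OS dependencies found")
```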

3. Docling (Optional)

Advanced document understanding using Docling with EasyOCR for OCR support.

See Docling OCR engines for more details.

Post-install fix for headless servers:

After installing the docling extra, replace opencv-python with the headless variant:

bench pip uninstall opencv-python
bench pip install opencv-python-headless

This is required because opencv-python depends on libGL.so.1, which is unavailable on headless servers:

ImportError: libGL.so.1: cannot open shared object file: No such file or directory
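
To see which OpenCV distribution is currently installed, the following sketch queries package metadata with the standard library. The function name is illustrative; run it with the bench environment's Python interpreter so that it inspects the right site-packages.

```python
from importlib import metadata


def installed_opencv_variants() -> dict:
    """Return {distribution name: version} for the OpenCV wheels that are installed."""
    found = {}
    for name in ("opencv-python", "opencv-python-headless"):
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This variant is not installed; skip it.
            pass
    return found


print(installed_opencv_variants())
```

On a headless server, the expected result after the fix is that only `opencv-python-headless` is listed.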

Summary

Processor	Dependency Type	OS Packages Required	OCR
PDFtoText	Required	`build-essential libpoppler-cpp-dev pkg-config python3-dev`	No
OCRMyPDF	Optional	`tesseract-ocr ghostscript`	Yes
Docling	Optional	None	Yes

License

GNU General Public License (v3)
