This project aims to extract entity values (such as weight, volume, dimensions) from product images using machine learning techniques. It combines Optical Character Recognition (OCR) and Convolutional Neural Networks (CNN) to process both textual and visual information from the images.
Setup:
- Clone the repository
- Install dependencies: `pip install -r requirements.txt`
- Download and preprocess the dataset using `data_preparation.py`

Usage:
- Prepare the data: `python data_preparation.py`
- Extract features: `python feature_extraction.py`
- Train the model: `python model_training.py`
- Generate predictions: `python predict.py`
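The four stages can also be chained from a single script. The following is a minimal sketch, assuming the scripts sit in the current directory, take no command-line arguments, and are meant to run in the order listed; `build_commands` and `run_pipeline` are hypothetical helper names, not part of the repository.

```python
import subprocess
import sys

# Script names come from the steps above; the order is an assumption
# based on how they are listed.
PIPELINE = [
    "data_preparation.py",
    "feature_extraction.py",
    "model_training.py",
    "predict.py",
]


def build_commands(python=sys.executable):
    """Return one argument list per pipeline stage."""
    return [[python, script] for script in PIPELINE]


def run_pipeline(runner=subprocess.run):
    """Run each stage in order, stopping at the first failure."""
    for cmd in build_commands():
        # check=True makes subprocess.run raise CalledProcessError
        # when a stage exits with a non-zero status.
        runner(cmd, check=True)
```

Calling `run_pipeline()` from the repository root would then run all four stages. Stopping on the first failure avoids training on data that was only partially prepared.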
The model uses a hybrid architecture:
- OCR branch: Embedding layer followed by LSTM
- CNN branch: Pre-extracted features processed by fully connected layers
- Combined output: Concatenated features passed through fully connected layers
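The three parts above could be wired together roughly as follows. This is only a sketch in PyTorch, which is an assumption: the README does not say which framework `model_training.py` uses. The class name `HybridModel`, all layer sizes, the vocabulary size, and the treatment of the output as class logits are illustrative placeholders as well.

```python
import torch
import torch.nn as nn


class HybridModel(nn.Module):
    """Illustrative two-branch model; every size here is a placeholder."""

    def __init__(self, vocab_size=5000, embed_dim=64, lstm_hidden=128,
                 cnn_feature_dim=2048, num_classes=10):
        super().__init__()
        # OCR branch: token ids -> embeddings -> LSTM
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, lstm_hidden, batch_first=True)
        # CNN branch: pre-extracted image features -> fully connected layers
        self.cnn_fc = nn.Sequential(
            nn.Linear(cnn_feature_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # Combined head: concatenated branch outputs -> fully connected layers
        self.head = nn.Sequential(
            nn.Linear(lstm_hidden + 128, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, token_ids, cnn_features):
        embedded = self.embedding(token_ids)       # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)          # h_n: (1, batch, lstm_hidden)
        text_vec = h_n[-1]                         # (batch, lstm_hidden)
        image_vec = self.cnn_fc(cnn_features)      # (batch, 128)
        combined = torch.cat([text_vec, image_vec], dim=1)
        return self.head(combined)                 # (batch, num_classes) logits


# Example forward pass on random inputs.
model = HybridModel()
token_ids = torch.randint(1, 5000, (4, 20))   # 4 OCR token sequences of length 20
cnn_features = torch.randn(4, 2048)           # 4 pre-extracted feature vectors
logits = model(token_ids, cnn_features)       # shape: (4, 10)
```

The sketch uses the LSTM's final hidden state as the text representation and does not mask padded positions; a real implementation might pack the sequences or pool over time steps instead.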
Results:
- Validation accuracy: 87%
- F1 score: 0.85
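For reference, the two metrics above can be computed from true and predicted labels as sketched below. The README does not say whether the F1 score is macro, micro, or weighted averaged; macro averaging is assumed here, and the function names and the toy labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 score for one label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is matched
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy example with made-up unit labels.
y_true = ["g", "g", "ml", "cm"]
y_pred = ["g", "ml", "ml", "cm"]
print(accuracy(y_true, y_pred))   # 0.75
print(macro_f1(y_true, y_pred))   # approximately 0.778
```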
Possible future improvements:
- Implement data augmentation techniques
- Explore more advanced OCR methods
- Fine-tune hyperparameters using techniques like Bayesian optimization
For any questions or issues, please open an issue in the GitHub repository.