A receipt scanner and reader built on tesseract-ocr and imagemagick. It performs five basic steps (hence the program’s name):
- scan receipt image (edge detection and warp transformation with opencv)
- preprocess scan (clean, sharpen, and adjust contrast)
- run OCR (tesseract for optical character recognition)
- analyze OCR output (with a fuzzy finder and a preconfigured dictionary)
- summarize analysis in a CSV file
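The last two steps (fuzzy analysis of the OCR output and the CSV summary) can be illustrated with a minimal sketch. This is not pentaplex's actual implementation: it swaps in `difflib` from the Python standard library as the fuzzy matcher, and the names `KNOWN_ITEMS`, `match_line`, and `summarize`, as well as the sample OCR lines, are hypothetical and chosen only for illustration.

```python
import csv
import difflib
import io

# Hypothetical dictionary of canonical labels (an assumption, not pentaplex's own).
KNOWN_ITEMS = ["milk", "bread", "butter", "total"]


def match_line(line, vocabulary=KNOWN_ITEMS, cutoff=0.6):
    """Return (label, amount) for one OCR line, or None if nothing matches."""
    # Expect "<text> <amount>"; split off the last whitespace-separated token.
    parts = line.rsplit(None, 1)
    if len(parts) != 2:
        return None
    text, raw_amount = parts
    try:
        # Accept both "2.50" and "2,50" as decimal notation.
        amount = float(raw_amount.replace(",", "."))
    except ValueError:
        return None
    # Fuzzy lookup: tolerate OCR slips such as "M1lk" for "milk".
    hits = difflib.get_close_matches(text.lower(), vocabulary, n=1, cutoff=cutoff)
    return (hits[0], amount) if hits else None


def summarize(ocr_lines):
    """Return CSV text with one row per recognised (item, amount) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["item", "amount"])
    for line in ocr_lines:
        match = match_line(line)
        if match:
            writer.writerow(match)
    return buffer.getvalue()


# Made-up OCR output with typical character confusions.
ocr_output = ["M1lk 1.20", "Bre4d 2,50", "THANK YOU", "T0TAL 3.70"]
print(summarize(ocr_output))
```

Lines that do not end in a number, or whose text is too far from every dictionary entry, are simply skipped; the `cutoff` value controls how tolerant the match is.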
To prepare for scanning, create a directory called imgs/ in the repository and place pictures of the receipts in it.
For example, in a terminal (cd into the repository first), type something like:
mkdir -p imgs/
cp ~/Downloads/*.JPG imgs/
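The same preparation can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name `collect_receipts` is hypothetical and not part of pentaplex, and the `*.JPG` pattern simply mirrors the command above.

```python
import shutil
from pathlib import Path


def collect_receipts(source, target="imgs"):
    """Copy every *.JPG file from `source` into `target`; return the copied names."""
    target_dir = Path(target)
    # Same effect as `mkdir -p`: no error if the directory already exists.
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # expanduser() resolves a leading "~" the way the shell would.
    for image in sorted(Path(source).expanduser().glob("*.JPG")):
        shutil.copy(image, target_dir / image.name)
        copied.append(image.name)
    return copied
```

Calling `collect_receipts("~/Downloads")` from the repository root mirrors the two shell commands.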
This program uses
To run pentaplex, type (again, cd into the repository first):
./pentaplex [optional: auto]
For code documentation visit: https://phdenzel.github.io/pentaplex/