tesseract-trainer

This is a set of two tools for generating OCR training files for Tesseract. It is designed for image files that contain only a few characters. It helps you create box files, assuming each image's file name matches the text the image contains.
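The box-file step can be pictured with a minimal sketch. It assumes Tesseract 3's box format (`<char> <left> <bottom> <right> <top> <page>`, with y measured from the bottom edge of the image). As a crude stand-in for real glyph detection, it splits the image width evenly among the characters in the file name. The function name `box_lines_from_filename` is hypothetical and is not taken from this repository.

```python
from pathlib import Path


def box_lines_from_filename(image_path, width, height, page=0):
    """Build naive Tesseract box-file lines for an image whose file name
    (minus extension) is the text it contains.

    Assumption (hypothetical helper): characters are spread evenly across
    the image width and each one spans the full image height.
    """
    text = Path(image_path).stem  # "asdf.png" -> "asdf"
    if not text:
        raise ValueError("file name has no characters to box")
    lines = []
    for i, char in enumerate(text):
        # Integer division keeps box edges on whole pixels.
        left = i * width // len(text)
        right = (i + 1) * width // len(text)
        # Box-file line: <char> <left> <bottom> <right> <top> <page>
        lines.append(f"{char} {left} 0 {right} {height} {page}")
    return lines


if __name__ == "__main__":
    for line in box_lines_from_filename("asdf.png", width=200, height=50):
        print(line)
```

Real glyphs rarely sit on an even grid, so boxes produced this way usually need manual adjustment. Relying on file names also limits the text to characters your filesystem allows in a name.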

To run the trainer, point it at a directory containing image files and box files with matching names. For example, the directory might contain:

asdf.png
asdf.box
qwerty.png
qwerty.box

Each file name matches the characters its image contains.
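A quick pre-flight check that every image has a matching box file (and vice versa) might look like the sketch below. It assumes PNG images and `.box` files sitting side by side in one directory. `find_training_pairs` is a hypothetical helper, not part of trainer.py.

```python
from pathlib import Path


def find_training_pairs(directory, image_ext=".png"):
    """Return (image, box) path pairs that share a file name stem.

    Raises ValueError listing any image without a box file, or any box
    file without an image, since training needs both halves of each pair.
    """
    directory = Path(directory)
    images = {p.stem: p for p in directory.glob(f"*{image_ext}")}
    boxes = {p.stem: p for p in directory.glob("*.box")}
    missing_box = sorted(set(images) - set(boxes))
    missing_image = sorted(set(boxes) - set(images))
    if missing_box or missing_image:
        raise ValueError(
            f"images without .box: {missing_box}; "
            f".box without image: {missing_image}"
        )
    # Sort by stem so the pairing order is stable across runs.
    return [(images[stem], boxes[stem]) for stem in sorted(images)]
```

For the example directory above, this returns the pairs `asdf.png`/`asdf.box` and `qwerty.png`/`qwerty.box`. Failing early on a mismatch is cheaper than discovering a missing box file partway through training.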

Training produces a trained data file, "cap.traineddata" (if you keep the default font name 'cap').

Copy this file into /usr/local/share/tessdata to make the font available to Tesseract.
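Tesseract loads language data from files named `<lang>.traineddata`, so the font name doubles as the language code passed with `-l`. The sketch below copies the trained file into a tessdata directory and builds a command line that uses it. The helper names are hypothetical. The default path mirrors the one given above; packaged Tesseract installs often use a different tessdata location, and writing to /usr/local/share usually requires root.

```python
import shutil
from pathlib import Path


def install_traineddata(font="cap", build_dir=".", tessdata_dir="/usr/local/share/tessdata"):
    """Copy <font>.traineddata from build_dir into tessdata_dir.

    Hypothetical helper: assumes training left the file in build_dir.
    """
    source = Path(build_dir) / f"{font}.traineddata"
    if not source.is_file():
        raise FileNotFoundError(f"no trained file at {source}")
    destination = Path(tessdata_dir) / source.name
    shutil.copy2(source, destination)  # copy2 also preserves timestamps
    return destination


def ocr_command(image_path, font="cap", output_base="out"):
    """Command line that OCRs image_path with the installed font.

    Tesseract writes the recognised text to <output_base>.txt.
    """
    return ["tesseract", str(image_path), output_base, "-l", font]
```

After installing, `subprocess.run(ocr_command("asdf.png"), check=True)` should write the recognised text to `out.txt`.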
