Tools used to generate training files for the Tesseract OCR project

REMitchell/tesseract-trainer

tesseract-trainer

This is a set of two tools for generating OCR training files for Tesseract. It is designed mainly for images that contain only a few characters. It helps you create box files, assuming that each image's file name matches the text the image contains.
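Because the file name supplies the text, a box file can be drafted from the name alone. The sketch below is hypothetical and does not reproduce this repository's code. It assumes the characters divide the image width evenly and span its full height, then writes lines in Tesseract's box format (`<char> <left> <bottom> <right> <top> <page>`, origin at the bottom-left). The function names and the `width`/`height` parameters are illustrative.

```python
from __future__ import annotations

import os


def make_box_lines(text: str, width: int, height: int, page: int = 0) -> list[str]:
    """Return one box-file line per character of `text`.

    Naive assumption: the characters share the image width equally and each
    spans the full image height. Real glyphs rarely line up this neatly, so
    the boxes usually need manual adjustment afterwards.
    """
    n = len(text)
    lines = []
    for i, ch in enumerate(text):
        left = i * width // n
        right = (i + 1) * width // n
        # Box-file format: <char> <left> <bottom> <right> <top> <page>,
        # with the origin at the bottom-left corner of the image.
        lines.append(f"{ch} {left} 0 {right} {height} {page}")
    return lines


def box_file_text(image_path: str, width: int, height: int) -> str:
    """Derive the expected text from the file name, e.g. 'asdf.png' -> 'asdf'."""
    stem = os.path.splitext(os.path.basename(image_path))[0]
    return "".join(line + "\n" for line in make_box_lines(stem, width, height))
```

For an 80x20 image named asdf.png, `box_file_text("asdf.png", 80, 20)` gives four 20-pixel-wide boxes, one each for a, s, d and f. In practice you would read the real dimensions with an imaging library such as Pillow (`Image.open(path).size`).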

To run the Tesseract trainer, point it at a directory that contains a set of image files and a set of box files with matching file names. For example, the directory might contain:

  • asdf.png
  • asdf.box
  • qwerty.png
  • qwerty.box

Each file name matches the characters its image contains. For example, asdf.png is an image of the text "asdf".
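Before training, it helps to confirm that every image has a box file with the same base name. The helper below is a hypothetical sketch, not part of this repository. It assumes PNG images, as in the example listing, and reports any image that has no matching .box file.

```python
from pathlib import Path


def find_training_pairs(directory: str, image_ext: str = ".png"):
    """Pair each image with the box file that shares its base name.

    Returns (pairs, missing): `pairs` is a sorted list of (image, box) paths,
    and `missing` lists images that have no matching .box file.
    """
    pairs, missing = [], []
    for image in sorted(Path(directory).glob(f"*{image_ext}")):
        box = image.with_suffix(".box")  # asdf.png -> asdf.box
        if box.exists():
            pairs.append((image, box))
        else:
            missing.append(image)
    return pairs, missing
```

For the example directory, the result is two pairs (asdf and qwerty) and an empty `missing` list.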

Training produces a trained data file named "cap.traineddata" (if you use the default font name, 'cap').
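This README does not list the commands the trainer runs. The sketch below assumes the legacy Tesseract 3.02-era training tools (`tesseract ... box.train`, `unicharset_extractor`, `mftraining`, `cntraining`, `combine_tessdata`), which are one common way to produce a `<lang>.traineddata` file. The repository's actual steps may differ. It also assumes each image and box file sits in the working directory and that a `font_properties` file already exists there. The function names are hypothetical.

```python
import os
import subprocess

# Files that mftraining and cntraining write without a language prefix.
UNPREFIXED_OUTPUTS = ("inttemp", "normproto", "pffmtable", "shapetable")


def training_commands(stems, lang="cap"):
    """Return argv lists for the legacy (3.x) Tesseract training steps.

    Assumes each stem has <stem>.png and <stem>.box in the working directory
    and that a font_properties file already exists there.
    """
    tr_files = [f"{s}.tr" for s in stems]
    # One .tr feature file per image, built from the image and its box file
    cmds = [["tesseract", f"{s}.png", s, "box.train"] for s in stems]
    # Character set extracted from all box files -> "unicharset"
    cmds.append(["unicharset_extractor", *[f"{s}.box" for s in stems]])
    # Writes inttemp, pffmtable, shapetable and <lang>.unicharset
    cmds.append(["mftraining", "-F", "font_properties", "-U", "unicharset",
                 "-O", f"{lang}.unicharset", *tr_files])
    # Writes normproto
    cmds.append(["cntraining", *tr_files])
    return cmds


def run_training(stems, lang="cap", workdir="."):
    """Run the steps above, then bundle the results into <lang>.traineddata."""
    for argv in training_commands(stems, lang):
        subprocess.run(argv, cwd=workdir, check=True)
    for name in UNPREFIXED_OUTPUTS:
        # combine_tessdata expects every component to carry the "<lang>." prefix
        os.replace(os.path.join(workdir, name), os.path.join(workdir, f"{lang}.{name}"))
    subprocess.run(["combine_tessdata", f"{lang}."], cwd=workdir, check=True)
    return os.path.join(workdir, f"{lang}.traineddata")
```

The command list is kept separate from execution so that you can inspect or log the plan before running anything.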

Copy this file into /usr/local/share/tessdata to make the font available to Tesseract.
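The install step can be scripted as shown below. This is a minimal sketch with hypothetical helper names. It uses the tessdata path given above, which may differ on your system. Tesseract selects `<lang>.traineddata` through its `-l` option. The `stdout` output base, which prints the result instead of writing a file, is assumed to be supported by your Tesseract version.

```python
import shutil
from pathlib import Path

# Default tessdata location named in this README; adjust for your install.
TESSDATA_DIR = "/usr/local/share/tessdata"


def install_traineddata(src, tessdata_dir=TESSDATA_DIR):
    """Copy a .traineddata file into the tessdata directory and return its new path."""
    dest = Path(tessdata_dir) / Path(src).name
    shutil.copy2(src, dest)  # writing under /usr/local usually needs root
    return dest


def ocr_command(image, lang="cap"):
    """Build a tesseract call that loads <lang>.traineddata and prints to stdout."""
    return ["tesseract", str(image), "stdout", "-l", lang]
```

After installing, `ocr_command("asdf.png")` corresponds to the shell command `tesseract asdf.png stdout -l cap`. You can pass it to `subprocess.run(..., capture_output=True, text=True)` to read the recognized text.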
