Mareh OCR - "An One Time OCR"

The goal of this project is to enable OCR for any language and font, including handwriting. Instead of relying on a general algorithm trained on huge datasets, the training (or transfer learning) is done on the current dataset, i.e. the first pages of the book being processed.
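
As a rough illustration of the transfer-learning idea, the sketch below freezes a feature extractor and trains only a new output layer on a small labeled batch. It is not the project's actual training code: the `backbone`, the `head`, the 32x32 crop size, the 5 letter classes, and the random tensors standing in for labeled letter crops are all assumptions made for the example.

```python
import torch
from torch import nn

# Hypothetical stand-in for a network pretrained on other data.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU())
# New output layer: one unit per letter class found in the current book (assumed 5).
head = nn.Linear(64, 5)

# Freeze the pretrained part so that only the new head is updated.
for param in backbone.parameters():
    param.requires_grad = False

model = nn.Sequential(backbone, head)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder batch standing in for letters marked on the first pages.
images = torch.rand(16, 1, 32, 32)
labels = torch.randint(0, 5, (16,))

backbone_weights_before = backbone[1].weight.clone()
for _ in range(3):  # a few fine-tuning steps
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```

Freezing the backbone keeps the number of trainable parameters small, which matters when only a few marked pages are available.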

Current state

A basic tool for marking, classifying, and viewing the data is ready.
A basic NN model for detecting letters was added (based on the EAST word detection network).
A basic NN model for identifying letters was added (the simplest vanilla CNN).
The GUI uses tkinter; the NN models use PyTorch.
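
For readers unfamiliar with the term, a "vanilla CNN" letter identifier in PyTorch might look roughly like the sketch below. The class name `VanillaLetterCNN`, the layer sizes, the 32x32 grayscale input, and the 27 classes are assumptions for illustration, not the architecture used in `letter_classifier`.

```python
import torch
from torch import nn


class VanillaLetterCNN(nn.Module):
    """Hypothetical minimal classifier for fixed-size grayscale letter crops."""

    def __init__(self, num_classes: int, image_size: int = 32):
        super().__init__()
        # Two conv + pool stages; each pooling halves the spatial size.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        side = image_size // 4  # spatial size after two 2x2 poolings
        self.classifier = nn.Linear(32 * side * side, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        # Flatten everything except the batch dimension, then score each class.
        return self.classifier(torch.flatten(x, start_dim=1))


model = VanillaLetterCNN(num_classes=27)
batch = torch.zeros(8, 1, 32, 32)  # 8 placeholder letter crops
logits = model(batch)              # one row of class scores per crop
```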

Todos by category (some are optional)

App

improve the MVC structure
add support for moving letters
support marking just part of a page
add visualization for the training and inference processes
add duplication detection

Deep Learning

General

investigate strange loss graphs
choose the network architectures carefully
add ground-truth (GT) page visualization (boxes drawn as an image)

Detector net

ignore missed and false letter detections near the boundary

Identifier net

split letters by logic during training
add automatic letter clustering
add support for punctuation and diacritics (e.g. nikud in Hebrew)

Post processing

add a tool that groups the letters on a page into words and lines
detect spaces
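
One possible approach to these two items, sketched under stated assumptions: letters arrive as `(x, y, width, height)` boxes, lines are roughly horizontal, and a horizontal gap wider than a fraction of the median letter width counts as a space. The function name `group_letters` and the `space_factor` parameter are hypothetical, and the result is in left-to-right order (reverse each line for right-to-left scripts such as Hebrew).

```python
from statistics import median


def group_letters(boxes, space_factor=0.5):
    """Group letter boxes (x, y, w, h) into lines, each a list of words."""
    lines = []
    # Visit boxes from top to bottom by vertical centre.
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        center_y = box[1] + box[3] / 2
        # A box joins the latest line if its centre lies inside that line's span.
        if lines and lines[-1]["top"] <= center_y <= lines[-1]["bottom"]:
            line = lines[-1]
            line["boxes"].append(box)
            line["top"] = min(line["top"], box[1])
            line["bottom"] = max(line["bottom"], box[1] + box[3])
        else:
            lines.append({"top": box[1], "bottom": box[1] + box[3], "boxes": [box]})

    result = []
    for line in lines:
        letters = sorted(line["boxes"], key=lambda b: b[0])
        # Assumed heuristic: a gap above this threshold is treated as a space.
        threshold = space_factor * median(b[2] for b in letters)
        words = [[letters[0]]]
        for prev, cur in zip(letters, letters[1:]):
            gap = cur[0] - (prev[0] + prev[2])
            if gap > threshold:
                words.append([cur])  # space detected: start a new word
            else:
                words[-1].append(cur)
        result.append(words)
    return result


boxes = [(0, 0, 10, 10), (12, 0, 10, 10), (40, 1, 10, 10), (0, 30, 10, 10)]
page = group_letters(boxes)
```

In this example the first line splits into a two-letter word and a one-letter word, and the box at `y = 30` forms a second line. A fixed `space_factor` is only a starting point; handwriting with uneven spacing would likely need a threshold learned per page.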


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Mareh OCR - "An One Time OCR"

Current state

Todo's by categories (some are optional)

App

DeepLearning

General

Detector net

Identifier net

Post processing

About

Releases

Packages

Languages

eliyash/OneTimeOcr

Folders and files

Latest commit

History

Repository files navigation

Mareh OCR - "An One Time OCR"

Current state

Todo's by categories (some are optional)

App

DeepLearning

General

Detector net

Identifier net

Post processing

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages