
Search for words in an image using OCR, with spell check and correction using Peter Norvig's algorithm


pareddy113/image-word-finder


Most of the documents, images and newspapers we see are paper based, and it is always frustrating to search through thousands of words in an image, book or paper for a single word. Wouldn't it be great if we could search through them and find the locations of all occurrences of that word in the image? What if the words detected in the image or newspaper could be processed further to make them more accurate? How does spell check and correction of the output sound?

This project addresses all of the problems above. It takes an image as input, processes it and detects the text using the Tesseract OCR engine, then runs spell check and correction on the detected words to make the output more accurate. The spell check and correction is based on Peter Norvig's algorithm, which is simple yet has an accuracy of about 90% and can process around 10 words per second. You can then enter the words you want to search for: the program searches the entire image and reports not only whether each word is present but also the exact location of every occurrence.
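A minimal sketch of how the detected words and their locations can be pulled out of Tesseract, assuming pytesseract's `image_to_data` with `output_type=Output.DICT` (this is not necessarily how `main.py` does it). `OcrWord`, `extract_words` and `ocr_image` are hypothetical names introduced for illustration; `pytesseract` is imported lazily so the parsing logic can be used without Tesseract installed.

```python
from collections import namedtuple

# Hypothetical record type: one recognised word and its bounding box in pixels.
OcrWord = namedtuple("OcrWord", ["text", "left", "top", "width", "height", "conf"])


def extract_words(ocr_data):
    """Convert the dict from pytesseract.image_to_data(..., output_type=Output.DICT)
    into OcrWord records, skipping the empty rows Tesseract emits for
    pages, blocks, paragraphs and lines (those carry conf == -1)."""
    words = []
    for i, raw_text in enumerate(ocr_data["text"]):
        text = raw_text.strip()
        conf = float(ocr_data["conf"][i])  # str, int or float depending on version
        if not text or conf < 0:
            continue
        words.append(OcrWord(text,
                             int(ocr_data["left"][i]), int(ocr_data["top"][i]),
                             int(ocr_data["width"][i]), int(ocr_data["height"][i]),
                             conf))
    return words


def ocr_image(path):
    """Run Tesseract on an image file; needs pytesseract, Pillow and the tesseract binary."""
    # Imported here so extract_words can be used (and tested) without Tesseract.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(Image.open(path),
                                     output_type=pytesseract.Output.DICT)
    return extract_words(data)
```

For example, `ocr_image("4.png")` would return the words Tesseract finds in the default search image, each with the box needed to report its location.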

Pre-requisites:
Python 3.5
OpenCV for Python 3
NumPy
Pillow 3.x
Tesseract & pytesseract

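Package installation commands for macOS:
pip3 install opencv-python → OpenCV library for Python 3.x
pip3 install opencv-contrib-python → OpenCV with the extra contrib modules (the OpenCV packaging docs advise installing only one of these two packages)
pip3 install pillow → PIL fork for Python 3.x, a dependency of pytesseract
brew install tesseract → Tesseract OCR engine
pip3 install pytesseract → Python 3.x wrapper around the Tesseract OCR engine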
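After installing, a quick check that everything is in place can save some debugging. The script below is a hypothetical helper (`check_environment` is not part of the repository): it only looks up the Python modules with `importlib.util.find_spec` and checks that the `tesseract` binary is on `PATH`, since pytesseract just calls that program.

```python
import importlib.util
import shutil

# Import names of the Python pre-requisites (these differ from the pip package names).
REQUIRED_MODULES = {
    "OpenCV": "cv2",
    "NumPy": "numpy",
    "Pillow": "PIL",
    "pytesseract": "pytesseract",
}


def check_environment():
    """Return {component: True/False} for each pre-requisite."""
    status = {name: importlib.util.find_spec(module) is not None
              for name, module in REQUIRED_MODULES.items()}
    # pytesseract wraps the tesseract command-line program, so the binary
    # installed by `brew install tesseract` must also be reachable on PATH.
    status["tesseract binary"] = shutil.which("tesseract") is not None
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_environment().items()):
        print("{:18} {}".format(name, "OK" if ok else "MISSING"))
```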

Run the program:
Place the downloaded folder anywhere, then run the following command from inside it:
				python3 main.py

spell.py → spell check and correction using Peter Norvig's algorithm
corpus.txt → corpus of commonly used words for spell check and correction

Peter Norvig's Algorithm: http://norvig.com/spell-correct.html
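The core of Norvig's corrector fits in a few dozen lines. Below is a condensed sketch following the structure of his essay, assuming `corpus.txt` is plain text whose word counts serve as the frequency model; `NorvigSpeller` and its method names are hypothetical and may differ from what `spell.py` actually contains. It prefers the input if it is a known word, otherwise the most frequent known word one edit away, then two edits away, and finally falls back to the input unchanged.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


class NorvigSpeller:
    """Spelling corrector in the style of Peter Norvig's essay (hypothetical class)."""

    LETTERS = "abcdefghijklmnopqrstuvwxyz"

    def __init__(self, corpus_text):
        # Word frequencies from the corpus act as the language model P(word).
        self.counts = Counter(tokenize(corpus_text))
        self.total = sum(self.counts.values())

    def probability(self, word):
        return self.counts[word] / self.total

    def known(self, candidates):
        return {w for w in candidates if w in self.counts}

    def edits1(self, word):
        """All strings one delete, transpose, replace or insert away from word."""
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [left + right[1:] for left, right in splits if right]
        transposes = [left + right[1] + right[0] + right[2:]
                      for left, right in splits if len(right) > 1]
        replaces = [left + c + right[1:]
                    for left, right in splits if right for c in self.LETTERS]
        inserts = [left + c + right for left, right in splits for c in self.LETTERS]
        return set(deletes + transposes + replaces + inserts)

    def edits2(self, word):
        return {e2 for e1 in self.edits1(word) for e2 in self.edits1(e1)}

    def candidates(self, word):
        # Known word > known at distance 1 > known at distance 2 > the word itself.
        return (self.known([word]) or self.known(self.edits1(word))
                or self.known(self.edits2(word)) or {word})

    def correct(self, word):
        """Return the most probable correction for a single lowercase word."""
        return max(self.candidates(word), key=self.probability)
```

Usage would look like `speller = NorvigSpeller(open("corpus.txt").read())` followed by `speller.correct("speling")`. OCR output should be lowercased (for example with `tokenize`) before correction, because the model only stores lowercase words.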

You can interact with the program and search for words. The search image is 4.png by default; you can change it to any image you need.
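The interactive search can be sketched as follows, assuming the OCR step has already produced a list of words with bounding boxes. `OcrWord`, `normalise`, `find_occurrences` and `interactive_search` are hypothetical names, and the matching rule (case-insensitive, ignoring surrounding punctuation) is an assumption rather than what `main.py` is known to do. `read` and `write` default to `input` and `print` but can be swapped out, which keeps the loop testable.

```python
from collections import namedtuple

# Hypothetical record for one OCR'd word and its bounding box in pixels.
OcrWord = namedtuple("OcrWord", ["text", "left", "top", "width", "height"])


def normalise(token):
    """Lowercase and strip surrounding punctuation so 'Cat,' matches 'cat'."""
    return token.strip(".,;:!?\"'()[]{}").lower()


def find_occurrences(words, query):
    """Return (left, top, width, height) for every OCR word equal to query."""
    target = normalise(query)
    return [(w.left, w.top, w.width, w.height)
            for w in words if normalise(w.text) == target]


def interactive_search(words, read=input, write=print):
    """Prompt for words until a blank line, reporting where each one occurs."""
    while True:
        query = read("Word to search (blank to quit): ").strip()
        if not query:
            break
        boxes = find_occurrences(words, query)
        if boxes:
            write("'{}' found {} time(s) at {}".format(query, len(boxes), boxes))
        else:
            write("'{}' not found".format(query))
```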
