Is there any easier way to use this (OCR post-correction tool) in Python, like we can easily use tesseract-OCR in Python? #53

Open
NavpreetDevpuri opened this issue Jun 20, 2020 · 2 comments

Comments

NavpreetDevpuri commented Jun 20, 2020

I would like a simple way to use this awesome library in Python,
in the same way that we can use tesseract-OCR in Python; see here how easy it is to use.

If it can be used from Python, then it can also be used on Windows.
I am using Windows 10 64-bit.

Collaborator

bertsky commented Jun 30, 2020

@NavpreetDevpuri What do you mean by simple way?

This repo contains an OCR post-correction tool along with a much improved version of Ocropy 1 and ocrolib, but only for OCR-D – as the description/documentation says.

If you want non-OCR-D CLIs, you'll have to use the ocropus-* tools from old Ocropy 1 (which is Python 2 only).

For Tesseract API in Python, I recommend tesserocr instead of pytesseract.
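
For readers unfamiliar with tesserocr, a minimal sketch of what using it from Python can look like. The helper name `image_file_to_text` and its defaults are illustrative assumptions, not part of tesserocr or of this repository; it also assumes tesserocr, Tesseract and the relevant language data are installed.

```python
def image_file_to_text(path, lang="eng"):
    """Return the text Tesseract recognizes in the image file at `path`.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so the module can be loaded without tesserocr installed.
    import tesserocr

    # PyTessBaseAPI is a context manager that releases the Tesseract
    # instance on exit.
    with tesserocr.PyTessBaseAPI(lang=lang) as api:
        api.SetImageFile(path)
        return api.GetUTF8Text()
```

Calling `image_file_to_text("someimage.jpg")` would then return the recognized text as a string.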

I don't see how your OS choice is relevant here.

Can we close this?

Author

NavpreetDevpuri commented Jun 30, 2020

Thanks for your reply.
I would like to know whether there is any way to use this OCR post-correction tool in Python, in the same way that we can easily use tesseract-OCR (an OCR tool) in Python.
It seems that I need to set up Docker, as mentioned in the user_guide.
I want to use it in Python without Docker, as with Tesseract.

I want to use the methods mentioned under workflows in an easier way, something like:

import cv2
import ocrd  # wished-for interface: ocrd.process() below is hypothetical
import pytesseract

# A list of (processor, parameters) pairs rather than a dict: a dict would
# silently keep only the last "ocrd-olena-binarize" and "ocrd-segment-repair"
# entries, because duplicate keys overwrite each other.
config = [
    ("ocrd-olena-binarize", {"impl": "sauvola"}),
    ("ocrd-anybaseocr-crop", None),
    ("ocrd-olena-binarize", {"impl": "kim"}),
    ("ocrd-cis-ocropy-denoise", {"level-of-operation": "page"}),
    ("ocrd-tesserocr-deskew", {"operation_level": "page"}),
    ("ocrd-tesserocr-segment-region", None),
    ("ocrd-segment-repair", {"plausibilize": True}),
    ("ocrd-cis-ocropy-deskew", {"level-of-operation": "region"}),
    ("ocrd-cis-ocropy-clip", {"level-of-operation": "region"}),
    ("ocrd-tesserocr-segment-line", None),
    ("ocrd-segment-repair", {"sanitize": True}),
    ("ocrd-cis-ocropy-dewarp", None),
    ("ocrd-calamari-recognize", {"checkpoint": "/path/to/models/*.ckpt.json"}),
]

# cv2.imread (not cv2.read) loads the image as a NumPy array
img = cv2.imread("someimage.jpg")

# Doing the post-correction magic (hypothetical call, this is the API I am asking for)
processed_img = ocrd.process(img, config)

# Now I can use pytesseract to get text from processed_img
text = pytesseract.image_to_string(processed_img)
print(text)

This tool is awesome, but it should be easy to use.
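
One way to approximate the requested interface without any new library support is to chain the processors' command-line tools from Python. The sketch below is only that, under stated assumptions: the processors are installed and on the PATH, an OCR-D workspace with a `mets.xml` already exists, and each processor accepts the usual OCR-D options `-m`, `-I`, `-O` and `-p` (JSON parameters). The function names `build_commands` and `run_workflow` and the `OCR-D-STEPnn` file group names are hypothetical, chosen for illustration.

```python
import json
import shlex
import subprocess

# Ordered (processor, parameters) pairs; a list allows the same processor
# to appear more than once, which a dict would not.
STEPS = [
    ("ocrd-olena-binarize", {"impl": "sauvola"}),
    ("ocrd-anybaseocr-crop", None),
    ("ocrd-olena-binarize", {"impl": "kim"}),
]


def build_commands(steps, first_input="OCR-D-IMG", mets="mets.xml"):
    """Return one argv list per step, feeding each output group into the next step."""
    commands = []
    input_grp = first_input
    for index, (processor, params) in enumerate(steps, start=1):
        output_grp = f"OCR-D-STEP{index:02d}"  # hypothetical naming scheme
        argv = [processor, "-m", mets, "-I", input_grp, "-O", output_grp]
        if params:
            # Parameters are passed as a JSON string (assumed -p behaviour).
            argv += ["-p", json.dumps(params)]
        commands.append(argv)
        input_grp = output_grp
    return commands


def run_workflow(steps, workspace_dir="."):
    """Run each processor in order inside the workspace; stop at the first failure."""
    for argv in build_commands(steps):
        print("running:", shlex.join(argv))
        subprocess.run(argv, cwd=workspace_dir, check=True)
```

`build_commands(STEPS)` only constructs the command lines, so they can be inspected before anything is executed; `run_workflow(STEPS, "/path/to/workspace")` would then run them. The results are written to the workspace as new file groups rather than returned as an in-memory image.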

NavpreetDevpuri changed the title from "Is there any way to use it in python likewise we can easily use tesseract-OCR in python ?" to "Is there any easier way to use this (OCR post-correction tool) in Python, like we can easily use tesseract-OCR in Python?" on Jun 30, 2020