no correction with ocrd-cis-postcorrect #51

Open
EEngl52 opened this issue Jun 3, 2020 · 10 comments

EEngl52 commented Jun 3, 2020

I'm running ocrd-cis-postcorrect on the aligned OCR output of Calamari and Tesserocr. So far, the output seems to be completely identical to the input, even though there are quite a few differences between the results of the two OCR engines. See e.g. the attached example.
postcorrect.zip

How can I achieve some correction results?

Contributor

finkf commented Jun 4, 2020

Thanks for reporting. I am having a look.

Contributor

finkf commented Jun 4, 2020

It appears that both files are line-segmented. The post-correction needs word-segmented input.
Anyway, you could try to configure the OCR to output word segments (as well as line segments).
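
A quick way to check whether a PAGE XML file contains word-level segmentation is to count its `Word` elements. The following is only a rough sketch and not part of ocrd_cis: the helper name `count_word_elements` and the sample file are made up for illustration, and the pattern assumes the elements are written as `<Word ...>` or with a namespace prefix such as `<pc:Word ...>`.

```shell
# Count opening <Word> tags (with or without a namespace prefix) in a file.
# count_word_elements is a hypothetical helper, not an ocrd_cis command.
count_word_elements() {
    grep -o -E '<([[:alnum:]_]+:)?Word[[:space:]>]' "$1" | wc -l | tr -d ' '
}

# Demo on a tiny hand-written sample (not real OCR output).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<pc:TextLine id="l1">
  <pc:Word id="w1"><pc:TextEquiv><pc:Unicode>Hello</pc:Unicode></pc:TextEquiv></pc:Word>
  <pc:Word id="w2"><pc:TextEquiv><pc:Unicode>World</pc:Unicode></pc:TextEquiv></pc:Word>
</pc:TextLine>
EOF
count_word_elements "$sample"
```

A count of 0 for an OCR result would suggest that the file is only line-segmented.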

Author

EEngl52 commented Jun 5, 2020

thanks for your quick reply! I'll try it again with word segments and report back

Author

EEngl52 commented Jul 30, 2020

I finally tried ocrd-cis-postcorrect again, this time with two OCR results from Tesseract and Calamari, both segmented on word level (and aligned beforehand). Unfortunately, I now run into an error (see attachment), and no output files are produced at all.

stderr.txt

Contributor

finkf commented Jul 30, 2020

From a quick glance I suspect problems with the profiling. Can you rerun the same command with --log-level DEBUG? I'll take a closer look later today.

Author

EEngl52 commented Jul 30, 2020

thx a lot for your quick reply! there's the log file

stderr.txt

Contributor

finkf commented Jul 31, 2020

In order to run our post correction, both our profiler and a matching language backend have to be installed on the system. The configuration variable profilerPath (which would more appropriately be named profilerCommand) must point to the profiler executable, and the profilerConfig variable must point to the corresponding language configuration file. There is a manual for the profiler and the language backend in our repositories.

Another option is to use the profiler that this project's Dockerfile installs into the docker image. You can execute the following steps to build and test the docker image:

$ cd path/to/ocrd_cis                            # Change into ocrd_cis directory.
$ sudo docker build -t ocrd_cis .                # Build the ocrd_cis docker image (this will take some time).
$ sudo docker run ocrd_cis /apps/profiler --help # Check the profiler command in the image.
$ echo 'Theyle' | sudo docker run -i ocrd_cis /apps/profiler \
  --config /etc/profiler/languages/german.ini \
  --sourceFormat TXT --sourceFile /dev/stdin --simpleOutput

Then you can write a shell script that executes sudo docker run -i ocrd_cis /apps/profiler "$@", set profilerPath to this script, and set profilerConfig to e.g. /etc/profiler/languages/german.ini (a language configuration file within the docker container).
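
As a minimal sketch of such a wrapper, the snippet below generates the script and makes it executable. The file name `profiler-docker.sh` and the `WRAPPER` variable are assumptions chosen for illustration; only the image name `ocrd_cis` and the path `/apps/profiler` come from the steps above. Note that `-i` is an option of `docker run`, so it is placed after `run`, and `"$@"` is quoted so that arguments containing spaces are passed through unchanged.

```shell
# Hypothetical location for the wrapper script; any path works.
WRAPPER="${WRAPPER:-./profiler-docker.sh}"

# Write the wrapper. The quoted 'EOF' keeps "$@" literal in the generated file.
cat > "$WRAPPER" <<'EOF'
#!/bin/sh
# Forward all arguments unchanged to the profiler inside the ocrd_cis image.
# -i keeps stdin open so that input can be piped into the container.
exec sudo docker run -i ocrd_cis /apps/profiler "$@"
EOF

# The post correction has to be able to execute the script.
chmod +x "$WRAPPER"
```

profilerPath would then point to the generated script, ideally as an absolute path, while profilerConfig keeps referring to a path inside the container.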

The third option is to run the post correction directly from the built docker image. I see that these points are not very clear in the documentation for the post correction. I will improve the documentation to make the configuration of the profiler clearer.

Contributor

finkf commented Jul 31, 2020

And I forgot to mention that the error you are getting is due to a bad profiler configuration.

Author

EEngl52 commented Aug 4, 2020

thanks for your help! I'm using a native installation of ocrd_all and assumed that it included everything I need to run ocrd-cis-postcorrect (except the model). But then I guess I still need to install the profiler and the language backend.

Contributor

finkf commented Aug 4, 2020

If you use a native installation, you need to install the profiler as well. I have little experience with Python's installation setup, but maybe it is possible to install the profiler alongside ocrd_cis. Maybe @kba can help here.
