OCRmyIA

Perform OCR operations on PDFs and then compress them with the Internet Archive's code.

About the code

I am terrible at Bash and even worse at Python, so I welcome and encourage any and all pull requests that improve the code. This is essentially a workflow recipe that depends on other software to do the actual work. I will probably add more scripts here later that do different things. I originally intended to license it as GPL-2, but I have decided to release it under AGPL-3.

Prerequisites

GNU/Linux OS with GNU Core Utils
OCRmyPDF (Tested on >= 1.4.0)
archive-pdf-tools. Note: due to a bug, you should modify its requirements.txt to install pymupdf v1.21.0 rather than the latest version. This bug appears to affect only the archive-pdf-tools script.
archive-hocr-tools (this should be installed automatically when you install archive-pdf-tools with pip)
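
The pymupdf pin mentioned above can be applied by hand in a text editor, or with a small helper along these lines. This is only a sketch: the function name `pin_pymupdf` is hypothetical, and it assumes that the requirements.txt lists the package on a line of its own starting with `pymupdf` (in any letter case). It relies on GNU sed, which the GNU/Linux prerequisite already implies.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): rewrite the pymupdf line
# of a pip requirements file so that it pins a specific version.
# Assumption: the package sits on its own line beginning with "pymupdf".
pin_pymupdf() {
    local req_file="$1"
    local version="${2:-1.21.0}"   # default to the version named in the note above

    if [ ! -f "$req_file" ]; then
        echo "requirements file not found: $req_file" >&2
        return 1
    fi

    # GNU sed: -i edits in place, -E enables extended regexes, and the
    # trailing I flag makes the match case-insensitive. Any existing version
    # specifier after the package name is replaced by the pin.
    sed -E -i "s/^pymupdf([^[:alnum:]_.-].*)?\$/PyMuPDF==${version}/I" "$req_file"
}
```

For example, from a local checkout of archive-pdf-tools (the path is an assumption), run `pin_pymupdf requirements.txt` and then `pip install -r requirements.txt`.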

Files in this repository: LICENSE, README.md, pdf_ia-compress-ocr_workflow_AGPL-v3.sh

TDavLinguist/OCRmyIA
