monocheck

Reference

The code is adapted from https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34

Description

Clustering is an unsupervised machine learning technique. Here it is used to filter out incorrectly cropped line images in applications such as document scanning and OCR. By grouping images on features such as text alignment and white space, a clustering algorithm can surface clusters that are likely to contain erroneous crops. Improperly cropped images can then be detected automatically, which improves the efficiency and accuracy of digital document processing without requiring labeled training data.

Pecha image example:

Line image detection model outputs:

So, now what?

The clustering step analyzes these segmented line images by examining features that differentiate well-cropped lines from poorly cropped ones. Features may include the consistency of text alignment, the uniformity of text height, and the absence of cut-off text. The clustering algorithm then groups similar line images together based on these features.
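The grouping described above can be sketched in a few lines of plain Python. This is not the repository's implementation: the feature set (`crop_features`), the binarized list-of-lists image format, and the deterministic farthest-point initialization are all assumptions chosen to keep the example self-contained. A real pipeline would more likely extract features with a pretrained model and use a library clustering routine.

```python
from typing import List, Sequence, Tuple

# Assumed input format: a binarized line image, 1 = ink, 0 = background.
Image = List[List[int]]
Point = Tuple[float, ...]


def crop_features(img: Image) -> Point:
    """Hypothetical features: height/width ratio, ink density,
    and the fraction of ink on the top and bottom rows (a hint of cut-off text)."""
    height, width = len(img), len(img[0])
    ink = sum(sum(row) for row in img)
    edge_ink = sum(img[0]) + sum(img[-1])
    return (height / width, ink / (height * width), edge_ink / (2 * width))


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels
```

To use it, compute `crop_features` for every line image and pass the resulting list to `kmeans` with the desired number of clusters. Images whose ink touches the top or bottom edge tend to land in a different cluster from cleanly cropped lines.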

The primary benefit of this clustering tool is its ability to automatically identify and filter out the "bad" clusters, that is, the groups that likely contain incorrectly cropped images. By examining the characteristics of each cluster, the system can flag it as erroneous without manual intervention. This filtering improves the quality of the data passed to later processing stages, such as text recognition in OCR systems, because only correctly cropped lines are used. The result is better accuracy and efficiency in document digitization workflows.
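One possible way to flag and filter bad clusters is sketched below. The README does not specify the rule, so everything here is an assumption: the per-image `edge_ink` score (fraction of ink touching the top and bottom edges), the `threshold` value, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def flag_bad_clusters(
    labels: Sequence[int], edge_ink: Sequence[float], threshold: float = 0.25
) -> Set[int]:
    """Return the cluster ids whose mean edge-ink score exceeds the threshold.

    The score and the threshold are illustrative assumptions, not project values.
    """
    grouped: Dict[int, List[float]] = defaultdict(list)
    for label, score in zip(labels, edge_ink):
        grouped[label].append(score)
    return {
        label
        for label, scores in grouped.items()
        if sum(scores) / len(scores) > threshold
    }


def keep_good_indices(labels: Sequence[int], bad_clusters: Set[int]) -> List[int]:
    """Return the indices of images that do not belong to a flagged cluster."""
    return [i for i, label in enumerate(labels) if label not in bad_clusters]
```

Flagging whole clusters rather than single images keeps the decision cheap to review: a person can inspect a few samples from each flagged cluster before the images are dropped from the OCR input.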

tenzin3/monocheck
