How to run only detection model to get the bounding boxes. (Handwritten text) #1734

sanjay-nit · 2024-09-26T20:24:06Z

sanjay-nit
Sep 26, 2024

Is there any way to run only the detection model and get the bounding boxes?
The idea is to run it on handwritten documents. TrOCR seems to work well on handwriting, but it only works on single-line text images. So I'm thinking I could run only the detection model to get the line/word bounding boxes, crop them, and send the crops to TrOCR. Thanks!

Answered by felixdittrich92

Sep 27, 2024


felixdittrich92 · 2024-09-27T07:26:02Z

felixdittrich92
Sep 27, 2024
Maintainer

Hi @sanjay-nit 👋,
Sure, the ocr_predictor instance is ultimately just a wrapper around the detection_predictor / recognition_predictor and the crop_orientation_predictor / page_orientation_predictor.

Here you go (example code):

import requests
import cv2
import numpy as np

from doctr.io import DocumentFile
from doctr.models import detection_predictor
from doctr.utils.geometry import detach_scores


# Convert relative coordinates to absolute pixel values
def _to_absolute(geom, img_shape: tuple[int, int]) -> list[list[int]]:
    h, w = img_shape
    if len(geom) == 2:  # Assume straight pages = True -> [[xmin, ymin], [xmax, ymax]]
        (xmin, ymin), (xmax, ymax) = geom
        xmin, xmax = int(round(w * xmin)), int(round(w * xmax))
        ymin, ymax = int(round(h * ymin)), int(round(h * ymax))
        return [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]]
    else:  # For polygons, convert each point to absolute coordinates
        # Round (rather than truncate) so polygons match the straight-box branch
        return [[int(round(point[0] * w)), int(round(point[1] * h))] for point in geom]


url = "https://www.francetvinfo.fr/pictures/uGwaNE-aJq7zHLhZJdzdCd9nyjE/1200x900/2021/03/16/phpCDwGn0.jpg"

det_predictor = detection_predictor(
    arch="fast_base",
    pretrained=True,
    assume_straight_pages=True,
    symmetric_pad=True,
    preserve_aspect_ratio=True,
) #.cuda().half()  # Uncomment this line if you have a GPU

det_predictor.model.postprocessor.bin_thresh = 0.3
det_predictor.model.postprocessor.box_thresh = 0.65

# Download the image once and reuse the bytes for both docTR and OpenCV
img_bytes = requests.get(url).content
docs = DocumentFile.from_images([img_bytes])
results = det_predictor(docs)

image = cv2.imdecode(np.frombuffer(img_bytes, np.uint8), cv2.IMREAD_COLOR)

for doc, res in zip(docs, results):
    img_shape = (doc.shape[0], doc.shape[1])
    # Detach the probability scores from the results
    detached_coords, prob_scores = detach_scores([res.get("words")])

    for i, coords in enumerate(detached_coords[0]):
        coords = coords.reshape(2, 2).tolist() if coords.shape == (4, ) else coords.tolist()

        # Convert relative to absolute pixel coordinates
        points = np.array(_to_absolute(coords, img_shape), dtype=np.int32).reshape((-1, 1, 2))

        # Draw the bounding box on the image
        cv2.polylines(image, [points], isClosed=True, color=(255, 0, 0), thickness=2)

    # Save the modified image with bounding boxes
    cv2.imwrite("output.jpg", image)
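
For the crop-and-send-to-TrOCR step from the original question, a minimal sketch could look like the following. `crop_boxes` and its `pad` parameter are hypothetical helpers, not part of docTR. The sketch assumes each box is a list of absolute [x, y] corner points in the shape that `_to_absolute` returns, and it takes the axis-aligned envelope of each box, which is only a rough approximation for strongly rotated polygons.

```python
import numpy as np


def crop_boxes(image: np.ndarray, boxes: list[list[list[int]]], pad: int = 2) -> list[np.ndarray]:
    """Cut one axis-aligned crop per box out of an H x W x C image.

    Hypothetical helper: each box is a list of absolute [x, y] pixel points.
    """
    h, w = image.shape[:2]
    crops = []
    # Sort top-to-bottom, then left-to-right, as a rough reading order
    ordered = sorted(boxes, key=lambda b: (min(p[1] for p in b), min(p[0] for p in b)))
    for box in ordered:
        xs = [p[0] for p in box]
        ys = [p[1] for p in box]
        # Pad slightly and clamp to the image bounds
        x0, x1 = max(min(xs) - pad, 0), min(max(xs) + pad, w)
        y0, y1 = max(min(ys) - pad, 0), min(max(ys) + pad, h)
        if x1 > x0 and y1 > y0:
            crops.append(image[y0:y1, x0:x1].copy())
    return crops
```

To use it with the example above, one could collect the outputs of `_to_absolute(coords, img_shape)` in a list inside the loop and call `crop_boxes(image, boxes)` before drawing, so the crops do not contain the drawn outlines.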

2 replies

sanjay-nit Sep 27, 2024
Author

Thanks @felixdittrich92 for your quick reply. You are really a life saver, man.
However, I don't see a detach_scores function in doctr.utils.geometry. FYI: I'm using python-doctr==0.8.1

felixdittrich92 Sep 27, 2024
Maintainer

@sanjay-nit I suggest upgrading to 0.9.0, because it includes a fix for a significant bug that occurred when using the standalone detection_predictor :)

ref.: #1627
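
A quick way to check the installed version before running the example is sketched below. `version_tuple` and `installed_at_least` are hypothetical helpers written for this thread; they use only the standard library and do a simplified numeric comparison that ignores pre-release suffixes.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '0.9.0' into (0, 9, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_at_least(package: str, minimum: str) -> bool:
    """Return True if `package` is installed with a version >= `minimum`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


if not installed_at_least("python-doctr", "0.9.0"):
    print("python-doctr >= 0.9.0 not found; try: pip install -U python-doctr")
```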

sanjay-nit · 2024-09-29T21:13:20Z

sanjay-nit
Sep 29, 2024
Author

Hi @felixdittrich92,

This may not be directly related to this discussion, but I wanted to express my satisfaction with the auto-rotating pages feature.

I have some rotated images on which I want to apply OCR, and I also need to straighten the pages. I'm pleased with the results when I use assume_straight_pages=False and straighten_pages=True together in the ocr_predictor. However, when I export the result, I don't receive the page rotation angle, even though it works beautifully when I use the result.show() method.

Is it possible to get the rotation angle? Additionally, it would be great if I could obtain the auto-rotated image that the OCR model processed so I can apply the bounding boxes generated by the result.export() method.

Thank you!

0 replies

felixdittrich92 · 2024-09-30T05:36:46Z

felixdittrich92
Sep 30, 2024
Maintainer

Hi @sanjay-nit 👋,

Glad to hear you like it 🤗

You can pass detect_orientation=True to the ocr_predictor to get the estimated angle :)

To get the straightened page you can do the following:

result = model(doc)

# get straightened pages
corrected_pages = [page.page for page in result.pages]

This results in a list of numpy arrays, which can easily be saved with OpenCV :)
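
To apply the boxes from result.export() to such a page, a sketch along these lines may help. `words_in_pixels` is a hypothetical helper, and the dictionary layout it walks (pages -> blocks -> lines -> words, relative geometry, and "dimensions" as (height, width)) is an assumption about the exported structure that is worth checking against one real export. Whether the coordinates refer to the straightened page should also be verified on a sample.

```python
def words_in_pixels(exported: dict) -> list[dict]:
    """Flatten an exported result into words with absolute pixel boxes.

    ASSUMED layout: pages -> blocks -> lines -> words, relative geometry,
    and page["dimensions"] == (height, width).
    """
    out = []
    for page in exported.get("pages", []):
        height, width = page["dimensions"]
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    # Works for 2-point boxes and 4-point polygons alike
                    xs = [pt[0] for pt in word["geometry"]]
                    ys = [pt[1] for pt in word["geometry"]]
                    out.append({
                        "page_idx": page.get("page_idx"),
                        # Present only if the export carries an orientation entry
                        "orientation": page.get("orientation"),
                        "value": word["value"],
                        "box": (
                            int(round(min(xs) * width)),
                            int(round(min(ys) * height)),
                            int(round(max(xs) * width)),
                            int(round(max(ys) * height)),
                        ),
                    })
    return out
```

Each returned "box" is (xmin, ymin, xmax, ymax) in pixels, so it can be passed straight to a drawing or cropping routine.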

Hope this helps 👍

Best regards,
Felix

0 replies

felixdittrich92 · 2024-09-30T05:38:25Z

felixdittrich92
Sep 30, 2024
Maintainer

@sanjay-nit Do you deal with "randomly" rotated documents and images where the text can run in any direction (horizontal, vertical), or is it more like small rotations (range: -45 to 45 degrees) with only horizontal text? :)

10 replies

felixT2K Oct 14, 2024

@sanjay-nit Which one have you tested? The blog post?
With some small changes it should also work for PDF files, or for multiple images, to create one PDF/A from all of them without the need for any merging :)
The OCRmyPDF plugin would still be cool 😅 Do you know if OCRmyPDF can also handle creating PDF/A files from images/PDFs which contain rotated text?

felixT2K Oct 14, 2024

I think we should also update our tutorial notebook to use OCRmyPDF instead of our own implementation 🤔 Is tesseract a fixed dependency in OCRmyPDF, or is it optional?

sanjay-nit Oct 14, 2024
Author

@sanjay-nit Which one have you tested? The blog post? With some small changes it should also work for pdf files or multiple images to create one pdf/a from all without the need of any merging :) The OCRmyPDF plugin would still be cool 😅 Do you know if OCRmyPDF can also handle creating pdf/a files from images/pdfs which contains rotated text?

Hi @felixT2K
Yes, I tested the blog post code with an image and a single-page PDF.
I also tested OCRmyPDF's automatic page rotation, and I found that most of the time it doesn't work perfectly.

sanjay-nit Oct 14, 2024
Author

I think we should update our tutorial notebook also to use OCRmyPDF instead of the own implementation 🤔 Is tesseract a fixed dependency in OCRmyPDF or optional ?
@felixT2K
That would be great 😄 😄
They use tesseract internally but have an option to write plugins: OCRmyPDF plugin

felixdittrich92 Oct 16, 2024
Maintainer

@sanjay-nit mindee/notebooks#20
CC @odulcy-mindee

hanshupe007 · 2024-10-29T14:57:47Z

hanshupe007
Oct 29, 2024

Is there any TrOCR integration planned for docTr or other support for handwritten texts?

4 replies

felixdittrich92 Oct 29, 2024
Maintainer

Hi @hanshupe007 👋🏼,

TrOCR support isn't planned, because we already have architectures like parseq | master | sar_resnet31, which definitely have the capacity to handle handwriting recognition :)
But you can easily combine docTR's detection_predictor with TrOCR on your own.
In general, handwriting support is planned, but it's a long-term issue because unfortunately we don't have a suitable dataset.
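
One possible way to wire the two together is sketched below, assuming the Hugging Face transformers and Pillow packages are installed. `make_trocr_recognizer` and `read_crops` are hypothetical helper names, and the checkpoint "microsoft/trocr-base-handwritten" is just one choice. The crops are expected to be RGB numpy arrays cut from the page using the detection boxes. Note that TrOCR targets single-line images, so word-level crops may behave differently from line-level ones.

```python
from typing import Callable, Iterable

import numpy as np


def make_trocr_recognizer(
    checkpoint: str = "microsoft/trocr-base-handwritten",
) -> Callable[[np.ndarray], str]:
    """Build a crop -> text function backed by TrOCR (heavy imports are deferred)."""
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

    def recognize(crop: np.ndarray) -> str:
        # TrOCR expects RGB input; convert BGR crops from OpenCV beforehand
        inputs = processor(images=Image.fromarray(crop), return_tensors="pt")
        generated_ids = model.generate(inputs.pixel_values)
        return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

    return recognize


def read_crops(crops: Iterable[np.ndarray], recognizer: Callable[[np.ndarray], str]) -> list[str]:
    """Run the recognizer on each crop, skipping empty ones."""
    return [recognizer(crop) for crop in crops if crop.size > 0]
```

Usage would then be `texts = read_crops(crops, make_trocr_recognizer())`. Passing the recognizer in as a plain callable keeps the cropping logic independent of the recognition model, so a docTR recognition model could be swapped in later.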

hanshupe007 Oct 30, 2024

Thanks! Before trying it out: does this mean that the pretrained models mentioned above work on handwritten text, or only if I train them on a related dataset?

felixdittrich92 Oct 30, 2024
Maintainer

For clearly legible handwriting (print-style lettering), it should already work partially with the current pretrained models.
For a "robust" handwriting model which can also handle cursive writing, you need a dataset to fine-tune on.

felixdittrich92 Oct 30, 2024
Maintainer

I found a dataset that looks interesting for such first experiments:

https://huggingface.co/datasets/shreyansh1347/GNHK-Synthetic-OCR-Dataset/viewer/default/test

It would only need to be converted into the format docTR requires :)
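
As a rough idea of what that conversion could look like for recognition training, the sketch below writes a labels.json file that maps image file names to transcriptions, next to an images/ folder. This layout is my understanding of docTR's recognition dataset format and should be checked against the training documentation. `write_recognition_labels` is a hypothetical helper, and reading the (file name, text) pairs out of the Hugging Face dataset is left out because its column names are not given here.

```python
import json
from pathlib import Path


def write_recognition_labels(samples: list[tuple[str, str]], out_dir: str) -> Path:
    """Write labels.json mapping image file name -> transcription.

    ASSUMPTION: the word/line crops already live in <out_dir>/images/
    under the same file names.
    """
    root = Path(out_dir)
    (root / "images").mkdir(parents=True, exist_ok=True)
    # Skip samples with empty transcriptions
    labels = {name: text for name, text in samples if text.strip()}
    labels_path = root / "labels.json"
    labels_path.write_text(json.dumps(labels, ensure_ascii=False, indent=2), encoding="utf-8")
    return labels_path
```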
