Feature/assume straight text #1723

milosacimovic · 2024-09-12T21:12:20Z

This PR modifies the ocr_predictor API to support assume_straight_text as an argument.
When used with assume_straight_pages=False, it reduces the reliance on the unreliable crop orientation model when the text is almost straight, and it also reduces execution time. It should alleviate the issue mentioned in #1455 in use cases where the text is straight, i.e. not rotated by 90, 180 or 270 degrees.

The main contribution to the pipeline is the logic around a new geometry function that extracts the crops while dewarping them based on the corners of the polygons returned by text detection (used when assume_straight_pages=False and assume_straight_text=True).
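As a rough illustration of what "dewarping a crop from its corners" means, here is a minimal standard-library sketch. It is not the PR's extract_dewarped_crops: the name dewarp_crop, the nested-list image and the corner order are assumptions made for this example, and it uses bilinear corner interpolation with nearest-neighbour sampling where a real implementation would more likely apply a perspective transform (for instance with OpenCV).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dewarp_crop(img: List[List[int]], quad: Sequence[Point]) -> List[List[int]]:
    """Resample a quadrilateral into an axis-aligned rectangle.

    `quad` holds (x, y) corners ordered top-left, top-right, bottom-right,
    bottom-left (an assumption of this sketch).
    """
    tl, tr, br, bl = quad
    # Output size: the longer of each pair of opposite edges.
    width = max(1, round(max(math.dist(tl, tr), math.dist(bl, br))))
    height = max(1, round(max(math.dist(tl, bl), math.dist(tr, br))))
    rows, cols = len(img), len(img[0])

    out: List[List[int]] = []
    for j in range(height):
        v = j / (height - 1) if height > 1 else 0.0
        row: List[int] = []
        for i in range(width):
            u = i / (width - 1) if width > 1 else 0.0
            # Bilinear blend of the four corners gives the source position.
            x = (1 - v) * ((1 - u) * tl[0] + u * tr[0]) + v * ((1 - u) * bl[0] + u * br[0])
            y = (1 - v) * ((1 - u) * tl[1] + u * tr[1]) + v * ((1 - u) * bl[1] + u * br[1])
            # Nearest-neighbour sampling, clamped to the image bounds.
            xi = min(max(round(x), 0), cols - 1)
            yi = min(max(round(y), 0), rows - 1)
            row.append(img[yi][xi])
        out.append(row)
    return out
```

Because the output rectangle is built directly from the detected corners, the text keeps the orientation implied by the corner order, and no orientation classifier is consulted for the crop.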

codecov · 2024-09-13T03:27:53Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.45%. Comparing base (9045dcf) to head (f1128b7).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1723      +/-   ##
==========================================
+ Coverage   96.40%   96.45%   +0.05%     
==========================================
  Files         164      164              
  Lines        7782     7818      +36     
==========================================
+ Hits         7502     7541      +39     
+ Misses        280      277       -3

Flag	Coverage Δ
unittests	`96.45% <100.00%> (+0.05%)`	⬆️


felixdittrich92

Hi @milosacimovic 👋
Thanks a lot for the quick PR 👍

One problem I see is the increasing complexity of ocr_predictor / kie_predictor.
From my experience, adding another assume_ kwarg would confuse users more and more.
Additionally, it only makes it possible to disable the crop_orientation_predictor.

So, two suggestions from my side:

Option 1:
Advantages:

We can avoid initializing the orientation models
Clear about what it does

Disadvantages:

Also needs modifications in the demo and API
2 additional ocr_predictor / kie_predictor args (could maybe also be passed as kwargs?)

In this case, disable_page_orientation only has an effect in combination with assume_straight_pages=False and/or detect_orientation=True and/or straighten_pages=True, where the pipeline can then only handle small rotations in the range of roughly -45 to 45 degrees.
And disable_crop_orientation would only have an effect with assume_straight_pages=False, so that it always results in a "prediction" of 0 with a probability of 1.0 (or None).

model = ocr_predictor(
    pretrained=True,
    assume_straight_pages=False,
    straighten_pages=True,
    detect_orientation=True,
    disable_page_orientation=True,  # maybe as kwarg? Can then only handle small rotations
    disable_crop_orientation=True,  # maybe as kwarg? Then always returns 0 with prob 1.0 or None (None preferred?)
)
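To make the interaction of these flags easier to follow, the conditions described for Option 1 can be written down as a small truth function. This is only a sketch of the rule as stated in this comment: orientation_flags_in_effect is a hypothetical helper, not part of docTR, and the default values are assumptions.

```python
from typing import Dict


def orientation_flags_in_effect(
    assume_straight_pages: bool = True,
    straighten_pages: bool = False,
    detect_orientation: bool = False,
    disable_page_orientation: bool = False,
    disable_crop_orientation: bool = False,
) -> Dict[str, bool]:
    """Report which of the proposed disable_* flags would change behaviour."""
    # Page orientation is only consulted when one of these options is set.
    page_model_needed = (not assume_straight_pages) or detect_orientation or straighten_pages
    # Crop orientation is only consulted for non-straight pages.
    crop_model_needed = not assume_straight_pages
    return {
        "page_orientation_disabled": page_model_needed and disable_page_orientation,
        "crop_orientation_disabled": crop_model_needed and disable_crop_orientation,
    }
```

With the arguments from the example above, both entries are True. With assume_straight_pages left at True and no other option set, both disable_* flags would be no-ops.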

Option 2:
Advantages:

Encapsulated from ocr_predictor, with cleaner handling by the specific predictor

Disadvantages:

Also needs modifications in the demo and API
Possibly no way to avoid the orientation models being loaded into RAM once before they are removed

predictor = ocr_predictor(
    pretrained=True,
    assume_straight_pages=False,
    straighten_pages=True,
    detect_orientation=True,
)

# Overwrite the orientation models - disable
predictor.crop_orientation_predictor = crop_orientation_predictor(disabled=True)
predictor.page_orientation_predictor = page_orientation_predictor(disabled=True)

pseudo code:

doctr/doctr/models/classification/zoo.py

Line 37 in 9045dcf

def _orientation_predictor(arch: Any, pretrained: bool, model_type: str, **kwargs: Any) -> OrientationPredictor:
    if kwargs.get("disabled", False):
        return OrientationPredictor(None, None)

doctr/doctr/models/classification/predictor/pytorch.py

Line 18 in 9045dcf

class OrientationPredictor(nn.Module):

    def __init__(
        self,
        pre_processor: Optional[PreProcessor] = None,
        model: Optional[nn.Module] = None,
    ) -> None:
        super().__init__()
        self.pre_processor = pre_processor
        self.model = model.eval() if model else model

    @torch.inference_mode()
    def forward(
        self,
        inputs: List[Union[np.ndarray, torch.Tensor]],
    ) -> List[Union[List[int], List[float]]]:
        # Dimension check
        if any(input.ndim != 3 for input in inputs):
            raise ValueError("incorrect input shape: all inputs are expected to be multi-channel 2D images.")

        if self.model is None:
            # Disabled: return class index 0, angle 0 and confidence 1.0 for every input
            in_length = len(inputs)
            return [[0] * in_length, [0] * in_length, [1.0] * in_length]

        processed_batches = self.pre_processor(inputs)
        _params = next(self.model.parameters())
        self.model, processed_batches = set_device_and_dtype(
            self.model, processed_batches, _params.device, _params.dtype
        )
        predicted_batches = [self.model(batch) for batch in processed_batches]
        # confidence
        probs = [
            torch.max(torch.softmax(batch, dim=1), dim=1).values.cpu().detach().numpy() for batch in predicted_batches
        ]
        # Postprocess predictions
        predicted_batches = [out_batch.argmax(dim=1).cpu().detach().numpy() for out_batch in predicted_batches]

        class_idxs = [int(pred) for batch in predicted_batches for pred in batch]
        classes = [int(self.model.cfg["classes"][idx]) for idx in class_idxs]
        confs = [round(float(p), 2) for prob in probs for p in prob]

        return [class_idxs, classes, confs]

This is only quick and dirty, to "visualize" the idea I have in mind 😅
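The disabled branch of the pseudo code can also be tried in isolation, without torch. The following is a minimal sketch under that assumption: ToyOrientationPredictor is a hypothetical stand-in, and only the [class_idxs, classes, confs] return layout is taken from the pseudo code.

```python
from typing import Callable, List, Optional, Sequence, Union

Prediction = List[Union[List[int], List[float]]]


class ToyOrientationPredictor:
    """Hypothetical stand-in for OrientationPredictor, without torch."""

    def __init__(self, model: Optional[Callable[[Sequence[object]], Prediction]] = None) -> None:
        self.model = model

    def __call__(self, inputs: Sequence[object]) -> Prediction:
        if self.model is None:
            n = len(inputs)
            # Disabled: class index 0, angle 0 and confidence 1.0 per input.
            return [[0] * n, [0] * n, [1.0] * n]
        # Enabled: delegate to the wrapped model.
        return self.model(inputs)
```

Callers keep receiving the same three parallel lists whether or not a model is loaded, so downstream code would not need a special case for the disabled state.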

If we could realize option 1 and use kwargs to pass these values, I think I would prefer the first idea.

Additionally, in any case we need an entry in the documentation for the new logic.

@milosacimovic @odulcy-mindee wdyt ? 🤗

felixdittrich92 · 2024-09-13T06:45:52Z

So I totally agree with the feature, but we should handle both the crop and page orientation predictors, and we should take care not to miss anything:

It needs to work for both ocr_predictor and kie_predictor
The new logic also needs to be added to the demo and API
A well-formed entry in the documentation to describe and showcase the new logic (should be part of https://github.com/mindee/doctr/blob/main/docs/source/using_doctr/using_models.rst)

😃

Converting to draft in the meantime 👍

milosacimovic · 2024-09-13T07:02:20Z

Hi @felixdittrich92 ,
Thank you so much for considering my PR so quickly and for the immensely valuable feedback on the API changes.
I will look into the first option. However, what I would still like to get from you is your thoughts on the subtle differences between

def extract_dewarped_crops(
    img: np.ndarray, polys: np.ndarray, dtype=np.float32, channels_last: bool = True
) -> List[np.ndarray]:
    """Created cropped images from list of skewed/warped bounding boxes,
    but containing straight text

and currently used

def extract_rcrops(
    img: np.ndarray, polys: np.ndarray, dtype=np.float32, channels_last: bool = True
) -> List[np.ndarray]:
    """Created cropped images from list of rotated bounding boxes

From my experience, extract_rcrops has issues when extracting crops from slightly rotated documents (between -45 and 45 degrees): it rotates the crops even though it should keep them straight.

This was actually my main complaint about the current implementation.

felixdittrich92 · 2024-09-13T07:18:42Z


I will take a look into it ASAP 👍 But everything points to the same issue, so we can combine both in your PR 👍

felixdittrich92 · 2024-09-13T10:44:05Z

@milosacimovic I tested your function, and yes, it works better for pages with small rotations (between -45 and 45 degrees).
It's also a bit slower, but not by much.
I think that's something to combine:

With disable_page_orientation=True (where we expect only slightly rotated pages): your function
Otherwise: the current function

Wdyt ?
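The selection rule proposed in this comment could be sketched as a small dispatch helper. Everything here is hypothetical: select_crop_extractor and the CropExtractor signature are illustration only, and the two callables stand in for the dewarping function from this PR and the currently used extract_rcrops.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: takes polygons, returns one crop per polygon.
CropExtractor = Callable[[Sequence[object]], List[object]]


def select_crop_extractor(
    disable_page_orientation: bool,
    dewarped_extractor: CropExtractor,
    rotated_extractor: CropExtractor,
) -> CropExtractor:
    """Pick the crop extraction function for non-straight pages."""
    # Page orientation disabled means only small rotations are expected,
    # so the dewarping extractor is the better fit.
    if disable_page_orientation:
        return dewarped_extractor
    # Otherwise keep the current rotated-crop extractor.
    return rotated_extractor
```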

felixdittrich92 · 2024-09-19T08:52:56Z

Hi @milosacimovic 👋,

I quickly prototyped this feature.
Wdyt about the changes: main...felixdittrich92:doctr:disable-orient-prototype ?

Feel free to test and update your PR with my changes. If everything works as expected, only the docs part (maybe with some optimizations from a user's point of view 😅) and maybe additional tests + mypy/format fixes would remain open 🤗

felixdittrich92 · 2024-09-27T09:27:44Z

#1735

milosacimovic added 3 commits September 12, 2024 22:41

Modifying the ocr_predictor API to support assume_straight_text which reduces the reliance on unreliable crop orientation models and reduces speed of execution

00f866e

fix: a fix in a test for pytorch model zoo; wrongly set assume_straight_text to a detection_predictor

6ca5898

fix: a fix in a test for pytorch model zoo; wrongly set assume_straight_text to a detection_predictor

f1128b7

felixdittrich92 requested changes Sep 13, 2024

felixdittrich92 requested a review from odulcy-mindee September 13, 2024 06:41

felixdittrich92 self-assigned this Sep 13, 2024

felixdittrich92 added this to the 0.10.0 milestone Sep 13, 2024

felixdittrich92 marked this pull request as draft September 13, 2024 06:47

felixdittrich92 mentioned this pull request Sep 27, 2024

Disable page and crop orientation #1735

Merged

felixdittrich92 closed this Sep 27, 2024

felixdittrich92 linked an issue Sep 27, 2024 that may be closed by this pull request

Flipped text recognition prediction. #1455

Closed

felixdittrich92 removed a link to an issue Sep 27, 2024

Flipped text recognition prediction. #1455

Closed
