Text detection using the detection_predictor model produces an incorrect sequence of bounding boxes #1754
-
Hello, I tried the detection_predictor model and it works well. However, after labeling each box with the order in which it was predicted, I noticed that the sequence is off (correct me if I am wrong or missing something). This is the output. I was planning to build a pipeline around it with TrOCR as the recognition model, and it worked fine until I parsed the results into a string: the words came out of sequence. Essentially, I just want the boxes ordered properly from top to bottom, left to right, so I was wondering whether there is already a built-in function for this, similar to results.export() when using the ocr_predictor. That is all, thank you.

This is the code I used, which was provided in a similar discussion before:

    import cv2
    from doctr.io import DocumentFile
    from doctr.models import detection_predictor

    def _to_absolute(geom, img_shape: tuple[int, int]) -> list[list[int]]:
        ...

    image_path = r"name_section.jpg"
    det_predictor = detection_predictor(...).cuda().half()
    det_predictor.model.postprocessor.bin_thresh = 0.01
    docs = DocumentFile.from_images(image_path)
    results = det_predictor(docs)
    for doc, res in zip(docs, results):
        ...
        cv2.imwrite("output_with_labels.jpg", image)
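The body of `_to_absolute` is not part of the post. A minimal sketch of an equivalent conversion, assuming straight pages so that each detection is a row of relative `xmin, ymin, xmax, ymax` (optionally followed by a score column); `to_absolute` is a hypothetical name, and `img_shape` is taken as `(height, width)`, the order of a numpy image's `shape[:2]`:

```python
import numpy as np


def to_absolute(boxes: np.ndarray, img_shape: tuple[int, int]) -> np.ndarray:
    """Convert relative boxes to integer pixel coordinates (hypothetical helper).

    boxes: (N, 4) or (N, 5) array of [xmin, ymin, xmax, ymax(, score)] in [0, 1].
    img_shape: (height, width) of the page, e.g. image.shape[:2].
    Returns an (N, 4) int32 array of [xmin, ymin, xmax, ymax] in pixels.
    """
    h, w = img_shape
    scale = np.array([w, h, w, h], dtype=np.float64)
    # Drop any score column, scale x by width and y by height, round to whole pixels
    return np.round(boxes[:, :4] * scale).astype(np.int32)


# Example: a page 200 px wide and 100 px high
print(to_absolute(np.array([[0.1, 0.2, 0.5, 0.3, 0.9]]), (100, 200)).tolist())  # [[20, 20, 100, 30]]
```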
Replies: 3 comments 5 replies
-
Hi @stiffmeister923 👋, the sorting algorithm is not part of the … Hope this helps 🤗
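The traceback later in this thread shows the sort key used by `DocumentBuilder._sort_boxes`: `xmin + 2 * ymax / median_height`. Dividing `ymax` by the median box height expresses vertical position in units of line height, so moving down one line adds roughly 2 to the key, which outweighs any horizontal offset in [0, 1]. A standalone numpy sketch of that key, assuming straight (N, 4) relative boxes; `reading_order` is a hypothetical name, and it only sorts, it does not group words into lines:

```python
import numpy as np


def reading_order(boxes: np.ndarray) -> np.ndarray:
    """Indices that sort straight (N, 4) relative boxes top-to-bottom, left-to-right.

    Reuses the key visible in DocumentBuilder._sort_boxes: xmin + 2 * ymax / median box height.
    """
    heights = boxes[:, 3] - boxes[:, 1]
    # ymax in units of median box height dominates; xmin breaks ties within a line
    key = boxes[:, 0] + 2 * boxes[:, 3] / np.median(heights)
    return key.argsort()


boxes = np.array([
    [0.6, 0.10, 0.9, 0.15],  # first line, right word
    [0.1, 0.30, 0.4, 0.35],  # second line
    [0.1, 0.10, 0.4, 0.15],  # first line, left word
])
print(reading_order(boxes).tolist())  # [2, 0, 1]
```

Words on the same line whose bottoms differ noticeably can still swap under this key, which is why grouping boxes into lines is a separate step.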
-
Hello @felixdittrich92, I get the following error when I add the …
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[7], line 14
     12 # Convert relative to absolute pixel coordinates
     13 points = np.array(_to_absolute(coords, img_shape), dtype=np.int32).reshape((-1, 1, 2))
---> 14 doc_builder._resolve_lines(points)
     15 # Draw the bounding box on the image
     16 cv2.polylines(single_img, [points], isClosed=True, color=(255, 0, 0), thickness=2)
File ~lib/python3.11/site-packages/doctr/models/builder.py:120, in DocumentBuilder._resolve_lines(self, boxes)
    109 """Order boxes to group them in lines
    110
    111 Args:
(...)
    117 nested list of box indices
    118 """
    119 # Sort boxes, and straighten the boxes if they are rotated
--> 120 idxs, boxes = self._sort_boxes(boxes)
    122 # Compute median for boxes heights
    123 y_med = np.median(boxes[:, 3] - boxes[:, 1])
File ~/lib/python3.11/site-packages/doctr/models/builder.py:61, in DocumentBuilder._sort_boxes(boxes)
     45 """Sort bounding boxes from top to bottom, left to right
     46
     47 Args:
(...)
     56 so that we fit the lines afterwards to the straigthened page
     57 """
     58 if boxes.ndim == 3:
     59     boxes = rotate_boxes(
     60         loc_preds=boxes,
---> 61         angle=-estimate_page_angle(boxes),
     62         orig_shape=(1024, 1024),
     63         min_angle=5.0,
     64     )
     65     boxes = np.concatenate((boxes.min(1), boxes.max(1)), -1)
     66 return (boxes[:, 0] + 2 * boxes[:, 3] / np.median(boxes[:, 3] - boxes[:, 1])).argsort(), boxes
File ~/lib/python3.11/site-packages/doctr/utils/geometry.py:380, in estimate_page_angle(polys)
    376 """Takes a batch of rotated previously ORIENTED polys (N, 4, 2) (rectified by the classifier) and return the
    377 estimated angle ccw in degrees
    378 """
    379 # Compute mean left points and mean right point with respect to the reading direction (oriented polygon)
--> 380 xleft = polys[:, 0, 0] + polys[:, 3, 0]
    381 yleft = polys[:, 0, 1] + polys[:, 3, 1]
    382 xright = polys[:, 1, 0] + polys[:, 2, 0]
IndexError: index 3 is out of bounds for axis 1 with size 1
Any idea where it could come from?
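The last frame points at the shape of `points`: it is a single box reshaped to (4, 1, 2) for `cv2.polylines`, so `boxes.ndim == 3` is true and `estimate_page_angle` reads it as four polygons of one point each, which makes `polys[:, 3, 0]` fail on axis 1. Judging from the code in the traceback, the private `_resolve_lines` expects all boxes of a page at once, either as (N, 4) straight boxes or as (N, 4, 2) polygons. A numpy-only sketch that reproduces the error and builds the stacked layout instead; the helper names are hypothetical and the pixel coordinates are made up for illustration:

```python
import numpy as np


def cv2_points(box: list[list[int]]) -> np.ndarray:
    """One 4-point box shaped for cv2.polylines: (4, 1, 2)."""
    return np.array(box, dtype=np.int32).reshape((-1, 1, 2))


def stack_polygons(boxes: list[list[list[float]]]) -> np.ndarray:
    """All boxes of a page stacked as (N, 4, 2) polygons, the layout the sorting code indexes into."""
    return np.asarray(boxes, dtype=np.float32).reshape((-1, 4, 2))


def polygons_to_straight(polys: np.ndarray) -> np.ndarray:
    """(N, 4, 2) polygons -> (N, 4) [xmin, ymin, xmax, ymax], the same reduction _sort_boxes applies."""
    return np.concatenate((polys.min(1), polys.max(1)), -1)


box = [[10, 20], [110, 20], [110, 40], [10, 40]]
points = cv2_points(box)
print(points.shape)  # (4, 1, 2): ndim == 3, so it is read as 4 "polygons" of 1 point each
try:
    points[:, 3, 0]  # the same indexing estimate_page_angle performs
except IndexError as err:
    print(err)  # index 3 is out of bounds for axis 1 with size 1

# Stack every box of the page first, then hand the whole array over in one call
polys = stack_polygons([box, [[10, 60], [90, 60], [90, 80], [10, 80]]])
print(polys.shape)  # (2, 4, 2)
print(polygons_to_straight(polys).tolist())  # [[10.0, 20.0, 110.0, 40.0], [10.0, 60.0, 90.0, 80.0]]
```

Keeping the cv2-shaped `points` only for drawing, and collecting the relative boxes from the detection output into one array for ordering, avoids mixing the two layouts.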
-
Hi @stiffmeister923 @agombert 👋,
If you really want to work directly with the `DocumentBuilder` to get all its functionality (box sorting, `.show()`, `.export()`, etc.), I think the easiest way would be to mock the recognition results. For example:
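If going through the private `DocumentBuilder` methods is not required, a similar ordering can be approximated directly on the detection output. A minimal sketch, assuming straight (N, 4) relative boxes: a box joins the current line when its vertical center lies within half the median box height of that line's mean center (a simplified heuristic modeled on the `y_med` computation visible in the traceback, not doctr's exact code), lines run top to bottom and words left to right, and the text comes from a hypothetical `recognize(index)` callable, for example one that crops the box and runs TrOCR on it:

```python
from typing import Callable

import numpy as np


def group_lines(boxes: np.ndarray) -> list[list[int]]:
    """Group straight (N, 4) relative boxes into lines, top to bottom, each line left to right."""
    if len(boxes) == 0:
        return []
    y_centers = (boxes[:, 1] + boxes[:, 3]) / 2
    y_med = np.median(boxes[:, 3] - boxes[:, 1])  # median box height
    lines: list[list[int]] = []
    for idx in np.argsort(y_centers):
        # Same line if this center is within half a median height of the line's mean center
        if lines and abs(y_centers[idx] - y_centers[lines[-1]].mean()) < y_med / 2:
            lines[-1].append(int(idx))
        else:
            lines.append([int(idx)])
    # Order the words of each line by xmin
    return [sorted(line, key=lambda i: boxes[i, 0]) for line in lines]


def read_page(boxes: np.ndarray, recognize: Callable[[int], str]) -> str:
    """Join recognized words with spaces and lines with newlines, in reading order."""
    return "\n".join(" ".join(recognize(i) for i in line) for line in group_lines(boxes))


boxes = np.array([
    [0.5, 0.11, 0.8, 0.16],  # "world": first line, slightly lower than "Hello"
    [0.2, 0.30, 0.4, 0.35],  # "foo": second line
    [0.1, 0.10, 0.4, 0.15],  # "Hello": first line
])
texts = ["world", "foo", "Hello"]  # stand-in for per-crop recognition results
print(read_page(boxes, lambda i: texts[i]))  # prints "Hello world" then "foo"
```

Comparing against the running mean of each line, rather than a global sort key, keeps words with slightly different baselines on the same line; pages with rotated text would need the boxes straightened first.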