Error when parsing a PDF
To reproduce this we need the PDF itself. Could you extract just those two pages into a new PDF and upload it?
Description of the bug
An error is raised when parsing a PDF.
app-1 | 2024-11-06 10:42:24.790 | INFO | magic_pdf.model.pdf_extract_kit:__call__:490 - table time: 0.0
app-1 | │ │ │ │ └ b'%PDF-1.7\n%\xe4\xe3\xcf\xd2\n4 0 obj\n<</Type/XObject\n/Subtype/Form\n/FormType 1\n/Matrix[1 0 0 1 0 0]\n/BBox[0 0 595 841]...
app-1 | │ │ │ └ <magic_pdf.pipe.OCRPipe.OCRPipe object at 0x7f19c9e5f370>
app-1 | │ │ └ <function doc_analyze at 0x7f1b4175c160>
app-1 | │ └ []
app-1 | └ <magic_pdf.pipe.OCRPipe.OCRPipe object at 0x7f19c9e5f370>
app-1 | File "/opt/mineru_venv/lib/python3.10/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 166, in doc_analyze
app-1 | result = custom_model(img)
app-1 | │ └ array([[[255, 255, 255],
app-1 | │ [255, 255, 255],
app-1 | │ [255, 255, 255],
app-1 | │ ...,
app-1 | │ [255, 255, 255],
app-1 | │ [255...
app-1 | └ <magic_pdf.model.pdf_extract_kit.CustomPEKModel object at 0x7f19c9e5dcc0>
app-1 | File "/opt/mineru_venv/lib/python3.10/site-packages/magic_pdf/model/pdf_extract_kit.py", line 468, in __call__
app-1 | html_code = self.table_model.img2html(new_image)
app-1 | │ │ │ └ <PIL.Image.Image image mode=RGB size=1283x457 at 0x7F19C9E5FA60>
app-1 | │ │ └ <function ppTableModel.img2html at 0x7f1a144c48b0>
app-1 | │ └ <magic_pdf.model.ppTableModel.ppTableModel object at 0x7f19e1bfbb20>
app-1 | └ <magic_pdf.model.pdf_extract_kit.CustomPEKModel object at 0x7f19c9e5dcc0>
app-1 | File "/opt/mineru_venv/lib/python3.10/site-packages/magic_pdf/model/ppTableModel.py", line 42, in img2html
app-1 | pred_res, _ = self.table_sys(image)
app-1 | │ │ └ array([[[255, 255, 255],
app-1 | │ │ [255, 255, 255],
app-1 | │ │ [255, 255, 255],
app-1 | │ │ ...,
app-1 | │ │ [ 67, 67, 67],
app-1 | │ │ [ 67...
app-1 | │ └ <paddleocr.ppstructure.table.predict_table.TableSystem object at 0x7f19e1bfbd00>
app-1 | └ <magic_pdf.model.ppTableModel.ppTableModel object at 0x7f19e1bfbb20>
app-1 | File "/opt/mineru_venv/lib/python3.10/site-packages/paddleocr/ppstructure/table/predict_table.py", line 100, in __call__
app-1 | pred_html = self.match(structure_res, dt_boxes, rec_res)
app-1 | │ │ │ │ └ []
app-1 | │ │ │ └ array([], dtype=float64)
app-1 | │ │ └ (['', '', '
app-1 | │ └ <ppstructure.table.table_master_match.TableMasterMatcher object at 0x7f19c9d3fb80>
app-1 | └ <paddleocr.ppstructure.table.predict_table.TableSystem object at 0x7f19e1bfbd00>
app-1 | File "/opt/mineru_venv/lib/python3.10/site-packages/paddleocr/ppstructure/table/table_master_match.py", line 949, in __call__
app-1 | match_results = self.match()
app-1 | │ └ <function Matcher.match at 0x7f1a1448cd30>
app-1 | └ <ppstructure.table.table_master_match.TableMasterMatcher object at 0x7f19c9d3fb80>
app-1 | File "/opt/mineru_venv/lib/python3.10/site-packages/paddleocr/ppstructure/table/table_master_match.py", line 769, in match
app-1 | get_bboxes_list(end2end_result, structure_master_result)
app-1 | │ │ └ {'text': ',,,,,,,,,,,,<e...
app-1 | │ └ []
app-1 | └ <function get_bboxes_list at 0x7f1a1448c3a0>
app-1 | File "/opt/mineru_venv/lib/python3.10/site-packages/paddleocr/ppstructure/table/table_master_match.py", line 302, in get_bboxes_list
app-1 | xywh_bbox = xyxy2xywh(src_bboxes)
app-1 | │ └ array([], dtype=float64)
app-1 | └ <function xyxy2xywh at 0x7f1a14693d00>
app-1 | File "/opt/mineru_venv/lib/python3.10/site-packages/paddleocr/ppstructure/table/table_master_match.py", line 71, in xyxy2xywh
app-1 | new_bboxes[0] = bboxes[0] + (bboxes[2] - bboxes[0]) / 2
app-1 | │ │ │ └ array([], dtype=float64)
app-1 | │ │ └ array([], dtype=float64)
app-1 | │ └ array([], dtype=float64)
app-1 | └ array([], dtype=float64)
app-1 |
app-1 | IndexError: index 0 is out of bounds for axis 0 with size 0
app-1 | INFO: 10.0.104.3:53724 - "POST /pdf_parse?parse_method=ocr&is_json_md_dump=True&output_dir=output HTTP/1.1" 500 Internal Server Error
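The traceback ends in `xyxy2xywh` being called with an empty `float64` array, which suggests that no text boxes were detected inside the table image. The sketch below reproduces the same `IndexError` with NumPy alone and shows one possible guard. `xyxy2xywh_1d` is a simplified stand-in modelled on the single line shown in the traceback, not the PaddleOCR source, and `safe_xyxy2xywh` is a hypothetical helper, not an existing API.

```python
import numpy as np


def xyxy2xywh_1d(bboxes):
    """Simplified stand-in: convert one [x1, y1, x2, y2] box to [cx, cy, w, h]."""
    new_bboxes = np.empty_like(bboxes)
    new_bboxes[0] = bboxes[0] + (bboxes[2] - bboxes[0]) / 2  # center x
    new_bboxes[1] = bboxes[1] + (bboxes[3] - bboxes[1]) / 2  # center y
    new_bboxes[2] = bboxes[2] - bboxes[0]  # width
    new_bboxes[3] = bboxes[3] - bboxes[1]  # height
    return new_bboxes


def reproduce_error():
    """Call the conversion with an empty array, as in the traceback."""
    empty = np.array([], dtype=np.float64)
    try:
        xyxy2xywh_1d(empty)
    except IndexError as exc:
        return str(exc)
    return None


def safe_xyxy2xywh(bboxes):
    """Hypothetical guard: return an empty (0, 4) result when there are no boxes."""
    bboxes = np.asarray(bboxes, dtype=np.float64)
    if bboxes.size == 0:
        return np.zeros((0, 4), dtype=np.float64)
    bboxes = bboxes.reshape(-1, 4)
    out = np.empty_like(bboxes)
    out[:, 0] = bboxes[:, 0] + (bboxes[:, 2] - bboxes[:, 0]) / 2  # center x
    out[:, 1] = bboxes[:, 1] + (bboxes[:, 3] - bboxes[:, 1]) / 2  # center y
    out[:, 2] = bboxes[:, 2] - bboxes[:, 0]  # width
    out[:, 3] = bboxes[:, 3] - bboxes[:, 1]  # height
    return out


if __name__ == "__main__":
    print(reproduce_error())
    print(safe_xyxy2xywh(np.array([])).shape)
```

Whether the right fix is a guard like this or skipping table recognition when the detector returns no boxes is a decision for the maintainers; the sketch only isolates the failing step.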
How to reproduce the bug
These are two screenshots of the PDF I need to parse; it is not convenient for me to upload the whole file.
Operating system
Linux
Python version
3.10
Software version (magic-pdf --version)
0.9.x
Device mode
cuda