Table recognition is extremely slow, more than ten times slower than running without the table model #926

Closed
charliedream1 opened this issue Nov 11, 2024 · 6 comments
Labels
bug Something isn't working

Comments


Description of the bug

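Table recognition is extremely slow, more than ten times slower than running without the table model. Which configuration settings should I check? I have not installed the paddle-gpu build, since table recognition did not appear to need paddle. Is that the cause, or are there other settings to look at? Table recognition times are all in the 200-400 range, so a 10-page PDF does not finish converting even after 20 minutes.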

How to reproduce the bug

Run the conversion with table recognition enabled and then disabled, and compare the processing times.
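The comparison described above can be sketched as a small timing harness. This is a minimal sketch, not part of the original report: `time_run`, `slowdown`, and the two placeholder conversion functions are hypothetical names, and the placeholders would need to be replaced with real calls that run the conversion with table recognition on and off.

```python
import time
from typing import Callable


def time_run(convert: Callable[[], object]) -> float:
    """Run one conversion and return the wall-clock time in seconds."""
    start = time.perf_counter()
    convert()
    return time.perf_counter() - start


def slowdown(seconds_with_tables: float, seconds_without_tables: float) -> float:
    """Return how many times slower the run with table recognition was."""
    if seconds_without_tables <= 0:
        raise ValueError("baseline time must be positive")
    return seconds_with_tables / seconds_without_tables


if __name__ == "__main__":
    # Hypothetical stand-ins: replace each body with the real conversion call,
    # once with table recognition enabled and once with it disabled.
    def convert_with_tables() -> None:
        time.sleep(0.02)

    def convert_without_tables() -> None:
        time.sleep(0.01)

    with_tables = time_run(convert_with_tables)
    without_tables = time_run(convert_without_tables)
    print(f"with tables:    {with_tables:.3f} s")
    print(f"without tables: {without_tables:.3f} s")
    print(f"slowdown:       {slowdown(with_tables, without_tables):.1f}x")
```

Running both configurations on the same PDF and reporting the ratio makes a claim such as "more than ten times slower" reproducible for the maintainers.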

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.9.x

Device mode

cuda

@charliedream1 charliedream1 added the bug Something isn't working label Nov 11, 2024
Collaborator

myhloli commented Nov 11, 2024

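We will release 0.9.3 this week. It integrates RapidTable for table recognition: a single table takes 1~2 s to recognize, so it is both faster and more accurate.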

@myhloli myhloli closed this as completed Nov 11, 2024
Author

charliedream1 commented Nov 11, 2024 via email

Author

charliedream1 commented Nov 11, 2024 via email

Collaborator

myhloli commented Nov 11, 2024

I ran some tests: every approach performs poorly on complex tables. For now we can only prioritize making simple-table parsing work.

@charliedream1
Author

Complex tables seem hard to represent in Markdown. Would representing them in HTML work better?

Collaborator

myhloli commented Nov 12, 2024

> Complex tables seem hard to represent in Markdown. Would representing them in HTML work better?

Tables are already represented in HTML.
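To illustrate why HTML suits complex tables better than Markdown pipe tables: HTML cells can carry `rowspan` and `colspan`, which pipe tables cannot express. The sketch below is only an illustration under assumed names (`render_html_table` and its cell tuple layout are hypothetical) and is not how magic-pdf produces its output.

```python
from html import escape
from typing import List, Tuple

# Hypothetical cell layout for this sketch: (text, rowspan, colspan).
Cell = Tuple[str, int, int]


def render_html_table(rows: List[List[Cell]]) -> str:
    """Render rows of cells as an HTML table, emitting span attributes when > 1."""
    parts = ["<table>"]
    for row in rows:
        parts.append("  <tr>")
        for text, rowspan, colspan in row:
            attrs = ""
            if rowspan > 1:
                attrs += f' rowspan="{rowspan}"'
            if colspan > 1:
                attrs += f' colspan="{colspan}"'
            parts.append(f"    <td{attrs}>{escape(text)}</td>")
        parts.append("  </tr>")
    parts.append("</table>")
    return "\n".join(parts)


if __name__ == "__main__":
    # A header cell spanning two rows next to a header cell spanning two columns.
    table = [
        [("Region", 2, 1), ("Sales", 1, 2)],
        [("Q1", 1, 1), ("Q2", 1, 1)],
    ]
    print(render_html_table(table))
```

A merged header like the one in this example has no equivalent in a Markdown pipe table, where every row must have the same number of single cells.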
