
Add Option to Disable Specific Models (e.g., Formula Detection) for Faster Execution #140

Open
Leopold-iziwork opened this issue Oct 8, 2024 · 0 comments


Hello,

First of all, thank you for developing PDF Extract Kit; it is a great tool for extracting data from PDFs!

I would like to propose a feature that could improve the performance of the tool in certain use cases. Specifically, it would be useful to add a flag or option that allows users to disable certain models, such as formula detection, during the PDF extraction process.

Problem

In some scenarios, users do not need every model (formula detection, for example) to run during extraction. Currently, all models appear to run by default, which adds unnecessary runtime whenever some of them are not needed.

Proposed Solution

Add a command-line option (or configuration flag) that allows users to selectively enable or disable specific models. For instance:

  • A flag like --no-formula to skip formula detection.
  • Alternatively, a general flag system where the user can specify which models they want to run (e.g., --models text,table,figure).
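The following is a minimal sketch of how the two proposed flags could work together, assuming a Python entry point built on the standard `argparse` module. The flag names `--no-formula` and `--models` come from this proposal. The model names and the `parse_args`/`select_models` helpers are hypothetical and do not reflect PDF Extract Kit's actual code or configuration.

```python
import argparse

# Hypothetical model identifiers; PDF Extract Kit's real names may differ.
AVAILABLE_MODELS = ("text", "table", "figure", "formula")


def parse_args(argv=None):
    """Parse the proposed (hypothetical) model-selection flags."""
    parser = argparse.ArgumentParser(description="Model selection sketch")
    parser.add_argument(
        "--models",
        # Split "text,table" into ["text", "table"], ignoring empty entries.
        type=lambda s: [m.strip() for m in s.split(",") if m.strip()],
        default=None,
        help="Comma-separated list of models to run (default: all).",
    )
    parser.add_argument(
        "--no-formula",
        action="store_true",
        help="Skip formula detection.",
    )
    return parser.parse_args(argv)


def select_models(args):
    """Return the models to run, in order, after applying both flags."""
    selected = list(AVAILABLE_MODELS) if args.models is None else args.models
    unknown = [m for m in selected if m not in AVAILABLE_MODELS]
    if unknown:
        raise ValueError(f"Unknown model(s): {', '.join(unknown)}")
    # --no-formula is applied last, so it also composes with --models.
    if args.no_formula:
        selected = [m for m in selected if m != "formula"]
    return selected
```

For example, `select_models(parse_args(["--no-formula"]))` returns `['text', 'table', 'figure']`. The pipeline would then invoke only the models in that list. Rejecting unknown names early means a typo such as `--models tabel` fails loudly instead of silently skipping table extraction.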

Expected Benefits

  • Improved performance: Skipping unneeded models could substantially reduce PDF processing time in use cases where running every model isn't necessary.
  • Flexibility: Users would have more control over which models to use, tailoring the tool to their specific needs.

Thank you for considering this feature request. It would be a great enhancement for performance optimization in scenarios where only a subset of models is needed.
