Release 0.9.3 #969
docs(README): update badges
- Implement xycut algorithm to sort blocks when layoutreader fails
- Add recursive_xy_cut function to perform the xycut algorithm
- Update pdf_parse_union_core_v2.py to use xycut when layoutreader fails
- Modify draw_bbox.py to handle cases where layoutreader fails to sort blocks
feat(model): add xycut algorithm for block sorting
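The fallback sort described in this commit can be illustrated with a minimal recursive XY-cut sketch. This is not the project's `recursive_xy_cut`; the function names `xy_cut_order` and `_split_by_gaps`, the `(x0, y0, x1, y1)` box layout, and the top-left tie-break are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


def _split_by_gaps(indices: Sequence[int], boxes: Sequence[Box], axis: int) -> List[List[int]]:
    """Group indices into runs whose projections on `axis` overlap or touch.

    axis=0 projects onto x (column cuts); axis=1 projects onto y (row cuts).
    """
    lo, hi = axis, axis + 2
    ordered = sorted(indices, key=lambda i: boxes[i][lo])
    groups = [[ordered[0]]]
    reach = boxes[ordered[0]][hi]  # farthest extent covered so far
    for i in ordered[1:]:
        if boxes[i][lo] > reach:  # empty gap in the projection: cut here
            groups.append([i])
        else:
            groups[-1].append(i)
        reach = max(reach, boxes[i][hi])
    return groups


def xy_cut_order(boxes: Sequence[Box], indices: Optional[Sequence[int]] = None, axis: int = 1) -> List[int]:
    """Return box indices in reading order by alternating row and column cuts."""
    if indices is None:
        indices = list(range(len(boxes)))
    if len(indices) <= 1:
        return list(indices)
    groups = _split_by_gaps(indices, boxes, axis)
    if len(groups) == 1:
        # No cut on this axis; try the other one before giving up.
        other = _split_by_gaps(indices, boxes, 1 - axis)
        if len(other) == 1:
            # Boxes cannot be separated: fall back to a top-left sort.
            return sorted(indices, key=lambda i: (boxes[i][1], boxes[i][0]))
        groups, axis = other, 1 - axis
    order: List[int] = []
    for group in groups:
        order.extend(xy_cut_order(boxes, group, 1 - axis))
    return order


# A title above two columns whose paragraphs do not line up vertically.
boxes = [
    (55, 65, 100, 90),  # 0: right column, bottom
    (0, 0, 100, 10),    # 1: title
    (0, 50, 45, 90),    # 2: left column, bottom
    (55, 20, 100, 60),  # 3: right column, top
    (0, 20, 45, 45),    # 4: left column, top
]
print(xy_cut_order(boxes))  # [1, 4, 2, 3, 0]
```

The sketch cuts rows first and columns second, so the title is emitted before the left column is read top to bottom, followed by the right column.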
- Decrease the maximum line count from 512 to 316 for layoutreader
- Lower the line count threshold from 316 to 200 to ensure compatibility
- This change aims to prevent potential issues with layoutreader's maximum line support
refactor(pdf_parse): adjust line count threshold for layoutreader
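One way to picture how a line-count threshold and a fallback sorter fit together is the sketch below. Only the value 200 comes from the commits above; the helper name `sort_lines`, the callables passed in, and the assumption that pages above the threshold skip the model entirely are hypothetical and may not match the project's code.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Threshold taken from the commit message; everything else is illustrative.
LAYOUTREADER_MAX_LINES = 200


def sort_lines(
    lines: Sequence[T],
    model_sort: Callable[[Sequence[T]], List[T]],
    fallback_sort: Callable[[Sequence[T]], List[T]],
    max_lines: int = LAYOUTREADER_MAX_LINES,
) -> List[T]:
    """Use the model sorter when the page is small enough, else the fallback."""
    if len(lines) > max_lines:
        # Assumption: too many lines for the model, so do not call it at all.
        return fallback_sort(lines)
    try:
        return model_sort(lines)
    except Exception:
        # Assumption: any model failure is treated as "could not sort".
        return fallback_sort(lines)
```

Passing both sorters as arguments keeps the sketch independent of any real model; in practice `fallback_sort` could be a geometric method such as XY-cut.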
Feat/add en docs
feat: using next_docs
- Add RapidTable model support for table recognition
- Update table model configuration and initialization
- Modify table recognition process to use RapidTable when specified
- Add RapidTable dependency to setup.py
- Change the default table model from TABLE_MASTER to RAPID_TABLE
feat(table): integrate RapidTable model for table recognition
- Add missing '.jpg' file type to the list of allowed file types for upload
fix(gradio-app): add missing file type in upload
- Add orig_model_list parameter to maintain original model data
- Deep copy model_json and pipe.model_list to preserve data integrity
- Update json_md_dump function call to include orig_model_list
- Improve condition check for empty model_json
refactor(magic_pdf_parse_main): optimize model data handling and JSON output
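The reason for deep-copying the model data can be shown with a small sketch. The `snapshot` helper and the sample dictionary keys are illustrative assumptions; only the names `model_json` and `orig_model_list` are taken from the commit message.

```python
import copy


def snapshot(model_list):
    """Return an independent copy so later in-place edits cannot leak into it."""
    return copy.deepcopy(model_list)


# Illustrative stand-in for per-page model output.
model_json = [{"page_info": {"page_no": 0}, "layout_dets": [{"category_id": 1}]}]
orig_model_list = snapshot(model_json)

# Simulate a pipeline stage that rewrites detections in place.
model_json[0]["layout_dets"].clear()

print(len(orig_model_list[0]["layout_dets"]))  # 1
```

A shallow copy such as `list(model_json)` would share the nested dictionaries, so the in-place `clear()` would also empty the saved copy.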
Modify the test directory
- Update test_image2html to use unittest framework
- Add more assertions
test(table): improve ppTableModel test coverage
- Integrate RapidOCR with RapidTable model for table recognition
- Improve memory management for devices with <= 8GB VRAM
- Update table recognition process to use RapidOCR for RapidTable
- Add rapidocr-paddle dependency in setup.py
feat(table): add RapidOCR support for RapidTable model
- Add DocLayout-YOLO repository link
- Add RapidTable repository link
fix: remove classes hierarchy diagram
docs(README_ja-JP.md): update warning message and remove outdated content
fix(para_split_v3): Fix IndexError in para_split_v3.py for empty line handling
Style/docs
- Update the URL for downloading the model setup script in Dockerfile
- Upgrade struct-eqtable to version 0.3.2 and remove pypandoc
- Add new dependencies: einops, accelerate, doclayout_yolo, rapidocr-paddle, and rapid_table
build(Dockerfile): update model download script and dependencies
- Add digit check for single-character content to avoid adding unnecessary spaces
fix(ocr_mkcontent): improve handling of single-character content #937
Co-authored-by: xu rui <[email protected]>
fix(parse_pipeline): Resolve post-processing exceptions caused by partial PDFs due to file corruption or non-standard format by forcing a re-print.
- Rename ppTableModel to TableMasterPaddleModel in test_tablemaster.py
refactor(model): rename and restructure model modules
docs:update docs for 0.9.3
Dev to 0.9.3
docs(README): update project references and translations
Dev to 0.9.3
I have read the CLA Document and I hereby sign the CLA. 1 out of 3 committers have signed the CLA.
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.
Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
Modification
Please briefly describe what modification is made in this PR.
BC-breaking (Optional)
Does the modification introduce changes that break the backward compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here and update the documentation.