Getting issue in get_text() #4037

Open
ashifaliclientpoint opened this issue Nov 11, 2024 · 5 comments

Comments

@ashifaliclientpoint

Description of the bug

I am using this library to find the order of some tags in a document, and in general everything works fine. In one specific file, however, I am getting an issue.
In that file the tags appear in the following order: c:a:r, i:a:o, i:a:o
But when I extract the text to get the order of these tags, they are returned in this order instead:
i:a:o, i:a:o, c:a:r

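Here is my Python script:
import fitz

file_path = "checkbox-issue.pdf"
doc = fitz.open(file_path)

# Global setting: use smaller glyph heights when computing text bounding boxes
fitz.TOOLS.set_small_glyph_heights(True)
for page in doc:
    # Plain-text extraction of the page
    text = page.get_text()
    print(text)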

Please let me know if I am doing something wrong and how to fix it.
Thanks

How to reproduce the bug

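Use the script below:

import fitz

file_path = "checkbox-issue.pdf"
doc = fitz.open(file_path)

# Global setting: use smaller glyph heights when computing text bounding boxes
fitz.TOOLS.set_small_glyph_heights(True)
for page in doc:
    # Plain-text extraction of the page
    text = page.get_text()
    print(text)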

PyMuPDF version

1.23.x or earlier

Operating system

Linux

Python version

3.9

@JorjMcKie
Collaborator

The example file for problem reproduction is missing!

@JorjMcKie
Collaborator

If text extraction returns these strings, then they are there and it is no bug.
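
A note on the reply above: plain text extraction generally returns text in the order it is stored in the file, which need not match the visual top-to-bottom order on the page. The sketch below shows one way such output could be reordered by position. It is a minimal illustration, not a confirmed fix for this file: the helper name `sort_words_reading_order`, the `y_tolerance` value, and the sample coordinates are all hypothetical, and the input is assumed to have the `(x0, y0, x1, y1, text, ...)` layout that `page.get_text("words")` returns.

```python
def sort_words_reading_order(words, y_tolerance=3.0):
    """Return word strings ordered top-to-bottom, then left-to-right.

    Each item in `words` is assumed to look like (x0, y0, x1, y1, text, ...).
    A word joins the current line when its top edge (y0) is within
    `y_tolerance` of the top edge of that line's first word.
    """
    # Sort by vertical position first so lines can be grouped in one pass.
    by_top = sorted(words, key=lambda w: (w[1], w[0]))
    lines = []
    for w in by_top:
        if lines and abs(w[1] - lines[-1][0][1]) <= y_tolerance:
            lines[-1].append(w)
        else:
            lines.append([w])
    # Within each line, order the words from left to right.
    ordered = []
    for line in lines:
        ordered.extend(w[4] for w in sorted(line, key=lambda w: w[0]))
    return ordered


# Hypothetical coordinates: "c:a:r" sits highest on the page but is stored last.
sample = [
    (50.0, 200.0, 80.0, 212.0, "i:a:o"),
    (50.0, 300.0, 80.0, 312.0, "i:a:o"),
    (50.0, 100.0, 80.0, 112.0, "c:a:r"),
]
print(sort_words_reading_order(sample))
```

With PyMuPDF, the input would come from `page.get_text("words")`. Passing `sort=True` to `page.get_text()` is another option worth trying. Whether either one yields the expected tag order for the reporter's document cannot be confirmed without the file.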

@ashifaliclientpoint
Author

Please allow me some time to arrange the example file. It is a customer document, so I need to arrange this first.

Thanks

@JorjMcKie
Collaborator

Please never submit an issue that we cannot reproduce based on its content.
If you have confidential data that you cannot attach in the issue thread, you can instead use a maintainer's email address, e.g. mine. Then, confidentiality is guaranteed.

@JorjMcKie
Collaborator

I have not received a reproducing file yet. Please be aware that we will close the issue tomorrow if we do not receive the required data.
