[reconstitution] Improve synthesize output quality #1528

Closed
tzktz opened this issue Mar 26, 2024 · 30 comments · Fixed by #1750
Labels
good first issue (Good for newcomers), help wanted (Extra attention is needed), module: utils (Related to doctr.utils), type: enhancement (Improvement)

@tzktz

tzktz commented Mar 26, 2024

@felixdittrich92 The result image is not up to quality: the fonts appear broken in the result image.

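import matplotlib.pyplot as plt
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("bankstatement.pdf")
# Analyze
result = model(doc)
# Show the reconstituted (synthesized) first page
plt.imshow(result.synthesize()[0]); plt.axis('off'); plt.show()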

See the result image:
Figure_1

Originally posted by @tzktz in #1525 (comment)

@tzktz changed the title from "@felixdittrich92 i have face result image is not upto quality...fonts are breaks in result image.." to "i have face result image is not upto quality...fonts are breaks in result image.." on Mar 26, 2024
@felixdittrich92
Contributor

Yeah, we could maybe align the y-coords between line elements (words) and add some small default horizontal padding between detections.
CC @odulcy-mindee

@tzktz
Author

tzktz commented Mar 26, 2024

How can I change the font_family? @felixdittrich92

@felixdittrich92
Contributor

result.synthesize(font_family="XYZ")

under the hood calls PIL:
font = ImageFont.truetype(font_family, font_size)

@tzktz
Author

tzktz commented Mar 26, 2024

synthetic_pages = result.synthesize(font_family='Arial.ttf', font_size=13)
plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()

I get the same warning even when I pass font_family. @felixdittrich92

WARNING:root:unable to load recommended font family. Loading default PIL font,font size issues may be expected.To prevent this, it is recommended to specify the value of 'font_family'.

@felixdittrich92
Contributor

Is the font installed on your system?

@tzktz
Author

tzktz commented Mar 27, 2024

Yes, I have that font in my project folder. @felixdittrich92

@felixdittrich92
Contributor

Ah OK, got it. That's not enough: you need to install the font on your system: https://linuxiac.com/how-to-install-fonts-on-linux/#:~:text=Go%20to%20%E2%80%9CSystem%20Settings%E2%80%9D%20%3E,%E2%80%9CInstall%20from%20File%E2%80%9D%20button.&text=Then%20select%20the%20font%20files,%2Dwide%20or%20per%2Duser.
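
Editor's note: as an alternative worth trying, PIL's `ImageFont.truetype` also accepts a path to a font file, so an absolute path to the `.ttf` could be passed as `font_family`. The sketch below only locates the file; `resolve_font_path` and the search directories are hypothetical names for illustration, not part of docTR, and whether this avoids the warning in a given docTR version is not verified here.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_font_path(font_name: str, search_dirs: Iterable[str]) -> Optional[str]:
    """Return the absolute path of the first matching font file, or None.

    Hypothetical helper: it only checks that the file exists in one of the
    given directories; it does not validate that the file is a usable font.
    """
    for directory in search_dirs:
        candidate = Path(directory) / font_name
        if candidate.is_file():
            return str(candidate.resolve())
    return None
```

A possible use would be `font_path = resolve_font_path("Arial.ttf", [".", "/usr/share/fonts"])`, followed by `result.synthesize(font_family=font_path, font_size=13)` when `font_path` is not `None`.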

@tzktz
Author

tzktz commented Mar 27, 2024

See the input and output results below. The result image quality is very poor; the pixels look broken. @felixdittrich92
Input image (1240 x 1754, 158.44 KB):
input

Result image (1907 x 965, 46 KB):
Figure_1

@tzktz
Author

tzktz commented Apr 2, 2024

@felixdittrich92 any update?

@felixdittrich92
Contributor

Hi @tzktz 👋,

Unfortunately I don't have the time to work on this at the moment, so we will need to address it later on, or you can work on it yourself if you want (feel free to open a PR).

related code can be found at:

def synthesize_page(

Best regards,
Felix

@felixdittrich92 changed the title from "i have face result image is not upto quality...fonts are breaks in result image.." to "[reconstitution] Improve synthesize output quality" on Apr 16, 2024
@felixdittrich92 added this to the 2.0.0 milestone on Apr 16, 2024
@felixdittrich92 added the labels type: enhancement, good first issue, help wanted, module: utils on Apr 16, 2024
@SkaarFacee
Contributor

@felixdittrich92 Hey, sorry for being MIA. I needed to take some time off. I am back now and I was hoping I could take up this issue? Let me know regarding this :)

@felixdittrich92
Contributor

Hey @SkaarFacee 👋
Sure, feel free to work on it 😊
The code has moved a bit; it is now in:
https://github.com/mindee/doctr/blob/main/doctr/utils/reconstitution.py

@SkaarFacee
Contributor

Okay, thanks. Let me take a look at what I can do.

@SkaarFacee
Contributor

@felixdittrich92 Do you have any suggestions on how I can improve the quality of the image ?

@felixT2K
Contributor

felixT2K commented Apr 29, 2024

@SkaarFacee
One thing we could do: if we have the line box information, we could align all boxes inside it to the line's y coordinate (to get a straighter view).
I found the following HF Space where the reconstitution looks not bad; maybe you can use it as a reference or to get some inspiration ^^:
https://huggingface.co/spaces/SWHL/RapidOCRDemo/blob/main/utils.py

@SkaarFacee
Contributor

Okay, I will take a look and see what can be done over the weekend. This doesn't look that complex at quick glance :)

@felixdittrich92
Contributor

Yeah, I think so too :)

@felixdittrich92 felixdittrich92 modified the milestones: 2.0.0, 1.0.0 May 5, 2024
@SkaarFacee
Contributor

Hey, I am working on this, sorry for the delay. Something came up at work and kept me busy.

@SkaarFacee
Contributor

@felixT2K
I was using the link you mentioned as a reference (https://huggingface.co/spaces/SWHL/RapidOCRDemo/blob/main/utils.py).
I can't exactly pinpoint the place where the y coordinate is used to align the line.
If the goal is to straighten the line, why don't we make the y coordinates of each box in the line the same using a mathematical approach (using the mean or centroid of the boxes)? If you could give me more insight into the HF reference, I would gladly implement that too 😄
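
Editor's note: a minimal sketch of the mean-based idea above, assuming each word box is given as `((xmin, ymin), (xmax, ymax))` and that the boxes passed in are already known to belong to one line. `align_boxes_to_mean_y` is a hypothetical helper name, not an existing docTR function.

```python
from statistics import mean
from typing import List, Tuple

# Assumed box layout: ((xmin, ymin), (xmax, ymax))
Box = Tuple[Tuple[float, float], Tuple[float, float]]


def align_boxes_to_mean_y(boxes: List[Box]) -> List[Box]:
    """Give every box in a line the same top and bottom y coordinates.

    The shared top is the mean of all tops and the shared bottom is the
    mean of all bottoms; x coordinates are left untouched.
    """
    if not boxes:
        return []
    top = mean(box[0][1] for box in boxes)
    bottom = mean(box[1][1] for box in boxes)
    return [((xmin, top), (xmax, bottom)) for (xmin, _), (xmax, _) in boxes]
```

Averaging tops and bottoms separately keeps a plausible common box height; using a single centroid per box would need an extra decision about which height to apply.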

@felixdittrich92
Contributor

Correct, that was what I had in mind: you know which boxes are in one line (if `resolve_lines=True`; otherwise, if only one line element is available, keep the y coords of each box), then take the line's y coordinates for each box to straighten the boxes on the line :)
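
Editor's note: the variant described here, snapping each word box to the y range of its enclosing line box, could look roughly like the sketch below. The box layout `((xmin, ymin), (xmax, ymax))`, the function name `snap_words_to_line`, and the `lines_resolved` flag are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box layout: ((xmin, ymin), (xmax, ymax))
Box = Tuple[Tuple[float, float], Tuple[float, float]]


def snap_words_to_line(line_box: Box, word_boxes: List[Box], lines_resolved: bool = True) -> List[Box]:
    """Replace each word's y range with the y range of its line box.

    When lines were not resolved (a single line element holds every word),
    the original word boxes are returned unchanged.
    """
    if not lines_resolved:
        return list(word_boxes)
    (_, line_top), (_, line_bottom) = line_box
    return [((xmin, line_top), (xmax, line_bottom)) for (xmin, _), (xmax, _) in word_boxes]
```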

@felixdittrich92 felixdittrich92 removed this from the 1.0.0 milestone Jun 6, 2024
@felixdittrich92
Contributor

@SkaarFacee any updates ? 🤗

@SkaarFacee
Contributor

Hey, I am soo sorry. I had some personal issues that needed some time to be resolved. Everything is fine now and back to normal. I shall give you an update asap now 😓 😞

@felixdittrich92
Contributor

felixdittrich92 commented Jun 9, 2024

No problem :)

In the meantime I identified the root issue.

  • The code itself works as expected
  • Alignment to the line geometry can be added easily
  • Issue: the font size computation isn't correct (it should be computed from the geometry size relative to the page size); that's why the output sometimes looks like gibberish
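
Editor's note: one way to read the last point is to derive the font size from the height of a word box in pixels. The sketch below assumes relative box coordinates in `[0, 1]`, laid out as `((xmin, ymin), (xmax, ymax))`, and treats the PIL font size as roughly a pixel height. `font_size_from_box`, `fill_ratio`, and `min_size` are hypothetical names and values, not the fix that was eventually merged.

```python
from typing import Tuple

# Assumed box layout with relative coordinates: ((xmin, ymin), (xmax, ymax))
Box = Tuple[Tuple[float, float], Tuple[float, float]]


def font_size_from_box(box: Box, page_height_px: int, fill_ratio: float = 0.8, min_size: int = 6) -> int:
    """Estimate a font size (in pixels) from a box height relative to the page.

    fill_ratio leaves some vertical margin inside the box; min_size keeps
    very small boxes legible. Both defaults are illustrative guesses.
    """
    (_, ymin), (_, ymax) = box
    box_height_px = (ymax - ymin) * page_height_px
    return max(min_size, round(box_height_px * fill_ratio))
```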

@SkaarFacee
Contributor

Regarding the font size computation, should we maybe add a limit on the font-to-page size ratio so that the issue becomes less frequent? Also, regarding the line geometry alignment, I think we could add a tolerance value of some sort so that words with y pixel values of, say, 40 and 43 end up on the same straight line (here I assume a tolerance of ±5).
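
Editor's note: the tolerance idea can be sketched as grouping y values that lie within a fixed distance of the smallest value in their group, then snapping each group to its mean. `snap_y_values` is a hypothetical helper, and the default tolerance of 5 pixels simply mirrors the example in the comment above.

```python
from typing import List, Sequence


def snap_y_values(y_values: Sequence[float], tolerance: float = 5.0) -> List[float]:
    """Snap y values that are close together onto a shared value.

    Values are visited in ascending order; a value joins the current group
    if it is within `tolerance` of the group's smallest member, otherwise it
    starts a new group. Each member is replaced by its group's mean. The
    result keeps the input order.
    """
    order = sorted(range(len(y_values)), key=lambda i: y_values[i])
    snapped = [0.0] * len(y_values)
    group: List[int] = []

    def flush(indices: List[int]) -> None:
        # Write the group mean back to every member of the group.
        if indices:
            target = sum(y_values[i] for i in indices) / len(indices)
            for i in indices:
                snapped[i] = target

    for i in order:
        if group and y_values[i] - y_values[group[0]] > tolerance:
            flush(group)
            group = []
        group.append(i)
    flush(group)
    return snapped
```

Comparing against the first member of the group, rather than the previous value, prevents a long chain of slightly increasing values from being merged into one line.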

@felixdittrich92
Contributor

Hey @SkaarFacee 👋,

  • The point about the line alignment sounds good to me 👍
  • About the font size: I think we need some logic to get the ratio of a geometry to the page size, and then either find a calculation to compute the font size or map the ratio to some fixed font size values that fit well. WDYT?
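
Editor's note: the mapping option mentioned above could be a simple bucket lookup from the box-height-to-page-height ratio to a fixed font size. The breakpoints and sizes below are invented placeholders that would need tuning on real samples; `font_size_from_ratio` is a hypothetical name.

```python
import bisect

# Hypothetical breakpoints on (box height / page height) and the fixed font
# size used for each bucket; there is one more size than there are breakpoints.
RATIO_BREAKPOINTS = [0.01, 0.02, 0.04]
FONT_SIZES = [8, 12, 18, 24]


def font_size_from_ratio(height_ratio: float) -> int:
    """Map a box-height-to-page-height ratio onto a fixed font size."""
    return FONT_SIZES[bisect.bisect_right(RATIO_BREAKPOINTS, height_ratio)]
```

Compared with a continuous formula, a fixed table gives more uniform text across a page at the cost of coarser fitting.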

@SkaarFacee
Contributor

Yes. Do we have some sample images to get started with in this case?

@felixdittrich92
Contributor

Yes, give me a few minutes and I can attach some.

@felixdittrich92
Contributor

Here you go :)
samples.zip

@SkaarFacee
Contributor

Are these samples that need to be improved or a combination of good outputs and bad outputs ?

@felixdittrich92
Contributor

Only some samples for testing :)

4 participants