NotImplementedError: File format not supported #505

Open
kushalmraut opened this issue Aug 7, 2024 · 4 comments
Labels
bug Something isn't working

Comments


kushalmraut commented Aug 7, 2024

For some PDF links I am getting this error: NotImplementedError: File format not supported

<ipython-input-11-0615a449639b> in <cell line: 1>()
----> 1 tables = camelot.read_pdf('https://downloads.usda.library.cornell.edu/usda-esmis/files/cj82k728n/2v23wr658/v405t658m/wwcb2921.pdf', pages='1', flavor='lattice')

2 frames
/usr/local/lib/python3.10/dist-packages/camelot/utils.py in download_url(url)
     87         content_type = obj.info().get_content_type()
     88         if content_type != "application/pdf":
---> 89             raise NotImplementedError("File format not supported")
     90         f.write(obj.read())
     91     filepath = os.path.join(os.path.dirname(f.name), filename)

NotImplementedError: File format not supported
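
For anyone debugging this: the traceback shows `download_url` raising whenever the response's content type is not exactly `application/pdf`. Below is a minimal sketch, using only the standard library, for checking what a server reports. The helper names (`parse_content_type`, `would_pass_check`, `fetch_content_type`) and the User-Agent header are my own assumptions and are not part of camelot.

```python
from email.message import Message
from urllib.request import Request, urlopen


def parse_content_type(header_value: str) -> str:
    """Return the bare, lowercased MIME type from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type()


def would_pass_check(header_value: str) -> bool:
    """Apply the same comparison that appears at utils.py line 88 in the traceback."""
    return parse_content_type(header_value) == "application/pdf"


def fetch_content_type(url: str, timeout: float = 30.0) -> str:
    """Return the raw Content-Type header the server sends for `url`."""
    # Assumption: a browser-like User-Agent; adjust or drop it as needed.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")
```

Calling `fetch_content_type(url)` on a failing link and passing the result to `would_pass_check` should show whether the server's header is what trips the check.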

Steps to reproduce the bug
Run the code below to reproduce the error.

tables = camelot.read_pdf('https://downloads.usda.library.cornell.edu/usda-esmis/files/cj82k728n/2v23wr658/v405t658m/wwcb2921.pdf', pages='1', flavor='lattice')

Expected behavior

A list of tables was expected.

PDF

https://downloads.usda.library.cornell.edu/usda-esmis/files/cj82k728n/2v23wr658/v405t658m/wwcb2921.pdf
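
A possible workaround, sketched under assumptions: download the file yourself, confirm the bytes look like a PDF, and pass the local path to `camelot.read_pdf`, so the URL download step shown in the traceback is not involved. The helper names (`looks_like_pdf`, `save_pdf_bytes`, `read_tables_from_url`), the `downloaded.pdf` file name, and the User-Agent header are hypothetical; only `camelot.read_pdf` with `pages` and `flavor` comes from the report above.

```python
import tempfile
from pathlib import Path
from urllib.request import Request, urlopen


def looks_like_pdf(data: bytes) -> bool:
    """PDF files normally begin with the marker %PDF-."""
    return data.startswith(b"%PDF-")


def save_pdf_bytes(data: bytes, directory: str) -> Path:
    """Write `data` to <directory>/downloaded.pdf after a basic sanity check."""
    if not looks_like_pdf(data):
        raise ValueError("Downloaded content does not look like a PDF")
    path = Path(directory) / "downloaded.pdf"
    path.write_bytes(data)
    return path


def read_tables_from_url(url: str, pages: str = "1", flavor: str = "lattice"):
    """Download `url` manually, then hand the local file to camelot."""
    import camelot  # imported lazily so the helpers above work without it

    # Assumption: a browser-like User-Agent; adjust or drop it as needed.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        data = response.read()
    with tempfile.TemporaryDirectory() as tmp:
        pdf_path = save_pdf_bytes(data, tmp)
        return camelot.read_pdf(str(pdf_path), pages=pages, flavor=flavor)
```

If the server returns an HTML error page instead of the PDF, `save_pdf_bytes` raises a `ValueError`, which at least separates a server-side problem from a parsing problem.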


Environment

Linux-6.1.85+-x86_64-with-glibc2.35
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]
NumPy 1.26.4
OpenCV 4.10.0
Camelot 0.8.2

Also tried with:
Linux-6.1.85+-x86_64-with-glibc2.35
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]
NumPy 1.26.4
OpenCV 4.10.0
Camelot 0.9.0

and with:
Linux-6.1.85+-x86_64-with-glibc2.35
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]
NumPy 1.26.4
OpenCV 4.10.0
Camelot 0.11.0

@kushalmraut kushalmraut added the bug Something isn't working label Aug 7, 2024

bosd commented Aug 7, 2024

Hey!

As noted in #343, we are trying to build a maintained fork at pypdf_table_extraction.

Can you check whether the issue still exists with the latest code over there?
If so, please open an issue there.


jatinchhabriya commented Aug 20, 2024

@MartinThoma @vinayak-mehta @bosd I am facing the same error as Kushal.
Expected output: a list of tables.
Actual output since this week: "Attribute Error: File Format not supported".
Could you please let me know if a fix has been deployed on the forked branch? This was working a week ago, and my particular use case requires the lattice boundary feature provided exclusively in camelot-py[cv].


bosd commented Aug 20, 2024

Could you please let me know if a fix has been deployed on the forked branch,

I assume the fork is OK; the tests are passing there.

Please test your use case with a fresh pip install of pypdf_table_extraction.

If that doesn't work, please install from source from the main branch.

If you still encounter an error, please open an issue on the new repo.
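
When testing the fork, it helps to confirm which top-level module the installed package actually exposes before importing it. Here is a small sketch using only the standard library. Both candidate names (`pypdf_table_extraction` and `camelot`) are assumptions about what a given release installs, and the helper names are hypothetical.

```python
from importlib import import_module
from importlib.util import find_spec

# Assumption: one of these top-level names is provided by the installed package.
CANDIDATE_MODULES = ("pypdf_table_extraction", "camelot")


def first_available(names=CANDIDATE_MODULES):
    """Return the first importable top-level module name, or None."""
    for name in names:
        if find_spec(name) is not None:
            return name
    return None


def load_table_extractor(names=CANDIDATE_MODULES):
    """Import and return whichever candidate module is installed."""
    name = first_available(names)
    if name is None:
        raise ImportError(f"None of these modules are installed: {', '.join(names)}")
    return import_module(name)
```

Printing `first_available()` after a fresh install shows which name to import. If it returns `None`, the install itself did not succeed in the current environment.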

@jatinchhabriya

@MartinThoma @bosd @vinayak-mehta I tried installing the main branch of the fork as you suggested. Could you please add an example of how camelot should be imported after installing pypdf-table-extraction from the GitHub main branch? I have also opened an issue on the fork, py-pdf#63; please tag the active maintainers.
