Corrupted File in GeneOntology #95

Open
AJB117 opened this issue Jul 18, 2024 · 1 comment


AJB117 commented Jul 18, 2024

Hi, I downloaded the GeneOntology dataset from the provided Zenodo link, but I came across this error during model evaluation:

PytorchStreamReader failed reading zip archive: invalid header or archive is corrupted

After some digging around, it looks like the file 1jhw_A.pt is causing this. I verified this with a simple torch.load in the unzipped GeneOntology directory. I'm currently getting around this by adding "1JHW-A" to https://github.com/a-r-j/ProteinWorkshop/blob/main/proteinworkshop/datasets/go.py?plain=1#L288. Is this protein meant to be dropped? Thanks!
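For anyone wanting to check the rest of the directory the same way, here is a minimal sketch of a pre-screen that needs only the standard library. It assumes the .pt files were written in the zip-based format that torch.save has used by default since PyTorch 1.6, which is what the "failed reading zip archive" message suggests. The function name `find_unreadable_pt_files` is hypothetical and not part of ProteinWorkshop. It only checks that each archive opens, so a torch.load on each file remains the definitive test.

```python
import zipfile
from pathlib import Path


def find_unreadable_pt_files(directory):
    """Return the names of .pt files in `directory` that do not open as zip archives.

    Assumption: the files were saved with torch.save in its default
    zip-based format. This checks the archive structure only, not the
    tensors stored inside.
    """
    unreadable = []
    for path in sorted(Path(directory).glob("*.pt")):
        try:
            with zipfile.ZipFile(path) as archive:
                # Listing the members forces the central directory to be parsed.
                archive.namelist()
        except (zipfile.BadZipFile, OSError):
            unreadable.append(path.name)
    return unreadable
```

Calling `find_unreadable_pt_files(".")` from inside the unzipped GeneOntology directory would list any file whose archive header cannot be read.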

AJB117 changed the title from "GeneOntology - Unreadable Protein" to "Corrupted File in GeneOntology" on Jul 18, 2024
a-r-j (Owner) commented Jul 18, 2024

Hi @AJB117, thanks for flagging this. We'll try to update the Zenodo record. I don't believe it should be dropped, no. Excluding it is probably fine, or you can go ahead and rebuild the dataset from source (i.e. delete everything in processed/).
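The "delete everything in processed/" step could be sketched as follows. This is only an illustration: the helper name `clear_processed` is hypothetical, and it assumes the processed/ folder sits directly under the dataset root you pass in, which may not match your local layout. It empties the folder but leaves the folder itself in place, on the assumption that the dataset is regenerated the next time it is loaded.

```python
import shutil
from pathlib import Path


def clear_processed(dataset_root):
    """Delete everything inside <dataset_root>/processed and return the removed names.

    Assumption: processed/ lives directly under `dataset_root`.
    The processed/ directory itself is kept.
    """
    processed = Path(dataset_root) / "processed"
    removed = []
    if not processed.is_dir():
        return removed
    for entry in sorted(processed.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subdirectory and its contents
        else:
            entry.unlink()  # remove a regular file or symlink
        removed.append(entry.name)
    return removed
```

It is worth double-checking the path before running this, since the deletion cannot be undone.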
