Fails to correctly decode a valid ZIP64 archive, no error is returned #251

Open
grim7reaper opened this issue Oct 19, 2024 · 0 comments
Labels
bug Something isn't working

Describe the bug

The library fails to decode a ZIP64 file in its entirety.
No error is returned, but only the first 65,535 files are read from the archive.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://github.com/whosonfirst-data/whosonfirst-data-admin-gb
  2. Download a copy of the repository as a ZIP
  3. Execute the code below
use std::fs::File;

fn main() {
    // Open the archive downloaded in step 2.
    let file = File::open("whosonfirst-data-admin-gb-master.zip").unwrap();
    let zip = zip::ZipArchive::new(file).unwrap();

    // Print the number of entries the library found.
    dbg!(zip.len());
}

This prints 65535 instead of 148661.
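
For context, 65535 is exactly `u16::MAX` (0xFFFF). The classic (non-ZIP64) end-of-central-directory record stores its entry count in a 16-bit field, and the ZIP specification uses 0xFFFF there to signal that the real count lives in the ZIP64 record. Until this is fixed, a caller could treat that exact count as suspicious. The sketch below is a caller-side heuristic only; `looks_truncated` is a hypothetical helper name and is not part of the `zip` crate.

```rust
/// Hypothetical helper (not part of the `zip` crate): returns true when an
/// entry count equals the 16-bit sentinel value that marks a ZIP64 archive.
fn looks_truncated(entry_count: usize) -> bool {
    // 0xFFFF in the 16-bit "total entries" field means "see the ZIP64 record",
    // so a reported count of exactly 65535 deserves a second look.
    entry_count == usize::from(u16::MAX)
}
```

With the reproduction above, `looks_truncated(zip.len())` would be true for the reported 65535. An archive that genuinely contains exactly 65535 entries would also trigger it, so this is a hint rather than proof.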

Expected behavior

  • At a minimum, return an error instead of silently truncating or ignoring roughly half of the archive.
  • Ideally, read the archive properly, as it seems to be a valid one.

Additional context

The problem seems to come from this check in read.rs:

                } else if footer64.version_needed_to_extract > footer64.version_made_by {
                    Err(InvalidArchive(
                        "ZIP64 footer indicates a new version is needed to extract this archive than the \
                         version that wrote it",
                    ))

In this archive, version_needed_to_extract appears to be 45 and version_made_by appears to be 0.
But I don't really understand why this should be an error that prevents correct decoding.

Sure, it makes sense to check version_needed_to_extract ("The minimum supported ZIP specification version needed to extract the file"), to make sure the library is able to properly decode this file.

But why should we care about the "ZIP specification version supported by the software used to encode the file." (version_made_by)?
Yes, it's odd that software claiming to support version 0 generates a file that requires version 4.5 to decode, and the file is probably buggy in that sense. To me, though, this is a warning-level issue rather than something that should completely prevent decoding the file.
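
To make the suggestion concrete, here is a minimal sketch of a more lenient check. It rejects the archive only when version_needed_to_extract exceeds what the reader supports, and downgrades the version_made_by mismatch to a warning. The names `check_versions`, `VersionCheck`, and `SUPPORTED_SPEC_VERSION` are hypothetical and not taken from the `zip` crate. The supported version of 45 and the masking of both fields to their low byte are assumptions made for illustration.

```rust
/// Assumption: the highest ZIP specification version this reader supports (4.5).
const SUPPORTED_SPEC_VERSION: u8 = 45;

/// Hypothetical result type, not part of the `zip` crate.
#[derive(Debug, PartialEq)]
enum VersionCheck {
    Ok,
    /// The archive is readable, but the declared versions look inconsistent.
    Warning(String),
    /// The archive needs a spec version this reader does not implement.
    Unsupported(u8),
}

fn check_versions(version_made_by: u16, version_needed_to_extract: u16) -> VersionCheck {
    // The low byte holds the spec version times ten (45 means 4.5).
    // Assumption: the high byte is ignored for both fields.
    let made_by = (version_made_by & 0xFF) as u8;
    let needed = (version_needed_to_extract & 0xFF) as u8;

    // Hard failure only when the reader cannot handle the required version.
    if needed > SUPPORTED_SPEC_VERSION {
        return VersionCheck::Unsupported(needed);
    }
    // A writer claiming an older version than it requires is odd, not fatal.
    if needed > made_by {
        return VersionCheck::Warning(format!(
            "version needed to extract ({}) is newer than version made by ({})",
            needed, made_by
        ));
    }
    VersionCheck::Ok
}
```

Under these assumptions, the values seen in this archive, `check_versions(0, 45)`, would produce a `Warning` rather than a hard failure.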

And even if this should be a blocking error, it seems to be swallowed somewhere and never surfaces to the calling code, so users have no way to know that reading failed.
Instead, the code seems to fall back on the ZIP32 code path, hence returning only the first 65K entries.
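
As a rough illustration of the behavior I would expect, the sketch below propagates a ZIP64 parsing error whenever the ZIP32 entry count is the 0xFFFF sentinel, since that count cannot be trusted on its own. It does not mirror the crate's internals: `ReadError`, `entry_count`, and the way the two parsing results are passed in are all hypothetical stand-ins.

```rust
/// Hypothetical error type standing in for the library's own error.
#[derive(Debug, PartialEq)]
enum ReadError {
    InvalidArchive(&'static str),
}

/// Decide which entry count to report.
/// `zip32_count` is the 16-bit count from the classic footer.
/// `zip64` is `None` when no ZIP64 footer was found, otherwise the result of
/// parsing it.
fn entry_count(
    zip32_count: u16,
    zip64: Option<Result<u64, ReadError>>,
) -> Result<u64, ReadError> {
    match zip64 {
        // A valid ZIP64 footer is authoritative.
        Some(Ok(count)) => Ok(count),
        // The ZIP32 count is only a sentinel here, so falling back would
        // silently truncate the archive: surface the error instead.
        Some(Err(err)) if zip32_count == u16::MAX => Err(err),
        // Otherwise the ZIP32 count is usable on its own.
        Some(Err(_)) | None => Ok(u64::from(zip32_count)),
    }
}
```

In this sketch, the archive from this report would yield either the full count of 148661 or an explicit error, but never a silent 65535.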

grim7reaper added the bug label on Oct 19, 2024