zip-cli binary crate #235
This is just a partial review; I'll review most of the code changes tomorrow or Wednesday.
Cargo.toml
[workspace]
members = [
    ".",
    "cli",
Should `fuzz` also be a member?
This is a great idea! Unfortunately, I get this incredibly strange error during linking:
error: linking with `cc` failed: exit status: 1
|
= note: LC_ALL="C" PATH="/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin:/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin/self-contained:/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin:/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin/self-contained:/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin:/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin/self-contained:/home/cosmicexplorer/bazel-bullshit/:/usr/lib/jvm/default-runtime/bin:/home/cosmicexplorer/.pyenv/shims:/home/cosmicexplorer/tools/emacs/rex-install/bin:/home/cosmicexplorer/.local/bin:/home/cosmicexplorer/go/bin:/home/cosmicexplorer/.local/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/lib/rustup/bin:/usr/bin/core_perl:/usr/bin/vendor_perl:/home/cosmicexplorer/.zsh/snippets/bash:/home/cosmicexplorer/go/bin:/home/cosmicexplorer/.cargo/bin" VSLANG="1033" "cc" "-m64" "/tmp/rustcgcg83u/symbols.o" "-Wl,-Bstatic" "-Wl,--whole-archive" "-Wl,/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc-nightly_rt.asan.a" "-Wl,--no-whole-archive" "/home/cosmicexplorer/tools/zip/target/x86_64-unknown-linux-gnu/release/deps/fuzz_read-73bbb0616ed32380.fuzz_read.775828843f50ae8f-cgu.0.rcgu.o" "-Wl,--as-needed" "-L" "/home/cosmicexplorer/tools/zip/target/x86_64-unknown-linux-gnu/release/deps" "-L" "/home/cosmicexplorer/tools/zip/target/release/deps" "-L" "/home/cosmicexplorer/tools/zip/target/x86_64-unknown-linux-gnu/release/build/libfuzzer-sys-394e55403fa44217/out" "-L" 
"/home/cosmicexplorer/tools/zip/target/x86_64-unknown-linux-gnu/release/build/tikv-jemalloc-sys-9cd5b3f80b1e0b3c/out/build/lib" "-L" "/usr/lib" "-L" "/home/cosmicexplorer/tools/zip/target/x86_64-unknown-linux-gnu/release/build/zstd-sys-28dcdb0894e5b147/out" "-L" "/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "/tmp/rustcgcg83u/libzstd_sys-f8986c36e00f9a6f.rlib" "/tmp/rustcgcg83u/libtikv_jemalloc_sys-b8ade72b5ff520e1.rlib" "/tmp/rustcgcg83u/liblibfuzzer_sys-64607a53ace2a9b6.rlib" "/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-4d7d16bbf0636a40.rlib" "-Wl,-Bdynamic" "-lbz2" "-lstdc++" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-B/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin/gcc-ld" "-B/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin/gcc-ld" "-B/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/bin/gcc-ld" "-fuse-ld=lld" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-L" "/home/cosmicexplorer/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/self-contained" "-o" "/home/cosmicexplorer/tools/zip/target/x86_64-unknown-linux-gnu/release/deps/fuzz_read-73bbb0616ed32380" "-pie" "-Wl,-z,relro,-z,now" "-Wl,--strip-all" "-nodefaultlibs"
# ... (snip)
rust-lld: error: undefined symbol: __sancov_gen_.51
>>> referenced by fuzz_read.775828843f50ae8f-cgu.0
>>> /home/cosmicexplorer/tools/zip/target/x86_64-unknown-linux-gnu/release/deps/fuzz_read-73bbb0616ed32380.fuzz_read.775828843f50ae8f-cgu.0.rcgu.o:(asan.module_dtor.10819)
rust-lld: error: undefined symbol: __sancov_gen_.1086
>>> referenced by fuzz_read.775828843f50ae8f-cgu.0
>>> /home/cosmicexplorer/tools/zip/target/x86_64-unknown-linux-gnu/release/deps/fuzz_read-73bbb0616ed32380.fuzz_read.775828843f50ae8f-cgu.0.rcgu.o:(asan.module_dtor.10892)
rust-lld: error: too many errors emitted, stopping now (use --error-limit=0 to see all errors)
collect2: error: ld returned 1 exit status
error: could not compile `zip-fuzz` (bin "fuzz_read") due to 1 previous error
Error: failed to build fuzz script: ASAN_OPTIONS="detect_odr_violation=0" RUSTFLAGS="-Cpasses=sancov-module -Cllvm-args=-sanitizer-coverage-level=4 -Cllvm-args=-sanitizer-coverage-inline-8bit-counters -Cllvm-args=-sanitizer-coverage-pc-table -Cllvm-args=-sanitizer-coverage-trace-compares --cfg fuzzing -Clink-dead-code -Zsanitizer=address -Cllvm-args=-sanitizer-coverage-stack-depth -Cdebug-assertions -C codegen-units=1" "cargo" "build" "--manifest-path" "/home/cosmicexplorer/tools/zip/fuzz/Cargo.toml" "--target" "x86_64-unknown-linux-gnu" "--release" "--config" "profile.release.debug=true" "--all-features" "--bin" "fuzz_read"
This is from:
- putting `"fuzz"` in `Cargo.toml`'s `workspace.members` array.
- commenting out the `workspace.members` array in `fuzz/Cargo.toml`.
This seems to happen pretty reliably, and I honestly have no clue how to debug it. It would very much be nice to be able to reuse the `arbitrary` dependency declaration. If I dug more into how `cargo fuzz` is executed and configured I could probably figure this out.
Ahhhh, this is because of our `strip` and other stuff in `profile.release`! I think I can turn that off for fuzz targets.
Ugh, ok. It looks like `cargo fuzz` is strictly only able to use `release` (by default) or `dev` (with a command-line flag). Custom Profiles might be appropriate. I don't really know how people distribute rust binaries, though--do they just run `cargo install zip-cli` and it builds from source? What profile would that use? I'll figure that out.
So it looks like we could e.g. tell users "run `cargo install --profile release-lto zip-cli`" for a smaller binary, maybe (see https://doc.rust-lang.org/cargo/commands/cargo-install.html#compilation-options). I suspect there are best practices already established for this sort of thing that we should look to, since we're surely not the only people who have wanted to put a fuzz crate and a binary crate in the same workspace. I will now take a look at your other comment regarding splitting up the binary.
Sorry, to clarify:
- `cargo fuzz` is only able to use the `dev` or `release` profiles (`release` by default)
  - I assume we want to use `release` because fuzzing is slow, so we don't really have an option to change anything there -- we're just stuck with fuzzing using `profile.release`.
- turning on `lto = false` in `profile.release` breaks `cargo fuzz` (some linker error about ASan, linked above)
  - `lto = true` converts our current 1.6MB binary to 1.3MB, which is good but not as good as `strip`, which does work just fine.
- Custom profiles can inherit from `profile.release` and additionally set e.g. `lto = true`, without interfering with other profiles.
  - For local development, we can run `cargo build --profile release-lto`.
- `cargo install` also uses `profile.release` by default, and users generally expect to be able to just run `cargo install zip-cli` without additional flags.
  - However, `cargo install` at least does support custom profiles. So we could tell users "please run `cargo install --profile release-lto` for a smaller binary."
Actually, I suspect that disabling `aes-crypto` and compression algos other than `deflate` will make a bigger difference to the binary's size and memory footprint than any combination of `lto` and `strip`, and still cover the most common use cases.
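A minimal sketch of what the CLI side of such gating could look like. The feature names `deflate`, `bzip2` and `zstd`, the `Method` enum and `parse_method()` are all assumptions made for illustration; the thread does not say how `zip-cli` would name or wire its features.

```rust
/// Compression methods the command line can name (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Stored,
    Deflate,
    Bzip2,
    Zstd,
}

/// Parse a `--compression-method` value, rejecting methods whose support was
/// not compiled in. `cfg!(feature = "...")` evaluates to a compile-time
/// boolean, so with a feature disabled its branch always reports an error.
///
/// The feature names used here are hypothetical.
pub fn parse_method(name: &str) -> Result<Method, String> {
    let (method, enabled) = match name {
        // Storing uncompressed data needs no optional dependency.
        "stored" => (Method::Stored, true),
        "deflate" => (Method::Deflate, cfg!(feature = "deflate")),
        "bzip2" => (Method::Bzip2, cfg!(feature = "bzip2")),
        "zstd" => (Method::Zstd, cfg!(feature = "zstd")),
        other => return Err(format!("unknown compression method {:?}", other)),
    };
    if enabled {
        Ok(method)
    } else {
        Err(format!(
            "compression method {:?} was not compiled into this binary",
            name
        ))
    }
}
```

Note that `cfg!` only selects a branch; the size reduction itself would come from marking the corresponding dependencies as optional so that they are not built or linked at all when the feature is off.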
Force-pushed: 1f9d6ce to 78c659b
Ok, after all those very long replies, the solution is actually quite simple:
Result:
> cd cli
> cargo build --release
> ls target/release/zip-cli
-rwxr-xr-x 2 cosmicexplorer wheel 1.8M Aug 13 09:34 target/release/zip-cli*
> cd clite
> cargo build --release
> ls target/release/zip-clite
-rwxr-xr-x 2 cosmicexplorer wheel 825k Aug 13 09:35 target/release/zip-clite*
So we end up having to publish two separate crates, but (1) this allows us to achieve the precise separation you described in #235 (comment), and (2) this also means users can actually just run
Currently
> cd cli
> cargo build --release --no-default-features --features deflate-zlib,bzip2,lzma,xz
> ls target/release/zip-cli
-rwxr-xr-x 2 cosmicexplorer wheel 1.2M Aug 13 09:40 target/release/zip-cli*
Note that this 1.2M is smaller than the previous 1.8M with default features enabled. I like this solution a lot and I'm really glad we didn't have to delve into anything more complex!
If we really wanted to care about artifact size for Lines 83 to 317 in 78c659b
Especially if our goal here is more about providing an interface to the
Ugh, I actually really like the help text generation mechanism for clap and I don't think binary size is that much of a concern for us at all. I think we definitely have an excuse to go outside of clap if we feel like it but I would prefer to work on other things right now.
Have you looked at the other parsers listed here? https://rust-cli-recommendations.sunshowers.io/cli-parser.html#alternatives-to-clap (I'm not sure what they mean when they say
It also occurs to me that it might be helpful to gate the dependency on
Thanks so much for that discussion of use cases for this binary!! It is a new day and I think it is actually not a bad idea to try parsing args by hand! Will report back ^_^! EDIT: a
Seems pretty clear that clap is unnecessary, and I've removed it (the parsing is also much easier to follow, since clap requires immense circumlocutions to support...an ordered sequence of args). I'd also like to note: I'm currently depending on
> cd cli
> ls target/release/zip-cli
-rwxr-xr-x 2 cosmicexplorer wheel 1.4M Aug 14 14:07 target/release/zip-cli*
> cd clite
> ls target/release/zip-clite
-rwxr-xr-x 2 cosmicexplorer wheel 477k Aug 14 14:07 target/release/zip-clite*
For iterating on this,
Hm, I tried removing
Ok, after doing a lot of
> cargo run -- -v compress -o out.zip -n a -d -n b -s -i c -m 755 -f tmp/file2.txt
writing compressed zip to output file path "out.zip"
default zip entry options: FileOptions { compression_method: Deflated, compression_level: None, last_modified_time: DateTime::default(), permissions: None, large_file: false, encrypt_with: None, extended_options: (), alignment: 1, zopfli_buffer_size: Some(32768) }
setting name of next entry to "a"
writing dir entry
setting name of next entry to "b"
setting symlink flag for next entry
writing immediate symlink entry with name "b" and target "c"
setting file mode 0o755
writing file entry from path "tmp/file2.txt" with name "tmp/file2.txt"
Error:
0: tmp/file2.txt
1: No such file or directory (os error 2)
Location:
/home/cosmicexplorer/tools/zip/cli/src/compress.rs:286
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
I was able to mimic this somewhat without the
Note that invalid cli args are processed without any error crate, and we explicitly call
> cargo run -- -v compress -o out.zip -n a -d -n b -s -s -i c -m 755 -f tmp/file2.txt
writing compressed zip to output file path "out.zip"
default zip entry options: FileOptions { compression_method: Deflated, compression_level: None, last_modified_time: DateTime::from_date_and_time(2024, 8, 14, 20, 2, 54)?, permissions: None, large_file: false, encrypt_with: None, extended_options: (), alignment: 1, zopfli_buffer_size: Some(32768) }
setting name of next entry to "a"
writing dir entry
setting name of next entry to "b"
setting symlink flag for next entry
setting symlink flag for next entry
error: symlink flag provided twice before entry
Usage: target/debug/zip-cli compress [-h|--help] [OUTPUT-FLAG] [ENTRIES]... [--] [PATH]...
This is what the
> cargo run -- compress --help
do a compress
Usage: target/debug/zip-cli compress [-h|--help] [OUTPUT-FLAG] [ENTRIES]... [--] [PATH]...
-h, --help Print help
Output flags:
Where and how to write the generated zip archive.
-o, --output-file <file>
Output zip file path to write.
The output file is currently always truncated if it already exists.
If not provided, output is written to stdout.
--stdout
Allow writing output to stdout even if stdout is a tty.
ENTRIES:
After at most one output flag is provided, the rest of the command line is attributes and
entry data. Attributes modify later entries.
Sticky attributes:
These flags apply to everything that comes after them until reset by another instance of the
same attribute. Sticky attributes continue to apply to positional arguments received after
processing all flags.
-c, --compression-method <method-name>
Which compression technique to use.
Defaults to deflate if not specified.
Possible values:
- stored: uncompressed
- deflate: with deflate (default)
- deflate64: with deflate64
- bzip2: with bzip2
- zstd: with zstd
-l, --compression-level <level>
How much compression to perform, from 0..=24.
The accepted range of values differs for each technique.
-m, --mode <mode>
Unix permissions to apply to the file, in octal (like chmod).
--large-file [true|false]
Whether to enable large file support.
This may take up more space for records, but allows files over 32 bits in length to be
written, up to 64 bit sizes.
Non-sticky attributes:
These flags only apply to the next entry after them, and may not be repeated.
-n, --name <name>
The name to apply to the entry.
-s, --symlink
Make the next entry into a symlink entry.
A symlink entry may be immediate with -i, or it may read the symlink value from the
filesystem with -f.
Entry data:
Each of these flags creates an entry in the output zip archive.
-d, --dir
Create a directory entry.
A name must be provided beforehand with -n.
-i, --imm <immediate>
Write an entry containing this data.
A name must be provided beforehand with -n.
-f, --file <path>
Write an entry with the contents of this file path.
A name may be provided beforehand with -n, otherwise the name will be inferred from
relativizing the given path to the working directory.
-r, --recursive-dir <dir>
Write all the recursive contents of this directory path.
A name may be provided beforehand with -n, which will be used as the prefix for all
recursive contents of this directory. Otherwise, the name will be inferred from
relativizing the given path to the working directory.
Positional entries:
[PATH]...
Write the file or recursive directory contents, relativizing the path.
If the given path points to a file, then a single file entry will be written.
If the given path is a symlink, then a single symlink entry will be written.
If the given path refers to a directory, then the recursive contents will be written.
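The sticky and non-sticky attribute rules described in this help text can be sketched as a small hand-written parser over an ordered argument sequence. This is only an illustration covering `-m`, `-n`, `-s`, `-d`, `-i` and `-f`: the `Entry` and `ParseError` types and `parse_entries()` are hypothetical, and every error message other than "symlink flag provided twice before entry" (taken from the output above) is invented here, as is the choice to reject `-s` before `-d`.

```rust
/// One entry to write to the archive (a simplified, hypothetical model).
#[derive(Debug, PartialEq, Eq)]
pub enum Entry {
    Dir { name: String, mode: Option<u32> },
    Immediate { name: String, data: String, symlink: bool, mode: Option<u32> },
    File { name: Option<String>, path: String, symlink: bool, mode: Option<u32> },
}

#[derive(Debug, PartialEq, Eq)]
pub struct ParseError(pub String);

fn err<T>(msg: &str) -> Result<T, ParseError> {
    Err(ParseError(msg.to_string()))
}

/// Fetch the value that must follow a flag such as `-n` or `-m`.
fn value<I: Iterator<Item = String>>(args: &mut I, flag: &str) -> Result<String, ParseError> {
    args.next()
        .ok_or_else(|| ParseError(format!("missing value for {}", flag)))
}

/// Walk the arguments in order, carrying attribute state between entries.
pub fn parse_entries<I>(args: I) -> Result<Vec<Entry>, ParseError>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    let mut entries = Vec::new();
    let mut mode: Option<u32> = None; // sticky: kept until overridden
    let mut name: Option<String> = None; // non-sticky: consumed by the next entry
    let mut symlink = false; // non-sticky: reset after the next entry

    while let Some(arg) = args.next() {
        match arg.as_str() {
            "-m" | "--mode" => {
                let raw = value(&mut args, "-m")?;
                let parsed = u32::from_str_radix(&raw, 8)
                    .map_err(|e| ParseError(format!("invalid mode {:?}: {}", raw, e)))?;
                mode = Some(parsed);
            }
            "-n" | "--name" => {
                if name.is_some() {
                    return err("name provided twice before entry");
                }
                name = Some(value(&mut args, "-n")?);
            }
            "-s" | "--symlink" => {
                if symlink {
                    return err("symlink flag provided twice before entry");
                }
                symlink = true;
            }
            "-d" | "--dir" => {
                if symlink {
                    return err("symlink flag provided before dir entry");
                }
                match name.take() {
                    Some(name) => entries.push(Entry::Dir { name, mode }),
                    None => return err("no name provided before dir entry"),
                }
            }
            "-i" | "--imm" => {
                let data = value(&mut args, "-i")?;
                match name.take() {
                    Some(name) => entries.push(Entry::Immediate { name, data, symlink, mode }),
                    None => return err("no name provided before immediate entry"),
                }
                symlink = false;
            }
            "-f" | "--file" => {
                let path = value(&mut args, "-f")?;
                entries.push(Entry::File { name: name.take(), path, symlink, mode });
                symlink = false;
            }
            other => return Err(ParseError(format!("unrecognized argument {:?}", other))),
        }
    }
    if name.is_some() || symlink {
        return err("attribute provided without a following entry");
    }
    Ok(entries)
}
```

Fed the entry arguments from the first transcript (`-n a -d -n b -s -i c -m 755 -f tmp/file2.txt`), this produces a dir entry `a`, an immediate symlink entry `b` with target `c`, and a file entry for `tmp/file2.txt` carrying mode `0o755`. Because each flag is handled in sequence against a few pieces of mutable state, the "ordered sequence of args" requirement needs no extra machinery.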
Cool! Can
Yes!!! (In fact it currently doesn't support anything except regular files, directories, and symlinks, although it definitely could). We can keep the existing "sticky" behavior for explicit
    symlink_target: &'t mut Vec<u8>,
) -> Result<Option<&'t mut [u8]>, CommandError> {
    let (kind, size) = {
        /* FIXME: the ZipFile<'a> struct contains a *mutable* reference to the parent archive,
Check notice (Code scanning / devskim): A "TODO" or similar was left in source code, possibly indicating incomplete functionality
#[cfg(unix)]
Self::create_or_overwrite_symlink(&mut *err, target, &full_output_path)?;
#[cfg(not(unix))]
todo!("TODO: cannot create symlink for entry {name} on non-unix yet!");
Check notice (Code scanning / devskim): A "TODO" or similar was left in source code, possibly indicating incomplete functionality
@@ -780,6 +781,8 @@
}
}

/* TODO: consider a ZipWriter which works with just a Write bound to support streaming output? This
Check notice (Code scanning / devskim): A "TODO" or similar was left in source code, possibly indicating incomplete functionality
Force-pushed: 3f23800 to 69670e3
Problem
We often want to generate zip files for use in testing and benchmarking, as in #208 and #233. In #233 I added a python script in `tests/data/` which uses the built-in `zipfile` stdlib module to create zip files. While this works, it also seems a little silly when we've done most of the work already in this crate to create a highly flexible interface to zip files. Separately, it seems like having a CLI to work on would be a good way to learn more about ways we can make this crate work better with filesystems (e.g. in a TODO I note that we should probably modify `.add_symlink_from_path()` to accept an `OsStr` target).

Prior Investigations
I previously created https://github.com/cosmicexplorer/medusa-zip as a sandbox to mess around with high-performance zip merging and splitting. That was partially because the `zip` crate was unmaintained at the time, and partially because I was under the impression that some of the techniques would be difficult to retrofit into a standard zip library. It was also an excuse to play around with pyo3 and learn how to build for many python platforms with cibuildwheels.
Since then, I've found that:
- `async` as in `medusa-zip` tends to heavily slow down programs during purely filesystem-based I/O, as they tend not to wait for very long (this is why linux does not offer very many nonblocking file APIs).
- `async` tends to be harmful to latency and more useful for very high throughput, especially when waiting on network latencies.
- `medusa-zip`, but can be implemented with `pread()` and pipes on `#[cfg(unix)]` targets as in parallel/pipelined extraction #208.

Solution
Create a `zip-cli` crate in the `cli/` subdir. This isn't published anywhere yet, as the main purpose right now is to make it easier to generate test data for benchmarks. This is planned to provide three subcommands:
- `compress`: accept a specification of entries and options and generate a zip file output, like `zip`.
- `info`: write out info about a zip file, like `zipinfo`.
- `extract`: extract individual entries or an entire archive to stdout or the filesystem, like `unzip`.

Result
Currently, only `compress` is implemented, but it seems to work great:

TODO
- `info` and `extract`, and gate them by feature flags (zip-cli binary crate #235 (comment)).
- `WrapErr` trait and `Error` impl that doesn't call into `eyre` or `anyhow`, since we're not making a large server application which may error out in unexpected ways (zip-cli binary crate #235 (comment)).
- `process::exit()` by hand.
- `-f` args and positional file args that don't exist, don't write out different error messages, but make them produce the same case of e.g. a `CompressError` struct, which is converted into a standard stderr output + exit code at the top-level `main()` method.

Longer-term: optimize `compress` via a "merge" operation
`medusa-zip` accepts a JSON interface for specifying inputs to compress, which is less useful for the shell but much more useful for execution e.g. from build tools. Having all the inputs planned out in advance also makes it possible to perform extremely simple and powerful optimizations by creating "sub-zips" in parallel and merging them without decompressing. This is something I plan to propose after I spend more time on #208.
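For the `WrapErr` and `CompressError` items in the TODO list, a dependency-free version could look roughly like the sketch below. It uses only the standard library; the field layout, the `wrap_err()` signature, and the `read_input()` and `report()` helpers are assumptions made for illustration, not the PR's actual code.

```rust
use std::error::Error;
use std::fmt;
use std::fs;

/// Hypothetical error type: a context message plus an optional underlying cause.
#[derive(Debug)]
pub struct CompressError {
    context: String,
    source: Option<Box<dyn Error + 'static>>,
}

impl fmt::Display for CompressError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.context)?;
        if let Some(ref cause) = self.source {
            write!(f, ": {}", cause)?;
        }
        Ok(())
    }
}

impl Error for CompressError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

/// Minimal stand-in for the `WrapErr` extension traits of `eyre`/`anyhow`.
pub trait WrapErr<T> {
    fn wrap_err(self, context: &str) -> Result<T, CompressError>;
}

impl<T, E: Error + 'static> WrapErr<T> for Result<T, E> {
    fn wrap_err(self, context: &str) -> Result<T, CompressError> {
        self.map_err(|e| CompressError {
            context: context.to_string(),
            source: Some(Box::new(e)),
        })
    }
}

/// Example call site: a missing `-f` path and a missing positional path
/// would both go through here and yield the same error type.
pub fn read_input(path: &str) -> Result<Vec<u8>, CompressError> {
    fs::read(path).wrap_err(path)
}

/// Turn the final result into stderr output plus an exit code.
pub fn report(result: Result<(), CompressError>) -> i32 {
    match result {
        Ok(()) => 0,
        Err(e) => {
            eprintln!("error: {}", e);
            1
        }
    }
}
```

A top-level `main()` would then reduce to something like `std::process::exit(report(run()))`, where `run()` is whatever function drives the subcommand, so that every failure funnels through one place that prints to stderr and picks the exit code.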