Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ptw/do not skeleton patches #70

Closed

Conversation

prestontw
Copy link
Contributor

@prestontw prestontw commented Jun 7, 2021

I've verified that this addresses #64 locally.
I've also added a test!

My main repository has several patched dependencies. cargo chef would turn these dependencies'
lib.rs files into empty strings.
This would cause compilation issues since code I had patched in was suddenly missing.

Edit: I've now taken the below alternative approach. To avoid erasing the bodies of these dependencies, I've added an ignore_regexes parameter
for Prepare. I would specify ... --ignore-regexes "rust-patches/*". It's important that I use neither "./rust-patches/*" since that would be treated as a regex nor rust-patches/* since that would be expanded by the shell.

An alternative approach would be to get this information automatically from sections of the Cargo.toml
that contain patch. @LukeMathWalker what do you think of manually specifying directories to ignore
vs inferring them from Cargo.toml?

It's important that people don't use `./rust-patches`
or `rust-patches/*`, since the first won't match
the regex and the second the shell will expand.
Easier to use than manually specifying,
but doesn't handle non-top-level cargo.toml patches.
@prestontw
Copy link
Contributor Author

An alternative approach would be to get this information automatically from sections of the Cargo.toml
that contain patch. @LukeMathWalker what do you think of manually specifying directories to ignore
vs inferring them from Cargo.toml?

I implemented this approach in d570ac4. I personally like it better since the patch information is already encoded in the Cargo.toml file and it avoids the two "rust-patches/*" issues above. A slight issue is that it only handles top-level [patch.*] entries, but maybe we can wait to handle that until someone actually needs it.

Let me know what you think! I'm happy to either revert it or try to add support for non-top-level patch entries.

@prestontw
Copy link
Contributor Author

@LukeMathWalker what do you think of the test I added? Do you think it's enough of a minimal example?

@LukeMathWalker
Copy link
Owner

I have seen the PR but I haven't had time to review it yet @prestontw - I'll provide feedback here when I do :)

Should we rely on conventions of users putting
all of their patches in one location? As long as they aren't
intermixed with their source, it should be pretty easy
to guess.
@prestontw
Copy link
Contributor Author

Here are what I see as the pro's and con's of each of the different approaches:

Approach Pro's Con's
Manually specifying via flags The user has direct control over the paths that we do not "skelefy". This might be useful for other use cases outside of patches. Duplicate information between the Cargo.toml files and cargo chef invocations. This ties our Dockerfiles to changes in Cargo.toml.
Exact path from Cargo.toml Automatically updates which directories to leave out as people change the patches. Forces users to list all directories in a dependency as a patch, even if they don't need it for the project themselves. For example, someone might need only a couple of Rocket's crates, but they would have to add all of Rocket's sub directories as patches, even though they don't need them.
Parent of path in Cargo.toml Avoids manual edits from the first row and specifying all directories from the second row. Is a heuristic and might not work for everyone's project layout.

Maybe a mix of the third and first row would be a good compromise? We try to infer the paths, but we give someone the option to override that list if they want.

@LukeMathWalker
Copy link
Owner

The first row is not an option I am afraid - I want to maintain a minimal CLI interface for cargo-chef and I don't think this usecase is big enough to deserve bleeding into the user interface.

Can you go into more details on

Forces users to list all directories in a dependency as a patch, even if they don't need it for the project themselves. For example, someone might need only a couple of Rocket's crates, but they would have to add all of Rocket's sub directories as patches, even though they don't need them.

I am not sure I understand what you mean here. On top of the test, could you provide an actual minimal reproducible example as a GitHub repo? It would make it easier to understand the current failure mode as well (you mentioned compilation errors, but you didn't provide details about them).

@prestontw
Copy link
Contributor Author

I am not sure I understand what you mean here. On top of the test, could you provide an actual minimal reproducible example as a GitHub repo? It would make it easier to understand the current failure mode as well (you mentioned compilation errors, but you didn't provide details about them).

Sorry! I will try to clarify. The failure occurs when someone tries to reference something that we define in a lib.rs. cargo-chef will turn the existing file full of definitions into a minimal file. This will lead to other code that depends on the crate to fail with errors like:

failed to resolve: could not find `error` in `rocket`
...
pub use rocket::error::{
                ^^^^^ could not find `error` in `rocket`

For

Forces users to list all directories in a dependency as a patch, even if they don't need it for the project themselves

since we minify all main.rs and lib.rs files except those listed as patches, directories that the current project doesn't use as patches but are still necessary for the entire crate to compile will fail.

I'll work on creating a minimal reproducible example! That should be clearer than my explanation.

@prestontw
Copy link
Contributor Author

(you mentioned compilation errors, but you didn't provide details about them).

Hi @LukeMathWalker, sorry for the delay. It's been hard to find time to try to minimize the problem, so in the meantime, here are some compilation issues I'm running into:

#39 36.70 error[E0433]: failed to resolve: could not find `signature` in `ring`
#39 36.70    --> /usr/local/cargo/registry/src/-f8edbd7a07552c5f/sct-0.5.0/src/lib.rs:165:40
#39 36.70     |
#39 36.70 165 |             RSA_PKCS1_SHA384 => &ring::signature::RSA_PKCS1_2048_8192_SHA384,
#39 36.70     |                                        ^^^^^^^^^ could not find `signature` in `ring`
#39 36.70 
#39 36.72 error[E0433]: failed to resolve: could not find `signature` in `ring`
#39 36.72    --> /usr/local/cargo/registry/src/-f8edbd7a07552c5f/sct-0.5.0/src/lib.rs:183:15
#39 36.72     |
#39 36.72 183 |         ring::signature::verify(alg, key, data, sig)
#39 36.72     |               ^^^^^^^^^ could not find `signature` in `ring`
#39 36.72 
#39 36.79 error: aborting due to 76 previous errors

We patched ring at version 0.14.6. cargo-chef skelefies ring's lib.rs file, so it loses its pub mod signature export. Ideally, since this ring functions as a dependency rather than a changeable source file, we would not skelefy it.

@LukeMathWalker
Copy link
Owner

LukeMathWalker commented Aug 2, 2021

Now I got it 😁
The second option you put forward seems to be the most appropriate. I still don't fully understand the con you mentioned though:

Forces users to list all directories in a dependency as a patch, even if they don't need it for the project themselves. For example, someone might need only a couple of Rocket's crates, but they would have to add all of Rocket's sub directories as patches, even though they don't need them.

Can you elaborate? Once again, an example would be ideal.

@prestontw
Copy link
Contributor Author

I still don't fully understand the con you mentioned though:

Forces users to list all directories in a dependency as a patch, even if they don't need it for the project themselves. For example, someone might need only a couple of Rocket's crates, but they would have to add all of Rocket's sub directories as patches, even though they don't need them.

Can you elaborate?

I'll describe this with another example first, then create a minimal repo (just in case it's clear enough that a minimal repo isn't necessary).

We patch Pear, I'm not sure what version since its Cargo.toml is especially bare. Our dependency is on the Pear/lib member. However, a file in that member has a dependency on another member, specifically, codegen for its switch proc macro. Then when we skelefy codegen's lib.rs, we obliterate this switch declaration:

#34 52.67 error[E0432]: unresolved import `switch`
#34 52.67  --> rust-patches/Pear/lib/src/parsers.rs:1:49
#34 52.67   |
#34 52.67 1 | use {Result, Input, Length, Expected, ParseErr, switch};
#34 52.67   |                                                 ^^^^^^ no `switch` in the root
#34 52.67 
#34 52.67 error: cannot determine resolution for the macro `switch`
#34 52.67    --> rust-patches/Pear/lib/src/parsers.rs:296:9
#34 52.67     |
#34 52.67 296 |         switch! { [collection; input]
#34 52.67     |         ^^^^^^
#34 52.67     |
#34 52.67     = note: import resolution is stuck, try simplifying macro imports
#34 52.67 
#34 53.10 error: aborting due to 2 previous errors

To try to rephrase

Forces users to list all directories in a dependency as a patch, even if they don't need it for the project themselves.

I, the user, now have to discover the internal, implicit dependencies between the members of the entire Pear crate, even though I only depend on a specific member, through compiler error messages. Another way of phrasing this: as the user of a dependency, I now have to list the transitive dependencies within a rust-patch in my Cargo.toml even though I don't use them.

@LukeMathWalker
Copy link
Owner

Then when we skelefy codegen's lib.rs, we obliterate this switch declaration:

I think this is the part that confuses me - why is codegen in your local filesystem if you are just patching Pear/lib?
I'd expect Pear/lib to pull crates.io version of codegen.

@LukeMathWalker
Copy link
Owner

Or are we working under the assumption that the patched version of Pear/lib brings the whole workspace/large parts of it due to path dependencies?

@prestontw
Copy link
Contributor Author

Or are we working under the assumption that the patched version of Pear/lib brings the whole workspace/large parts of it due to path dependencies?

This is correct. At least for the example, Pear/lib depends on codegen locally: https://github.com/SergioBenitez/Pear/blob/master/lib/Cargo.toml#L11

(I realized that one of my earlier points isn't as painful since we can look in Cargo.tomls to add rust-patches rather than adding based strictly on compiler errors, but it still leaks internal dependencies to users.)

@prestontw
Copy link
Contributor Author

prestontw commented Aug 2, 2021

This is correct. At least for the example, Pear/lib depends on codegen locally: https://github.com/SergioBenitez/Pear/blob/master/lib/Cargo.toml#L11

I've been looking into recreating the issue in another repo, and it looks like we "patched" the dependencies by copying their entire git repos instead of de-taring the vendored tarballs (from using cargo-local-registry). So, this seems pretty unique. It's definitely possible for people to copy a git repo instead of manipulating the existing vendored tarball, so it might pop up in the future? I'm not sure if it's enough of a use case to necessitate this PR.

Separately, I'm going to look if a similar issue pops up with using cargo vendor instead of cargo-local-registry. If it does, then that might be a reason for including this PR.

@prestontw
Copy link
Contributor Author

prestontw commented Aug 3, 2021

In trying to recreate the issue, I've come across a separate issue using cargo chef with dependencies vendored with cargo vendor. I've made a minimal repo at https://github.com/prestontw/cargo-chef-vendored-repo (edit: issue)

The second option you put forward seems to be the most appropriate.

After investigating, this seems the most appropriate to me too---I don't think it's worth trying to approximate the patch directory based off of a weird historical vendor. If it makes sense to you, I'll revert this PR to a point where it was using the second option, then potentially raise another PR to address the issue in https://github.com/prestontw/cargo-chef-vendored-repo. How does that sound?

@LukeMathWalker
Copy link
Owner

Sounds good to me!
I wonder if you can determine the local path dependencies using Cargo.lock, therefore fixing the issue of transitive "patched" dependencies in one go.

@prestontw
Copy link
Contributor Author

I wonder if you can determine the local path dependencies using Cargo.lock, therefore fixing the issue of transitive "patched" dependencies in one go.

How do you imagine this working? I am checking Cargo.lock, and the only difference I can tell between patched dependencies and vendored dependencies is patched dependencies don't have a source specified.

@prestontw
Copy link
Contributor Author

I wonder if you can determine the local path dependencies using Cargo.lock, therefore fixing the issue of transitive "patched" dependencies in one go.

We could try to determine local path dependencies by using patched crates' Cargo.toml's. We would examine their dependencies for explicit paths and add those crates to the list of crates we shouldn't skelefy? I'm also wondering if I should just list out these transitive dependencies as patches (at least in my case).

@LukeMathWalker
Copy link
Owner

You are right unfortunately - the lockfile is not very helpful in this case.
Traversing the manifests seems a good alternative option.

@prestontw
Copy link
Contributor Author

prestontw commented Aug 5, 2021

Traversing the manifests seems a good alternative option.

Do you think that the complexity of traversing manifests is worth it? The alternative is adding more directories to the patches section of a Cargo.toml, which is a little strange to encounter. But I'm not sure if this is a common enough issue to add the complexity. I can also try doing this traversal and seeing if it is actually complex---what do you think?

@LukeMathWalker
Copy link
Owner

Traversing the manifests seems a good alternative option.

Do you think that the complexity of traversing manifests is worth it? The alternative is adding more directories to the patches section of a Cargo.toml, which is a little strange to encounter. But I'm not sure if this is a common enough issue to add the complexity. I can also try doing this traversal and seeing if it is actually complex---what do you think?

I think that tools like cargo-chef, if they want to be adopted and trusted, should strive to keep the simplest interface with the minimal possible number of downsides.
So I definitely consider it worth it.

Another possible avenue: cargo +nightly build --unit-graph -Z unstable-options returns the source path for each dependency in the tree, although it requires nightly.

@alsuren
Copy link
Contributor

alsuren commented Mar 3, 2022

Another possible avenue: cargo +nightly build --unit-graph -Z unstable-options returns the source path for each dependency in the tree, although it requires nightly.

It is possible to do the same with the output of cargo metadata on stable. I hacked something up the other day to help cargo watch find relative dependencies. Basically traverse the dependency graph, and skip anything that is a remote dependency by filtering out CARGO_HOME.

https://github.com/watchexec/cargo-watch/pull/192/files#diff-2b01ecd6b65bb66a3751d23fd19e3fdb6f2764fc466b7537438d04719f4257a3R44

This take a couple of seconds, but it works, and handles patches and relative dependencies without having to parse Cargo.toml or .cargo/config etc. The risk is that the cargo metadata command might hunt for cargo configs that are outside your docker context root, so your recipe JSON might change based on what someone has in their ~/.cargo/config if you're really unlucky? Maybe that's not a problem for us?

@prestontw I haven't looked at your code yet (I'm writing this on my phone) so I might be missing the point entirely. Do you think that this approach might solve the problem for you?

@prestontw
Copy link
Contributor Author

Do you think that this approach might solve the problem for you?

Hmm, interesting! So in my particular case, rather than filtering based on whether the package is in CARGO_HOME, I would filter based on if they are in the patch directory? I might not be understanding fully, but I'm not sure if we can rely on there being a "patch directory"---this sounds like a way to implement approach 3 in #70 (comment). Maybe this gives us more fine-grained directory analysis, so rather than trying to infer whether they are in the patch directory, we can tell that we are a sibling of something that is patched?

Looking at the manifests' and pulling more directories in based on explicit paths sounds the most robust to me, but I'm curious what other people think. @alsuren am I understanding what you're suggesting?

@LukeMathWalker
Copy link
Owner

Closing as stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants