pip prefers old sdists that "obviously" can't work over recent wheels #13037
Comments
@cburroughs did you notice that the old numpy is encountered while collecting build dependencies? I think it's not at all obvious that a build dependency has any bearing on install dependencies in general.
The problem here is that pip goes back to a scipy old enough that it has to build scipy from source. In an ideal world, pip would not backtrack that far on scipy, and build dependencies wouldn't be a consideration. I will take a look at this example when I next get a chance; I might already have an open PR that fixes it. This MRE is really helpful, thanks. In general this problem cannot be completely solved, since backtracking is a hard problem, but it can be improved. You may want to consider using the flag
(I was also debugging this on the Pantsbuild Slack, so I'm adding some more.) The thing I don't understand here is why adding
For example, this is one snippet from the log:
This reads like we picked thinc 8.3 (incompatible!) and then started chugging through all possible scipys, eventually hitting a crash. What I'd expect to happen here is that the thinc version is rejected, thus eliminating all variants of thinc compatible with the picked version of spacy, and that the resolver then backtracks.
@tgolsson are you sure? https://github.com/explosion/thinc/blob/v9.0.0/setup.cfg#L50
Fair point, I should've been clearer.
So spacy v3.8.2 is a pointless consideration, because it implies
Welp, thinc seems borked in that range. On GitHub, the tags go from 8.2.5:
This is fixed by my optimizations in #12499, although I plan to break that PR up into smaller PRs so each optimization can be shown to have a net benefit. I am waiting on #13001 to land, which is waiting on a resolvelib release (sarugaku/resolvelib#159 (comment)). So at the very earliest, pip will be able to handle this in 25.0 next year. To track this as improvements to the resolver logic happen, I've added this scenario to the known problematic ones: https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks/blob/main/scenarios/problematic.toml#L144
I tried these requirements and they resolved fine:
What did you try? Can you check again? Once a candidate is pinned in the resolution process, pip has to prove there is no resolution with that candidate before it is unpinned and pip tries something else. At some point a version of thinc is pinned that is not compatible with any version of SciPy, but pip cannot know that ahead of time; it must check every version of SciPy available to it, which includes old sdists.
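To make the "prove there is no resolution before unpinning" point concrete, here is a toy chronological backtracking resolver. It is a minimal sketch, not pip's or resolvelib's implementation: the package names echo this issue, but every version and dependency set in the hypothetical `INDEX` is made up and simplified, and packages are pinned in an assumed fixed order (`ORDER`). thinc 8.3.0 is pinned first; the resolver then has to try every scipy candidate (each of which fails on numpy) before it may unpin thinc.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical, heavily simplified index: package -> [(version, {dep: allowed versions})],
# newest first. These are NOT the real metadata of thinc, scipy, or numpy.
Deps = Dict[str, Set[str]]
INDEX: Dict[str, List[Tuple[str, Deps]]] = {
    "thinc": [
        ("8.3.0", {"numpy": {"2.0"}}),          # no scipy below accepts numpy 2.0
        ("8.2.5", {"numpy": {"1.26", "2.0"}}),
    ],
    "scipy": [
        ("1.14", {"numpy": {"1.26"}}),
        ("1.13", {"numpy": {"1.26"}}),
        ("1.12", {"numpy": {"1.26"}}),          # stands in for an old sdist
    ],
    "numpy": [("2.0", {}), ("1.26", {})],
}
ORDER = ["thinc", "scipy", "numpy"]  # assumption: a fixed pinning order


def deps_of(name: str, version: str) -> Deps:
    return dict(INDEX[name])[version]


def compatible(name: str, version: str, pins: Dict[str, str]) -> bool:
    """Check name==version against every existing pin, in both directions."""
    own_deps = deps_of(name, version)
    for pinned_name, pinned_version in pins.items():
        pinned_deps = deps_of(pinned_name, pinned_version)
        if name in pinned_deps and version not in pinned_deps[name]:
            return False
        if pinned_name in own_deps and pinned_version not in own_deps[pinned_name]:
            return False
    return True


def resolve(pins: Dict[str, str], attempts: Dict[str, int]) -> Optional[Dict[str, str]]:
    """Depth-first search: a pin is only undone once everything below it has failed."""
    if len(pins) == len(ORDER):
        return dict(pins)
    name = ORDER[len(pins)]
    for version, _ in INDEX[name]:
        attempts[name] = attempts.get(name, 0) + 1
        if not compatible(name, version, pins):
            continue
        pins[name] = version
        result = resolve(pins, attempts)
        if result is not None:
            return result
        del pins[name]  # dead end: unpin and try the next candidate
    return None  # all candidates failed: the caller has to backtrack


if __name__ == "__main__":
    attempts: Dict[str, int] = {}
    print(resolve({}, attempts))
    print(attempts)
```

The `attempts` counter ends up as `{'thinc': 2, 'scipy': 4, 'numpy': 8}`: three of the four scipy attempts, and six of the eight numpy attempts, are spent proving that thinc 8.3.0 cannot work. In pip, the equivalent of that exhaustive scipy walk is what eventually reaches sdists that have to be built.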
Both user order and the depth at which a package is found are considered during resolution (see https://pip.pypa.io/en/stable/topics/more-dependency-resolution/#the-resolver-algorithm). But I think this is a simpler case: as I describe above, pip has pinned a version of thinc that is not compatible with any version of SciPy, but that can only be confirmed by checking all versions of SciPy. This is a common limitation of resolution algorithms: they can’t go back and “unpin” a candidate early, because doing so while proving the resolution is still sound is tricky.
Also, I’d like to note that if libraries like SciPy are going to add upper bounds, or worse, tightly couple themselves to dependencies that other libraries also depend on, then resolving those dependencies will become increasingly difficult, if not impossible. Pip’s resolver can be improved here, but with enough dependencies that tightly couple to shared dependencies, it will eventually be impossible to use these libraries side by side because of their own tight requirements.
That's my point. If you remove the thinc bound, it doesn't. But the thinc bound should be easily inferred, and yet the logs imply that it's picking the never-valid 8.3.0 before trying a bunch of scipy versions.
You might want to review the literature on resolution algorithms. Nothing is "easily inferred" in general in a resolver. In pip's case, we use a backtracking algorithm which effectively picks something to try, then goes down the rabbit hole until it either says "oops, that doesn't work" or finds a solution. But while it's following a thread, it's got a very limited view of the "big picture", so it can easily miss what seem like obvious implications in the wider context. Certainly we could spot the thinc bound sooner if we took a different route, but there's no immediate reason to assume the route we do pick is wrong; that's the point, essentially. We have heuristics to pick "better" routes, and find out we're on the wrong track sooner, but because they are heuristics they don't always work. We're always trying to improve the heuristics, so examples like this are always good to see, but we can't guarantee we'll always find the best result quickly (or even at all).
Everything is a graph coloring problem if you squint hard enough. ;-) I'm just stating the following facts:
But after reviewing the logs in more detail, I'm wondering if the selection is proceeding as follows:
Nothing in the logs indicates this is what happens; except that in this listing:
SciPy appears before thinc. But SciPy has no effect on the conflict; no matter what we pick, it can't resolve the issue. So maybe the issue isn't that the bound isn't found, but rather its effects. I don't know, and I can't find a way to configure pip to show me only the resolution process. There's no reasonable way for me to read 58,000 log lines to understand the choices made. :)
The top level requirements here are:
These requirements aren't incompatible with any restriction on
You can actually see this in the logs:
No,
At some point, that candidate for scipy is rejected in the current resolver state, and the resolver looks for new candidates for scipy that match the requirements it has collected so far. But as no version of scipy meets the requirements in the current resolver state, scipy keeps being backtracked on until the resolver eventually finds an sdist it fails to build. The resolution algorithm is not limited to just trying all versions of a package for a given requirement; after rejecting a candidate, it can decide to look at a different requirement. This is driven by the preference method
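The idea of choosing which package to work on next can be sketched as a sort key that is recomputed after every pin or rejection, with the smallest key winning. This is a hypothetical illustration, not pip's actual preference rules: the `Unresolved` fields, the tie-breaker order, and the `backtrack_causes` set are all assumptions for the sake of the example.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Unresolved:
    """Hypothetical summary of one package the resolver still has to pin."""
    name: str
    remaining_candidates: int  # candidates still compatible with the current state
    user_order: int            # position in the user's requirements (lower = earlier)
    depth: int                 # distance from the root where the requirement was found


def preference_key(item: Unresolved, backtrack_causes: Set[str]) -> Tuple[int, int, int, int]:
    """Smaller keys are worked on first (a convention assumed for this sketch)."""
    return (
        0 if item.name in backtrack_causes else 1,  # involved in the latest conflict
        item.remaining_candidates,                  # fewest choices left: fail fast
        item.depth,                                 # shallower requirements next
        item.user_order,                            # finally, the user's own order
    )


def pick_next(unresolved: List[Unresolved], backtrack_causes: Set[str]) -> str:
    """Re-run after each pin or rejection, so the focus can move between packages."""
    return min(unresolved, key=lambda u: preference_key(u, backtrack_causes)).name


if __name__ == "__main__":
    state = [
        Unresolved("scipy", remaining_candidates=40, user_order=2, depth=1),
        Unresolved("thinc", remaining_candidates=3, user_order=5, depth=2),
        Unresolved("numpy", remaining_candidates=60, user_order=9, depth=2),
    ]
    print(pick_next(state, set()))
    print(pick_next(state, {"scipy", "numpy"}))
```

With no conflict recorded, thinc wins because it has the fewest candidates left; once a conflict names scipy and numpy, scipy is preferred. The second case mirrors the behaviour discussed in this thread, where a conflict involving numpy keeps pulling the resolver back to scipy.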
I do not understand why 8.3.2 and 8.3.1 are rejected but 8.3.0 is accepted. As far as I can tell, they have identical install_requires, and none of them can work.
I understand that, but if we can prove the set of potential candidates is empty, surely that is where we backtrack? I'm not arguing that we're doing things in the wrong order. But if we reject 8.3.0 (which we have to), the whole branch of spacy==3.8.2 is dead. There's no reason to look at scipy; it won't help. We've eliminated all candidates for thinc.
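The early-exit argument above can be sketched as bookkeeping that tracks which versions of each package are still possible and which pins narrowed them. When the last thinc candidate allowed by spacy==3.8.2 is rejected, it reports that pin as the thing to revisit, without touching scipy. This is a hypothetical sketch: `CandidateTracker` is not a pip or resolvelib class, the set of thinc versions attributed to spacy==3.8.2 is illustrative, and it deliberately ignores *why* each thinc candidate was rejected. A sound resolver has to fold those reasons into the conflict too, which is the tricky part mentioned earlier in the thread.

```python
from typing import Dict, List, Set


class CandidateTracker:
    """Hypothetical bookkeeping: which versions of each package are still possible."""

    def __init__(self, index: Dict[str, List[str]]) -> None:
        self.possible: Dict[str, Set[str]] = {name: set(vs) for name, vs in index.items()}
        # For each package, the pins whose requirements narrowed its possible versions.
        self.narrowed_by: Dict[str, Set[str]] = {name: set() for name in index}

    def restrict(self, pin: str, name: str, allowed: Set[str]) -> None:
        """Record that `pin` only allows the `allowed` versions of `name`."""
        self.possible[name] &= allowed
        self.narrowed_by[name].add(pin)

    def reject(self, name: str, version: str) -> Set[str]:
        """Reject one candidate; if none are left, return the pins to backtrack on."""
        self.possible[name].discard(version)
        if self.possible[name]:
            return set()
        return set(self.narrowed_by[name])


if __name__ == "__main__":
    tracker = CandidateTracker({"thinc": ["8.3.2", "8.3.1", "8.3.0", "8.2.5"]})
    # Illustrative assumption: spacy==3.8.2 only allows the 8.3.x releases.
    tracker.restrict("spacy==3.8.2", "thinc", {"8.3.0", "8.3.1", "8.3.2"})
    for version in ["8.3.2", "8.3.1", "8.3.0"]:
        print(version, tracker.reject("thinc", version))
```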
I'll take a look next time I have time to walk through the resolution steps; possibly thinc is immediately rejected and it's a red herring, and SciPy is getting stuck for a different reason.
I think it's fair to say that, if one thinks through the subprocess implementation, it is not trivial or obvious how to extract the needed information. The scare quotes in the title may be bearing too much weight. I spend a fair amount of my time helping people who are struggling with Python dependency resolution (as I know you have as well; thank you!) and who may not have traditional computer science backgrounds, and seeing a request for
I appreciate the several other more tractable avenues of investigation that have been suggested.
Thanks for the evangelism. FWIW at $DAYJOB I actually use
I hope this case helps. I'm sympathetic to the fact that as soon as the frontier of practical dependency sets expands, people immediately try to do something even more complicated.
I've created a branch of pip to make it possible to follow the resolution algorithm (#13039). Some things to understand about its output:
Here's the full output: https://gist.github.com/notatallshaw/b7ec131343f9343462c6a716be9075ec The takeaways from the log are:
A sufficiently smart algorithm would have rejected spacy 3.8.2 once all possible versions of thinc had been checked. But pip instead just sees every requirement that depends on numpy, and goes through each of them in its preference order, one of which is SciPy. In #12499 I add optimizations that more deeply analyze the conflict, and prefer requirements which directly disagree, in this case
There may be clever optimizations that could be applied at the resolvelib level, but the amount of communication currently between resolvelib and pip is quite limited, by design, so that it is easier to reason about (not that I personally find any of this easy to reason about).
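The difference between "requirements that merely involve numpy" and "requirements that directly disagree" can be shown with a toy check. This is a sketch under stated assumptions, not the code in #12499: versions are plain integer tuples, each requirement is a single half-open range, the user pin reuses the numpy==1.21.5 from the issue description, and the numpy bounds attributed to thinc and scipy are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

Version = Tuple[int, ...]
# (who requires it, inclusive lower bound, exclusive upper bound) on one package
Requirement = Tuple[str, Version, Version]
UNBOUNDED: Version = (10**9,)  # toy stand-in for "no upper bound"


def parse(v: str) -> Version:
    return tuple(int(part) for part in v.split("."))


def exactly(v: str) -> Tuple[Version, Version]:
    """Toy encoding of ==v as the half-open range [v, v with its last part + 1)."""
    low = parse(v)
    return low, low[:-1] + (low[-1] + 1,)


def disagree(a: Requirement, b: Requirement) -> bool:
    """True when no version can satisfy both ranges at once."""
    return max(a[1], b[1]) >= min(a[2], b[2])


def direct_conflicts(reqs: List[Requirement]) -> Set[str]:
    """Requirers whose constraints on the same package contradict each other."""
    culprits: Set[str] = set()
    for a, b in combinations(reqs, 2):
        if disagree(a, b):
            culprits.update((a[0], b[0]))
    return culprits


def preferred_order(reqs: List[Requirement]) -> List[str]:
    """sorted() is stable: requirers in a direct conflict move to the front."""
    culprits = direct_conflicts(reqs)
    return sorted((r[0] for r in reqs), key=lambda name: name not in culprits)


# Hypothetical requirements on numpy (the thinc and scipy bounds are made up).
NUMPY_REQS: List[Requirement] = [
    ("user", *exactly("1.21.5")),
    ("thinc", parse("2.0"), UNBOUNDED),
    ("scipy", parse("1.19"), UNBOUNDED),
]

if __name__ == "__main__":
    print(direct_conflicts(NUMPY_REQS))
    print(preferred_order(NUMPY_REQS))
```

Here scipy's range overlaps both other ranges, so it is not part of the direct conflict and drops to the back of the queue, while the user pin and thinc, which cannot both hold, move to the front.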
I have an idea that might fix this at the
To set your expectations though, assuming it pans out, it might take a while to prove it's correct, create and merge a PR with resolvelib, wait for a release, and then vendor it into pip.
I understand the pip/resolvelib release cadence. @notatallshaw Thanks again for looking at this in such detail. I'll try to keep the interesting corner cases coming as I find them.
Description
Given these requirements:
pip will fail with subprocess-exited-with-error on numpy-1.17.3.zip after about 2 minutes and not find a working dependency set. With the addition of thinc<8.3, pip successfully resolves in about half a minute.
NOTE: Some of these dependencies are also mentioned in #12990
Expected behavior
numpy==1.17.3 is never going to satisfy numpy==1.21.5. Perhaps pip could 'figure that out' sooner.
pip version
24.2 & main
Python version
3.10
OS
Linux
How to Reproduce
Output