pip prefers old sdists that "obviously" can't work over recent wheels #13037

Open
cburroughs opened this issue Oct 22, 2024 · 18 comments
Labels: S: needs triage, type: bug

Comments

@cburroughs

Description

Given these requirements:

numpy==1.21.5
spacy<4.0.0,>=3.0.0
mlflow<3.0.0,>=2.13.0

pip fails with subprocess-exited-with-error on numpy-1.17.3.zip after about 2 minutes, without finding a working dependency set. With thinc<8.3 added, pip resolves successfully in about half a minute.

NOTE: Some of these dependencies are also mentioned in #12990

Expected behavior

  • It seems "obvious" to a human that numpy==1.17.3 is never going to satisfy numpy==1.21.5. Perhaps Pip could 'figure that out' sooner.
  • As a naive user, if Pip can find a working solution in 30 seconds it ought to go in that fast direction instead of the slow direction when backtracking ;-)

pip version

24.2 & main

Python version

3.10

OS

Linux

How to Reproduce

$ pip --version
pip 24.3.dev0 from /home/ecsb/src/o/pip/src/pip (python 3.10)
$ cat reqs-min.txt 
numpy==1.21.5
spacy<4.0.0,>=3.0.0
mlflow<3.0.0,>=2.13.0
$ time pip install --dry-run -r reqs-min.txt

Output

Collecting numpy==1.21.5 (from -r reqs-min.txt (line 1))
  Using cached numpy-1.21.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Collecting spacy<4.0.0,>=3.0.0 (from -r reqs-min.txt (line 2))
  Using cached spacy-3.8.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (27 kB)
Collecting mlflow<3.0.0,>=2.13.0 (from -r reqs-min.txt (line 3))
  Using cached mlflow-2.17.0-py3-none-any.whl.metadata (29 kB)
Collecting spacy-legacy<3.1.0,>=3.0.11 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached spacy_legacy-3.0.12-py2.py3-none-any.whl.metadata (2.8 kB)
Collecting spacy-loggers<2.0.0,>=1.0.0 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached spacy_loggers-1.0.5-py3-none-any.whl.metadata (23 kB)
Collecting murmurhash<1.1.0,>=0.28.0 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached murmurhash-1.0.10-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.0 kB)
Collecting cymem<2.1.0,>=2.0.2 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached cymem-2.0.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (8.4 kB)
Collecting preshed<3.1.0,>=3.0.2 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached preshed-3.0.9-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.2 kB)
Collecting thinc<8.4.0,>=8.3.0 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached thinc-8.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (15 kB)
Collecting wasabi<1.2.0,>=0.9.1 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached wasabi-1.1.3-py3-none-any.whl.metadata (28 kB)
Collecting srsly<3.0.0,>=2.4.3 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached srsly-2.4.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)
Collecting catalogue<2.1.0,>=2.0.6 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached catalogue-2.0.10-py3-none-any.whl.metadata (14 kB)
Collecting weasel<0.5.0,>=0.1.0 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached weasel-0.4.1-py3-none-any.whl.metadata (4.6 kB)
Collecting typer<1.0.0,>=0.3.0 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached typer-0.12.5-py3-none-any.whl.metadata (15 kB)
Collecting tqdm<5.0.0,>=4.38.0 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached tqdm-4.66.5-py3-none-any.whl.metadata (57 kB)
Collecting requests<3.0.0,>=2.13.0 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached pydantic-2.9.2-py3-none-any.whl.metadata (149 kB)
Collecting jinja2 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Requirement already satisfied: setuptools in ./.venv/lib/python3.10/site-packages (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2)) (70.2.0)
Collecting packaging>=20.0 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached packaging-24.1-py3-none-any.whl.metadata (3.2 kB)
Collecting langcodes<4.0.0,>=3.2.0 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached langcodes-3.4.1-py3-none-any.whl.metadata (29 kB)
Collecting mlflow-skinny==2.17.0 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached mlflow_skinny-2.17.0-py3-none-any.whl.metadata (30 kB)
Collecting Flask<4 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached flask-3.0.3-py3-none-any.whl.metadata (3.2 kB)
Collecting alembic!=1.10.0,<2 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached alembic-1.13.3-py3-none-any.whl.metadata (7.4 kB)
Collecting docker<8,>=4.0.0 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached docker-7.1.0-py3-none-any.whl.metadata (3.8 kB)
Collecting graphene<4 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached graphene-3.4-py2.py3-none-any.whl.metadata (6.7 kB)
Collecting markdown<4,>=3.3 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached Markdown-3.7-py3-none-any.whl.metadata (7.0 kB)
Collecting matplotlib<4 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached matplotlib-3.9.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Collecting pandas<3 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached pandas-2.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
Collecting pyarrow<18,>=4.0.0 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting scikit-learn<2 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached scikit_learn-1.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Collecting scipy<2 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached scipy-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Collecting sqlalchemy<3,>=1.4.0 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached SQLAlchemy-2.0.36-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.7 kB)
Collecting gunicorn<24 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached gunicorn-23.0.0-py3-none-any.whl.metadata (4.4 kB)
Collecting cachetools<6,>=5.0.0 (from mlflow-skinny==2.17.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached cachetools-5.5.0-py3-none-any.whl.metadata (5.3 kB)
Collecting click<9,>=7.0 (from mlflow-skinny==2.17.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting cloudpickle<4 (from mlflow-skinny==2.17.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached cloudpickle-3.1.0-py3-none-any.whl.metadata (7.0 kB)
Collecting databricks-sdk<1,>=0.20.0 (from mlflow-skinny==2.17.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Downloading databricks_sdk-0.36.0-py3-none-any.whl.metadata (38 kB)
Collecting gitpython<4,>=3.1.9 (from mlflow-skinny==2.17.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached GitPython-3.1.43-py3-none-any.whl.metadata (13 kB)
Collecting importlib-metadata!=4.7.0,<9,>=3.7.0 (from mlflow-skinny==2.17.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached importlib_metadata-8.5.0-py3-none-any.whl.metadata (4.8 kB)
Collecting opentelemetry-api<3,>=1.9.0 (from mlflow-skinny==2.17.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached opentelemetry_api-1.27.0-py3-none-any.whl.metadata (1.4 kB)
Collecting opentelemetry-sdk<3,>=1.9.0 (from mlflow-skinny==2.17.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached opentelemetry_sdk-1.27.0-py3-none-any.whl.metadata (1.5 kB)
Collecting protobuf<6,>=3.12.0 (from mlflow-skinny==2.17.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached protobuf-5.28.2-cp38-abi3-manylinux2014_x86_64.whl.metadata (592 bytes)
Collecting pyyaml<7,>=5.1 (from mlflow-skinny==2.17.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached PyYAML-6.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Collecting sqlparse<1,>=0.4.0 (from mlflow-skinny==2.17.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached sqlparse-0.5.1-py3-none-any.whl.metadata (3.9 kB)
Collecting Mako (from alembic!=1.10.0,<2->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached Mako-1.3.6-py3-none-any.whl.metadata (2.9 kB)
Collecting typing-extensions>=4 (from alembic!=1.10.0,<2->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting urllib3>=1.26.0 (from docker<8,>=4.0.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached urllib3-2.2.3-py3-none-any.whl.metadata (6.5 kB)
Collecting Werkzeug>=3.0.0 (from Flask<4->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached werkzeug-3.0.4-py3-none-any.whl.metadata (3.7 kB)
Collecting itsdangerous>=2.1.2 (from Flask<4->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached itsdangerous-2.2.0-py3-none-any.whl.metadata (1.9 kB)
Collecting blinker>=1.6.2 (from Flask<4->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached blinker-1.8.2-py3-none-any.whl.metadata (1.6 kB)
Collecting graphql-core<3.3,>=3.1 (from graphene<4->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached graphql_core-3.2.5-py3-none-any.whl.metadata (10 kB)
Collecting graphql-relay<3.3,>=3.1 (from graphene<4->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached graphql_relay-3.2.0-py3-none-any.whl.metadata (12 kB)
Collecting MarkupSafe>=2.0 (from jinja2->spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)
Collecting language-data>=1.2 (from langcodes<4.0.0,>=3.2.0->spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached language_data-1.2.0-py3-none-any.whl.metadata (4.3 kB)
Collecting contourpy>=1.0.1 (from matplotlib<4->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached contourpy-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.4 kB)
Collecting cycler>=0.10 (from matplotlib<4->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached cycler-0.12.1-py3-none-any.whl.metadata (3.8 kB)
Collecting fonttools>=4.22.0 (from matplotlib<4->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached fonttools-4.54.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (163 kB)
Collecting kiwisolver>=1.3.1 (from matplotlib<4->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached kiwisolver-1.4.7-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (6.3 kB)
INFO: pip is looking at multiple versions of matplotlib to determine which version is compatible with other requirements. This could take a while.
Collecting matplotlib<4 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached matplotlib-3.9.1.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
  Using cached matplotlib-3.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
  Using cached matplotlib-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.8 kB)
Collecting pillow>=8 (from matplotlib<4->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached pillow-11.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (9.1 kB)
Collecting pyparsing>=2.3.1 (from matplotlib<4->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached pyparsing-3.2.0-py3-none-any.whl.metadata (5.0 kB)
Collecting python-dateutil>=2.7 (from matplotlib<4->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
INFO: pip is looking at multiple versions of pandas to determine which version is compatible with other requirements. This could take a while.
Collecting pandas<3 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached pandas-2.2.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
  Using cached pandas-2.2.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
  Using cached pandas-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
  Using cached pandas-2.1.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
  Using cached pandas-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
  Using cached pandas-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
  Using cached pandas-2.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
INFO: pip is still looking at multiple versions of pandas to determine which version is compatible with other requirements. This could take a while.
  Using cached pandas-2.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
  Using cached pandas-2.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Collecting pytz>=2020.1 (from pandas<3->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached pytz-2024.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.1 (from pandas<3->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached tzdata-2024.2-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting annotated-types>=0.6.0 (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.23.4 (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached pydantic_core-2.23.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting charset-normalizer<4,>=2 (from requests<3.0.0,>=2.13.0->spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached charset_normalizer-3.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (34 kB)
Collecting idna<4,>=2.5 (from requests<3.0.0,>=2.13.0->spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting certifi>=2017.4.17 (from requests<3.0.0,>=2.13.0->spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached certifi-2024.8.30-py3-none-any.whl.metadata (2.2 kB)
Collecting joblib>=1.2.0 (from scikit-learn<2->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn<2->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)
INFO: pip is looking at multiple versions of scipy to determine which version is compatible with other requirements. This could take a while.
Collecting scipy<2 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached scipy-1.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
  Using cached scipy-1.13.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
  Using cached scipy-1.13.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
  Using cached scipy-1.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
  Using cached scipy-1.11.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
  Using cached scipy-1.11.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
  Using cached scipy-1.11.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (59 kB)
INFO: pip is still looking at multiple versions of scipy to determine which version is compatible with other requirements. This could take a while.
  Using cached scipy-1.11.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (59 kB)
  Using cached scipy-1.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (58 kB)
Collecting greenlet!=0.4.17 (from sqlalchemy<3,>=1.4.0->mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached greenlet-3.1.1-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (3.8 kB)
Collecting blis<1.1.0,>=1.0.0 (from thinc<8.4.0,>=8.3.0->spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached blis-1.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.6 kB)
Collecting confection<1.0.0,>=0.0.1 (from thinc<8.4.0,>=8.3.0->spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached confection-0.1.5-py3-none-any.whl.metadata (19 kB)
INFO: pip is looking at multiple versions of thinc to determine which version is compatible with other requirements. This could take a while.
Collecting thinc<8.4.0,>=8.3.0 (from spacy<4.0.0,>=3.0.0->-r reqs-min.txt (line 2))
  Using cached thinc-8.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (15 kB)
  Using cached thinc-8.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (15 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
Collecting scipy<2 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached scipy-1.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (58 kB)
  Using cached scipy-1.9.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (58 kB)
INFO: pip is still looking at multiple versions of thinc to determine which version is compatible with other requirements. This could take a while.
  Using cached scipy-1.9.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (58 kB)
  Using cached scipy-1.9.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.2 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
  Using cached scipy-1.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.2 kB)
  Using cached scipy-1.8.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.2 kB)
  Using cached scipy-1.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.2 kB)
  Using cached scipy-1.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.2 kB)
  Using cached scipy-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.2 kB)
  Using cached scipy-1.6.1.tar.gz (27.3 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Using cached scipy-1.6.0.tar.gz (27.3 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting scikit-learn<2 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached scikit_learn-1.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
  Using cached scikit_learn-1.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
  Using cached scikit_learn-1.4.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
  Using cached scikit_learn-1.4.1.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
  Using cached scikit_learn-1.4.0-1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
  Using cached scikit_learn-1.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Collecting scipy<2 (from mlflow<3.0.0,>=2.13.0->-r reqs-min.txt (line 3))
  Using cached scipy-1.5.4.tar.gz (25.2 MB)
  Installing build dependencies ... error
  error: subprocess-exited-with-error
  
  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [58 lines of output]
      Ignoring numpy: markers 'python_version == "3.6" and platform_system != "AIX"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.7" and platform_system != "AIX"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.6" and platform_system == "AIX"' don't match your environment
      Ignoring numpy: markers 'python_version == "3.7" and platform_system == "AIX"' don't match your environment
      Ignoring numpy: markers 'python_version >= "3.8" and platform_system == "AIX"' don't match your environment
      Collecting wheel
        Using cached wheel-0.44.0-py3-none-any.whl.metadata (2.3 kB)
      Collecting setuptools
        Using cached setuptools-75.2.0-py3-none-any.whl.metadata (6.9 kB)
      Collecting Cython>=0.29.18
        Using cached Cython-3.0.11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.2 kB)
      Collecting numpy==1.17.3
        Using cached numpy-1.17.3.zip (6.4 MB)
        Installing build dependencies: started
        Installing build dependencies: finished with status 'done'
        Getting requirements to build wheel: started
        Getting requirements to build wheel: finished with status 'done'
        Preparing metadata (pyproject.toml): started
        Preparing metadata (pyproject.toml): finished with status 'error'
        error: subprocess-exited-with-error
      
        × Preparing metadata (pyproject.toml) did not run successfully.
        │ exit code: 1
        ╰─> [24 lines of output]
            Running from numpy source directory.
            <string>:418: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
            Traceback (most recent call last):
              File "/tmp/pip-help/.venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
                main()
              File "/tmp/pip-help/.venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
                json_out['return_val'] = hook(**hook_input['kwargs'])
              File "/tmp/pip-help/.venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 149, in prepare_metadata_for_build_wheel
                return hook(metadata_directory, config_settings)
              File "/tmp/pip-build-env-8gr_aycw/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 373, in prepare_metadata_for_build_wheel
                self.run_setup()
              File "/tmp/pip-build-env-8gr_aycw/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 516, in run_setup
                super().run_setup(setup_script=setup_script)
              File "/tmp/pip-build-env-8gr_aycw/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 318, in run_setup
                exec(code, locals())
              File "<string>", line 443, in <module>
              File "<string>", line 422, in setup_package
              File "/tmp/pip-install-8zzl8wkv/numpy_8e2507cae73343ffbdb59612b856e02a/numpy/distutils/core.py", line 26, in <module>
                from numpy.distutils.command import config, config_compiler, \
              File "/tmp/pip-install-8zzl8wkv/numpy_8e2507cae73343ffbdb59612b856e02a/numpy/distutils/command/config.py", line 20, in <module>
                from numpy.distutils.mingw32ccompiler import generate_manifest
              File "/tmp/pip-install-8zzl8wkv/numpy_8e2507cae73343ffbdb59612b856e02a/numpy/distutils/mingw32ccompiler.py", line 34, in <module>
                from distutils.msvccompiler import get_build_version as get_build_msvc_version
            ModuleNotFoundError: No module named 'distutils.msvccompiler'
            [end of output]
      
        note: This error originates from a subprocess, and is likely not a problem with pip.
      error: metadata-generation-failed
      
      × Encountered error while generating package metadata.
      ╰─> See above for output.
      
      note: This is an issue with the package mentioned above, not pip.
      hint: See above for details.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

real	2m11.279s
user	1m55.358s
sys	0m4.069s


@cburroughs added the "S: needs triage" and "type: bug" labels Oct 22, 2024
@jsirois
Contributor

jsirois commented Oct 22, 2024

@cburroughs did you notice that the old numpy is encountered while collecting build dependencies? I think it's not at all obvious that a build dependency has any bearing on install dependencies in general.

@notatallshaw
Member

notatallshaw commented Oct 22, 2024

The problem here is that pip backtracks to a scipy in the requirement dependencies that is old enough that pip tries to build it from source. In an ideal world pip would not backtrack that far on scipy, and build dependencies wouldn't be a consideration.

I will take a look at this example when I next get a chance; I might already have an open PR that fixes it. This MRE is really helpful, thanks.

In general this problem cannot be completely solved, because backtracking is a hard problem, but it can be improved. You may want to consider using the flag --prefer-binary if you know that you almost never want an sdist.
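As a rough illustration of the --prefer-binary suggestion, the sketch below builds the same dry-run command used in the reproduction with that flag appended. The helper name pip_dry_run_command is made up for this example, and the thread does not establish whether the flag avoids the failure for this particular set of requirements.

```python
import sys


def pip_dry_run_command(requirements_file, prefer_binary=True):
    """Build the argv for a dry-run install (hypothetical helper, not part of pip)."""
    command = [sys.executable, "-m", "pip", "install", "--dry-run", "-r", requirements_file]
    if prefer_binary:
        # Ask pip to favour wheels over sdists, even when the sdist is newer.
        command.append("--prefer-binary")
    return command


command = pip_dry_run_command("reqs-min.txt")
print(" ".join(command))
# To actually run it: subprocess.run(command, check=False)
```

The command is only printed here so the snippet has no side effects; passing the list to subprocess.run would execute it in the current environment.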

@tgolsson

(I was also debugging this on the Pantsbuild slack, so adding some more.)

The thing I don't understand here is why adding thinc<8.3 is helpful to not hit this. thinc>=8.3 requires numpy>=2.0, so it can never be part of the final selection. I guess it helps change the order of package selection? I don't know if there's a way to get better diagnostics for how e.g. pip download rejects and selects packages, but I end up with tens of thousands of lines, and only 12 instances of "Will try a different candidate".

For example, this is one snippet from the log:

Collecting thinc<8.4.0,>=8.3.0 (from spacy<4.0.0,>=3.0.0->-r requirements.txt (line 2))
  <snip>
  Using cached thinc-8.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (15 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
Will try a different candidate, due to conflict:
    The user requested numpy==1.21.5
    spacy 3.8.2 depends on numpy>=1.19.0; python_version >= "3.9"
    mlflow 2.17.0 depends on numpy<3
    matplotlib 3.8.4 depends on numpy>=1.21
    pandas 2.0.3 depends on numpy>=1.21.0; python_version >= "3.10"
    pyarrow 17.0.0 depends on numpy>=1.16.6
    scikit-learn 1.5.2 depends on numpy>=1.19.5
    scipy 1.13.0 depends on numpy<2.3 and >=1.22.4

This reads like we picked thinc 8.3 (incompatible!) and then started chugging through all possible scipys, eventually hitting a crash. What I'd expect to have happened here is rejecting the thinc version, thus eliminating all variants of thinc compatible with the picked version of spacy, and backtracking.
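One way to read the conflict message above is to test each listed numpy constraint against the pinned version. The sketch below does that with a deliberately simplified comparison (plain numeric tuples, not full PEP 440 handling); the helper names are made up for this example, and the python_version markers are dropped on the assumption that they hold for Python 3.10.

```python
def parse(version):
    """Turn '1.21' into (1, 21, 0); simplified, numeric release segments only."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


# Operators are checked in this order so that '>=' is matched before '>'.
OPERATORS = {
    "==": lambda a, b: a == b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def satisfies(version, spec):
    """Check a version against comma-separated clauses such as '>=1.22.4,<2.3'."""
    for clause in spec.split(","):
        for symbol, compare in OPERATORS.items():
            if clause.startswith(symbol):
                if not compare(parse(version), parse(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause}")
    return True


# numpy constraints copied from the conflict message in the log snippet.
CONFLICT = [
    ("user request", "==1.21.5"),
    ("spacy 3.8.2", ">=1.19.0"),
    ("mlflow 2.17.0", "<3"),
    ("matplotlib 3.8.4", ">=1.21"),
    ("pandas 2.0.3", ">=1.21.0"),
    ("pyarrow 17.0.0", ">=1.16.6"),
    ("scikit-learn 1.5.2", ">=1.19.5"),
    ("scipy 1.13.0", ">=1.22.4,<2.3"),
]

violated = [name for name, spec in CONFLICT if not satisfies("1.21.5", spec)]
print(violated)
```

Under these assumptions the only constraint in this particular message that numpy 1.21.5 fails is the one from scipy 1.13.0, so the message reports a scipy candidate being rejected rather than a thinc one.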

@jsirois
Contributor

jsirois commented Oct 22, 2024

thinc>=8.3 requires numpy>=2.0

@tgolsson are you sure?: https://github.com/explosion/thinc/blob/v9.0.0/setup.cfg#L50

@tgolsson

Fair point, I should've been clearer. spacy==v3.8.2 (latest AFAICT) has thinc>=8.3.0,<8.4.0, which leads to the bad resolution path. There is no configuration I can find that would allow v9.0.0 to come into consideration. thinc==8.3.* in turn has the following (fairly pointless?) numpy constraint:

    numpy>=2.0.0,<2.1.0; python_version < "3.9"
    numpy>=2.0.0,<2.1.0; python_version >= "3.9"

So spacy v3.8.2 is a pointless consideration, because it implies thinc>=8.3, which becomes invalid. Either way, that doesn't explain why thinc 8.3.0 seemingly gets picked, because that should have been a backtrack from everything I can see. Or it was a backtrack but pip doesn't output the information (with -vvvvvvv) to tell me this.

@jsirois
Contributor

jsirois commented Oct 22, 2024

Welp, thinc seems borked in that range. On GitHub, tags go from 8.2.5:
https://github.com/explosion/thinc/blob/e51abd7c3e788d25da09d6bc997bb557471e43cd/setup.cfg#L53-L54
through some 9.0.0.dev* tags straight to 9.0.0. There are no 8.3.* tags at all. I have only been looking at source and not wheel metadata / sdist metadata. Perhaps I'm looking at the wrong source or the published artifacts in that 8.3.* range are just from a fantasy land.

@notatallshaw
Member

notatallshaw commented Oct 23, 2024

This is fixed by my optimizations in #12499, although I plan to break that PR up into smaller PRs so each optimization can be shown to have a net benefit. I am waiting on #13001 to land, which in turn is waiting on a resolvelib release (sarugaku/resolvelib#159 (comment)). So at the very earliest pip will be able to handle this in 25.0 next year.

To track this as improvements to the resolve logic happen I've added this scenario to known problematic ones: https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks/blob/main/scenarios/problematic.toml#L144

The thing I don't understand here is why adding thinc<8.3 is helpful to not hit this

I tried these requirements and they resolved fine:

numpy==1.21.5
spacy<4.0.0,>=3.0.0
mlflow<3.0.0,>=2.13.0
thinc<8.3

What did you try? Can you check again?

Once a candidate is pinned in the resolution process pip has to prove there is no resolution with that candidate before it is unpinned and pip tries something else.

At some point a version of thinc is pinned that is not compatible with any version of SciPy, but pip cannot know that ahead of time. It must check every version of SciPy available to it, which includes old sdists.

I guess it helps change the order of package selection?

Both user order, and depth of finding a package are considered during resolution: https://pip.pypa.io/en/stable/topics/more-dependency-resolution/#the-resolver-algorithm

But I think this is a simpler case, as I describe above: pip has pinned a version of thinc that is not compatible with any version of SciPy, but that can only be confirmed by checking all versions of SciPy.

This is a common limitation of resolution algorithms: they can’t go back and “unpin” a candidate early, because doing so and proving the resolution is still sound is tricky.
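The cost of proving that a pinned candidate has no resolution can be sketched roughly as follows. This is a toy model, not pip's or resolvelib's code: the Candidate class, the version numbers, and the wheel/sdist split are all hypothetical, chosen only to show that an exhaustive check reaches the old sdists at the end of the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    version: str
    kind: str  # "wheel" or "sdist" (hypothetical labels for this sketch)


def first_compatible(candidates, is_compatible):
    """Return (match, tried): the first compatible candidate and everything examined."""
    tried = []
    for candidate in candidates:
        tried.append(candidate)
        if is_compatible(candidate):
            return candidate, tried
    return None, tried


# Hypothetical candidate list, newest first; the oldest releases only ship sdists.
CANDIDATES = [
    Candidate("3.0", "wheel"),
    Candidate("2.0", "wheel"),
    Candidate("1.0", "sdist"),
    Candidate("0.9", "sdist"),
]

# Case 1: the pinned package is incompatible with every candidate, so the whole
# list is examined, including the sdists that would each need a build step.
match, tried = first_compatible(CANDIDATES, lambda c: False)
sdists_reached = [c.version for c in tried if c.kind == "sdist"]
print(match, len(tried), sdists_reached)

# Case 2: a compatible wheel exists, so the search stops before any sdist.
match2, tried2 = first_compatible(CANDIDATES, lambda c: c.version == "2.0")
print(match2.version, len(tried2))
```

In the first case nothing can be ruled out early, which mirrors the point above: only after every candidate has been examined can the pinned package be unpinned.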

@notatallshaw
Member

Also I’d like to note, if libraries like SciPy are going to add upper-bounds, or worse, tightly couple themselves to dependencies that other libraries also depend on, then resolving those dependencies will become increasingly difficult, if not impossible.

Pip’s resolver can be improved here, but with enough dependencies that themselves tightly couple shared dependencies, eventually it will be impossible to use these libraries side by side by their own tight requirements.

@tgolsson

The thing I don't understand here is why adding thinc<8.3 is helpful to not hit this

I tried these requirements and they resolved fine:

numpy==1.21.5
spacy<4.0.0,>=3.0.0
mlflow<3.0.0,>=2.13.0
thinc<8.3

That's my point. If you remove the thinc bound, it doesn't. But the thinc bound should be easily inferred, and yet the logs imply that it's picking the never-valid 8.3.0 before trying a bunch of scipy versions.

@pfmoore
Member

pfmoore commented Oct 23, 2024

But the thinc bound should be easily inferred

You might want to review the literature on resolution algorithms. Nothing is "easily inferred" in general in a resolver. In pip's case, we use a backtracking algorithm which effectively picks something to try, then goes down the rabbit hole until it either says "oops, that doesn't work" or finds a solution. But while it's following a thread, it's got a very limited view of the "big picture", so it can easily miss what seem like obvious implications in the wider context.

Certainly we could spot the thinc bound sooner if we took a different route, but there's no immediate reason to assume the route we do pick is wrong - that's the point, essentially. We have heuristics to pick "better" routes, and find out we're on the wrong track sooner, but because they are heuristics they don't always work. We're always trying to improve the heuristics, so examples like this are always good to see, but we can't guarantee we'll always find the best result quickly (or even at all).

@tgolsson

You might want to review the literature on resolution algorithms.

Everything is a graph coloring problem if you squint hard enough. ;-) I'm just stating the following facts:

  • The pip logs indicate that it has selected thinc==8.3.0 after rejecting 8.3.1 and 8.3.2
  • The requirements of thinc==8.3.0 are incompatible with the top-level requirements

But after reviewing the logs in more detail; I'm wondering if the selection is proceeding as follows:

  • Picks spacy
  • For each version of scipy that could be picked
    • Eliminate all compatible versions of thinc because scipy is compatible with our bound but thinc isn't compatible with either
  • Reaches a version of scipy it has to compile, and dies

Nothing in the logs indicates this is what happens; except that in this listing:

Will try a different candidate, due to conflict:
    The user requested numpy==1.21.5
    spacy 3.8.2 depends on numpy>=1.19.0; python_version >= "3.9"
    mlflow 2.17.0 depends on numpy<3
    matplotlib 3.8.4 depends on numpy>=1.21
    pandas 2.0.3 depends on numpy>=1.21.0; python_version >= "3.10"
    pyarrow 17.0.0 depends on numpy>=1.16.6
    scikit-learn 1.5.2 depends on numpy>=1.19.5
    scipy 1.9.1 depends on numpy<1.25.0 and >=1.18.5
    thinc 8.3.2 depends on numpy<2.1.0 and >=2.0.0; python_version >= "3.9"

Scipy appears before thinc. But scipy has no effect on the conflict, no matter what we pick it can't resolve the issue... So maybe the issue isn't that the bound isn't found, but rather the effects of it. I don't know, and I can't find a way to configure pip to show me only the resolve process. There's no reasonable way for me to read 58000 log lines to understand the choices made. :)
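The conflict listing above can be checked mechanically. The sketch below uses a deliberately simplified specifier checker as a stand-in for a real one (the `packaging` library would be the usual choice); it assumes plain dotted numeric versions, drops the environment markers, and the helper names `parse_version` and `satisfies` are made up for illustration.

```python
import operator

OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}


def parse_version(text):
    """Turn '1.21.5' into a tuple padded to four components for comparison."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def satisfies(version, specifier):
    """Check a version against a specifier such as '<2.1.0,>=2.0.0'."""
    for clause in specifier.split(","):
        clause = clause.strip()
        # Try two-character operators first so '>=' is not read as '>'.
        for symbol in sorted(OPS, key=len, reverse=True):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):].strip())
                if not OPS[symbol](parse_version(version), bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# numpy requirements copied from the conflict listing (markers dropped).
CAUSES = {
    "user request": "==1.21.5",
    "spacy 3.8.2": ">=1.19.0",
    "mlflow 2.17.0": "<3",
    "matplotlib 3.8.4": ">=1.21",
    "pandas 2.0.3": ">=1.21.0",
    "pyarrow 17.0.0": ">=1.16.6",
    "scikit-learn 1.5.2": ">=1.19.5",
    "scipy 1.9.1": "<1.25.0,>=1.18.5",
    "thinc 8.3.2": "<2.1.0,>=2.0.0",
}

rejecting = [src for src, spec in CAUSES.items() if not satisfies("1.21.5", spec)]
print(rejecting)  # ['thinc 8.3.2']
```

Of the listed causes, only the thinc requirement excludes numpy 1.21.5, which matches the observation that no choice of scipy can resolve this particular conflict.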

@notatallshaw
Member

The requirements of thinc==8.3.0 are incompatible with the top-level requirements

The top level requirements here are:

numpy==1.21.5
spacy<4.0.0,>=3.0.0
mlflow<3.0.0,>=2.13.0

These requirements aren't incompatible with any restriction on thinc; you need to look at their dependencies to see what is incompatible. The resolution algorithm does a breadth-first resolution of these top-level requirements, and then does a depth-first search on the dependencies, following the preference rules in PipProvider.get_preference.

But after reviewing the logs in more detail; I'm wondering if the selection is proceeding as follows:

  • Picks spacy
  • For each version of scipy that could be picked
    • Eliminate all compatible versions of thinc because scipy is compatible with our bound but thinc isn't compatible with either
  • Reaches a version of scipy it has to compile, and dies

Nothing in the logs indicates this is what happens; except that in this listing

You can actually see this in the logs:

Collecting thinc<8.4.0,>=8.3.0 (from spacy<4.0.0,>=3.0.0->-r requirements.txt (line 2))
  Using cached thinc-8.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (15 kB)
...
Collecting thinc<8.4.0,>=8.3.0 (from spacy<4.0.0,>=3.0.0->-r requirements.txt (line 2))
  Using cached thinc-8.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (15 kB)
  Using cached thinc-8.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (15 kB)

Scipy appears before thinc. But scipy has no effect on the conflict, no matter what we pick it can't resolve the issue... So maybe the issue isn't that the bound isn't found, but rather the effects of it. I don't know, and I can't find a way to configure pip to show me only the resolve process. There's no reasonable way for me to read 58000 log lines to understand the choices made. :)

No, thinc first appears in the logs, at around line 17, whereas scipy first appears at around line 63:

Collecting scipy<2 (from mlflow<3.0.0,>=2.13.0->-r requirements.txt (line 3))
  Using cached scipy-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)

At some point, that candidate for scipy is rejected in the current resolver state and the resolver looks for new candidates for scipy to match the current requirements it has collected, but as no version of scipy meets the requirements in the current resolver state, this forces scipy to keep being backtracked on until it eventually finds an sdist it fails to build.

The resolution algorithm is not limited to just trying all versions of a package for a given requirement, it can decide after rejecting a candidate it will look at a different requirement. This is driven from the preference method PipProvider.get_preference and is documented here: https://pip.pypa.io/en/stable/topics/more-dependency-resolution/#the-resolver-algorithm

@tgolsson

The requirements of thinc==8.3.0 are incompatible with the top-level requirements

The top level requirements here are:

numpy==1.21.5
spacy<4.0.0,>=3.0.0
mlflow<3.0.0,>=2.13.0

These requirements aren't incompatible with any restriction on thinc; you need to look at their dependencies to see what is incompatible. The resolution algorithm does a breadth-first resolution of these top-level requirements, and then does a depth-first search on the dependencies, following the preference rules in PipProvider.get_preference.

thinc==8.3.0 includes the requirement numpy>=2.0.0,<2.1.0. That is very obviously incompatible with the top-level requirement of numpy==1.21.5. I'm not saying that thinc is itself incompatible, but that we have two conflicting requirements once thinc==8.3.0 has been picked, so any solving beyond that point is a waste of cycles. This conflict implies an implicit thinc!=8.3.*. As @jsirois pointed out, there is a v9.0.0 that could satisfy the numpy requirement, but it is outside spacy's bounds.

I do not understand why 8.3.2 and 8.3.1 are rejected but we accept 8.3.0? They have - as far as I can tell - identical install_requires, and none of them can work.

The resolution algorithm is not limited to just trying all versions of a package for a given requirement, it can decide after rejecting a candidate it will look at a different requirement.

I understand that, but if we can prove the set of potential candidates is empty, surely that is where we backtrack? I'm not arguing that we're doing things in the wrong order. But if we reject 8.3.0 (which we have to), the whole branch of spacy==3.8.2 is dead. There's no reason to look at scipy, it won't help. We've eliminated all candidates for thinc.

@notatallshaw
Member

notatallshaw commented Oct 23, 2024

I'll take a look next time I have time to walk through the resolution steps. Possibly thinc is immediately rejected and it's a red herring, and SciPy is getting stuck for a different reason.

@cburroughs
Author

@cburroughs did you notice that the old numpy is encountered collecting build dependencies. I think it's not at all obvious a build dependency has any bearing on install dependencies in general.

I think it's fair that if one thinks through the subprocess implementation it is not trivial or obvious how to extract the needed information. The scare quotes in the title may be bearing too much weight. I spend a fair amount of my time helping people struggling with Python dependency resolution (as I know you have as well; thank you!) who may not have traditional computer science backgrounds. Seeing a request for numpy==1.21.5 followed by Collecting numpy==1.17.3 makes for a pretty baffling experience absent knowledge of the implementation.

I appreciate the several other more tractable avenues of investigation that have been suggested.

In general this problem can not be completely solved, backtracking is a hard problem, but it can be improved. You may want to consider using the flag --prefer-binary if you know that you almost never want an sdist.

Thanks for the evangelism. FWIW at $DAYJOB I actually use --only-binary :all: but that's not as easy outside a controlled corporate environment where one can build "missing" wheels.

Pip’s resolver can be improved here, but with enough dependencies that themselves tightly couple shared dependencies, eventually it will be impossible to use these libraries side by side by their own tight requirements.

I'll take a look next time I have time to walk through resolution steps,

I hope this case helps.

I'm sympathetic that as soon as the frontier of practical dependency sets expands, people immediately try to do something even more complicated.

@notatallshaw
Member

notatallshaw commented Oct 23, 2024

I've created a branch of pip to make it possible to follow the resolution algorithm (#13039). Some things to understand about its output:

  1. Adding requirements and collecting are not part of the solved state of the resolution, but a collection process to get the information for pip to resolve over
  2. Only pinning means the resolution has accepted a candidate in the resolution's current state
  3. "Rejecting" means it tried to pin but failed due to the stated message
  4. There is a missing piece of information about when the resolution effectively "unpins" candidates because it has determined that a candidate in that state can't be installed. It's not clear to me whether this message can be produced with the existing reporter API, because what it's actually doing is popping state off the stack
  5. There is probably more missing information here. This was quickly put together, and I don’t know if I will have time to work on this branch; I guess it depends on how useful I end up finding it

Here's the full output: https://gist.github.com/notatallshaw/b7ec131343f9343462c6a716be9075ec

The takeaways from the log are:

  1. No version of thinc is ever pinned
  2. All versions of SciPy newer than 1.10.1 conflict with the numpy requirement
  3. SciPy 1.10.1 is the first SciPy to be pinned, but spacy 3.8.2 is already pinned on the stack, which requires thinc<8.4.0,>=8.3.0
  4. So until the resolution algorithm proves all solutions under spacy 3.8.2 are impossible it does not backtrack to an older candidate of spacy, which never happens before the failed build

A sufficiently smart algorithm would have rejected spacy 3.8.2 once all possible versions of thinc had been checked. But pip instead just sees all requirements that had numpy as a requirement, and goes through each of them in its preference order, one of which is SciPy.

In #12499 I add optimizations that more deeply analyze the conflict, and prefer requirements which directly disagree, in this case numpy==1.21.5 and thinc. This forces the conflict here, meaning pip can reject spacy 3.8.2 and avoid backtracking to older versions of SciPy.
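To illustrate why handling the directly conflicting requirement first can cut the search, here is a toy comparison. It is not the logic in #12499: the index, the package names, and the helpers `directly_conflicts` and `count_pins` are hypothetical, and the one-step lookahead assumes dependency metadata is free to inspect, which is not true for real packages.

```python
# Toy index (hypothetical data): versions newest first, with allowed
# dependency versions per release.
INDEX = {
    "app": {
        "2.0": {"sci": {"3", "2", "1"}, "lib": {"2.0"}},
        "1.0": {"sci": {"3", "2", "1"}, "lib": {"1.0"}},
    },
    "lib": {"2.0": {"num": {"2.0"}}, "1.0": {"num": {"1.0"}}},
    "sci": {"3": {}, "2": {}, "1": {}},
    "num": {"2.0": {}, "1.0": {}},
}


def directly_conflicts(name, allowed, pins):
    """True if no allowed candidate of `name` can coexist with current pins."""
    if name in pins:
        return pins[name] not in allowed
    for version, deps in INDEX[name].items():
        if version in allowed and all(
            dep not in pins or pins[dep] in dep_allowed
            for dep, dep_allowed in deps.items()
        ):
            return False
    return True


def count_pins(pending, pins, prioritise):
    """Backtracking resolve; returns (solution or None, number of pin attempts)."""
    if not pending:
        return pins, 0
    if prioritise:
        # Stable sort: requirements that already clash with the pins go first.
        pending = sorted(
            pending, key=lambda req: not directly_conflicts(req[0], req[1], pins)
        )
    (name, allowed), rest = pending[0], pending[1:]
    if name in pins:
        if pins[name] not in allowed:
            return None, 0
        return count_pins(rest, pins, prioritise)
    attempts = 0
    for version, deps in INDEX[name].items():
        if version not in allowed:
            continue
        attempts += 1
        solution, nested = count_pins(
            rest + list(deps.items()), {**pins, name: version}, prioritise
        )
        attempts += nested
        if solution is not None:
            return solution, attempts
    return None, attempts


requirements = [("num", {"1.0"}), ("app", {"2.0", "1.0"})]
slow_solution, slow_attempts = count_pins(requirements, {}, prioritise=False)
fast_solution, fast_attempts = count_pins(requirements, {}, prioritise=True)
print(slow_attempts, fast_attempts)
```

Both runs reach the same solution, but the prioritised run abandons app 2.0 without walking through every version of sci first, so it makes fewer pin attempts.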

There may be clever optimizations that could be applied at the resolvelib level, but the amount of communication currently between resolvelib and pip is quite limited, by design, so that it is easier to reason about (not that I personally find any of this easy to reason about).

@notatallshaw
Member

notatallshaw commented Oct 24, 2024

I have an idea that might fix this at the resolvelib level (sarugaku/resolvelib#171), if so this would avoid having to add complex logic in pip (like I propose in #12499).

To set your expectations though, assuming it pans out, it might take a while to prove it's correct, create and merge a PR with resolvelib, wait for a release, and then vendor it into pip.

@cburroughs
Author

I understand the pip/resolvelib release cadence. @notatallshaw Thanks again for looking at this in such detail. I'll try to keep the interesting corner cases coming as I find them.
