Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The case for removing poetry.lock? #2835

Open
ashleysommer opened this issue Jul 24, 2024 · 8 comments
Open

The case for removing poetry.lock? #2835

ashleysommer opened this issue Jul 24, 2024 · 8 comments
Assignees

Comments

@ashleysommer
Copy link
Contributor

We've been seeing increasingly often situations where CI Jobs for PRs are failing because the poetry.lock file is out of sync with the pyproject.toml file, or where there is a hash conflict in the poetry.lock file. These are usually caused by dependabot updating only the pyproject.toml and not the poetry.lock file (though it is supposed to do both), or dependabot generating an invalid poetry.lock file, that happens sometimes.

There are lots of other ways it can get out of sync, including:

  • Merging many dependabot PRs in a row without rebasing subsequent ones
  • contributors running poetry update to bump the lockfile before creating their PR
  • merging main into an old PR before merging the PR
  • or merging an old PR without updating the lockfile.

Documentation about this on from Poetry:

Committing this file to VC is important because it will cause anyone who sets up the project to use the exact same versions of the dependencies that you are using. Your CI server, production machines, other developers in your team, everything and everyone runs on the same dependencies, which mitigates the potential for bugs affecting only some parts of the deployments. Even if you develop alone, in six months when reinstalling the project you can feel confident the dependencies installed are still working even if your dependencies released many new versions since then. (See note below about using the update command.)

And also from the Poetry docs:

For libraries it is not necessary to commit the lock file.

Again this comes down to the old question, is RDFLib an application or a library? If it is an application, we want everyone who contributes to RDFlib to be using exactly the same dependency versions, and producing exactly the same builds, and we need to build RDFLib in a reproducible way.

If RDFlib is a library, then we can simply include a compatibility-range of dependency versions in the pyproject.toml file, and it is up to the end-user's package management tool to install the right versions that work with the final application.

Personally I only ever use RDFLib as a library. All of my applications, even small scripts, have their own dependency list and start with import rdflib. To me that makes RDFLib a library. But I know there are others who use RDFLib only for the built-in cli tools. So to them, RDFLib is an application.

See this Stackoverflow Answer for a well thought out response to this same issue: https://stackoverflow.com/a/61076546

And also this comment from the stackoverflow answer:

I help maintain a number of closed and open source projects, and they all commit lockfiles, partly because I advocated in favor of it. By now I regret that choice, because it occurs quite often that someone's build is not working and the solution is to delete and re-build the lockfile, after which all of us end up having merge conflicts. –
Arne Commented Apr 7, 2020 at 12:02

Its now 2024 and it seems people (inlcuding us) are facing the same issue.

This is an argument from that thread in favor of keeping the lockfile:

Poetry's lock file is an universal lock file.

This means that Poetry doesn't care about the current environment, neither the Python version in use, nor the platform. Instead it makes sure that dependencies are resolvable within the given Python version range in pyproject.toml. This results in a lock file that is valid on any platform with a Python version within the range given in the pyproject.toml.

This difference to other tools, that produces lock file, is also the reason why Poetry is slower in resolving dependencies. This is also the reason why it is recommended to check in the poetry.lock in your vcs. Doing so, it speed up setting up your development environment and you make sure your environment is reproducible.

So we need to either
a) find a way to always keep everyone using the same lockfile, keep the lockfile up to date in main for every PR including dependabot PRs, and probably even ensure all contributors and all CI environments are using the same version of poetry.
or
b) remove the lockfile from the repo.

@ashleysommer
Copy link
Contributor Author

Note, I found this thread on the poetry issue tracker that describes the issue with content-hash, describing why it happens, and potential work-arounds.
python-poetry/poetry#496

@edmondchuc
Copy link
Contributor

What benefits do we currently gain from having the lockfile, besides ensuring a successful poetry install after cloning the repo?

Option b) does sound appealing due to less maintenance overhead.

For removing lockfile
Even with the application use case of RDFLib, as long as the dependency version constraints are specified in pyproject.toml, they should be supported and guaranteed to work. If there are any runtime issues, then we should update the dependency version constraints or have code paths that handle the dependency breaking changes in the application itself.

Any other considerations for option b)?

@nicholascar
Copy link
Member

Keep in 7.x, remove in 8.x after tidy up of pyproject.toml min versions

@Lincoln-GR
Copy link

just putting my thoughts on this here -

An argument in favor of the lock file is it means that every dependency is exactly specified for CI runs,
and for developers who keep their environments in sync with the lock file.
This reduces the chance of difference behavior locally vs in the CI,
or a CI run today vs a CI run tomorrow.
(e.g. test passing locally with some_dep==1.0.0 but failing in CI with some_dep==1.1.0)


For people using rdflib as an application, then it is likely that they only get the locked dependencies if they are doing a poetry install with the lock file present. If they are doing pip install rdflib then I think they will get whatever dependencies pip resolves for them, not what is in the lock file. Unless there is some other distrubution method with locked dependencies.


keep the lockfile up to date in main for every PR

To achieve this in another project, I added a CI job that is basically

jobs:
  poetry-lock-check:
    # Job to check poetry lock file is present, up-to-date with pyproject.toml,
    # and a valid resolution of the dependencies.
    runs-on: ubuntu-latest
    steps:
      # Setup steps
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pipx install poetry==1.8.4 --python "3.11"
      # Do checks
      - run: poetry check
      - run: poetry check --lock
      - name: Check dependencies can be resolved
        run: poetry lock --no-update
      - name: Check resolving dependencies did not change the lock file
        run: git diff --exit-code -- poetry.lock

This goes a long way to ensuring that the lockfile is always correct in main branch.
However it will does mean more work on PRs that fail the check.

@jclerman
Copy link
Contributor

jclerman commented Dec 8, 2024

There are also a number of pre-commit hooks supplied by poetry that can help to catch lockfile issues before commits are created: https://python-poetry.org/docs/pre-commit-hooks/.

@ivan-klass
Copy link

One more thing for .lock file is a package hash for security, so there's no code injection

@ivan-klass
Copy link

But the main problem is that when you're writing your own app, it also adds content hash of your own package you can't opt-out, and it's a disaster for merging things.

@jclerman
Copy link
Contributor

Merging with poetry.lock can be more of a hassle, but in my experience it's not a disaster - in the years that I've been working with projects that include it, it's never been a serious problem, and for at least the last couple of years, I can't recall a time where it was necessary to regenerate the file from scratch. Some instructions for developers would probably suffice.

On the plus side,poetry.lock keeps the development environment reproducible (and consistent across multiple developers) and avoids a source of complication when tracking down issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants