Revisit process of dependency staging in Beam Python #21073

Closed
damccorm opened this issue Jun 4, 2022 · 1 comment
Labels
core done & done Issue has been reviewed after it was closed for verification, followups, etc. P3 python wish

damccorm commented Jun 4, 2022

There are a few issues:

  1. Including Beam itself in requirements.txt causes unnecessary friction and is suboptimal: Beam already takes care of staging itself to the workers, and Beam workers already include Beam's dependencies. This is not clear from https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/, yet from a user's perspective, including Beam in requirements.txt seems natural.

  2. Staging the sources of all dependencies listed in requirements.txt, along with their transitive dependencies, in some cases involves a hidden package build (compilation) initiated by pip. The reason is that in certain cases pip cannot reliably identify a package's dependencies without building it; see [1-3] for pointers. This increases the time it takes to launch a Beam job, and may require additional software (such as Linux packages with header files, or gcc and its dependencies) to be available. This causes friction and confusion, is not obvious to users, and is beyond Beam's control.
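
To illustrate the first point, here is a minimal sketch of how a user-side script might drop the Beam SDK from a requirements file before passing it to a pipeline. The helper `strip_beam_requirement` and its regular expression are hypothetical and written only for this illustration; they are not part of Beam and do not describe how Beam handles requirements.txt.

```python
import re

# Hypothetical pattern (an assumption for this sketch): matches a requirement
# line that names the Beam SDK itself, with optional extras and an optional
# version specifier, marker, or direct URL reference.
_BEAM_REQ = re.compile(
    r"^\s*apache[-_.]beam(\[[^\]]*\])?\s*([=<>!~;@].*)?$",
    re.IGNORECASE,
)


def strip_beam_requirement(lines):
    """Return the requirement lines with any apache-beam entry removed."""
    kept = []
    for line in lines:
        # Ignore a trailing comment when deciding whether the line names Beam.
        requirement = line.split("#", 1)[0].strip()
        if _BEAM_REQ.match(requirement):
            continue
        kept.append(line)
    return kept


example = [
    "apache-beam[gcp]==2.40.0",
    "numpy>=1.20",
    "# pinned for reproducibility",
    "apache_beam",
    "apache-beam-extras==1.0",  # a made-up package name; must not be removed
]
filtered = strip_beam_requirement(example)
```

Only entries whose project name is exactly Beam are dropped; comments and other packages pass through unchanged.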

[1] pypa/pip#8387
[2] pypa/pip#7995
[3] https://discuss.python.org/t/pip-download-just-the-source-packages-no-building-no-metadata-etc/4651
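
As a rough sketch of the difference discussed in the pointers above, the snippet below builds two variants of a `pip download` command: one restricted to source distributions and one restricted to prebuilt wheels. The function name `pip_download_command` and its parameters are assumptions made for this illustration, not Beam's staging code. The pip options used (`--dest`, `-r`, `--exists-action`, `--no-binary`, `--only-binary`) are standard `pip download` options.

```python
import sys


def pip_download_command(requirements_file, dest_dir, sources_only):
    """Build (but do not run) a pip download command line.

    Hypothetical helper for illustration only.
    """
    cmd = [
        sys.executable, "-m", "pip", "download",
        "--dest", dest_dir,
        "-r", requirements_file,
        "--exists-action", "i",  # ignore files that already exist
    ]
    if sources_only:
        # Source distributions only: pip may have to build a package to
        # discover its dependencies, which is the hidden cost described above.
        cmd += ["--no-binary", ":all:"]
    else:
        # Wheels only: no local build, but pip fails if a package has no
        # compatible wheel available.
        cmd += ["--only-binary", ":all:"]
    return cmd


sdist_cmd = pip_download_command("requirements.txt", "/tmp/staging", True)
wheel_cmd = pip_download_command("requirements.txt", "/tmp/staging", False)
```

The resulting list could be passed to `subprocess.run`; the staging directory path is a placeholder.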

Imported from Jira BEAM-12555. Original Jira may contain additional context.
Reported by: tvalentyn.

@tvalentyn tvalentyn added the done & done Issue has been reviewed after it was closed for verification, followups, etc. label Sep 6, 2023
@github-actions github-actions bot added this to the 2.51.0 Release milestone Sep 6, 2023
@tvalentyn

We no longer stage Beam and no longer stage the sources of packages. This was done a while back in a duplicate issue.
