-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix upload_graph on v2 #32165
Fix upload_graph on v2 #32165
Conversation
Example run of real streaming jobs with this option - https://github.com/apache/beam/actions/runs/10371405701/job/28711629663 - I added it to the streaming test suite, ran the suite, then removed it after it was clear it worked. |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
I think there's a secondary fix needed (separate PR) where we just minimize the unnecessary steps submitted in the Dataflow job request when using Runner v2 so the the request does not go over the Dataflow RPC limits (Dataflow with Runner v2 do not use such steps anyways but we just submit them for historical reasons). |
I don't think we need this - upload_graph has never worked with Runner v2 and users don't seem to be running into this problem, so I don't think the problem persists in the v2 case. There's also already some divergence here (e.g. Line 1236 in 391dbf6
|
I think these are the cases:
So there might be a corner case for a very large graph that does not choose v1 or v2. In this case, we will automatically turn on |
Note that this will require service changes since I believe currently we use the same Dataflow job request property for both. But agree that if we are not hitting request RPC errors with Runner v2, trimming steps in the the Dataflow job request for Runner v2 might not be a priority. |
If the user explicitly requests runner v2, we already clear the steps from the REST API request. |
(that was what caused the bug) |
Currently, if
upload_graph
is specified we error out when using runner v2. This is exacerbated because we auto-opt in large graphs to this option, so some pipelines can fail without a user even enabling this option.This fixes the problem and does the following:
upload_graph
option for runner v2 pipelines if it is explicitly set.Fixes #32159
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.