Enable drain operation for flink jobs with requiresStableInput annotation #28567
Run PreCommit Java PVR Flink Batch
I don't think we can just skip the check. The fact that the check fails implies that there is still buffered data that has not been processed yet. We need to either: I suspect that only b) is correct, because otherwise, if a failure and a restore from checkpoint happen after we flushed the data, we might break the stable input contract.
I agree that option B is the cleanest and makes logical sense. In implementation terms, I think it works if flushData is somehow invoked after the last checkpoint finishes (or via some other approach). Option A violates the requiresStableInput contract and hence seems incorrect (I did experiment with this approach as well, and it gave the expected output when the checkpoint completed successfully). Having said that, the current approach also works correctly when
This seems to be the source of the trouble. See my comment here.
@je-ik I have updated the PR based on our discussion. Could you kindly check again?
LGTM, thanks! Can you please update the title of the PR to match the implementation? We can merge it afterwards.
Could you also please squash the commits? Thanks!
* Implement java exponential histograms (apache#28903) * Address comments * Address comments
* cron time update * cron fix
* reference PR 28915 issue_comment removed * post commit issue_comment fix * revert changes for postcommit
* Add link to the Dataflow service options page * Fix link format * Small text edit
* Update arc terraform to allow for colocation in the default network. Allow usage of reserved ip. Allow usage of existing SA * sync beam env * move additional runners to load based scaling
Slightly stricter definitions for catching more errors, as well as avoiding the use of anyOf, which often makes it difficult to deduce what the true error is. This does mean a pipeline must have a transform (or source/sink) block rather than simply be itself a list of transforms.
…for YamlTransform.
As well as good practice, not doing so may result in much more obscure errors (e.g. during encoding) downstream.
* Add readme for PerformanceTests TextIOIT, JDBC, Kafka IO, SpannerIO, SQLBigQueryIO and BiqQueryIO Python * Update readme * PRs 28582 28584 28606 28581 * PR 28738 LoadTests_Java_GBK_Dataflow * Add readme for PostCommit Java Examples Dataflow V2 * Add readme for LoadTests Java CoGBK * Add readme for LoadTests Python CoGBK Dataflow * Add readme for LoadTests Python ParDo and SideInput * Add readme for LoadTests Smoke Python and Java * Update Readme * Update Readme * updated README * Update readme for Performance Tests BigQueryIO Write Python Batch * Remove Trigger Phrases for Load Tests and Performance tests * PR 28846 28730 28827 28861 28897 * update readme --------- Co-authored-by: aleksandr-dudko <[email protected]> Co-authored-by: vitaly.terentyev <[email protected]> Co-authored-by: magicgoody <[email protected]>
…he#29046) Bumps [cloud.google.com/go/spanner](https://github.com/googleapis/google-cloud-go) from 1.50.0 to 1.51.0. - [Release notes](https://github.com/googleapis/google-cloud-go/releases) - [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md) - [Commits](googleapis/google-cloud-go@spanner/v1.50.0...spanner/v1.51.0) --- updated-dependencies: - dependency-name: cloud.google.com/go/spanner dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…che#28947) Bumps [golang.org/x/net](https://github.com/golang/net) from 0.7.0 to 0.17.0. - [Commits](golang/net@v0.7.0...v0.17.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
In order to aid debugging of lulls.
* correct log message format * apply spotless * flush buffer when requiresStableInput is set * re-organize imports * add flag in FlinkPipelineOptions to allow draining for pipelines with RequiresStableInput * apply spotless again
Codecov Report
@@ Coverage Diff @@
## master #28567 +/- ##
==========================================
- Coverage 38.39% 29.98% -8.41%
==========================================
Files 686 391 -295
Lines 101640 65297 -36343
==========================================
- Hits 39021 19580 -19441
+ Misses 61040 45717 -15323
+ Partials 1579 0 -1579
Flags with carried forward coverage won't be shown. See 302 files with indirect coverage changes.
Closing this PR since I botched squashing the commits; I created #29102 instead.
Currently, the drain operation does not work for Flink pipelines that use the RequiresStableInput annotation. The cause is that buffered data is not processed before the final checkpoint, which leads to a watermark-hold-related exception. This PR addresses the issue by processing the buffer before the final checkpoint completes. More context in #28554.
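To make the failure mode and the fix easier to follow, here is a minimal toy model in plain Java. It is only a sketch under stated assumptions, not the Beam or Flink operator code: the class `StableInputBufferSketch`, its methods, and the `allowDrain` constructor argument are all hypothetical names chosen for illustration, and the real flag added to FlinkPipelineOptions is not reproduced here. The model also simplifies checkpointing by releasing every buffered element as soon as any checkpoint completes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy model of an operator that buffers input until it is "stable". Hypothetical, not Beam code. */
class StableInputBufferSketch<T> {
  private final List<T> buffer = new ArrayList<>();
  private final Consumer<T> downstream;
  private final boolean allowDrain; // hypothetical stand-in for the opt-in pipeline option

  StableInputBufferSketch(Consumer<T> downstream, boolean allowDrain) {
    this.downstream = downstream;
    this.allowDrain = allowDrain;
  }

  /** Elements are held back instead of being processed immediately. */
  void add(T element) {
    buffer.add(element);
  }

  /** Simplification: a completed checkpoint makes everything buffered so far stable. */
  void onCheckpointComplete() {
    flush();
  }

  /** On drain, the final watermark arrives; with the opt-in set, process the buffer now. */
  void onFinalWatermark() {
    if (allowDrain) {
      flush();
    }
  }

  /** Models the check that fails when buffered data still holds the watermark. */
  void beforeFinalCheckpoint() {
    if (!buffer.isEmpty()) {
      throw new IllegalStateException(
          "Buffered elements remain before the final checkpoint: " + buffer.size());
    }
  }

  private void flush() {
    for (T element : buffer) {
      downstream.accept(element);
    }
    buffer.clear();
  }

  public static void main(String[] args) {
    List<String> processed = new ArrayList<>();
    StableInputBufferSketch<String> op = new StableInputBufferSketch<>(processed::add, true);
    op.add("a");
    op.add("b");
    op.onFinalWatermark();      // drain: buffer is flushed because allowDrain is true
    op.beforeFinalCheckpoint(); // passes, nothing is left in the buffer
    System.out.println(processed);
  }
}
```

With `allowDrain` set to `false`, the same sequence throws from `beforeFinalCheckpoint()`, which mirrors the exception described above. As the review discussion notes, flushing before the final checkpoint completes means that a failure and restore at that point could break the stable input contract, which is presumably why the behavior sits behind an explicit flag.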