[Task]: Reenable single iteration #23043
Comments
To replicate this locally, you need two terminals, both starting from the root directory of the Beam repo. Terminal 1: Spark runner.
Terminal 2: Go SDK process.
This sends just the test pipeline to the local Spark runner, so you can debug from the SDK side with whatever debugging process you like attached to the Go binary.
OK, I have determined what's going on. The Spark runner always uses "multi-chunk" iterables, which isn't the case for Flink or Prism (or even Dataflow, but that's harder to verify). E.g. Spark: vs. Prism: vs. Flink: [128 32 196 155 160 188 247 247 0 0 0 1 15 1 0 0 0 0 3 1 0 5 65 112 112 108 101 1 0 6 66 97 110 97 110 97 1 0 6 67 104 101 114 114 121]
That means the values from Spark always come over with a -1 (the 255 255 255 255 in the encoded value from Spark) as the length in the chunk header, which enables the multi-chunk protocol, but Spark then does not use a state-backed iterable. It's a bug on the Go SDK side, because outside of the state-backed case, I didn't think any runner implemented the multi-chunk protocol.
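To make the two wire forms concrete, here is a minimal sketch of a decoder that accepts both: a non-negative 4-byte big-endian count followed by that many elements, or a count of -1 followed by varint-sized chunks terminated by a zero-sized chunk. It is not the Go SDK's code. `decodeIterable` and `readString` are hypothetical names, and the element encoding (a varint length followed by UTF-8 bytes) is an assumption chosen to keep the example short; it does not reproduce the nested encoding in the Flink dump above.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readString reads one element. Assumption for this sketch: every element is
// a varint length followed by that many UTF-8 bytes.
func readString(r *bufio.Reader) (string, error) {
	n, err := binary.ReadUvarint(r)
	if err != nil {
		return "", err
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

// decodeIterable (hypothetical helper) decodes an iterable in either the
// known-length form or the multi-chunk form signalled by a count of -1.
func decodeIterable(data []byte) ([]string, error) {
	r := bufio.NewReader(bytes.NewReader(data))
	var count int32
	if err := binary.Read(r, binary.BigEndian, &count); err != nil {
		return nil, err
	}
	if count < -1 {
		return nil, fmt.Errorf("invalid iterable count %d", count)
	}
	var out []string
	if count >= 0 {
		// Known-length form: exactly count elements follow.
		for i := int32(0); i < count; i++ {
			s, err := readString(r)
			if err != nil {
				return nil, err
			}
			out = append(out, s)
		}
		return out, nil
	}
	// Multi-chunk form: repeated (varint chunk size, elements...) groups,
	// ending with a chunk size of 0.
	for {
		chunk, err := binary.ReadUvarint(r)
		if err != nil {
			return nil, err
		}
		if chunk == 0 {
			return out, nil
		}
		for i := uint64(0); i < chunk; i++ {
			s, err := readString(r)
			if err != nil {
				return nil, err
			}
			out = append(out, s)
		}
	}
}

func main() {
	apple := append([]byte{5}, "Apple"...)
	banana := append([]byte{6}, "Banana"...)
	cherry := append([]byte{6}, "Cherry"...)

	// Known-length form: count = 3.
	fixed := bytes.Join([][]byte{{0, 0, 0, 3}, apple, banana, cherry}, nil)
	// Multi-chunk form: count = -1, a chunk of 2, a chunk of 1, terminator 0.
	chunked := bytes.Join([][]byte{{255, 255, 255, 255}, {2}, apple, banana, {1}, cherry, {0}}, nil)

	fmt.Println(decodeIterable(fixed))
	fmt.Println(decodeIterable(chunked))
}
```

Both inputs decode to the same three values; only the framing differs, which is why a reader that assumes the known-length form misparses the stream when a runner sends -1.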
The issue is that, since the DoFn didn't drain the iterable, there were still bytes left to be read when processing returned to the datasource. So it's a real bug, but on a path that most runners don't use. #27762 has been filed to implement the behavior and test it in Prism, so that future SDK devs can validate this behavior more easily.
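The failure mode can be reproduced in miniature. The sketch below is not the SDK's datasource; `chunkedIter`, `newChunkedIter`, and `firstOfEach` are hypothetical names, and elements are assumed to be varint-length-prefixed strings. It reads two multi-chunk iterables back to back from one stream, with a consumer that takes only the first value of each. Without draining, the second read starts in the middle of the first iterable and fails.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// chunkedIter lazily reads one multi-chunk iterable from a shared stream.
type chunkedIter struct {
	r         *bufio.Reader
	remaining uint64 // elements left in the current chunk
	done      bool   // true once the zero-sized terminator chunk was read
}

func newChunkedIter(r *bufio.Reader) (*chunkedIter, error) {
	var count int32
	if err := binary.Read(r, binary.BigEndian, &count); err != nil {
		return nil, err
	}
	if count != -1 {
		return nil, fmt.Errorf("expected multi-chunk marker -1, got %d", count)
	}
	return &chunkedIter{r: r}, nil
}

// Next returns the next element; ok is false once the terminator is reached.
func (it *chunkedIter) Next() (val string, ok bool, err error) {
	if it.done {
		return "", false, nil
	}
	if it.remaining == 0 {
		n, err := binary.ReadUvarint(it.r)
		if err != nil {
			return "", false, err
		}
		if n == 0 {
			it.done = true
			return "", false, nil
		}
		it.remaining = n
	}
	// Assumption: each element is a varint length plus UTF-8 bytes.
	size, err := binary.ReadUvarint(it.r)
	if err != nil {
		return "", false, err
	}
	buf := make([]byte, size)
	if _, err := io.ReadFull(it.r, buf); err != nil {
		return "", false, err
	}
	it.remaining--
	return string(buf), true, nil
}

// Drain consumes whatever the caller left unread, including the terminator.
func (it *chunkedIter) Drain() error {
	for {
		_, ok, err := it.Next()
		if err != nil || !ok {
			return err
		}
	}
}

// firstOfEach mimics a DoFn that reads only the first value of each iterable.
func firstOfEach(data []byte, drain bool) ([]string, error) {
	r := bufio.NewReader(bytes.NewReader(data))
	var firsts []string
	for {
		if _, err := r.Peek(1); err != nil {
			if err == io.EOF {
				return firsts, nil
			}
			return firsts, err
		}
		it, err := newChunkedIter(r)
		if err != nil {
			return firsts, err
		}
		v, ok, err := it.Next()
		if err != nil {
			return firsts, err
		}
		if ok {
			firsts = append(firsts, v)
		}
		if drain {
			if err := it.Drain(); err != nil {
				return firsts, err
			}
		}
	}
}

func main() {
	data := bytes.Join([][]byte{
		{255, 255, 255, 255, 2, 5}, []byte("Apple"), {6}, []byte("Banana"), {0},
		{255, 255, 255, 255, 1, 6}, []byte("Cherry"), {0},
	}, nil)
	fmt.Println(firstOfEach(data, true))  // drained: both iterables are read
	fmt.Println(firstOfEach(data, false)) // not drained: second read is misaligned
}
```

Draining on behalf of the caller keeps the stream aligned regardless of how much of the iterable the user code consumed, at the cost of decoding elements nobody asked for.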
* [#23043] Re-enable single iteration for the Go SDK.
* more debuging
* don't drop plan
* debug text.
* Fix beam23043
* clean up debugging.
* update unit test.
* go fmt

---------

Co-authored-by: lostluck <[email protected]>
What needs to happen?
Single iteration was temporarily disabled in #23042 and should either be turned back on or ripped out entirely. This should also address the issue raised in #22933.
Issue Priority
Priority: 2
Issue Component
Component: sdk-go