[Bug]: minimal wordcount golang example is freezing on GCS reading #32498
Same issue here, running the example command from https://beam.apache.org/get-started/quickstart-go/. If I download the file from GCS first, the process finishes successfully in seconds. 🤔
Does this happen only on Beam 2.59.0?
It happens in other Apache Beam versions too:
This is an over-splitting problem in the Prism runner, caused by the higher latency of reads from GCS. I thought we had sorted this out (some earlier issues about it were resolved), but apparently not. You can confirm it by comparing the behavior of a local file read with a read from GCS. There's some tension between one goal of the Prism runner, fast execution in test situations, and practical use, such as reading from remote stores, and the current split policy doesn't satisfy both. That needs to be fixed. The solution is to make the split policy more configurable, so the splitting tests still get the fast behavior they check for, while increasing the default wait time so the example works in higher-latency environments.
OK, it definitely works well for me, but I'm also on Google's network, in Seattle. Adding a bit more debugging tells me the following:
The current default split policy for Prism only sends progress requests (and similar) every ~100 ms, and if there has been any progress, either in the channel counter or in downstream element emissions, it will not split. This lets it split when processing is slow, indicated by ~100-200 ms during which the counts have not moved. Setting the progress ticker to ~10 ms reproduces the behavior in the reports, which gives me a way to find something that should work.

The split planning is very simple and doesn't take into account work that was done previously, so it always waits only a fixed interval for work on a given stage. A more robust approach would look at work "globally" across the job and only split if a stage is "straggling" or similar, but Prism shouldn't go that far at this time. And we don't want to slow down all stages just because one needs to be more conservative in how it splits.

I'm now trying out adding a "back off" for a given stage. If a split needs to happen, progress requests (and split decisions) happen less frequently for all new stages. If stages finish faster than any progress request, the rate is increased again. This should even out to some "ideal" rate per stage. For this issue, a few "quick" splits should happen, and then the aggressiveness is toned down enough for the work to complete properly.

This isn't likely to be the final dynamic splitting approach, since that would ideally also be tied to the rate of input to output and similar signals. Combined with better initial splits of the data, that would probably solve most problems.
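The "no progress means split" rule described above can be sketched as a comparison of two consecutive progress snapshots. This is a hypothetical illustration: `ProgressSnapshot` and `ShouldSplit` are made-up names, and the simulated ticks only show why a slow first read from a remote store is indistinguishable from a stall under this rule.

```go
package main

import "fmt"

// ProgressSnapshot is a hypothetical record of the two progress signals
// mentioned above: elements counted on the channel and elements emitted
// downstream.
type ProgressSnapshot struct {
	ChannelCount    int64
	DownstreamCount int64
}

// ShouldSplit reports whether a stage looks stalled: neither counter moved
// between two consecutive progress responses. Any movement means "don't split".
func ShouldSplit(prev, cur ProgressSnapshot) bool {
	return cur.ChannelCount == prev.ChannelCount &&
		cur.DownstreamCount == prev.DownstreamCount
}

func main() {
	// Simulated progress responses, one per tick, for a stage whose first
	// read from a remote store is slow to return any data.
	ticks := []ProgressSnapshot{
		{ChannelCount: 0, DownstreamCount: 0},
		{ChannelCount: 0, DownstreamCount: 0},  // still waiting on the read: looks stalled
		{ChannelCount: 10, DownstreamCount: 4}, // data arrived: no split
	}
	for i := 1; i < len(ticks); i++ {
		fmt.Printf("tick %d: split=%v\n", i, ShouldSplit(ticks[i-1], ticks[i]))
	}
}
```

With a short ticker, the first comparison fires before any data arrives, which is the over-splitting pattern reported in this issue.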
* [#32498] Add split / progress back off.
* Use 100 milliseconds, and decrease "additively".

Co-authored-by: lostluck <[email protected]>
We've merged in a fix for the next release, but it would be great if you ran the following for us to verify the approach taken under your conditions.
It will take a little longer than normal to start, since it will be fetching the code from HEAD, but hopefully it works. I'll be filing an issue later for the remaining part of the solution, which should avoid over-splitting entirely. Thanks again for the report!
Hey @lostluck! It worked here!
Thank you so much for your time and explanations.
What happened?
I am trying to run the minimal wordcount Go example on my local machine, but Apache Beam stays frozen while reading data from Google Cloud Storage, and the example never finishes.
Example output:
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components