[Bug][Go SDK]: Silent Dataflow failure upon glibc library mismatch #24470
Comments
Note that I've just added detailed instructions on reproducing this issue to the linked bug report.
So there are two issues here.
The first has a few options:
Option 1: Custom containers. That's always an option and lets any container be used as the worker container, as long as the entrypoint is the bootloader to handle the rest of the container contract.
Option 2: Disable linking by turning off CGO, as already described.
We aren't likely to move off of debian as the container base anytime soon, but we could expand https://beam.apache.org/documentation/sdks/go-cross-compilation/ with a bit of this information.
The second is that we had to hunt for the root cause. That's never good. It hurts everyone. Ideally, the error is elevated properly, and tells you where to find answers. In this case, since we have dedicated loaders per language, they could point to relevant places on the beam site, like https://beam.apache.org/documentation/sdks/go-cross-compilation/ or similar. That's where the documentation for this should live.
As for how to elevate it: all uses of the Beam Go SDK use the same containers, so the fix would be in the repo. In principle, the boot loader could connect to the FnAPI logging service to make the failure announcement. That should elevate it in the Dataflow logs (and all other SDK uses with a FnAPI). It's not entirely clear to me why we don't do this already, beyond that it hasn't been needed before.
So the initial arg log is here: https://github.com/apache/beam/blob/master/sdks/go/container/boot.go#L100
And when the exec call fails, it's logged here: https://github.com/apache/beam/blob/master/sdks/go/container/boot.go#L169
I'm not familiar with how the logging gets elevated to errors/fatals/warnings etc. from the container logs. The boot loaders don't use anything particularly fancy for logging, just the standard library "log" package, so it wouldn't hurt to upgrade it somewhat. I've already been considering starting integration with the hopefully upcoming structured log library for Go...
So that would be my suggestion: We switch the boot loaders to connect to the FnAPI log service, and direct users to a beam site page where we can catalog issues and solutions. The bit that we don't have help for is relaying the actual link error easily. Redirecting the binary's StdErr might work, but could lead to other noise going across the logging interface, or other duplication or overrun.
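The overrun concern with redirecting StdErr could be limited by keeping only the tail of the output and forwarding it only when the worker fails. Below is a rough sketch of that idea, assuming the worker is started as a child process with os/exec; tailBuffer and runWorker are hypothetical names, and the 4096-byte cap is an arbitrary choice rather than anything boot.go does today.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// tailBuffer is an io.Writer that keeps only the last max bytes written.
type tailBuffer struct {
	max int
	buf []byte
}

func (t *tailBuffer) Write(p []byte) (int, error) {
	t.buf = append(t.buf, p...)
	if over := len(t.buf) - t.max; over > 0 {
		// Drop the oldest bytes so the buffer never exceeds max.
		t.buf = append(t.buf[:0], t.buf[over:]...)
	}
	return len(p), nil
}

func (t *tailBuffer) String() string { return string(t.buf) }

// runWorker (hypothetical) runs the worker and returns the tail of its stderr.
func runWorker(name string, args ...string) (string, error) {
	tail := &tailBuffer{max: 4096} // assumed cap, tune as needed
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = tail
	err := cmd.Run()
	return tail.String(), err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: runworker <binary> [args...]")
		os.Exit(2)
	}
	stderrTail, err := runWorker(os.Args[1], os.Args[2:]...)
	if err != nil {
		// Only on failure is the captured tail passed along to the log.
		fmt.Fprintf(os.Stderr, "worker failed: %v\nlast stderr output:\n%s\n", err, stderrTail)
		os.Exit(1)
	}
}
```

Because nothing is forwarded while the worker runs normally, this avoids duplicating regular log traffic, at the cost of hiding stderr from a healthy worker.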
Issue #25314 also notes this particular problem with the boot loaders.
Some thoughts:
See https://github.com/cozos/beam/pull/3/files for a first attempt
@cozos Sorry for not seeing that sooner. That attempt would work, but it makes every SDK depend on a Go SDK specific harness detail. Those are best kept isolated. The boot container doesn't need anything nearly as involved to log over the FnAPI, so it can be simpler in order to keep it debuggable.
Also, boot.go will never have any meaningful associated bundle_id and transform_id, since those require a live pipeline execution, and those ids are only meaningful to the runner that generated them.
The logging here should no longer be silent due to that previous fix. I believe that did make it into 2.47.0, releasing soon. We likely don't want to require users to have CGO disabled, but that would also prevent issues like this, I think? I'm less well versed in that. But I would love that verification so we can put that advice in the cross-compile documentation, or perhaps make the "autobuild and submit" binary mode set that too (though I believe any env variables would also normally be propagated, so we'd want to respect that setting). The linker error is a real issue, and how to resolve it might not be obvious to users. Ideally, when this kind of error is detected, the boot loader additionally logs the CGO workaround (if verified) or points to instructions (on the cross-compilation page) on how to fix the issue.
Thanks so much!
Looks like this never got closed because of how GitHub parses the "fixes" string in PR descriptions. Fixed by #26035
What happened?
Hi Apache Beam team,
I've recently identified an issue with using the Beam Go SDK with Google's Dataflow. I've already filed a bug report with Google, but I wanted to report it here as well in case there's something to be done in the SDK side.
I'll paste the contents of my linked bug report below:
Issue Priority
Priority: 2
Issue Component
Component: sdk-go