[Bug]: apache beam python SDK hangs and crashes with segmentation fault errors with orjson 3.9.4 #28318
Comments
Thanks for reporting. Did you see any stack traces around the segmentation fault, by chance?
Only Beam 2.50.0 has
The segmentation fault messages that we are seeing in the system log seem to indicate that the fault is coming from orjson: python[1925]: segfault at 7ff9f1ff4000 ip 00007ffa3c72df53 sp 00007ffa391bd000 error 6 in orjson.cpython-310-x86_64-linux-gnu.so[7ffa3c716000+2f000]
Regarding the mitigation section: I believe 3.9.2 is fine. We have been running on orjson 3.9.2, and the issue in the orjson project mentions that people were mitigating by reverting to version 3.9.2.
The corresponding orjson issue ijl/orjson#415 is now closed, and it indicates that the underlying issue should be resolved in orjson 3.9.7.
@tvalentyn can this one be closed?
What happened?
A bug introduced in the orjson dependency (ijl/orjson#415) might cause Beam Python pipelines to crash with a segmentation fault or get stuck. Beam uses orjson in BigQuery IO, so users of this IO might be affected.
Mitigation
Until Beam 2.51.0 is released, consider any of the following workarounds:
- Use apache-beam==2.49.0 or below. To avoid running into another known issue, consider apache-beam==2.46.0.
- Install orjson==3.9.1 or below in the runtime environment. For example, you can use a --requirements_file pipeline option with a file that includes the pinned orjson version. We recommend the version orjson==3.9.1 since it was previously tested with Beam 2.49.0 SDK. For more information, see: https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
- Install an [updated version of the orjson dependency](https://pypi.org/project/orjson/#history) once the threading issue ijl/orjson#415 is fixed.
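To confirm which of these workarounds applies to a given environment, it can help to check the installed orjson version first. The snippet below is a minimal sketch, not part of Beam or orjson. The bounds are assumptions taken only from comments in this thread (3.9.2 reported as working, 3.9.4 reported as crashing, 3.9.7 reported as fixed); versions between those points are flagged conservatively and were not individually verified. The helper names are hypothetical.

```python
from importlib import metadata

# Assumed bounds, taken from comments in this thread, not from orjson release notes.
LAST_KNOWN_GOOD = (3, 9, 2)       # reported as working by the original reporter
FIRST_REPORTED_FIXED = (3, 9, 7)  # reported as fixed in ijl/orjson#415


def parse_version(text):
    """Parse the leading numeric 'X.Y.Z' part of a version string into a tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_possibly_affected(version_text):
    """Return True if the version lies strictly between the assumed bounds."""
    return LAST_KNOWN_GOOD < parse_version(version_text) < FIRST_REPORTED_FIXED


def check_installed_orjson():
    """Return (version, possibly_affected), or None if orjson is not installed."""
    try:
        installed = metadata.version("orjson")
    except metadata.PackageNotFoundError:
        return None
    return installed, is_possibly_affected(installed)


if __name__ == "__main__":
    result = check_installed_orjson()
    if result is None:
        print("orjson is not installed in this environment")
    else:
        version, affected = result
        print(f"orjson {version}: possibly affected = {affected}")
```

Note that this only inspects the environment where it runs. For a remote runner, the same check would need to run in the worker environment, since that is where the pinned requirements are installed.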
Original report
In the latest deployment of our Apache Beam pipeline, our orjson dependency (a dependency of the Apache Beam Python SDK) was upgraded from 3.9.2 to 3.9.4.
The apache beam SDK has a dependency on orjson < 4.0 here:
https://github.com/apache/beam/blob/master/sdks/python/setup.py#L233
With this upgrade of orjson from 3.9.2 to 3.9.4, we are periodically seeing our Apache Beam SDK hang or the workers crash with segmentation fault errors that we believe are related to this issue in the orjson project:
ijl/orjson#415
After reverting from orjson 3.9.4 to 3.9.2, the issues appear to be resolved.
The Apache Beam Python SDK may want to limit orjson to 3.9.2 or below until orjson issue 415 is resolved.
Issue Priority
Priority: 2
Issue Components