The PostCommit Python job is flaky #30513

github-actions · 2024-03-05T18:33:20Z

The PostCommit Python job is failing over 50% of the time.
Please visit https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python.yml?query=is%3Afailure+branch%3Amaster to see the logs.

shunping · 2024-03-11T20:22:07Z

It first failed on https://github.com/apache/beam/actions/runs/8210266873.

The failed task is :sdks:python:test-suites:portable:py38:portableWordCountSparkRunnerBatch.

Traceback:

INFO:apache_beam.utils.subprocess_server:Starting service with ('java' '-jar' '/runner/_work/beam/beam/runners/spark/3/job-server/build/libs/beam-runners-spark-3-job-server-2.56.0-SNAPSHOT.jar' '--spark-master-url' 'local[4]' '--artifacts-dir' '/tmp/beam-temp8q8022zi/artifactsg6e8usou' '--job-port' '56313' '--artifact-port' '0' '--expansion-port' '0')
INFO:apache_beam.utils.subprocess_server:Error: A JNI error has occurred, please check your installation and try again
INFO:apache_beam.utils.subprocess_server:Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/beam/vendor/grpc/v1p60p1/io/grpc/BindableService
INFO:apache_beam.utils.subprocess_server:	at java.lang.ClassLoader.defineClass1(Native Method)
INFO:apache_beam.utils.subprocess_server:	at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
INFO:apache_beam.utils.subprocess_server:	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
INFO:apache_beam.utils.subprocess_server:	at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
INFO:apache_beam.utils.subprocess_server:	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
INFO:apache_beam.utils.subprocess_server:	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
INFO:apache_beam.utils.subprocess_server:	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
INFO:apache_beam.utils.subprocess_server:	at java.security.AccessController.doPrivileged(Native Method)
INFO:apache_beam.utils.subprocess_server:	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
INFO:apache_beam.utils.subprocess_server:	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
INFO:apache_beam.utils.subprocess_server:	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
INFO:apache_beam.utils.subprocess_server:	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
INFO:apache_beam.utils.subprocess_server:	at java.lang.Class.getDeclaredMethods0(Native Method)
INFO:apache_beam.utils.subprocess_server:	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
INFO:apache_beam.utils.subprocess_server:	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
INFO:apache_beam.utils.subprocess_server:	at java.lang.Class.getMethod0(Class.java:3018)
INFO:apache_beam.utils.subprocess_server:	at java.lang.Class.getMethod(Class.java:1784)
INFO:apache_beam.utils.subprocess_server:	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:670)
INFO:apache_beam.utils.subprocess_server:	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:652)
INFO:apache_beam.utils.subprocess_server:Caused by: java.lang.ClassNotFoundException: org.apache.beam.vendor.grpc.v1p60p1.io.grpc.BindableService
INFO:apache_beam.utils.subprocess_server:	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
INFO:apache_beam.utils.subprocess_server:	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
INFO:apache_beam.utils.subprocess_server:	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
INFO:apache_beam.utils.subprocess_server:	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
INFO:apache_beam.utils.subprocess_server:	... 19 more
ERROR:apache_beam.utils.subprocess_server:Started job service with ('java', '-jar', '/runner/_work/beam/beam/runners/spark/3/job-server/build/libs/beam-runners-spark-3-job-server-2.56.0-SNAPSHOT.jar', '--spark-master-url', 'local[4]', '--artifacts-dir', '/tmp/beam-temp8q8022zi/artifactsg6e8usou', '--job-port', '56313', '--artifact-port', '0', '--expansion-port', '0')
ERROR:apache_beam.utils.subprocess_server:Error bringing up service
Traceback (most recent call last):
  File "/runner/_work/beam/beam/sdks/python/apache_beam/utils/subprocess_server.py", line 175, in start
    raise RuntimeError(
RuntimeError: Service failed to start up with error 1
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/runner/_work/beam/beam/sdks/python/apache_beam/examples/wordcount.py", line 111, in <module>
    run()
  File "/runner/_work/beam/beam/sdks/python/apache_beam/examples/wordcount.py", line 106, in run
    output | 'Write' >> WriteToText(known_args.output)
  File "/runner/_work/beam/beam/sdks/python/apache_beam/pipeline.py", line 612, in __exit__
    self.result = self.run()
  File "/runner/_work/beam/beam/sdks/python/apache_beam/pipeline.py", line 586, in run
    return self.runner.run_pipeline(self, self._options)
  File "/runner/_work/beam/beam/sdks/python/apache_beam/runners/runner.py", line 192, in run_pipeline
    return self.run_portable_pipeline(
  File "/runner/_work/beam/beam/sdks/python/apache_beam/runners/portability/portable_runner.py", line 381, in run_portable_pipeline
    job_service_handle = self.create_job_service(options)
  File "/runner/_work/beam/beam/sdks/python/apache_beam/runners/portability/portable_runner.py", line 296, in create_job_service
    return self.create_job_service_handle(server.start(), options)
  File "/runner/_work/beam/beam/sdks/python/apache_beam/runners/portability/job_server.py", line 81, in start
    self._endpoint = self._job_server.start()
  File "/runner/_work/beam/beam/sdks/python/apache_beam/runners/portability/job_server.py", line 110, in start
    return self._server.start()
  File "/runner/_work/beam/beam/sdks/python/apache_beam/utils/subprocess_server.py", line 175, in start
    raise RuntimeError(
RuntimeError: Service failed to start up with error 1
> Task :sdks:python:test-suites:portable:py38:portableWordCountSparkRunnerBatch FAILED
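
One way to confirm whether the class named in the `NoClassDefFoundError` is really absent from the job server jar is to list the jar's entries. This is only a minimal sketch, assuming the jar is a readable zip archive on local disk; `jar_contains_class` is a hypothetical helper name, not part of Beam.

```python
import zipfile


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar (a zip archive) has an entry for class_name."""
    # A class a.b.C is stored in the archive as a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For the failure above, one would pass the job server jar path from the log and `org.apache.beam.vendor.grpc.v1p60p1.io.grpc.BindableService` as the class name.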

shunping · 2024-03-11T20:24:12Z

Adding the author of the commit on which the post-commit job first failed.
@damccorm

damccorm · 2024-03-11T20:43:46Z

I think we can pretty comfortably rule out that change; it was to the YAML SDK, which is unrelated to portableWordCountSparkRunnerBatch. Note that this runs on a schedule, not on commits, though none of the commits in that scheduled window look particularly harmful.

shunping · 2024-03-11T20:51:35Z

I see. It was red for the last two weeks and flaky before that too.

kennknowles · 2024-04-29T13:24:15Z

Permared right now

damccorm · 2024-04-29T13:28:27Z

Only sorta - each component job is actually not permared - e.g. there are 2 successes here, https://github.com/apache/beam/actions/runs/8873798546

The whole workflow is permared just because our flake percentage is so high
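
To illustrate why a workflow can look permared while each component job still passes sometimes, here is a rough sketch. It assumes the component jobs fail independently with the same pass rate, and it assumes four jobs (one per Python version mentioned in this thread); the pass rates are illustrative, not measured.

```python
def workflow_pass_rate(job_pass_rate: float, num_jobs: int) -> float:
    """Probability that all jobs pass, assuming independent jobs with equal pass rates."""
    return job_pass_rate ** num_jobs


# Illustrative per-job pass rates; num_jobs=4 is an assumption.
for p in (0.9, 0.7, 0.5):
    rate = workflow_pass_rate(p, 4)
    print(f"per-job pass rate {p:.0%} -> workflow pass rate {rate:.1%}")
```

Under these assumptions, even a modest per-job flake rate compounds quickly at the workflow level.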

kennknowles · 2024-04-29T13:31:36Z

Yea, let's work out how to get top-level signal.

Abacn · 2024-04-29T14:26:23Z

The lowest and highest Python versions (3.8, 3.11) run more tests than the middle ones (3.9, 3.10), so it could be those extra tests or tasks that are permared.

kennknowles · 2024-04-29T14:36:39Z

Could make sense to find a way to get separate top-level signal for Python versions, assuming we can use software engineering to share everything necessary so they don't get out of sync.

Abacn · 2024-04-29T14:42:30Z

Yeah, we used to have this for Jenkins where each Python PostCommit had its own task

liferoad · 2024-05-27T23:57:18Z

The Vertex AI package version issue (we do not import this directly, so it should be fine):

/runner/_work/beam/beam/build/gradleenv/-1734967050/lib/python3.9/site-packages/vertexai/preview/developer/__init__.py:33: DeprecationWarning:
After May 30, 2024, importing any code below will result in an error.
Please verify that you are explicitly pinning to a version of `google-cloud-aiplatform`
(e.g., google-cloud-aiplatform==[1.32.0, 1.49.0]) if you need to continue using this
library.

from vertexai.preview import (
    init,
    remote,
    VertexModel,
    register,
    from_pretrained,
    developer,
    hyperparameter_tuning,
    tabular_models,
)

liferoad · 2024-05-28T00:02:16Z

A new flaky test in py39 and this is related to #29617:

https://ge.apache.org/s/hb7syztoolfhu/console-log?page=17


=================================== FAILURES ===================================
_______________ BigQueryQueryToTableIT.test_big_query_legacy_sql _______________
[gw3] linux -- Python 3.9.19 /runner/_work/beam/beam/build/gradleenv/1398941893/bin/python3.9

self = <apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT testMethod=test_big_query_legacy_sql>

    @pytest.mark.it_postcommit
    def test_big_query_legacy_sql(self):
      verify_query = DIALECT_OUTPUT_VERIFY_QUERY % self.output_table
      expected_checksum = test_utils.compute_hash(DIALECT_OUTPUT_EXPECTED)
      pipeline_verifiers = [
          PipelineStateMatcher(),
          BigqueryMatcher(
              project=self.project,
              query=verify_query,
              checksum=expected_checksum)
      ]

      extra_opts = {
          'query': LEGACY_QUERY,
          'output': self.output_table,
          'output_schema': DIALECT_OUTPUT_SCHEMA,
          'use_standard_sql': False,
          'wait_until_finish_duration': WAIT_UNTIL_FINISH_DURATION_MS,
          'on_success_matcher': all_of(*pipeline_verifiers),
      }
      options = self.test_pipeline.get_full_options_as_args(**extra_opts)
>     big_query_query_to_table_pipeline.run_bq_pipeline(options)

apache_beam/io/gcp/big_query_query_to_table_it_test.py:178:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
apache_beam/io/gcp/big_query_query_to_table_pipeline.py:103: in run_bq_pipeline
    result = p.run()
apache_beam/testing/test_pipeline.py:115: in run
    result = super().run(
apache_beam/pipeline.py:560: in run
    return Pipeline.from_runner_api(
apache_beam/pipeline.py:587: in run
    return self.runner.run_pipeline(self, self._options)
apache_beam/runners/direct/test_direct_runner.py:42: in run_pipeline
    self.result = super().run_pipeline(pipeline, options)
apache_beam/runners/direct/direct_runner.py:117: in run_pipeline
    from apache_beam.runners.portability.fn_api_runner import fn_runner
apache_beam/runners/portability/fn_api_runner/__init__.py:18: in <module>
    from apache_beam.runners.portability.fn_api_runner.fn_runner import FnApiRunner
apache_beam/runners/portability/fn_api_runner/fn_runner.py:68: in <module>
    from apache_beam.runners.portability.fn_api_runner import execution
apache_beam/runners/portability/fn_api_runner/execution.py:62: in <module>
    from apache_beam.runners.portability.fn_api_runner import translations
apache_beam/runners/portability/fn_api_runner/translations.py:55: in <module>
    from apache_beam.runners.worker import bundle_processor
apache_beam/runners/worker/bundle_processor.py:69: in <module>
    from apache_beam.runners.worker import operations
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   KeyError: '__pyx_vtable__'

apache_beam/runners/worker/operations.py:1: KeyError

liferoad · 2024-05-29T14:35:34Z

The last three runs are green.

Closing this for now.

shunping · 2024-05-29T15:10:19Z

Great. Thanks @liferoad

github-actions · 2024-05-30T09:33:33Z

Reopening since the workflow is still flaky

github-actions · 2024-06-18T12:42:27Z

Reopening since the workflow is still flaky

liferoad · 2024-06-18T14:32:21Z

_______ ERROR collecting apache_beam/runners/worker/log_handler_test.py ________
apache_beam/runners/worker/log_handler_test.py:34: in <module>
    from apache_beam.runners.worker import bundle_processor
apache_beam/runners/worker/bundle_processor.py:69: in <module>
    from apache_beam.runners.worker import operations
apache_beam/runners/worker/operations.py:1: in init apache_beam.runners.worker.operations
    ???
E   KeyError: '__pyx_vtable__'
________ ERROR collecting apache_beam/runners/worker/opcounters_test.py ________
apache_beam/runners/worker/opcounters_test.py:27: in <module>
    from apache_beam.runners.worker import opcounters
apache_beam/runners/worker/opcounters.py:1: in init apache_beam.runners.worker.opcounters
    ???
E   ValueError: apache_beam.utils.counters.Counter size changed, may indicate binary incompatibility. Expected 56 from C header, got 32 from PyObject

https://ge.apache.org/s/w6kem3hrdnwii/console-log/task/:sdks:python:test-suites:direct:py38:tensorflowInferenceTest?anchor=1334&page=2

=========================== short test summary info ============================
ERROR apache_beam/dataframe/transforms_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/dataframe/transforms_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/render_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/render_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/trivial_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/trivial_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/dataflow/dataflow_job_service_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/dataflow/dataflow_job_service_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/interactive/interactive_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/interactive/interactive_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/interactive/utils_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/interactive/utils_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/flink_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/flink_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/flink_uber_jar_job_server_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/flink_uber_jar_job_server_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/local_job_service_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/local_job_service_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/portable_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/portable_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/samza_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/samza_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/spark_java_job_server_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/spark_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/spark_java_job_server_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/spark_uber_jar_job_server_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/spark_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/spark_uber_jar_job_server_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/fn_api_runner/fn_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/fn_api_runner/fn_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/fn_api_runner/translations_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/fn_api_runner/translations_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/fn_api_runner/trigger_manager_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/bundle_processor_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/log_handler_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/opcounters_test.py - ValueError: apache_beam.utils.counters.Counter size changed, may indicate binary incompatibility. Expected 56 from C header, got 32 from PyObject
ERROR apache_beam/runners/portability/fn_api_runner/trigger_manager_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/bundle_processor_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/log_handler_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/opcounters_test.py - ValueError: apache_beam.utils.counters.Counter size changed, may indicate binary incompatibility. Expected 56 from C header, got 32 from PyObject
ERROR apache_beam/runners/worker/sdk_worker_main_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/sdk_worker_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/sideinputs_test.py - ValueError: apache_beam.utils.counters.Counter size changed, may indicate binary incompatibility. Expected 56 from C header, got 32 from PyObject
ERROR apache_beam/runners/worker/sdk_worker_main_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/sdk_worker_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/sideinputs_test.py - ValueError: apache_beam.utils.counters.Counter size changed, may indicate binary incompatibility. Expected 56 from C header, got 32 from PyObject
ERROR apache_beam/testing/load_tests/microbenchmarks_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/transforms/combinefn_lifecycle_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/testing/load_tests/microbenchmarks_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/transforms/combinefn_lifecycle_test.py - KeyError: '__pyx_vtable__'
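
Since the summary repeats many entries, it can help to tally distinct test files per exception type. The following is only a sketch for a raw pytest console log, assuming lines of the form `ERROR <path> - <ExceptionType>: ...` optionally wrapped in ANSI color codes; `count_collection_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches ANSI color escape sequences such as "\x1b[31m".
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")
# Matches lines like "ERROR path/to/test.py - KeyError: '__pyx_vtable__'".
ERROR_RE = re.compile(r"^ERROR\s+(\S+)\s+-\s+(\w+):")


def count_collection_errors(log_text: str) -> Counter:
    """Count distinct test files per exception type in a short test summary."""
    seen = set()
    for raw in log_text.splitlines():
        line = ANSI_RE.sub("", raw).strip()
        match = ERROR_RE.match(line)
        if match:
            # De-duplicate repeated lines for the same file and exception.
            seen.add((match.group(1), match.group(2)))
    return Counter(exc for _, exc in seen)
```

Calling it on the text of the summary returns a `Counter` keyed by exception type, which makes it easier to see how many files hit `KeyError` versus `ValueError`.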

jrmccluskey · 2024-07-02T14:27:48Z

No Cython issues in recent runs, just a number of flakes in tests with external connections (GCSIO, RRIO) that aren't consistent across Python versions or across runs.

Abacn · 2024-08-13T17:20:36Z

Currently the Python 3.12 Dataflow test suite has two tests failing consistently:

apache_beam/ml/inference/sklearn_inference_it_test.py::SklearnInference::test_sklearn_mnist_classification 

apache_beam/ml/inference/sklearn_inference_it_test.py::SklearnInference::test_sklearn_mnist_classification_large_model

Error:

 subprocess.CalledProcessError: Command '['/runner/_work/beam/beam/build/gradleenv/2050596100/bin/python3.12', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', '/tmp/tmpoq1ebvgy/tmp_requirements.txt', '--exists-action', 'i', '--no-deps', '--implementation', 'cp', '--abi', 'cp312', '--platform', 'manylinux2014_x86_64']' returned non-zero exit status 1.


Error compiling Cython file:

sklearn/utils/_vector_sentinel.pyx:31:9: Previous declaration is here

sklearn cannot be installed from source using Cython.

This happened as early as https://github.com/apache/beam/commits/5b2bfe96f83a5631c3a8d5c3b92a0f695ffe2d7d

Abacn · 2024-08-13T17:28:34Z

We need to bump the sklearn requirements here: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/sklearn_examples_requirements.txt

github-actions · 2024-08-16T18:36:22Z

Reopening since the workflow is still flaky

github-actions · 2024-08-30T06:37:58Z

Reopening since the workflow is still flaky

liferoad · 2024-08-30T13:58:16Z

2024-08-30T07:28:39.6571287Z if setup_options.setup_file is not None:
2024-08-30T07:28:39.6571763Z if not os.path.isfile(setup_options.setup_file):
2024-08-30T07:28:39.6572227Z > raise RuntimeError(
2024-08-30T07:28:39.6572923Z 'The file %s cannot be found. It was specified in the '
2024-08-30T07:28:39.6573578Z '--setup_file command line option.' % setup_options.setup_file)
2024-08-30T07:28:39.6574970Z E RuntimeError: The file /runner/_work/beam/beam/sdks/python/apache_beam/examples/complete/juliaset/src/setup.py cannot be found. It was specified in the --setup_file command line option.

https://productionresultssa6.blob.core.windows.net/actions-results/9f18d66f-dabf-46e8-8b29-ae50d075f3dd/workflow-job-run-912db29d-d57b-5850-6efb-b125ca814b95/logs/job/job-logs.txt?rsct=text%2Fplain&se=2024-08-30T14%3A06%3A43Z&sig=aqESnfP68oo0sF7TUtpq%2BNFgdgfCbq8Ey3q%2BFMLZtvI%3D&ske=2024-08-31T00%3A21%3A54Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2024-08-30T12%3A21%3A54Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2024-05-04&sp=r&spr=https&sr=b&st=2024-08-30T13%3A56%3A38Z&sv=2024-05-04
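
The check visible in the log excerpt above can be sketched on its own roughly as follows. This is a simplified stand-in, not the actual Beam code; `validate_setup_file` is a hypothetical name and the function takes the path directly instead of an options object.

```python
import os
from typing import Optional


def validate_setup_file(setup_file: Optional[str]) -> None:
    """Raise RuntimeError if a setup file was given but does not exist."""
    if setup_file is not None and not os.path.isfile(setup_file):
        raise RuntimeError(
            'The file %s cannot be found. It was specified in the '
            '--setup_file command line option.' % setup_file)
```

With a check like this, pointing `--setup_file` at a path that was moved or deleted fails immediately, which matches the error reported for the juliaset example.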

tvalentyn · 2024-08-30T17:54:50Z

Currently failing test:

gradlew :sdks:python:test-suites:portable:py312:portableLocalRunnerJuliaSetWithSetupPy

damccorm · 2024-11-01T14:19:42Z

This is red again - https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python.yml?query=branch%3Amaster

It looks like there are currently 2 issues:

1. Python 3.9 job is failing, I think probably because of the mypy changes. example failure
2. The TensorRT tests are failing. Originally, they were failing because of a mismatch between container/local python versions, but now they seem to be running into CUDA issues with the new container. example failure and corresponding failing Dataflow job

damccorm · 2024-11-01T14:20:01Z

@jrmccluskey would you mind taking a look at these?

jrmccluskey · 2024-11-01T14:37:42Z

Failure in the 3.9 postcommit is apache_beam/examples/fastavro_it_test.py::FastavroIT::test_avro_it, will dive deeper into that shortly

jrmccluskey · 2024-11-01T15:11:07Z

The problem in the TensorRT container is that we seem to have two different versions of CUDA installed, one at version 11.8 and the other at 12.1 (we want everything at 12.1)

damccorm · 2024-11-04T13:34:32Z

Looks like after sickbaying TensorRT tests, there are still failures. https://ge.apache.org/s/27igat7sfmcsu/console-log/task/:sdks:python:test-suites:portable:py310:portableWordCountSparkRunnerBatch?anchor=60&page=1 is an example, it looks like we're failing because we're missing a class in the spark runner.

@Abacn would you mind taking a look? It's unclear why this is happening now, but I'm guessing it may be related to #32976 (and maybe some caching kept it from showing up?)

Abacn · 2024-11-04T15:24:43Z

> Looks like after sickbaying TensorRT tests, there are still failures. https://ge.apache.org/s/27igat7sfmcsu/console-log/task/:sdks:python:test-suites:portable:py310:portableWordCountSparkRunnerBatch?anchor=60&page=1 is an example, it looks like we're failing because we're missing a class in the spark runner.
>
> @Abacn would you mind taking a look? It's unclear why this is happening now, but I'm guessing it may be related to #32976 (and maybe some caching kept it from showing up?)

It's a bad Gradle cache. I cannot reproduce it locally on the master branch, and I also inspected the expansion jar.

For some reason, the Gradle cache for shadowJar has been breaking more frequently recently.

github-actions bot added bug flaky_test P1 workflow_id: 69778299 labels Mar 5, 2024

kennknowles added the permared label Apr 29, 2024

kennknowles removed the permared label Apr 29, 2024

liferoad self-assigned this May 20, 2024

liferoad mentioned this issue May 20, 2024

Retry when the BQ job error can be retried #31350

Closed

3 tasks

liferoad closed this as completed May 29, 2024

github-actions bot added this to the 2.57.0 Release milestone May 29, 2024

github-actions bot reopened this May 30, 2024

jrmccluskey modified the milestones: 2.58.0 Release, 2.59.0 Release Jul 3, 2024

Abacn mentioned this issue Aug 13, 2024

Fix PostCommit Python sklearn on Python3.12 #32171

Merged

3 tasks

Abacn closed this as completed in #32171 Aug 14, 2024

github-actions bot reopened this Aug 16, 2024

damccorm closed this as completed Aug 20, 2024

github-actions bot reopened this Aug 30, 2024

tvalentyn linked a pull request Aug 30, 2024 that will close this issue

Fix juliaset_test_it #32378

Draft

tvalentyn mentioned this issue Aug 30, 2024

Revert "docs: modernize py dependencies docs and example" #32382

Merged

tvalentyn closed this as completed in #32382 Aug 30, 2024

github-actions bot modified the milestones: 2.59.0 Release, 2.60.0 Release Aug 30, 2024

damccorm reopened this Nov 1, 2024

damccorm assigned jrmccluskey and unassigned liferoad Nov 1, 2024
