
[Failing Test]: PythonPostCommit is Extremely Flaky #29214

Closed
1 of 16 tasks
jrmccluskey opened this issue Oct 31, 2023 · 14 comments · Fixed by #29334
Labels: bug · done & done (Issue has been reviewed after it was closed for verification, followups, etc.) · failing test · P1 · permared · python · tests

Comments

@jrmccluskey (Contributor) commented Oct 31, 2023

What happened?

The apache_beam/io/external/xlang_kinesisio_it_test.py::CrossLanguageKinesisIOTest::test_kinesis_write test is failing in the Python PostCommit with a consistent error message:

botocore.exceptions.ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: "http://localhost:32770/".

The test is defined here: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/xlang_kinesisio_it_test.py#L94

Specifically, the failure is in create_stream():

      if self.use_localstack:
>       self.kinesis_helper.create_stream(self.aws_kinesis_stream)
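For anyone reproducing this outside the test harness, here is a minimal sketch of creating a Kinesis stream against a LocalStack endpoint with retries around the transient connection error above. This is not the Beam helper itself; the dummy credentials and retry policy are illustrative assumptions (the endpoint URL is the one from the error message).

import time

import boto3
from botocore.exceptions import ConnectionClosedError

def create_stream_with_retries(stream_name, endpoint_url, attempts=5):
  # LocalStack accepts any region and dummy credentials.
  client = boto3.client(
      'kinesis',
      endpoint_url=endpoint_url,  # e.g. 'http://localhost:32770'
      region_name='us-east-1',
      aws_access_key_id='test',
      aws_secret_access_key='test')
  for attempt in range(attempts):
    try:
      try:
        client.create_stream(StreamName=stream_name, ShardCount=1)
      except client.exceptions.ResourceInUseException:
        pass  # Stream was already created on an earlier attempt.
      client.get_waiter('stream_exists').wait(StreamName=stream_name)
      return
    except ConnectionClosedError:
      # The container may not be accepting connections yet; back off and retry.
      time.sleep(2 ** attempt)
  raise RuntimeError('LocalStack Kinesis never became reachable')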

Issue Failure

Failure: Test is continually failing

Issue Priority

Priority: 1 (unhealthy code / failing or flaky postcommit so we cannot be sure the product is healthy)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@AnandInguva (Contributor)

Is this still relevant?

@jrmccluskey (Contributor, Author)

Yep, still a problem. @damccorm said there have been a few issues along these lines with the self-hosted runners.

@volatilemolotov (Contributor)

I'm looking into this one. There are issues with the runners that will be fixed soon, but that still isn't the final fix needed for the Python PostCommit. I'll keep you posted.

@volatilemolotov (Contributor)

.take-issue

@damccorm (Contributor) commented Nov 7, 2023

@volatilemolotov I don't think you meant to auto-close this with the PR, is that right? If so, we can re-close after a green signal anyway, I guess.

@damccorm reopened this Nov 7, 2023
@volatilemolotov (Contributor)

I didn't mean to auto-close; that PR is only part of the fix. Sorry, I wasn't aware of the mechanism; I've worked on a lot of different systems :)

@damccorm (Contributor) commented Nov 7, 2023

No worries, it's actually a GitHub feature: https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword
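For reference, the linked feature means that a PR description (or commit message) containing a closing keyword followed by an issue number auto-closes that issue when the PR merges, e.g. a line such as:

Fixes #29214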

@github-actions bot added this to the 2.53.0 Release milestone Nov 7, 2023
@volatilemolotov (Contributor)

So a scheduled run failed, with only one job actually passing. The other three jobs fail in different places:
https://github.com/apache/beam/actions/runs/6794201259/job/18470220889#step:9:37734

Any ideas what is going on? Could it be because of the parallel runs? (I had a fully green run in my fork.)

@damccorm (Contributor) commented Nov 8, 2023

Definitely seems like we've upgraded from permared jobs to test flakiness, so I don't think this is a runner/actions problem anymore. For example, https://github.com/apache/beam/actions/runs/6797617071/job/18480096798 already has 3 green jobs (with a 4th still running).

At least some of it is caused by #29076 - I see a bunch of failures related to that test in the workflow you linked.

I have #29197 to fix that; I was holding off on merging since there was a lot going on causing issues, but it might be time to merge. I'm running https://github.com/apache/beam/actions/runs/6802293125 to make sure I'm correctly sickbaying it, but once that run finishes (assuming it's working as expected) I think we should merge the PR.
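For context, "sickbaying" a test means marking it skipped so it stops failing the suite while the flake is investigated. A minimal sketch of what that looks like in a unittest-style suite (the class and test names here are hypothetical, not the actual test from #29076):

import unittest

class ExampleIT(unittest.TestCase):  # Hypothetical test class.

  @unittest.skip('Sickbayed: flaky, see tracking issue')
  def test_flaky_integration_path(self):
    # Never runs while sickbayed; unskip once the flake is fixed.
    pass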

@volatilemolotov (Contributor)

The workflow you referenced is green:
https://github.com/apache/beam/actions/runs/6797617071/job/18480096798

So yeah, flakiness. Glad to have it sorted; we were lucky that the MTU issues did not cause bigger problems.

@jrmccluskey changed the title from "[Failing Test]: PythonPostCommit is Perma-Red" to "[Failing Test]: PythonPostCommit is Extremely Flaky" Dec 6, 2023
@Abacn (Contributor) commented Jan 10, 2024

It is still flaky now, though at a lower frequency: https://github.com/apache/beam/runs/20274778848

There are other flaky tests as well, e.g.

apache_beam.examples.fastavro_it_test.FastavroIT

apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1435, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 636, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1611, in apache_beam.runners.common._OutputHandler.handle_process_outputs
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/apache_beam/io/filebasedsource.py", line 380, in process
    source = self._source_from_file(metadata.path)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/apache_beam/io/filebasedsource.py", line 127, in __init__
    self._validate()
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/apache_beam/options/value_provider.py", line 193, in _f
    return fnc(self, *args, **kwargs)
  File "/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.10/site-packages/apache_beam/io/filebasedsource.py", line 190, in _validate
    raise IOError('No files found based on the file pattern %s' % pattern)
OSError: No files found based on the file pattern gs://temp-storage-for-end-to-end-tests/py-it-cloud/output/e26d0c72-c41a-43e4-aa51-598e6994b277/fastavro-00003-of-00004
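The "No files found" error suggests the verification step ran before all output shards had landed in GCS. A minimal sketch of one way a test could poll for the expected shards before validating (not the actual fix; the bucket placeholder, shard count, and timeouts are illustrative assumptions):

import time

from apache_beam.io.filesystems import FileSystems

def wait_for_shards(pattern, expected, timeout_secs=300, poll_secs=10):
  # Polls until `pattern` matches at least `expected` files or times out.
  deadline = time.time() + timeout_secs
  while time.time() < deadline:
    match_result = FileSystems.match([pattern])[0]
    if len(match_result.metadata_list) >= expected:
      return [m.path for m in match_result.metadata_list]
    time.sleep(poll_secs)
  raise IOError('Timed out waiting for files matching %s' % pattern)

# e.g. wait_for_shards('gs://<bucket>/output/<uuid>/fastavro-*-of-00004', expected=4)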

@lostluck (Contributor)

There's one week until the 2.54.0 cut, and this issue is tagged for that release. If possible/necessary, please complete the necessary work before then, or move this to the 2.55.0 Release Milestone.

This one seems like we may need to cherry-pick, though, if additional fixes occur.

@volatilemolotov (Contributor)

It's still flaky, failing in different ways:

https://github.com/apache/beam/actions/runs/7563095321/job/20594841089#step:9:37337

Could be the test is somehow broken, but I cannot see a pattern right now.

@lostluck (Contributor) commented Feb 6, 2024

Closing this one, as all the Python PostCommits are passing on the release branch.

@lostluck closed this as completed Feb 6, 2024
@damccorm added the done & done label (Issue has been reviewed after it was closed for verification, followups, etc.) Feb 13, 2024