[Java BQ] Storage API streaming load test #28264
Conversation
R: @johnjcasey
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
We should have two different test configurations. One should publish performance metrics to the table, and should be the "healthy" scenario with no deliberate crashes. The other should not do this publication, and should include the intermittent failures.
…set with pipeline options
Got it, I'll remove the
R: @johnjcasey PTAL
...a/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
...atform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java
Run PostCommit_Java_Dataflow
Run PostCommit_Java_DataflowV2
Adding a streaming load test for writing via the Storage API sink. Covers both exactly-once and at-least-once semantics.
The test first writes rows to a "source of truth" table using batch FILE_LOADS mode. It then writes the same rows to a second table in streaming mode with the Storage API, and finally queries the two tables to check that they are identical. There is also the option of providing an existing table that already contains the expected data, in which case the test skips the first step. A sketch of the two write phases follows below.
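For illustration, a minimal sketch of the two write phases, assuming hypothetical names (`rows`, `schema`, `SOURCE_TABLE`, `TEST_TABLE`); the actual wiring in this PR may differ:

```java
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.joda.time.Duration;

// Phase 1: batch FILE_LOADS write to the "source of truth" table.
rows.apply("WriteSourceOfTruth",
    BigQueryIO.writeTableRows()
        .to(SOURCE_TABLE)
        .withSchema(schema)
        .withMethod(Method.FILE_LOADS)
        .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED));

// Phase 2: streaming Storage API write of the same rows to the test table.
// For the at-least-once case, Method.STORAGE_API_AT_LEAST_ONCE would be used
// instead (that mode does not require a triggering frequency).
rows.apply("WriteStorageApi",
    BigQueryIO.writeTableRows()
        .to(TEST_TABLE)
        .withSchema(schema)
        .withMethod(Method.STORAGE_WRITE_API)
        .withTriggeringFrequency(Duration.standardSeconds(30))
        .withNumStorageWriteApiStreams(4));

// Verification then queries both tables and asserts they hold identical rows,
// e.g. via a SELECT ... EXCEPT DISTINCT query that should return zero rows.
```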
The throughput, test duration (in minutes), and data shape can be changed by adding a new configuration line, as in the hypothetical example below.
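The exact configuration format is defined in the test itself; this line only illustrates the idea, and all names and parameters here are assumptions:

```java
// Hypothetical configuration line: throughput (rows/sec), duration (minutes),
// number of fields per row, bytes per field. The real test's config shape may differ.
configs.add(Config.of(1_000, 10, 5, 100));
```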
Also included is a small addition: we can set an interval at which the sink intentionally crashes, to test retry resilience. The sink will sometimes throw an exception to simulate a work item failure, and other times exit the process to simulate a worker failure. Either way, we expect the pipeline to pick up where it left off and deliver the data appropriately.
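A minimal sketch of that crash behavior, assuming a hypothetical `crashIntervalSeconds` option; the actual logic added to StorageApiWritesShardedRecords may differ in detail:

```java
import java.util.Random;
import org.joda.time.Duration;
import org.joda.time.Instant;

// Inside the write DoFn (sketch):
private final Random random = new Random();
private Instant lastCrashTime = Instant.now();

// Called periodically from the write path; crashIntervalSeconds <= 0 disables it.
private void maybeCrash(int crashIntervalSeconds) {
  if (crashIntervalSeconds > 0
      && Instant.now().isAfter(
          lastCrashTime.plus(Duration.standardSeconds(crashIntervalSeconds)))) {
    lastCrashTime = Instant.now();
    if (random.nextBoolean()) {
      // Simulate a work item failure: the runner retries the bundle.
      throw new RuntimeException("Intentional crash for retry-resilience testing");
    } else {
      // Simulate a worker failure: the process dies and the runner replaces it.
      System.exit(1);
    }
  }
}
```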