[Java BQ] Storage API streaming load test #28264
Conversation
R: @johnjcasey
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
We should have two different test configurations. One should publish performance metrics to the table, and should be the "healthy" scenario with no deliberate crashes. The other should not do this publication, and should include the intermittent failures.
…set with pipeline options
Got it, I'll remove the
R: @johnjcasey PTAL
...a/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
...atform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java
Run PostCommit_Java_Dataflow
Run PostCommit_Java_DataflowV2
Adding a streaming load test for writing via the Storage API sink. Covers both exactly-once and at-least-once semantics.
The test first writes rows to a "source of truth" table using batch FILE_LOADS mode. It then writes the same rows to a second table in streaming mode with the Storage API, and finally queries the two tables to check that they are identical. There is also the option of providing an existing table that already contains the expected data, in which case the test skips the first step. A sketch of the two write phases follows below.
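For illustration, a minimal sketch of the two write phases, assuming hypothetical names (`rows`, `schema`, `SOURCE_TABLE`, `TEST_TABLE`); the actual wiring in this PR may differ:

```java
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.Method;
import org.joda.time.Duration;

// Phase 1: batch FILE_LOADS write to the "source of truth" table.
rows.apply("WriteSourceOfTruth",
    BigQueryIO.writeTableRows()
        .to(SOURCE_TABLE)
        .withSchema(schema)
        .withMethod(Method.FILE_LOADS)
        .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED));

// Phase 2: streaming Storage API write of the same rows to the test table.
// For the at-least-once case, Method.STORAGE_API_AT_LEAST_ONCE would be used
// instead (that mode does not require a triggering frequency).
rows.apply("WriteStorageApi",
    BigQueryIO.writeTableRows()
        .to(TEST_TABLE)
        .withSchema(schema)
        .withMethod(Method.STORAGE_WRITE_API)
        .withTriggeringFrequency(Duration.standardSeconds(30))
        .withNumStorageWriteApiStreams(4));

// Verification then queries both tables and asserts they hold identical rows,
// e.g. via a SELECT ... EXCEPT DISTINCT query that should return zero rows.
```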
The throughput, test duration (in minutes), and data shape can be changed by adding a new configuration line, as in the hypothetical example below.
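The exact configuration format is defined in the test itself; this line only illustrates the idea, and all names and parameters here are assumptions:

```java
// Hypothetical configuration line: throughput (rows/sec), duration (minutes),
// number of fields per row, bytes per field. The real test's config shape may differ.
configs.add(Config.of(1_000, 10, 5, 100));
```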
Also included is a small addition: we can set an interval at which the sink intentionally crashes, to test retry resilience. The sink will sometimes throw an exception to simulate a work item failure, and other times exit the process to simulate a worker failure. Either way, we expect the pipeline to pick up where it left off and deliver the data appropriately.
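A minimal sketch of that crash behavior, assuming a hypothetical `crashIntervalSeconds` option; the actual logic added to StorageApiWritesShardedRecords may differ in detail:

```java
import java.util.Random;
import org.joda.time.Duration;
import org.joda.time.Instant;

// Inside the write DoFn (sketch):
private final Random random = new Random();
private Instant lastCrashTime = Instant.now();

// Called periodically from the write path; crashIntervalSeconds <= 0 disables it.
private void maybeCrash(int crashIntervalSeconds) {
  if (crashIntervalSeconds > 0
      && Instant.now().isAfter(
          lastCrashTime.plus(Duration.standardSeconds(crashIntervalSeconds)))) {
    lastCrashTime = Instant.now();
    if (random.nextBoolean()) {
      // Simulate a work item failure: the runner retries the bundle.
      throw new RuntimeException("Intentional crash for retry-resilience testing");
    } else {
      // Simulate a worker failure: the process dies and the runner replaces it.
      System.exit(1);
    }
  }
}
```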