Implement BigQuery Stress Test #30287
Assigning reviewers. If you would like to opt out of this review, comment R: @AnandInguva added as fallback since no labels match configuration Available commands:
The PR bot will only process comments in the main thread (not review comments).
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
/**
 * BigQueryIO stress tests. The test is designed to assess the performance of BigQueryIO under
 * various conditions.
 */
Would you mind adding a comment explaining how to trigger a specific test from the Gradle command line?
Done
This is awesome! I'm going to take a closer look. Here are a few initial comments:
Is the test currently writing metrics to InfluxDB? If so, we can set up a Grafana dashboard for it on http://metrics.beam.apache.org/ (in a separate PR), even if it is currently empty.
Thanks, this looks pretty good. I just had comments about the ignored test and the metrics nomenclature (no action needed for this PR, just something to think about later).
}

@Test
@Ignore
Could you please add a note to each ignored test explaining why it is ignored?
Removed @Ignore
@JsonProperty public boolean exportMetricsToInfluxDB = false;

/** InfluxDB measurement to publish results to. * */
@JsonProperty public String influxMeasurement;
(A note for a future dashboard.) Would the measurement be unique for each stress test / each IO?
Here is an example of the metrics, FYI:
{
"TotalStreamingDataProcessed": 0.0,
"BillableShuffleDataProcessed": 0.0,
"EstimatedCost": 0.06965586137166667,
"AvgInputThroughputBytesPerSec": 6.712440411666667E7,
"ElapsedTime": 1811.0,
"MaxCpuUtilization": 0.7093815630496711,
"AvgCpuUtilization": 0.6400222870148885,
"AvgInputThroughputElementsPerSec": 66132.418,
"TotalPdUsage": 405457.0,
"TotalGpuTime": 0.0,
"TotalSsdUsage": 0.0,
"MaxInputThroughputElementsPerSec": 71111.15,
"TotalDcuUsage": 0.0,
"TotalVcpuTime": 3243.0,
"TotalShuffleDataProcessed": 0.0,
"EstimatedDataProcessedGB": 101.5,
"TotalMemoryUsage": 1.3286034E7,
"MaxInputThroughputBytesPerSec": 7.217781435E7
}
As can be seen, the field names obtained from getMetrics are the same for all pipelines. If we start publishing to InfluxDB, we need to consider how the measurement is named so that different test settings can be distinguished.
Even better would be to use InfluxDB tags to distinguish them; however, this is currently not supported by IOITMetrics.
https://docs.influxdata.com/influxdb/v1/concepts/glossary/#measurement
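To make the tag idea concrete, here is a minimal sketch of how one point could be rendered in InfluxDB line protocol with tags carrying the test settings. It is not how IOITMetrics works today (tags are not supported there, as noted above); the class `LineProtocolSketch`, the measurement name `stress_test`, and the tag keys `io` and `write_method` are all hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not part of this PR or of IOITMetrics. */
final class LineProtocolSketch {
  private LineProtocolSketch() {}

  /** Escapes commas, equals signs, and spaces in tag keys, tag values, and field keys. */
  static String escape(String s) {
    return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
  }

  /** Renders "measurement,tag=value,... field=value,..." without a timestamp. */
  static String toLine(String measurement, Map<String, String> tags, Map<String, Double> fields) {
    if (fields.isEmpty()) {
      throw new IllegalArgumentException("line protocol requires at least one field");
    }
    // Measurement names only need commas and spaces escaped.
    StringBuilder sb = new StringBuilder(measurement.replace(",", "\\,").replace(" ", "\\ "));
    // Sort tags by key so the output is deterministic.
    for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
      sb.append(',').append(escape(tag.getKey())).append('=').append(escape(tag.getValue()));
    }
    sb.append(' ');
    String separator = "";
    for (Map.Entry<String, Double> field : new TreeMap<>(fields).entrySet()) {
      sb.append(separator).append(escape(field.getKey())).append('=').append(field.getValue());
      separator = ",";
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String> tags = new HashMap<>();
    tags.put("io", "bigquery");               // hypothetical tag
    tags.put("write_method", "file_loads");   // hypothetical tag
    Map<String, Double> fields = new HashMap<>();
    fields.put("ElapsedTime", 1811.0);        // value taken from the example above
    System.out.println(toLine("stress_test", tags, fields));
  }
}
```

With tags, every pipeline could share one measurement and the same field names, and a dashboard query would filter on the tag values instead of on the measurement name.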
Fixed using suffixes, so the measurement should be unique for each stress test case.
It's a good idea to use tags, we'll look into that in the future.
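For readers following along, the suffix approach can be sketched roughly as below. This is an illustration under assumptions, not the code in this PR: the helper `MeasurementNames.withSuffix`, the base name `bigquery_stress`, and the test-case label `Avro FILE_LOADS` are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper illustrating per-test-case measurement names; not the PR's code. */
final class MeasurementNames {
  private MeasurementNames() {}

  /**
   * Appends a normalized test-case suffix to a base measurement name, so that each
   * stress test case publishes to its own measurement.
   */
  static String withSuffix(String baseMeasurement, String testCase) {
    // Lower-case the label and collapse every run of non-alphanumerics into one underscore.
    String suffix = testCase.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "_");
    return baseMeasurement + "_" + suffix;
  }

  public static void main(String[] args) {
    // Prints "bigquery_stress_avro_file_loads".
    System.out.println(withSuffix("bigquery_stress", "Avro FILE_LOADS"));
  }
}
```

Because the metric field names are identical across pipelines, the suffix is the only thing that keeps results from different test cases apart until tags are available.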
Thanks!
This pull request introduces stress tests for BigQueryIO, designed to assess the performance under various conditions. The stress tests simulate dynamic load increases and evaluate the behavior of BigQueryIO for different write formats and methods.
Changes:
Dynamic load increases over time example: