PlanNodeStats add addInputTiming, getOutputTiming and finishTiming #10986

Closed

Conversation

jinchengchenghh
Copy link
Contributor

@jinchengchenghh jinchengchenghh commented Sep 12, 2024

New planNodeStats output

-- Project[4][expressions: (c0:INTEGER, ROW["c0"]), (p1:BIGINT, plus(ROW["c1"],1)), (p2:BIGINT, plus(ROW["c1"],ROW["u_c1"]))] -> c0:INTEGER, p1:BIGINT, p2:BIGINT
      Output: 2000 rows (154.34KB, 20 batches), Cpu time: 907.80us, Blocked wall time: 0ns, Peak memory: 2.00KB, Memory allocations: 40, Threads: 1, CPU breakdown: I/O/F (27.24us/872.82us/7.74us)
   -- HashJoin[3][INNER c0=u_c0] -> c0:INTEGER, c1:BIGINT, u_c1:BIGINT
      Output: 2000 rows (136.23KB, 20 batches), Cpu time: 508.74us, Blocked wall time: 0ns, Peak memory: 88.50KB, Memory allocations: 7, CPU breakdown: I/O/F (177.87us/329.20us/1.66us)
      HashBuild: Input: 100 rows (1.31KB, 1 batches), Output: 0 rows (0B, 0 batches), Cpu time: 41.77us, Blocked wall time: 0ns, Peak memory: 68.00KB, Memory allocations: 2, Threads: 1, CPU breakdown: I/O/F(40.18us/1.59us/0ns)
      HashProbe: Input: 2000 rows (118.12KB, 20 batches), Output: 2000 rows (136.23KB, 20 batches), Cpu time: 466.97us, Blocked wall time: 0ns, Peak memory: 20.50KB, Memory allocations: 5, Threads: 1, CPU breakdown: I/O/F (137.69us/327.61us/1.66us)
      -- TableScan[2][table: hive_table] -> c0:INTEGER, c1:BIGINT
         Input: 2000 rows (118.12KB, 20 batches), Raw Input: 20480 rows (72.79KB), Output: 2000 rows (118.12KB, 20 batches), Cpu time: 8.89ms, Blocked wall time: 10.00us, Peak memory: 80.38KB, Memory allocations: 262, Threads: 1, Splits: 20, DynamicFilter producer plan nodes: 3, CPU breakdown: I/O/F (0ns/8.88ms/4.93us)
      -- Project[1][expressions: (u_c0:INTEGER, ROW["c0"]), (u_c1:BIGINT, ROW["c1"])] -> u_c0:INTEGER, u_c1:BIGINT
         Output: 100 rows (1.31KB, 1 batches), Cpu time: 43.22us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1, CPU breakdown: I/O/F (691ns/5.54us/36.98us)
         -- Values[0][100 rows in 1 vectors] -> c0:INTEGER, c1:BIGINT
            Input: 0 rows (0B, 0 batches), Output: 100 rows (1.31KB, 1 batches), Cpu time: 3.05us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1, CPU breakdown: I/O/F (0ns/2.48us/568ns)
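For reference, the `CPU breakdown: I/O/F (...)` fragment above can be sketched as follows. This is a minimal stand-alone sketch, not the actual Velox implementation: `succinctNanos` here is a simplified stand-in for Velox's real formatting helper, and `cpuBreakdown` is a hypothetical name for the formatting step.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Simplified stand-in for Velox's succinctNanos() helper: formats a
// nanosecond count with a human-readable unit suffix.
std::string succinctNanos(uint64_t nanos) {
  std::ostringstream out;
  out << std::fixed << std::setprecision(2);
  if (nanos < 1'000) {
    out << nanos << "ns";
  } else if (nanos < 1'000'000) {
    out << nanos / 1'000.0 << "us";
  } else if (nanos < 1'000'000'000) {
    out << nanos / 1'000'000.0 << "ms";
  } else {
    out << nanos / 1'000'000'000.0 << "s";
  }
  return out.str();
}

// Assembles the "CPU breakdown: I/O/F (addInput/getOutput/finish)"
// fragment appended to each plan node's stats line.
std::string cpuBreakdown(
    uint64_t addInputNanos,
    uint64_t getOutputNanos,
    uint64_t finishNanos) {
  std::ostringstream out;
  out << "CPU breakdown: I/O/F (" << succinctNanos(addInputNanos) << "/"
      << succinctNanos(getOutputNanos) << "/" << succinctNanos(finishNanos)
      << ")";
  return out.str();
}
```

For example, the TableScan line above corresponds to `cpuBreakdown(0, 8'880'000, 4'930)`.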

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 12, 2024
@mbasmanova mbasmanova (Contributor) left a comment:
@jinchengchenghh Thank you for the change.

Would you update the summary to show an example of the new output?

Would be nice to update the docs as well: https://facebookincubator.github.io/velox/develop/debugging/print-plan-with-stats.html

CC: @rui-mo

@@ -76,6 +76,15 @@ struct PlanNodeStats {
/// Sum of output bytes for all corresponding operators.
uint64_t outputBytes{0};

// Sum of addInput for all corresponding operators.
Review comment:
Sum of CPU, scheduled and wall times for addInput call for all ...
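For context, the fields this PR adds to `PlanNodeStats` can be sketched as below. Field names follow the diff; the internals of `CpuWallTiming` are an assumption here (a simple cpu/wall/count accumulator), and may differ from Velox's actual definition.

```cpp
#include <cstdint>

// Assumed minimal shape of Velox's CpuWallTiming accumulator; the real
// struct may differ in layout and members.
struct CpuWallTiming {
  uint64_t count{0};
  uint64_t wallNanos{0};
  uint64_t cpuNanos{0};

  // Accumulates another timing into this one.
  void add(const CpuWallTiming& other) {
    count += other.count;
    wallNanos += other.wallNanos;
    cpuNanos += other.cpuNanos;
  }
};

// Per this PR, PlanNodeStats gains one timing per operator stage, each
// summed across all operators corresponding to the plan node.
struct PlanNodeStats {
  CpuWallTiming cpuWallTiming;   // total, as before
  CpuWallTiming addInputTiming;  // time spent in addInput()
  CpuWallTiming getOutputTiming; // time spent in getOutput()
  CpuWallTiming finishTiming;    // time spent in finish()
};
```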

@@ -100,6 +103,9 @@ std::string PlanNodeStats::toString(bool includeInputStats) const {
out << ", Physical written output: " << succinctBytes(physicalWrittenBytes);
}
out << ", Cpu time: " << succinctNanos(cpuWallTiming.cpuNanos)
<< ", Add input cpu time: " << succinctNanos(addInputTiming.cpuNanos)
Review comment:
Maybe shorten a bit more for readability and move to the end of the message:

CPU time: ..., Blocked all time: ..., CPU breakdown: I / O / F (addInput / getOutput / finish)

@jinchengchenghh (Contributor, Author) commented:
Addressed all the comments, can you help review again? Thanks! @mbasmanova

@@ -862,6 +862,8 @@ TEST_F(AggregationTest, partialAggregationMemoryLimit) {
.customStats.at("flushRowCount")
.sum,
0);
std::cout << toPlanStats(task->taskStats()).at(aggNodeId).toString()
Review comment:
Accidental change? Let's remove.

@@ -122,6 +125,13 @@ std::string PlanNodeStats::toString(bool includeInputStats) const {
<< folly::join(',', dynamicFilterStats.producerNodeIds);
}

out << ", CPU breakdown: I/O/F"
Review comment:
Add an empty space after "I/O/F".

@jinchengchenghh (Contributor, Author) commented:
Addressed all the comments. Could you help review again? Thanks! @mbasmanova


Velox also measures CPU time and peak memory usage for each operator. This
information is shown for all plan nodes.
Velox also measures CPU time and the breakdown of CPU time which including addInput, getOutput and finish time,
Review comment:
There are some typos. Overall, it might be better to split this paragraph.

Velox also measures CPU time, peak memory usage and total number of memory allocations for each operator. 

Cpu time: 8.89ms, Peak memory: 80.38KB, Memory allocations: 262

A breakdown of CPU time into addInput, getOutput and finish stages of the operator is also available. I/O/F below is a shortcut for addInput/getOutput/finish.

CPU breakdown: I/O/F (0ns/8.88ms/4.93us)

@@ -225,6 +266,13 @@ printPlanWithStats shows this information as “Blocked wall time”.

Blocked wall time: 10.00us

Some operators like TableScan may produce the dynamic filter, reports the plan node ids.
Review comment:
TableScan doesn't produce dynamic filters. These are produced by HashProbe operator. This doc update seems unrelated to the changes in the PR. Would you extract it in a separate PR so we can get it right without blocking this PR?

Author reply:
Sure, changed; only the CPU time and CPU breakdown changes are left in this document.

@mbasmanova mbasmanova (Contributor) left a comment:
Thanks.

@mbasmanova mbasmanova added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Oct 14, 2024
@facebook-github-bot commented:
@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot commented:
@kgpai merged this pull request in c30c8f9.

Conbench analyzed the 1 benchmark run on commit c30c8f9d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.
