PlanNodeStats add addInputTiming, getOutputTiming and finishTiming #10986

Closed

Conversation

jinchengchenghh
Copy link
Contributor

@jinchengchenghh jinchengchenghh commented Sep 12, 2024

New planNodeStats output

-- Project[4][expressions: (c0:INTEGER, ROW["c0"]), (p1:BIGINT, plus(ROW["c1"],1)), (p2:BIGINT, plus(ROW["c1"],ROW["u_c1"]))] -> c0:INTEGER, p1:BIGINT, p2:BIGINT
      Output: 2000 rows (154.34KB, 20 batches), Cpu time: 907.80us, Blocked wall time: 0ns, Peak memory: 2.00KB, Memory allocations: 40, Threads: 1, CPU breakdown: I/O/F (27.24us/872.82us/7.74us)
   -- HashJoin[3][INNER c0=u_c0] -> c0:INTEGER, c1:BIGINT, u_c1:BIGINT
      Output: 2000 rows (136.23KB, 20 batches), Cpu time: 508.74us, Blocked wall time: 0ns, Peak memory: 88.50KB, Memory allocations: 7, CPU breakdown: I/O/F (177.87us/329.20us/1.66us)
      HashBuild: Input: 100 rows (1.31KB, 1 batches), Output: 0 rows (0B, 0 batches), Cpu time: 41.77us, Blocked wall time: 0ns, Peak memory: 68.00KB, Memory allocations: 2, Threads: 1, CPU breakdown: I/O/F(40.18us/1.59us/0ns)
      HashProbe: Input: 2000 rows (118.12KB, 20 batches), Output: 2000 rows (136.23KB, 20 batches), Cpu time: 466.97us, Blocked wall time: 0ns, Peak memory: 20.50KB, Memory allocations: 5, Threads: 1, CPU breakdown: I/O/F (137.69us/327.61us/1.66us)
      -- TableScan[2][table: hive_table] -> c0:INTEGER, c1:BIGINT
         Input: 2000 rows (118.12KB, 20 batches), Raw Input: 20480 rows (72.79KB), Output: 2000 rows (118.12KB, 20 batches), Cpu time: 8.89ms, Blocked wall time: 10.00us, Peak memory: 80.38KB, Memory allocations: 262, Threads: 1, Splits: 20, DynamicFilter producer plan nodes: 3, CPU breakdown: I/O/F (0ns/8.88ms/4.93us)
      -- Project[1][expressions: (u_c0:INTEGER, ROW["c0"]), (u_c1:BIGINT, ROW["c1"])] -> u_c0:INTEGER, u_c1:BIGINT
         Output: 100 rows (1.31KB, 1 batches), Cpu time: 43.22us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1, CPU breakdown: I/O/F (691ns/5.54us/36.98us)
         -- Values[0][100 rows in 1 vectors] -> c0:INTEGER, c1:BIGINT
            Input: 0 rows (0B, 0 batches), Output: 100 rows (1.31KB, 1 batches), Cpu time: 3.05us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1, CPU breakdown: I/O/F (0ns/2.48us/568ns)
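For reference, the `CPU breakdown: I/O/F (...)` fragment above can be sketched as follows. This is a minimal stand-alone sketch, not the actual Velox implementation: `succinctNanos` here is a simplified stand-in for Velox's real formatting helper, and `cpuBreakdown` is a hypothetical name for the formatting step.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Simplified stand-in for Velox's succinctNanos() helper: formats a
// nanosecond count with a human-readable unit suffix.
std::string succinctNanos(uint64_t nanos) {
  std::ostringstream out;
  out << std::fixed << std::setprecision(2);
  if (nanos < 1'000) {
    out << nanos << "ns";
  } else if (nanos < 1'000'000) {
    out << nanos / 1'000.0 << "us";
  } else if (nanos < 1'000'000'000) {
    out << nanos / 1'000'000.0 << "ms";
  } else {
    out << nanos / 1'000'000'000.0 << "s";
  }
  return out.str();
}

// Assembles the "CPU breakdown: I/O/F (addInput/getOutput/finish)"
// fragment appended to each plan node's stats line.
std::string cpuBreakdown(
    uint64_t addInputNanos,
    uint64_t getOutputNanos,
    uint64_t finishNanos) {
  std::ostringstream out;
  out << "CPU breakdown: I/O/F (" << succinctNanos(addInputNanos) << "/"
      << succinctNanos(getOutputNanos) << "/" << succinctNanos(finishNanos)
      << ")";
  return out.str();
}
```

For example, the TableScan line above corresponds to `cpuBreakdown(0, 8'880'000, 4'930)`.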

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 12, 2024
@mbasmanova mbasmanova (Contributor) left a comment:
@jinchengchenghh Thank you for the change.

Would you update the summary to show an example of the new output?

Would be nice to update the docs as well: https://facebookincubator.github.io/velox/develop/debugging/print-plan-with-stats.html

CC: @rui-mo

@@ -76,6 +76,15 @@ struct PlanNodeStats {
/// Sum of output bytes for all corresponding operators.
uint64_t outputBytes{0};

// Sum of addInput for all corresponding operators.
Review comment:
Sum of CPU, scheduled and wall times for addInput call for all ...
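For context, the fields this PR adds to `PlanNodeStats` can be sketched as below. Field names follow the diff; the internals of `CpuWallTiming` are an assumption here (a simple cpu/wall/count accumulator), and may differ from Velox's actual definition.

```cpp
#include <cstdint>

// Assumed minimal shape of Velox's CpuWallTiming accumulator; the real
// struct may differ in layout and members.
struct CpuWallTiming {
  uint64_t count{0};
  uint64_t wallNanos{0};
  uint64_t cpuNanos{0};

  // Accumulates another timing into this one.
  void add(const CpuWallTiming& other) {
    count += other.count;
    wallNanos += other.wallNanos;
    cpuNanos += other.cpuNanos;
  }
};

// Per this PR, PlanNodeStats gains one timing per operator stage, each
// summed across all operators corresponding to the plan node.
struct PlanNodeStats {
  CpuWallTiming cpuWallTiming;   // total, as before
  CpuWallTiming addInputTiming;  // time spent in addInput()
  CpuWallTiming getOutputTiming; // time spent in getOutput()
  CpuWallTiming finishTiming;    // time spent in finish()
};
```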

@@ -100,6 +103,9 @@ std::string PlanNodeStats::toString(bool includeInputStats) const {
out << ", Physical written output: " << succinctBytes(physicalWrittenBytes);
}
out << ", Cpu time: " << succinctNanos(cpuWallTiming.cpuNanos)
<< ", Add input cpu time: " << succinctNanos(addInputTiming.cpuNanos)
Review comment:
Maybe shorten a bit more for readability and move to the end of the message:

CPU time: ..., Blocked all time: ..., CPU breakdown: I / O / F (addInput / getOutput / finish)

@jinchengchenghh (Contributor, Author) commented:
Addressed all the comments, can you help review again? Thanks! @mbasmanova

@@ -862,6 +862,8 @@ TEST_F(AggregationTest, partialAggregationMemoryLimit) {
.customStats.at("flushRowCount")
.sum,
0);
std::cout << toPlanStats(task->taskStats()).at(aggNodeId).toString()
Review comment:
Accidental change? Let's remove.

@@ -122,6 +125,13 @@ std::string PlanNodeStats::toString(bool includeInputStats) const {
<< folly::join(',', dynamicFilterStats.producerNodeIds);
}

out << ", CPU breakdown: I/O/F"
Review comment:
Add an empty space after "I/O/F".

@jinchengchenghh (Contributor, Author) commented:
Addressed all the comments. Could you help review again? Thanks! @mbasmanova


Velox also measures CPU time and peak memory usage for each operator. This
information is shown for all plan nodes.
Velox also measures CPU time and the breakdown of CPU time which including addInput, getOutput and finish time,
Review comment:
There are some typos. Overall, it might be better to split this paragraph.

Velox also measures CPU time, peak memory usage and total number of memory allocations for each operator. 

Cpu time: 8.89ms, Peak memory: 80.38KB, Memory allocations: 262

A breakdown of CPU time into addInput, getOutput and finish stages of the operator is also available. I/O/F below is a shortcut for addInput/getOutput/finish.

CPU breakdown: I/O/F (0ns/8.88ms/4.93us)

@@ -225,6 +266,13 @@ printPlanWithStats shows this information as “Blocked wall time”.

Blocked wall time: 10.00us

Some operators like TableScan may produce the dynamic filter, reports the plan node ids.
Review comment:
TableScan doesn't produce dynamic filters. These are produced by HashProbe operator. This doc update seems unrelated to the changes in the PR. Would you extract it in a separate PR so we can get it right without blocking this PR?

Author reply:
Sure, changed; only the CPU time and CPU breakdown changes are left in this document.

@mbasmanova mbasmanova (Contributor) left a comment:
Thanks.

@mbasmanova mbasmanova added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Oct 14, 2024
@facebook-github-bot commented:
@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot commented:
@kgpai merged this pull request in c30c8f9.

Conbench analyzed the 1 benchmark run on commit c30c8f9d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.
