Meaning of items in legend of plots called profiler_rank0_callxxx.png #83

cjlegg · 2020-11-13T14:44:56Z

The definitions of the four times in the legends need to be documented somewhere. Actually, I now realise that "data sent" and "data received" are sizes, not times! Rename them to "amount of data sent (bytes)" and "amount of data received (bytes)", or, if it is just one frame of data sent, "size of data sent (bytes)" and "size of data received (bytes)" might be better.

The meaning of "execution time" is not clear: does it mean the duration of the MPI_Alltoallv call as seen from the program that calls it?
The meaning of "late arrival timing" is not explained at all: what events is it calculated from, and how?

How is the bandwidth calculated?
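For comparison while this gets documented, here is a minimal Python sketch of one common convention. It is not necessarily what the profiler does. MPI_Alltoallv counts are given in elements of the datatype, so the bytes for one call are the sum of the per-peer counts times the datatype size. Bandwidth is taken here, as an assumption, to be bytes sent divided by the call's duration. Whether the tool uses sent, received, or both is exactly the open question. The function names and numbers are hypothetical.

```python
from typing import Sequence

def alltoallv_bytes(counts: Sequence[int], type_size: int) -> int:
    """Total bytes described by an MPI_Alltoallv count array.

    MPI counts are in elements of the datatype, so the byte total is the
    sum of the per-peer counts times the datatype size in bytes.
    """
    return sum(counts) * type_size

def bandwidth(total_bytes: int, duration_s: float) -> float:
    """Assumed definition: bytes moved divided by call duration (bytes/s)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return total_bytes / duration_s

# Hypothetical values for one rank in one call.
send_counts = [4, 0, 8, 4]  # elements sent to ranks 0..3
recv_counts = [2, 2, 2, 2]  # elements received from ranks 0..3
type_size = 8               # bytes per element, e.g. MPI_DOUBLE
duration_s = 0.002          # seconds spent inside the call

sent = alltoallv_bytes(send_counts, type_size)      # 128 bytes
received = alltoallv_bytes(recv_counts, type_size)  # 64 bytes
print(sent, received, bandwidth(sent, duration_s))
```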

Would a plot of rank/machine name number vs. data size vs. execution time and/or late arrival timing be more helpful? This is a 3D plot, but one dimension could be rendered with color. Putting all the calls on the same plot would show the range of behaviour.
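A minimal data-preparation sketch for such a plot follows. It assumes each call is reduced to a hypothetical (rank, bytes sent, execution time) record. Execution time is min-max normalised to [0, 1] so it can drive a colormap, which leaves rank and data size on the two axes.

```python
from typing import List, Tuple

# Hypothetical per-call record: (rank, bytes sent, execution time in seconds).
Record = Tuple[int, int, float]

def to_scatter(records: List[Record]) -> List[Tuple[int, int, float]]:
    """Return (x=rank, y=bytes, c) triples with c scaled to [0, 1].

    c is the execution time min-max normalised across all calls, so it can
    index a colormap; if all times are equal, every c is 0.0.
    """
    times = [t for _, _, t in records]
    lo, hi = min(times), max(times)
    span = (hi - lo) or 1.0
    return [(rank, size, (t - lo) / span) for rank, size, t in records]

records = [(0, 100, 1.0), (1, 200, 3.0), (2, 150, 2.0)]
print(to_scatter(records))
```

The resulting triples could then be passed to a plotting library, for example matplotlib's `scatter(x, y, c=...)` with a colormap, with every call drawn on the same axes.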

I have just realised that repeating the experiment with different ranks assigned to different machines might reveal which poor performance is caused by the algorithm of the code under test (it should follow the rank if the code is deterministic) and which is caused by a poorly performing compute node (because it has dodgy hardware or other work running on it).
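The reasoning above can be sketched as a small heuristic. This hypothetical Python sketch assumes that each run provides per-rank timings and the rank-to-host mapping. If the slowest rank stays the same across different mappings, that points to the algorithm. If the slowest host stays the same instead, that points to the node. The data layout and function names are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical run record: (rank -> host mapping, rank -> time in seconds).
Run = Tuple[Dict[int, str], Dict[int, float]]

def slowest(run: Run) -> Tuple[int, str]:
    """Return (rank, host) of the slowest rank in one run."""
    mapping, times = run
    rank = max(times, key=times.get)
    return rank, mapping[rank]

def attribute_slowness(runs: List[Run]) -> str:
    """Heuristic: does the slowest process follow the rank or the host?"""
    if len(runs) < 2:
        return "inconclusive"  # need at least two different mappings
    rank_counts: Counter = Counter()
    host_counts: Counter = Counter()
    for run in runs:
        rank, host = slowest(run)
        rank_counts[rank] += 1
        host_counts[host] += 1
    same_rank = rank_counts.most_common(1)[0][1] == len(runs)
    same_host = host_counts.most_common(1)[0][1] == len(runs)
    if same_rank and not same_host:
        return "follows rank: likely the algorithm"
    if same_host and not same_rank:
        return "follows host: likely the node"
    return "inconclusive"

# Two runs with ranks 0 and 1 swapped between nodeA and nodeB.
runs = [
    ({0: "nodeA", 1: "nodeB"}, {0: 1.0, 1: 3.0}),
    ({0: "nodeB", 1: "nodeA"}, {0: 3.1, 1: 1.0}),
]
print(attribute_slowness(runs))  # follows host: likely the node
```

Repeating the same mapping twice keeps both the rank and the host fixed, so the sketch reports "inconclusive". The mappings have to actually differ between runs.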

gvallee · 2020-12-18T15:34:46Z

Maybe a separate issue needs to be opened about the following point: Would a plot of rank/machine name number vs. data size vs. execution time and/or late arrival timing be more helpful? This is a 3D plot, but one dimension could be rendered with color. Putting all the calls on the same plot would show the range of behaviour.

gvallee · 2020-12-18T15:36:50Z

About the following point:

I have just realised that repeating the experiment with different ranks assigned to different machines might reveal which poor performance is caused by the algorithm of the code under test (it should follow the rank if the code is deterministic) and which is caused by a poorly performing compute node (because it has dodgy hardware or other work running on it).

This is a great point, and the tool could certainly help, but I believe it also depends greatly on the application and what it does. So maybe we should open a separate issue and think more deeply about it.

gvallee added the documentation Improvements or additions to documentation label Dec 18, 2020
