Meaning of items in legend of plots called profiler_rank0_callxxx.png #83

Open
cjlegg opened this issue Nov 13, 2020 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@cjlegg
Collaborator

cjlegg commented Nov 13, 2020

The definitions of the four times in the legends need to be documented somewhere. Actually, I now realise that "data sent" and "data received" are sizes, not times! Rename them to "amount of data sent (bytes)" and "amount of data received (bytes)"; or, if it is just one frame of data being sent, "size of data sent (bytes)" and "size of data received (bytes)" might be better.

The meaning of execution time is not clear: does it mean the duration of the MPI_Alltoallv call as seen from the program that calls it?
The meaning of late arrival timing is not explained at all: which events is it calculated from, and how?

How is the bandwidth calculated?

Would a plot of rank/machine name or number vs data size vs execution time and/or late arrival timing be more helpful? This is a 3D plot, but one dimension could be rendered with color. Putting all the calls on the same plot would show the range of behaviour.
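
To make the suggestion concrete, here is a minimal sketch of such a combined plot, assuming the per-call samples are available as (rank, bytes sent, execution time) tuples. The data layout, the names `flatten_calls` and `plot_calls`, and the sample values are all hypothetical and do not reflect what the profiler writes today; matplotlib is only needed for the plotting step.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: call id -> list of (rank, bytes_sent, exec_time_s).
Sample = Tuple[int, int, float]


def flatten_calls(calls: Dict[int, List[Sample]]) -> Tuple[List[int], List[int], List[float]]:
    """Merge every call into three parallel lists: ranks, sizes, times."""
    ranks: List[int] = []
    sizes: List[int] = []
    times: List[float] = []
    for call_id in sorted(calls):
        for rank, size, exec_time in calls[call_id]:
            ranks.append(rank)
            sizes.append(size)
            times.append(exec_time)
    return ranks, sizes, times


def plot_calls(calls: Dict[int, List[Sample]], path: str = "all_calls.png") -> None:
    """Scatter rank vs. data size, with execution time rendered as color."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    ranks, sizes, times = flatten_calls(calls)
    fig, ax = plt.subplots()
    points = ax.scatter(ranks, sizes, c=times, cmap="viridis")
    fig.colorbar(points, ax=ax, label="execution time (s)")
    ax.set_xlabel("rank")
    ax.set_ylabel("data sent (bytes)")
    fig.savefig(path)
    plt.close(fig)


# Made-up values, purely for illustration.
example_calls = {
    0: [(0, 1024, 0.010), (1, 2048, 0.012)],
    1: [(0, 1024, 0.011), (1, 4096, 0.030)],
}
```

Calling `plot_calls(example_calls)` would write one figure covering all calls; the color axis could equally carry late arrival timing instead of execution time.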

I have just realised that repeating the experiment with different ranks assigned to different machines might reveal which poor performance is caused by the algorithm of the code under test (it should follow the rank if the code is deterministic) and which by a poorly performing compute node (because it has dodgy hardware or because it has other stuff running on it).
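
A rough sketch of how that comparison could work, assuming two runs with different rank-to-node mappings and one representative time per rank. The 1.5x-median threshold, the function names, and the sample data are assumptions made for illustration; none of this is part of the tool.

```python
from statistics import median
from typing import Dict, Set, Tuple

# Hypothetical layout for one run: rank -> (node name, time in seconds).
Run = Dict[int, Tuple[str, float]]


def slow_ranks(run: Run, factor: float = 1.5) -> Set[int]:
    """Ranks whose time exceeds `factor` times the median of the run."""
    cutoff = factor * median(t for _, t in run.values())
    return {rank for rank, (_, t) in run.items() if t > cutoff}


def classify(run_a: Run, run_b: Run, factor: float = 1.5) -> Tuple[Set[int], Set[str]]:
    """Return (ranks slow on both nodes, nodes slow under both mappings)."""
    slow_a = slow_ranks(run_a, factor)
    slow_b = slow_ranks(run_b, factor)
    # Slowness that follows the rank across nodes points at the code under test.
    rank_bound = {r for r in slow_a & slow_b if run_a[r][0] != run_b[r][0]}
    # Slowness that stays with the node, whichever rank it hosts, points at the node.
    nodes_a = {run_a[r][0] for r in slow_a - rank_bound}
    nodes_b = {run_b[r][0] for r in slow_b - rank_bound}
    return rank_bound, nodes_a & nodes_b


# Made-up data: rank 2 is slow in both runs, node n3 is slow in both runs.
run_a: Run = {0: ("n0", 1.0), 1: ("n1", 1.0), 2: ("n2", 3.0),
              3: ("n3", 2.5), 4: ("n4", 1.0), 5: ("n5", 1.0)}
run_b: Run = {0: ("n2", 1.0), 1: ("n3", 2.5), 2: ("n4", 3.0),
              3: ("n5", 1.0), 4: ("n0", 1.0), 5: ("n1", 1.0)}

rank_bound, node_bound = classify(run_a, run_b)
```

With this data, rank 2 is attributed to the code and node n3 to the machine. Real runs would need more than two mappings to separate the two causes when a slow rank happens to land on a slow node.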

@gvallee gvallee added the documentation Improvements or additions to documentation label Dec 18, 2020
@gvallee
Owner

gvallee commented Dec 18, 2020

Maybe a separate issue needs to be opened about the following point: Would a plot of rank/machine name or number vs data size vs execution time and/or late arrival timing be more helpful? This is a 3D plot, but one dimension could be rendered with color. Putting all the calls on the same plot would show the range of behaviour.

@gvallee
Owner

gvallee commented Dec 18, 2020

About the following point:

I have just realised that repeating the experiment with different ranks assigned to different machines might reveal which poor performance is caused by the algorithm of the code under test (it should follow the rank if the code is deterministic) and which by a poorly performing compute node (because it has dodgy hardware or because it has other stuff running on it).

This is a great point, and the tool could certainly help, but I believe it also depends greatly on the application and what it does. So maybe we should open a separate issue and think more deeply about it.
