UCT/CUDA: Update cuda_copy perf estimates for Grace-Hopper #10155
base: master
@@ -69,7 +69,12 @@ typedef struct uct_cuda_copy_iface {
     struct {
         unsigned max_poll;
         unsigned max_cuda_events;
-        double bandwidth;
+        struct {
+            double h2d;
+            double d2h;
+            double d2d;
+            double other;
+        } bw;
     } config;
     /* handler to support arm/wakeup feature */
     struct {
@@ -87,7 +92,12 @@ typedef struct uct_cuda_copy_iface_config {
     uct_iface_config_t super;
     unsigned max_poll;
     unsigned max_cuda_events;
-    double bandwidth;
+    struct {
+        double h2d;
+        double d2h;
+        double d2d;
+        double other;
+    } bw;
 } uct_cuda_copy_iface_config_t;

@SeyedMir why not use bw[UCS_MEMORY_TYPE_LAST][UCS_MEMORY_TYPE_LAST] and avoid explicit fields for each direction? This way we don't have to introduce a dedicated field per direction, and can replace
{"h2d", "host to device bandwidth",
 ucs_offsetof(uct_cuda_copy_iface_config_t, bw.h2d)},
...
with
{"h2d", "host to device bandwidth",
 ucs_offsetof(uct_cuda_copy_iface_config_t, bw[UCS_MEMORY_TYPE_UNKNOWN][UCS_MEMORY_TYPE_CUDA])},

I actually thought about doing it that way initially. But then I realized that the bw matrix would have entries for memory types that are completely irrelevant to CUDA, and there would be many more entries than the four in this struct. Having said that, I'm not strongly against using the matrix.
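
To make the trade-off in this thread concrete, here is a minimal sketch of the two layouts under discussion. It is not the code in this PR: `mem_type_t` is a hypothetical, shortened stand-in for UCX's memory type enum, and `bw_fields_t`, `bw_matrix_t`, `bw_from_fields` and `bw_from_matrix` are illustrative names invented for this example.

```c
#include <assert.h>

/* Hypothetical stand-in for the UCX memory type enum (assumption: the real
 * enum has more entries, which is exactly the concern raised in the reply). */
typedef enum {
    MEM_TYPE_HOST,
    MEM_TYPE_CUDA,
    MEM_TYPE_CUDA_MANAGED,
    MEM_TYPE_ROCM,
    MEM_TYPE_LAST
} mem_type_t;

/* Layout A: one explicit field per direction, as in the diff above. */
typedef struct {
    double h2d;
    double d2h;
    double d2d;
    double other;
} bw_fields_t;

/* Layout B: a full source-by-destination matrix, as the reviewer suggests. */
typedef struct {
    double bw[MEM_TYPE_LAST][MEM_TYPE_LAST];
} bw_matrix_t;

/* Layout A needs explicit branching to map a (src, dst) pair to a field. */
static double bw_from_fields(const bw_fields_t *cfg, mem_type_t src,
                             mem_type_t dst)
{
    assert((src < MEM_TYPE_LAST) && (dst < MEM_TYPE_LAST));
    if ((src == MEM_TYPE_HOST) && (dst == MEM_TYPE_CUDA)) {
        return cfg->h2d;
    }
    if ((src == MEM_TYPE_CUDA) && (dst == MEM_TYPE_HOST)) {
        return cfg->d2h;
    }
    if ((src == MEM_TYPE_CUDA) && (dst == MEM_TYPE_CUDA)) {
        return cfg->d2d;
    }
    return cfg->other;
}

/* Layout B is a direct lookup, but every (src, dst) slot must be filled,
 * including pairs that the CUDA copy transport never handles. */
static double bw_from_matrix(const bw_matrix_t *cfg, mem_type_t src,
                             mem_type_t dst)
{
    assert((src < MEM_TYPE_LAST) && (dst < MEM_TYPE_LAST));
    return cfg->bw[src][dst];
}
```

With this four-entry stand-in enum the matrix already holds 16 values against the struct's four, which illustrates the reply's point; the lookup in layout B is simpler in exchange.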
Why is the bcopy BW slower than the zcopy one? BTW, `11660 * 0.95` is not equal to `9320`. Maybe we need to introduce two different env variables, like `BCOPY_BW` and `ZCOPY_BW`, to control these values accurately. Or, if we are OK with changing performance in the common case, maybe it is better not to distinguish bcopy/zcopy perf at all and set one value in both cases?

It's actually not zcopy vs. bcopy; it's zcopy vs. short. Unlike zcopy, put/get short operations invoke `cuStreamSynchronize` per operation. Therefore, we want to advertise a slightly lower bw for short than for zcopy operations in cuda_copy.

I'm not sure what 9320 represents. Why do you want it to be equal to 9320?
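
For reference, the arithmetic behind the numbers in this exchange can be sketched as follows. This is only an illustration, assuming the short-operation estimate is the zcopy estimate scaled by the 0.95 factor quoted in the review comment; `SHORT_BW_FACTOR` and `short_bw_estimate` are hypothetical names, not identifiers from the PR.

```c
#include <assert.h>

/* Assumed derating factor, taken from the "* 0.95" quoted in the review. */
#define SHORT_BW_FACTOR 0.95

/* Returns the bandwidth advertised for short operations, given the zcopy
 * estimate. The result is in the same unit as the input. */
static double short_bw_estimate(double zcopy_bw)
{
    assert(zcopy_bw >= 0.0);
    return zcopy_bw * SHORT_BW_FACTOR;
}
```

Under that assumption, `short_bw_estimate(11660.0)` is about 11077, a 5% reduction relative to zcopy, and not the 9320 mentioned in the question.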

Thanks for the explanation. I am wondering whether we need to introduce this difference this way, because any change to performance estimation without proper performance testing can lead to unforeseen degradation in some cases. So if we want to tune performance for GH systems only, I would like to leave performance for other platforms untouched. Or, if we are OK with changing performance on all platforms in this PR, I wonder whether this 5% difference really matters, or whether we can follow the KISS principle and set the same value for both the zcopy and short cases.
@brminich @yosefe WDYT?