A bunch of questions in cufinufft with respect to ntransf
and streams
#323
-
I was curious about a bunch of questions (in cufinufft) concerning this repository that could help with performance:
Replies: 2 comments 7 replies
-
Dear Chaithya,
-
Regardless, we'll be looking into general optimizations in the near future, and I'll investigate how best to utilize streams.
You're right, this is a pedantic call and unnecessary. The cost should be on the order of nanoseconds, if I recall correctly. I'll remove it in my optimization pass.
This will require some thought. By IO tasks, do you mean moving data to the GPU, or loading it from some data store? In the latter case, you can straightforwardly thread that off yourself. In the former, I'm skeptical this will help by more than a few percent, but I'm open to having my mind changed: transfers are typically a tiny fraction of the computation time. Do you have some rough numbers for timings, and/or what type of data are you working with? Is the GPU reporting any kind of underutilization? Are you using the Python or C/C++ API?
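For the "loading it from some data store" case, here is a minimal sketch of threading the reads off yourself so the next batch loads while the current one computes. `load_batch` and `run_transform` are hypothetical stand-ins (not part of cufinufft); you'd replace them with your actual reader and your actual transform call.

```python
# Prefetch the next batch on a background thread so disk/network I/O
# overlaps with the current computation.
from concurrent.futures import ThreadPoolExecutor

def load_batch(i):
    # Hypothetical stand-in for reading batch i from a data store.
    return [i] * 4

def run_transform(batch):
    # Hypothetical stand-in for the GPU transform on this batch.
    return sum(batch)

def pipeline(n_batches):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load_batch, 0)              # prefetch batch 0
        for i in range(n_batches):
            batch = future.result()                      # wait for batch i
            if i + 1 < n_batches:
                future = pool.submit(load_batch, i + 1)  # prefetch batch i+1
            results.append(run_transform(batch))         # overlaps the load
    return results

print(pipeline(3))  # [0, 4, 8]
```

One worker thread is enough here: it keeps exactly one batch in flight, which bounds memory while still hiding the load latency behind the compute.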
Launching on multiple streams is generally helpful for overlapping host <==> device transfers with kernel execution, which is on the agenda of improvements to consider. Streams can also be used for concurrent kernel execution, but unless you're working with very small transforms (where you probably shouldn't be using cufinufft, or even finufft, for…) it's unlikely that concurrent kernel execution will be particularly helpful. In most c…
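The "few percent" estimate above can be made concrete with a back-of-envelope model. This is a sketch under assumed timings: `t_copy` and `t_kernel` below are made-up illustrative numbers, not cufinufft measurements. With two streams and double buffering, each pipelined stage costs `max(t_copy, t_kernel)` instead of their sum, so when transfers are ~2% of kernel time the best-case speedup from overlap is only ~2%.

```python
# Upper bound on the benefit of overlapping copies with kernels.

def serial_time(n_batches, t_copy, t_kernel):
    """No overlap: every batch copies, then computes."""
    return n_batches * (t_copy + t_kernel)

def pipelined_time(n_batches, t_copy, t_kernel):
    """Copies overlap kernels (double buffering): after the first copy,
    each stage takes max(t_copy, t_kernel)."""
    return t_copy + n_batches * max(t_copy, t_kernel)

# Assumed (not measured) timings: transfers are 2% of kernel time.
n = 100
t_copy, t_kernel = 0.02, 1.0
speedup = serial_time(n, t_copy, t_kernel) / pipelined_time(n, t_copy, t_kernel)
print(f"speedup from overlap: {speedup:.3f}x")  # ~1.02x
```

The model also shows when overlap *does* pay off: only when `t_copy` approaches `t_kernel` does the pipelined time drop toward half the serial time.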