Replies: 1 comment
-
One big problem with time measurement in this particular case is that you may be measuring not only the zeroing but also the kernel's preparation of the memory. Assuming that you just obtained a large area of memory via, say, [...] You can convince yourself of this behaviour by [...]

So if you want to measure the performance of zeroing, it's probably best to do one zeroing pass as a warm-up and then zero the array again, measuring only the second pass.

It is possible to influence when memory is taken from the kernel and when it is given back via functions like [...]
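The warm-up-then-measure advice can be sketched roughly as follows. This is only an illustration: `measure_zeroing`, `ZeroTimings` and the use of `std::malloc` plus `std::memset` are my assumptions, not something taken from FINUFFT. The zeroing is called through a volatile function pointer so that an optimizing compiler cannot merge or drop either pass.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: zero n bytes starting at p.
static void zero_bytes(void* p, std::size_t n) { std::memset(p, 0, n); }

// Calling through a volatile function pointer keeps the optimizer from
// fusing malloc + memset or removing the second (redundant) pass.
static void (*volatile zero_fn)(void*, std::size_t) = zero_bytes;

// Time a single zeroing pass over buf, in seconds.
static double time_zero_pass(void* buf, std::size_t nbytes) {
    assert(buf != nullptr);
    auto t0 = std::chrono::steady_clock::now();
    zero_fn(buf, nbytes);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

struct ZeroTimings {
    double first_pass;   // may include the kernel preparing the pages
    double second_pass;  // pages already touched: closer to pure zeroing
    bool ok;             // allocation succeeded and the buffer reads as zero
};

ZeroTimings measure_zeroing(std::size_t nbytes) {
    ZeroTimings r{0.0, 0.0, false};
    if (nbytes == 0) return r;
    void* buf = std::malloc(nbytes);
    if (!buf) return r;
    r.first_pass = time_zero_pass(buf, nbytes);   // warm-up pass
    r.second_pass = time_zero_pass(buf, nbytes);  // the pass to report
    const unsigned char* p = static_cast<const unsigned char*>(buf);
    r.ok = (p[0] == 0 && p[nbytes - 1] == 0);
    std::free(buf);
    return r;
}
```

Calling `measure_zeroing` with a buffer size close to the real array and comparing `first_pass` with `second_pass` gives a rough idea of how much of the measured time is page preparation rather than zeroing. The numbers will depend on the allocator and operating system.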
-
Zero the output array - best way? (1.5 s for 200M array on AMD laptop, too slow, 10% of a 1e8 1e8 1d1 transform @1e-6)
Switch to calloc in makeplan, e.g. in finufft.cpp:
finufft/src/finufft.cpp, line 716 in cc8629f
But we use fftw_alloc_complex()... so we would need to use alignas, e.g. 64 bytes = AVX-512 width.
Would need to add a flag "fwBatch_is_zeroed" to the plan, zero fwBatch if this flag is false (at the start of each batch inside the execute), and set it to false after each batch in the execute.
Remove the zeroing loop from spreadinterp.cpp.
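A minimal sketch of the flag idea, under stated assumptions: `PlanSketch`, `makeplan_sketch`, `execute_batch_sketch` and `destroy_sketch` are hypothetical stand-ins, not the real FINUFFT plan or API, and the "spreading" is replaced by a dummy accumulation. Plain `std::calloc` is used for brevity, so this ignores the alignment that fftw_alloc_complex() provides.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdlib>

using cplx = std::complex<double>;

// Hypothetical stand-in for the plan: only the fields this sketch needs.
struct PlanSketch {
    cplx* fwBatch = nullptr;
    std::size_t n = 0;
    bool fwBatch_is_zeroed = false;
};

// Allocate fwBatch with calloc, so it starts out zeroed.
bool makeplan_sketch(PlanSketch& p, std::size_t n) {
    p.fwBatch = static_cast<cplx*>(std::calloc(n, sizeof(cplx)));
    if (!p.fwBatch) return false;
    p.n = n;
    p.fwBatch_is_zeroed = true;  // no zeroing needed for the first batch
    return true;
}

// One batch: zero only if a previous batch has dirtied the array.
void execute_batch_sketch(PlanSketch& p) {
    assert(p.fwBatch != nullptr);
    if (!p.fwBatch_is_zeroed)
        std::fill(p.fwBatch, p.fwBatch + p.n, cplx(0.0, 0.0));
    // Dummy stand-in for spreading, which accumulates into fwBatch.
    for (std::size_t i = 0; i < p.n; ++i) p.fwBatch[i] += cplx(1.0, 0.0);
    p.fwBatch_is_zeroed = false;  // the array is dirty after every batch
}

void destroy_sketch(PlanSketch& p) {
    std::free(p.fwBatch);
    p.fwBatch = nullptr;
    p.n = 0;
    p.fwBatch_is_zeroed = false;
}
```

With this arrangement only the first batch after makeplan skips the explicit zeroing, so the saving matters most when a plan is executed once or with few batches.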
First, benchmark calloc vs naive zeroing, etc...
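One way such a benchmark might look, as a rough sketch only: `bench_zeroing` and `ZeroBench` are hypothetical names, and the dummy accumulation pass is my assumption for what follows the zeroing. Because calloc can hand back memory that the kernel zeroes lazily, timing the allocation alone would be misleading, so each path is timed through its first write pass.

```cpp
#include <cassert>
#include <chrono>
#include <complex>
#include <cstddef>
#include <cstdlib>

using cplx = std::complex<double>;
using clk = std::chrono::steady_clock;

static double seconds_since(clk::time_point t0) {
    return std::chrono::duration<double>(clk::now() - t0).count();
}

// Naive zeroing loop.
static void zero_loop(cplx* a, std::size_t n) {
    assert(a != nullptr);
    for (std::size_t i = 0; i < n; ++i) a[i] = cplx(0.0, 0.0);
}

// Dummy write pass standing in for the work that follows the zeroing.
static void accumulate_pass(cplx* a, std::size_t n) {
    assert(a != nullptr);
    for (std::size_t i = 0; i < n; ++i) a[i] += cplx(1.0, 0.0);
}

// Volatile function pointers stop the optimizer from fusing or removing steps.
static void (*volatile zero_loop_fn)(cplx*, std::size_t) = zero_loop;
static void (*volatile accumulate_fn)(cplx*, std::size_t) = accumulate_pass;

struct ZeroBench {
    double calloc_path;  // calloc + first write pass, seconds
    double loop_path;    // malloc + zeroing loop + first write pass, seconds
    bool ok;             // both paths produced the expected values
};

ZeroBench bench_zeroing(std::size_t n) {
    ZeroBench r{0.0, 0.0, false};
    if (n == 0) return r;

    auto t0 = clk::now();
    cplx* a = static_cast<cplx*>(std::calloc(n, sizeof(cplx)));
    if (!a) return r;
    accumulate_fn(a, n);
    r.calloc_path = seconds_since(t0);
    bool ok_a = (a[0] == cplx(1.0, 0.0) && a[n - 1] == cplx(1.0, 0.0));
    std::free(a);

    t0 = clk::now();
    cplx* b = static_cast<cplx*>(std::malloc(n * sizeof(cplx)));
    if (!b) return r;
    zero_loop_fn(b, n);
    accumulate_fn(b, n);
    r.loop_path = seconds_since(t0);
    bool ok_b = (b[0] == cplx(1.0, 0.0) && b[n - 1] == cplx(1.0, 0.0));
    std::free(b);

    r.ok = ok_a && ok_b;
    return r;
}
```

The allocator may reuse freed memory between the two paths, so running each path in a fresh process, or swapping their order, is a sensible cross-check before drawing conclusions.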