Follow-up to the big reorg PR #584
Some explanatory comments
One more thing: I found a way of bypassing the FFTW memory handling functions entirely, so that we can work exclusively with

template<typename ElementType, std::size_t ALIGNMENT_IN_BYTES=64>
class AlignedAllocator
{
private:
    static_assert(
        ALIGNMENT_IN_BYTES >= alignof(ElementType),
        "Beware that types like int have minimum alignment requirements "
        "or access will result in crashes."
    );

public:
    using value_type = ElementType;
    static std::align_val_t constexpr ALIGNMENT{ ALIGNMENT_IN_BYTES };

    /**
     * This is only necessary because AlignedAllocator has a second template
     * argument for the alignment that will make the default
     * std::allocator_traits implementation fail during compilation.
     * @see https://stackoverflow.com/a/48062758/2191065
     */
    template<class OtherElementType>
    struct rebind
    {
        using other = AlignedAllocator<OtherElementType, ALIGNMENT_IN_BYTES>;
    };

public:
    constexpr AlignedAllocator() noexcept = default;
    constexpr AlignedAllocator( const AlignedAllocator& ) noexcept = default;

    template<typename U>
    constexpr AlignedAllocator( AlignedAllocator<U, ALIGNMENT_IN_BYTES> const& ) noexcept
    {}

    [[nodiscard]] ElementType*
    allocate( std::size_t nElementsToAllocate )
    {
        if ( nElementsToAllocate
             > std::numeric_limits<std::size_t>::max() / sizeof( ElementType ) ) {
            throw std::bad_array_new_length();
        }
        auto const nBytesToAllocate = nElementsToAllocate * sizeof( ElementType );
        return reinterpret_cast<ElementType*>(
            ::operator new[]( nBytesToAllocate, ALIGNMENT ) );
    }

    void deallocate( ElementType* allocatedPointer,
                     [[maybe_unused]] std::size_t nBytesAllocated )
    {
        /* According to the C++20 draft n4868 § 17.6.3.3, the delete operator
         * must be called with the same alignment argument as the new expression.
         * The size argument can be omitted but if present must also be equal to
         * the one used in new. */
        ::operator delete[]( allocatedPointer, ALIGNMENT );
    }
};

The code comes from https://stackoverflow.com/questions/60169819/modern-approach-to-making-stdvector-allocate-aligned-memory. Do you think this is worth introducing? It's more code, but it removes every |
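To make the proposal concrete, here is a minimal sketch of how an allocator of this kind plugs into std::vector. It assumes C++17 (aligned operator new); the names MinimalAlignedAllocator, AlignedVector, isAligned and makeAlignedBuffer are hypothetical and do not come from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical, stripped-down variant of the allocator quoted above (C++17).
template<typename T, std::size_t AlignmentInBytes = 64>
struct MinimalAlignedAllocator
{
    static_assert( AlignmentInBytes >= alignof( T ), "alignment too small for T" );
    using value_type = T;

    // Required because the non-type template parameter defeats the default
    // rebind logic of std::allocator_traits.
    template<typename U>
    struct rebind { using other = MinimalAlignedAllocator<U, AlignmentInBytes>; };

    MinimalAlignedAllocator() noexcept = default;
    template<typename U>
    MinimalAlignedAllocator( MinimalAlignedAllocator<U, AlignmentInBytes> const& ) noexcept {}

    [[nodiscard]] T* allocate( std::size_t n )
    {
        if ( n > std::numeric_limits<std::size_t>::max() / sizeof( T ) ) {
            throw std::bad_array_new_length();
        }
        return static_cast<T*>(
            ::operator new[]( n * sizeof( T ), std::align_val_t{ AlignmentInBytes } ) );
    }

    void deallocate( T* p, std::size_t ) noexcept
    {
        // Must pass the same alignment that was used for allocation.
        ::operator delete[]( p, std::align_val_t{ AlignmentInBytes } );
    }
};

// The allocator is stateless, so all instances compare equal.
template<typename T, typename U, std::size_t A>
bool operator==( MinimalAlignedAllocator<T, A> const&,
                 MinimalAlignedAllocator<U, A> const& ) noexcept { return true; }
template<typename T, typename U, std::size_t A>
bool operator!=( MinimalAlignedAllocator<T, A> const&,
                 MinimalAlignedAllocator<U, A> const& ) noexcept { return false; }

template<typename T>
using AlignedVector = std::vector<T, MinimalAlignedAllocator<T, 64>>;

// Returns true if the address is a multiple of the given alignment.
inline bool isAligned( void const* p, std::size_t alignment )
{
    return reinterpret_cast<std::uintptr_t>( p ) % alignment == 0;
}

// Creates a value-initialized buffer and checks its alignment in debug builds.
template<typename T>
AlignedVector<T> makeAlignedBuffer( std::size_t n )
{
    AlignedVector<T> buffer( n );
    assert( isAligned( buffer.data(), 64 ) );
    return buffer;
}
```

With this in place, a call such as makeAlignedBuffer<double>( 1000 ) yields a vector whose data() pointer is 64-byte aligned, and the vector releases the memory itself, so no separate free call is needed.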
why not |
That would be a no-op, because every data type is automatically aligned to at least The point is that FFTW wants "overaligned" pointers. They point to, say,
That would align the |
I see, then I suggest using xsimd aligned allocator since it is already a dependency: |
Thanks a lot, I wasn't aware of this class! This makes everything even easier.
OK, it seems that we need to add the include path for |
Sorry, I don't think I know enough |
Can I push to this branch with cmake changes? I might have time on Tue/Wed.
Absolutely, please do!
Ah, perhaps I managed to fix it. To add the include directory, you have to tell |
That works. Linking xsimd to the target is the trick.
Not sure if this PR is the right place, but we could take the opportunity to align all the data that can benefit from it now. We could use an aligned allocator in the std containers introduced here. In a future PR, I will also sweep the GPU side to align the data transferred to the GPU, as aligned PCIe sends/receives are also faster.
It is certainly possible to use this for aligning more |
Yes, it is possible to define an |
It is not a bad idea to have an allocator inside the plan. One can experiment with caching allocators, aligned allocators and so on to tweak performance. In C++ it would not be wild to support passing an allocator to finufft; for example, a bigger project or an exotic architecture might have its own allocator and could pass it to finufft.
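As a rough illustration of the idea of an allocator inside the plan, here is a sketch of a plan-like class that takes its allocator as a template parameter. SketchPlan and CountingAllocator are hypothetical names invented for this example; the real finufft plan is laid out differently and offers no such interface in this PR.

```cpp
#include <complex>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical plan-like class: the work grid is owned by a std::vector whose
// allocator is chosen by the caller (default: std::allocator).
template<typename T, typename Allocator = std::allocator<std::complex<T>>>
class SketchPlan
{
public:
    explicit SketchPlan( std::size_t nGridPoints, Allocator const& allocator = Allocator() )
        : m_grid( nGridPoints, std::complex<T>{}, allocator )
    {}

    std::complex<T>* grid() noexcept { return m_grid.data(); }
    std::size_t size() const noexcept { return m_grid.size(); }

private:
    std::vector<std::complex<T>, Allocator> m_grid;
};

// Hypothetical user-supplied allocator that counts allocation calls; it stands
// in for a caching or aligned allocator a larger project might bring along.
template<typename U>
struct CountingAllocator
{
    using value_type = U;
    std::size_t* counter = nullptr;

    CountingAllocator() = default;
    explicit CountingAllocator( std::size_t* c ) noexcept : counter( c ) {}
    template<typename V>
    CountingAllocator( CountingAllocator<V> const& other ) noexcept : counter( other.counter ) {}

    U* allocate( std::size_t n )
    {
        if ( counter != nullptr ) { ++*counter; }
        return std::allocator<U>().allocate( n );
    }
    void deallocate( U* p, std::size_t n ) noexcept { std::allocator<U>().deallocate( p, n ); }
};

template<typename U, typename V>
bool operator==( CountingAllocator<U> const& a, CountingAllocator<V> const& b ) noexcept
{ return a.counter == b.counter; }
template<typename U, typename V>
bool operator!=( CountingAllocator<U> const& a, CountingAllocator<V> const& b ) noexcept
{ return !( a == b ); }
```

A caller would write SketchPlan<double, CountingAllocator<std::complex<double>>> and hand in its own allocator instance, while SketchPlan<double> keeps the default behaviour. Making the allocator a template parameter keeps the choice at compile time; a runtime-polymorphic alternative would be a different design.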
Just to clarify: this only happens if you create a plan with aligned buffers, and then execute it (using the guru interface) on less aligned data. Otherwise FFTW will deal with "standard" aligned data, just perhaps not quite as efficiently.
Looks good to me.
Ready to merge from my side.
Hi Martin - thanks! Took a while to go through it all (partly to understand various Class features).
My summary of your changes is in the google-doc.
There are a couple of questions and 1-2 very minor changes - should take you 15 mins. Then it can definitely come in.
Thanks so much, Alex
CPX a = (iflag > 0) ? exp(IMA * x[j]) : exp(-IMA * x[j]);
CPX p = pow(a, (FLT)kmin); // starting phase for most neg freq
CPX cc = c[j]; // no 1/nj prefac
std::complex<T> a = (iflag > 0) ? exp(std::complex<T>(0, 1) * x[j]) |
Can a definition be added so that I is available as std::complex(0,1) of the right T? We don't want to have to type this each time I is needed :)
We can, but actually I suggest moving to std::polar(a,b) instead at some point (which computes a*exp(i*b)). Having a templated constant I doesn't look very nice either ...
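For comparison, the two options under discussion can be sketched side by side. This is only an illustration: the variable template IMA<T> and the function names phaseViaExp, phaseViaPolar and checkPhasesAgree are hypothetical and are not definitions from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical templated constant for the imaginary unit in precision T.
template<typename T>
constexpr std::complex<T> IMA{ T( 0 ), T( 1 ) };

// Phase factor exp(+i*x) or exp(-i*x), written with the templated constant.
template<typename T>
std::complex<T> phaseViaExp( T x, int iflag )
{
    return ( iflag > 0 ) ? std::exp( IMA<T> * x ) : std::exp( -IMA<T> * x );
}

// Same phase factor via std::polar(r, theta), which returns r * exp(i*theta).
template<typename T>
std::complex<T> phaseViaPolar( T x, int iflag )
{
    return std::polar( T( 1 ), ( iflag > 0 ) ? x : -x );
}

// Debug check that both formulations agree up to the given tolerance.
template<typename T>
void checkPhasesAgree( T x, T tolerance )
{
    assert( std::abs( phaseViaExp( x, +1 ) - phaseViaPolar( x, +1 ) ) < tolerance );
    assert( std::abs( phaseViaExp( x, -1 ) - phaseViaPolar( x, -1 ) ) < tolerance );
}
```

Both functions return the same value up to rounding; the std::polar form avoids spelling out the imaginary unit at all, while the constant keeps the code closer to the formula exp(±i x).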
Well, I happen to like "I" or "IMA" or some template for 0+1i. It does not appear often enough to fight about it :)
For some reason my comment about
|
Thanks - happy to merge.
New attempt because of unexpected breakage