Follow-up to the big reorg PR #584

mreineck · 2024-10-22T17:24:13Z

New attempt because of unexpected breakage

mreineck

Some explanatory comments

CMakeLists.txt

include/finufft/utils.h

src/spreadinterp.cpp

include/finufft/finufft_core.h

mreineck · 2024-10-26T10:15:55Z

One more thing: I found a way of bypassing the FFTW memory handling functions entirely, so that we can work exclusively with vectors inside FINUFFT. It's pretty nice, but it requires the addition of an aligned allocator class, which looks like this:

#include <cstddef>  // std::size_t
#include <limits>   // std::numeric_limits
#include <new>      // std::align_val_t, std::bad_array_new_length

template<typename ElementType, std::size_t ALIGNMENT_IN_BYTES = 64>
class AlignedAllocator
{
private:
  static_assert(
    ALIGNMENT_IN_BYTES >= alignof(ElementType),
    "Beware that types like int have minimum alignment requirements "
    "or access will result in crashes."
  );

public:
  using value_type = ElementType;
  static std::align_val_t constexpr ALIGNMENT{ ALIGNMENT_IN_BYTES };

  /**
   * This is only necessary because AlignedAllocator has a second template
   * argument for the alignment that will make the default
   * std::allocator_traits implementation fail during compilation.
   * @see https://stackoverflow.com/a/48062758/2191065
   */
  template<class OtherElementType>
  struct rebind
  {
    using other = AlignedAllocator<OtherElementType, ALIGNMENT_IN_BYTES>;
  };

public:
  constexpr AlignedAllocator() noexcept = default;
  constexpr AlignedAllocator( const AlignedAllocator& ) noexcept = default;

  template<typename U>
  constexpr AlignedAllocator( AlignedAllocator<U, ALIGNMENT_IN_BYTES> const& ) noexcept
  {}

  [[nodiscard]] ElementType*
  allocate( std::size_t nElementsToAllocate )
  {
    // Guard against overflow in the byte count computed below.
    if ( nElementsToAllocate
         > std::numeric_limits<std::size_t>::max() / sizeof( ElementType ) ) {
      throw std::bad_array_new_length();
    }

    auto const nBytesToAllocate = nElementsToAllocate * sizeof( ElementType );
    return static_cast<ElementType*>(
      ::operator new[]( nBytesToAllocate, ALIGNMENT ) );
  }

  // The second argument is the element count that was passed to allocate().
  void deallocate( ElementType* allocatedPointer,
                   [[maybe_unused]] std::size_t nElementsAllocated )
  {
    /* According to the C++20 draft n4868 § 17.6.3.3, the delete operator
     * must be called with the same alignment argument as the new expression.
     * The size argument can be omitted but if present must also be equal to
     * the one used in new. */
    ::operator delete[]( allocatedPointer, ALIGNMENT );
  }
};

The code comes from https://stackoverflow.com/questions/60169819/modern-approach-to-making-stdvector-allocate-aligned-memory.

Do you think this is worth introducing? It's more code, but it removes every malloc/free/new/delete from finufft, as well as the need to check for null pointers returned from allocation functions.
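To make the intended usage concrete, here is a minimal sketch (C++17) of a vector whose data pointer ends up 64-byte aligned. The names `SketchAlignedAllocator`, `aligned_vector`, `is_aligned` and `demo_aligned_vector` are made up for this illustration and are not FINUFFT code; the allocator is a stripped-down variant of the class above, not a replacement for it.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical, stripped-down aligned allocator (illustration only, C++17).
template<typename T, std::size_t Align = 64>
struct SketchAlignedAllocator {
  static_assert(Align >= alignof(T), "alignment too small for T");
  using value_type = T;

  // Needed because of the non-type template parameter Align.
  template<typename U> struct rebind {
    using other = SketchAlignedAllocator<U, Align>;
  };

  SketchAlignedAllocator() noexcept = default;
  template<typename U>
  SketchAlignedAllocator(const SketchAlignedAllocator<U, Align> &) noexcept {}

  T *allocate(std::size_t n) {
    if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
      throw std::bad_array_new_length();
    return static_cast<T *>(::operator new(n * sizeof(T), std::align_val_t{Align}));
  }
  void deallocate(T *p, std::size_t) noexcept {
    ::operator delete(p, std::align_val_t{Align});
  }
};

// All instances are interchangeable, so they always compare equal.
template<typename T, typename U, std::size_t A>
bool operator==(const SketchAlignedAllocator<T, A> &,
                const SketchAlignedAllocator<U, A> &) noexcept { return true; }
template<typename T, typename U, std::size_t A>
bool operator!=(const SketchAlignedAllocator<T, A> &,
                const SketchAlignedAllocator<U, A> &) noexcept { return false; }

// Hypothetical alias: a std::vector with 64-byte aligned storage.
template<typename T>
using aligned_vector = std::vector<T, SketchAlignedAllocator<T>>;

// True if the address p is a multiple of alignment.
inline bool is_aligned(const void *p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Allocates a vector and reports whether its buffer is 64-byte aligned.
inline bool demo_aligned_vector() {
  aligned_vector<double> v(1000, 0.0);
  return is_aligned(v.data(), 64);
}
```

No null-pointer checks are needed in the calling code: if the allocation fails, `operator new` throws `std::bad_alloc`, and the vector releases its memory automatically when it goes out of scope.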

DiamonDinoia · 2024-10-27T13:51:52Z

ALIGNMENT_IN_BYTES

Why not ALIGNMENT_IN_BYTES=alignof(ElementType) as a member instead of a template parameter?
Also, is it possible to do alignas(alignment) std::vector<T> and pass the vector.data()?

mreineck · 2024-10-27T15:36:10Z

why not ALIGNMENT_IN_BYTES=alignof(ElementType) as a member instead of a template?

That would be a no-op, because every data type is automatically aligned to at least alignof(ElementType).

The point is that FFTW wants "overaligned" pointers. They point to, say, double, but should be aligned to the required alignment of an AVX datatype on an AVX-capable CPU, which is 32 bytes. If we require an alignment of 64 bytes, we automatically support everything up to AVX512.

Also, is it possible to do alignas(alignment) std::vector<T> and pass the vector.data()?

That would align the vector object itself, but not its internal data pointer.
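A small sketch of that distinction, assuming a compiler that supports over-aligned local variables (GCC and Clang do). The helper names `is_aligned`, `AlignmentReport` and `inspect_alignas_vector` are invented for this example. It reports the alignment of the vector object and of its heap buffer separately; only the former is controlled by `alignas`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// True if the address p is a multiple of alignment.
inline bool is_aligned(const void *p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

struct AlignmentReport {
  bool object_aligned;  // alignment of the std::vector object itself
  bool data_aligned;    // alignment of the heap buffer behind data()
};

inline AlignmentReport inspect_alignas_vector() {
  // alignas applies to the vector object (its pointer/size/capacity members) ...
  alignas(64) std::vector<double> v(1000);
  // ... while the buffer comes from std::allocator<double>, which only
  // promises the default alignment for new-allocated memory.
  return {is_aligned(&v, 64), is_aligned(v.data(), 64)};
}
```

`object_aligned` is true by construction, whereas `data_aligned` may come out either way depending on what the default allocator happens to return, so it cannot be relied upon.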

DiamonDinoia · 2024-10-27T17:03:02Z

I see,

then I suggest using xsimd aligned allocator since it is already a dependency: xsimd::aligned_allocator<T>

mreineck · 2024-10-27T18:32:19Z

Thanks a lot, I wasn't aware of this class! This makes everything even easier.

mreineck · 2024-10-27T18:42:47Z

OK, it seems that we need to add the include path for xsimd to the test codes.

mreineck · 2024-10-27T18:48:24Z

Sorry, I don't think I know enough cmake to do this on my own.

DiamonDinoia · 2024-10-27T22:01:31Z

Can I push to this branch with cmake changes? I might have time on Tue/Wed.

mreineck · 2024-10-28T07:05:25Z

Absolutely, please do!

mreineck · 2024-10-28T08:04:20Z

Ah, perhaps I managed to fix it. To add the include directory, you have to tell cmake to link the library ... not the most intuitive thing.

makefile

DiamonDinoia · 2024-10-29T13:27:29Z

Ah, perhaps I managed to fix it. To add the include directory, you have to tell cmake to link the library ... not the most intuitive thing.

That works. Linking xsimd to the target is the trick.

include/finufft/finufft_core.h

DiamonDinoia · 2024-10-29T13:31:49Z

Not sure if this PR is the right place, but we could take the opportunity to align all the data that can benefit from it now. We could use the aligned allocator in the std containers that were introduced. In a future PR, I will also do a sweep on the GPU side to align data to be transferred to the GPU, as aligned PCIe sends/receives are also faster.

mreineck · 2024-10-30T09:05:06Z

It is certainly possible to use this for aligning more vectors where needed, but I recommend checking whether it really makes a measurable difference. If you add a non-default allocator to a vector, this vector can no longer be passed to functions that expect, say, a simple const vector<T> &, and this can be quite a nuisance.
To be honest, I'm not sure how much the extra alignment helps FFTW in this context, especially since you typically use FFTW_ESTIMATE for planning. But I had to be extra cautious here, because I don't want to be accused of putting FFTW at a disadvantage in comparisons to the ducc FFT :-)
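To illustrate the nuisance, here is a rough sketch. `TaggedAllocator` is a hypothetical stand-in for any non-default allocator (it simply forwards to `std::allocator`), and the `sum_*` functions are invented for this example. The allocator is part of the vector's type, so a function taking `const std::vector<double> &` rejects the other vector at compile time; templating on the allocator or passing pointer plus size are two possible workarounds.

```cpp
#include <cstddef>
#include <memory>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical non-default allocator; behaves like std::allocator.
template<typename T>
struct TaggedAllocator {
  using value_type = T;
  TaggedAllocator() noexcept = default;
  template<typename U> TaggedAllocator(const TaggedAllocator<U> &) noexcept {}
  T *allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
  void deallocate(T *p, std::size_t n) noexcept { std::allocator<T>{}.deallocate(p, n); }
};
template<typename T, typename U>
bool operator==(const TaggedAllocator<T> &, const TaggedAllocator<U> &) noexcept { return true; }
template<typename T, typename U>
bool operator!=(const TaggedAllocator<T> &, const TaggedAllocator<U> &) noexcept { return false; }

// The two vector types are unrelated as far as the compiler is concerned.
static_assert(!std::is_same<std::vector<double>,
                            std::vector<double, TaggedAllocator<double>>>::value,
              "the allocator is part of the vector type");

// Accepts only vectors with the default allocator.
inline double sum_plain(const std::vector<double> &v) {
  return std::accumulate(v.begin(), v.end(), 0.0);
}

// Workaround 1: template on the allocator.
template<typename T, typename Alloc>
T sum_any(const std::vector<T, Alloc> &v) {
  return std::accumulate(v.begin(), v.end(), T(0));
}

// Workaround 2: pass pointer and size, which hides the container type.
inline double sum_raw(const double *p, std::size_t n) {
  return std::accumulate(p, p + n, 0.0);
}
```

Calling `sum_plain` with a `std::vector<double, TaggedAllocator<double>>` does not compile, while `sum_any(v)` and `sum_raw(v.data(), v.size())` both accept it.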

DiamonDinoia · 2024-10-30T17:05:06Z

It is certainly possible to use this for aligning more vectors where needed, but I recommend checking whether it really makes a measurable difference. If you add a non-default allocator to a vector, this vector can no longer be passed to functions that expect, say, a simple const vector<T> &, and this can be quite a nuisance. To be honest, I'm not sure how much the extra alignment helps FFTW in this context, especially since you typically use FFTW_ESTIMATE for planning. But I had to be extra cautious here, because I don't want to be accused of putting FFTW at a disadvantage in comparisons to the ducc FFT :-)

Yes, it is possible to define an AlignedVector<T> and pass that instead.
Alignment impacts only certain architectures, mainly older AMD in my experience. Newer ones should be okay. AFAIK FFTW crashes when the alignment is not followed.

DiamonDinoia · 2024-10-30T17:07:46Z

It is certainly possible to use this for aligning more vectors where needed, but I recommend checking whether it really makes a measurable difference. If you add a non-default allocator to a vector, this vector can no longer be passed to functions that expect, say, a simple const vector<T> &, and this can be quite a nuisance. To be honest, I'm not sure how much the extra alignment helps FFTW in this context, especially since you typically use FFTW_ESTIMATE for planning. But I had to be extra cautious here, because I don't want to be accused of putting FFTW at a disadvantage in comparisons to the ducc FFT :-)

Yes, it is possible to define an AlignedVector<T> and pass that instead. Alignment impacts only certain architectures, mainly older AMD in my experience. Newer ones should be okay. AFAIK FFTW crashes when the alignment is not followed.

It is not a bad idea to have an allocator inside the plan. One can experiment with caching allocators, aligned allocators and so on to tweak performance. In C++ it would not be wild to let users pass an allocator to finufft; for example, a bigger project or an exotic architecture might have its own allocator and could pass it in.

mreineck · 2024-10-30T18:16:27Z

AFAIK FFTW crashes when the alignment is not followed.

Just to clarify: this only happens if you create a plan with aligned buffers, and then execute it (using the guru interface) on less aligned data. Otherwise FFTW will deal with "standard" aligned data, just perhaps not quite as efficiently.

perftest/CMakeLists.txt

src/spreadinterp.cpp

DiamonDinoia

Looks good to me.

DiamonDinoia

Ready to merge from my side.

perftest/CMakeLists.txt

ahbarnett

Hi Martin - thanks! Took a while to go through it all (partly to understand various Class features).
My summary of your changes is in the google-doc.
There are a couple of questions and 1-2 very minor changes - should take you 15 mins. Then it can definitely come in.
Thanks so much, Alex

ahbarnett · 2024-10-22T18:44:51Z

test/directft/dirft1d.cpp

-    CPX a  = (iflag > 0) ? exp(IMA * x[j]) : exp(-IMA * x[j]);
-    CPX p  = pow(a, (FLT)kmin); // starting phase for most neg freq
-    CPX cc = c[j];              // no 1/nj prefac
+    std::complex<T> a  = (iflag > 0) ? exp(std::complex<T>(0, 1) * x[j])


Can some definition be made so that I is available as std::complex<T>(0,1) of the right T? We don't want to have to type this each time I is needed :)

We can, but actually I suggest moving to std::polar(a,b) instead at some point (which computes a*exp(i*b)).
Having a templated constant I doesn't look very nice either ...

Well I happen to like "I" or "IMA" or some template for 0+1i. It does not appear enough to fight about it :)
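For reference, a short sketch of both options discussed here. The function names `imag_unit`, `phase_exp` and `phase_polar` are hypothetical and do not appear in the PR; `std::polar(rho, theta)` is the standard function that returns rho*exp(i*theta).

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper: the imaginary unit 0+1i for a given real type T.
template<typename T>
constexpr std::complex<T> imag_unit() {
  return std::complex<T>(T(0), T(1));
}

// Variant with an explicit imaginary unit: exp(+i*x) or exp(-i*x).
template<typename T>
std::complex<T> phase_exp(T x, int iflag) {
  return (iflag > 0) ? std::exp(imag_unit<T>() * x)
                     : std::exp(-imag_unit<T>() * x);
}

// Variant with std::polar: magnitude 1, angle +x or -x.
template<typename T>
std::complex<T> phase_polar(T x, int iflag) {
  return std::polar(T(1), (iflag > 0) ? x : -x);
}
```

Both functions should agree up to rounding error; the `std::polar` form avoids spelling out the imaginary unit at all.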

src/spreadinterp.cpp

include/finufft/finufft_core.h

include/finufft/utils.h

include/finufft_eitherprec.h

makefile

mreineck · 2024-11-05T07:52:18Z

For some reason my comment about std::norm being the field norm ended up somewhere near the absolute top of the PR's discussion, so I'll repeat it here.

std::norm claims to be the field norm, which justifies the square.
From a technical standpoint I totally understand this choice, since it avoids the (comparatively expensive) square root and produces a quantity that is needed very often (also std::abs already exists if you want the "normal" norm).
I'm using the function often, since it has just 3 arithmetic operations compared to the (theoretical) 7 operations (+ one copy) of (a*conj(a)).real(), but the optimizer will most likely produce the same code for both.
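A tiny sketch of the equivalent formulations mentioned above; the wrapper names are made up for this example. All three return the squared magnitude |a|^2, not |a| itself.

```cpp
#include <complex>

// Squared magnitude via std::norm (the "field norm").
template<typename T>
T squared_abs_norm(const std::complex<T> &a) {
  return std::norm(a);
}

// Same quantity via a full complex multiplication with the conjugate.
template<typename T>
T squared_abs_conj(const std::complex<T> &a) {
  return (a * std::conj(a)).real();
}

// Same quantity written out by hand: 2 multiplications and 1 addition.
template<typename T>
T squared_abs_manual(const std::complex<T> &a) {
  return a.real() * a.real() + a.imag() * a.imag();
}
```

For a = 3+4i each of these gives 25 (up to rounding), whereas std::abs(a) gives 5.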

ahbarnett

Thanks - happy to merge.

fix PR

2726bc3

ahbarnett self-requested a review October 22, 2024 18:35

finufft.cpp -> finufft_core.cpp

34b428d

mreineck commented Oct 23, 2024

include/finufft/finufft_core.h

mreineck added 2 commits October 23, 2024 12:41

avoid more explicit memory management

91206c4

Merge remote-tracking branch 'origin/master' into simplify_more_v3

4e42b61

DiamonDinoia reviewed Oct 24, 2024

include/finufft/finufft_core.h Outdated

make PI constexpr; switch from FFTW allocation to aligned C++ allocators

cebfb73

mreineck added 4 commits October 28, 2024 08:07

small fixes; CMake still broken

a7a8c6c

attempt to fix CMake

d746180

more fixes

0823fc6

make all object files depend on xsimd

7e90eb7

remove accidentally commited files

2502e92

mreineck commented Oct 28, 2024

makefile

DiamonDinoia reviewed Oct 29, 2024

include/finufft/finufft_core.h

Merge remote-tracking branch 'origin/master' into simplify_more_v3

724f27c

DiamonDinoia reviewed Nov 4, 2024

perftest/CMakeLists.txt Outdated

DiamonDinoia reviewed Nov 4, 2024

src/spreadinterp.cpp Outdated

DiamonDinoia approved these changes Nov 4, 2024

address review comments

3be6f0a

DiamonDinoia approved these changes Nov 4, 2024

perftest/CMakeLists.txt Outdated

ahbarnett requested changes Nov 5, 2024

address review comments

cd3ca5a

ahbarnett approved these changes Nov 5, 2024

ahbarnett merged commit c2b3215 into flatironinstitute:master Nov 5, 2024
167 checks passed
