Problems with Trilinos TPetra instantiations when using Intel MKL #390

Open
gassmoeller opened this issue Jul 18, 2024 · 1 comment

@gassmoeller

We have problems installing Trilinos on TACC Frontera with the latest candi version (see the bug report in the ASPECT forum).
We see errors like the following:

ld.bfd: /work2/10103/hx38324/frontera/libs/trilinos-release-14-4-0/lib/libstratimikosbelos.so.14.4: undefined reference to `Tpetra::MultiVector<float, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::randomize()'
ld.bfd: /work2/10103/hx38324/frontera/libs/trilinos-release-14-4-0/lib/libstratimikosbelos.so.14.4: undefined reference to `Tpetra::DistObject<float, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::packAndPrepare(Tpetra::SrcDistObject const&, Kokkos::DualView<int const*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<float*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, void, void>&, Kokkos::DualView<unsigned long*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, void, void>, unsigned long&)'
ld.bfd: /work2/10103/hx38324/frontera/libs/trilinos-release-14-4-0/lib/libstratimikosbelos.so.14.4: undefined reference to `Tpetra::MultiVector<float, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::subViewNonConst(Teuchos::Range1D const&)'
ld.bfd: /work2/10103/hx38324/frontera/libs/trilinos-release-14-4-0/lib/libstratimikosbelos.so.14.4: undefined reference to `Tpetra::MultiVector<float, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::update(float const&, Tpetra::MultiVector<float, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const&, float const&)'
ld.bfd: /work2/10103/hx38324/frontera/libs/trilinos-release-14-4-0/lib/libstratimikosbelos.so.14.4: undefined reference to `Tpetra::DistObject<float, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::reallocArraysForNumPacketsPerLid(unsigned long, unsigned long)'
ld.bfd: /work2/10103/hx38324/frontera/libs/trilinos-release-14-4-0/lib/libstratimikosbelos.so.14.4: undefined reference to `Tpetra::MultiVector<float, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::need_sync_device() const'
ld.bfd: /work2/10103/hx38324/frontera/libs/trilinos-release-14-4-0/lib/libstratimikosbelos.so.14.4: undefined reference to `Tpetra::DistObject<float, int, long long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::unpackAndCombine(Kokkos::DualView<int const*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, void, void> const&, Kokkos::DualView<float*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, void, void>, Kokkos::DualView<unsigned long*, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, void, void>, unsigned long, Tpetra::CombineMode, Kokkos::Serial const&)'

I experimented with this myself, and it looks like one of the Tpetra instantiations (for float) is missing. I found this line, which is active when using MKL (which we do), and it could be the reason for the missing instantiation. Unfortunately, I cannot simply add that instantiation, because the original bug in MKL is still there (HAVE_TEUCHOS_BLASFLOAT is false, so Trilinos thinks the Intel MKL BLAS implementation does not support float). Interestingly, the candi branch dealii-9.5 compiles without issues, so something must have changed in candi (maybe #375 or #350).

Any pointers for how to resolve this problem would be appreciated.

For now, I am working around the problem by using the cluster-provided Trilinos modules and/or the old candi version.

@cgcgcg

cgcgcg commented Sep 19, 2024

This issue was recently reported to Trilinos: trilinos/Trilinos#13456

The problem is that candi disables the float scalar type for Tpetra in builds with MKL:

-D Tpetra_INST_FLOAT:BOOL=OFF \

but enables it for the rest of Trilinos:

-D Trilinos_ENABLE_FLOAT=ON \
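
To see whether an existing build directory has this mismatch, one option is to inspect its CMakeCache.txt. The following is only a sketch: the function name `check_float_mismatch` is hypothetical, and it assumes both options are recorded in the cache under exactly the names used above (the cache type after the colon may differ, so the pattern accepts any type).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a CMake cache enables float for
# Trilinos as a whole while disabling the float instantiation in Tpetra.
# Assumption: both options appear in CMakeCache.txt as NAME:TYPE=VALUE.
check_float_mismatch() {
  local cache="$1"
  if grep -Eq '^Trilinos_ENABLE_FLOAT:[A-Z]+=ON$' "$cache" &&
     grep -Eq '^Tpetra_INST_FLOAT:[A-Z]+=OFF$' "$cache"; then
    echo "mismatch"
    return 1
  fi
  echo "consistent"
}
```

It could be called as `check_float_mismatch /path/to/trilinos/build/CMakeCache.txt`, where the path is a placeholder for the actual build directory.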

This exposed an issue with Trilinos' CMake logic that was fixed here: trilinos/Trilinos#13457

The consequence of this change is that candi's configuration for Trilinos will error out at configure time with future versions of Trilinos. The fix for candi is to set Trilinos_ENABLE_FLOAT=OFF when building with MKL.
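
A minimal sketch of how a build script could keep the two float options in step is shown below. The function name `trilinos_float_flags` and its ON/OFF argument are hypothetical and not taken from candi. Only the MKL branch reflects the fix described above; enabling both options in the non-MKL branch is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print float-related Trilinos CMake flags so that
# Trilinos_ENABLE_FLOAT and Tpetra_INST_FLOAT always agree.
# Argument: "ON" if the build uses Intel MKL, anything else otherwise.
trilinos_float_flags() {
  local with_mkl="${1:-OFF}"
  if [ "$with_mkl" = "ON" ]; then
    # MKL build: disable float everywhere, as suggested in this thread.
    printf '%s\n' "-D Trilinos_ENABLE_FLOAT:BOOL=OFF" "-D Tpetra_INST_FLOAT:BOOL=OFF"
  else
    # Non-MKL build: enabling both is an assumption, not confirmed here.
    printf '%s\n' "-D Trilinos_ENABLE_FLOAT:BOOL=ON" "-D Tpetra_INST_FLOAT:BOOL=ON"
  fi
}
```

The printed flags could then be appended to the existing cmake invocation for Trilinos, for example via `$(trilinos_float_flags ON)`.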
