
Update LIBXSMM backend #1248

Merged · 13 commits · Jul 12, 2023

Conversation

sebastiangrimberg
Collaborator

@sebastiangrimberg commented Jul 3, 2023

Provide a few updates to the LIBXSMM backend to work with recent LIBXSMM main_stable and main (broken since May 25, 2023):

  • LIBXSMM commit libxsmm/libxsmm@ccc5373 on main_stable on 05/25 removed libxsmm_dmmdispatch and libxsmm_smmdispatch, which breaks the libCEED interface. They are superseded by libxsmm_dispatch_gemm_v2, available on main_stable since Jan. 2022 (libxsmm/libxsmm@c722901).
  • There is no need to call libxsmm_release_kernel on the JIT'd kernels from the hash tables in CeedTensorContractDestroy_Xsmm (see this comment).
  • Allow the user to specify and override BLAS_LIB in the Makefile for a known LIBXSMM configuration (for example, with BLAS=0 we can set BLAS_LIB=).
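For illustration, the override could be used like this (a sketch; `BLAS=0` is LIBXSMM's own build option, and `XSMM_DIR` follows the variable name used in libCEED's CI):

```shell
# Build LIBXSMM without BLAS (LIBXSMM's BLAS=0 build option)
make -C $XSMM_DIR BLAS=0

# Build libCEED against it, clearing the BLAS link flags that the
# Makefile would otherwise add for this configuration
make XSMM_DIR=$XSMM_DIR BLAS_LIB=
```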

TODO:

  • Backward compatibility (?)

@jeremylt
Member

jeremylt commented Jul 3, 2023

Note - I've been waiting for libXSMM to make a new release. Do you know if one is coming?

@jeremylt
Member

jeremylt commented Jul 3, 2023

This mix of commits from other branches makes this branch hard for me to follow.

@sebastiangrimberg
Collaborator Author

This mix of commits from other branches makes this branch hard for me to follow.

Indeed, hence the draft state. I will clean this up and mark as ready shortly.

@sebastiangrimberg
Collaborator Author

Note - I've been waiting for libXSMM to make a new release. Do you know if one is coming?

I do not know a timeline unfortunately. There's some information on compatibility here: https://github.com/libxsmm/libxsmm/wiki/Compatibility#libxsmm-2x, which recommends moving to the v2 interface for now as I understand it.

@jeremylt
Member

jeremylt commented Jul 3, 2023

Indeed, hence the draft state. I will clean this up and mark as ready shortly.

Cool, you did some linked PRs earlier so I wasn't sure if this was a new strategy you were adopting. I've tried to move away from linked PRs as much as I can since they tend to get confusing for others.

recommends moving to the v2 interface for now as I understand it.

Hmm. I read it as 'new work should use the new interface'. It's not a huge deal if they super promise this is close to v2, but it's definitely easier to communicate libCEED dependencies we know will work if we stick to releases. We've had issues in the past tracking main branches so I'm a bit hesitant to re-adopt a strategy we intentionally moved away from.

@sebastiangrimberg
Collaborator Author

Perhaps we can ask @hfp to comment on this?

@jeremylt
Member

jeremylt commented Jul 3, 2023

Re kernel cleanup: Does libXSMM now automatically clean those up on its own? libCEED tries to be very strict about cleaning up absolutely everything we allocate. 'It will go away when the application ends' isn't enough for us.

If I'm reading the comment correctly, I think it would be preferable to drop our internal hash table and let libXSMM manage caching kernels. Then we bypass the question altogether.

@sebastiangrimberg
Collaborator Author

Re kernel cleanup: Does libXSMM now automatically clean those up on its own? libCEED tries to be very strict about cleaning up absolutely everything we allocate.

This is my impression after asking in LIBXSMM here: libxsmm/libxsmm#783 (comment). I noticed this running on libCEED main when using the commit of LIBXSMM tested in the CI: https://github.com/CEED/libCEED/blob/main/.gitlab-ci.yml#L22. When running with LIBXSMM_VERBOSE, I saw errors like LIBXSMM ERROR: failed to release kernel!. I asked in LIBXSMM if we needed to free the kernels explicitly and was told no, unless you build LIBXSMM in a specific way.

@jeremylt
Member

jeremylt commented Jul 3, 2023

It is not clear to me if that pattern of kernel management results in libXSMM cleaning up afterwards or using the 'it all goes away when the application ends' memory management. We try to avoid the second.

@sebastiangrimberg
Collaborator Author

I agree that it is preferable to avoid the latter behavior, and there may be a way to do so, though I do not know it. But at the least, the current behavior, where calling libxsmm_release_kernel results in error messages, suggests this isn't the right way to use the API and should be fixed.

@jeremylt
Member

jeremylt commented Jul 3, 2023

The docstrings in libxsmm.h on the main branch seem to say that you need to call libxsmm_release_kernel to deallocate the memory.

If we are updating, we should probably be using a newer commit.

But, still, the comment you linked seems to say that libXSMM has its own way to cache kernels, so I think if we are updating, we should probably drop our hash table altogether so long as we verify it doesn't have performance impacts.

@sebastiangrimberg
Collaborator Author

OK, here are a few results, running LIBXSMM_VERBOSE=3 ./bps -ceed /cpu/self/xsmm/serial,/cpu/self/xsmm/blocked -problem bp1,bp3 -degree 3 -q_extra 1 -simplex -local_nodes 100000 -ksp_max_it 200 (M1 Max). I ran with -simplex to put a bit more stress on the basis application, but would expect the changes to be the same for tensor-product elements:

  • main (with LIBXSMM from 44433be9426eddaed88415646c15b3bcc61afc85):

    • bp1, serial: 4.7 MDoFs/s
    • bp1, blocked: 25.9 MDoFs/s
    • bp3, serial: 1.7 MDoFs/s
    • bp3, blocked: 9.0 MDoFs/s
    • JIT (small/medium/large): 7/8/4
  • 1300445 (using libCEED hash tables to cache kernels):

    • bp1, serial: 4.7 MDoFs/s
    • bp1, blocked: 25.1 MDoFs/s
    • bp3, serial: 1.7 MDoFs/s
    • bp3, blocked: 9.0 MDoFs/s
    • JIT: 15/8/0
  • 1962c6d (using LIBXSMM's internal kernel lookup):

    • bp1, serial: 4.8 MDoFs/s
    • bp1, blocked: 26.3 MDoFs/s
    • bp3, serial: 1.7 MDoFs/s
    • bp3, blocked: 9.2 MDoFs/s
    • JIT: 4/3/0

Using a more recent LIBXSMM (main_stable from July 1):

  • 1300445 (using libCEED hash tables to cache kernels): Does not work for xsmm/serial (libxsmm_[sd]gemm segfault?)

  • 1962c6d (using LIBXSMM's internal kernel lookup):

    • bp1, serial: 4.7 MDoFs/s
    • bp1, blocked: 26.0 MDoFs/s
    • bp3, serial: 1.7 MDoFs/s
    • bp3, blocked: 9.0 MDoFs/s
    • JIT: 2/2/0

The changes all follow from LIBXSMM's simple example: https://github.com/libxsmm/libxsmm/blob/main_stable/samples/hello/hello.c.

@hfp

hfp commented Jul 3, 2023

It is not clear to me if that pattern of kernel management results in libXSMM cleaning up afterwards or using the 'it all goes away when the application ends' memory management. We try to avoid the second.

LIBXSMM performs explicit cleanup (not left up to the OS or such). This cleanup might happen "late", i.e., when your application terminates.

@hfp

hfp commented Jul 3, 2023

Re kernel cleanup: Does libXSMM now automatically clean those up on its own?

This has always been the case. We never had a version that did not attempt to clean up the code registry. We have always had the same lifetime policy, matching normal function pointers, and the function libxsmm_release_kernel only exists "for symmetry" and completeness.

@jeremylt
Member

jeremylt commented Jul 3, 2023

Looks like letting libXSMM cache the kernels itself makes good sense and has basically the same performance.

LIBXSMM performs explicit cleanup (not left-up to OS or such). This cleanup might happen "late", i.e, when your application terminates.

Huzzah, one less thing to worry about.

Seems like we should just let libXSMM manage the caching and cleanup itself.

@sebastiangrimberg
Collaborator Author

It seems, then, that the only remaining question for this PR is compatibility? libxsmm_dispatch_gemm_v2 has existed since master-1.17-2069, from Jan. 2022. A user installing with Spack would by default get v1.17 (unless using @develop), which predates this by about a month. So, for the user's sake, it does seem like it would be irresponsible to merge this change until a new version of LIBXSMM is tagged...

@jeremylt
Member

jeremylt commented Jul 3, 2023

Yeah, I say we make this change as soon as libXSMM has tagged the new release

@jedbrown
Member

jedbrown commented Jul 3, 2023

Yep, sounds good. Thanks for doing this study @sebastiangrimberg. Do you think the very slight regression in the most recent version is real or within the noise?

@sebastiangrimberg
Collaborator Author

I ran these tests on my laptop, averaging results over 10 applications of KSPSolve. I would have to think these small differences are just noise, but unfortunately I am not familiar enough with LIBXSMM to comment on any possible performance regressions. I can try running the benchmarks on a server instead, to get more consistent performance and see if there is any noticeable difference.

If we are using the library in the supported/recommended way to compile/query/apply the GEMM kernels, I don't see how we could be doing something that hurts our performance. But again, I'd love to defer to the expertise of @hfp here to ensure there is not something we could be doing better for the types of GEMM/matrix sizes we are operating on.

@jeremylt
Member

Per the discussion in #1164, it would be good to merge this branch now in anticipation of a "soon" LIBXSMM release.

If you let me know when this PR is just about ready, I can update the version of LIBXSMM that CI uses, and we can run CI to verify before merging.

@jeremylt
Member

I'll update CI right after my lunch and then I think we can merge

@jeremylt
Member

Ok, if you apply this diff, then we can merge this branch:

$ git diff
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 11b5881c..dcafb306 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -18,8 +18,8 @@ noether-cpu:
     - echo "-------------- HIPCC ---------------" && $HIPCC --version
     - echo "-------------- GCOV ----------------" && gcov --version
 # Libraries for backends
-# -- LIBXSMM 44433be9426eddaed88415646c15b3bcc61afc85
-    - cd .. && export XSMM_HASH=44433be9426eddaed88415646c15b3bcc61afc85 && { [[ -d libxsmm-$XSMM_HASH ]] || { curl -L https://github.com/libxsmm/libxsmm/archive/$XSMM_HASH.tar.gz -o xsmm.tar.gz && tar zvxf xsmm.tar.gz && rm xsmm.tar.gz && make -C libxsmm-$XSMM_HASH -j$(nproc); }; } && export XSMM_DIR=$PWD/libxsmm-$XSMM_HASH && cd libCEED
+# -- LIBXSMM 2c145a109b5a8ad4e15f60ea42a86b9056bdc8b8
+    - cd .. && export XSMM_HASH=2c145a109b5a8ad4e15f60ea42a86b9056bdc8b8 && { [[ -d libxsmm-$XSMM_HASH ]] || { curl -L https://github.com/libxsmm/libxsmm/archive/$XSMM_HASH.tar.gz -o xsmm.tar.gz && tar zvxf xsmm.tar.gz && rm xsmm.tar.gz && make -C libxsmm-$XSMM_HASH -j$(nproc); }; } && export XSMM_DIR=$PWD/libxsmm-$XSMM_HASH && cd libCEED
     - echo "-------------- LIBXSMM -------------" && basename $XSMM_DIR
 # -- OCCA v1.1.0
     - cd .. && export OCCA_VERSION=occa-1.4.0 && { [[ -d $OCCA_VERSION ]] || { git clone --depth 1 --branch v1.4.0 https://github.com/libocca/occa.git $OCCA_VERSION && cd $OCCA_VERSION && export ENABLE_OPENCL="OFF" ENABLE_DPCPP="OFF" ENABLE_HIP="OFF" ENABLE_CUDA="OFF" && ./configure-cmake.sh && cmake --build build --parallel $NPROC_CPU && cmake --install build && cd ..; }; } && export OCCA_DIR=$PWD/$OCCA_VERSION/install && cd libCEED
@@ -162,8 +162,8 @@ noether-float:
 # -- MAGMA from dev branch
     - echo "-------------- MAGMA ---------------"
     - export MAGMA_DIR=/projects/hipMAGMA && git -C $MAGMA_DIR -c safe.directory=$MAGMA_DIR describe
-# -- LIBXSMM 44433be9426eddaed88415646c15b3bcc61afc85
-    - cd .. && export XSMM_HASH=44433be9426eddaed88415646c15b3bcc61afc85 && { [[ -d libxsmm-$XSMM_HASH ]] || { curl -L https://github.com/libxsmm/libxsmm/archive/$XSMM_HASH.tar.gz -o xsmm.tar.gz && tar zvxf xsmm.tar.gz && rm xsmm.tar.gz && make -C libxsmm-$XSMM_HASH -j$(nproc); }; } && export XSMM_DIR=$PWD/libxsmm-$XSMM_HASH && cd libCEED
+# -- LIBXSMM 2c145a109b5a8ad4e15f60ea42a86b9056bdc8b8
+    - cd .. && export XSMM_HASH=2c145a109b5a8ad4e15f60ea42a86b9056bdc8b8 && { [[ -d libxsmm-$XSMM_HASH ]] || { curl -L https://github.com/libxsmm/libxsmm/archive/$XSMM_HASH.tar.gz -o xsmm.tar.gz && tar zvxf xsmm.tar.gz && rm xsmm.tar.gz && make -C libxsmm-$XSMM_HASH -j$(nproc); }; } && export XSMM_DIR=$PWD/libxsmm-$XSMM_HASH && cd libCEED
     - echo "-------------- LIBXSMM -------------" && basename $XSMM_DIR
   script:
     - rm -f .SUCCESS

@jeremylt marked this pull request as ready for review July 12, 2023 19:14