-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update LIBXSMM backend #1248
Update LIBXSMM backend #1248
Conversation
Note - I've been waiting for libXSMM to make a new release. Do you know if one is coming? |
This mix of commits from other branches makes this branch hard for me to follow. |
Indeed, hence the draft state. I will clean this up and mark as ready shortly. |
I do not know a timeline unfortunately. There's some information on compatibility here: https://github.com/libxsmm/libxsmm/wiki/Compatibility#libxsmm-2x, which recommends moving to the |
fc9117a
to
d8ed5d4
Compare
Cool, you did some linked PRs earlier so I wasn't sure if this was a new strategy you were adopting. I've tried to move away from linked PRs as much as I can since they tend to get confusing for others.
Hmm. I read it as 'new work should use the new interface'. It's not a huge deal if they super promise this is close to v2, but it's definitely easier to communicate libCEED dependencies we know will work if we stick to releases. We've had issues in the past tracking main branches so I'm a bit hesitant to re-adopt a strategy we intentionally moved away from. |
Perhaps we can ask @hfp to comment on this? |
Re kernel cleanup: Does libXSMM now automatically clean those up on its own? libCEED tries to be very strict about cleaning up absolutely everything we allocate. 'It will go away when the application ends' isn't enough for us. If I'm reading the comment correctly, I think it would be preferable to drop our internal hash table and let libXSMM manage caching kernels. Then we bypass the question altogether. |
This is my impression after asking in LIBXSMM here: libxsmm/libxsmm#783 (comment). I noticed this running on libCEED |
It is not clear to me if that pattern of kernel management results in libXSMM cleaning up afterwards or using the 'it all goes away when the application ends' memory management. We try to avoid the second. |
8222938
to
1300445
Compare
I agree that it is preferable to avoid the latter behavior, and there might be a way to do so though I do not know. But at least the current behavior of calling |
The docstrings is libXSMM.h in the main branch seem to say that you need to call libxsmm_release_kernel to deallocate the memory. If we are updating, we should probably be using a newer commit. But, still, the comment you linked seems to say that libXSMM has its own way to cache kernels, so I think if we are updating, we should probably drop our hash table altogether so long as we verify it doesn't have performance impacts. |
OK, here are a few results, running
Using a more recent LIBXSMM (
The changes all follow from LIBXSMM's simple example: https://github.com/libxsmm/libxsmm/blob/main_stable/samples/hello/hello.c. |
LIBXSMM performs explicit cleanup (not left-up to OS or such). This cleanup might happen "late", i.e, when your application terminates. |
This has always been the case. We never had a version not attempting to cleanup the code registry. We always had the same lifetime policy to match normal function pointers, and the function |
Looks like letting libXSMM cache the kernels itself makes good sense and has basically the same performance.
Huzzah, one less thing to worry about. Seems like we should just let libXSMM manage the caching and cleanup itself. |
It seems, then, that the only question with this PR would remain to check compatibility? |
Yeah, I say we make this change as soon as libXSMM has tagged the new release |
Yep, sounds good. Thanks for doing this study @sebastiangrimberg. Do you think the very slight regression in the most recent version is real or within the noise? |
I ran these tests on my laptop, averaging results over 10 applications of I would have to think that if we are using the library in the supported/recommended way to compile/query/apply the GEMM kernels, I don't see how we could achieve better performance or how we could be doing something which is hurting our performance. But again, I'd love to defer to the expertise of @hfp here to ensure there is not something we could be doing better for the types of GEMM/matrix sizes we are operating on. |
…d (in preparation for v2)
…ve an unncessary LIBXSMM kernel compilation
Per the discussion in #1164, it would be good to merge this branch now in anticipation of a "soon" LIBXSMM release. If you let me know when this is PR just about ready, I can update the version of LIBXSMM that CI uses and we can run CI to verify before merging. |
1962c6d
to
fe99f24
Compare
I'll update CI right after my lunch and then I think we can merge |
Ok, if you apply this diff, then we can merge this branch: $ git diff
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 11b5881c..dcafb306 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -18,8 +18,8 @@ noether-cpu:
- echo "-------------- HIPCC ---------------" && $HIPCC --version
- echo "-------------- GCOV ----------------" && gcov --version
# Libraries for backends
-# -- LIBXSMM 44433be9426eddaed88415646c15b3bcc61afc85
- - cd .. && export XSMM_HASH=44433be9426eddaed88415646c15b3bcc61afc85 && { [[ -d libxsmm-$XSMM_HASH ]] || { curl -L https://github.com/libxsmm/libxsmm/archive/$XSMM_HASH.tar.gz -o xsmm.tar.gz && tar zvxf xsmm.tar.gz && rm xsmm.tar.gz && make -C libxsmm-$XSMM_HASH -j$(nproc); }; } && export XSMM_DIR=$PWD/libxsmm-$XSMM_HASH && cd libCEED
+# -- LIBXSMM 2c145a109b5a8ad4e15f60ea42a86b9056bdc8b8
+ - cd .. && export XSMM_HASH=2c145a109b5a8ad4e15f60ea42a86b9056bdc8b8 && { [[ -d libxsmm-$XSMM_HASH ]] || { curl -L https://github.com/libxsmm/libxsmm/archive/$XSMM_HASH.tar.gz -o xsmm.tar.gz && tar zvxf xsmm.tar.gz && rm xsmm.tar.gz && make -C libxsmm-$XSMM_HASH -j$(nproc); }; } && export XSMM_DIR=$PWD/libxsmm-$XSMM_HASH && cd libCEED
- echo "-------------- LIBXSMM -------------" && basename $XSMM_DIR
# -- OCCA v1.1.0
- cd .. && export OCCA_VERSION=occa-1.4.0 && { [[ -d $OCCA_VERSION ]] || { git clone --depth 1 --branch v1.4.0 https://github.com/libocca/occa.git $OCCA_VERSION && cd $OCCA_VERSION && export ENABLE_OPENCL="OFF" ENABLE_DPCPP="OFF" ENABLE_HIP="OFF" ENABLE_CUDA="OFF" && ./configure-cmake.sh && cmake --build build --parallel $NPROC_CPU && cmake --install build && cd ..; }; } && export OCCA_DIR=$PWD/$OCCA_VERSION/install && cd libCEED
@@ -162,8 +162,8 @@ noether-float:
# -- MAGMA from dev branch
- echo "-------------- MAGMA ---------------"
- export MAGMA_DIR=/projects/hipMAGMA && git -C $MAGMA_DIR -c safe.directory=$MAGMA_DIR describe
-# -- LIBXSMM 44433be9426eddaed88415646c15b3bcc61afc85
- - cd .. && export XSMM_HASH=44433be9426eddaed88415646c15b3bcc61afc85 && { [[ -d libxsmm-$XSMM_HASH ]] || { curl -L https://github.com/libxsmm/libxsmm/archive/$XSMM_HASH.tar.gz -o xsmm.tar.gz && tar zvxf xsmm.tar.gz && rm xsmm.tar.gz && make -C libxsmm-$XSMM_HASH -j$(nproc); }; } && export XSMM_DIR=$PWD/libxsmm-$XSMM_HASH && cd libCEED
+# -- LIBXSMM 2c145a109b5a8ad4e15f60ea42a86b9056bdc8b8
+ - cd .. && export XSMM_HASH=2c145a109b5a8ad4e15f60ea42a86b9056bdc8b8 && { [[ -d libxsmm-$XSMM_HASH ]] || { curl -L https://github.com/libxsmm/libxsmm/archive/$XSMM_HASH.tar.gz -o xsmm.tar.gz && tar zvxf xsmm.tar.gz && rm xsmm.tar.gz && make -C libxsmm-$XSMM_HASH -j$(nproc); }; } && export XSMM_DIR=$PWD/libxsmm-$XSMM_HASH && cd libCEED
- echo "-------------- LIBXSMM -------------" && basename $XSMM_DIR
script:
- rm -f .SUCCESS |
Provide a few updates to LIBXSMM to work with recent
main_stable
andmain
(broken on since May 25, 2023):main_stable
on 05/25 removedlibxsmm_dmmdispatch
andlibxsmm_smmdispatch
, which breaks the libCEED interface. These are superceded withlibxsmm_dispatch_gemm_v2
, available onmain_stable
since Jan. 2022 (libxsmm/libxsmm@c722901).libxsmm_release_kernel
on the JIT'd kernels from the hash tables inCeedTensorContractDestroy_Xsmm
(see this comment).BLAS_LIB
in the Makefile for a known LIBXSMM configuration (for example, withBLAS=0
we can setBLAS_LIB=
).TODO: