Release 4.5.00 #2427
Open
ndellingwood wants to merge 480 commits into kokkos:master from ndellingwood:master-release-4.5.00
Update cm_test_all_sandia
* Fix kokkos#2130
  - Do not call BsrMatrix spmv impl if block size is 1
  - Instead, convert it to unmanaged CrsMatrix and call spmv again
  - cuSPARSE returned an error code in this case
  - Better performance
* Formatting
* Remove redundant remove_pointer_t
  Handle is already a non-pointer type
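The block-size-1 fix above relies on a block-sparse-row matrix with 1x1 blocks having the same layout as a CSR matrix. The sketch below illustrates that dispatch in plain C++. `CsrView`, `Bsr`, and both `spmv` overloads are hypothetical stand-ins written for this note, not the Kokkos Kernels types or implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning CSR view (stands in for an unmanaged CrsMatrix).
struct CsrView {
  int num_rows;
  const int* row_ptr;
  const int* col_idx;
  const double* vals;
};

// Hypothetical BSR storage: each block is block_dim x block_dim, row-major.
struct Bsr {
  int block_dim;
  int num_block_rows;
  std::vector<int> row_ptr;
  std::vector<int> col_idx;
  std::vector<double> vals;
};

// y = A * x for a CSR matrix.
std::vector<double> spmv(const CsrView& A, const std::vector<double>& x) {
  std::vector<double> y(static_cast<std::size_t>(A.num_rows), 0.0);
  for (int i = 0; i < A.num_rows; ++i)
    for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      y[i] += A.vals[k] * x[A.col_idx[k]];
  return y;
}

// y = A * x for a BSR matrix; 1x1 blocks are forwarded to the CSR path.
std::vector<double> spmv(const Bsr& A, const std::vector<double>& x) {
  assert(A.block_dim >= 1);
  if (A.block_dim == 1) {
    // Same arrays, reinterpreted as CSR without copying.
    CsrView view{A.num_block_rows, A.row_ptr.data(), A.col_idx.data(),
                 A.vals.data()};
    return spmv(view, x);
  }
  const int b = A.block_dim;
  std::vector<double> y(static_cast<std::size_t>(A.num_block_rows) * b, 0.0);
  for (int bi = 0; bi < A.num_block_rows; ++bi)
    for (int k = A.row_ptr[bi]; k < A.row_ptr[bi + 1]; ++k) {
      const double* blk = &A.vals[static_cast<std::size_t>(k) * b * b];
      const int bj = A.col_idx[k];
      for (int r = 0; r < b; ++r)
        for (int c = 0; c < b; ++c)
          y[bi * b + r] += blk[r * b + c] * x[bj * b + c];
    }
  return y;
}
```

The view only borrows pointers from the BSR storage, so the forwarding step costs no allocation beyond the output vector.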
…s#2135) This could be further automated to run on matrix from suite sparse
…okkos#2133) Since we are now in the 4.2 series we only support back to 4.1.00. Older versions of Kokkos Core will require older versions of Kokkos Kernels for compatibility. Once 4.3.00 is out we will drop support for the 4.1 series and keep only the 4.2 and 4.3 series.
* ODE: adding BDF algorithms
  Implementing BDF formula for stiff ODEs. Orders 1 to 5 are available and tested. The integrators can be called on GPU to solve multiple systems in parallel.
* ODE: fixing storage handling for start-up RK stack
* ODE: clang-format
* ODE: first adaptive version of BDF
  The current implementation only allows for adaptivity in time, at this point the BDF Step actually converges as expected with first order integration!
* ODE: fixing issues with adaptive BDF
  The unit-test BDF_adaptive now shows the integration of the logistic equation using adaptive time steps and increasing integration order from 1 to 5.
* ODE: running BDF on StiffChemistry problem
  The problem runs fine and is solved but there are oscillations while the behavior of the solution is smooth. More investigation is needed...
* BDF: fixing types and template parameters in batched calls
  Basically we need template parameters to be more versatile and cannot assume that all rank1 views will have the exact same underlying type, for instance layouts can be different.
* More fixes for GPUs only in tests this time.
* ODE: BDF adaptive, fix small bug
  After adding rhs and update vectors to temp the subviews taken for other variables need to be offset appropriately...
* Revert "More fixes for GPUs only in tests this time."
  This reverts commit 2f70432.
* Revert "Revert "More fixes for GPUs only in tests this time.""
  This reverts commit 836012b.
* ODE: BDF small change to temporarily avoid compile time issue
  True fix involving a KOKKOS_VERSION check is upcoming after more tests on GPU side...
* ODE: BDF fix for some printf statements that will go away soon...
* ODE: adding benchmark for BDF
  The benchmark helps us monitor the performance of the BDF implementation across multiple platforms as well as impact of changes over time.
* ODE: improve benchmark interface...
* ODE: BDF changes to use RMS norm and change some default values
  Small changes to compare more closely with reference implementation. Some of these might be reverted eventually but that's fine for now.
* ODE: BDF convergence more stable and results look pretty good now!
  Changing the Newton solver convergence criteria as well as changing a few default input parameters leads to a more stable algorithm which can now integrate the stiff Henderson autocatalytic example well in 66 time steps instead of 200k for fixed order integration...
* ODE: BDF fix bug in initial time step calculation
  The initial step routine was overwriting the initial right hand side which led to obvious issues further down the road... now things should work fine. Need to figure out if I can re-initialize the variables in the perf test while excluding that time from each iteration.
* ODE: BDF removing bad print statement... std::cout in device code
* ODE - BDF: improving perf test
  Basically adding new untimed setup within the main loop of the benchmark to reset the initial conditions, buffers and vectors ahead of each iteration.
* Modifying unit-test to catch proper return type
* Applying clang-format
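For readers unfamiliar with BDF, the order-1 formula is backward Euler: each step solves the implicit equation z = y + dt * f(z), here with Newton's method. The sketch below applies it to the scalar logistic equation mentioned in the commits. It is a fixed-step, fixed-order illustration only, and `logistic_rhs`, `logistic_jac`, `bdf1_step`, and `integrate_logistic` are hypothetical names, not the KokkosODE interface.

```cpp
#include <cassert>
#include <cmath>

// Logistic right-hand side f(y) = y (1 - y) and its derivative df/dy.
double logistic_rhs(double y) { return y * (1.0 - y); }
double logistic_jac(double y) { return 1.0 - 2.0 * y; }

// One BDF1 (backward Euler) step: solve g(z) = z - y - dt * f(z) = 0
// for z with Newton's method, starting from the previous value y.
double bdf1_step(double y, double dt) {
  double z = y;
  for (int it = 0; it < 50; ++it) {
    const double g  = z - y - dt * logistic_rhs(z);
    const double dg = 1.0 - dt * logistic_jac(z);
    const double dz = g / dg;
    z -= dz;
    if (std::abs(dz) < 1e-14) break;  // Newton update is negligible
  }
  return z;
}

// Integrate from t = 0 to t_end with num_steps equal steps.
double integrate_logistic(double y0, double t_end, int num_steps) {
  assert(num_steps > 0);
  const double dt = t_end / num_steps;
  double y = y0;
  for (int i = 0; i < num_steps; ++i) y = bdf1_step(y, dt);
  return y;
}
```

With y0 = 0.5 the exact solution is 1 / (1 + exp(-t)), which gives a convenient reference for checking the first-order convergence of this sketch.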
add rocm/5.6.1 and rocm/6.0.0, and openblas/0.3.23 as tpl
…2134)
* Sparse MKL: changing the location of the MKL_SAFE_CALL macro
  Moving the macro outside of namespaces to ensure that it will be interpreted correctly when called from any other location in the library. It does not make much sense to guard Impl code in the Experimental namespace and in this case it cleans up a problem with namespace disambiguation for the compiler...
* Sparse BsrSpMV: removing Experimental namespace from Impl namespace
* Applying clang-format
* Sparse SpMV: fixing more namespace issues!
…ia-caraway cm_test_all_sandia: update caraway compilers
…kos#2140) This change makes it easier for customers to leverage TPL support, which almost always requires offset=int, ordinal=int to be enabled, meaning that no TPL support is available with our default ETI...
Resolve compilation errors in nightly cuda/12.2 A100 build
…ssing_descriptor Spmv bsr matrix fix missing matrix descriptor (rocsparse)
Temporary objects like "A()" are destroyed at the end of the full expression. For the object to live until the end of the scope, it needs a name, like "A a;". This was causing cusparse/rocsparse spmv to always execute on the default stream, causing incorrect timing in the spmv perf test.
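The lifetime rule behind this fix can be reproduced without any GPU library. In the sketch below, `ScopedStream` is a hypothetical RAII guard that records when it is constructed and destroyed. It stands in for whatever stream-setting object the real code uses, which this page does not show.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical RAII guard: logs "set" on construction, "restore" on destruction.
struct ScopedStream {
  std::vector<std::string>& log;
  explicit ScopedStream(std::vector<std::string>& l) : log(l) {
    log.push_back("set");
  }
  ~ScopedStream() { log.push_back("restore"); }
};

// Unnamed temporary: the guard is destroyed before the "kernel" runs.
std::vector<std::string> with_unnamed_temporary() {
  std::vector<std::string> log;
  {
    ScopedStream{log};  // temporary, gone at the end of this statement
    assert(log.back() == "restore");
    log.push_back("kernel");
  }
  return log;
}

// Named object: the guard stays alive until the closing brace.
std::vector<std::string> with_named_object() {
  std::vector<std::string> log;
  {
    ScopedStream guard{log};
    assert(log.back() == "set");
    log.push_back("kernel");
  }
  return log;
}
```

In the first function the recorded order is set, restore, kernel. In the second it is set, kernel, restore, which is the order a scoped stream setting needs.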
It actually is part of the public interface
…-namespacing KokkosSparse_spmv_bsrmatrix_spec: fix Bsr_TC_Precision namespacing
* Spmv perf test improvements
  - Add option to flush caches by filling a dummy buffer between iterations
  - Add option to call the non-reuse interface instead of handle/reuse interface
  - Fix modes T, H in nonsquare case (make x,y the correct length)
* Fix mode help text
One of the overloads requires an unused template; this removes that extraneous template and simplifies how the function is called in a second overload.
Co-authored-by: brian-kelley <[email protected]>
module updates post TOSS upgrade
This is only hit when spmv is called with integer scalars, which doesn't happen in our CI but does often in Tpetra.
…ia-solo cm_test_all_sandia: solo updates
* SPMV tpl fixes, workaround
* Avoid possible integer conversion warnings
* Document cusparseSpMM algos that were tested
KokkosKernels Utils: cleaning the zero_vector interface
Now a declaration like CrsMatrix<Scalar, Ordinal, Device> will by default use an ETI'd type combination (as int is the default ETI'd offset)
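The mechanism here is a defaulted template parameter. The sketch below models it with a hypothetical `sketch::CrsMatrix`. The parameter list, the `default_size_type` alias, and the `HostDevice` tag are assumptions made for illustration and may not match the real class template.

```cpp
#include <type_traits>

namespace sketch {

// Assumption: the library-wide default offset type is int.
using default_size_type = int;

// Hypothetical device tag, only here to fill the Device parameter.
struct HostDevice {};

// Hypothetical matrix type: the offset type is the last parameter and is
// defaulted, so a three-argument declaration picks default_size_type.
template <class Scalar, class Ordinal, class Device, class MemoryTraits = void,
          class SizeType = default_size_type>
struct CrsMatrix {
  using value_type   = Scalar;
  using ordinal_type = Ordinal;
  using size_type    = SizeType;
};

}  // namespace sketch

// A three-argument declaration resolves to the default offset type.
static_assert(
    std::is_same<sketch::CrsMatrix<double, int, sketch::HostDevice>::size_type,
                 int>::value,
    "default offset type should be int");
```

Because the default matches the offset type that is instantiated ahead of time, the short declaration lands on a precompiled type combination rather than forcing a new instantiation.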
* implement batched serial pbtrs
  Signed-off-by: Yuuichi Asahi <[email protected]>
* format
  Signed-off-by: Yuuichi Asahi <[email protected]>
* fix: docstrings for pbtrs
  Signed-off-by: Yuuichi Asahi <[email protected]>
* move implementation details under Impl namespace
  Signed-off-by: Yuuichi Asahi <[email protected]>
* Add missing check for pbtrs
  Signed-off-by: Yuuichi Asahi <[email protected]>
* fix: conflicts
  Signed-off-by: Yuuichi Asahi <[email protected]>
* fix: use EXPECT_NEAR_KK_REL for check
  Signed-off-by: Yuuichi Asahi <[email protected]>
* remove unused variable xm from pbtrs impl
  Signed-off-by: Yuuichi Asahi <[email protected]>
---------
Signed-off-by: Yuuichi Asahi <[email protected]>
Co-authored-by: Yuuichi Asahi <[email protected]>
* Fixing potential overflow issue in inner product trait
  When the result type is double and the inputs are floats, one input has to be cast to double so that the multiplication is done in double instead of in float, where it could overflow for products that are valid double values. Handle the complex case for mixed input/output fp types
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* Adding fixes for various integer overflow issues.
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
---------
Signed-off-by: Luc Berger-Vergiat <[email protected]>
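The float case described above is easy to reproduce in isolation. In this sketch, the function names are hypothetical and the inputs are chosen so that the product (about 1e60) exceeds the float range but fits comfortably in a double.

```cpp
#include <cassert>
#include <cmath>

// Multiply in float, then widen: the product is rounded to float first,
// so it can overflow to +inf before the conversion to double happens.
double multiply_then_widen(float a, float b) {
  float p = a * b;
  return static_cast<double>(p);
}

// Widen one input first: the other is promoted and the product is
// computed in double.
double widen_then_multiply(float a, float b) {
  return static_cast<double>(a) * b;
}

// Self-check on inputs whose product is only representable in double.
void check_mixed_precision_product() {
  assert(std::isinf(multiply_then_widen(1e30f, 1e30f)));
  assert(std::isfinite(widen_then_multiply(1e30f, 1e30f)));
}
```

Casting a single operand is enough, because the usual arithmetic conversions promote the other operand to double as well.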
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.3.4 to 4.3.5. - [Release notes](https://github.com/actions/dependency-review-action/releases) - [Commits](actions/dependency-review-action@5a2ce3f...a6993e2) --- updated-dependencies: - dependency-name: actions/dependency-review-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.13 to 3.27.0. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](github/codeql-action@f779452...6624720) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.1 to 4.2.2. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@eef6144...11bd719) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
In favor of KOKKOSBATCHED_IMPL_ENABLE_INTEL_MKL Signed-off-by: Carl Pearson <[email protected]>
* fix include path of Impl
  Signed-off-by: Yuuichi Asahi <[email protected]>
* improve batched serial laswp tests
  Signed-off-by: Yuuichi Asahi <[email protected]>
* fix comments in Test_Batched_SerialLaswp.hpp
  Signed-off-by: Yuuichi Asahi <[email protected]>
---------
Signed-off-by: Yuuichi Asahi <[email protected]>
Co-authored-by: Yuuichi Asahi <[email protected]>
* implement batched serial iamax
  Signed-off-by: Yuuichi Asahi <[email protected]>
* Add missing static_assertion in iamax
  Signed-off-by: Yuuichi Asahi <[email protected]>
* fix: CodeQL
  Signed-off-by: Yuuichi Asahi <[email protected]>
* fix: reintroduce RealType in impl_test_batched_iamax
  Signed-off-by: Yuuichi Asahi <[email protected]>
* fix: use view size_type as a return type of iamax
  Signed-off-by: Yuuichi Asahi <[email protected]>
---------
Signed-off-by: Yuuichi Asahi <[email protected]>
Co-authored-by: Yuuichi Asahi <[email protected]>
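For context, iamax is the BLAS-style operation that returns the index of the entry with the largest absolute value. The sketch below is a plain C++ reference for real inputs, not the batched Kokkos Kernels interface. The function name, the zero-based index, and returning 0 for an empty vector are assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Zero-based index of the first entry with the largest absolute value.
// Returns 0 for an empty vector (a convention assumed by this sketch).
std::size_t iamax_reference(const std::vector<double>& x) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < x.size(); ++i) {
    // Strict comparison keeps the first index when magnitudes tie.
    if (std::abs(x[i]) > std::abs(x[best])) best = i;
  }
  return best;
}
```

Returning an unsigned size type, as the last commit in this group does for the view's `size_type`, avoids a narrowing conversion when the result is used to index back into the data.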
* CodeQL: trying to fix issues with multiplication results conversion
  This avoids potential overflow when low precision data is multiplied and then stored in a higher precision variable: size_t = int * int. Focusing on issues in the library for now, unit-tests will be fixed later.
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* Applying clang-format
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* Switching a few static_cast to size_t for clarity
  After discussion in the PR, these changes should not result in issues when passed to the view constructors and improve clarity for future maintenance.
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
---------
Signed-off-by: Luc Berger-Vergiat <[email protected]>
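The pattern flagged here is a product computed in a narrow type and only then stored in a wide one. The commit concerns `size_t = int * int`, where signed overflow is undefined behavior. To keep the sketch well defined it uses unsigned fixed-width types instead, assuming a platform where `int` is 32 bits wide. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Product evaluated in 32 bits, then widened: the high bits are already
// lost (unsigned arithmetic wraps modulo 2^32) before the conversion.
std::uint64_t entry_count_narrow(std::uint32_t rows, std::uint32_t cols) {
  return rows * cols;
}

// Cast one operand first so the product is evaluated in 64 bits.
std::uint64_t entry_count_wide(std::uint32_t rows, std::uint32_t cols) {
  return static_cast<std::uint64_t>(rows) * cols;
}

// Self-check: 65536 * 65536 = 2^32 does not fit in 32 bits.
void check_entry_count() {
  assert(entry_count_narrow(65536u, 65536u) == 0u);
  assert(entry_count_wide(65536u, 65536u) == 4294967296ULL);
}
```

The cast has to be applied to an operand, not to the finished product, otherwise the multiplication still happens in the narrow type.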
In favor of KOKKOSBATCHED_IMPL_ENABLE_INTEL_MKL_BATCHED Signed-off-by: Carl Pearson <[email protected]>
Let's set a good example in our examples Signed-off-by: Carl Pearson <[email protected]>
Just like the previous round of fixes related to multiplication overflowing when result type has wider range, this should get CodeQL to be a little happier. Signed-off-by: Luc Berger-Vergiat <[email protected]>
Last one of a series of fixes to clean-up the CodeQL safety issues, after that we should be all clean! Signed-off-by: Luc Berger-Vergiat <[email protected]>
* Add address sanitizer and most of undefined sanitizer.
  Exclude vptr due to Preconditioner visibility. Exclude signed integer overflow because we do this all over the place.
  Signed-off-by: Carl Pearson <[email protected]>
* Reducing ETI scope a lot to improve build size and time
  This is not a permanent fix, we probably need to set this build on a different platform but should be enough to get one set of results and observe how good/bad we are doing...
  Signed-off-by: Carl Pearson <[email protected]>
* ci: osx-ci -> ubuntu-asan-ubsan-ci
  Signed-off-by: Carl Pearson <[email protected]>
* ci: drop compiler warnings on ci sanitizers build
  Signed-off-by: Carl Pearson <[email protected]>
* ci: Kokkos_DIR -> Kokkos_ROOT
  Signed-off-by: Carl Pearson <[email protected]>
* ci: ditch relative paths and working directories
  Signed-off-by: Carl Pearson <[email protected]>
* ci: drop Kokkos_ENABLE_DEPRECATED_CODE_3
  Signed-off-by: Carl Pearson <[email protected]>
* ci: fix kokkos kernels source path
  Signed-off-by: Carl Pearson <[email protected]>
* ci: add UBSAN_OPTIONS to get stack trace
  Signed-off-by: Carl Pearson <[email protected]>
---------
Signed-off-by: Carl Pearson <[email protected]>
Co-authored-by: Luc Berger <[email protected]>
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.3.5 to 4.4.0. - [Release notes](https://github.com/actions/dependency-review-action/releases) - [Commits](actions/dependency-review-action@a6993e2...4081bf9) --- updated-dependencies: - dependency-name: actions/dependency-review-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [softprops/action-gh-release](https://github.com/softprops/action-gh-release) from 2.0.8 to 2.0.9. - [Release notes](https://github.com/softprops/action-gh-release/releases) - [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md) - [Commits](softprops/action-gh-release@c062e08...e7a8f85) --- updated-dependencies: - dependency-name: softprops/action-gh-release dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ODE - RK: fixing small issues reported by Yaro
  1. fix integer division to floating point division
  2. fix evaluation of max scaled error
  3. increase or decrease time step using uniform formula
  4. use num_steps instead of max_steps for dt calculation
  5. add a time step when using constant dt to avoid issues with round-off errors
  6. fixing exponent and moving adaptivity computation out of RKStep
  7. adding time step counter
  8. adding more tests and keep track of time steps if wanted
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* RK: fixing variable name after rebase
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* RK: enabling most methods after fixing test related issues
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* RK: passing new unit-tests
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* Applying clang-format
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* RK: fix bad subview creation
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* RK: fix bug that computes the initial step size for non-adaptive case
  This bug prevented use of the user defined time step and led to wrong results. The rate of convergence tests are now passing!
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* clang-format...
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* RK: tweaking the tolerances a bit
  On GPU the lowest order method (RK1-2) is accumulating a bit more errors than on CPU. Only an issue when comparing values to zero where the absolute tolerance is needed to detect good conv.
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* Adding reference for some implementation details and heuristic values
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
---------
Signed-off-by: Luc Berger-Vergiat <[email protected]>
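Item 1 in the first commit of this group names a classic pitfall: dividing two integers truncates before the result reaches any floating-point arithmetic. The page does not show the actual expression that was fixed, so the two functions below are hypothetical and only illustrate the category of bug in a time-stepping context.

```cpp
#include <cassert>

// Buggy pattern: i and num_steps are both int, so i / num_steps is an
// integer division and truncates to 0 whenever 0 <= i < num_steps.
double time_at_step_truncated(double t0, double t1, int i, int num_steps) {
  assert(num_steps > 0);
  return t0 + (t1 - t0) * (i / num_steps);
}

// Fixed pattern: convert one operand so the division is done in double.
double time_at_step(double t0, double t1, int i, int num_steps) {
  assert(num_steps > 0);
  return t0 + (t1 - t0) * (static_cast<double>(i) / num_steps);
}
```

For example, with t0 = 0, t1 = 1, i = 1 and num_steps = 4, the truncated version stays at 0 while the fixed version gives 0.25.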
Signed-off-by: Nathan Ellingwood <[email protected]>
* Update changelog for 4.5.00
  Signed-off-by: Nathan Ellingwood <[email protected]>
* Update CHANGELOG.md
  Grouping some work for identifier redefinition, atomic API update. Moving SVD from ODE to LAPACK. Adding ODE PR
---------
Signed-off-by: Nathan Ellingwood <[email protected]>
Co-authored-by: Luc Berger <[email protected]>
* ODE: skipping autocatalytic test on SYCL
  For the time being it is unclear why this particular case leads to a runtime error from the SYCL API.
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* ODE: formatting
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
* ODE: forgot to check if the SYCL space is enabled in Kokkos
  Signed-off-by: Luc Berger-Vergiat <[email protected]>
---------
Signed-off-by: Luc Berger-Vergiat <[email protected]>
Part of Kokkos C++ Performance Portability Programming EcoSystem 4.5 Signed-off-by: Nathan Ellingwood <[email protected]>
Signed-off-by: Nathan Ellingwood <[email protected]>
ndellingwood requested review from brian-kelley, cwpearson, srajama1, vqd8a and lucbv November 11, 2024 20:12
Trilinos snapshot PR: trilinos/Trilinos#13589
brian-kelley approved these changes Nov 11, 2024
vqd8a approved these changes Nov 11, 2024