[WIP] Update and standardize implementation of packages, in sync with spack update #593

Open
wants to merge 33 commits into
base: master

Conversation

adrienbernede
Member

@adrienbernede adrienbernede commented Sep 25, 2024

Summary

Supersedes #588

This PR:

  • migrates CARE and Caliper to CachedCMakePackage, reducing the gap with implementations found in llnl/radiuss-spack-configs.
  • improves consistency of version constraints across RADIUSS packages.
  • updates Spack.
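
For context, packages built on Spack's CachedCMakePackage write their configuration into an initial-cache ("host-config") file that CMake loads with `-C`, instead of passing every option on the command line. The sketch below only illustrates the shape of such a file. It does not import Spack, and the helper names (`cache_option`, `cache_path`, `render_host_config`), the option names, and the compiler path are hypothetical placeholders, not Spack's API or the actual CARE or Caliper options.

```python
from typing import List

# Standalone sketch: does not import Spack. All helper names are hypothetical.


def cache_option(name: str, enabled: bool, comment: str = "") -> str:
    """Return one CMake initial-cache line for a boolean option."""
    value = "ON" if enabled else "OFF"
    return f'set({name} {value} CACHE BOOL "{comment}")'


def cache_path(name: str, path: str, comment: str = "") -> str:
    """Return one CMake initial-cache line for a filesystem path."""
    return f'set({name} "{path}" CACHE PATH "{comment}")'


def render_host_config(entries: List[str]) -> str:
    """Join cache entries into the text of a host-config file."""
    return "\n".join(entries) + "\n"


if __name__ == "__main__":
    # Option names and the compiler path are illustrative placeholders.
    entries = [
        cache_path("CMAKE_CXX_COMPILER", "/path/to/compiler"),
        cache_option("ENABLE_TESTS", True),
        cache_option("WITH_ROCM", False),
    ]
    print(render_host_config(entries), end="")
```

A file produced this way could be passed to CMake as `cmake -C <host-config file> <source dir>`, which makes a given configuration easy to reproduce outside of Spack.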

⚠️ TODO Before Merge:

@adrienbernede
Member Author

@daboehme It appears that recent changes in the Caliper main branch fixed the issues we were seeing with the cce compilers.
One issue remains with rocm 6.2.0 that I would like you to look at:
https://lc.llnl.gov/gitlab/radiuss/Caliper/-/jobs/2148980
Thank you.

@adrienbernede
Member Author

@daboehme any idea what could be causing this?

5/5 Test #5: CI_app_tests .....................***Failed   45.26 sec
..................................Efree(): double free detected in tcache 2
Efree(): double free detected in tcache 2
E......................cali-query: Error reading stdin: Unknown/invalid record: __rec=n
E............EEEE....E.....E...

@daboehme
Member

Hi @adrienbernede, where did you see this happening? Can't find it in any of the recent CI results.

@adrienbernede
Member Author

@daboehme I think you just missed it; it's right after the test summary in the logs of the only failing job:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ 2024-10-08 10:15:13-07:00 ~ Testing Caliper
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cannot find file: /dev/shm/tioga14-2161536/[email protected]/DartConfiguration.tcl
   Site: 
   Build name: (empty)
Create new tag: 20241008-1715 - Experimental
Cannot find file: /dev/shm/tioga14-2161536/[email protected]/DartConfiguration.tcl
Test project /dev/shm/tioga14-2161536/[email protected]
    Start 1: test-caliper-common
1/5 Test #1: test-caliper-common ..............   Passed    0.01 sec
    Start 2: test-caliper-reader
2/5 Test #2: test-caliper-reader ..............   Passed    0.01 sec
    Start 3: test-adiak-services
3/5 Test #3: test-adiak-services ..............   Passed    1.13 sec
    Start 4: test-caliper
4/5 Test #4: test-caliper .....................   Passed    0.75 sec
    Start 5: CI_app_tests
5/5 Test #5: CI_app_tests .....................***Failed   45.26 sec
..................................Efree(): double free detected in tcache 2
Efree(): double free detected in tcache 2
E......................cali-query: Error reading stdin: Unknown/invalid record: __rec=n
E............EEEE....E.....E...
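
For triage, the failing entries in a log like the one above can be pulled out with a short script. This is a minimal sketch, assuming the CTest console format shown in the log; the function name `summarize` and the list of crash markers are illustrative choices, not part of Caliper or CTest.

```python
import re

# Matches CTest result lines that carry a "***" status, for example:
#   5/5 Test #5: CI_app_tests .....................***Failed   45.26 sec
FAILED_RE = re.compile(r"^\s*\d+/\d+\s+Test\s+#(\d+):\s+(\S+)\s+\.*\s*\*\*\*(\w+)")

# Substrings treated as signs of a crash. This list is an assumption
# and is not exhaustive.
CRASH_MARKERS = ("double free detected", "Segmentation fault")


def summarize(log_text):
    """Return (failed_tests, crash_lines) found in CTest console output."""
    failed, crashes = [], []
    for line in log_text.splitlines():
        match = FAILED_RE.match(line)
        if match:
            # (test number, test name, first word of the status)
            failed.append((int(match.group(1)), match.group(2), match.group(3)))
        elif any(marker in line for marker in CRASH_MARKERS):
            crashes.append(line.strip())
    return failed, crashes


if __name__ == "__main__":
    sample = (
        "4/5 Test #4: test-caliper .....................   Passed    0.75 sec\n"
        "5/5 Test #5: CI_app_tests .....................***Failed   45.26 sec\n"
        "Efree(): double free detected in tcache 2\n"
    )
    failed_tests, crash_lines = summarize(sample)
    print(failed_tests)
    print(crash_lines)
```

Run against the full job log, this would list each test reported with a `***` status together with any lines containing one of the crash markers.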

@daboehme
Member

Hi @adrienbernede, thanks, I found it. I tried building Caliper with the same compiler and libraries, but I can't reproduce these issues; all tests run fine for me. It also doesn't look like the CI has run this particular configuration lately. Can we simply retry this config? Maybe it was a HW issue or something.

2 participants