ABACUS v3.8.0
Atomic-orbital Based Ab-initio Computation at UStc
Website: http://abacus.ustc.edu.cn/
Documentation: https://abacus.deepmodeling.com/
Repository: https://github.com/abacusmodeling/abacus-develop
https://github.com/deepmodeling/abacus-develop
Commit: 5329628 (Thu Oct 10 22:45:13 2024 +0800)
Fri Oct 11 00:28:57 2024
Info: Local MPI proc number: 4,OpenMP thread number: 1,Total thread number: 4,Local thread limit: 32
[j12r4n15:21269:0:21269] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
==== backtrace (tid: 21269) ====
0 0x0000000000051213 ucs_debug_print_backtrace() /public/home/bujd/tmp/hpcx-v2.6.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat7.6-x86_64/sources/ucx-1.8.0/src/ucs/debug/debug.c:625
1 0x000000000008559c __GI___libc_free() :0
2 0x0000000000c35f99 std::string::assign() ???:0
3 0x0000000000c360b9 std::string::assign() ???:0
4 0x0000000000c3783e std::string::assign() ???:0
5 0x0000000000bc8cec std::string::assign() ???:0
6 0x0000000000c20106 std::string::assign() ???:0
7 0x0000000000c8583a hipGetCmdName() ???:0
8 0x0000000000ca05ee hipGetDeviceCount() ???:0
9 0x0000000000453344 base_device::information::get_device_flag() ???:0
10 0x0000000000183f08 std::_Function_handler<void (ModuleIO::Input_Item const&, Parameter&), ModuleIO::ReadInput::item_system()::$_169>::_M_invoke() read_input_item_system.cpp:0
11 0x00000000001e97aa ModuleIO::ReadInput::read_txt_input() ???:0
12 0x00000000001e90ac ModuleIO::ReadInput::read_parameters() ???:0
13 0x0000000000250de5 Driver::reading() ???:0
14 0x0000000000250c3d Driver::init() ???:0
15 0x00000000000602d7 main() ???:0
16 0x00000000000223d5 __libc_start_main() ???:0
17 0x0000000000060160 _start() ???:0
=================================
[j12r4n15:21269] *** Process received signal ***
[j12r4n15:21269] Signal: Segmentation fault (11)
[j12r4n15:21269] Signal code: (-6)
[j12r4n15:21269] Failing at address: 0x62e000005315
[j12r4n15:21269] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2b34ca07f5d0]
[j12r4n15:21269] [ 1] /lib64/libc.so.6(cfree+0x1c)[0x2b34d47bc59c]
[j12r4n15:21269] [ 2] /public/software/compiler/rocm/dtk-23.10/lib/libgalaxyhip.so.5(+0xc35f99)[0x2b34cc079f99]
[j12r4n15:21269] [ 3] /public/software/compiler/rocm/dtk-23.10/lib/libgalaxyhip.so.5(+0xc360b9)[0x2b34cc07a0b9]
[j12r4n15:21269] [ 4] /public/software/compiler/rocm/dtk-23.10/lib/libgalaxyhip.so.5(+0xc3783e)[0x2b34cc07b83e]
[j12r4n15:21269] [ 5] /public/software/compiler/rocm/dtk-23.10/lib/libgalaxyhip.so.5(+0xbc8cec)[0x2b34cc00ccec]
[j12r4n15:21269] [ 6] /public/software/compiler/rocm/dtk-23.10/lib/libgalaxyhip.so.5(+0xc20106)[0x2b34cc064106]
[j12r4n15:21269] [ 7] /public/software/compiler/rocm/dtk-23.10/lib/libgalaxyhip.so.5(+0xc8583a)[0x2b34cc0c983a]
[j12r4n15:21269] [ 8] /public/software/compiler/rocm/dtk-23.10/lib/libgalaxyhip.so.5(hipGetDeviceCount+0x17e)[0x2b34cc0e45ee]
[j12r4n15:21269] [ 9] /public/home/abacus/abacus-dcu/build-dcu/abacus_pw(+0x453344)[0x55b838642344]
[j12r4n15:21269] [10] /public/home/abacus/abacus-dcu/build-dcu/abacus_pw(+0x183f08)[0x55b838372f08]
[j12r4n15:21269] [11] /public/home/abacus/abacus-dcu/build-dcu/abacus_pw(+0x1e97aa)[0x55b8383d87aa]
[j12r4n15:21269] [12] /public/home/abacus/abacus-dcu/build-dcu/abacus_pw(+0x1e90ac)[0x55b8383d80ac]
[j12r4n15:21269] [13] /public/home/abacus/abacus-dcu/build-dcu/abacus_pw(+0x250de5)[0x55b83843fde5]
[j12r4n15:21269] [14] /public/home/abacus/abacus-dcu/build-dcu/abacus_pw(+0x250c3d)[0x55b83843fc3d]
[j12r4n15:21269] [15] /public/home/abacus/abacus-dcu/build-dcu/abacus_pw(+0x602d7)[0x55b83824f2d7]
[j12r4n15:21269] [16] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b34d47593d5]
[j12r4n15:21269] [17] /public/home/abacus/abacus-dcu/build-dcu/abacus_pw(+0x60160)[0x55b83824f160]
[j12r4n15:21269] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 21269 on node j12r4n15 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
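For triage, the UCX-style backtrace above can be reduced to the first frame outside the HIP runtime. The following is a minimal sketch, not part of ABACUS or its test tooling: the function names `parse_frames` and `first_caller_outside` are hypothetical, and the line format it assumes is simply the `index address symbol() location` layout visible in this log.

```python
import re

# Matches UCX-style backtrace lines such as:
#   "8 0x0000000000ca05ee hipGetDeviceCount() ???:0"
# The lazy symbol group lets symbols that contain "()" internally
# (e.g. lambdas inside item_system()) still match up to the final "()".
FRAME_RE = re.compile(r"^\s*(\d+)\s+(0x[0-9a-fA-F]+)\s+(.+?)\(\)\s+(\S+)\s*$")


def parse_frames(log_text):
    """Return a list of frame dicts for every backtrace line in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            index, address, symbol, location = match.groups()
            frames.append({
                "index": int(index),
                "address": int(address, 16),
                "symbol": symbol,
                "location": location,
            })
    return frames


def first_caller_outside(frames, prefix):
    """Return the frame just after the last frame whose symbol starts
    with prefix, i.e. the code that called into that library, or None
    if no such frame exists."""
    last = None
    for position, frame in enumerate(frames):
        if frame["symbol"].startswith(prefix):
            last = position
    if last is None or last + 1 >= len(frames):
        return None
    return frames[last + 1]
```

Applied to the backtrace in this report with `prefix="hip"`, the sketch selects frame 9, `base_device::information::get_device_flag`, which is the ABACUS function that calls `hipGetDeviceCount` before the crash in `free`.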
Additional Context
No response
Task list for Issue attackers (only for developers)
Understand the testing issue described by the developer.
Review the specific test case, expected and actual results, and any error messages.
Identify the root cause of the test failure or issue.
If a possible solution is suggested, evaluate its feasibility and effectiveness.
Implement a fix for the test failure or issue, or create a new test case if needed.
Verify that the fix resolves the testing issue and the test case passes.
Review and update any relevant documentation, such as test plans or user guides.
Ensure the testing issue is resolved and close the ticket.
Share any lessons learned or best practices with the team to prevent similar issues in the future.
Describe the Testing Issue
The daily DCU test failed on example 005_16Na on 20241011: https://app.bohrium.dp.tech/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fsched-abacustest-dcu-cg-372d8a
The error message is the log reproduced above.