Segmentation fault (core dumped) when running GPGPU-Sim 4.0/4.2 with CUTLASS 1.3 (maybe due to a .loc syntax error in PTX) #296

Open
eecspan opened this issue May 24, 2024 · 1 comment

eecspan commented May 24, 2024

I'm trying to run GPGPU-Sim with CUTLASS. Following the documentation's requirements, I am using CUTLASS 1.3 and testing with the examples shipped with CUTLASS 1.3. However, whether I use GPGPU-Sim 4.0, GPGPU-Sim 4.2, or GPGPU-Sim under Accel-Sim, the program crashes with a segmentation fault.
Examining the GPGPU-Sim output shows a syntax error while parsing the PTX, as shown below.

GPGPU-Sim PTX: __cudaRegisterFunction _ZN7cutlass4gemm16gemm_kernel_nolbINS0_12GemmMainloopINS0_10GemmTraitsINS0_11SgemmConfigINS_5ShapeILi8ELi128ELi128ELi1EEENS5_ILi8ELi8ELi8ELi1EEELi1ELi1ELb0EEENS0_16GlobalLoadStreamILNS_11GemmOperand4KindE0ENS0_20GemmGlobalIteratorAbINS0_20GemmGlobalTileTraitsILSB_0ELNS_12MatrixLayout4KindE1EKfNS5_ILi1ELi8ELi128ELi1EEENS5_ILi1ELi8ELi32ELi1EEELi1EEEiEENS_17TileStoreIteratorINS0_27GemmSharedStoreTileAbTraitsIfNS5_ILi2ELi8ELi128ELi1EEESI_Li1EEEfLNS_15IteratorAdvance4KindE1ELNS_11MemorySpace4KindE1EifLNS_19FragmentElementType4KindE0ENS5_ILi0ELi0ELi0ELi0EEEEENS_4CopyINS_8FragmentIfLi4ELm16EEEEEEENS9_ILSB_1ENSC_INSD_ILSB_1ELSF_1ESG_NS5_ILi1ELi128ELi8ELi1EEENS5_ILi1ELi32ELi8ELi1EEELi1EEEiEENSL_INS0_35GemmSharedStoreWithSkewTileAbTraitsIfSN_S13_Li1ELi4EEEfLSQ_1ELSS_1EifLSU_0ESV_EES10_EENS0_16SharedLoadStreamINS_16TileLoadIteratorINS0_25GemmSharedLoadTileATraitsISG_S6_NS5_ILi1ELi4ELi2ELi1EEENS5_ILi1ELi4ELi8ELi1EEENS5_ILi1ELi1ELi1ELi1EEELi2ELi4ELi0EEEfLSQ_1ELSS_1EifLSU_0ESV_EENSX_INSY_IfLi8ELm16EEEEEEENS1A_INS1B_INS0_25GemmSharedLoadTileBTraitsISG_S6_S1D_S1E_S1F_Li2ELi4ELi4EEEfLSQ_1ELSS_1EifLSU_0ESV_EES1J_EENS0_12GemmEpilogueINS0_28SimplifiedGemmEpilogueTraitsIS8_NS0_13LinearScalingIfNS0_19FragmentMultiplyAddIffLb1EEEEEiNS0_24GemmEpilogueTraitsHelperIS8_S1U_iEEEEEENS0_20IdentityBlockSwizzleEiNS0_17ClearAccumulatorsIfLi1EEEEEEEEEvNT_6ParamsE : hostFun 0x0x55fc5d804630, fat_cubin_handle = 1
GPGPU-Sim PTX: Parsing basic_gemm.sm_75.ptx
GPGPU-Sim PTX: allocating shared region for "_ZN7cutlass4gemm21GemmSharedStorageBaseE" from 0x0 to 0x0 (shared memory space)
GPGPU-Sim PTX: instruction assembly for function '_Z23InitializeMatrix_kernelPfiiii'...   done.
GPGPU-Sim PTX: Warning -- ignoring pragma 'nounroll'
GPGPU-Sim PTX: instruction assembly for function '_Z20ReferenceGemm_kerneliiifPKfiS0_ifPfi'...   done.
basic_gemm.sm_75.ptx:233 Syntax error:

   .loc	3 170 9, function_name $L__info_string0, inlined_at 2 81 3
       	       ^

GPGPU-Sim PTX: finished parsing EMBEDDED .ptx file basic_gemm.sm_75.ptx
GPGPU-Sim PTX: loading globals with explicit initializers... 
GPGPU-Sim PTX: finished loading globals (0 bytes total).
GPGPU-Sim PTX: loading constants with explicit initializers...  done.
GPGPU-Sim PTX: Loading PTXInfo from basic_gemm.sm_75.ptx
GPGPU-Sim PTX: Kernel '_ZN7cutlass4gemm16gemm_kernel_nolbINS0_12GemmMainloopINS0_10GemmTraitsINS0_11SgemmConfigINS_5ShapeILi8ELi128ELi128ELi1EEENS5_ILi8ELi8ELi8ELi1EEELi1ELi1ELb0EEENS0_16GlobalLoadStreamILNS_11GemmOperand4KindE0ENS0_20GemmGlobalIteratorAbINS0_20GemmGlobalTileTraitsILSB_0ELNS_12MatrixLayout4KindE1EKfNS5_ILi1ELi8ELi128ELi1EEENS5_ILi1ELi8ELi32ELi1EEELi1EEEiEENS_17TileStoreIteratorINS0_27GemmSharedStoreTileAbTraitsIfNS5_ILi2ELi8ELi128ELi1EEESI_Li1EEEfLNS_15IteratorAdvance4KindE1ELNS_11MemorySpace4KindE1EifLNS_19FragmentElementType4KindE0ENS5_ILi0ELi0ELi0ELi0EEEEENS_4CopyINS_8FragmentIfLi4ELm16EEEEEEENS9_ILSB_1ENSC_INSD_ILSB_1ELSF_1ESG_NS5_ILi1ELi128ELi8ELi1EEENS5_ILi1ELi32ELi8ELi1EEELi1EEEiEENSL_INS0_35GemmSharedStoreWithSkewTileAbTraitsIfSN_S13_Li1ELi4EEEfLSQ_1ELSS_1EifLSU_0ESV_EES10_EENS0_16SharedLoadStreamINS_16TileLoadIteratorINS0_25GemmSharedLoadTileATraitsISG_S6_NS5_ILi1ELi4ELi2ELi1EEENS5_ILi1ELi4ELi8ELi1EEENS5_ILi1ELi1ELi1ELi1EEELi2ELi4ELi0EEEfLSQ_1ELSS_1EifLSU_0ESV_EENSX_INSY_IfLi8ELm16EEEEEEENS1A_INS1B_INS0_25GemmSharedLoadTileBTraitsISG_S6_S1D_S1E_S1F_Li2ELi4ELi4EEEfLSQ_1ELSS_1EifLSU_0ESV_EES1J_EENS0_12GemmEpilogueINS0_28SimplifiedGemmEpilogueTraitsIS8_NS0_13LinearScalingIfNS0_19FragmentMultiplyAddIffLb1EEEEEiNS0_24GemmEpilogueTraitsHelperIS8_S1U_iEEEEEENS0_20IdentityBlockSwizzleEiNS0_17ClearAccumulatorsIfLi1EEEEEEEEEvNT_6ParamsE' : regs=124, lmem=0, smem=0, cmem=872
GPGPU-Sim PTX: Kernel '_Z20ReferenceGemm_kerneliiifPKfiS0_ifPfi' : regs=52, lmem=0, smem=0, cmem=412
GPGPU-Sim PTX: Kernel '_Z23InitializeMatrix_kernelPfiiii' : regs=8, lmem=0, smem=0, cmem=376
GPGPU-Sim PTX: __cudaRegisterFunction _Z20ReferenceGemm_kerneliiifPKfiS0_ifPfi : hostFun 0x0x55fc5d8027a0, fat_cubin_handle = 1
GPGPU-Sim PTX: __cudaRegisterFunction _Z23InitializeMatrix_kernelPfiiii : hostFun 0x0x55fc5d802990, fat_cubin_handle = 1
GPGPU-Sim PTX: Setting up arguments for 8 bytes starting at 0x7fff06ec4c10..
GPGPU-Sim PTX: Setting up arguments for 4 bytes starting at 0x7fff06ec4bf8..
GPGPU-Sim PTX: Setting up arguments for 4 bytes starting at 0x7fff06ec4bfc..
GPGPU-Sim PTX: Setting up arguments for 4 bytes starting at 0x7fff06ec4c00..
GPGPU-Sim PTX: Setting up arguments for 4 bytes starting at 0x7fff06ec4c04..

The error message indicates that the crash occurred during the cudaLaunch of the host function at address 0x55fc5d804630. That is the hostFun address reported when the CUTLASS gemm_kernel_nolb kernel was registered with __cudaRegisterFunction (first line of the log above). Since the syntax error is reported while parsing the PTX that contains this kernel, I suspect the syntax error is what causes cudaLaunch to crash.
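To match a crashing address against the registered kernels, the `__cudaRegisterFunction` lines of the log can be parsed into an address-to-name table. This is a minimal sketch: the function name `host_function_map` is made up for illustration, and the regular expression assumes the log format shown above (including the doubled `0x0x` prefix), which may differ between GPGPU-Sim versions.

```python
import re

# Matches log lines of the form seen above:
#   GPGPU-Sim PTX: __cudaRegisterFunction <mangled name> : hostFun 0x0x<hex>, fat_cubin_handle = 1
# The log prints the address with a doubled "0x" prefix, so one or more are accepted.
REGISTER_RE = re.compile(
    r"__cudaRegisterFunction\s+(?P<name>\S+)\s+:\s+hostFun\s+(?:0x)+(?P<addr>[0-9a-fA-F]+)"
)


def host_function_map(log_text):
    """Return {host function address (int): mangled kernel name}."""
    mapping = {}
    for match in REGISTER_RE.finditer(log_text):
        mapping[int(match.group("addr"), 16)] = match.group("name")
    return mapping
```

Looking up `0x55fc5d804630` in the resulting dictionary should then return the mangled name of the kernel whose launch crashed.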

The relevant PTX code is as follows:

.loc	3 170 9, function_name $L__info_string0, inlined_at 2 81 3
.loc	4 85 18, function_name $L__info_string1, inlined_at 3 170 9
.loc	4 70 86, function_name $L__info_string2, inlined_at 4 85 18

The first form of the .loc syntax is parsed correctly, while the second form (with function_name and inlined_at, as in the lines above) triggers the syntax error.

So, is this because GPGPU-Sim does not support the second form of the .loc syntax, as shown in the figure?

[image: the two .loc syntax forms]
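To see which form of `.loc` a given PTX file contains, the lines can be classified with a small script. This is only a sketch: the helper name `classify_loc_lines` is hypothetical, and it assumes the short form is `.loc file line column` and that the extended form appends `function_name` and `inlined_at` clauses, as in the lines quoted above.

```python
import re

# Short form:    .loc <file> <line> <column>
# Extended form: .loc <file> <line> <column>, function_name <label>, inlined_at <file> <line> <column>
LOC_RE = re.compile(r"^\s*\.loc\s+\d+\s+\d+\s+\d+(?P<rest>.*)$")


def classify_loc_lines(ptx_text):
    """Return 1-based line numbers of .loc lines, grouped by syntax form."""
    found = {"short": [], "extended": []}
    for lineno, line in enumerate(ptx_text.splitlines(), start=1):
        match = LOC_RE.match(line)
        if match is None:
            continue  # not a .loc line
        rest = match.group("rest")
        if "function_name" in rest or "inlined_at" in rest:
            found["extended"].append(lineno)
        else:
            found["short"].append(lineno)
    return found
```

If the `extended` list is non-empty for the PTX that GPGPU-Sim parses, the syntax error described above would be expected at those lines.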

Here is the OS version:
Ubuntu 18.04.6 LTS
The cuda toolkit version:
Cuda compilation tools, release 11.7, V11.7.99
The gcc version:
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

Any help would be appreciated. Thanks a lot!

eecspan commented May 27, 2024

I finally solved this problem. The method described at https://github.com/sxzhang1993/Run-cutlass-with-gpgpu-sim uses CUDA 9.1, and the .loc lines generated by CUDA 9.1 only use the first syntax, never the second. However, CUDA 9.1 does not support the Turing architecture. If you want to target Turing you can use CUDA 11, but then the problem above occurs.

I found that .loc is related to debugging. cutlass_bench adds the -lineinfo option during compilation, and if this option is omitted, no .loc lines are generated. So we can comment out the -lineinfo option in cutlass_bench/CMakeLists.txt, and the final generated PTX will not contain any .loc lines.

However, using GPGPU-Sim 4.0 will then cause the error mentioned in #247, so we need to use GPGPU-Sim 4.2.
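For anyone who prefers to script the CMakeLists.txt change, a minimal sketch follows. It removes the `-lineinfo` token from the CMake text rather than commenting the line out, which is a slight variation on the manual edit described above. The helper name `drop_lineinfo` and the sample `set(...)` line in the usage note are hypothetical; the actual contents of cutlass_bench/CMakeLists.txt may look different, so review the diff before rebuilding.

```python
import re

# Match the standalone flag "-lineinfo" plus any spaces or tabs after it.
# The lookarounds avoid touching longer tokens such as "--lineinfo" or "-lineinfo-foo".
LINEINFO_RE = re.compile(r"(?<![\w-])-lineinfo(?![\w-])[ \t]*")


def drop_lineinfo(cmake_text):
    """Return the CMake text with every standalone -lineinfo flag removed."""
    return LINEINFO_RE.sub("", cmake_text)
```

For example, a hypothetical line `set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -lineinfo -O3")` would become `set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -O3")`. After rebuilding, the generated PTX can be checked for remaining `.loc` lines.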
