I'm trying to run GPGPU-Sim with CUTLASS. I followed the documentation requirements, using CUTLASS 1.3 and testing with the examples shipped with CUTLASS 1.3. However, whether I use GPGPU-Sim 4.0, GPGPU-Sim 4.2, or the GPGPU-Sim bundled with Accel-Sim, the program crashes with a segmentation fault.
The GPGPU-Sim output shows a syntax error while parsing the PTX, as shown below.
GPGPU-Sim PTX: __cudaRegisterFunction _ZN7cutlass4gemm16gemm_kernel_nolbINS0_12GemmMainloopINS0_10GemmTraitsINS0_11SgemmConfigINS_5ShapeILi8ELi128ELi128ELi1EEENS5_ILi8ELi8ELi8ELi1EEELi1ELi1ELb0EEENS0_16GlobalLoadStreamILNS_11GemmOperand4KindE0ENS0_20GemmGlobalIteratorAbINS0_20GemmGlobalTileTraitsILSB_0ELNS_12MatrixLayout4KindE1EKfNS5_ILi1ELi8ELi128ELi1EEENS5_ILi1ELi8ELi32ELi1EEELi1EEEiEENS_17TileStoreIteratorINS0_27GemmSharedStoreTileAbTraitsIfNS5_ILi2ELi8ELi128ELi1EEESI_Li1EEEfLNS_15IteratorAdvance4KindE1ELNS_11MemorySpace4KindE1EifLNS_19FragmentElementType4KindE0ENS5_ILi0ELi0ELi0ELi0EEEEENS_4CopyINS_8FragmentIfLi4ELm16EEEEEEENS9_ILSB_1ENSC_INSD_ILSB_1ELSF_1ESG_NS5_ILi1ELi128ELi8ELi1EEENS5_ILi1ELi32ELi8ELi1EEELi1EEEiEENSL_INS0_35GemmSharedStoreWithSkewTileAbTraitsIfSN_S13_Li1ELi4EEEfLSQ_1ELSS_1EifLSU_0ESV_EES10_EENS0_16SharedLoadStreamINS_16TileLoadIteratorINS0_25GemmSharedLoadTileATraitsISG_S6_NS5_ILi1ELi4ELi2ELi1EEENS5_ILi1ELi4ELi8ELi1EEENS5_ILi1ELi1ELi1ELi1EEELi2ELi4ELi0EEEfLSQ_1ELSS_1EifLSU_0ESV_EENSX_INSY_IfLi8ELm16EEEEEEENS1A_INS1B_INS0_25GemmSharedLoadTileBTraitsISG_S6_S1D_S1E_S1F_Li2ELi4ELi4EEEfLSQ_1ELSS_1EifLSU_0ESV_EES1J_EENS0_12GemmEpilogueINS0_28SimplifiedGemmEpilogueTraitsIS8_NS0_13LinearScalingIfNS0_19FragmentMultiplyAddIffLb1EEEEEiNS0_24GemmEpilogueTraitsHelperIS8_S1U_iEEEEEENS0_20IdentityBlockSwizzleEiNS0_17ClearAccumulatorsIfLi1EEEEEEEEEvNT_6ParamsE : hostFun 0x0x55fc5d804630, fat_cubin_handle = 1
GPGPU-Sim PTX: Parsing basic_gemm.sm_75.ptx
GPGPU-Sim PTX: allocating shared region for "_ZN7cutlass4gemm21GemmSharedStorageBaseE" from 0x0 to 0x0 (shared memory space)
GPGPU-Sim PTX: instruction assembly for function '_Z23InitializeMatrix_kernelPfiiii'... done.
GPGPU-Sim PTX: Warning -- ignoring pragma 'nounroll'
GPGPU-Sim PTX: instruction assembly for function '_Z20ReferenceGemm_kerneliiifPKfiS0_ifPfi'... done.
basic_gemm.sm_75.ptx:233 Syntax error:
.loc 3 170 9, function_name $L__info_string0, inlined_at 2 81 3
^
GPGPU-Sim PTX: finished parsing EMBEDDED .ptx file basic_gemm.sm_75.ptx
GPGPU-Sim PTX: loading globals with explicit initializers...
GPGPU-Sim PTX: finished loading globals (0 bytes total).
GPGPU-Sim PTX: loading constants with explicit initializers... done.
GPGPU-Sim PTX: Loading PTXInfo from basic_gemm.sm_75.ptx
GPGPU-Sim PTX: Kernel '_ZN7cutlass4gemm16gemm_kernel_nolbINS0_12GemmMainloopINS0_10GemmTraitsINS0_11SgemmConfigINS_5ShapeILi8ELi128ELi128ELi1EEENS5_ILi8ELi8ELi8ELi1EEELi1ELi1ELb0EEENS0_16GlobalLoadStreamILNS_11GemmOperand4KindE0ENS0_20GemmGlobalIteratorAbINS0_20GemmGlobalTileTraitsILSB_0ELNS_12MatrixLayout4KindE1EKfNS5_ILi1ELi8ELi128ELi1EEENS5_ILi1ELi8ELi32ELi1EEELi1EEEiEENS_17TileStoreIteratorINS0_27GemmSharedStoreTileAbTraitsIfNS5_ILi2ELi8ELi128ELi1EEESI_Li1EEEfLNS_15IteratorAdvance4KindE1ELNS_11MemorySpace4KindE1EifLNS_19FragmentElementType4KindE0ENS5_ILi0ELi0ELi0ELi0EEEEENS_4CopyINS_8FragmentIfLi4ELm16EEEEEEENS9_ILSB_1ENSC_INSD_ILSB_1ELSF_1ESG_NS5_ILi1ELi128ELi8ELi1EEENS5_ILi1ELi32ELi8ELi1EEELi1EEEiEENSL_INS0_35GemmSharedStoreWithSkewTileAbTraitsIfSN_S13_Li1ELi4EEEfLSQ_1ELSS_1EifLSU_0ESV_EES10_EENS0_16SharedLoadStreamINS_16TileLoadIteratorINS0_25GemmSharedLoadTileATraitsISG_S6_NS5_ILi1ELi4ELi2ELi1EEENS5_ILi1ELi4ELi8ELi1EEENS5_ILi1ELi1ELi1ELi1EEELi2ELi4ELi0EEEfLSQ_1ELSS_1EifLSU_0ESV_EENSX_INSY_IfLi8ELm16EEEEEEENS1A_INS1B_INS0_25GemmSharedLoadTileBTraitsISG_S6_S1D_S1E_S1F_Li2ELi4ELi4EEEfLSQ_1ELSS_1EifLSU_0ESV_EES1J_EENS0_12GemmEpilogueINS0_28SimplifiedGemmEpilogueTraitsIS8_NS0_13LinearScalingIfNS0_19FragmentMultiplyAddIffLb1EEEEEiNS0_24GemmEpilogueTraitsHelperIS8_S1U_iEEEEEENS0_20IdentityBlockSwizzleEiNS0_17ClearAccumulatorsIfLi1EEEEEEEEEvNT_6ParamsE' : regs=124, lmem=0, smem=0, cmem=872
GPGPU-Sim PTX: Kernel '_Z20ReferenceGemm_kerneliiifPKfiS0_ifPfi' : regs=52, lmem=0, smem=0, cmem=412
GPGPU-Sim PTX: Kernel '_Z23InitializeMatrix_kernelPfiiii' : regs=8, lmem=0, smem=0, cmem=376
GPGPU-Sim PTX: __cudaRegisterFunction _Z20ReferenceGemm_kerneliiifPKfiS0_ifPfi : hostFun 0x0x55fc5d8027a0, fat_cubin_handle = 1
GPGPU-Sim PTX: __cudaRegisterFunction _Z23InitializeMatrix_kernelPfiiii : hostFun 0x0x55fc5d802990, fat_cubin_handle = 1
GPGPU-Sim PTX: Setting up arguments for 8 bytes starting at 0x7fff06ec4c10..
GPGPU-Sim PTX: Setting up arguments for 4 bytes starting at 0x7fff06ec4bf8..
GPGPU-Sim PTX: Setting up arguments for 4 bytes starting at 0x7fff06ec4bfc..
GPGPU-Sim PTX: Setting up arguments for 4 bytes starting at 0x7fff06ec4c00..
GPGPU-Sim PTX: Setting up arguments for 4 bytes starting at 0x7fff06ec4c04..
The error message indicates that the crash occurred during cudaLaunch for address 0x55fc5d804630. This is the hostFun address that __cudaRegisterFunction registered for the CUTLASS gemm kernel. Since the syntax error appears while basic_gemm.sm_75.ptx is being parsed, I suspect it is what causes the cudaLaunch crash.
The first PTX snippet is parsed correctly, while the second triggers the syntax error.
Is this because GPGPU-Sim does not support the second form of the .loc directive (the one with function_name and inlined_at), as shown in the figure?
OS version: Ubuntu 18.04.6 LTS
CUDA toolkit version: Cuda compilation tools, release 11.7, V11.7.99
gcc version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
I'd appreciate any help. Thanks a lot!
I finally solved this problem. The method described at https://github.com/sxzhang1993/Run-cutlass-with-gpgpu-sim uses CUDA 9.1, and with CUDA 9.1 the generated .loc directives only use the first form, never the second. However, CUDA 9.1 does not support the Turing architecture. If you want to target Turing you can use CUDA 11, but then the problem above occurs.
I found that .loc is debugging information. cutlass_bench adds the -lineinfo option during compilation, and without this option no .loc directives are generated. So we can comment out the -lineinfo option in cutlass_bench/CMakeLists.txt, and the final generated PTX will not contain any .loc directives.
Note that GPGPU-Sim 4.0 will then fail with the error mentioned in #247, so we need to use GPGPU-Sim 4.2.
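To confirm whether a given PTX file still contains the form of .loc that the parser rejects, a quick scan can help. The following is only a minimal sketch: the helper name `find_extended_loc` is hypothetical, the second sample line is copied from the error log above, and the first sample line is an assumed example of the plain form (file index, line, column only).

```python
import re

# Matches a .loc directive that carries extra attributes
# (function_name and/or inlined_at) after the file/line/column triple.
EXTENDED_LOC = re.compile(
    r"^\s*\.loc\s+\d+\s+\d+\s+\d+\s*,\s*(function_name|inlined_at)\b"
)


def find_extended_loc(ptx_text):
    """Return (line_number, line) pairs for .loc directives in the extended form."""
    hits = []
    for number, line in enumerate(ptx_text.splitlines(), start=1):
        if EXTENDED_LOC.match(line):
            hits.append((number, line.strip()))
    return hits


# Line 1: assumed plain form. Line 2: taken from the syntax error in the log.
sample = """\
.loc 3 170 9
.loc 3 170 9, function_name $L__info_string0, inlined_at 2 81 3
"""
hits = find_extended_loc(sample)
for number, line in hits:
    print(f"line {number}: {line}")
```

To check a real file, read its contents and pass them to the function, for example `find_extended_loc(open("basic_gemm.sm_75.ptx").read())`, assuming the PTX has been extracted to that path. An empty result after rebuilding without -lineinfo suggests the extended directives are gone.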