This is a simple LD_PRELOAD-based tool that collects the OpenCL(TM) kernels launched by an application, along with their total execution time and call count.
When the application finishes, the tool prints a table like the following:
=== Device Timing Results: ===
Total Execution Time (ns): 370767821
Total Device Time for CPU (ns): 0
Total Device Time for GPU (ns): 174828332
== GPU Backend: ==
Kernel, Calls, SIMD, Time (ns), Time (%), Average (ns), Min (ns), Max (ns)
GEMM, 4, 32, 174828332, 100.00, 43707083, 43329166, 44306250
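The kernel table is comma-separated, so it can be post-processed with a short script. The following is a minimal sketch, not part of the tool: the function names `parse_kernel_table` and `hottest` are hypothetical, and it assumes kernel names contain no commas, which may not hold for every application.

```python
import csv
import io

# Sample taken from the report shown above.
SAMPLE = """\
Kernel, Calls, SIMD, Time (ns), Time (%), Average (ns), Min (ns), Max (ns)
GEMM, 4, 32, 174828332, 100.00, 43707083, 43329166, 44306250
"""

INT_COLUMNS = ("Calls", "SIMD", "Time (ns)", "Average (ns)", "Min (ns)", "Max (ns)")


def parse_kernel_table(text):
    """Parse the comma-separated kernel table into a list of dicts.

    Assumption: kernel names do not contain commas.
    """
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, body = rows[0], rows[1:]
    kernels = []
    for row in body:
        record = dict(zip(header, row))
        for key in INT_COLUMNS:
            record[key] = int(record[key])
        record["Time (%)"] = float(record["Time (%)"])
        kernels.append(record)
    return kernels


def hottest(kernels, count=5):
    """Return up to `count` kernels, sorted by total device time, largest first."""
    return sorted(kernels, key=lambda k: k["Time (ns)"], reverse=True)[:count]


if __name__ == "__main__":
    for kernel in hottest(parse_kernel_table(SAMPLE)):
        print("{}: {} ns over {} calls".format(
            kernel["Kernel"], kernel["Time (ns)"], kernel["Calls"]))
```

In the sample row, the Average (ns) column equals Time (ns) divided by Calls (174828332 / 4 = 43707083), which is a quick sanity check on parsed data.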
Supported operating systems:
- Linux
- Windows
Prerequisites:
- CMake (version 3.12 and above)
- Git (version 1.8 and above)
- Python (version 2.7 and above)
- OpenCL(TM) ICD Loader
- Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver to run on GPU
- Intel(R) Xeon(R) Processor / Intel(R) Core(TM) Processor (CPU) Runtimes to run on CPU
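The version requirements for the build tools can be checked before building. The sketch below is an illustration only and is not shipped with the sample: the helper names `parse_version` and `check_tool` are hypothetical, it needs Python 3.7 or later to run, and it assumes `cmake --version` and `git --version` print a `major.minor` number. It does not check the OpenCL(TM) ICD Loader or the device runtimes.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the list above.
MINIMUM_VERSIONS = {"cmake": (3, 12), "git": (1, 8)}


def parse_version(text):
    """Return the first 'major.minor' pair in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_tool(name, minimum):
    """Return (ok, message) for a command-line tool expected on PATH."""
    path = shutil.which(name)
    if path is None:
        return False, "not found on PATH"
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    found = parse_version(result.stdout + result.stderr)
    if found is None:
        return False, "could not parse version"
    return found >= minimum, "found {}.{}".format(*found)


if __name__ == "__main__":
    for tool, minimum in MINIMUM_VERSIONS.items():
        ok, message = check_tool(tool, minimum)
        print("{}: {} ({})".format(tool, "OK" if ok else "MISSING/OLD", message))
    # The interpreter running this script stands in for the Python prerequisite.
    print("python: {}.{}".format(sys.version_info[0], sys.version_info[1]))
```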
On Linux, run the following commands to build the sample:
cd <pti>/samples/cl_hot_kernels
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
Use this command line to run the tool:
./cl_hot_kernels <target_application>
For example, use cl_gemm or dpc_gemm as the target application:
./cl_hot_kernels ../../cl_gemm/build/cl_gemm
./cl_hot_kernels ../../dpc_gemm/build/dpc_gemm cpu
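When the tool is driven from a script, the report can be captured for later processing. The following is a rough sketch under stated assumptions: the helper names `run_profiled` and `device_timing_section` are hypothetical, and because this document does not say whether the report is written to stdout or stderr, both streams are merged.

```python
import subprocess


def run_profiled(tool, target_command):
    """Run `tool <target_command...>` and return (exit_code, combined_output)."""
    completed = subprocess.run(
        [tool] + list(target_command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: the report may be on either stream
        universal_newlines=True,
    )
    return completed.returncode, completed.stdout


def device_timing_section(output):
    """Return the output lines from the 'Device Timing Results' marker onward."""
    lines = output.splitlines()
    for index, line in enumerate(lines):
        if line.strip().startswith("=== Device Timing Results"):
            return lines[index:]
    return []


# Example call, using the paths from the commands above:
#   code, output = run_profiled("./cl_hot_kernels", ["../../cl_gemm/build/cl_gemm"])
#   print("\n".join(device_timing_section(output)))
```

Merging the streams keeps the sketch independent of where the report is written, at the cost of interleaving it with the target application's own output.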
On Windows, use the Microsoft* Visual Studio x64 command prompt to run the following commands and build the sample:
cd <pti>\samples\cl_hot_kernels
mkdir build
cd build
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_LIBRARY_PATH=<opencl_icd_lib_path> ..
nmake
Use this command line to run the tool:
cl_hot_kernels.exe <target_application>
For example, use cl_gemm or dpc_gemm as the target application:
cl_hot_kernels.exe ..\..\cl_gemm\build\cl_gemm.exe
cl_hot_kernels.exe ..\..\dpc_gemm\build\Release\dpc_gemm.exe cpu