OpenCL(TM) Hot Kernels

Overview

This is a simple LD_PRELOAD-based tool that collects the OpenCL(TM) kernels executed by an application, along with their total execution time and call count.

As a result, a table like the following is printed:

=== Device Timing Results: ===

Total Execution Time (ns): 370767821
Total Device Time for CPU (ns): 0
Total Device Time for GPU (ns): 174828332

== GPU Backend: ==

    Kernel,       Calls, SIMD,           Time (ns),  Time (%),        Average (ns),            Min (ns),            Max (ns)
      GEMM,           4,   32,           174828332,    100.00,            43707083,            43329166,            44306250
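If the report needs to be post-processed, the kernel table can be read as comma-separated rows. The following is a minimal Python sketch, not part of the tool: the function name `parse_kernel_table` is hypothetical, and it assumes that every column except the kernel name is numeric and that the layout matches the sample above.

```python
SAMPLE = """\
== GPU Backend: ==

    Kernel,       Calls, SIMD,           Time (ns),  Time (%),        Average (ns),            Min (ns),            Max (ns)
      GEMM,           4,   32,           174828332,    100.00,            43707083,            43329166,            44306250
"""


def parse_kernel_table(text):
    """Return one dict per kernel row found in the report text (hypothetical helper)."""
    rows, header = [], None
    for line in text.splitlines():
        # A header line starts a new table (one per backend section).
        if line.strip().startswith("Kernel,"):
            header = [cell.strip() for cell in line.split(",")]
            continue
        if header is None:
            continue
        # Split from the right so a kernel name that itself contains
        # commas stays in one piece; only the trailing columns are split off.
        cells = [cell.strip() for cell in line.rsplit(",", len(header) - 1)]
        if len(cells) != len(header):
            continue  # blank lines, section titles, summary lines
        row = dict(zip(header, cells))
        # Assumption: all columns after the name are numeric.
        for key in header[1:]:
            row[key] = float(row[key]) if key == "Time (%)" else int(row[key])
        rows.append(row)
    return rows


for row in parse_kernel_table(SAMPLE):
    print(row["Kernel"], row["Calls"], row["Average (ns)"])
```

In the sample row, the average multiplied by the call count equals the total time, which is a quick consistency check on parsed data.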

Supported OS

  • Linux
  • Windows

Prerequisites

Build and Run

Linux

Run the following commands to build the sample:

cd <pti>/samples/cl_hot_kernels
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

Use this command line to run the tool:

./cl_hot_kernels <target_application>

You can use cl_gemm or dpc_gemm as the target application:

./cl_hot_kernels ../../cl_gemm/build/cl_gemm
./cl_hot_kernels ../../dpc_gemm/build/dpc_gemm cpu
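To keep the report for later inspection, the tool can also be launched from a script. The sketch below is only an illustration: `run_and_capture` is a hypothetical helper, and because this README does not state whether the report goes to standard output or standard error, both streams are merged.

```python
import subprocess


def run_and_capture(tool, target_cmd):
    """Run `tool` followed by the target command; return (exit code, output text)."""
    result = subprocess.run(
        [tool, *target_cmd],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams (assumption: report stream is unknown)
        text=True,
        check=False,  # leave exit-code handling to the caller
    )
    return result.returncode, result.stdout


# Example call, assuming the build steps above have been completed:
# code, report = run_and_capture("./cl_hot_kernels", ["../../cl_gemm/build/cl_gemm"])
```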

Windows

Use Microsoft* Visual Studio x64 command prompt to run the following commands and build the sample:

cd <pti>\samples\cl_hot_kernels
mkdir build
cd build
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_LIBRARY_PATH=<opencl_icd_lib_path> ..
nmake

Use this command line to run the tool:

cl_hot_kernels.exe <target_application>

You can use cl_gemm or dpc_gemm as the target application:

cl_hot_kernels.exe ..\..\cl_gemm\build\cl_gemm.exe
cl_hot_kernels.exe ..\..\dpc_gemm\build\Release\dpc_gemm.exe cpu