[SYCLomatic] Update CodePin doc
Signed-off-by: Huang, Andy <[email protected]>
AndyCHHuang committed May 9, 2024
1 parent 0566d11 commit 687364f
Showing 1 changed file with 92 additions and 47 deletions.
139 changes: 92 additions & 47 deletions docs/dev_guide/migration/debug-with-codepin.rst
@@ -24,10 +24,8 @@ the CUDA and SYCL programs to help identify the source of divergent runtime beha
Enable CodePin
--------------

Enable CodePin with the ``--enable-codepin`` option. If ``--out-root`` is specified,
the instrumented CUDA program will be put into a folder with a ``_debug`` postfix
beside the out-root folder. Otherwise, the instrumented CUDA program will be put
in the default folder ``dpct_output_debug``.
Enable CodePin with the ``--enable-codepin`` option. The instrumented CUDA program is placed
in the folder ``dpct_output_codepin_cuda``.

Example
-------
@@ -93,26 +91,26 @@ To debug the issue, the migrate the CUDA program with CodePin enabled:
dpct example.cu --enable-codepin
After migration, there will be two files: ``dpct_output/example.dp.cpp`` and ``dpct_output_debug/example.cu``.
After migration, there will be two files: ``dpct_output_codepin_sycl/example.dp.cpp`` and ``dpct_output_codepin_cuda/example.cu``.

.. code-block:: bash
workspace
├── example.cu
├── dpct_output
├── dpct_output_codepin_sycl
│ ├── example.dp.cpp
│ ├── generated_schema.hpp
│ └── MainSourceFiles.yaml
├── dpct_output_debug
├── dpct_output_codepin_cuda
│ ├── example.cu
│ └── generated_schema.hpp
``dpct_output/example.dp.cpp`` is the migrated and instrumented SYCL program:
``dpct_output_codepin_sycl/example.dp.cpp`` is the migrated and instrumented SYCL program:

.. code-block:: c++

//dpct_output/example.dp.cpp
//dpct_output_codepin_sycl/example.dp.cpp
#include <dpct/dpct.hpp>
#include <sycl/sycl.hpp>

@@ -180,11 +178,11 @@ After migration, there will be two files: ``dpct_output/example.dp.cpp`` and ``d
Result[3]: (1, 1, 1) <--- incorrect result
*/
``dpct_output_debug/example.cu`` is the instrumented CUDA program:
``dpct_output_codepin_cuda/example.cu`` is the instrumented CUDA program:

.. code-block:: c++

//dpct_output_debug/example.cu
//dpct_output_codepin_cuda/example.cu
#include "generated_schema.hpp"
#include <dpct/codepin/codepin.hpp>
#include <iostream>
@@ -241,7 +239,8 @@ After migration, there will be two files: ``dpct_output/example.dp.cpp`` and ``d
Result[3]: (2, 3, 4)
*/
After building and executing ``dpct_output/example.dp.cpp`` and ``dpct_output_debug/example.cu``, the following reports will be generated. Line number 13 shows the point of divergence.
After building and executing ``dpct_output_codepin_sycl/example.dp.cpp`` and ``dpct_output_codepin_cuda/example.cu``,
the following reports will be generated.

.. list-table::
:widths: 50 50
@@ -252,46 +251,92 @@ After building and executing ``dpct_output/example.dp.cpp`` and ``dpct_output_de
* - .. code-block::
:linenos:

{
"example.cu:23:3:0": {
"d_a[0]": {
"m_Data": "01, 00, 00, 00, 02, 00, 00, 00, 03, 00, 00, 00"
},
"d_a[1]": {
"m_Data": "01, 00, 00, 00, 02, 00, 00, 00, 03, 00, 00, 00"
},
"d_a[2]": {
"m_Data": "01, 00, 00, 00, 02, 00, 00, 00, 03, 00, 00, 00"
},
"d_a[3]": {
"m_Data": "01, 00, 00, 00, 02, 00, 00, 00, 03, 00, 00, 00"
},
"d_result[0]": {
"m_Data": "00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00"
},
[
{
"ID": "example.cu:26:3:prolog",
"Free Device Memory": "16374562816",
"Total Device Memory": "16882663424",
"Elapse Time(ms)": "0",
"CheckPoint": {
"d_a": {
"Type": "Pointer",
"Data": [
{
"Type": "int3",
"Data": [
{
"x": {
"Type": "int",
"Data": [
1
]
}
},
{
"y": {
"Type": "int",
"Data": [
2
]
}
},
...

- .. code-block::
:linenos:

{
"example.cu:23:3(SYCL):0": {
"d_a[0]": {
"m_Data": "01, 00, 00, 00, 02, 00, 00, 00, 03, 00, 00, 00"
},
"d_a[1]": {
"m_Data": "01, 00, 00, 00, 02, 00, 00, 00, 03, 00, 00, 00"
},
"d_a[2]": {
"m_Data": "01, 00, 00, 00, 02, 00, 00, 00, 03, 00, 00, 00"
},
"d_a[3]": {
"m_Data": "00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00"
},
"d_result[0]": {
"m_Data": "00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00"
},
[
{
"ID": "/home/yyergg/workspace/simple_test/test.cu:84:3:prolog",
"Free Device Memory": "0",
"Total Device Memory": "31023112192",
"Elapse Time(ms)": "0",
"CheckPoint": {
"d_a2d": {
"Type": "Pointer",
"Data": [
{
"Type": "Point2D",
"Data": [
{
"x": {
"Type": "int",
"Data": [
0
]
}
},
{
"y": {
"Type": "int",
"Data": [
0
]
}
},
...

The reports help identify where the runtime behavior of the CUDA and SYCL
programs starts to diverge.
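
The nested ``Type``/``Data`` layout in the report excerpts can be walked mechanically. The following is a minimal Python sketch, not part of CodePin, that flattens one checkpoint variable into ``(path, value)`` pairs so values from two reports are easier to eyeball. The helper name ``flatten_checkpoint`` is hypothetical, and the sample fragment is hand-written from the fields visible in the excerpt (only ``x`` and ``y``); the real report format may contain cases this sketch does not handle.

```python
def flatten_checkpoint(node, path=""):
    """Yield (path, value) pairs for every scalar under a checkpoint node.

    Assumption: a typed node is a dict with "Type" and "Data" keys, and
    named fields appear as single-key dicts such as {"x": {...}}.
    """
    if isinstance(node, dict) and "Type" in node and "Data" in node:
        for index, item in enumerate(node["Data"]):
            if isinstance(item, dict) and "Type" not in item:
                # Named field wrapper, e.g. {"x": {"Type": "int", ...}}.
                for name, child in item.items():
                    yield from flatten_checkpoint(child, f"{path}.{name}")
            elif isinstance(item, dict):
                # Nested typed element, e.g. one int3 behind a pointer.
                yield from flatten_checkpoint(item, f"{path}[{index}]")
            else:
                # Scalar payload.
                yield (f"{path}[{index}]", item)
    else:
        yield (path, node)


# Hand-written fragment modeled on the CUDA report excerpt (assumption).
sample_checkpoint = {
    "d_a": {
        "Type": "Pointer",
        "Data": [
            {
                "Type": "int3",
                "Data": [
                    {"x": {"Type": "int", "Data": [1]}},
                    {"y": {"Type": "int", "Data": [2]}},
                ],
            }
        ],
    }
}

flat = list(flatten_checkpoint(sample_checkpoint["d_a"], "d_a"))
print(flat)
```

Flattening both reports this way reduces the comparison to matching paths and values, at the cost of ignoring type information.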

Analyze the Data Checkpoints
----------------------------
``codepin-report.py`` is a tool that consumes the checkpoint log files generated by the instrumented CUDA and SYCL
programs and automatically analyzes the data checkpoints. It can identify inconsistent data values and report
statistics about the data checkpoints.

Run ``codepin-report.py`` on the CUDA and SYCL log files with the following command line:

``codepin-report.py [-h] --instrumented-cuda-log <file path> --instrumented-sycl-log <file path>``
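
When the analysis is driven from a script, the command line can be assembled programmatically. The sketch below only builds the argument list from the two options shown in the usage line; the function name ``build_report_command`` and the log file names are hypothetical placeholders, and the location of ``codepin-report.py`` on your system is an assumption.

```python
def build_report_command(cuda_log, sycl_log, script="codepin-report.py"):
    """Return the argument list for a codepin-report.py invocation.

    Only the two options from the documented usage line are used.
    """
    return [
        script,
        "--instrumented-cuda-log", cuda_log,
        "--instrumented-sycl-log", sycl_log,
    ]


# Placeholder file names; substitute the logs produced by your own runs.
command = build_report_command("cuda_log.json", "sycl_log.json")
print(" ".join(command))
```

The resulting list can be passed to ``subprocess.run(command, check=True)`` once the script path and log paths point at real files.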

The following is an example of the analysis report.

.. code-block::
CodePin Summary
Totally APIs count, 2
Consistently APIs count, 2
Most Time-consuming Kernel(CUDA), /home/yyergg/workspace/codepin_demo/example.cu:26:3:epilog, time:8.2316
Most Time-consuming Kernel(SYCL), /home/yyergg/workspace/codepin_demo/example.cu:26:3:epilog, time:10.2575
Peak Device Memory Used(CUDA), 508100608
Peak Device Memory Used(SYCL), 31023112192
CUDA Meta Data ID, SYCL Meta Data ID, Type, Detail
example.cu:26:3:prolog,example.cu:26:3:prolog,Data value,[WARNING: METADATA MISMATCH] The pair of prolog data example.cu:26:3:prolog are mismatched, and the corresponding pair of epilog data matches. This mismatch may be caused by the initialized memory or argument used in the API example.cu.
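
To make the idea of a checkpoint mismatch concrete, here is a minimal Python sketch of one way such a comparison could work. It is not the implementation of ``codepin-report.py``. It assumes each log is a JSON list of entries with ``ID`` and ``CheckPoint`` keys, as in the report excerpts, and that matching checkpoints carry the same ``ID`` in both logs; the function names and sample entries are hypothetical.

```python
def index_by_id(entries):
    """Map each checkpoint ID to its "CheckPoint" payload."""
    return {entry["ID"]: entry.get("CheckPoint", {}) for entry in entries}


def find_mismatches(cuda_entries, sycl_entries):
    """Return IDs present in both logs whose checkpoint data differ."""
    cuda = index_by_id(cuda_entries)
    sycl = index_by_id(sycl_entries)
    return [
        checkpoint_id
        for checkpoint_id in cuda
        if checkpoint_id in sycl and cuda[checkpoint_id] != sycl[checkpoint_id]
    ]


# Hand-written sample entries (assumption), reduced to a single value each.
cuda_log = [
    {"ID": "example.cu:26:3:prolog", "CheckPoint": {"d_a": 1}},
    {"ID": "example.cu:26:3:epilog", "CheckPoint": {"d_a": 2}},
]
sycl_log = [
    {"ID": "example.cu:26:3:prolog", "CheckPoint": {"d_a": 0}},
    {"ID": "example.cu:26:3:epilog", "CheckPoint": {"d_a": 2}},
]

mismatched_ids = find_mismatches(cuda_log, sycl_log)
print(mismatched_ids)
```

Real logs would be loaded with ``json.load`` before being passed to ``find_mismatches``; the plain equality check here is a simplification, since the actual tool may compare data more selectively.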
