NCU Reader Support for RAJA_CUDA and Lambda_CUDA #201

Open · wants to merge 19 commits into develop
Conversation

@michaelmckinsey1 (Collaborator) commented Jul 29, 2024

tldr:

  • Support Lambda_CUDA and RAJA_CUDA variants by using demangled kernel names.
  • Add debug flag to see detailed information about kernel matches, so we can preliminarily investigate future issues without editing the source code.
  • Support cub kernels via string similarity to compare function signatures.
  • Add unit tests by refactoring the matching functions.

Description

Enables support for reading NCU report profiles for the RAJA_CUDA and Lambda_CUDA variants, as well as cub kernels, by using the demangled action name.

The current Thicket NCU reader matches nodes between a Caliper cuda_activity_profile (CAP) and an NCU report file by checking whether the name of an action in the report, action.name(_ncu_report.IAction_NameBase_FUNCTION), matches the CAP node name kernel_name in node.frame["name"]. For Base_CUDA, this action name is the name of the kernel (e.g. daxpy, energy1, or energy2).

For RAJA_CUDA and Lambda_CUDA, this assumption does not hold: the values of action.name(_ncu_report.IAction_NameBase_FUNCTION) are not the kernel names. However, the kernel names are still embedded in the demangled action name, action.name(_ncu_report.IAction_NameBase_DEMANGLED). This PR parses the demangled name to match the nodes in the CAP, which also works for Base_CUDA profiles.
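As a rough illustration of the parsing step (a minimal sketch, not the PR's actual implementation; the helper name and the example demangled name are assumptions):

```python
def extract_kernel_name(demangled):
    """Hypothetical helper: pull the bare kernel name out of a demangled
    NCU action name such as
    'void rajaperf::basic::daxpy<256ul>(double*, double*, double, long)'.
    """
    # Drop the argument list and template arguments, keeping the qualified name.
    qualified = demangled.split("(")[0].split("<")[0].strip()
    # Keep only the last component ("daxpy" from "rajaperf::basic::daxpy").
    return qualified.split()[-1].split("::")[-1]
```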

For cub kernels, there may be kernels with the same name but different function signatures. For example, consider matching the NCU kernel void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<double, double, int>::Policy700, false, false, double, double, int>(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, cub::GridEvenShare<T6>) to the first DeviceRadixSortDownsweepKernel in the following calltree:

nan RAJAPerf
└─ nan Algorithm
   ├─ nan Algorithm_SORT
   │  ├─ nan cudaLaunchKernel
   │  │  ├─ 1016096.000 void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<double, cub::NullType, int>::Policy700, **false, false**, double, cub::NullType, int>(double const*, double*, cub::NullType const*, cub::NullType*, int*, int, int, int, cub::GridEvenShare<int>)
   │  │  ├─ 1399520.000 void cub::DeviceRadixSortDownsweepKernel<cub::DeviceRadixSortPolicy<double, cub::NullType, int>::Policy700, **true, false**, double, cub::NullType, int>(double const*, double*, cub::NullType const*, cub::NullType*, int*, int, int, int, cub::GridEvenShare<int>)

We match the two using the standard library's difflib SequenceMatcher to compare the function signatures, after first narrowing the search to the Algorithm_SORT subtree of the calltree.
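A minimal sketch of the idea (not the PR's exact code; signatures are truncated for readability): the candidate with the highest similarity ratio wins.

```python
from difflib import SequenceMatcher

# Candidate CAP node names under Algorithm_SORT (signatures truncated).
candidates = [
    "void cub::DeviceRadixSortDownsweepKernel<..., false, false, ...>(double const*, ...)",
    "void cub::DeviceRadixSortDownsweepKernel<..., true, false, ...>(double const*, ...)",
]
ncu_name = "void cub::DeviceRadixSortDownsweepKernel<..., false, false, ...>(const T4 *, ...)"

# Pick the CAP node whose signature is most similar to the NCU kernel name.
best = max(candidates, key=lambda c: SequenceMatcher(None, ncu_name, c).ratio())
print(best)  # -> the "false, false" candidate
```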

NCU kernel support by variant (✓ = supported, x = not supported):

This PR (#201)

|  | Base_CUDA | Lambda_CUDA | RAJA_CUDA |
| --- | --- | --- | --- |
| rajaperf kernels | ✓ | ✓ | ✓ |
| cub kernels | ✓ | ✓ | ✓ |
| kernels with multiple instances | ✓ | ✓ | ✓ |

Develop

|  | Base_CUDA | Lambda_CUDA | RAJA_CUDA |
| --- | --- | --- | --- |
| rajaperf kernels | ✓ | x | x |
| cub kernels | x | x | x |
| kernels with multiple instances | ✓ | x | x |

@michaelmckinsey1 changed the title Fix ncu rajacuda → Support for RAJA_CUDA Jul 29, 2024
@michaelmckinsey1 changed the title Support for RAJA_CUDA → NCU Reader Support for RAJA_CUDA Jul 29, 2024
@michaelmckinsey1 self-assigned this Jul 29, 2024
@michaelmckinsey1 added the area-external, priority-urgent, status-ready-for-review, and type-bug labels Jul 29, 2024
@michaelmckinsey1 changed the title NCU Reader Support for RAJA_CUDA → NCU Reader Support for RAJA_CUDA and Lambda_CUDA Jul 29, 2024
@ilumsden added the status-work-in-progress label and removed the status-ready-for-review label Aug 12, 2024
@ilumsden (Collaborator) commented:

Changing this PR to "work in progress" because there are complications due to mismatches in demangled kernel names from Caliper and NCU.

@michaelmckinsey1 (Collaborator, Author) commented Aug 26, 2024:

This PR supersedes the https://github.com/LLNL/thicket/tree/rajaperf-paper branch used for the rajaperf paper.

@michaelmckinsey1 removed the status-work-in-progress label Oct 4, 2024
@michaelmckinsey1 added the status-ready-for-review label Oct 4, 2024
@michaelmckinsey1 (Collaborator, Author) commented:

> Changing this PR to "work in progress" because there are complications due to mismatches in demangled kernel names from Caliper and NCU.

Addressed in new changes.

@dyokelson (Collaborator) left a comment:

Please add a pytest test for each of the cases in your table so we can make sure each is supported (Base_CUDA, Lambda_CUDA, RAJA_CUDA × rajaperf kernels, cub kernels, kernels with multiple instances).

(Two resolved review comments on thicket/ncu.py, now outdated.)
@michaelmckinsey1 added the priority-normal label and removed the priority-urgent label Oct 11, 2024
@dyokelson added this to the 2024.3.0 milestone Oct 18, 2024
Michael Richard Mckinsey added 2 commits October 23, 2024 15:17
@slabasan (Collaborator) commented:

> This PR supersedes the https://github.com/LLNL/thicket/tree/rajaperf-paper branch used for the rajaperf paper.

Do you want to remove the rajaperf-paper branch on this repo?

@michaelmckinsey1 (Collaborator, Author) commented:

> Do you want to remove the rajaperf-paper branch on this repo?

Done

@dyokelson (Collaborator) left a comment:

Looks good, just the question about disabling tqdm; otherwise will approve.

On thicket/ncu.py:

@@ -113,8 +237,12 @@ def _read_ncu(thicket, ncu_report_mapping):
    pbar = tqdm(range)

@dyokelson commented: Can they disable tqdm?
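For what it's worth, tqdm has a built-in disable parameter, so exposing this could be as simple as the sketch below (the progress argument and function name are hypothetical, not part of this PR):

```python
from tqdm import tqdm

def match_kernels(actions, progress=True):
    # tqdm's `disable` flag skips the progress bar entirely when True;
    # passing disable=None instead silences it on non-interactive output.
    pbar = tqdm(actions, disable=not progress)
    for action in pbar:
        pass  # per-kernel matching work would go here
```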
