Performance and debugging #21

Open
lucaparisi91 opened this issue Aug 15, 2024 · 0 comments
lucaparisi91 commented Aug 15, 2024

Base material: https://epcced.github.io/APT-CUDA/lectures/optimisation.html

A possible list of topics is below. Some of these might belong in the practical information sheet rather than in the practicals.

  • parallelism: expose enough independent tasks to fill the GPU
  • occupancy limiting factors (stretch?): registers, shared memory, maximum number of blocks/threads per SM
  • memory coalescing: have consecutive threads access contiguous elements
  • branching: avoid threads within the same warp taking different paths in the code
  • profiling tools (mention): NVIDIA Nsight Compute and Nsight Systems, rocprof, Scalasca
  • OpenMP-specific tips: see Best practices for OpenMP #14.
  • environment variables for debugging (CRAY_ACC_DEBUG etc.)
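
For the occupancy item, a small host-side sketch of the arithmetic might help in the practical information sheet. It is plain C++, not GPU code, and every name in it (`SmLimits`, `KernelUsage`, `residentBlocksPerSm`, `occupancy`) is hypothetical. The limits are passed in by the caller and are not taken from any specific GPU, and the model ignores register and shared-memory allocation granularity, so treat the result as an upper bound.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical per-SM limits; real values depend on the GPU architecture.
struct SmLimits {
    std::size_t maxThreads;      // maximum resident threads per SM
    std::size_t maxBlocks;       // maximum resident blocks per SM
    std::size_t registers;       // registers available per SM
    std::size_t sharedMemBytes;  // shared memory available per SM
};

// Hypothetical per-kernel resource usage (threadsPerBlock and
// registersPerThread are assumed to be non-zero).
struct KernelUsage {
    std::size_t threadsPerBlock;
    std::size_t registersPerThread;
    std::size_t sharedMemPerBlock;  // bytes, may be zero
};

// Number of blocks that fit on one SM: the tightest of the four limits.
std::size_t residentBlocksPerSm(const SmLimits& sm, const KernelUsage& k) {
    const std::size_t byThreads = sm.maxThreads / k.threadsPerBlock;
    const std::size_t byRegisters =
        sm.registers / (k.registersPerThread * k.threadsPerBlock);
    // A kernel that uses no shared memory is not limited by it.
    const std::size_t byShared = (k.sharedMemPerBlock == 0)
        ? sm.maxBlocks
        : sm.sharedMemBytes / k.sharedMemPerBlock;
    return std::min({sm.maxBlocks, byThreads, byRegisters, byShared});
}

// Fraction of the SM's thread slots that are occupied.
double occupancy(const SmLimits& sm, const KernelUsage& k) {
    const std::size_t threads = residentBlocksPerSm(sm, k) * k.threadsPerBlock;
    return static_cast<double>(threads) / static_cast<double>(sm.maxThreads);
}
```

With made-up limits of 2048 threads, 32 blocks, 65536 registers and 64 KiB of shared memory per SM, a kernel with 256 threads per block and 64 registers per thread is limited by registers to 4 resident blocks, which is half of the available thread slots.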
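
For the memory coalescing item, one way to make the point concrete is to count how many memory segments a single warp touches for a given access stride. The sketch below is a simplified host-side model in plain C++, not GPU code. The warp size of 32 and the 128-byte segment size are assumptions, the array is assumed to start on a segment boundary, and `segmentsTouched` is a hypothetical name.

```cpp
#include <cstddef>
#include <set>

// Assumed model parameters, not taken from the issue or from a specific GPU.
constexpr std::size_t kWarpSize = 32;       // threads per warp
constexpr std::size_t kSegmentBytes = 128;  // bytes served by one transaction

// Count the distinct memory segments touched when thread t of one warp
// reads the element at index t * stride, with elements elemBytes wide.
std::size_t segmentsTouched(std::size_t stride, std::size_t elemBytes) {
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        const std::size_t byteOffset = t * stride * elemBytes;
        segments.insert(byteOffset / kSegmentBytes);
    }
    return segments.size();
}
```

In this model, consecutive threads reading consecutive 4-byte elements (stride 1) touch a single segment, while a stride of 32 elements makes every thread touch its own segment, so the warp needs 32 transactions for the same amount of useful data.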
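
For the branching item, the distinction that matters is whether threads inside the same warp disagree on a condition. The sketch below is a host-side illustration in plain C++, not GPU code. It assumes a warp size of 32 and that threads are grouped into warps by consecutive index, and `divergentWarps` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

constexpr std::size_t kWarpSize = 32;  // assumed warp size

// Count the warps in which the branch condition is not uniform,
// i.e. at least two threads of the same warp take different paths.
std::size_t divergentWarps(
    std::size_t numThreads,
    const std::function<bool(std::size_t)>& takesBranch) {
    std::size_t divergent = 0;
    for (std::size_t warpStart = 0; warpStart < numThreads;
         warpStart += kWarpSize) {
        const std::size_t warpEnd = std::min(warpStart + kWarpSize, numThreads);
        const bool first = takesBranch(warpStart);
        for (std::size_t t = warpStart + 1; t < warpEnd; ++t) {
            if (takesBranch(t) != first) {
                ++divergent;  // this warp would execute both paths
                break;
            }
        }
    }
    return divergent;
}
```

For 128 threads, a condition such as `t % 2 == 0` splits every one of the 4 warps, whereas `(t / 32) % 2 == 0` sends whole warps down one path or the other and no warp diverges.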
lucaparisi91 changed the title from "Performance" to "Performance and debugging" on Aug 15, 2024
lucaparisi91 self-assigned this on Oct 7, 2024