Performing LES on devices #1580

Merged: 55 commits into develop from feature/sgs_gpu, Nov 6, 2024

Conversation

Shiyu-Sandy-Du (Collaborator) commented Nov 4, 2024

In this PR,

  • ax_helm_full_device is implemented for HIP and CUDA
  • the sigma and Vreman SGS models are implemented for HIP and CUDA as well
  • a bug in the pnpn pressure residual computation for the full-stress formulation on devices is fixed

Some other things to be discussed:

  • In pnpn_res_stress_device.F90, device math routines are called multiple times. If we merge those kernels together as much as possible, how much would we gain, considering that the solve time for the pressure equation is not short?
  • A 1D kernel has not been implemented in ax_helm_full_device, so autotuning is not implemented yet either. But do we really need it? I cannot see one in the vector version of the ax_helm_device implementation. Has the 1D vs. kstep discussion for the vector version of ax_helm happened before?

Shiyu-Sandy-Du and others added 30 commits October 1, 2024 17:08
…rnel together with other operations instead of using device_math
njansson (Collaborator) commented Nov 4, 2024

Nice!

njansson (Collaborator) commented Nov 4, 2024

To answer some of the questions,

  • In pnpn_res_stress_device.F90, device math routines are called multiple times. If we merge those kernels together as much as possible, how much would we gain, considering that the solve time for the pressure equation is not short?

There are gains to be made here, by fusing e.g. the cluster of col3 and sub2 calls. Remember that each call to a device math function comes with a launch latency, which is not negligible.
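
As an illustration of the kind of fusion meant here, below is a minimal sketch (hypothetical, not code from this PR). It assumes the usual device math semantics, col3: a_i = b_i * c_i and sub2: a_i = a_i - b_i, so the pair collapses into a single grid-stride kernel with one launch and no intermediate array traffic:

```cuda
/* Hypothetical fused kernel, for illustration only.
 * Assumes col3(a,b,c,n): a_i = b_i * c_i and sub2(a,b,n): a_i = a_i - b_i,
 * so the pair becomes a_i = a_i - b_i * c_i in a single pass. */
template <typename T>
__global__ void sub2_col3_fused_kernel(T *__restrict__ a,
                                       const T *__restrict__ b,
                                       const T *__restrict__ c,
                                       const int n) {
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  const int str = blockDim.x * gridDim.x;
  for (int i = idx; i < n; i += str) {
    a[i] -= b[i] * c[i];  /* one launch, one read-modify-write of a */
  }
}
```

Each fused pair of this kind saves one launch latency and one full pass over the work array, which adds up since the pressure residual is assembled at every time step.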

  • A 1D kernel has not been implemented in ax_helm_full_device, so autotuning is not implemented yet either. But do we really need it? I cannot see one in the vector version of the ax_helm_device implementation. Has the 1D vs. kstep discussion for the vector version of ax_helm happened before?

Exactly, there's no 1D version of these, since a 1D kernel would run out of shared memory for most polynomial orders.
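
A rough back-of-envelope, purely illustrative: assume a 1D-style kernel would cache whole lx^3 element arrays in shared memory in double precision, and take nine such arrays as an example count (not Neko's actual kernel layout):

```cuda
/* Back-of-envelope shared-memory estimate; the array count of nine is an
 * illustrative assumption, not the actual kernel layout. */
#include <cstdio>

int main(void) {
  for (int lx = 6; lx <= 12; lx += 2) {
    const size_t per_array = (size_t) lx * lx * lx * sizeof(double);
    printf("lx = %2d: %6zu B per lx^3 array, %7zu B for 9 arrays\n",
           lx, per_array, 9 * per_array);
  }
  /* With a typical 48 KiB of shared memory per block, already around
   * lx = 10 a handful of full element arrays no longer fit. */
  return 0;
}
```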

MartinKarp (Collaborator) left a comment

Great stuff Shiyu! I just added two comments, feel free to take them to heart or not.

One thing I noted is the use of quite a lot of math functions such as pow, divisions and so on. We should just be aware that this might be an issue if people run in single precision; one can perhaps use xreal in some places then. This is just a note for the future, though.

Review threads: src/les/bcknd/device/vreman_device.f90 (outdated, resolved), src/les/bcknd/device/cuda/sigma_nut_kernel.h (resolved)
Shiyu-Sandy-Du (Collaborator, Author) commented

Great stuff Shiyu! I just added two comments, feel free to take them to heart or not.

One thing I noted is the use of quite a lot of math functions such as pow, divisions and so on. We should just be aware that this might be an issue if people run in single precision; one can perhaps use xreal in some places then. This is just a note for the future, though.

I changed the unnecessary pow calls into multiplications here. And I think we should keep the sin, cos and sqrt calls in mind for the future.
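
For reference, the kind of change meant here is sketched below (hypothetical helper and placeholder names square, nut, c, delta, op; not the actual kernel code in this PR). An explicit product avoids a transcendental call and stays in the working precision, whereas pow with a double literal exponent can promote a float argument to double:

```cuda
/* Hypothetical illustration only, not code from this PR. */
template <typename T>
__device__ inline T square(const T x) {
  /* x * x stays in the working precision T, whereas pow(x, 2.0)
   * is a transcendental call and can promote a float x to double. */
  return x * x;
}

/* e.g. instead of  nut = c * pow(delta, 2.0) * op;
 *      write       nut = c * square(delta) * op;                */
```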

njansson merged commit 1997286 into develop on Nov 6, 2024 (25 of 27 checks passed).
njansson deleted the feature/sgs_gpu branch on November 6, 2024 at 11:44.
njansson linked an issue on Nov 6, 2024 that may be closed by this pull request.
Labels: AMD (AMD GPUs and HIP), enhancement (New feature or request), NVIDIA (NVIDIA GPUs and CUDA)
4 participants