Performing LES on devices #1580

Shiyu-Sandy-Du · 2024-11-04T12:13:15Z

In this PR:

ax_helm_full_device is implemented for HIP and CUDA.
The sigma and Vreman SGS models are implemented for HIP and CUDA as well.
A bug is fixed in the pnpn pressure residual computation for the full stress formulation on devices.
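For reference, the pointwise quantity a Vreman SGS kernel evaluates can be sketched on the host as below. This is a minimal C++ sketch assuming the standard Vreman (2004) definition with a uniform filter width `delta` and model constant `c`. The function name `vreman_nut`, the `Mat3` alias and the argument layout are hypothetical and do not reflect the actual Neko kernel interface. Small negative values of B_beta caused by round-off are clipped to zero before the square root.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical host-side sketch of the Vreman (2004) eddy viscosity at one
// point. a[i][j] holds alpha_ij = du_j/dx_i, delta is the (uniform) filter
// width and c the model constant. Not Neko's kernel interface.
double vreman_nut(const Mat3& a, double delta, double c) {
  assert(delta > 0.0);
  // beta_ij = delta^2 * sum_m alpha_mi * alpha_mj
  Mat3 b{};
  for (int i = 0; i < 3; ++i) {
    for (int j = 0; j < 3; ++j) {
      double s = 0.0;
      for (int m = 0; m < 3; ++m) s += a[m][i] * a[m][j];
      b[i][j] = delta * delta * s;
    }
  }
  // B_beta = b11*b22 - b12^2 + b11*b33 - b13^2 + b22*b33 - b23^2
  double bb = b[0][0] * b[1][1] - b[0][1] * b[0][1] +
              b[0][0] * b[2][2] - b[0][2] * b[0][2] +
              b[1][1] * b[2][2] - b[1][2] * b[1][2];
  bb = std::max(bb, 0.0);  // clip small negative round-off before sqrt
  // alpha_ij * alpha_ij (squared Frobenius norm of the gradient)
  double aa = 0.0;
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j) aa += a[i][j] * a[i][j];
  if (aa < 1e-14) return 0.0;  // (near) zero gradient: no eddy viscosity
  return c * std::sqrt(bb / aa);
}
```

With a pure shear gradient (only du_1/dx_2 nonzero) B_beta is exactly zero, so the model returns no eddy viscosity, which is the behaviour Vreman designed it for; a solid-body rotation, by contrast, gives a nonzero value.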

Some other things to be discussed:

In pnpn_res_stress_device.F90, device math routines are called multiple times. If we merge those kernels together as much as possible, how much can we gain, considering that the time spent solving the pressure equation is not short?
A 1D variant of ax_helm_full_device has not been implemented, and thus autotuning is not implemented either. But do we really need it? I cannot find one in the device implementation of the vector version of ax_helm. Has the 1D vs. kstep question been discussed before for the vector version of ax_helm?


njansson · 2024-11-04T17:48:53Z

Nice!

src/les/bcknd/device/hip/hip_sigma_nut.f90

src/les/bcknd/device/hip/hip_vreman_nut.f90

njansson · 2024-11-04T17:56:33Z

To answer some of the questions,

In pnpn_res_stress_device.F90, device math is called for multiple times. If we merge those kernels together as much as possible, how much can we benefit from it considering the solving time for the pressure equation is not short?

There are gains to be made here by fusing, e.g., the cluster of col3 and sub2 calls. Remember that each call to a device math function comes with a launch latency, which is not negligible.
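To illustrate the fusion idea, the sketch below contrasts two separate element-wise passes, in the style of col3 followed by sub2, with a single fused pass. It is a host-side C++ sketch that assumes Neko's usual semantics, where col3 computes a = b*c and sub2 computes a = a - b element-wise; the function names here are illustrative, not the actual device_math interface. On a GPU each loop would be its own kernel launch, so the fused version performs the same arithmetic with one launch and never writes the temporary array to global memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused, step 1 (illustrative col3): w = u * v. One kernel launch on a device.
void col3(std::vector<double>& w, const std::vector<double>& u,
          const std::vector<double>& v) {
  for (std::size_t i = 0; i < w.size(); ++i) w[i] = u[i] * v[i];
}

// Unfused, step 2 (illustrative sub2): r = r - w. A second kernel launch that
// re-reads the temporary w from memory.
void sub2(std::vector<double>& r, const std::vector<double>& w) {
  for (std::size_t i = 0; i < r.size(); ++i) r[i] -= w[i];
}

// Fused: r = r - u * v in a single pass, with no temporary array.
// On a device this is one launch instead of two.
void fused_col3_sub2(std::vector<double>& r, const std::vector<double>& u,
                     const std::vector<double>& v) {
  assert(r.size() == u.size() && u.size() == v.size());
  for (std::size_t i = 0; i < r.size(); ++i) r[i] -= u[i] * v[i];
}
```

The results are identical; what the fused form saves is one launch latency and one round trip of the temporary through memory per fused pair, which adds up when such pairs appear several times per time step.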

Implementation for 1D in ax_helm_full_device has not been implemented and thus autotune is not yet implemented as well. But do we really need it? I cannot see it in the vector version of ax_helm_device implementation. Is the 1D or kstep discussion for vector version of ax_helm happened before?

Exactly, there is no 1D version of these, since a 1D kernel would run out of shared memory for most polynomial orders.
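A back-of-the-envelope estimate illustrates why. Assuming a 1D-style kernel stages whole elements in shared memory (lx^3 values per array, with lx = p + 1 points per direction for polynomial order p) while a kstep-style kernel stages one lx x lx plane per array, the shared memory per block grows very differently with the order. The number of staged arrays, `n_arrays`, is a hypothetical parameter here, not the count used in Neko's kernels; 48 KiB is the default CUDA limit for statically allocated shared memory per block.

```cpp
#include <cassert>
#include <cstddef>

// Shared memory per block (bytes) when n_arrays arrays of doubles are staged.
// "1D"-style: a whole element, lx^3 values per array.
std::size_t shmem_1d(std::size_t lx, std::size_t n_arrays) {
  assert(lx >= 2);  // lx = polynomial order + 1
  return n_arrays * lx * lx * lx * sizeof(double);
}

// "kstep"-style: one lx x lx plane per array, marching over k.
std::size_t shmem_kstep(std::size_t lx, std::size_t n_arrays) {
  assert(lx >= 2);
  return n_arrays * lx * lx * sizeof(double);
}

// Default CUDA limit for static shared memory per block: 48 KiB.
constexpr std::size_t kStaticShmemLimit = 48 * 1024;

bool fits_static(std::size_t bytes) { return bytes <= kStaticShmemLimit; }
```

For example, with lx = 10 and nine staged arrays, the whole-element layout needs 72,000 bytes, well above 48 KiB, while the plane-wise layout needs 7,200 bytes. The cubic growth in lx is what makes a 1D variant impractical at higher orders, and a coupled vector operator only increases the number of arrays.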


MartinKarp

Great stuff Shiyu! I just added two comments, feel free to take them to heart or not.

One thing I noted is the use of quite a lot of math functions such as pow, divisions and so on. I think we should just be aware this might be an issue if people run in single precision and one can perhaps use xreal in some places then. This is just a note for the future though.

src/les/bcknd/device/vreman_device.f90

src/les/bcknd/device/cuda/sigma_nut_kernel.h
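To make the single-precision concern concrete: in C++, and hence in CUDA/HIP kernel code, an unsuffixed literal such as 3.0/2.0 has type double, so mixing it into float arithmetic silently promotes the computation to double. The sketch below compares three ways of writing a^(3/2) for a working precision T; the function names are hypothetical. The third mirrors the later change of pow(a, 3.0/2.0) into sqrt(a*a*a) in this PR.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Three ways of computing a^(3/2) for a working precision T (float or double).
// The function names are hypothetical, for illustration only.

// Unsuffixed literals are double, so for T = float std::pow's mixed-type
// overload promotes the computation, and the result, to double.
template <typename T>
auto pow_mixed(T a) { return std::pow(a, 3.0 / 2.0); }

// Typing the exponent as T keeps the whole computation in T.
template <typename T>
T pow_typed(T a) { return std::pow(a, T(1.5)); }

// Avoiding pow altogether, as in the change pow(a, 3.0/2.0) -> sqrt(a*a*a).
template <typename T>
T sqrt_cube(T a) {
  assert(a >= T(0));  // a^(3/2) is real only for a >= 0
  return std::sqrt(a * a * a);
}

static_assert(std::is_same<decltype(pow_mixed(1.0f)), double>::value,
              "a double literal promotes float arithmetic to double");
```

One design note: for very large a, forming a*a*a can overflow before the square root is taken (in single precision already for a above roughly 7e12); a * sqrt(a) computes the same value without that intermediate.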

Shiyu-Sandy-Du · 2024-11-06T10:53:54Z

Great stuff Shiyu! I just added two comments, feel free to take them to heart or not.

One thing I noted is the use of quite a lot of math functions such as pow, divisions and so on. I think we should just be aware this might be an issue if people run in single precision and one can perhaps use xreal in some places then. This is just a note for the future though.

I changed the unnecessary pow calls into multiplications here. I think we should also keep an eye on the sin, cos and sqrt calls in the future.
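For context on where the trigonometric calls come from, the sketch below evaluates the sigma model of Nicoud et al. (2011) at a single point. The singular values of the velocity gradient g are the square roots of the eigenvalues of G = g^T g, and those eigenvalues are obtained with the closed-form trigonometric solution for symmetric 3x3 matrices, which needs acos, cos and sqrt. It is a host-side C++ sketch assuming a uniform filter width `delta` and a model constant `c`; the function name `sigma_nut`, the `Mat3` alias and the argument layout are hypothetical, not Neko's kernel interface.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical sketch of the sigma-model eddy viscosity at one point.
// g[i][j] = du_i/dx_j; delta = filter width; c = model constant C_sigma.
double sigma_nut(const Mat3& g, double delta, double c) {
  assert(delta > 0.0);
  // G = g^T g (symmetric positive semi-definite)
  Mat3 G{};
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
      for (int k = 0; k < 3; ++k) G[i][j] += g[k][i] * g[k][j];
  // Eigenvalues e1 >= e2 >= e3 of G via the trigonometric method.
  const double pi = std::acos(-1.0);
  double p1 = G[0][1] * G[0][1] + G[0][2] * G[0][2] + G[1][2] * G[1][2];
  double q = (G[0][0] + G[1][1] + G[2][2]) / 3.0;
  double p2 = (G[0][0] - q) * (G[0][0] - q) + (G[1][1] - q) * (G[1][1] - q) +
              (G[2][2] - q) * (G[2][2] - q) + 2.0 * p1;
  double e1, e2, e3;
  if (p2 < 1e-30) {  // (near) isotropic G: all eigenvalues equal q
    e1 = e2 = e3 = q;
  } else {
    double p = std::sqrt(p2 / 6.0);
    Mat3 B{};  // B = (G - q I) / p
    for (int i = 0; i < 3; ++i)
      for (int j = 0; j < 3; ++j) B[i][j] = (G[i][j] - (i == j ? q : 0.0)) / p;
    double detB = B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1]) -
                  B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0]) +
                  B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]);
    double r = std::max(-1.0, std::min(1.0, detB / 2.0));  // guard acos domain
    double phi = std::acos(r) / 3.0;
    e1 = q + 2.0 * p * std::cos(phi);
    e3 = q + 2.0 * p * std::cos(phi + 2.0 * pi / 3.0);
    e2 = 3.0 * q - e1 - e3;  // trace is preserved
  }
  // Singular values of g; clip small negative round-off before sqrt.
  double s1 = std::sqrt(std::max(e1, 0.0));
  double s2 = std::sqrt(std::max(e2, 0.0));
  double s3 = std::sqrt(std::max(e3, 0.0));
  if (s1 < 1e-14) return 0.0;  // zero gradient: no eddy viscosity
  // D_sigma = s3 (s1 - s2)(s2 - s3) / s1^2
  double d = s3 * (s1 - s2) * (s2 - s3) / (s1 * s1);
  return (c * delta) * (c * delta) * d;
}
```

For pure shear and for solid-body rotation the smallest singular value is zero, so the model returns (up to round-off) no eddy viscosity. Near such degenerate cases r approaches plus or minus one, where acos is ill-conditioned, so these calls are exactly where reduced precision would show up first.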

Shiyu-Sandy-Du and others added 30 commits October 1, 2024 17:08

scratch the device implementation for Vreman model, and should be run on cuda and hip, not opencl

f4b9bf9

add vecsqrt1 and rmneg for cpu math

0f95fee

interface the device compute for Vreman into vreman.f90

81d53d6

add hip backend for sigma

01d231b

currently mute cuda backend for sigma for testing

16268a4

Fix little typo in sigma hip kernel

243433d

make sigma compilbale for hip

1cdd134

fix bugs to compile

fc33583

Merge pull request #7 from Shiyu-Sandy-Du/develop (Merge develop)

3a0faad

fix bugs

fdc404e

fit bugs when compiling

5e0f5dc

fix bugs in Makefile.am

6a8c316

Merge pull request #8 from ExtremeFLOW/develop (Merge latest develop)

dd5ac71

sigma model gpu: make the multiplication of mult after gsop in the kernel together with other operations instead of using device_math

83dd750

fix some syntax error in Cpp

a107a91

fix typo in hip

4ea84ab

move Vreman device into kernel instead of using device_math

dcbbb02

build up the framework for ax_helm_full_device and copy code from ax_helm_vector for reference

164bf8d

add part2 of vector form as well

1deed66

add ax_helm_full for hip

6787973

update depends

699a35e

fix some typos

2e8267c

remove unnecessary operations

e16da50

add hip support for prs_res_stress

e38c2e2

get hip implementation of fused coupled cg on the rail

28cad2e

fix bug and manage to run a tgv for a few convergent time steps

5bd5833

update depend

5f0daa1

remove redundants

a089c88

add the check of v_res to see the bug

7cac535

check the residual before iteration starts

738fd2b

Shiyu-Sandy-Du and others added 6 commits November 4, 2024 16:05

fix typo in ax_helm_full.cu

be0303e

Merge branch 'ExtremeFLOW:feature/sgs_gpu' into feature/sgs_gpu

18ab2d4

Merge pull request #1583 from Shiyu-Sandy-Du/feature/sgs_gpu (pipe from private fork)

3c439be

fix typos in ax_helm_full.cu

c184cfb

Merge branch 'feature/sgs_gpu' of github.com:Shiyu-Sandy-Du/neko into feature/sgs_gpu

7b76ee3

please the ifx compiler

6c95605

njansson reviewed Nov 4, 2024

src/les/bcknd/device/hip/hip_sigma_nut.f90 (outdated, resolved)

njansson reviewed Nov 4, 2024

src/les/bcknd/device/hip/hip_vreman_nut.f90 (outdated, resolved)

This was linked to issues Nov 4, 2024

Device implementation for ax_helm_full #1573

Closed

No HIP implementation for Fused Conjugate Gradient method #1569

Closed

Shiyu-Sandy-Du and others added 8 commits November 5, 2024 10:32

move interface for nut device into device_xxx_nut.F90

4827522

update depends

e591e6e

add cuda counterpart of vreman and sigma models

6d256a8

Merge pull request #1584 from Shiyu-Sandy-Du/feature/sgs_gpu (Pipe from personal fork)

c78f0d6

fix typos

434b1f1

Merge branch 'ExtremeFLOW:feature/sgs_gpu' into feature/sgs_gpu

863c5d3

fuse some kernels in pnpn_res_stress_device.F90

219d1f5

Merge pull request #1585 from Shiyu-Sandy-Du/feature/sgs_gpu (fuse some kernels in pnpn_res_stress_device.F90)

287673c

njansson approved these changes Nov 6, 2024

njansson enabled auto-merge November 6, 2024 09:38

MartinKarp approved these changes Nov 6, 2024

src/les/bcknd/device/vreman_device.f90 (outdated, resolved)

src/les/bcknd/device/cuda/sigma_nut_kernel.h (resolved)

Shiyu-Sandy-Du added 2 commits November 6, 2024 11:37

change some gs operation coding in les models for readability

0439bba

change pow(a, 3.0/2.0) into sqrt(a*a*a)

926f527

njansson merged commit 1997286 into develop Nov 6, 2024
25 of 27 checks passed

njansson deleted the feature/sgs_gpu branch November 6, 2024 11:44

njansson linked an issue Nov 6, 2024 that may be closed by this pull request

No HIP implementation for Fused coupled Conjugate Gradient method #1572

Closed

