Performing LES on devices #1580
… on cuda and hip, not opencl
Merge develop
Merge latest develop
…rnel together with other operations instead of using device_math
…helm_vector for reference
pipe from private fork
Nice!
To answer some of the questions,
There are gains to be made here by fusing, e.g., the cluster of col3 and sub2 calls. Remember that each call to a device math function comes with a kernel launch latency, which is not negligible.
Exactly, there's no 1d version of these, since 1d will run out of shared memory for most polynomial orders.
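To illustrate the fusion point, here is a minimal host-side C++ sketch, not code from this PR. It assumes the usual semantics of the math routines (col3: a = b * c, sub2: a = a - b). The function `col3_sub2_fused` and the operand names are hypothetical, and the actual order of operations in the cluster being discussed may differ. On a device, each of the unfused functions would correspond to one kernel launch and one extra pass over global memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused step 1 (assumed col3 semantics): a = b * c, element-wise.
// On a device this would be one kernel launch.
void col3(std::vector<double>& a, const std::vector<double>& b,
          const std::vector<double>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = b[i] * c[i];
    }
}

// Unfused step 2 (assumed sub2 semantics): a = a - b, element-wise.
// On a device this would be a second launch that re-reads and re-writes a.
void sub2(std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] -= b[i];
    }
}

// Fused version (hypothetical name): a = b * c - d in a single pass.
// One launch, and the intermediate product never goes back to memory.
void col3_sub2_fused(std::vector<double>& a, const std::vector<double>& b,
                     const std::vector<double>& c,
                     const std::vector<double>& d) {
    assert(a.size() == b.size() && b.size() == c.size() &&
           c.size() == d.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = b[i] * c[i] - d[i];
    }
}
```

Calling `col3(a, b, c)` followed by `sub2(a, d)` gives the same result as a single call to `col3_sub2_fused(a, b, c, d)`. The fused form saves one launch and one round trip through memory for `a`, which is where the gain comes from when the arrays are short relative to the launch latency.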
Pipe from personal fork
fuse some kernels in pnpn_res_stress_device.F90
Great stuff Shiyu! I just added two comments, feel free to take them to heart or not.
One thing I noted is the use of quite a lot of math functions such as pow, divisions and so on. I think we should just be aware this might be an issue if people run in single precision and one can perhaps use xreal in some places then. This is just a note for the future though.
I changed the unnecessary pow calls into multiplications here. I think we should also flag the sin, cos and sqrt calls here as something to revisit in the future.
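As a rough illustration of why this matters for single precision, here is a small C++ sketch, not the kernel code from this PR. The names `real_t`, `cube_mul` and `cube_pow` are hypothetical. It shows that a small integer power written as multiplications stays in the working precision, while `std::pow` with a double literal as exponent promotes a float argument to double.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Small integer power as plain multiplications: the result keeps the
// working precision real_t and no math-library call is needed.
template <typename real_t>
real_t cube_mul(real_t x) {
    return x * x * x;
}

// Same quantity via std::pow with a double literal as exponent.
// For a float argument the standard overloads promote to double,
// so the return type here is double even in a single-precision build.
template <typename real_t>
auto cube_pow(real_t x) -> decltype(std::pow(x, 3.0)) {
    return std::pow(x, 3.0);
}

static_assert(std::is_same<decltype(cube_mul(1.0f)), float>::value,
              "multiplications stay in single precision");
static_assert(std::is_same<decltype(cube_pow(1.0f)), double>::value,
              "pow with a double exponent promotes float to double");
```

The same reasoning applies to literals and to functions such as sin, cos and sqrt: in a single-precision build it is worth checking that the arguments and constants have the working precision type, so that the double-precision variants are not selected by accident.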
In this PR,
Some other things to be discussed: