Build OpenCL kernels requiring CL2.0 (needed for __generic args) #135
Conversation
I would be careful with OpenCL 2.0: it is not well supported across drivers. 3.0 is better supported, but many 2.0 features are optional in it. I could test on the Intel driver and PoCL in case...
Sorry for the late response, I already saw it yesterday. I can experiment with 3.0 too if you think relying on 2.0 is a problem; I did not know this. I had a quick look at the spec and CL3.0 also supports the We could leave the version configurable by the user, though I'm not sure of the benefit of that if our code does not compile on different versions (like currently, under 2.1)... Do you support the idea of bumping the version in principle? The alternative is working around it, which will be a bit annoying, though not impossible.
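As a rough illustration of the "configurable version" idea, the sketch below parses the device's OpenCL C version string and refuses to emit a `-cl-std` option the device cannot honour. The helper names `parse_opencl_c_version` and `cl_std_option` are hypothetical and not part of Xobjects or pyopencl. It assumes the string has the form the OpenCL spec prescribes for `CL_DEVICE_OPENCL_C_VERSION` ("OpenCL C <major>.<minor> <vendor info>"). Note that on OpenCL 3.0 devices a plain version comparison is probably not enough, because 2.0 features are optional there.

```python
import re


def parse_opencl_c_version(version_string):
    """Return (major, minor) from a string like 'OpenCL C 1.2 <vendor info>'."""
    match = re.match(r"OpenCL C (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unexpected version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def cl_std_option(version_string, requested="CL2.0"):
    """Hypothetical helper: build the -cl-std option if the device is new enough.

    `requested` uses the spelling accepted by the OpenCL compiler, e.g. 'CL2.0'.
    """
    # 'CL2.0' -> (2, 0)
    wanted = tuple(int(part) for part in requested[2:].split("."))
    supported = parse_opencl_c_version(version_string)
    if supported < wanted:
        raise RuntimeError(
            f"device reports OpenCL C {supported[0]}.{supported[1]}, "
            f"but {requested} was requested"
        )
    return f"-cl-std={requested}"
```

With pyopencl, the input string would come from `device.opencl_c_version`, and the returned option would be passed in the `options` list of `Program.build`.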
So:
Does PoCL choke on the C code or the flag? CPUs have a single memory space, so I expect the feature to be supported.
It chokes on the flag. I will have to check without it; I wouldn't be surprised if it still has problems, though: I'm seeing in
Okay, so to complicate matters further, I have confirmed a bug in Xobjects (or more likely in pyopencl): no matter which platform is selected, the first one is actually chosen for running the kernel. It's very sneaky, because the compiler messages seem "okay" (see screenshot), but the actual kernel is not run where we need it. I'm not sure why at this point; I briefly looked at the pyopencl code and nothing particularly bad jumps out. I first observed strange behaviour with gpustat, where the GPUs would light up even when allegedly running on the CPU, and confirmed it by using vendor-specific macros in the kernel code (
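One way to narrow this down might be to bypass any higher-level selection logic, create the pyopencl context from an explicitly chosen device, and then ask the context which device it actually holds. The sketch below is only that: `select_device` and `make_checked_context` are hypothetical helpers, and the check compares platform names, which would not by itself prove where a kernel ends up running.

```python
def select_device(platforms, platform_index, device_index=0):
    """Pick one device by explicit (platform, device) indices.

    `platforms` is any sequence of objects with a get_devices() method,
    e.g. the list returned by pyopencl.get_platforms().
    """
    if not 0 <= platform_index < len(platforms):
        raise IndexError(f"no platform with index {platform_index}")
    devices = platforms[platform_index].get_devices()
    if not 0 <= device_index < len(devices):
        raise IndexError(f"no device with index {device_index}")
    return devices[device_index]


def make_checked_context(platform_index, device_index=0):
    """Create a context on exactly one device and verify what it contains."""
    import pyopencl as cl  # imported lazily so the helper above stays testable

    device = select_device(cl.get_platforms(), platform_index, device_index)
    ctx = cl.Context(devices=[device])

    # Ask the context itself which device it holds, rather than trusting
    # the indices that were requested.
    actual = ctx.devices[0]
    if actual.platform.name != device.platform.name or actual.name != device.name:
        raise RuntimeError(
            f"requested {device.name!r} on {device.platform.name!r}, "
            f"got {actual.name!r} on {actual.platform.name!r}"
        )
    return ctx
```

Printing `ctx.devices[0].platform.name` next to the gpustat output while a kernel runs would show whether the requested and the active platform disagree.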
The benchmarks at https://github.com/rdemaria/simpletrack work normally for me. I have upgraded everything to the latest version.
Description
It seems that OpenCL on CUDA is quite permissive, whereas on AMD machines, which follow the OpenCL spec more closely, CL1.2 is taken as the default version even if the actual device supports higher ones.
The current version of Xtrack contains code incompatible with CL1.2. This is because in some instances local scope values are passed to __global parameters. CL2.0 introduces a default __generic parameter, which accepts either, in a manner similar to CUDA. In particular the function multipole_compute_dpx_dpy_single_particle receives arguments from either memory. Requiring CL2.0 is the easiest fix in this case: the alternatives are to explicitly make two versions of the function, or manually copy from the global memory to local.
I am preparing a PR for Xtrack that fixes issues encountered on AMD, this is a prerequisite for those changes.
Checklist
Mandatory:
Optional: