I finally decided to look into GPU acceleration after playing with whisper.cpp and realizing that OpenCL was still Actually Useful(tm). (Seriously, someone should've nudged me a while ago. Maybe I'm stubborner than I think I am... ;) )
This would involve a bit of refactoring, but if it gets a 2x performance boost it'd be worth it:
The multithreaded demodcache should be torn out (this would generally be a win)
Data needs to be kept GPU-side as much as possible
There would probably have to be a wrapper so that non-OpenCL environments still run (see the sketch after this list). Apple has deprecated OpenCL, so it will probably go away on macOS sooner or later - but there will probably be something else to run there by the time it does...
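A minimal sketch of what I mean by the wrapper - the names here (`HAVE_OPENCL`, `fft_forward`) are made up for illustration, and the pyvkfft simple-FFT call is just one possible way to do it:

```python
# Hypothetical fallback wrapper: use pyopencl/pyvkfft when a device is
# available, otherwise fall back to plain numpy so the code still runs.
import numpy as np

try:
    import pyopencl as cl
    import pyopencl.array as cla
    from pyvkfft.fft import fftn as vk_fftn

    _ctx = cl.create_some_context(interactive=False)
    _queue = cl.CommandQueue(_ctx)
    HAVE_OPENCL = True
except Exception:
    HAVE_OPENCL = False

def fft_forward(block: np.ndarray):
    """Forward FFT of one demod block; hides where the transform runs."""
    if HAVE_OPENCL:
        gpu = cla.to_device(_queue, np.ascontiguousarray(block))
        return vk_fftn(gpu)   # result stays GPU-side for later stages
    return np.fft.fft(block)  # plain CPU path for non-OpenCL environments
```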
I'm still in the testing phase. On my main test platform (a Dell T3600 with a 6-core Sandy Bridge CPU and a GeForce RTX 3060 12GB) pyvkfft is 150% faster at the standard blocksize (64K samples), and ~15x faster at 1MB. So this will probably shift the bottleneck even further toward the TBC unless things can be kept on the GPU side most of the time.
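For reference, the kind of micro-benchmark I'm running looks roughly like this (a sketch, not the exact script - the iteration count is arbitrary):

```python
# Rough pyvkfft-vs-numpy FFT timing at the two block sizes mentioned above.
import time
import numpy as np
import pyopencl as cl
import pyopencl.array as cla
from pyvkfft.opencl import VkFFTApp

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

for n in (64 * 1024, 1024 * 1024):
    host = (np.random.rand(n) + 1j * np.random.rand(n)).astype(np.complex64)

    t0 = time.perf_counter()
    for _ in range(100):
        np.fft.fft(host)
    cpu = time.perf_counter() - t0

    gpu_buf = cla.to_device(queue, host)
    app = VkFFTApp(gpu_buf.shape, gpu_buf.dtype, queue=queue, ndim=1, inplace=True)
    app.fft(gpu_buf)           # warm-up / plan compilation
    queue.finish()

    t0 = time.perf_counter()
    for _ in range(100):
        app.fft(gpu_buf)       # in-place, no host<->device copies in the loop
    queue.finish()
    gpu = time.perf_counter() - t0

    print(f"{n:>8} samples: numpy {cpu:.3f}s  pyvkfft {gpu:.3f}s  ({cpu / gpu:.1f}x)")
```

Note the timed loop deliberately excludes transfers - that's the "keep data GPU-side" caveat in action.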
I'm also going to look at a secondary test potato^Wplatform, a Mele Quieter3C, which has a Celeron N5105 and its integrated GPU. The latter does pyvkfft benchmarks at about 4-5% of the speed of the 3060, but since the CPU doesn't support AVX(2), the GPU might still be faster. (By the way, the new Alder Lake Nxxx series does have AVX2 and would only lag behind a Haswell i5 because it has just one memory channel. Not bad.)
At a later point I'm planning on getting my hands on an RK3588 board - if OpenCL runs there with the free drivers I'll try that too, but the Cortex-A76 has enough SIMD that the GPU might not help.
N5105 notes: not nearly as slow as I expected. It looks like ~1fps on ld-decode, and most of the OpenCL slowness is data transfer, so the GPU results still come out close.
An Alder Lake-N PC would probably do quite well for ld-decode if you put a nice NVMe drive in it. These are not your father's Atoms.
I played around with doing int16->complex64 conversion on the GPU side, and it's now 50x faster with 1MB buffers and ~2x with 32K buffers on my main system, if I'm running things right.
(The N3050 is 7.2x/1.87x respectively; I apparently finally got the 3060 properly in play.)
So the overall speedup will be limited by how much I can use the GPU-side buffers to help with the TBC/scaling.
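The conversion itself is just an elementwise kernel; mine is basically this shape (kernel and variable names are made up here, not anything in the tree):

```python
# int16 -> complex64 on the device: only the compact int16 buffer crosses
# the bus, and the complex64 result stays GPU-side for the FFT stage.
import numpy as np
import pyopencl as cl
import pyopencl.array as cla
from pyopencl.elementwise import ElementwiseKernel

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# pyopencl maps complex64 to cfloat_t and pulls in its complex header
# automatically when it sees a complex argument.
int16_to_c64 = ElementwiseKernel(
    ctx,
    "const short *raw, cfloat_t *out",
    "out[i] = cfloat_new((float)raw[i], 0.0f)",
)

raw_host = np.random.randint(-32768, 32767, 1 << 20, dtype=np.int16)
raw_gpu = cla.to_device(queue, raw_host)
out_gpu = cla.empty(queue, raw_gpu.shape, np.complex64)
int16_to_c64(raw_gpu, out_gpu)  # feed out_gpu straight into the FFT
```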
I OpenCL'ified the RF stage, but the performance gains are slight for now, because pyopencl doesn't release the GIL much, on top of the switch to the threading model.
Obviously I'm not telling you to rewrite your whole project in another language, but I think this is really edging into territory that Python is bad at. I don't know if it's ready yet, but in the long term this seems like exactly the sort of thing that Mojo is going to be great for.