Threading execution #2

ngaloppo · 2022-06-16T23:50:48Z

Based on Ben's concurrent execution sample code, I created a simplified version that submits a time-sink kernel on two threads concurrently. Each kernel instance takes about twice as long when run on two threads as on a single thread (roughly 0.063 s versus 0.032 s in the output below).

@bashbaug Any idea what could be going on here? Is there an issue perhaps with sharing a context / device across multiple threads?

[@tgl:~/code/simple-sycl-samples/build] [intelpython-python3.9] concurrent-execution(+1/-0)* ± ../install/Release/thread_concurrency -p 2
Running on SYCL platform: Intel(R) OpenCL HD Graphics
Running on SYCL device: Intel(R) Iris(R) Xe Graphics [0x9a49]
Initializing tests...
... done!
Testing without threads
                                      go (i=  0): Average time: 0.031794 seconds
Testing with threads
                                      go (i=  1): Average time: 0.062274 seconds
                                      go (i=  0): Average time: 0.063263 seconds
Cleaning up...
... done!

bashbaug · 2022-06-17T22:11:37Z

Brief notes in case this helps somebody else in the future:

This device is able to execute kernels concurrently by batching them together into one submission. One way to do this is to put kernels into the same out-of-order queue without any dependencies. The driver will also batch submissions from multiple queues - both in-order and out-of-order queues - though the submissions need to be close enough together to batch.

In this particular case, one of the threads gets started slightly before the other, so the calls go:

// Thread 1:
clEnqueueNDRangeKernel
clFinish

// Thread 2, a little later:
clEnqueueNDRangeKernel
clFinish

// Repeat

Because the clFinish from Thread 1 happens before the clEnqueueNDRangeKernel from Thread 2, the driver will not batch these two submissions together, and they won't run concurrently.

If the calls happen to occur very close together in time, or if this is enforced with e.g. a thread barrier, the kernels should execute concurrently. In other words, there is no inherent reason why kernels from multiple threads cannot run concurrently; they just happen not to in this case.
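A minimal sketch of the thread-barrier idea, in case it is useful. It is not the code from this PR: `SimpleBarrier` and `run_lockstep` are hypothetical names, and the `enqueue` and `finish` callbacks are placeholders for whatever the sample calls to submit the kernel (e.g. clEnqueueNDRangeKernel) and to wait for it (e.g. clFinish). The barrier only guarantees that no thread waits before every thread has enqueued; whether the driver then batches the submissions is still up to the driver.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Small reusable barrier built on a mutex and condition variable.
// (C++20's std::barrier would serve the same purpose.)
class SimpleBarrier {
public:
    explicit SimpleBarrier(int count)
        : threshold_(count), waiting_(0), generation_(0) {
        assert(count > 0);
    }

    // Blocks until `count` threads have arrived, then releases all of them.
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const int gen = generation_;
        if (++waiting_ == threshold_) {
            // Last thread to arrive: reset and wake everyone up.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const int threshold_;
    int waiting_;
    int generation_;
};

// Hypothetical helper: each of `numThreads` threads repeats
// enqueue -> finish `iterations` times, in lockstep with the others.
// `enqueue(t)` and `finish(t)` are placeholders supplied by the caller.
void run_lockstep(int numThreads, int iterations,
                  const std::function<void(int)>& enqueue,
                  const std::function<void(int)>& finish) {
    SimpleBarrier barrier(numThreads);
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t) {
        threads.emplace_back([&, t] {
            for (int i = 0; i < iterations; ++i) {
                barrier.arrive_and_wait();  // line up before submitting
                enqueue(t);
                barrier.arrive_and_wait();  // every thread has enqueued
                finish(t);                  // only now wait for completion
            }
        });
    }
    for (auto& th : threads) th.join();
}
```

With two threads this turns the sequence above into enqueue, enqueue, finish, finish on every iteration, so the clFinish from one thread can no longer land before the other thread's clEnqueueNDRangeKernel.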

bashbaug and others added 7 commits June 3, 2022 13:41

infrastructure

fa0fd02

first working version, added all scenarios except interop

8a259ba

add an out-of-order queue with dependencies example

48981e0

add a read-only accessor variant

513107a

add cmake options to build for CUDA devices

f43e94d

Add concurrency with threading example

809121c

Add alternative measurement strategy

f4c753a
