Threading execution #2

Draft
wants to merge 7 commits into main


ngaloppo

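Based on Ben's concurrent execution sample code, I created a simplified version that submits a time-sink kernel from two threads concurrently. The average execution time per kernel instance with a single thread is only half of what it is with two threads (about 0.032 s versus 0.063 s below), so the kernels from the two threads appear to run one after the other rather than concurrently.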

@bashbaug Any idea what could be going on here? Is there perhaps an issue with sharing a context/device across multiple threads?

```
[@tgl:~/code/simple-sycl-samples/build] [intelpython-python3.9] concurrent-execution(+1/-0)* ± ../install/Release/thread_concurrency -p 2
Running on SYCL platform: Intel(R) OpenCL HD Graphics
Running on SYCL device: Intel(R) Iris(R) Xe Graphics [0x9a49]
Initializing tests...
... done!
Testing without threads
                                      go (i=  0): Average time: 0.031794 seconds
Testing with threads
                                      go (i=  1): Average time: 0.062274 seconds
                                      go (i=  0): Average time: 0.063263 seconds
Cleaning up...
... done!
```

@bashbaug
Owner

Brief notes in case this helps somebody else in the future:

This device is able to execute kernels concurrently by batching them together into one submission. One way to do this is to put the kernels into the same out-of-order queue without any dependencies between them. The driver will also batch submissions from multiple queues (both in-order and out-of-order queues), though the submissions need to arrive close enough together in time to be batched.

In this particular case, one of the threads starts slightly before the other, so the calls go:

```
// Thread 1:
clEnqueueNDRangeKernel
clFinish

// Thread 2, a little later:
clEnqueueNDRangeKernel
clFinish

// Repeat
```

Because the clFinish from Thread 1 happens before the clEnqueueNDRangeKernel from Thread 2, the driver will not batch these two submissions together, and they won't run concurrently.

If the calls happened to occur very close together in time, or if this were enforced with, e.g., a thread barrier, the kernels should execute concurrently. In other words, there is no inherent reason why kernels from multiple threads cannot run concurrently; they just happen not to in this case.
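The barrier idea can be sketched in portable C++. This is a minimal sketch under stated assumptions, not the sample's actual code: each host thread is assumed to submit to its own queue, the calls to the hypothetical `record` helper stand in for `clEnqueueNDRangeKernel` and `clFinish` (here they only log an event), and `ThreadBarrier`, `run_lockstep` and `enqueues_precede_finishes` are illustrative names. Placing the barrier between the enqueue and the finish is one possible choice: it guarantees that no thread calls its finish for a round until every thread has enqueued for that round, which rules out the "Thread 1 finishes before Thread 2 enqueues" ordering from the trace. Whether the driver then actually batches the submissions is still up to the driver.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Minimal reusable barrier; with C++20, std::barrier offers the same
// arrive_and_wait() operation.
class ThreadBarrier {
public:
    explicit ThreadBarrier(std::size_t count) : threshold_(count), remaining_(count) {}

    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t gen = generation_;
        if (--remaining_ == 0) {
            // Last thread to arrive: start a new round and wake the others.
            ++generation_;
            remaining_ = threshold_;
            cv_.notify_all();
        } else {
            // Wait until the last thread of this round has arrived.
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t remaining_;
    std::size_t generation_ = 0;
};

// One logged host-side call made by a worker thread.
struct Event {
    int thread_id;
    int iteration;
    std::string call;  // "enqueue" or "finish"
};

// Runs num_threads workers for the given number of rounds. In each round a
// worker logs a hypothetical "enqueue" (standing in for clEnqueueNDRangeKernel),
// waits at the barrier, then logs a hypothetical "finish" (standing in for
// clFinish). Returns the log in the order the calls actually happened.
std::vector<Event> run_lockstep(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);
    ThreadBarrier barrier(static_cast<std::size_t>(num_threads));
    std::mutex log_mutex;
    std::vector<Event> events;

    auto record = [&](int tid, int it, const char* call) {
        std::lock_guard<std::mutex> guard(log_mutex);
        events.push_back({tid, it, call});
    };

    std::vector<std::thread> workers;
    for (int tid = 0; tid < num_threads; ++tid) {
        workers.emplace_back([&, tid] {
            for (int it = 0; it < iterations; ++it) {
                record(tid, it, "enqueue");  // real code: clEnqueueNDRangeKernel(...)
                barrier.arrive_and_wait();   // no finish until every thread has enqueued
                record(tid, it, "finish");   // real code: clFinish(...)
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return events;
}

// True if, in every round, all "enqueue" calls were logged before any
// "finish" call of that same round.
bool enqueues_precede_finishes(const std::vector<Event>& events, int iterations) {
    for (int it = 0; it < iterations; ++it) {
        bool seen_finish = false;
        for (const Event& e : events) {
            if (e.iteration != it) continue;
            if (e.call == "finish") {
                seen_finish = true;
            } else if (seen_finish) {
                return false;  // an enqueue came after a finish in this round
            }
        }
    }
    return true;
}
```

For example, `run_lockstep(2, 100)` returns 400 events, and `enqueues_precede_finishes` holds for them regardless of which thread starts first. A barrier placed only before the enqueue would line up the start times but would not, by itself, rule out one thread reaching its finish before the other has enqueued.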
