Java Alternative Algorithm does not work for arbitrary NDRanges #142
Aparapi "falls back" to a "Java Alternative Algorithm" (JAA) in cases where the Java bytecode cannot be translated to OpenCL code. The JAA, however, only works if the NDRange is an exact power of 2. The following example illustrates the problem:
Execution yields the following incorrect output:
|
It seems to be multiples of 4, not powers of 2. Anything below 4 is 0, below 16 is 12. |
Hmmm that is interesting. I will have to investigate this issue again and
see if that provides any new clues.
On Wed, Mar 6, 2019 at 1:04 PM NorbiPeti wrote:
It seems to be multiples of 4, not powers of 2. Anything below 4 is 0,
below 16 is 12.
|
Same problem here. If NDRange is a multiple of available threads (12 in my case), all is fine. If not, then the results are (partially) incorrect. Tested by fiddling with ArrayTest (changing SIZE constant and forcing Java fall-back by adding a println of current id). |
SIZE = 6 (less than number of threads)
SIZE = 12 (number of threads)
SIZE = 13 (number of threads + 1)
SIZE = 23 (2 * number of threads - 1)
From this it seems as if the final non-full batch is not calculated at all (arrays are initialized with 99 in this test). |
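The behaviour described above (the final non-full batch is never calculated) can be modelled without Aparapi at all. The following is a minimal sketch under the assumption that the fallback schedules work in complete groups of `localSize` items and derives the number of groups by integer division. The class `PartialBatchDemo` and its method `run` are hypothetical names for illustration, not Aparapi code.

```java
import java.util.Arrays;

// Hypothetical simulation; none of these names come from Aparapi.
class PartialBatchDemo {

    // Executes the "kernel" out[id] = id, but only for complete groups of
    // localSize work items, mimicking a scheduler that silently assumes
    // globalSize % localSize == 0.
    static int[] run(int globalSize, int localSize) {
        int[] out = new int[globalSize];
        Arrays.fill(out, 99);                 // sentinel value, as in the ArrayTest experiment
        int groups = globalSize / localSize;  // integer division drops the remainder
        for (int g = 0; g < groups; g++) {
            for (int l = 0; l < localSize; l++) {
                int id = g * localSize + l;
                out[id] = id;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 12 stands in for the number of threads reported above.
        for (int size : new int[] {6, 12, 13, 23}) {
            System.out.println("SIZE = " + size + ": " + Arrays.toString(run(size, 12)));
        }
    }
}
```

In this model, every element with an index at or beyond `(globalSize / localSize) * localSize` keeps the sentinel 99, which would also be consistent with the earlier observation that a range below 16 only yields 12 processed items when 4 threads are used.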
Setting SEQUENTIAL as the preferred device would work as a fallback ... |
This seems to be boiling down to |
This optimization(?)
is only happening for
Edit: it also contradicts the javadoc, which reads
which is what the thread scheduling relies upon. |
@rkraneis Thanks for your work on finding the root cause. The reason for that code being there is that Java performance is likely to be best if the number of Java thread pool threads doesn't exceed the number of available CPU cores (Hyper-Threading siblings may also be included in that count, which is undesirable for High Performance Computing workloads anyway). So either this code is removed, since it can violate the rule _globalWidth % _localWidth == 0, or the logic must be modified so that it respects that rule. |
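One way the second option could look: cap the local width at the core count, but then step down to the nearest value that divides the global width evenly. This is only a sketch of the idea, not a patch against Aparapi; `LocalWidthChooser` and `chooseLocalWidth` are hypothetical names, and the parameters merely mirror the `_globalWidth` / `_localWidth` naming used above.

```java
// Hypothetical helper; not part of Aparapi.
class LocalWidthChooser {

    // Returns the largest local width that is <= maxThreads and divides
    // globalWidth evenly, so that globalWidth % localWidth == 0 always holds.
    static int chooseLocalWidth(int globalWidth, int maxThreads) {
        if (globalWidth < 1 || maxThreads < 1) {
            throw new IllegalArgumentException("globalWidth and maxThreads must be positive");
        }
        for (int w = Math.min(globalWidth, maxThreads); w > 1; w--) {
            if (globalWidth % w == 0) {
                return w;
            }
        }
        return 1; // 1 divides every width
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        for (int size : new int[] {6, 12, 13, 18, 23, 24}) {
            System.out.println("globalWidth = " + size
                    + ", localWidth = " + chooseLocalWidth(size, cores));
        }
    }
}
```

The trade-off is that a prime global width larger than the core count (13 or 23 with 12 cores, for example) degrades to a local width of 1, so correctness is preserved at the cost of parallelism within a group.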
@CoreRasurae I can indeed confirm that the Java Alternative Algorithm only works if NDRange is a multiple of the available logical processors. I tested this with two machines, one with 4, the other with 8 logical processors. In the first case, NDRange had to be a multiple of 4 for the alternative algorithm to work correctly, in the second case NDRange had to be a multiple of 8. |
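Given that observation, a possible caller-side workaround is to round the range up to the next multiple of the logical processor count and guard the kernel body with a bounds check, as is common practice in OpenCL code. The sketch below only simulates that idea in plain Java and has not been tested against Aparapi; `PaddedRange`, `roundUp` and `run` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical workaround sketch; not tested against Aparapi.
class PaddedRange {

    // Rounds n up to the next multiple of `multiple` (both must be positive).
    static int roundUp(int n, int multiple) {
        return ((n + multiple - 1) / multiple) * multiple;
    }

    // Simulates a kernel launched over the padded range; the bounds check
    // stands in for `if (id < size)` at the top of the kernel body.
    static int[] run(int size, int threads) {
        int[] out = new int[size];
        Arrays.fill(out, 99);                 // sentinel value
        int padded = roundUp(size, threads);
        for (int id = 0; id < padded; id++) {
            if (id < size) {                  // padding work items do nothing
                out[id] = id;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("padded range for 13 items: " + roundUp(13, threads));
        System.out.println(Arrays.toString(run(13, threads)));
    }
}
```

The padded work items cost a few wasted invocations, but every real index is covered regardless of how the range relates to the processor count.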
@Helios85 Thanks for confirming this. |
Tested whether the bug is still there on my device using the provided class JAATest. |