You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently working on a Java library for matrix operations and ML. I use Aparapi to utilize the GPU.
I've written this code to multiply two matrices:
public static NDmatrix matMul(float[][] a, float[][] b) {
int[] aDim = new int[]{a.length, a[0].length};
int[] bDim = new int[]{b.length, b[0].length};
if(aDim[1] == bDim[0]){
int[] Dim = new int[]{aDim[0], bDim[1]};
int aVSize = aDim[0] * aDim[1];
float[] aVector = new float[aVSize];
for(int i = 0; i < aDim[0]; i++)
System.arraycopy(a[i], 0, aVector, i * aDim[1], aDim[1]);
int bVSize = bDim[0] * bDim[1];
float[] bVector = new float[bVSize];
for(int i = 0; i < bDim[0]; i++)
System.arraycopy(b[i], 0, bVector, i * bDim[1], bDim[1]);
int resVSize = Dim[0] * Dim[1];
float[] resVector = new float[resVSize];
int d[] = new int[]{aDim[1]};
Kernel mKernel = new Kernel() {
final int ht = Dim[0];
final int wt = Dim[1];
final int dpt = d[0];
public void run() {
int c = getGlobalId(0);
int r = getGlobalId(1);
int l = getGlobalId(2); // used with 3D range
localBarrier(); // used with 3D range
//for(int l = 0; l < dpt; l++) // used with 2D range
resVector[r * wt + c] += aVector[r * dpt + l] * bVector[l * wt + c]; // in 3D range, this only works for l = 0
}
};
mKernel.setExplicit(true);
mKernel.put(aVector);
mKernel.put(bVector);
mKernel.put(resVector);
mKernel.put(Dim);
mKernel.put(d);
mKernel.execute(Range.create3D(Dim[1], Dim[0], d[0]));
//mKernel.execute(Range.create2D(Dim[1], Dim[0]));
mKernel.get(resVector);
mKernel.dispose();
return new NDmatrix(Dim, resVector, null);
}
System.out.println("The number of columns in left matrix and number of rows in right matrix do not match.");
System.out.println();
return null;
}
However, it looks like resVector[some_index] only gets updated once(r + aVector[0 * dpt + 0] * bVector[0 * wt + c]). If, instead, I use a 2D range and a loop(the commented bits in the code), it works correctly. What could be the reason for such behavior and how can I force it to work fully parallel?
Interestingly, I tried one thing that "worked" - after updating resVector[some_index], I called this.put(resVector). However, it then couldn't compile in OpenCL and ended up using Java's multi-threading instead, eventually resulting in a correct result.
The text was updated successfully, but these errors were encountered:
Currently working on a Java library for matrix operations and ML. I use Aparapi to utilize the GPU.
I've written this code to multiply two matrices:
However, it looks like resVector[some_index] only gets updated once(r + aVector[0 * dpt + 0] * bVector[0 * wt + c]). If, instead, I use a 2D range and a loop(the commented bits in the code), it works correctly. What could be the reason for such behavior and how can I force it to work fully parallel?
Interestingly, I tried one thing that "worked" - after updating resVector[some_index], I called this.put(resVector). However, it then couldn't compile in OpenCL and ended up using Java's multi-threading instead, eventually resulting in a correct result.
The text was updated successfully, but these errors were encountered: