You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In short, Cache-Oblivious Algorithm (COA) is an algorithm that takes advantage of cache without knowing the cache size. This paper proposes COA for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching.
It would be nice to have a tutorial on the cache, how it works (visually), and the COA.
The text was updated successfully, but these errors were encountered:
In #938, we use the COA approach for transposing x.reshape(n, n), where x is a 1D array with length $n^2$ , and $n = 2^{2k}$ (see below)
@njit
def func(x, n, x_transpose):
"""
Parameters
-----------
x : numpy.ndarray
A numpy 1D array, with length n^2.
n : int
square-root of len(x). n is a power of 4.
x_transpose : numpy.ndarray
A numpy 1D array with same length as x. x_transpose.reshape(n,n) will contain the transpose of x.reshape(n,n)
"""
blocksize = 32
blocksize = min(blocksize, n)
x = x.reshape(n, n)
x_transpose = x_transpose.reshape(n, n)
for i in range(0, n, blocksize):
for j in range(0, n, blocksize):
x_transpose[i:i + blocksize, j:j + blocksize] = np.transpose(x[j:j + blocksize, i:i + blocksize])
return
The above COA-based function for transposing matrix is faster than the following functions (asymptotically speaking):
@njit
def func1(x, n):
x[:] = np.transpose(x.reshape(n, n)).ravel()
return
@njit(fastmath=True)
def func2(x, n):
for k in range(n):
for p in range(k + 1, n):
i = k + p * n
j = p + k * n
x[i], x[j] = x[j], x[i]
return
In short, Cache-Oblivious Algorithm (COA) is an algorithm that takes advantage of cache without knowing the cache size. This paper proposes COA for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching.
It would be nice to have a tutorial on the cache, how it works (visually), and the COA.
The text was updated successfully, but these errors were encountered: