Systolic array implementations for Cholesky, LU, and QR decomposition using HLS
- Ubuntu 16.04.5 LTS
- Xilinx Vivado HLS v2017.4
- Matlab R2017a
Each design folder contains:
|-- Design_Folder/
|- common/
|- model4x4/
|- template/
Folder common/ includes script files shared across the different designs.
Folder model4x4/ gives an example 4x4 implementation, with detailed comments alongside the code.
Folder template/ includes the template cpp files used for code generation.
To understand a design, please go to model4x4/ and read the comments in design_name.cpp, and refer to the illustrations shown below if necessary : )
For each design,
- Go to common/. Find algorithm_name.cfg.xml and revise it according to your matrix size MxN. Please manually set the parameter BIT according to BIT = ceiling(log2(SIZE)).
- Run runit.csh. It will generate a new folder design_files/ with the design MxN/ inside:
|-- Design_Folder/
|- common/
|- design_files/
|- MxN/
|- model4x4/
|- template/
- Go to Design_Folder/ and call genA() in MATLAB to generate the random operand matrix A required by the testbench. Some specific cases such as 8x8 and 16x16 are provided under Design_Folder/. You can use one of them or generate a new one.
How to use the function genA():
For Cholesky: generate an NxN symmetric positive definite matrix by calling genA(N)
For LU: generate an NxN full-rank matrix by calling genA(N)
For QR: generate an MxN full-rank matrix by calling genA(M,N)
- Go to Design_Folder/design_files/MxN/ and run script.tcl in the vivado_hls environment with $ vivado_hls script.tcl.
- Revise script.tcl according to your needs. By default it runs through csim, synthesis and cosim.
- cholesky_v1.3: A 1-D systolic array design for Cholesky Decomposition along projection vector (i,j,k)=(0,1,0) and (i,k)=(0,1), as illustrated below in (b).
- cholesky_v4.0: A 1-D systolic array design for Cholesky Decomposition along projection vector (i,j,k)=(0,1,0) and (i,k)=(1,0), as illustrated below in (c).
- cholesky_v3.2: A 2-D systolic array design for Cholesky Decomposition along projection vector (i,j,k)=(1,0,0), as illustrated below in (b).
- cholesky_v2.2: A 1-D systolic array design for Cholesky Decomposition along projection vector (i,j,k)=(0,1,0) and (i,k)=(0,1), as illustrated below in (b).
- lu1D_v1.0: A 1-D systolic array design for LU Decomposition along projection vector (i,j,k)=(0,1,0) and (i,k)=(0,1), as illustrated at the bottom of the picture below.
- lu1D_v2.0: A 1-D systolic array design for LU Decomposition along projection vector (i,j,k)=(0,1,0) and (i,k)=(1,0), as illustrated at the right of the picture below.
- lu2D_v1.0: A 2-D systolic array design for LU Decomposition along projection vector (i,j,k)=(0,1,0), as illustrated below in (b).
- qr_v1.1: A 1-D systolic array design for QR Decomposition along projection vector (i,j,k)=(1,0,0) and (j,k)=(0,1), as illustrated at the bottom of the picture below.
- qr_v1.2: Replaces unroll with pipeline in qr_v1.1 for better performance, because automatically unrolled rotations are not parallelized due to FIFO conflicts.
- qr_v2.1: A 2-D systolic array design for QR Decomposition along projection vector (i,j,k)=(1,0,0), as illustrated below in (b).
I'd appreciate it if you could take a look at the following abstract, and please cite it if it helps your work. :)
@inproceedings{Liu:2019:DSA:3289602.3293969,
author = {Liu, Jie and Cong, Jason},
title = {Dataflow Systolic Array Implementations of Matrix Decomposition Using High Level Synthesis},
booktitle = {Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays},
series = {FPGA '19},
year = {2019},
isbn = {978-1-4503-6137-8},
location = {Seaside, CA, USA},
pages = {187--187},
numpages = {1},
url = {http://doi.acm.org/10.1145/3289602.3293969},
doi = {10.1145/3289602.3293969},
acmid = {3293969},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {dataflow, high-level synthesis, matrix decomposition, systolic array, throughput},
}