Optimizing Self-Organizing Maps for Bacterial Genome Identification on Parallel Ultra-Low-Power Platforms
The kernel design includes a Python model and a C program. The Python model generates the input dataset, computes the kernel output as a golden reference, and assesses the accuracy using a customizable error metric.
The golden model is built on top of PyTorch data types.
This implementation has been tested and verified with Python 3.10.
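The error metric is customizable; as an illustration, here is a minimal sketch of a relative-error check against the golden reference (the function name, tolerance, and tensor sizes are ours, not the script's actual API):

```python
import torch

def max_relative_error(ref: torch.Tensor, out: torch.Tensor) -> float:
    # Worst-case relative error of a reduced-precision result vs. the FP32 reference.
    denom = ref.abs().clamp_min(1e-12)  # guard against division by zero
    return ((ref - out).abs() / denom).max().item()

# Compare an FP16-emulated result against the FP32 golden reference.
ref = torch.randn(40000)                       # golden reference in FP32
out = ref.to(torch.float16).to(torch.float32)  # simulated reduced-precision output
print(max_relative_error(ref, out))            # acceptable tolerance is application-specific
```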
If you are not going to use float8, install the following package:
pip install torch
To enable float8 support, use our e5m2 implementation: follow the instructions for installing PyTorch from source provided in this directory.
We tested our implementation on the CPU. To replicate our setup, disable CUDA support by exporting the environment variable USE_CUDA=0:
export USE_CUDA=0
Please use
git clone [email protected]:ahmad-mirsalari/PyTorch_E5M2.git
instead of
git clone --recursive https://github.com/pytorch/pytorch
in the "Get the PyTorch Source" step.
If you encounter the error "multiple definition of `gloo::rendezvous::Store::kDefaultTimeout'", refer to the solution outlined in this GitHub issue. Note that this issue is unrelated to our implementation.
Once Torch is installed, navigate to the root directory of the modified PyTorch codebase in the terminal or command prompt. Run the following command to install PyTorch in editable mode:
pip install -e .
The . at the end indicates that the current directory should be installed in editable mode.
Once the installation is complete, you can import the modified version of PyTorch in your Python code just like you would with the regular PyTorch library:
import torch
This will import the modified version of PyTorch that you installed in editable mode.
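A quick way to sanity-check the build is to cast a tensor to the new dtype. Note that `torch.float8_e5m2` below is our assumption about how the e5m2 type is exposed; adjust it to whatever name the modified build provides:

```python
import torch

print(torch.__version__)  # should report the locally built, editable version

# Round-trip a tensor through float8. The dtype name torch.float8_e5m2 is an
# assumption about the modified build's API, not a confirmed identifier.
x = torch.randn(4)
x8 = x.to(torch.float8_e5m2)
print(x8.to(torch.float32))
```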
These tests require the PULP-SDK. Once you have cloned the PULP-SDK repository and installed the RISC-V GNU Compiler Toolchain, you need to compile GVSOC. Please refer to the links below to set up your working environment correctly.
Here is my suggestion:
1- First, install and compile the RISC-V GNU Compiler Toolchain, following the steps in the RISC-V GNU Compiler Toolchain repository.
2- Install and compile the PULP-SDK, following the setup steps in the PULP-SDK repository.
3- Finally, test the installation according to Test execution.
Don't forget to source the file corresponding to the desired configuration whenever you want to use the project again:
cd pulp-sdk
source configs/pulp-open.sh
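After sourcing the configuration, building GVSOC typically amounts to a single make invocation from the SDK root (a sketch based on the PULP-SDK instructions; consult that repository for the authoritative commands):

make all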
Once the platform and the SDK are set up, you can run the tests.
If you want to generate a golden model, you can use the data_generator.py script with the following command:
./data_generator.py --I=Train_sequence --T=Test_sequence --R=Number_of_Bacteria --S=slice_length --N=Neurons --float_type= --MAC_flag=false --vec_flag=false
- `--float_type` specifies the floating-point format for data; by default, it is set to `FP32`, but you can also choose the `FP16`, `FP16ALT`, and `FP8` formats. You can also run the mixed-precision golden model by using `--float_type=FP_INP,FP_Weight,FP_OUT` (input, SOM weights, output).
- `MAC_flag` is used to emulate the multiply-and-accumulate operator available on most DSP instruction sets for embedded devices. It can be true or false; to emulate `FP16`, `FP8`, and `FP16ALT` behavior on PULP, set this flag to true.
- `vec_flag` is used to emulate SIMD vector instructions. It can be true or false; to emulate vectorized `FP16`, `FP8`, and `FP16ALT` behavior on PULP, set this flag to true.
- `I` is the number of training samples (e.g., 40000).
- `T` is the number of test samples (e.g., 1000).
- `R` is the number of bacteria/SOMs (e.g., 2).
- `S` is the slice length (e.g., 8).
- `N` is the number of neurons per network (e.g., 40000).
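For example, to generate an FP16 golden model with MAC and SIMD emulation enabled, using the example sizes above, you could run (an illustrative invocation assembled from the flags documented here):

./data_generator.py --I=40000 --T=1000 --R=2 --S=8 --N=40000 --float_type=FP16 --MAC_flag=true --vec_flag=true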
The script will generate floating-point data and a reference output of format `fmt` (FP32/FP16/FP8/FP16ALT):
make clean all run stats=1 check=1 w_block=128 i_block=32 vec=1 cores=1 fmt=FP16 verbose=1 IN_ORDER=1
make clean all run stats=1 check=1 w_block=128 i_block=32 cores=1 fmt=FP16 verbose=1 IN_ORDER=1

There are several flags that activate additional functionalities:
- `cores=N_CORES` sets the number of cores used for the execution to `N_CORES`; by default, `cores=1`. You can also run on the fabric controller by using `FABRIC=1` instead of `cores=N_CORES`.
- `fmt=FP_FMT` specifies the floating-point format for data; by default, it is set to `FP32`, but you can also choose the `FP16`, `FP8`, or `FP16ALT` formats.
- `vec=1` activates the vectorial format, available only for half-precision floating point (`FP16` and `FP16ALT`) or `FP8`.
- `check=1` activates the result check.
- `verbose=1` prints the wrong results.
- `stats=1` activates the performance measurement.
- `PRINT_RESULTS=1` prints the outputs of the C code.
- `w_block` sets the tile size of the SOM network (the number of neurons must be divisible by this number).
- `i_block` sets the tile size of the input (the number of inputs must be divisible by this number).
- `IN_ORDER=1` selects the Vertical Mapping approach. Note that the number of cores must be greater than 1 in Horizontal Mapping mode.
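For instance, a multi-core, vectorized FP8 run with Vertical Mapping could look like this (an illustrative combination of the flags above; choose block sizes that satisfy the divisibility constraints):

make clean all run stats=1 check=1 w_block=100 i_block=50 vec=1 cores=8 fmt=FP8 verbose=1 IN_ORDER=1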
- Extend support to additional FP data types (e.g., different flavors of 8-bit FP types)
This project is released under Apache 2.0, see the LICENSE file in the root of this repository for details.
This work was supported by the APROPOS project (g.a. no. 956090), funded by the European Union’s Horizon 2020 research and innovation program.
- Seyed Ahmad Mirsalari, University of Bologna, E-mail