Optimize performance for non-amd64 architectures #21
Labels: enhancement (Improving existing functionality), I2 (Regular impact), performance (More of something per second), S4 (Routine), U4 (Nothing urgent)
Currently we have optimized versions only for amd64 CPUs with AVX/AVX2. A pure Go implementation also exists, but we might want optimized implementations for ARM or RISC-V as well. Note that the reference implementation at https://github.com/srijs/hwsl2-core deliberately implements only AVX* optimisations, so we must craft the ARM/RISC-V code ourselves.
The first step is to optimize the GF127 code (with our modulus) in isolation.
Comparing the assembly output of the pure Go and C versions might be helpful.
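As a starting point for the GF127 step, a portable baseline is useful to check any ARM or RISC-V routine against. The sketch below is not the project's code: the type and function names are hypothetical, and it assumes GF127 means GF(2^127) with the reduction polynomial x^127 + x^63 + 1, which the issue does not state. It uses a plain shift-and-add loop, so it is slow and not constant-time, but it is easy to verify.

```go
package main

import "fmt"

// GF127 is a hypothetical representation of an element of GF(2^127):
// word 0 holds bits 0..63, word 1 holds bits 64..126 (top bit always zero).
type GF127 [2]uint64

// Add returns a + b; in characteristic 2 this is a plain XOR.
func Add(a, b GF127) GF127 {
	return GF127{a[0] ^ b[0], a[1] ^ b[1]}
}

// MulX returns a * x, reduced modulo x^127 + x^63 + 1.
// The modulus is an ASSUMPTION made for this sketch.
func MulX(a GF127) GF127 {
	lo := a[0] << 1
	hi := a[1]<<1 | a[0]>>63
	over := hi >> 63 // 1 if an x^127 term appeared after the shift
	hi &^= 1 << 63   // drop the x^127 term
	// Under the assumed modulus, x^127 = x^63 + 1.
	lo ^= over * (1<<63 | 1)
	return GF127{lo, hi}
}

// Mul returns a * b using Horner's scheme over the bits of b.
// It branches on the bits of b, so it is a reference only.
func Mul(a, b GF127) GF127 {
	var r GF127
	for i := 126; i >= 0; i-- {
		r = MulX(r)
		if b[i/64]>>(uint(i)%64)&1 == 1 {
			r = Add(r, a)
		}
	}
	return r
}

func main() {
	x126 := GF127{0, 1 << 62} // x^126
	x := GF127{2, 0}          // x
	fmt.Printf("%#x\n", Mul(x126, x))
}
```

An architecture-specific version would likely replace the loop in `Mul` with carry-less multiply instructions where the target has them, and could be tested against this baseline on random inputs. To compare generated code, the Go toolchain can cross-compile and print assembly, for example `GOARCH=arm64 go build -gcflags=-S` run in the package directory.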
Some links to study:
https://www.ssrc.ucsc.edu/Papers/greenan-mascots08.pdf
RISC-V has crypto extensions as well; some of them might be useful:
https://riscv.org/wp-content/uploads/2017/12/Wed-1354-RISCV-CryptoExtensions-RichardNewell.pdf