string/aarch64: optimizing SVE routines #65

Optimized the following SVE string routines: `memcmp`, `strchr`, `strcmp`, `strcpy`, `strlen`, `strncmp`, `strnlen`, `strrchr`. On the Arm Neoverse V1 microarchitecture, the `INCx` instructions used to increment the loop offset can cause significant slowdowns. One solution is to read the SVE register width once with `CNTx` in the loop prologue, hoisting it out of the loop, and to replace each `INCx` with a plain `ADD`. This change should not incur any performance penalty on other SVE-capable microarchitectures (e.g., Neoverse V2, A64FX).
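To illustrate the transformation, here is a minimal sketch of a `memcmp`-style SVE loop with the vector length hoisted. It is not the code from this PR: the function name `sketch_memcmp_sve`, the register allocation, and the labels are assumptions chosen for illustration, and the routines actually touched by the PR may be structured differently.

```asm
	.arch	armv8-a+sve
	.text
	.global	sketch_memcmp_sve	// hypothetical name, not from the PR
	.type	sketch_memcmp_sve, %function

// int sketch_memcmp_sve(const void *a, const void *b, size_t n)
//   x0 = a, x1 = b, x2 = n
sketch_memcmp_sve:
	mov	x3, 0			// x3 = byte offset into both buffers
	cntb	x4			// x4 = vector length in bytes, read once
0:	whilelo	p0.b, x3, x2		// p0 = lanes whose offset is below n
	b.none	9f			// no active lanes: all n bytes matched
	ld1b	z0.b, p0/z, [x0, x3]	// load active bytes of a
	ld1b	z1.b, p0/z, [x1, x3]	// load active bytes of b
	add	x3, x3, x4		// advance offset (previously: incb x3)
	cmpne	p1.b, p0/z, z0.b, z1.b	// p1 = active lanes that differ
	b.none	0b			// no difference yet: next vector
	brkb	p1.b, p0/z, p1.b	// p1 = lanes before the first difference
	lasta	w0, p1, z0.b		// first differing byte of a
	lasta	w1, p1, z1.b		// first differing byte of b
	sub	w0, w0, w1		// sign of the difference is the result
	ret
9:	mov	w0, 0			// buffers are equal
	ret
	.size	sketch_memcmp_sve, .-sketch_memcmp_sve
```

The only change relative to an `INCB`-based loop is the extra `CNTB` before the loop and the `ADD` inside it. The cost is one additional scratch register (`x4` here) that stays live across the loop.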

Commits on Jan 24, 2024
