Add vectorization sample. #1067

Draft: DanAlbert wants to merge 4 commits into main
Conversation

DanAlbert (Member):

This adds a new generic vectorization sample to replace hello-neon. Most importantly, it covers non-Neon options for SIMD. One of the useful things it shows, for example, is that there's actually no reason to write SIMD code the way that hello-neon does any more.

This also solves a much simpler problem (small matrix multiplication), which makes it easier to see how to deal with the SIMD features rather than figuring out what a FIR filter is.

Finally, this sample benchmarks each of the implementations so it's obvious what is and isn't worth doing. I was sort of surprised that auto-vectorization didn't do better, and was pleased to learn that there's no reason at all to write Neon intrinsics.
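Roughly, each benchmark takes the shape below. This is an illustrative sketch, not the sample's actual code: the matrix type, the scalar baseline, and Google Benchmark as the harness are all stand-ins.

```c++
#include <array>

#include <benchmark/benchmark.h>

// 4x4 row-major matrix, standing in for the sample's real type.
using Matrix = std::array<float, 16>;

// Plain scalar multiply; each SIMD variant would get the same harness.
Matrix MultiplyScalar(const Matrix& lhs, const Matrix& rhs) {
  Matrix out{};
  for (int i = 0; i < 4; ++i)
    for (int k = 0; k < 4; ++k)
      for (int j = 0; j < 4; ++j)
        out[i * 4 + j] += lhs[i * 4 + k] * rhs[k * 4 + j];
  return out;
}

static void BM_MultiplyScalar(benchmark::State& state) {
  Matrix lhs, rhs;
  lhs.fill(1.0f);
  rhs.fill(2.0f);
  for (auto _ : state) {
    // DoNotOptimize keeps the compiler from eliding the multiply entirely.
    benchmark::DoNotOptimize(MultiplyScalar(lhs, rhs));
  }
}
BENCHMARK(BM_MultiplyScalar);
BENCHMARK_MAIN();
```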

I'll delete hello-neon after this merges and I've fixed up the doc links.

#1011


DanAlbert commented Jun 12, 2024

Currently a draft because I seem to have encountered an AGP bug that makes cross-module dependencies flaky, so I may need to deal with `base` in a different way...

The code itself wouldn't change to accommodate that, though. Aside from the TODO about checking `.at()` vs `operator[]` (which might actually be significant here), the code is, I think, done.

vectorization/README.md:

> …making for less concise code than is needed for a constrained vector size like
> we have here, handle windowing of data to fit the hardware vector size for you.
> For problems like the small matrix multiply we do here, it's overkill. For
> portability to a wide variety of (Arm) CPUs, it can reduce the difficulty of…
Contributor:

This makes it sound more available than it is... "Qualcomm has not shipped SVE" is a pretty huge caveat for real-world use.

DanAlbert (Member Author):

That's fair. I've toned it down, but avoided being specific because I know we won't remember to come fix it as the hardware does roll out. PTAL
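For context on what "handle windowing of data to fit the hardware vector size for you" means in practice, here is a minimal, illustrative SVE loop (not from the sample; the function name and the operation are made up). The predicate from svwhilelt masks off the lanes past the end of the array, so the loop works at any hardware vector width with no scalar tail:

```c++
#include <arm_sve.h>

#include <cstdint>

// Scales an arbitrary-length array without knowing the vector width at
// compile time. Build with -march=armv8-a+sve.
void ScaleInPlace(float* data, int64_t n, float factor) {
  for (int64_t i = 0; i < n; i += static_cast<int64_t>(svcntw())) {
    // Predicate covering lanes [i, n); all-false once i passes n.
    svbool_t active = svwhilelt_b32(i, n);
    svfloat32_t v = svld1_f32(active, &data[i]);
    svst1_f32(active, &data[i], svmul_n_f32_x(active, v, factor));
  }
}
```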


DanAlbert commented Jun 17, 2024

Added a fair amount of content to the README explaining FMV (function multi-versioning). That's to the best of my knowledge, anyway; since I learned it all just now, it might be wrong. It probably belongs on DAC rather than here, but it can go here for now.
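Roughly, the mechanism: you define several versions of one function, and the compiler emits all of them plus a resolver that picks one at load time based on the CPU's features. A hedged sketch follows; the attribute spelling is from Arm's ACLE FMV spec, the function is hypothetical, and a recent Clang targeting AArch64 is assumed.

```c++
#include <cstddef>

// Used on hardware that reports SVE support.
__attribute__((target_version("sve")))
void ScaleInPlace(float* data, size_t n, float factor) {
  for (size_t i = 0; i < n; ++i) data[i] *= factor;  // auto-vectorizes with SVE
}

// Fallback for everything else.
__attribute__((target_version("default")))
void ScaleInPlace(float* data, size_t n, float factor) {
  for (size_t i = 0; i < n; ++i) data[i] *= factor;  // baseline codegen
}
```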

vectorization/README.md:

> …that it is limited by the width of the vector registers for the target hardware.
> To deal with problems that don't fit in the target's vector registers, you would
> need to either alter the algorithm to tile the operations, or use Scalable
> Vector Extensions (AKA [SVE]).
Contributor:

does SVE actually help with this problem? (have you considered that negge might be a better reviewer for this stuff? :-) )

DanAlbert (Member Author):

I pointed him at it a few days ago but haven't heard back yet. Pinging @negge explicitly now that I know his GitHub username :)
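To make the tiling alternative from the excerpt concrete, a hypothetical sketch (not from the sample): decompose an NxN multiply into the fixed-size blocks a width-limited SIMD kernel already handles.

```c++
#include <cstddef>

// a, b, and out are n x n row-major, with n a multiple of 4 and out
// zero-initialized by the caller. Each innermost region computes a 4x4
// block product, the same shape a fixed-width SIMD kernel operates on.
void MultiplyTiled(const float* a, const float* b, float* out, size_t n) {
  constexpr size_t kTile = 4;
  for (size_t i = 0; i < n; i += kTile)
    for (size_t j = 0; j < n; j += kTile)
      for (size_t k = 0; k < n; k += kTile)
        for (size_t ii = i; ii < i + kTile; ++ii)
          for (size_t jj = j; jj < j + kTile; ++jj) {
            float acc = out[ii * n + jj];
            for (size_t kk = k; kk < k + kTile; ++kk)
              acc += a[ii * n + kk] * b[kk * n + jj];
            out[ii * n + jj] = acc;
          }
}
```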

vectorization/README.md:

> …support big endian, so we can ignore that caveat) is use the `*` operator and
> leave the correct instruction selection up to Clang.
>
> In other words, you should probably never use the Neon-specific approach. The…
Contributor:

(this really seems worth a separate note on the d.a.c neon page...)

DanAlbert (Member Author):

Yes. One thing at a time though :)
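The contrast in question, as an illustrative sketch (function names are made up; `vmulq_f32` is the real Neon intrinsic, and the `vector_size` attribute is Clang's documented generic vector extension):

```c++
#include <arm_neon.h>

// hello-neon style: explicit intrinsic, only compiles for Arm targets.
float32x4_t MultiplyLanesNeon(float32x4_t a, float32x4_t b) {
  return vmulq_f32(a, b);
}

// Portable style: Clang's generic vector extension. operator* lets the
// compiler pick the right instruction per target (fmul on arm64, SSE or
// AVX on x86-64), so this half of the file needs no Arm header at all.
typedef float vfloat4 __attribute__((vector_size(16)));
vfloat4 MultiplyLanesGeneric(vfloat4 a, vfloat4 b) {
  return a * b;
}
```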
