Add vectorization sample. #1067

Draft: DanAlbert wants to merge 4 commits into main
Conversation

DanAlbert (Member):

This adds a new generic vectorization sample to replace hello-neon. Most importantly, it covers non-Neon options for SIMD. One of the useful things it shows, for example, is that there's actually no reason to write SIMD code the way that hello-neon does any more.

This also solves a much simpler problem (small matrix multiplication), which makes it easier to see how to deal with the SIMD features rather than figuring out what a FIR filter is.

Finally, this sample benchmarks each of the implementations so it's obvious what is and isn't worth doing. I was sort of surprised that auto-vectorization didn't do better, and was pleased to learn that there's no reason at all to write Neon intrinsics.
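Roughly, each benchmark takes the shape below. This is an illustrative sketch, not the sample's actual code: the matrix type, the scalar baseline, and Google Benchmark as the harness are all stand-ins.

```c++
#include <array>

#include <benchmark/benchmark.h>

// 4x4 row-major matrix, standing in for the sample's real type.
using Matrix = std::array<float, 16>;

// Plain scalar multiply; each SIMD variant would get the same harness.
Matrix MultiplyScalar(const Matrix& lhs, const Matrix& rhs) {
  Matrix out{};
  for (int i = 0; i < 4; ++i)
    for (int k = 0; k < 4; ++k)
      for (int j = 0; j < 4; ++j)
        out[i * 4 + j] += lhs[i * 4 + k] * rhs[k * 4 + j];
  return out;
}

static void BM_MultiplyScalar(benchmark::State& state) {
  Matrix lhs, rhs;
  lhs.fill(1.0f);
  rhs.fill(2.0f);
  for (auto _ : state) {
    // DoNotOptimize keeps the compiler from eliding the multiply entirely.
    benchmark::DoNotOptimize(MultiplyScalar(lhs, rhs));
  }
}
BENCHMARK(BM_MultiplyScalar);
BENCHMARK_MAIN();
```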

I'll delete hello-neon after this merges and I've fixed up the doc links.

#1011


DanAlbert commented Jun 12, 2024

Currently a draft because I seem to have encountered an AGP bug that makes cross-module dependencies flaky, so I may need to deal with `base` in a different way...

The code itself wouldn't change to accommodate that, though. Aside from the TODO about checking `.at()` vs `operator[]` (which might actually be significant here), the code is, I think, done.

vectorization/README.md:

> …making for less concise code than is needed for a constrained vector size like
> we have here, handle windowing of data to fit the hardware vector size for you.
> For problems like the small matrix multiply we do here, it's overkill. For
> portability to a wide variety of (Arm) CPUs, it can reduce the difficulty of…
Contributor:

This makes it sound more available than it is... "Qualcomm has not shipped SVE" is a pretty huge caveat for real-world use.

DanAlbert (Member Author):

That's fair. I've toned it down, but avoided being specific because I know we won't remember to come fix it as the hardware does roll out. PTAL
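For context on what "handle windowing of data to fit the hardware vector size for you" means in practice, here is a minimal, illustrative SVE loop (not from the sample; the function name and the operation are made up). The predicate from svwhilelt masks off the lanes past the end of the array, so the loop works at any hardware vector width with no scalar tail:

```c++
#include <arm_sve.h>

#include <cstdint>

// Scales an arbitrary-length array without knowing the vector width at
// compile time. Build with -march=armv8-a+sve.
void ScaleInPlace(float* data, int64_t n, float factor) {
  for (int64_t i = 0; i < n; i += static_cast<int64_t>(svcntw())) {
    // Predicate covering lanes [i, n); all-false once i passes n.
    svbool_t active = svwhilelt_b32(i, n);
    svfloat32_t v = svld1_f32(active, &data[i]);
    svst1_f32(active, &data[i], svmul_n_f32_x(active, v, factor));
  }
}
```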


DanAlbert commented Jun 17, 2024

Added a fair amount of content to the README explaining FMV (function multi-versioning). That's to the best of my knowledge, anyway; since I learned it all just now, it might be wrong. It probably belongs on DAC rather than here, but it can go here for now.
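Roughly, the mechanism: you define several versions of one function, and the compiler emits all of them plus a resolver that picks one at load time based on the CPU's features. A hedged sketch follows; the attribute spelling is from Arm's ACLE FMV spec, the function is hypothetical, and a recent Clang targeting AArch64 is assumed.

```c++
#include <cstddef>

// Used on hardware that reports SVE support.
__attribute__((target_version("sve")))
void ScaleInPlace(float* data, size_t n, float factor) {
  for (size_t i = 0; i < n; ++i) data[i] *= factor;  // auto-vectorizes with SVE
}

// Fallback for everything else.
__attribute__((target_version("default")))
void ScaleInPlace(float* data, size_t n, float factor) {
  for (size_t i = 0; i < n; ++i) data[i] *= factor;  // baseline codegen
}
```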

vectorization/README.md:

> …that it is limited by the width of the vector registers for the target hardware.
> To deal with problems that don't fit in the target's vector registers, you would
> need to either alter the algorithm to tile the operations, or use Scalable
> Vector Extensions (AKA [SVE]).
Contributor:

does SVE actually help with this problem? (have you considered that negge might be a better reviewer for this stuff? :-) )

DanAlbert (Member Author):

I pointed him at it a few days ago but haven't heard back yet. Pinging @negge explicitly now that I know his GitHub username :)
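To make the tiling alternative from the excerpt concrete, a hypothetical sketch (not from the sample): decompose an NxN multiply into the fixed-size blocks a width-limited SIMD kernel already handles.

```c++
#include <cstddef>

// a, b, and out are n x n row-major, with n a multiple of 4 and out
// zero-initialized by the caller. Each innermost region computes a 4x4
// block product, the same shape a fixed-width SIMD kernel operates on.
void MultiplyTiled(const float* a, const float* b, float* out, size_t n) {
  constexpr size_t kTile = 4;
  for (size_t i = 0; i < n; i += kTile)
    for (size_t j = 0; j < n; j += kTile)
      for (size_t k = 0; k < n; k += kTile)
        for (size_t ii = i; ii < i + kTile; ++ii)
          for (size_t jj = j; jj < j + kTile; ++jj) {
            float acc = out[ii * n + jj];
            for (size_t kk = k; kk < k + kTile; ++kk)
              acc += a[ii * n + kk] * b[kk * n + jj];
            out[ii * n + jj] = acc;
          }
}
```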

vectorization/README.md:

> …support big endian, so we can ignore that caveat) is use the `*` operator and
> leave the correct instruction selection up to Clang.
>
> In other words, you should probably never use the Neon-specific approach. The…
Contributor:

(this really seems worth a separate note on the d.a.c neon page...)

DanAlbert (Member Author):

Yes. One thing at a time though :)
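The contrast in question, as an illustrative sketch (function names are made up; `vmulq_f32` is the real Neon intrinsic, and the `vector_size` attribute is Clang's documented generic vector extension):

```c++
#include <arm_neon.h>

// hello-neon style: explicit intrinsic, only compiles for Arm targets.
float32x4_t MultiplyLanesNeon(float32x4_t a, float32x4_t b) {
  return vmulq_f32(a, b);
}

// Portable style: Clang's generic vector extension. operator* lets the
// compiler pick the right instruction per target (fmul on arm64, SSE or
// AVX on x86-64), so this half of the file needs no Arm header at all.
typedef float vfloat4 __attribute__((vector_size(16)));
vfloat4 MultiplyLanesGeneric(vfloat4 a, vfloat4 b) {
  return a * b;
}
```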
