Hi,
I find your benchmark to be very valuable. Do you have any good ideas or suggestions for testing the performance (throughput or latency) of various vector load instructions? I would like to explore the vector load performance on the K1 and K230.
Thanks
Hi, I played around with adding the vector loads/stores to the single-instruction measurements, but I came to the conclusion that they would be more useful in a separate benchmark.
#12 has some measurements that show how different stride values perform. Ideally we'd measure something like that, with data from the different caches and from memory. I'm not sure how to properly do those measurements, though. This should probably also take into account different prefetch strategies.
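A rough idea of what such a stride and working-set sweep could look like is sketched below in portable C. This is only an illustration, not code from the benchmark: it uses scalar byte loads as a stand-in for vector loads, the names `strided_sum`, `ns_per_load` and `run_sweep` are hypothetical, and the buffer sizes are assumptions meant to land in different cache levels, so they would need adjusting for the K1 or K230. It also does nothing about prefetching, which the real measurement would have to account for.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Sum `count` bytes read at a fixed stride, wrapping around inside `size`
 * bytes. Scalar stand-in for a strided vector load (an assumption of this
 * sketch); requires 0 < stride <= size so the wrap-around stays in bounds. */
static uint64_t strided_sum(const uint8_t *buf, size_t size, size_t stride,
                            size_t count)
{
    assert(size > 0 && stride > 0 && stride <= size);
    uint64_t sum = 0;
    size_t idx = 0;
    for (size_t i = 0; i < count; ++i) {
        sum += buf[idx];
        idx += stride;
        if (idx >= size)
            idx -= size;
    }
    return sum;
}

/* Time one run and return the average in nanoseconds per load. The result is
 * accumulated into *sink so the compiler cannot drop the loads. Uses the
 * standard clock(), which reports CPU time at a coarse resolution. */
static double ns_per_load(const uint8_t *buf, size_t size, size_t stride,
                          size_t count, uint64_t *sink)
{
    clock_t t0 = clock();
    *sink += strided_sum(buf, size, stride, count);
    clock_t t1 = clock();
    return (double)(t1 - t0) * 1e9 / (double)CLOCKS_PER_SEC / (double)count;
}

/* Sweep a few working-set sizes and strides and print one line per pair.
 * The sizes are guesses for "fits in L1", "fits in L2" and "memory". */
static int run_sweep(FILE *out)
{
    static const size_t sizes[] = { 16u << 10, 256u << 10, 32u << 20 };
    static const size_t strides[] = { 1, 8, 64, 4096 };
    const size_t count = (size_t)1 << 22; /* loads per measurement */
    uint64_t sink = 0;

    uint8_t *buf = malloc(sizes[2]);
    if (!buf)
        return -1;
    memset(buf, 1, sizes[2]); /* touch every page before timing */

    for (size_t s = 0; s < sizeof sizes / sizeof sizes[0]; ++s) {
        for (size_t k = 0; k < sizeof strides / sizeof strides[0]; ++k) {
            /* one untimed pass to warm the caches, then the timed pass */
            sink += strided_sum(buf, sizes[s], strides[k], count);
            double ns = ns_per_load(buf, sizes[s], strides[k], count, &sink);
            fprintf(out, "size=%zu stride=%zu ns/load=%.3f\n",
                    sizes[s], strides[k], ns);
        }
    }
    fprintf(out, "checksum=%llu\n", (unsigned long long)sink);
    free(buf);
    return 0;
}
```

Calling `run_sweep(stdout)` from a `main` function prints one line per size and stride pair. For actual vector loads, the inner loop of `strided_sum` would be replaced by the load instruction under test, and the timer by a cycle counter.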
For now, you can look at the LUT4 and ASCII to UTF-16/UTF-32 benchmarks, where indexed, strided and segmented loads are used in some of the implementations.
If you have suggestions, please share them. I was planning to look at some memory measurements done on other ISAs, but I haven't gotten around to that yet.