-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Platform detection incorrect for AVX2 support #36
base: master
Are you sure you want to change the base?
Conversation
and hence uses slower functions on non-Intel platforms.This fix causes AVX2 related functions to be invoked correctly and giving the required 2.5X+ speed-up on AMD and other platforms which support AVX2.
Hi @techvintage Thanks for this PR. |
Hi @tianjunwork ,this is not hardcoded but changing asm_type to highest level (=7) to begin with.If AVX2 functionality is not supported,it starts downgrading and uses corresponding slower functions instead.This is opposite approach to setting asm_type=0 (lowest value) and increasing it incrementally and fixes the bug. |
Hi @techvintage, I don't get it, if the logic after uint32_t asm_type = 7; works as expected on your platform, asm_type will be reassigned to a new value. How does your patch work? |
So can_use_intel_avx512() and can_use_Intel_4thGen_features function do not work correctly on non-Intel Platforms and proceeds with asm_type=7 and picks up the faster functions on AMD platforms correctly.We saw a 2.5X+ gain with this on Matisse with this. |
I don't have a non-intel platform so don't know how these two functions behave. You mean can_use_intel_avx512() doesn't work but still return 1? Otherwise it runs to "asm_type = 1;". |
Yes,that's correct.So if we start with asm_type=7(highest),If can_use_intel_avx512() returns true for Intel platforms,it will be still at 7 else will start lowering asm_type based on support available. |
If can_use_intel_avx512() still returns 1 for non-intel platform, why do you need to set asm_type=7 at the beginning? |
@@ -240,7 +240,7 @@ static int32_t can_use_intel_avx512() | |||
// Using bit-fields, the fastest function will always be selected based on the available functions in the function arrays | |||
uint32_t get_cpu_asm_type() | |||
{ | |||
uint32_t asm_type = 0; | |||
uint32_t asm_type = 7; //This fixes the case where AVX2 is not getting invoked even when AVX2 is supported in the platform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @techvintage ,
Please consider to implemente the checking for AMD CPUs, like what CanUseIntelAVX512() and Check4thGenIntelCoreFeatures() do for Intel CPUs. Your change occationally can work on your platform which supports AVX2, but cannot for other (old) platforms which can only support SSEx or no intrinsics.
In theory, GetCpuAsmType() should check all the x86 CPUs with SSE and/or AVX intrinsics supported, including Intel, AMD (and VIA).
Thanks!
The current code does not detect platform correctly for AVX2 support and hence uses slower functions on non-Intel platforms.This fix causes AVX2 related functions to be invoked correctly and giving the required 2.5X+ speed-up on AMD and possibly other platforms which support AVX2.