Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Platform detection incorrect for AVX2 support #36

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

techvintage
Copy link

The current code does not detect platform correctly for AVX2 support and hence uses slower functions on non-Intel platforms.This fix causes AVX2 related functions to be invoked correctly and giving the required 2.5X+ speed-up on AMD and possibly other platforms which support AVX2.

and hence uses slower functions on non-Intel platforms.This fix causes AVX2 related functions
to be invoked correctly and giving the required 2.5X+ speed-up on AMD and other platforms which support AVX2.
@tianjunwork
Copy link
Contributor

Hi @techvintage Thanks for this PR.
Instead of hard code, could you make the change in can_use_intel_avx512()? You can change the name of this function too.

@techvintage
Copy link
Author

Hi @tianjunwork ,this is not hardcoded but changing asm_type to highest level (=7) to begin with.If AVX2 functionality is not supported,it starts downgrading and uses corresponding slower functions instead.This is opposite approach to setting asm_type=0 (lowest value) and increasing it incrementally and fixes the bug.

@tianjunwork
Copy link
Contributor

Hi @techvintage, I don't get it, if the logic after uint32_t asm_type = 7; works as expected on your platform, asm_type will be reassigned to a new value. How does your patch work?

@techvintage
Copy link
Author

So can_use_intel_avx512() and can_use_Intel_4thGen_features function do not work correctly on non-Intel Platforms and proceeds with asm_type=7 and picks up the faster functions on AMD platforms correctly.We saw a 2.5X+ gain with this on Matisse with this.
Also, I see a FFmpeg patch failing on travis CI,My travis CI is synced to github but looks like cloning ffmpeg git repo timed out.Do we developers have a way of triggering the build again?

@tianjunwork
Copy link
Contributor

I don't have a non-intel platform so don't know how these two functions behave. You mean can_use_intel_avx512() doesn't work but still return 1? Otherwise it runs to "asm_type = 1;".
I re-run the travis task. It is passed now.

@techvintage
Copy link
Author

Yes,that's correct.So if we start with asm_type=7(highest),If can_use_intel_avx512() returns true for Intel platforms,it will be still at 7 else will start lowering asm_type based on support available.

@tianjunwork
Copy link
Contributor

If can_use_intel_avx512() still returns 1 for non-intel platform, why do you need to set asm_type=7 at the beginning?
Intel platform works just fine. Your goal is to handle non-intel platform, right?
If so, instead of letting non-working can_use_intel_avx512() and can_use_intel_core4th_gen_features() mess up the logic. Can we make these two function more generic?

@@ -240,7 +240,7 @@ static int32_t can_use_intel_avx512()
// Using bit-fields, the fastest function will always be selected based on the available functions in the function arrays
uint32_t get_cpu_asm_type()
{
uint32_t asm_type = 0;
uint32_t asm_type = 7; //This fixes the case where AVX2 is not getting invoked even when AVX2 is supported in the platform.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @techvintage ,

Please consider to implemente the checking for AMD CPUs, like what CanUseIntelAVX512() and Check4thGenIntelCoreFeatures() do for Intel CPUs. Your change occationally can work on your platform which supports AVX2, but cannot for other (old) platforms which can only support SSEx or no intrinsics.
In theory, GetCpuAsmType() should check all the x86 CPUs with SSE and/or AVX intrinsics supported, including Intel, AMD (and VIA).

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants