Skip to content
This repository has been archived by the owner on Jul 29, 2024. It is now read-only.

AVX512_changes #478

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Conversation

deeptiag1
Copy link

@deeptiag1 deeptiag1 commented Feb 19, 2020

Signed-off-by: deeptiag1 [email protected]

Signed-off-by: deeptiag1 <[email protected]>
@1480c1
Copy link
Member

1480c1 commented Feb 19, 2020

@deeptiag1, you may want to check your git config, it seems you might have misspelled your email, unless your email domain is intel.comm

Signed-off-by: deeptiag1 <[email protected]>
Signed-off-by: deeptiag1 <[email protected]>
@tianjunwork
Copy link
Contributor

tianjunwork commented Feb 19, 2020

Hi @deeptiag1 , since AVX512 code doesn't execute when running encoder, our CI can't validate your code. Could you let us know what tests you did and the result? Thank you very much for your contribution.
Could you also give a better title to this PR which be more specific. Thank you.

@tianjunwork tianjunwork added the enhancement New feature or request label Feb 19, 2020
Signed-off-by: deeptiag1 <[email protected]>
@deeptiag1
Copy link
Author

Hi @tianjunwork, Enable the AVX512 flag in Ebdefinitions and then run the same test as one would run for any build. Let me know if all test pass.

Signed-off-by: deeptiag1 <[email protected]>
Signed-off-by: deeptiag1 <[email protected]>
Signed-off-by: deeptiag1 <[email protected]>
Signed-off-by: deeptiag1 <[email protected]>
@@ -0,0 +1,195 @@
#include "EbTransforms_AVX2.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

License header missing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -0,0 +1,457 @@
#include "EbMcp_SSSE3.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing license header

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the AVX512 code should be moved to the AVX2 project

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

54, 67, -31, -73, 4, 78, 22, -82, 54, 67, -31, -73, 4, 78, 22, -82, -46, 85, 67, -88, -82, 90, 90, -90, -46, 85, 67, -88, -82, 90, 90, -90
};

extern void EbHevcTransform32_AVX512_INTRIN(EB_S16 *src, EB_U32 src_stride, EB_S16 *dst, EB_U32 dst_stride, EB_U32 shift)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be more appropriate being moved to the AVX2 project to keep all AVX512 code in the same location.

Signed-off-by: deeptiag1 <[email protected]>
@intelmark
Copy link
Contributor

intelmark commented Mar 27, 2020

From @hassount 3/17/2020:

From the results below, it seems that we need to disable the optimization for the LumaInterpolationFilterOneDOutRawHorizontal_AVX512 kernel and make sure that the documentation contains a clear reference to the gcc version requirement when vnni is on, then I don’t see any blockers to this PR moving forward assuming functional testing is passing._

The current PR shows:

#ifdef VNNI_SUPPORT
#ifndef NON_AVX512_SUPPORT
#define LumaInterpolationFilterOneDOutRawHorizontalLumaInterpolationFilterOneDOutRawHorizontal_AVX512
#else
#define LumaInterpolationFilterOneDOutRawHorizontalLumaInterpolationFilterOneDOutRawHorizontal_SSSE3
#endif

But isn't VNNI a subset of AVX512 - Are all combinations of the VNNI_SUPPORT and AVX512_SUPPORT defines possible?

Also there doesn't seem to be a mention of the gcc requirements for compiling the VNNI specific code anywhere in the PR

Signed-off-by: deeptiag1 <[email protected]>
Signed-off-by: deeptiag1 <[email protected]>
README.md Outdated
@@ -69,6 +69,7 @@ In order to run the highest resolution supported by the encoder, at least 64GB o
- Download the yasm exe from the following [link](http://www.tortall.net/projects/yasm/releases/yasm-1.3.0-win64.exe)
- Rename yasm-1.3.0-win64.exe to yasm.exe
- Copy yasm.exe into a location that is in the PATH environment variable
- Vnni requires gcc version greater then 9.2.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. greater than
  2. On my server it seems to be working with version: gcc version 9.2.1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Signed-off-by: deeptiag1 <[email protected]>
Signed-off-by: deeptiag1 <[email protected]>
@deeptiag1
Copy link
Author

#define vnni in EbDefinitions enable vnni code committed in this PR.

Perf Data;
Encoder mode :0
reference_avx512 : branch vnni_changes_1 @ d92a7c4
vnni_avx512: reference_avx512 + VNNI_SUPPORT + LumaInterpolationFilterOneDOutRawHorizontal_SSSE3
speed
reference_avx512 | 0.95fps
vnni_avx512 | 0.96fps
Speed gain: 1%

Encoder mode :0
reference_avx2 : branch vnni_changes_1 @ d92a7c4 + NON_AVX512_SUPPORT
vnni_avx2: reference_avx2 + VNNI_SUPPORT + LumaInterpolationFilterOneDOutRawHorizontal_SSSE3
speed
reference_avx2 | 0.96fps
vnni_avx2 | 0.97fps
Speed gain: 1%

@intelmark
Copy link
Contributor

@deeptiag1 Regarding the performance data - great to see the speed up, but what size video and is there a command line that was used?

};

#ifndef NON_AVX512_SUPPORT
void LumaInterpolationFilterOneDOutRawHorizontal_AVX512(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why no code is calling LumaInterpolationFilterOneDOutRawHorizontal_AVX512 ?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants