optimize put_uni_pixels_N_128x128 AVX2/SSE4 code #147

Open
nuomi2021 opened this issue Oct 6, 2023 · 4 comments
Labels: asm, good first issue (Good for newcomers)

Comments

@nuomi2021
Member

nuomi2021 commented Oct 6, 2023

See #146 (comment).
We have a similar issue for put_pixels too; see #145 (comment).

@nuomi2021 nuomi2021 added asm good first issue Good for newcomers labels Oct 6, 2023
@nuomi2021 nuomi2021 changed the title optimize put_uni_pixelsN_128x128 AVX2/SSE4 code optimize put_uni_pixels_N_128x128 AVX2/SSE4 code Oct 6, 2023
@nuomi2021
Member Author

How to reproduce it:
make checkasm -j && ./tests/checkasm/checkasm --test=vvc_mc --bench

@rohanjulka19

Hi, I have been investigating the performance issue, and it seems that memcpy in the C code moves 128 bytes in a single iteration, while the SSE4 code moves only 16 bytes per iteration. Could this be the reason for the slowness?

This was the code I saw while debugging.

memcpy code:

[Screenshot 2024-02-04 at 4 55 48 PM]

ff_vvc_put_uni_pixels16_8_sse4:

[Screenshot 2024-02-04 at 4 56 00 PM]
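A minimal plain-C sketch of the difference described above, assuming (as the memcpy in the C code suggests) that a put_uni_pixels-style function for 8-bit input simply copies a width x height block row by row. copy_block_16 and copy_block_row are hypothetical names, not FFmpeg functions, and fixed-size memcpy calls stand in for the SSE4 128-bit loads/stores so the sketch builds on any target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: copy a width x height block, moving 16 bytes per
 * inner-loop iteration (one 128-bit load/store pair per step, like the
 * SSE4 loop in the screenshot). width must be a multiple of 16. */
static void copy_block_16(uint8_t *dst, ptrdiff_t dst_stride,
                          const uint8_t *src, ptrdiff_t src_stride,
                          int width, int height)
{
    assert(width % 16 == 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x += 16)
            memcpy(dst + x, src + x, 16);
        dst += dst_stride;
        src += src_stride;
    }
}

/* Hypothetical: copy the same block with one memcpy per row, as the C
 * version does; the library memcpy may move far more than 16 bytes per
 * iteration of its own internal loop. */
static void copy_block_row(uint8_t *dst, ptrdiff_t dst_stride,
                           const uint8_t *src, ptrdiff_t src_stride,
                           int width, int height)
{
    for (int y = 0; y < height; y++) {
        memcpy(dst, src, (size_t)width);
        dst += dst_stride;
        src += src_stride;
    }
}
```

Both functions produce identical output; they differ only in how much data each loop iteration moves. One plausible way to speed up a wide SSE4 version is to issue several independent 16-byte loads and stores per iteration (for example eight, covering 128 bytes), which is closer to what the memcpy loop does.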

@nuomi2021
Member Author

@rohanjulka19, sorry for missing your post.
Yes, this may be the reason. Could you help send three patches to the mailing list for this? One for HEVC and one for VVC; then you can remove the SSE 128 version in another patch.
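A minimal sketch of what removing the SSE 128 version means for dispatch, assuming the usual FFmpeg DSP-init pattern in which C functions are installed first and SIMD versions then overwrite the table entries they support; any width left without a SIMD override keeps the C function. SketchDSPContext, put_row_c, put_row_simd, and sketch_dsp_init are hypothetical names, and put_row_simd is plain C standing in for the asm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-row "put" function type, indexed by block width. */
typedef void (*put_row_fn)(uint8_t *dst, const uint8_t *src, int width);

static const int block_widths[] = { 4, 8, 16, 32, 64, 128 };
#define NUM_WIDTHS (sizeof(block_widths) / sizeof(block_widths[0]))

/* Reference C version: one memcpy per row. */
static void put_row_c(uint8_t *dst, const uint8_t *src, int width)
{
    memcpy(dst, src, (size_t)width);
}

/* Stand-in for a SIMD version: 16-byte chunks plus a tail, written in
 * plain C so the sketch compiles anywhere. */
static void put_row_simd(uint8_t *dst, const uint8_t *src, int width)
{
    assert(width >= 0);
    int x = 0;
    for (; x + 16 <= width; x += 16)
        memcpy(dst + x, src + x, 16);
    memcpy(dst + x, src + x, (size_t)(width - x));
}

typedef struct {
    put_row_fn put_row[NUM_WIDTHS];
} SketchDSPContext;

/* Install C versions first, then let SIMD override them, except for
 * width 128, where (per this issue) the SSE4 version is slower. */
static void sketch_dsp_init(SketchDSPContext *c, int have_simd)
{
    for (size_t i = 0; i < NUM_WIDTHS; i++)
        c->put_row[i] = put_row_c;
    if (!have_simd)
        return;
    for (size_t i = 0; i < NUM_WIDTHS; i++)
        if (block_widths[i] != 128)
            c->put_row[i] = put_row_simd;
}
```

With this structure, "removing" the slow SIMD function amounts to not assigning it, so the faster C function stays in the table for that width.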

Also, some 64xX sizes have similar issues. Could you help check those too?
Thank you.

put_luma_uni_pixels_8_64x4_c: 10.1
put_luma_uni_pixels_8_64x4_sse4: 24.6
put_luma_uni_pixels_8_64x4_avx2: 15.1
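To put the 64x4 numbers in proportion, a small helper (hypothetical, not part of checkasm) can express each SIMD timing as a multiple of the C timing from the same run, where lower checkasm times are faster.

```c
#include <assert.h>

/* Ratio of a SIMD timing to the C timing from the same checkasm run.
 * Values above 1.0 mean the SIMD version is slower than plain C. */
static double slowdown_vs_c(double simd_time, double c_time)
{
    assert(c_time > 0.0);
    return simd_time / c_time;
}
```

Applied to the bench lines, slowdown_vs_c(24.6, 10.1) is about 2.44 for SSE4 and slowdown_vs_c(15.1, 10.1) about 1.50 for AVX2, so both SIMD versions take longer than the C code for this block size.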

@nuomi2021
Member Author

Comments and commit logs are important too. A patch is easier to merge if it is clear to reviewers.
