Fix _mm_test_mix_ones_zeros
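Bug: `_mm_test_mix_ones_zeros` always returned true.
The function combined the `zf` and `cf` vectors with a bitwise AND before reducing them to a bool. Since `zf` is `mask & a` and `cf` is `mask & ~a`, their AND is zero for every input, so the function always returned 1.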
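Below is a minimal portable C sketch of the two orderings. It is not the sse2neon code itself: it assumes a hypothetical two-lane `vec128` struct in place of `__m128i` and hypothetical `buggy_mix`/`fixed_mix` helpers, so it runs without NEON. It shows that AND-ing the vectors before the reduction cancels out, while reducing each flag to a bool first gives the intended result.

```c
#include <stdint.h>

/* Hypothetical stand-in for __m128i: two 64-bit lanes, mirroring the
 * uint64x2_t view used by sse2neon (assumption for illustration). */
typedef struct {
    uint64_t lane[2];
} vec128;

/* Model of the removed code: zf and cf are AND-ed lane-wise before any
 * reduction to a bool. */
int buggy_mix(vec128 a, vec128 mask)
{
    uint64_t result[2];
    for (int i = 0; i < 2; i++) {
        uint64_t zf = mask.lane[i] & a.lane[i];  /* like vandq_u64(mask, a) */
        uint64_t cf = mask.lane[i] & ~a.lane[i]; /* like vbicq_u64(mask, a) */
        /* (m & a) & (m & ~a) == m & (a & ~a) == 0 for every input */
        result[i] = zf & cf;
    }
    return !(result[0] | result[1]); /* always !0, i.e. 1 */
}

/* Reduce each flag to a bool first, then combine them. */
int fixed_mix(vec128 a, vec128 mask)
{
    /* ZF == 0: at least one bit of a selected by mask is 1 */
    int any_ones =
        ((mask.lane[0] & a.lane[0]) | (mask.lane[1] & a.lane[1])) != 0;
    /* CF == 0: at least one bit of a selected by mask is 0 */
    int any_zeros =
        ((mask.lane[0] & ~a.lane[0]) | (mask.lane[1] & ~a.lane[1])) != 0;
    return any_ones && any_zeros;
}
```

For example, with `a = {{0x0F, 0}}` and `mask = {{0xFF, 0}}` both helpers return 1, but with `a` set to all ones `buggy_mix` still returns 1 while `fixed_mix` returns 0.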

The fix proposed here is not the most efficient, but it is correct.

Notes:
The arguments of `_mm_test_mix_ones_zeros` are named incorrectly in the Intel Intrinsics Guide [0].
Judging by the behavior of `_mm_test_mix_ones_zeros` with gcc and clang, the second argument is the mask.
This naming error appears to have propagated to the gcc [1] and LLVM [2] headers, but not to the Rust [3] headers or to sse2neon [4].
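Which argument is the mask matters because the operation is not symmetric. The sketch below is a hypothetical 64-bit scalar model (`mix_ones_zeros64` is an illustrative name; the real intrinsic operates on 128 bits) that follows the `(a, mask)` order used by sse2neon.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_test_mix_ones_zeros(a, mask): returns 1
 * when the bits of a selected by mask contain both ones and zeros. */
int mix_ones_zeros64(uint64_t a, uint64_t mask)
{
    int zf = (a & mask) == 0;  /* ZF: no selected bit of a is 1 */
    int cf = (~a & mask) == 0; /* CF: no selected bit of a is 0 */
    return !zf && !cf;
}
```

For instance, `mix_ones_zeros64(0x0F, 0xFF)` returns 1 (the low nibble is ones, the high nibble zeros), whereas swapping the arguments, `mix_ones_zeros64(0xFF, 0x0F)`, returns 0 because every selected bit is set.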

[0] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=ptest&techs=SSE_ALL&ig_expand=6902
[1] https://github.com/gcc-mirror/gcc/blob/27ce74fa23c93c1189c301993cd19ea766e6bdb5/gcc/config/i386/smmintrin.h#L94
[2] https://github.com/llvm/llvm-project/blob/70535f5e609f747c28cfef699eefb84581b0aac0/clang/lib/Headers/smmintrin.h#L1130
[3] https://github.com/rust-lang/stdarch/blob/f4528dd6e85d97bb802240d7cd048b6e1bf72540/crates/core_arch/src/x86/sse41.rs#L1149
[4] https://github.com/DLTcollab/sse2neon/blob/243e90f654193c97a691b1a53213d091e02eb631/sse2neon.h#L7595
aqrit authored Dec 2, 2023
1 parent 318b559 commit aab64c5
Showing 1 changed file (sse2neon.h) with 2 additions and 6 deletions.
@@ -7592,14 +7592,10 @@ FORCE_INLINE int _mm_test_all_zeros(__m128i a, __m128i mask)
 // zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero,
 // otherwise return 0.
 // https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=mm_test_mix_ones_zero
+// Note: Argument names may be wrong in the Intel intrinsics guide.
 FORCE_INLINE int _mm_test_mix_ones_zeros(__m128i a, __m128i mask)
 {
-    uint64x2_t zf =
-        vandq_u64(vreinterpretq_u64_m128i(mask), vreinterpretq_u64_m128i(a));
-    uint64x2_t cf =
-        vbicq_u64(vreinterpretq_u64_m128i(mask), vreinterpretq_u64_m128i(a));
-    uint64x2_t result = vandq_u64(zf, cf);
-    return !(vgetq_lane_u64(result, 0) | vgetq_lane_u64(result, 1));
+    return !(_mm_testz_si128(a, mask) | _mm_testc_si128(a, mask));
 }

 // Compute the bitwise AND of 128 bits (representing integer data) in a and b,
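The new line composes two existing intrinsics, each of which already reduces its flag to 0 or 1. The portable sketch below models that composition; `m128_model`, `testz_model`, `testc_model` and `test_mix_ones_zeros_model` are hypothetical names, and the helpers follow the documented ZF/CF semantics of `_mm_testz_si128` and `_mm_testc_si128` rather than sse2neon's NEON implementation.

```c
#include <stdint.h>

/* Hypothetical two-lane stand-in for __m128i (not the sse2neon type). */
typedef struct {
    uint64_t lane[2];
} m128_model;

/* Model of _mm_testz_si128(a, b): 1 if (a & b) is all zeros (ZF). */
int testz_model(m128_model a, m128_model b)
{
    return ((a.lane[0] & b.lane[0]) | (a.lane[1] & b.lane[1])) == 0;
}

/* Model of _mm_testc_si128(a, b): 1 if (~a & b) is all zeros (CF). */
int testc_model(m128_model a, m128_model b)
{
    return ((~a.lane[0] & b.lane[0]) | (~a.lane[1] & b.lane[1])) == 0;
}

/* Mirrors the fixed line: both flags are already 0 or 1, so OR-ing and
 * negating yields 1 only when ZF == 0 and CF == 0. */
int test_mix_ones_zeros_model(m128_model a, m128_model mask)
{
    return !(testz_model(a, mask) | testc_model(a, mask));
}
```

With `mask = {{0, 0xF0}}`, an input whose masked bits are `0x30` yields 1, while inputs whose masked bits are all ones (`0xF0`) or all zeros (`0x0F` in the upper lane) yield 0.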