[InstCombine] fold sub(zext(ptrtoint),zext(ptrtoint))
#115369
base: main
Conversation
On a 32-bit target, if pointer arithmetic is used in an i64 computation, the missed fold in InstCombine results in suboptimal performance, unlike the same code compiled for a 64-bit target.
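For illustration, a minimal sketch of the IR shape this fold targets (hypothetical names and datalayout, not taken from the patch): on a 32-bit target the ptrtoint values are i32 and get widened with zext before the i64 sub, a shape the old matcher did not look through.

target datalayout = "p:32:32"

@A = external global [16 x i32]

define i64 @ptr_diff_i64(i32 %i) {
  %gep = getelementptr inbounds i32, ptr @A, i32 %i
  %lhs32 = ptrtoint ptr %gep to i32      ; 32-bit pointer width
  %rhs32 = ptrtoint ptr @A to i32
  %lhs = zext i32 %lhs32 to i64          ; widened into the i64 computation
  %rhs = zext i32 %rhs32 to i64
  %d = sub i64 %lhs, %rhs                ; previously left unfolded
  ret i64 %d
}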
@llvm/pr-subscribers-llvm-transforms

Author: Nikolay Panchenko (npanchen)

Full diff: https://github.com/llvm/llvm-project/pull/115369.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index 21588aca512758..d931f89b9af421 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -2618,8 +2618,8 @@ Instruction *InstCombinerImpl::visitSub(BinaryOperator &I) {
// Optimize pointer differences into the same array into a size. Consider:
// &A[10] - &A[0]: we should compile this to "10".
Value *LHSOp, *RHSOp;
- if (match(Op0, m_PtrToInt(m_Value(LHSOp))) &&
- match(Op1, m_PtrToInt(m_Value(RHSOp))))
+ if (match(Op0, m_ZExtOrSelf(m_PtrToInt(m_Value(LHSOp)))) &&
+ match(Op1, m_ZExtOrSelf(m_PtrToInt(m_Value(RHSOp)))))
if (Value *Res = OptimizePointerDifference(LHSOp, RHSOp, I.getType(),
I.hasNoUnsignedWrap()))
return replaceInstUsesWith(I, Res);
diff --git a/llvm/test/Transforms/InstCombine/sub-gep.ll b/llvm/test/Transforms/InstCombine/sub-gep.ll
index b773d106b2c98a..ee18fd607823a5 100644
--- a/llvm/test/Transforms/InstCombine/sub-gep.ll
+++ b/llvm/test/Transforms/InstCombine/sub-gep.ll
@@ -270,6 +270,36 @@ define i64 @test25(ptr %P, i64 %A){
ret i64 %G
}
+define i64 @zext_ptrtoint_sub_ptrtoint(ptr %p, i32 %offset) {
+; CHECK-LABEL: @zext_ptrtoint_sub_ptrtoint(
+; CHECK-NEXT: %1 = sext i32 %offset to i64
+; CHECK-NEXT: %A = getelementptr bfloat, ptr @Arr, i64 %1
+; CHECK-NEXT: %2 = ptrtoint ptr %A to i64
+; CHECK-NEXT: %C = and i64 %2, 4294967294
+; CHECK-NEXT: %D = sub i64 %C, ptrtoint (ptr @Arr to i64)
+; CHECK-NEXT: ret i64 %D
+ %A = getelementptr bfloat, ptr @Arr, i32 %offset
+ %B = ptrtoint ptr %A to i32
+ %C = zext i32 %B to i64
+ %D = sub i64 %C, ptrtoint (ptr @Arr to i64)
+ ret i64 %D
+}
+
+define i64 @ptrtoint_sub_zext_ptrtoint(ptr %p, i32 %offset) {
+; CHECK-LABEL: @ptrtoint_sub_zext_ptrtoint(
+; CHECK-NEXT: %1 = sext i32 %offset to i64
+; CHECK-NEXT: %A = getelementptr bfloat, ptr @Arr, i64 %1
+; CHECK-NEXT: %2 = ptrtoint ptr %A to i64
+; CHECK-NEXT: %C = and i64 %2, 4294967294
+; CHECK-NEXT: %D = sub i64 ptrtoint (ptr @Arr to i64), %C
+; CHECK-NEXT: ret i64 %D
+ %A = getelementptr bfloat, ptr @Arr, i32 %offset
+ %B = ptrtoint ptr %A to i32
+ %C = zext i32 %B to i64
+ %D = sub i64 ptrtoint (ptr @Arr to i64), %C
+ ret i64 %D
+}
+
@Arr_as1 = external addrspace(1) global [42 x i16]
define i16 @test25_as1(ptr addrspace(1) %P, i64 %A) {
@@ -285,6 +315,32 @@ define i16 @test25_as1(ptr addrspace(1) %P, i64 %A) {
ret i16 %G
}
+define i64 @zext_ptrtoint_sub_ptrtoint_as1(ptr addrspace(1) %p, i32 %offset) {
+; CHECK-LABEL: @zext_ptrtoint_sub_ptrtoint_as1(
+; CHECK-NEXT: %1 = trunc i32 %offset to i16
+; CHECK-NEXT: %A.idx = shl i16 %1, 1
+; CHECK-NEXT: %D = sext i16 %A.idx to i64
+; CHECK-NEXT: ret i64 %D
+ %A = getelementptr bfloat, ptr addrspace(1) @Arr_as1, i32 %offset
+ %B = ptrtoint ptr addrspace(1) %A to i32
+ %C = zext i32 %B to i64
+ %D = sub i64 %C, ptrtoint (ptr addrspace(1) @Arr_as1 to i64)
+ ret i64 %D
+}
+
+define i64 @ptrtoint_sub_zext_ptrtoint_as1(ptr addrspace(1) %p, i32 %offset) {
+; CHECK-LABEL: @ptrtoint_sub_zext_ptrtoint_as1(
+; CHECK-NEXT: %1 = trunc i32 %offset to i16
+; CHECK-NEXT: %A.idx.neg = mul i16 %1, -2
+; CHECK-NEXT: %D = sext i16 %A.idx.neg to i64
+; CHECK-NEXT: ret i64 %D
+ %A = getelementptr bfloat, ptr addrspace(1) @Arr_as1, i32 %offset
+ %B = ptrtoint ptr addrspace(1) %A to i32
+ %C = zext i32 %B to i64
+ %D = sub i64 ptrtoint (ptr addrspace(1) @Arr_as1 to i64), %C
+ ret i64 %D
+}
+
define i64 @test30(ptr %foo, i64 %i, i64 %j) {
; CHECK-LABEL: @test30(
; CHECK-NEXT: [[GEP1_IDX:%.*]] = shl nsw i64 [[I:%.*]], 2
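For readers unfamiliar with the matchers: m_ZExtOrSelf accepts either the inner pattern itself or a zext of it, so OptimizePointerDifference is now reached for both operand shapes. A sketch with hypothetical arguments, not part of the patch:

define i64 @shapes(ptr %p, ptr %q) {
  %a = ptrtoint ptr %p to i64     ; "self" case: matched before this patch
  %b32 = ptrtoint ptr %q to i32   ; narrower ptrtoint ...
  %b = zext i32 %b32 to i64       ; ... widened by zext: newly matched
  %d = sub i64 %a, %b
  ret i64 %d
}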
This optimization looks invalid to me. The claim you're making here is essentially this? https://alive2.llvm.org/ce/z/vzy3qB
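For context, the problem is the usual wraparound argument. A reconstruction under an assumed 32-bit pointer width (not necessarily the exact Alive2 example): without inbounds/nusw on the gep, the 32-bit address computation may wrap, and then the i64 sub of the zexts disagrees with the folded pointer difference.

target datalayout = "p:32:32"

define i64 @wraps(ptr %q) {
  ; suppose ptrtoint %q == 0xFFFFFFFF, so the gep wraps to address 0
  %p = getelementptr i8, ptr %q, i32 1   ; plain gep: wrapping is allowed
  %a32 = ptrtoint ptr %p to i32          ; 0
  %b32 = ptrtoint ptr %q to i32          ; 0xFFFFFFFF
  %a = zext i32 %a32 to i64
  %b = zext i32 %b32 to i64
  %d = sub i64 %a, %b                    ; -4294967295, not the gep offset 1
  ret i64 %d
}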
Thanks, I totally missed the obvious case. Seems like I need to pattern-match the exact case we have:
@nikic please take a look again. The pattern should be similar to https://alive2.llvm.org/ce/z/v4Pyqg
@nikic gentle ping
I believe you're doing this now, which is still incorrect: https://alive2.llvm.org/ce/z/irt9qk Both of these variants work, though: https://alive2.llvm.org/ce/z/tdpQth (I generalized inbounds -> nusw in the first one.)
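A sketch of the shape those proofs cover (hypothetical names; assumes a 32-bit pointer width): both ptrtoints are taken over the same base, with the gep carrying nusw so the narrow address arithmetic cannot wrap, and the difference reduces to an extension of the gep offset.

target datalayout = "p:32:32"

define i64 @no_wrap(ptr %base, i32 %i) {
  %p = getelementptr nusw i8, ptr %base, i32 %i   ; nusw rules out the wrap
  %a32 = ptrtoint ptr %p to i32
  %b32 = ptrtoint ptr %base to i32
  %a = zext i32 %a32 to i64
  %b = zext i32 %b32 to i64
  ; foldable to the sign-extended offset (cf. the sext in the patch's CHECK lines)
  %d = sub i64 %a, %b
  ret i64 %d
}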
Argh... I misread
The implementation looks good to me, some notes on the tests.
On a 32-bit target, if pointer arithmetic with addrspace is used in an i64 computation, the missed fold in InstCombine results in suboptimal performance, unlike the same code compiled for a 64-bit target.