Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] fold sub(zext(ptrtoint),zext(ptrtoint)) #115369

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

npanchen
Copy link
Contributor

@npanchen npanchen commented Nov 7, 2024

On a 32-bit target if pointer arithmetic with addrspace is used in i64 computation, the missed folding in InstCombine results to suboptimal performance, unlike same code compiled for 64bit target

On a 32-bit target if pointer arithmetic is used in i64 computation,
the missed folding in InstCombine results to suboptimal performance,
unlike same code compiled for 64bit target
@llvmbot
Copy link

llvmbot commented Nov 7, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Nikolay Panchenko (npanchen)

Changes

On a 32-bit target if pointer arithmetic with addrspace is used in i64 computation, the missed folding in InstCombine results to suboptimal performance, unlike same code compiled for 64bit target


Full diff: https://github.com/llvm/llvm-project/pull/115369.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/sub-gep.ll (+56)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index 21588aca512758..d931f89b9af421 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -2618,8 +2618,8 @@ Instruction *InstCombinerImpl::visitSub(BinaryOperator &I) {
   // Optimize pointer differences into the same array into a size.  Consider:
   //  &A[10] - &A[0]: we should compile this to "10".
   Value *LHSOp, *RHSOp;
-  if (match(Op0, m_PtrToInt(m_Value(LHSOp))) &&
-      match(Op1, m_PtrToInt(m_Value(RHSOp))))
+  if (match(Op0, m_ZExtOrSelf(m_PtrToInt(m_Value(LHSOp)))) &&
+      match(Op1, m_ZExtOrSelf(m_PtrToInt(m_Value(RHSOp)))))
     if (Value *Res = OptimizePointerDifference(LHSOp, RHSOp, I.getType(),
                                                I.hasNoUnsignedWrap()))
       return replaceInstUsesWith(I, Res);
diff --git a/llvm/test/Transforms/InstCombine/sub-gep.ll b/llvm/test/Transforms/InstCombine/sub-gep.ll
index b773d106b2c98a..ee18fd607823a5 100644
--- a/llvm/test/Transforms/InstCombine/sub-gep.ll
+++ b/llvm/test/Transforms/InstCombine/sub-gep.ll
@@ -270,6 +270,36 @@ define i64 @test25(ptr %P, i64 %A){
   ret i64 %G
 }
 
+define i64 @zext_ptrtoint_sub_ptrtoint(ptr %p, i32 %offset) {
+; CHECK-LABLE: @zext_ptrtoint_sub_ptrtoint(
+; CHECK-NEXT:  %1 = sext i32 %offset to i64
+; CHECK-NEXT:  %A = getelementptr bfloat, ptr @Arr, i64 %1
+; CHECK-NEXT:  %2 = ptrtoint ptr %A to i64
+; CHECK-NEXT:  %C = and i64 %2, 4294967294
+; CHECK-NEXT:  %D = sub i64 %C, ptrtoint (ptr @Arr to i64)
+; CHECK-NEXT:  ret i64 %D
+  %A = getelementptr bfloat, ptr @Arr, i32 %offset
+  %B = ptrtoint ptr %A to i32
+  %C = zext i32 %B to i64
+  %D = sub i64 %C, ptrtoint (ptr @Arr to i64)
+  ret i64 %D
+}
+
+define i64 @ptrtoint_sub_zext_ptrtoint(ptr %p, i32 %offset) {
+; CHECK-LABLE: @ptrtoint_sub_zext_ptrtoint(
+; CHECK-NEXT:  %1 = sext i32 %offset to i64
+; CHECK-NEXT:  %A = getelementptr bfloat, ptr @Arr, i64 %1
+; CHECK-NEXT:  %2 = ptrtoint ptr %A to i64
+; CHECK-NEXT:  %C = and i64 %2, 4294967294
+; CHECK-NEXT:  %D = sub i64 ptrtoint (ptr @Arr to i64), %C
+; CHECK-NEXT:  ret i64 %D
+  %A = getelementptr bfloat, ptr @Arr, i32 %offset
+  %B = ptrtoint ptr %A to i32
+  %C = zext i32 %B to i64
+  %D = sub i64 ptrtoint (ptr @Arr to i64), %C
+  ret i64 %D
+}
+
 @Arr_as1 = external addrspace(1) global [42 x i16]
 
 define i16 @test25_as1(ptr addrspace(1) %P, i64 %A) {
@@ -285,6 +315,32 @@ define i16 @test25_as1(ptr addrspace(1) %P, i64 %A) {
   ret i16 %G
 }
 
+define i64 @zext_ptrtoint_sub_ptrtoint_as1(ptr addrspace(1) %p, i32 %offset) {
+; CHECK-LABLE: @zext_ptrtoint_sub_ptrtoint_as1(
+; CHECK-NEXT:  %1 = trunc i32 %offset to i16
+; CHECK-NEXT:  %A.idx = shl i16 %1, 1
+; CHECK-NEXT:  %D = sext i16 %A.idx to i64
+; CHECK-NEXT:  ret i64 %D
+  %A = getelementptr bfloat, ptr addrspace(1) @Arr_as1, i32 %offset
+  %B = ptrtoint ptr addrspace(1) %A to i32
+  %C = zext i32 %B to i64
+  %D = sub i64 %C, ptrtoint (ptr addrspace(1) @Arr_as1 to i64)
+  ret i64 %D
+}
+
+define i64 @ptrtoint_sub_zext_ptrtoint_as1(ptr addrspace(1) %p, i32 %offset) {
+; CHECK-LABLE: @ptrtoint_sub_zext_ptrtoint_as1(
+; CHECK-NEXT:  %1 = trunc i32 %offset to i16
+; CHECK-NEXT:  %A.idx.neg = mul i16 %1, -2
+; CHECK-NEXT:  %D = sext i16 %A.idx.neg to i64
+; CHECK-NEXT:  ret i64 %D
+  %A = getelementptr bfloat, ptr addrspace(1) @Arr_as1, i32 %offset
+  %B = ptrtoint ptr addrspace(1) %A to i32
+  %C = zext i32 %B to i64
+  %D = sub i64 ptrtoint (ptr addrspace(1) @Arr_as1 to i64), %C
+  ret i64 %D
+}
+
 define i64 @test30(ptr %foo, i64 %i, i64 %j) {
 ; CHECK-LABEL: @test30(
 ; CHECK-NEXT:    [[GEP1_IDX:%.*]] = shl nsw i64 [[I:%.*]], 2

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This optimization looks invalid to me. The claim you're making here is essentially this? https://alive2.llvm.org/ce/z/vzy3qB

@npanchen
Copy link
Contributor Author

npanchen commented Nov 8, 2024

This optimization looks invalid to me. The claim you're making here is essentially this? https://alive2.llvm.org/ce/z/vzy3qB

thanks, I totally missed the obvious case. Seems like I need to pattern-match exact case we have: sub(zext(ptrtoint(gep(A, B))), ptrtoint(A)) with some constraint on

@npanchen npanchen requested a review from nikic November 8, 2024 22:25
@npanchen
Copy link
Contributor Author

npanchen commented Nov 8, 2024

@nikic please take a look again. The pattern should be similar to https://alive2.llvm.org/ce/z/v4Pyqg

@npanchen
Copy link
Contributor Author

@nikic gentle ping

@nikic
Copy link
Contributor

nikic commented Nov 13, 2024

I believe you're doing this now, which is still incorrect: https://alive2.llvm.org/ce/z/irt9qk

Both of these variant work though: https://alive2.llvm.org/ce/z/tdpQth (I generalized inbounds -> nusw in the first one.)

@npanchen
Copy link
Contributor Author

I believe you're doing this now, which is still incorrect: https://alive2.llvm.org/ce/z/irt9qk

Argh... I misread nusw as nuw + nsw in gep.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good to me, some notes on the tests.

llvm/test/Transforms/InstCombine/sub-gep.ll Show resolved Hide resolved
llvm/test/Transforms/InstCombine/sub-gep.ll Show resolved Hide resolved
llvm/test/Transforms/InstCombine/sub-gep.ll Outdated Show resolved Hide resolved
llvm/test/Transforms/InstCombine/sub-gep.ll Show resolved Hide resolved
@npanchen npanchen requested a review from nikic November 15, 2024 01:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants