[InstCombine] fold sub(zext(ptrtoint),zext(ptrtoint))
#115369
base: main
Conversation
On a 32-bit target, if pointer arithmetic is used in an i64 computation, the missed fold in InstCombine results in suboptimal performance, unlike the same code compiled for a 64-bit target.
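For illustration, a minimal sketch of the IR shape this fold targets (hypothetical names and datalayout, not taken from the patch): on a 32-bit target the ptrtoint values are i32 and get widened with zext before the i64 sub, a shape the old matcher did not look through.

target datalayout = "p:32:32"

@A = external global [16 x i32]

define i64 @ptr_diff_i64(i32 %i) {
  %gep = getelementptr inbounds i32, ptr @A, i32 %i
  %lhs32 = ptrtoint ptr %gep to i32      ; 32-bit pointer width
  %rhs32 = ptrtoint ptr @A to i32
  %lhs = zext i32 %lhs32 to i64          ; widened into the i64 computation
  %rhs = zext i32 %rhs32 to i64
  %d = sub i64 %lhs, %rhs                ; previously left unfolded
  ret i64 %d
}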
@llvm/pr-subscribers-llvm-transforms

Author: Nikolay Panchenko (npanchen)

Full diff: https://github.com/llvm/llvm-project/pull/115369.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index 21588aca512758..d931f89b9af421 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -2618,8 +2618,8 @@ Instruction *InstCombinerImpl::visitSub(BinaryOperator &I) {
// Optimize pointer differences into the same array into a size. Consider:
// &A[10] - &A[0]: we should compile this to "10".
Value *LHSOp, *RHSOp;
- if (match(Op0, m_PtrToInt(m_Value(LHSOp))) &&
- match(Op1, m_PtrToInt(m_Value(RHSOp))))
+ if (match(Op0, m_ZExtOrSelf(m_PtrToInt(m_Value(LHSOp)))) &&
+ match(Op1, m_ZExtOrSelf(m_PtrToInt(m_Value(RHSOp)))))
if (Value *Res = OptimizePointerDifference(LHSOp, RHSOp, I.getType(),
I.hasNoUnsignedWrap()))
return replaceInstUsesWith(I, Res);
diff --git a/llvm/test/Transforms/InstCombine/sub-gep.ll b/llvm/test/Transforms/InstCombine/sub-gep.ll
index b773d106b2c98a..ee18fd607823a5 100644
--- a/llvm/test/Transforms/InstCombine/sub-gep.ll
+++ b/llvm/test/Transforms/InstCombine/sub-gep.ll
@@ -270,6 +270,36 @@ define i64 @test25(ptr %P, i64 %A){
ret i64 %G
}
+define i64 @zext_ptrtoint_sub_ptrtoint(ptr %p, i32 %offset) {
+; CHECK-LABEL: @zext_ptrtoint_sub_ptrtoint(
+; CHECK-NEXT: %1 = sext i32 %offset to i64
+; CHECK-NEXT: %A = getelementptr bfloat, ptr @Arr, i64 %1
+; CHECK-NEXT: %2 = ptrtoint ptr %A to i64
+; CHECK-NEXT: %C = and i64 %2, 4294967294
+; CHECK-NEXT: %D = sub i64 %C, ptrtoint (ptr @Arr to i64)
+; CHECK-NEXT: ret i64 %D
+ %A = getelementptr bfloat, ptr @Arr, i32 %offset
+ %B = ptrtoint ptr %A to i32
+ %C = zext i32 %B to i64
+ %D = sub i64 %C, ptrtoint (ptr @Arr to i64)
+ ret i64 %D
+}
+
+define i64 @ptrtoint_sub_zext_ptrtoint(ptr %p, i32 %offset) {
+; CHECK-LABEL: @ptrtoint_sub_zext_ptrtoint(
+; CHECK-NEXT: %1 = sext i32 %offset to i64
+; CHECK-NEXT: %A = getelementptr bfloat, ptr @Arr, i64 %1
+; CHECK-NEXT: %2 = ptrtoint ptr %A to i64
+; CHECK-NEXT: %C = and i64 %2, 4294967294
+; CHECK-NEXT: %D = sub i64 ptrtoint (ptr @Arr to i64), %C
+; CHECK-NEXT: ret i64 %D
+ %A = getelementptr bfloat, ptr @Arr, i32 %offset
+ %B = ptrtoint ptr %A to i32
+ %C = zext i32 %B to i64
+ %D = sub i64 ptrtoint (ptr @Arr to i64), %C
+ ret i64 %D
+}
+
@Arr_as1 = external addrspace(1) global [42 x i16]
define i16 @test25_as1(ptr addrspace(1) %P, i64 %A) {
@@ -285,6 +315,32 @@ define i16 @test25_as1(ptr addrspace(1) %P, i64 %A) {
ret i16 %G
}
+define i64 @zext_ptrtoint_sub_ptrtoint_as1(ptr addrspace(1) %p, i32 %offset) {
+; CHECK-LABEL: @zext_ptrtoint_sub_ptrtoint_as1(
+; CHECK-NEXT: %1 = trunc i32 %offset to i16
+; CHECK-NEXT: %A.idx = shl i16 %1, 1
+; CHECK-NEXT: %D = sext i16 %A.idx to i64
+; CHECK-NEXT: ret i64 %D
+ %A = getelementptr bfloat, ptr addrspace(1) @Arr_as1, i32 %offset
+ %B = ptrtoint ptr addrspace(1) %A to i32
+ %C = zext i32 %B to i64
+ %D = sub i64 %C, ptrtoint (ptr addrspace(1) @Arr_as1 to i64)
+ ret i64 %D
+}
+
+define i64 @ptrtoint_sub_zext_ptrtoint_as1(ptr addrspace(1) %p, i32 %offset) {
+; CHECK-LABEL: @ptrtoint_sub_zext_ptrtoint_as1(
+; CHECK-NEXT: %1 = trunc i32 %offset to i16
+; CHECK-NEXT: %A.idx.neg = mul i16 %1, -2
+; CHECK-NEXT: %D = sext i16 %A.idx.neg to i64
+; CHECK-NEXT: ret i64 %D
+ %A = getelementptr bfloat, ptr addrspace(1) @Arr_as1, i32 %offset
+ %B = ptrtoint ptr addrspace(1) %A to i32
+ %C = zext i32 %B to i64
+ %D = sub i64 ptrtoint (ptr addrspace(1) @Arr_as1 to i64), %C
+ ret i64 %D
+}
+
define i64 @test30(ptr %foo, i64 %i, i64 %j) {
; CHECK-LABEL: @test30(
; CHECK-NEXT: [[GEP1_IDX:%.*]] = shl nsw i64 [[I:%.*]], 2
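For readers unfamiliar with the matchers: m_ZExtOrSelf accepts either the inner pattern itself or a zext of it, so OptimizePointerDifference is now reached for both operand shapes. A sketch with hypothetical arguments, not part of the patch:

define i64 @shapes(ptr %p, ptr %q) {
  %a = ptrtoint ptr %p to i64     ; "self" case: matched before this patch
  %b32 = ptrtoint ptr %q to i32   ; narrower ptrtoint ...
  %b = zext i32 %b32 to i64       ; ... widened by zext: newly matched
  %d = sub i64 %a, %b
  ret i64 %d
}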
This optimization looks invalid to me. The claim you're making here is essentially this? https://alive2.llvm.org/ce/z/vzy3qB
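For context, the problem is the usual wraparound argument. A reconstruction under an assumed 32-bit pointer width (not necessarily the exact Alive2 example): without inbounds/nusw on the gep, the 32-bit address computation may wrap, and then the i64 sub of the zexts disagrees with the folded pointer difference.

target datalayout = "p:32:32"

define i64 @wraps(ptr %q) {
  ; suppose ptrtoint %q == 0xFFFFFFFF, so the gep wraps to address 0
  %p = getelementptr i8, ptr %q, i32 1   ; plain gep: wrapping is allowed
  %a32 = ptrtoint ptr %p to i32          ; 0
  %b32 = ptrtoint ptr %q to i32          ; 0xFFFFFFFF
  %a = zext i32 %a32 to i64
  %b = zext i32 %b32 to i64
  %d = sub i64 %a, %b                    ; -4294967295, not the gep offset 1
  ret i64 %d
}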
Thanks, I totally missed the obvious case. Seems like I need to pattern-match the exact case we have:
@nikic please take a look again. The pattern should be similar to https://alive2.llvm.org/ce/z/v4Pyqg
@nikic gentle ping
I believe you're doing this now, which is still incorrect: https://alive2.llvm.org/ce/z/irt9qk Both of these variants work, though: https://alive2.llvm.org/ce/z/tdpQth (I generalized inbounds -> nusw in the first one.)
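A sketch of the shape those proofs cover (hypothetical names; assumes a 32-bit pointer width): both ptrtoints are taken over the same base, with the gep carrying nusw so the narrow address arithmetic cannot wrap, and the difference reduces to an extension of the gep offset.

target datalayout = "p:32:32"

define i64 @no_wrap(ptr %base, i32 %i) {
  %p = getelementptr nusw i8, ptr %base, i32 %i   ; nusw rules out the wrap
  %a32 = ptrtoint ptr %p to i32
  %b32 = ptrtoint ptr %base to i32
  %a = zext i32 %a32 to i64
  %b = zext i32 %b32 to i64
  ; foldable to the sign-extended offset (cf. the sext in the patch's CHECK lines)
  %d = sub i64 %a, %b
  ret i64 %d
}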
Argh... I misread
The implementation looks good to me, some notes on the tests.
On a 32-bit target, if pointer arithmetic with addrspace is used in an i64 computation, the missed fold in InstCombine results in suboptimal performance, unlike the same code compiled for a 64-bit target.