-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Frame lowering optimizations #1
base: nanomips-llvm13-submission
Are you sure you want to change the base?
Frame lowering optimizations #1
Conversation
Thanks for the suggestions |
470dbc5
to
1af2000
Compare
Added two new commits that address other issues, but that rely on changes from the previous save/restore optimization commit. In the first newly added commit, two-step SP setup is implemented if offset is greater than 4096. This fixes FrameLowering for large offsets. In the second newly added commit, the FP setup is implemented so that if function wants to use it, FP will point to an address at -4096 from the beginning of function's stack. |
0711341
to
7213c41
Compare
@nikolaperic Please set the target branch to |
Enable save/restore generation when there is a gap between registers by inserting the missing ones at the moment when spill slots for callee-saved registers are assigned.
Adjust stack pointer in two steps if offset is larger than 4096. In the first step adjust stack pointer for the size necessary to spill CSR onto the stack. In the second step adjust stack pointer for the size necessary to spill local objects.
If function uses FP, it will now point to address -4096 from the beginning of function's stack. After FP setup following offsets will be relative to SP if function has no var-sized objects. If it has var-sized objects offsets will be relative to FP. Also, stack realignment now happens with INS instruction.
7213c41
to
fc5fa13
Compare
// Move MBBI_2 to point to the first instruction after | ||
// calle-saves store sequence. That's the place for the second | ||
// steck pointer adjustment. | ||
std::advance(MBBI_2, NumOfCSI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is assuming that the prologue definitely has one instruction per callee save, and that they're right here? That might be the case but in future where save/restore are generated might invalidate this. It would be a good idea to at least assert
that the instructions are what we assume them to be and document that assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's the safer way. However, in order to do that I think we need to introduce a new function which checks opcodes and operands of skipped instructions. This function will be nanomips-specific so I think it's not a good idea to write it inside MipsSEFrameLowering
. Maybe a good place to write such function is inside MipsFunctionInfo
since we have already put CalleeSavedStackSize
variable there? Is this a good approach for this problem?
Note that on the line 452 we already had this kind of skipping CSR store instructions.
This reverts commit 5fb4134. This patch is causing crashes when building llvm-test-suite when optimizing for CPUs with AVX512. Reproducer crashing with llc: target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-apple-macosx" define i32 @test(<32 x i32> %0) #0 { entry: %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1> %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1) ret i32 %2 } ; Function Attrs: nocallback nofree nosync nounwind readnone willreturn declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1 attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" } attributes #1 = { nocallback nofree nosync nounwind readnone willreturn } (cherry picked from commit f912bab)
We experienced some deadlocks when we used multiple threads for logging using `scan-builds` intercept-build tool when we used multiple threads by e.g. logging `make -j16` ``` (gdb) bt #0 0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f2bb3d152e4 in ?? () #3 0x00007ffcc5f0cc80 in ?? () #4 0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2 #5 0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x00007f2bb3d144ee in ?? () #8 0x746e692f706d742f in ?? () #9 0x692d747065637265 in ?? () #10 0x2f653631326b3034 in ?? () #11 0x646d632e35353532 in ?? () #12 0x0000000000000000 in ?? () ``` I think the gcc's exit call caused the injected `libear.so` to be unloaded by the `ld`, which in turn called the `void on_unload() __attribute__((destructor))`. That tried to acquire an already locked mutex which was left locked in the `bear_report_call()` call, that probably encountered some error and returned early when it forgot to unlock the mutex. All of these are speculation since from the backtrace I could not verify if frames 2 and 3 are in fact corresponding to the `libear.so` module. But I think it's a fairly safe bet. So, hereby I'm releasing the held mutex on *all paths*, even if some failure happens. PS: I would use lock_guards, but it's C. Reviewed-by: NoQ Differential Revision: https://reviews.llvm.org/D118439 (cherry picked from commit d919d02)
Change https://reviews.llvm.org/D140059 exposed the following crash in Z3Solver, where bit widths were not checked consistently with that change. This change makes the check consistent, and fixes the crash. ``` clang: <root>/llvm/include/llvm/ADT/APSInt.h:99: int64_t llvm::APSInt::getExtValue() const: Assertion `isRepresentableByInt64() && "Too many bits for int64_t"' failed. ... Stack dump: 0. Program arguments: clang -cc1 -internal-isystem <root>/lib/clang/16/include -nostdsysteminc -analyze -analyzer-checker=core,unix.Malloc,debug.ExprInspection -analyzer-config crosscheck-with-z3=true -verify reproducer.c #0 0x00000000045b3476 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) <root>/llvm/lib/Support/Unix/Signals.inc:567:22 #1 0x00000000045b3862 PrintStackTraceSignalHandler(void*) <root>/llvm/lib/Support/Unix/Signals.inc:641:1 #2 0x00000000045b14a5 llvm::sys::RunSignalHandlers() <root>/llvm/lib/Support/Signals.cpp:104:20 #3 0x00000000045b2eb4 SignalHandler(int) <root>/llvm/lib/Support/Unix/Signals.inc:412:1 ... #9 0x0000000004be2eb3 llvm::APSInt::getExtValue() const <root>/llvm/include/llvm/ADT/APSInt.h:99:5 <root>/llvm/lib/Support/Z3Solver.cpp:740:53 clang::ASTContext&, clang::ento::SymExpr const*, llvm::APSInt const&, llvm::APSInt const&, bool) <root>/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SMTConv.h:552:61 ``` Reviewed By: steakhal Differential Revision: https://reviews.llvm.org/D142627 (cherry picked from commit f027dd5)
@nikolaperic I think we picked different approach for this, right? Can we close this (and make sure we do have it in LLVM 16)? |
Enable save/restore generation when there is a gap between registers by inserting the missing ones at the moment when spill slots for callee-saved registers are assigned.