Frame lowering optimizations #1

nikolaperic · 2022-11-03T16:53:31Z

Enable save/restore generation when there is a gap between registers by inserting the missing ones at the moment when spill slots for callee-saved registers are assigned.

llvm/lib/Target/Mips/MipsSEFrameLowering.cpp

nikolaperic · 2022-12-15T08:57:10Z

Thanks for the suggestions

nikolaperic · 2023-03-13T14:52:48Z

Added two new commits that address other issues, but that rely on changes from the previous save/restore optimization commit.

In the first newly added commit, two-step SP setup is implemented if offset is greater than 4096. This fixes FrameLowering for large offsets.

In the second newly added commit, the FP setup is implemented so that if function wants to use it, FP will point to an address at -4096 from the beginning of function's stack.

djtodoro · 2023-03-22T11:07:01Z

@nikolaperic Please set the target branch to nanomips-llvm13-submission.

Enable save/restore generation when there is a gap between registers by inserting the missing ones at the moment when spill slots for callee-saved registers are assigned.

Adjust stack pointer in two steps if offset is larger than 4096. In the first step adjust stack pointer for the size necessary to spill CSR onto the stack. In the second step adjust stack pointer for the size necessary to spill local objects.

If function uses FP, it will now point to address -4096 from the beginning of function's stack. After FP setup following offsets will be relative to SP if function has no var-sized objects. If it has var-sized objects offsets will be relative to FP. Also, stack realignment now happens with INS instruction.

llvm/lib/Target/Mips/MipsSEFrameLowering.cpp

cme · 2023-03-21T12:37:26Z

llvm/lib/Target/Mips/MipsSEFrameLowering.cpp

+    // Move MBBI_2 to point to the first instruction after
+    // calle-saves store sequence. That's the place for the second
+    // steck pointer adjustment.
+    std::advance(MBBI_2, NumOfCSI);


This is assuming that the prologue definitely has one instruction per callee save, and that they're right here? That might be the case but in future where save/restore are generated might invalidate this. It would be a good idea to at least assert that the instructions are what we assume them to be and document that assumption.

Yes, I think that's the safer way. However, in order to do that I think we need to introduce a new function which checks opcodes and operands of skipped instructions. This function will be nanomips-specific so I think it's not a good idea to write it inside MipsSEFrameLowering. Maybe a good place to write such function is inside MipsFunctionInfo since we have already put CalleeSavedStackSize variable there? Is this a good approach for this problem?

Note that on the line 452 we already had this kind of skipping CSR store instructions.

llvm/lib/Target/Mips/MipsSEFrameLowering.cpp

@test

This reverts commit 5fb4134. This patch is causing crashes when building llvm-test-suite when optimizing for CPUs with AVX512. Reproducer crashing with llc: target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-apple-macosx" define i32 @test(<32 x i32> %0) #0 { entry: %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1> %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1) ret i32 %2 } ; Function Attrs: nocallback nofree nosync nounwind readnone willreturn declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1 attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" } attributes #1 = { nocallback nofree nosync nounwind readnone willreturn } (cherry picked from commit f912bab)

We experienced some deadlocks when we used multiple threads for logging using `scan-builds` intercept-build tool when we used multiple threads by e.g. logging `make -j16` ``` (gdb) bt #0 0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007f2bb3d152e4 in ?? () #3 0x00007ffcc5f0cc80 in ?? () #4 0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2 #5 0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #6 0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6 #7 0x00007f2bb3d144ee in ?? () #8 0x746e692f706d742f in ?? () #9 0x692d747065637265 in ?? () #10 0x2f653631326b3034 in ?? () #11 0x646d632e35353532 in ?? () #12 0x0000000000000000 in ?? () ``` I think the gcc's exit call caused the injected `libear.so` to be unloaded by the `ld`, which in turn called the `void on_unload() __attribute__((destructor))`. That tried to acquire an already locked mutex which was left locked in the `bear_report_call()` call, that probably encountered some error and returned early when it forgot to unlock the mutex. All of these are speculation since from the backtrace I could not verify if frames 2 and 3 are in fact corresponding to the `libear.so` module. But I think it's a fairly safe bet. So, hereby I'm releasing the held mutex on *all paths*, even if some failure happens. PS: I would use lock_guards, but it's C. Reviewed-by: NoQ Differential Revision: https://reviews.llvm.org/D118439 (cherry picked from commit d919d02)

Change https://reviews.llvm.org/D140059 exposed the following crash in Z3Solver, where bit widths were not checked consistently with that change. This change makes the check consistent, and fixes the crash. ``` clang: <root>/llvm/include/llvm/ADT/APSInt.h:99: int64_t llvm::APSInt::getExtValue() const: Assertion `isRepresentableByInt64() && "Too many bits for int64_t"' failed. ... Stack dump: 0. Program arguments: clang -cc1 -internal-isystem <root>/lib/clang/16/include -nostdsysteminc -analyze -analyzer-checker=core,unix.Malloc,debug.ExprInspection -analyzer-config crosscheck-with-z3=true -verify reproducer.c #0 0x00000000045b3476 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) <root>/llvm/lib/Support/Unix/Signals.inc:567:22 #1 0x00000000045b3862 PrintStackTraceSignalHandler(void*) <root>/llvm/lib/Support/Unix/Signals.inc:641:1 #2 0x00000000045b14a5 llvm::sys::RunSignalHandlers() <root>/llvm/lib/Support/Signals.cpp:104:20 #3 0x00000000045b2eb4 SignalHandler(int) <root>/llvm/lib/Support/Unix/Signals.inc:412:1 ... #9 0x0000000004be2eb3 llvm::APSInt::getExtValue() const <root>/llvm/include/llvm/ADT/APSInt.h:99:5 <root>/llvm/lib/Support/Z3Solver.cpp:740:53 clang::ASTContext&, clang::ento::SymExpr const*, llvm::APSInt const&, llvm::APSInt const&, bool) <root>/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SMTConv.h:552:61 ``` Reviewed By: steakhal Differential Revision: https://reviews.llvm.org/D142627 (cherry picked from commit f027dd5)

djtodoro · 2024-06-04T08:24:19Z

@nikolaperic I think we picked different approach for this, right? Can we close this (and make sure we do have it in LLVM 16)?

draganmladjenovic requested a review from cme December 7, 2022 15:08

cme reviewed Dec 11, 2022

View reviewed changes

llvm/lib/Target/Mips/MipsSEFrameLowering.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/Mips/MipsSEFrameLowering.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/Mips/MipsSEFrameLowering.cpp Outdated Show resolved Hide resolved

nikolaperic force-pushed the save-restore-opt branch from 470dbc5 to 1af2000 Compare December 15, 2022 09:12

nikolaperic changed the title ~~Enable save/restore generation when there is a gap between registers~~ Optimization of save/restore generation; Two-step stack pointer setup; Frame pointer setup Mar 13, 2023

nikolaperic changed the title ~~Optimization of save/restore generation; Two-step stack pointer setup; Frame pointer setup~~ Frame lowering optimizations Mar 13, 2023

nikolaperic force-pushed the save-restore-opt branch from 0711341 to 7213c41 Compare March 13, 2023 16:16

nikolaperic changed the base branch from nanomips-llvm13 to nanomips-llvm13-submission March 22, 2023 12:14

Nikola Peric added 3 commits April 12, 2023 11:36

Enable save/restore generation when there is a gap between registers

b810f28

Enable save/restore generation when there is a gap between registers by inserting the missing ones at the moment when spill slots for callee-saved registers are assigned.

NanoMips: Two-step stack setup

9a56b55

Adjust stack pointer in two steps if offset is larger than 4096. In the first step adjust stack pointer for the size necessary to spill CSR onto the stack. In the second step adjust stack pointer for the size necessary to spill local objects.

nikolaperic force-pushed the save-restore-opt branch from 7213c41 to fc5fa13 Compare April 12, 2023 09:39

cme reviewed Jun 28, 2023

View reviewed changes

Merged same if instructions

b905942

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Frame lowering optimizations #1

Frame lowering optimizations #1

nikolaperic commented Nov 3, 2022

nikolaperic commented Dec 15, 2022

nikolaperic commented Mar 13, 2023

djtodoro commented Mar 22, 2023

cme Mar 21, 2023

nikolaperic Jul 26, 2023 •

edited

Loading

djtodoro commented Jun 4, 2024

Frame lowering optimizations #1

Are you sure you want to change the base?

Frame lowering optimizations #1

Conversation

nikolaperic commented Nov 3, 2022

nikolaperic commented Dec 15, 2022

nikolaperic commented Mar 13, 2023

djtodoro commented Mar 22, 2023

cme Mar 21, 2023

Choose a reason for hiding this comment

nikolaperic Jul 26, 2023 • edited Loading

Choose a reason for hiding this comment

djtodoro commented Jun 4, 2024

nikolaperic Jul 26, 2023 •

edited

Loading