Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frame lowering optimizations #1

Open
wants to merge 4 commits into
base: nanomips-llvm13-submission
Choose a base branch
from

Conversation

nikolaperic
Copy link
Collaborator

Enable save/restore generation when there is a gap between registers by inserting the missing ones at the moment when spill slots for callee-saved registers are assigned.

llvm/lib/Target/Mips/MipsSEFrameLowering.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/Mips/MipsSEFrameLowering.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/Mips/MipsSEFrameLowering.cpp Outdated Show resolved Hide resolved
@nikolaperic
Copy link
Collaborator Author

Thanks for the suggestions

@nikolaperic nikolaperic changed the title Enable save/restore generation when there is a gap between registers Optimization of save/restore generation; Two-step stack pointer setup; Frame pointer setup Mar 13, 2023
@nikolaperic
Copy link
Collaborator Author

Added two new commits that address other issues, but that rely on changes from the previous save/restore optimization commit.

In the first newly added commit, two-step SP setup is implemented if offset is greater than 4096. This fixes FrameLowering for large offsets.

In the second newly added commit, the FP setup is implemented so that if function wants to use it, FP will point to an address at -4096 from the beginning of function's stack.

@nikolaperic nikolaperic changed the title Optimization of save/restore generation; Two-step stack pointer setup; Frame pointer setup Frame lowering optimizations Mar 13, 2023
@djtodoro
Copy link
Collaborator

@nikolaperic Please set the target branch to nanomips-llvm13-submission.

@nikolaperic nikolaperic changed the base branch from nanomips-llvm13 to nanomips-llvm13-submission March 22, 2023 12:14
Nikola Peric added 3 commits April 12, 2023 11:36
Enable save/restore generation when there is a gap between registers
by inserting the missing ones at the moment when spill slots for
callee-saved registers are assigned.
Adjust stack pointer in two steps if offset is larger than 4096.
In the first step adjust stack pointer for the size necessary to
spill CSR onto the stack. In the second step adjust stack pointer
for the size necessary to spill local objects.
If function uses FP, it will now point to address -4096 from the
beginning of function's stack. After FP setup following offsets
will be relative to SP if function has no var-sized objects. If it
has var-sized objects offsets will be relative to FP.
Also, stack realignment now happens with INS instruction.
llvm/lib/Target/Mips/MipsSEFrameLowering.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/Mips/MipsSEFrameLowering.cpp Outdated Show resolved Hide resolved
// Move MBBI_2 to point to the first instruction after
// calle-saves store sequence. That's the place for the second
// steck pointer adjustment.
std::advance(MBBI_2, NumOfCSI);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is assuming that the prologue definitely has one instruction per callee save, and that they're right here? That might be the case but in future where save/restore are generated might invalidate this. It would be a good idea to at least assert that the instructions are what we assume them to be and document that assumption.

Copy link
Collaborator Author

@nikolaperic nikolaperic Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's the safer way. However, in order to do that I think we need to introduce a new function which checks opcodes and operands of skipped instructions. This function will be nanomips-specific so I think it's not a good idea to write it inside MipsSEFrameLowering. Maybe a good place to write such function is inside MipsFunctionInfo since we have already put CalleeSavedStackSize variable there? Is this a good approach for this problem?

Note that on the line 452 we already had this kind of skipping CSR store instructions.

llvm/lib/Target/Mips/MipsSEFrameLowering.cpp Outdated Show resolved Hide resolved
cme pushed a commit that referenced this pull request Sep 18, 2023
This reverts commit 5fb4134.

This patch is causing crashes when building llvm-test-suite when
optimizing for CPUs with AVX512.

Reproducer crashing with llc:

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-apple-macosx"

    define i32 @test(<32 x i32> %0) #0 {
    entry:
      %1 = mul <32 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
      %2 = tail call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> %1)
      ret i32 %2
    }

    ; Function Attrs: nocallback nofree nosync nounwind readnone willreturn
    declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>) #1

    attributes #0 = { "min-legal-vector-width"="0" "target-cpu"="skylake-avx512" }
    attributes #1 = { nocallback nofree nosync nounwind readnone willreturn }

(cherry picked from commit f912bab)
cme pushed a commit that referenced this pull request Sep 18, 2023
We experienced some deadlocks when we used multiple threads for logging
using `scan-builds` intercept-build tool when we used multiple threads by
e.g. logging `make -j16`

```
(gdb) bt
#0  0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f2bb3d152e4 in ?? ()
#3  0x00007ffcc5f0cc80 in ?? ()
#4  0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2
#5  0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007f2bb3d144ee in ?? ()
#8  0x746e692f706d742f in ?? ()
#9  0x692d747065637265 in ?? ()
#10 0x2f653631326b3034 in ?? ()
#11 0x646d632e35353532 in ?? ()
#12 0x0000000000000000 in ?? ()
```

I think the gcc's exit call caused the injected `libear.so` to be unloaded
by the `ld`, which in turn called the `void on_unload() __attribute__((destructor))`.
That tried to acquire an already locked mutex which was left locked in the
`bear_report_call()` call, that probably encountered some error and
returned early when it forgot to unlock the mutex.

All of these are speculation since from the backtrace I could not verify
if frames 2 and 3 are in fact corresponding to the `libear.so` module.
But I think it's a fairly safe bet.

So, hereby I'm releasing the held mutex on *all paths*, even if some failure
happens.

PS: I would use lock_guards, but it's C.

Reviewed-by: NoQ

Differential Revision: https://reviews.llvm.org/D118439

(cherry picked from commit d919d02)
cme pushed a commit that referenced this pull request Sep 18, 2023
Change https://reviews.llvm.org/D140059 exposed the following crash in
Z3Solver, where bit widths were not checked consistently with that
change. This change makes the check consistent, and fixes the crash.

```
clang: <root>/llvm/include/llvm/ADT/APSInt.h:99:
  int64_t llvm::APSInt::getExtValue() const: Assertion
  `isRepresentableByInt64() && "Too many bits for int64_t"' failed.
...
Stack dump:
0. Program arguments: clang -cc1 -internal-isystem <root>/lib/clang/16/include
  -nostdsysteminc -analyze -analyzer-checker=core,unix.Malloc,debug.ExprInspection
  -analyzer-config crosscheck-with-z3=true -verify reproducer.c

 #0 0x00000000045b3476 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int)
  <root>/llvm/lib/Support/Unix/Signals.inc:567:22
 #1 0x00000000045b3862 PrintStackTraceSignalHandler(void*)
  <root>/llvm/lib/Support/Unix/Signals.inc:641:1
 #2 0x00000000045b14a5 llvm::sys::RunSignalHandlers()
  <root>/llvm/lib/Support/Signals.cpp:104:20
 #3 0x00000000045b2eb4 SignalHandler(int)
  <root>/llvm/lib/Support/Unix/Signals.inc:412:1
 ...
 #9 0x0000000004be2eb3 llvm::APSInt::getExtValue() const
  <root>/llvm/include/llvm/ADT/APSInt.h:99:5
  <root>/llvm/lib/Support/Z3Solver.cpp:740:53
  clang::ASTContext&, clang::ento::SymExpr const*, llvm::APSInt const&, llvm::APSInt const&, bool)
  <root>/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SMTConv.h:552:61
```

Reviewed By: steakhal

Differential Revision: https://reviews.llvm.org/D142627

(cherry picked from commit f027dd5)
@djtodoro
Copy link
Collaborator

djtodoro commented Jun 4, 2024

@nikolaperic I think we picked different approach for this, right? Can we close this (and make sure we do have it in LLVM 16)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants