Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Coroutines] ABI Object #5

Conversation

TylerNowicki
Copy link
Owner

@TylerNowicki TylerNowicki commented Sep 3, 2024

  • Primary change, add the ABI object and expose interfaces.
  • Add a unit test for the exposed ABI object.

See RFC for more info: https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057

@TylerNowicki TylerNowicki force-pushed the users/tylernowicki/coro-refactor6 branch from 5938a30 to f3be722 Compare September 3, 2024 21:07
@TylerNowicki TylerNowicki force-pushed the users/tylernowicki/coro-refactor5 branch from 8795ab2 to 4d1fb16 Compare September 3, 2024 21:32
@TylerNowicki TylerNowicki force-pushed the users/tylernowicki/coro-refactor6 branch from f3be722 to 52d8f3f Compare September 3, 2024 21:35
@TylerNowicki TylerNowicki force-pushed the users/tylernowicki/coro-refactor5 branch from 4d1fb16 to 054daf1 Compare September 4, 2024 17:38
@TylerNowicki TylerNowicki force-pushed the users/tylernowicki/coro-refactor6 branch from 52d8f3f to f57bffa Compare September 4, 2024 18:04
@TylerNowicki TylerNowicki changed the title [llvm/llvm-project][Coroutines] ABI Object [Coroutines] ABI Object Sep 11, 2024
@TylerNowicki TylerNowicki force-pushed the users/tylernowicki/coro-refactor5 branch 2 times, most recently from 84fbb42 to 45fdd5a Compare September 11, 2024 15:50
@TylerNowicki TylerNowicki force-pushed the users/tylernowicki/coro-refactor6 branch from f57bffa to 2ceb2c0 Compare September 12, 2024 18:31
@TylerNowicki TylerNowicki force-pushed the users/tylernowicki/coro-refactor5 branch from 45fdd5a to fd2bd5c Compare September 12, 2024 18:33
@TylerNowicki TylerNowicki force-pushed the users/tylernowicki/coro-refactor6 branch from 2ceb2c0 to 31866b5 Compare September 12, 2024 22:52
@TylerNowicki TylerNowicki force-pushed the users/tylernowicki/coro-refactor5 branch from fd2bd5c to 4baaa38 Compare September 13, 2024 13:37
cjacek and others added 17 commits September 13, 2024 15:42
Instead of ImportFile. This is a preparation for ARM64EC support, which
has both x86 and ARM64EC thunks and each of them needs a separate flag.
…vec),c3) (llvm#107987)

Extends existing trunc(extract_elt(vec,c1)) -> extract_elt(bitcast(vec),c3) fold.

Noticed while working on llvm#107404
…rns with RVV. (llvm#108470)

We need to insert a insert_subvector or extract_subvector which feels
pretty custom.

This should make it easier to support fixed vector arguments for GISel.
Reorder the arg layout to match (most) other lowerShuffle* calls.

Rename to lowerShuffleWithEXPAND to match other lowering cases where we lower to a single node.
Winograd lowering involves a number of matmul and batch_matmul which
are currently passed tensor.empty result as out parameter, thereby
are undefined behaviour. This commit adds the necessary linalg.fill.

---------

Co-authored-by: Max191 <[email protected]>
ARM64EC import thunks function similarly to regular ARM64 thunks but use
a mangled name and perform the call through the auxiliary IAT.
SchrodingerZhu and others added 26 commits September 13, 2024 11:10
This patch removes the getter for the mentioned mapping. This was only
kept around to keep things in sync for some downstream codebases (that
didn't even end up needing it), so removing it now that it is not needed
anymore.
…scape' (NFC) (llvm#108586)

The checker was indicated as a 'C' language checker but is only applicable to 'ObjC' code.
lldb-server built with NativeProcessLinux.cpp and
NativeProcessFreeBSD.cpp can use breakpoints to implement instruction
stepping on cores where there is no native instruction-step primitive.
Currently these set a breakpoint, continue, and if we hit the breakpoint
with the original thread, set the stop reason to be "trace".

I am wrapping up a change to lldb's breakpoint algorithm where I change
its current behavior of

"if a thread stops at a breakpoint site, we set
the thread's stop reason to breakpoint-hit, even if the breakpoint
hasn't been executed" +
"when resuming any thread at a breakpoint site, instruction-step past
the breakpoint before resuming"

to a behavior of

"when a thread executes a breakpoint, set the stop reason to
breakpoint-hit" +
"when a thread has hit a breakpoint, when the thread resumes, we
silently step past the breakpoint and then resume the thread".

For these lldb-server targets doing breakpoint stepping, this means that
if we are sitting on a breakpoint that has not yet executed, and
instruction-step the thread, we will execute the breakpoint instruction
at $pc (instead of $next-pc where it meant to go), and stop again -- at
the same pc value. Then we will rewrite the stop reason to 'trace'. The
higher level logic will see that we haven't hit the breakpoint
instruction again, so it will try to instruction step again, hitting the
breakpoint again forever.

To fix this, I'm checking that the thread matches the one we are
instruction-stepping-by-breakpoint AND that we've stopped at the
breakpoint address we are stepping to. Only in that case will the stop
reason be rewritten to "trace" hiding the implementation detail that the
step was done by breakpoints.
xusheng added support for swbreak/hwbreak a month ago, and no special
support was needed in ProcessGDBRemote when they're received because
lldb already marks a thread as having hit a breakpoint when it stops at
a breakpoint site. However, with changes I am working on, we need to
know the real stop reason a thread stopped or the breakpoint hit will
not be recognized.

This is similar to how lldb processes the "watch/rwatch/awatch" keys in
a thread stop packet -- we set the `reason` to `watchpoint`, and these
set it to `breakpoint` so we set the stop reason correctly later in
these methods.
https://wg21.link/p1869r1: Rename `condition_variable_any` interruptible
wait methods

The paper was implemented as experimental feature in Clang 18 in:
llvm@4fa812b

Experimental status removed in:
llvm#107900

Closes llvm#100031

---------

Co-authored-by: Hristo Hristov <[email protected]>
…108584)

Update mma tests in prep for changes needed in a followup patch for
llvm#107229.

Checks for ``clang/test/CodeGen/PowerPC/builtins-ppc-pair-mma-types.c``
seem to have been manually upated to rename temp variables even though
it says checks was auto generated. Regenerate via script.

Add noopt checks for
``clang/test/CodeGen/PowerPC/builtins-ppc-build-pair-mma.c``.
In llvm#95312 Minidump file creation was moved from being created at the
end, to the file being emitted in chunks. This causes some undesirable
behavior where the file can still be present after an error has
occurred. To resolve this we will now delete the file upon an error.
This patch implements a simple version of the pipeline parsing function.
It currently only handles a single FPM and adds function passes to it.
…ed. NFC (llvm#108572)

We have generic isel support for Zvkb and Zvbb.
This avoids issuing the deprecation diagnostic when building the module.

Not building it into a module shouldn't cause any negative impacts,
since it no longer has any declarations other than the header guard.
It's also very rarely included by anything.

Addresses
llvm#96246 (comment)
This patch implements sandboxir::DSOLocalEquivalent mirroring
llvm::DSOLocalEquivalent.
This patch enables the basic skeleton enablement of AMD next gen zen5 CPUs.
Once we modernize CopyInfo with default member initializations,

  Copies.insert({Unit, ...})

becomes equivalent to:

  Copies.try_emplace(Unit)

which we can simplify further down to Copies[Unit].
…meter (llvm#107641)

Describe how the issue that is diagnosed by this check can be resolved.
Namely, by adding an overload for the xvalue case (`&&` parameter).

Fixes llvm#107600
…107659)

Update the GPU to NVVM lowerings to correctly propagate range
information on IDs and dimension queries, etiher from
known_{block,grid}_size attributes or from `upperBound` annotations on
the operations themselves.
…children provider (llvm#108414)

Our customers is reporting a serious performance issue (expanding a this
pointer takes 70 seconds in VSCode) in a specific execution context.

Profiling shows the hot path is triggered by an expression evaluation
from libStdC++ synthetic children provider for `std::vector<bool>` since
it uses `CreateValueFromExpression()`.

This PR added a new `SBValue::CreateBoolValue()` API and switch
`std::vector<bool>` synthetic children provider to use the new API
without performing expression evaluation.

Note: there might be other cases of `CreateValueFromExpression()` in our
summary/synthetic children providers which I will sweep through in later
PRs.

With this PR, the customer's scenario reduces from 70 seconds => 50
seconds. I will add other PRs to further optimize the remaining 50
seconds (mostly from type/namespace lookup).

Testing:

`test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/vbool/TestDataFormatterStdVBool.py`
passes with the PR

---------

Co-authored-by: jeffreytan81 <[email protected]>
… symbols are created (llvm#106791)

Summary:
This improves the performance of ObjectFileMacho::ParseSymtab by
removing eager and expensive work in favor of doing it later in a
less-expensive fashion.

Experiment:
My goal was to understand LLDB's startup time.
First, I produced a Debug build of LLDB (no dSYM) and a
Release+NoAsserts build of LLDB. The Release build debugged the Debug
build as it debugged a small C++ program. I found that
ObjectFileMachO::ParseSymtab accounted for somewhere between 1.2 and 1.3
seconds consistently. After applying this change, I consistently
measured a reduction of approximately 100ms, putting the time closer to
1.1s and 1.2s on average.

Background:
ObjectFileMachO::ParseSymtab will incrementally create symbols by
parsing nlist entries from the symtab section of a MachO binary. As it
does this, it eagerly tries to determine the size of symbols (e.g. how
long a function is) using LC_FUNCTION_STARTS data (or eh_frame if
LC_FUNCTION_STARTS is unavailable). Concretely, this is done by
performing a binary search on the function starts array and calculating
the distance to the next function or the end of the section (whichever
is smaller).

However, this work is unnecessary for 2 reasons:
1. If you have debug symbol entries (i.e. STABs), the size of a function
is usually stored right after the function's entry. Performing this work
right before parsing the next entry is unnecessary work.
2. Calculating symbol sizes for symbols of size 0 is already performed
in `Symtab::InitAddressIndexes` after all the symbols are added to the
Symtab. It also does this more efficiently by walking over a list of
symbols sorted by address, so the work to calculate the size per symbol
is constant instead of O(log n).
Like most other i128 operations, this adds scalarization for i128 vector
shifts. Which in turn allows a few other operations to legalize too.
Similar to other i128 bit operations, we scalarizer any icmps or selects larger
than 64bits.
Adds a python script to automatically take output from a failed clang
-verify test and update the test case(s) to expect the new behaviour.
* To create custom ABIs plugin libraries need access to CoroShape.
* As a step in enabling plugin libraries, move Shape into its own header
* The header will eventually be moved into include/llvm/Transforms/Coroutines

See RFC for more info:
https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
* Refactor buildFrom to separate the analysis, abi related operations,
  tidying and bailout.
* In a follow-up PR the code in initABI will be moved to an ABI object
  init method. And the Shape constructor will no longer perform any
  lowering, instead it will just call analysis. This will make the Shape
  object a bit more useful because it can be constructed and used
  anywhere. It may even be useful to make it an analysis pass.
* In a follow-up PR the OptimizeFrame flag will also be removed from the
  Shape and instead will be passed directly to buildCoroutineFrame
  (although it would be nice to find another way to trigger this
  optimization). This is the only thing that Shape cannot determine from
  the Function/Coroutine, but it is only needed within
  buildCoroutineFrame.
* Note, that it was necessary to introduce two new SmallVectors, one to
  track CoroFrames and the other for UnusedCoroSaves. The tidyCoroutine
  method requires both, while invalidateCoroutine (bailout) method just
  requires the former.

See RFC for more info: https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
@TylerNowicki TylerNowicki force-pushed the users/tylernowicki/coro-refactor6 branch from 31866b5 to ba9e523 Compare September 13, 2024 18:23
TylerNowicki pushed a commit that referenced this pull request Sep 17, 2024
When SPARC Asan testing is enabled by PR llvm#107405, many Linux/sparc64
tests just hang like
```
#0  0xf7ae8e90 in syscall () from /usr/lib32/libc.so.6
#1  0x701065e8 in __sanitizer::FutexWait(__sanitizer::atomic_uint32_t*, unsigned int) ()
    at compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp:766
#2  0x70107c90 in Wait ()
    at compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp:35
#3  0x700f7cac in Lock ()
    at compiler-rt/lib/asan/../sanitizer_common/sanitizer_mutex.h:196
#4  Lock ()
    at compiler-rt/lib/asan/../sanitizer_common/sanitizer_thread_registry.h:98
#5  LockThreads ()
    at compiler-rt/lib/asan/asan_thread.cpp:489
#6  0x700e9c8c in __asan::BeforeFork() ()
    at compiler-rt/lib/asan/asan_posix.cpp:157
#7  0xf7ac83f4 in ?? () from /usr/lib32/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
```
It turns out that this happens in tests using `internal_fork` (e.g.
invoking `llvm-symbolizer`): unlike most other Linux targets, which use
`clone`, Linux/sparc64 has to use `__fork` instead. While `clone`
doesn't trigger `pthread_atfork` handlers, `__fork` obviously does,
causing the hang.

To avoid this, this patch disables `InstallAtForkHandler` and lets the
ASan tests run to completion.

Tested on `sparc64-unknown-linux-gnu`.
TylerNowicki pushed a commit that referenced this pull request Oct 3, 2024
…ext is not fully initialized (llvm#110481)

As this comment around target initialization implies:
```
  // This can be NULL if we don't know anything about the architecture or if
  // the target for an architecture isn't enabled in the llvm/clang that we
  // built
```

There are cases where we might fail to call `InitBuiltinTypes` when
creating the backing `ASTContext` for a `TypeSystemClang`. If that
happens, the builtins `QualType`s, e.g., `VoidPtrTy`/`IntTy`/etc., are
not initialized and dereferencing them as we do in
`GetBuiltinTypeForEncodingAndBitSize` (and other places) will lead to
nullptr-dereferences. Example backtrace:
```
(lldb) run
Assertion failed: (!isNull() && "Cannot retrieve a NULL type pointer"), function getCommonPtr, file Type.h, line 958.
Process 2680 stopped
* thread llvm#15, name = '<lldb.process.internal-state(pid=2712)>', stop reason = hit program assert
    frame #4: 0x000000010cdf3cdc liblldb.20.0.0git.dylib`DWARFASTParserClang::ExtractIntFromFormValue(lldb_private::CompilerType const&, lldb_private::plugin::dwarf::DWARFFormValue const&) const (.cold.1) + 
liblldb.20.0.0git.dylib`DWARFASTParserClang::ParseObjCMethod(lldb_private::ObjCLanguage::MethodName const&, lldb_private::plugin::dwarf::DWARFDIE const&, lldb_private::CompilerType, ParsedDWARFTypeAttributes
, bool) (.cold.1):
->  0x10cdf3cdc <+0>:  stp    x29, x30, [sp, #-0x10]!
    0x10cdf3ce0 <+4>:  mov    x29, sp
    0x10cdf3ce4 <+8>:  adrp   x0, 545
    0x10cdf3ce8 <+12>: add    x0, x0, #0xa25 ; "ParseObjCMethod"
Target 0: (lldb) stopped.
(lldb) bt
* thread llvm#15, name = '<lldb.process.internal-state(pid=2712)>', stop reason = hit program assert
    frame #0: 0x0000000180d08600 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000180d40f50 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x0000000180c4d908 libsystem_c.dylib`abort + 128
    frame #3: 0x0000000180c4cc1c libsystem_c.dylib`__assert_rtn + 284
  * frame #4: 0x000000010cdf3cdc liblldb.20.0.0git.dylib`DWARFASTParserClang::ExtractIntFromFormValue(lldb_private::CompilerType const&, lldb_private::plugin::dwarf::DWARFFormValue const&) const (.cold.1) + 
    frame #5: 0x0000000109d30acc liblldb.20.0.0git.dylib`lldb_private::TypeSystemClang::GetBuiltinTypeForEncodingAndBitSize(lldb::Encoding, unsigned long) + 1188
    frame #6: 0x0000000109aaaed4 liblldb.20.0.0git.dylib`DynamicLoaderMacOS::NotifyBreakpointHit(void*, lldb_private::StoppointCallbackContext*, unsigned long long, unsigned long long) + 384
```

This patch adds a one-time user-visible warning for when we fail to
initialize the AST to indicate that initialization went wrong for the
given target. Additionally, we add checks for whether one of the
`ASTContext` `QualType`s is invalid before dereferencing any builtin
types.

The warning would look as follows:
```
(lldb) target create "a.out"
Current executable set to 'a.out' (arm64).
(lldb) b main
warning: Failed to initialize builtin ASTContext types for target 'some-unknown-triple'. Printing variables may behave unexpectedly.
Breakpoint 1: where = a.out`main + 8 at stepping.cpp:5:14, address = 0x0000000100003f90
```

rdar://134869779
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.