[Coroutines] ABI Object #5

TylerNowicki · 2024-09-03T17:22:36Z

Primary change: add the ABI object and expose interfaces.
Add a unit test for the exposed ABI object.

See RFC for more info: https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057
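As a rough illustration of what an "ABI object" can look like, the sketch below puts the ABI-specific lowering steps behind one interface so that a custom ABI can override a single step. Every name here (`Shape`, `BaseABI`, `CustomABI`, `lower`, and the three method names) is a hypothetical stand-in chosen for the example and is not taken from the patch or the RFC.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the coroutine shape. Here it only records
// which lowering steps ran, so the control flow is observable.
struct Shape {
  std::vector<std::string> Log;
};

// Hypothetical ABI interface: one object owns the ABI-specific steps.
class BaseABI {
public:
  explicit BaseABI(Shape &S) : S(S) {}
  virtual ~BaseABI() = default;

  virtual void init() { S.Log.push_back("base-init"); }
  virtual void buildCoroutineFrame() { S.Log.push_back("base-frame"); }
  virtual void splitCoroutine() { S.Log.push_back("base-split"); }

protected:
  Shape &S;
};

// A plugin-defined ABI overrides only the step it needs to customize.
class CustomABI : public BaseABI {
public:
  using BaseABI::BaseABI;
  void buildCoroutineFrame() override { S.Log.push_back("custom-frame"); }
};

// The driver is written once against the interface.
void lower(BaseABI &ABI) {
  ABI.init();
  ABI.buildCoroutineFrame();
  ABI.splitCoroutine();
}
```

Calling `lower` on a `CustomABI` runs the base `init` and `splitCoroutine` steps but the custom frame-building step, which is the kind of extension point the description refers to.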

Winograd lowering involves a number of matmul and batch_matmul ops that are currently passed a tensor.empty result as the out parameter, which is undefined behaviour. This commit adds the necessary linalg.fill. --------- Co-authored-by: Max191 <[email protected]>

…lvm#108476) Change TypeNodesEmitter to use const RecordKeeper. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…lvm#108477) Change SACheckersEmitter to use const RecordKeeper. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…m#108478) Change SyntaxEmitter to use const RecordKeeper. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…108500) Change MVE Emitter to use const RecordKeeper. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…108501) Change NeonEmitter to use const RecordKeeper. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…m#108502) Change RISCVVEmitter to use const RecordKeeper. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…108503) Change SVE Emitter to use const RecordKeeper. This is part of an effort to improve const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

lldb-server built with NativeProcessLinux.cpp and NativeProcessFreeBSD.cpp can use breakpoints to implement instruction stepping on cores where there is no native instruction-step primitive. Currently these set a breakpoint, continue, and if we hit the breakpoint with the original thread, set the stop reason to be "trace".

I am wrapping up a change to lldb's breakpoint algorithm where I change its current behavior of "if a thread stops at a breakpoint site, we set the thread's stop reason to breakpoint-hit, even if the breakpoint hasn't been executed" + "when resuming any thread at a breakpoint site, instruction-step past the breakpoint before resuming" to a behavior of "when a thread executes a breakpoint, set the stop reason to breakpoint-hit" + "when a thread has hit a breakpoint, when the thread resumes, we silently step past the breakpoint and then resume the thread".

For these lldb-server targets doing breakpoint stepping, this means that if we are sitting on a breakpoint that has not yet executed, and instruction-step the thread, we will execute the breakpoint instruction at $pc (instead of $next-pc where it meant to go), and stop again -- at the same pc value. Then we will rewrite the stop reason to 'trace'. The higher level logic will see that we haven't hit the breakpoint instruction again, so it will try to instruction step again, hitting the breakpoint again forever.

To fix this, I'm checking that the thread matches the one we are instruction-stepping-by-breakpoint AND that we've stopped at the breakpoint address we are stepping to. Only in that case will the stop reason be rewritten to "trace" hiding the implementation detail that the step was done by breakpoints.
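The two-part check described above can be sketched as a small predicate. This is an illustration only: `PendingStep`, `StopReason`, and `classifyStop` are hypothetical names and do not correspond to the actual lldb-server types.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical record of an in-flight "step implemented by breakpoint".
struct PendingStep {
  std::uint64_t ThreadID;  // thread being instruction-stepped
  std::uint64_t TargetPC;  // address of the temporary breakpoint
};

enum class StopReason { Breakpoint, Trace };

// Report "trace" only when BOTH conditions hold: the stopping thread is the
// one being stepped, and it stopped at the address we were stepping to.
// Any other breakpoint stop keeps its real reason.
StopReason classifyStop(const std::optional<PendingStep> &Step,
                        std::uint64_t StoppedThread,
                        std::uint64_t StoppedPC) {
  if (Step && Step->ThreadID == StoppedThread && Step->TargetPC == StoppedPC)
    return StopReason::Trace;
  return StopReason::Breakpoint;
}
```

With only the thread check, a stop at the original pc (the not-yet-executed breakpoint) would also be reported as "trace", which is the loop the description warns about.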

xusheng added support for swbreak/hwbreak a month ago, and no special support was needed in ProcessGDBRemote when they're received because lldb already marks a thread as having hit a breakpoint when it stops at a breakpoint site. However, with changes I am working on, we need to know the real reason a thread stopped, or the breakpoint hit will not be recognized. This is similar to how lldb processes the "watch/rwatch/awatch" keys in a thread stop packet: for those we set the `reason` to `watchpoint`, and for swbreak/hwbreak we set it to `breakpoint`, so that the stop reason is set correctly later in these methods.

https://wg21.link/p1869r1: Rename `condition_variable_any` interruptible wait methods

The paper was implemented as an experimental feature in Clang 18 in: llvm@4fa812b
Experimental status removed in: llvm#107900

Closes llvm#100031

---------

Co-authored-by: Hristo Hristov <[email protected]>

…108584) Update mma tests in prep for changes needed in a followup patch for llvm#107229. Checks for ``clang/test/CodeGen/PowerPC/builtins-ppc-pair-mma-types.c`` seem to have been manually updated to rename temp variables even though the file says the checks were auto-generated. Regenerate via script. Add noopt checks for ``clang/test/CodeGen/PowerPC/builtins-ppc-build-pair-mma.c``.

In llvm#95312, Minidump file creation was changed from writing the file once at the end to emitting it in chunks. This causes some undesirable behavior where a partial file can still be present after an error has occurred. To resolve this, we now delete the file upon an error.
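The clean-up-on-error idea can be sketched with the standard library alone. This is not the lldb Minidump code: `writeChunked` is a hypothetical helper, and treating an empty chunk as a failure is a made-up condition used only to trigger the error path.

```cpp
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Write chunks to Path one at a time. If any chunk fails (modelled here by
// an empty chunk, an assumption for the sake of the example), close the
// stream and remove the partially written file before reporting failure.
bool writeChunked(const std::string &Path,
                  const std::vector<std::string> &Chunks) {
  std::ofstream Out(Path, std::ios::binary | std::ios::trunc);
  if (!Out)
    return false;
  for (const std::string &Chunk : Chunks) {
    if (Chunk.empty() || !(Out << Chunk)) {
      Out.close();
      std::remove(Path.c_str());  // do not leave a truncated file behind
      return false;
    }
  }
  return true;  // the destructor flushes and closes the stream
}
```

The point of the design is that a caller who sees `false` never finds a half-written file at `Path`.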

This avoids issuing the deprecation diagnostic when building the module. Not building it into a module shouldn't cause any negative impacts, since it no longer has any declarations other than the header guard. It's also very rarely included by anything. Addresses llvm#96246 (comment)

…107659) Update the GPU to NVVM lowerings to correctly propagate range information on IDs and dimension queries, either from known_{block,grid}_size attributes or from `upperBound` annotations on the operations themselves.

…children provider (llvm#108414)

Our customer is reporting a serious performance issue (expanding a this pointer takes 70 seconds in VSCode) in a specific execution context. Profiling shows the hot path is triggered by an expression evaluation from the libstdc++ synthetic children provider for `std::vector<bool>`, since it uses `CreateValueFromExpression()`.

This PR adds a new `SBValue::CreateBoolValue()` API and switches the `std::vector<bool>` synthetic children provider to use the new API without performing expression evaluation.

Note: there might be other uses of `CreateValueFromExpression()` in our summary/synthetic children providers, which I will sweep through in later PRs.

With this PR, the customer's scenario goes from 70 seconds to 50 seconds. I will add other PRs to further optimize the remaining 50 seconds (mostly from type/namespace lookup).

Testing: `test/API/functionalities/data-formatter/data-formatter-stl/libstdcpp/vbool/TestDataFormatterStdVBool.py` passes with the PR

---------

Co-authored-by: jeffreytan81 <[email protected]>

… symbols are created (llvm#106791)

Summary: This improves the performance of ObjectFileMachO::ParseSymtab by removing eager and expensive work in favor of doing it later in a less-expensive fashion.

Experiment: My goal was to understand LLDB's startup time. First, I produced a Debug build of LLDB (no dSYM) and a Release+NoAsserts build of LLDB. The Release build debugged the Debug build as it debugged a small C++ program. I found that ObjectFileMachO::ParseSymtab accounted for somewhere between 1.2 and 1.3 seconds consistently. After applying this change, I consistently measured a reduction of approximately 100ms, putting the time closer to 1.1s and 1.2s on average.

Background: ObjectFileMachO::ParseSymtab will incrementally create symbols by parsing nlist entries from the symtab section of a MachO binary. As it does this, it eagerly tries to determine the size of symbols (e.g. how long a function is) using LC_FUNCTION_STARTS data (or eh_frame if LC_FUNCTION_STARTS is unavailable). Concretely, this is done by performing a binary search on the function starts array and calculating the distance to the next function or the end of the section (whichever is smaller).

However, this work is unnecessary for 2 reasons:

1. If you have debug symbol entries (i.e. STABs), the size of a function is usually stored right after the function's entry. Performing this work right before parsing the next entry is unnecessary work.
2. Calculating symbol sizes for symbols of size 0 is already performed in `Symtab::InitAddressIndexes` after all the symbols are added to the Symtab. It also does this more efficiently by walking over a list of symbols sorted by address, so the work to calculate the size per symbol is constant instead of O(log n).
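The deferred approach in point 2 can be sketched as follows. This is a simplified model, not the `Symtab::InitAddressIndexes` implementation: `Symbol` and `fillSizes` are hypothetical, and the sketch assumes every symbol address lies below `SectionEnd`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Symbol {
  std::uint64_t Address;
  std::uint64_t Size;  // 0 means "not yet known"
};

// Fill in missing sizes after all symbols have been collected. The symbols
// are sorted by address once; after that, each size is the distance to the
// next symbol or to the end of the section, whichever is smaller, found in
// constant time per symbol rather than with a binary search per symbol.
void fillSizes(std::vector<Symbol> &Symbols, std::uint64_t SectionEnd) {
  std::sort(Symbols.begin(), Symbols.end(),
            [](const Symbol &A, const Symbol &B) {
              return A.Address < B.Address;
            });
  for (std::size_t I = 0; I < Symbols.size(); ++I) {
    if (Symbols[I].Size != 0)
      continue;  // size already known (e.g. from debug symbol entries)
    const std::uint64_t Next =
        I + 1 < Symbols.size() ? Symbols[I + 1].Address : SectionEnd;
    Symbols[I].Size = std::min(Next, SectionEnd) - Symbols[I].Address;
  }
}
```

Symbols whose size was already recorded are skipped, which mirrors the observation that eagerly computing a size just before the real one is parsed is wasted work.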

* To create custom ABIs, plugin libraries need access to CoroShape.
* As a step in enabling plugin libraries, move Shape into its own header.
* The header will eventually be moved into include/llvm/Transforms/Coroutines.

See RFC for more info: https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057

* Refactor buildFrom to separate the analysis, abi related operations, tidying and bailout.
* In a follow-up PR the code in initABI will be moved to an ABI object init method. And the Shape constructor will no longer perform any lowering, instead it will just call analysis. This will make the Shape object a bit more useful because it can be constructed and used anywhere. It may even be useful to make it an analysis pass.
* In a follow-up PR the OptimizeFrame flag will also be removed from the Shape and instead will be passed directly to buildCoroutineFrame (although it would be nice to find another way to trigger this optimization). This is the only thing that Shape cannot determine from the Function/Coroutine, but it is only needed within buildCoroutineFrame.
* Note, that it was necessary to introduce two new SmallVectors, one to track CoroFrames and the other for UnusedCoroSaves. The tidyCoroutine method requires both, while invalidateCoroutine (bailout) method just requires the former.

See RFC for more info: https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057

When SPARC Asan testing is enabled by PR llvm#107405, many Linux/sparc64 tests just hang like

```
#0  0xf7ae8e90 in syscall () from /usr/lib32/libc.so.6
#1  0x701065e8 in __sanitizer::FutexWait(__sanitizer::atomic_uint32_t*, unsigned int) () at compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp:766
#2  0x70107c90 in Wait () at compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp:35
#3  0x700f7cac in Lock () at compiler-rt/lib/asan/../sanitizer_common/sanitizer_mutex.h:196
#4  Lock () at compiler-rt/lib/asan/../sanitizer_common/sanitizer_thread_registry.h:98
#5  LockThreads () at compiler-rt/lib/asan/asan_thread.cpp:489
#6  0x700e9c8c in __asan::BeforeFork() () at compiler-rt/lib/asan/asan_posix.cpp:157
#7  0xf7ac83f4 in ?? () from /usr/lib32/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
```

It turns out that this happens in tests using `internal_fork` (e.g. invoking `llvm-symbolizer`): unlike most other Linux targets, which use `clone`, Linux/sparc64 has to use `__fork` instead. While `clone` doesn't trigger `pthread_atfork` handlers, `__fork` obviously does, causing the hang.

To avoid this, this patch disables `InstallAtForkHandler` and lets the ASan tests run to completion.

Tested on `sparc64-unknown-linux-gnu`.

…ext is not fully initialized (llvm#110481)

As this comment around target initialization implies:

```
// This can be NULL if we don't know anything about the architecture or if
// the target for an architecture isn't enabled in the llvm/clang that we
// built
```

There are cases where we might fail to call `InitBuiltinTypes` when creating the backing `ASTContext` for a `TypeSystemClang`. If that happens, the builtins `QualType`s, e.g., `VoidPtrTy`/`IntTy`/etc., are not initialized and dereferencing them as we do in `GetBuiltinTypeForEncodingAndBitSize` (and other places) will lead to nullptr-dereferences. Example backtrace:

```
(lldb) run
Assertion failed: (!isNull() && "Cannot retrieve a NULL type pointer"), function getCommonPtr, file Type.h, line 958.
Process 2680 stopped
* thread #15, name = '<lldb.process.internal-state(pid=2712)>', stop reason = hit program assert
    frame #4: 0x000000010cdf3cdc liblldb.20.0.0git.dylib`DWARFASTParserClang::ExtractIntFromFormValue(lldb_private::CompilerType const&, lldb_private::plugin::dwarf::DWARFFormValue const&) const (.cold.1) +
liblldb.20.0.0git.dylib`DWARFASTParserClang::ParseObjCMethod(lldb_private::ObjCLanguage::MethodName const&, lldb_private::plugin::dwarf::DWARFDIE const&, lldb_private::CompilerType, ParsedDWARFTypeAttributes , bool) (.cold.1):
->  0x10cdf3cdc <+0>:  stp    x29, x30, [sp, #-0x10]!
    0x10cdf3ce0 <+4>:  mov    x29, sp
    0x10cdf3ce4 <+8>:  adrp   x0, 545
    0x10cdf3ce8 <+12>: add    x0, x0, #0xa25 ; "ParseObjCMethod"
Target 0: (lldb) stopped.
(lldb) bt
* thread #15, name = '<lldb.process.internal-state(pid=2712)>', stop reason = hit program assert
    frame #0: 0x0000000180d08600 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000180d40f50 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x0000000180c4d908 libsystem_c.dylib`abort + 128
    frame #3: 0x0000000180c4cc1c libsystem_c.dylib`__assert_rtn + 284
  * frame #4: 0x000000010cdf3cdc liblldb.20.0.0git.dylib`DWARFASTParserClang::ExtractIntFromFormValue(lldb_private::CompilerType const&, lldb_private::plugin::dwarf::DWARFFormValue const&) const (.cold.1) +
    frame #5: 0x0000000109d30acc liblldb.20.0.0git.dylib`lldb_private::TypeSystemClang::GetBuiltinTypeForEncodingAndBitSize(lldb::Encoding, unsigned long) + 1188
    frame #6: 0x0000000109aaaed4 liblldb.20.0.0git.dylib`DynamicLoaderMacOS::NotifyBreakpointHit(void*, lldb_private::StoppointCallbackContext*, unsigned long long, unsigned long long) + 384
```

This patch adds a one-time user-visible warning for when we fail to initialize the AST to indicate that initialization went wrong for the given target. Additionally, we add checks for whether one of the `ASTContext` `QualType`s is invalid before dereferencing any builtin types.

The warning would look as follows:

```
(lldb) target create "a.out"
Current executable set to 'a.out' (arm64).
(lldb) b main
warning: Failed to initialize builtin ASTContext types for target 'some-unknown-triple'. Printing variables may behave unexpectedly.
Breakpoint 1: where = a.out`main + 8 at stepping.cpp:5:14, address = 0x0000000100003f90
```

rdar://134869779

TylerNowicki mentioned this pull request Sep 3, 2024

[Coroutines] Move the SuspendCrossingInfo analysis helper into its own header/source llvm/llvm-project#106306

Merged

TylerNowicki force-pushed the users/tylernowicki/coro-refactor6 branch from 5938a30 to f3be722 Compare September 3, 2024 21:07

TylerNowicki force-pushed the users/tylernowicki/coro-refactor5 branch from 8795ab2 to 4d1fb16 Compare September 3, 2024 21:32

TylerNowicki force-pushed the users/tylernowicki/coro-refactor6 branch from f3be722 to 52d8f3f Compare September 3, 2024 21:35

TylerNowicki force-pushed the users/tylernowicki/coro-refactor5 branch from 4d1fb16 to 054daf1 Compare September 4, 2024 17:38

TylerNowicki force-pushed the users/tylernowicki/coro-refactor6 branch from 52d8f3f to f57bffa Compare September 4, 2024 18:04

TylerNowicki changed the title ~~[llvm/llvm-project][Coroutines] ABI Object~~ [Coroutines] ABI Object Sep 11, 2024

TylerNowicki force-pushed the users/tylernowicki/coro-refactor5 branch 2 times, most recently from 84fbb42 to 45fdd5a Compare September 11, 2024 15:50

TylerNowicki force-pushed the users/tylernowicki/coro-refactor6 branch from f57bffa to 2ceb2c0 Compare September 12, 2024 18:31

TylerNowicki force-pushed the users/tylernowicki/coro-refactor5 branch from 45fdd5a to fd2bd5c Compare September 12, 2024 18:33

TylerNowicki force-pushed the users/tylernowicki/coro-refactor6 branch from 2ceb2c0 to 31866b5 Compare September 12, 2024 22:52

TylerNowicki force-pushed the users/tylernowicki/coro-refactor5 branch from fd2bd5c to 4baaa38 Compare September 13, 2024 13:37

cjacek and others added 17 commits September 13, 2024 15:42

[LLD][COFF][NFC] Store live flag in ImportThunkChunk. (llvm#108459)

6be9be5

Instead of ImportFile. This is a preparation for ARM64EC support, which has both x86 and ARM64EC thunks and each of them needs a separate flag.

[llvm][unittests] Strip unneeded uses of raw_string_ostream::str() (NFC)

387bee9

Avoid excess layer of indirection.

[DAG] Fold trunc(srl(extract_elt(vec,c1),c2)) -> extract_elt(bitcast(…

69a2115

…vec),c3) (llvm#107987) Extends existing trunc(extract_elt(vec,c1)) -> extract_elt(bitcast(vec),c3) fold. Noticed while working on llvm#107404
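To see why such a fold can be sound, consider a worked model with fixed types. The sketch assumes a little-endian lane order, a `<4 x i32>` source, an `i16` result, and a shift amount that is a multiple of 16. These choices, the helper names, and the formula `c3 = c1 * 2 + c2 / 16` are illustrative assumptions, not the conditions checked by the actual DAG combine.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec4x32 = std::array<std::uint32_t, 4>;  // models <4 x i32>
using Vec8x16 = std::array<std::uint16_t, 8>;  // same bits as <8 x i16>

// Bitcast <4 x i32> to <8 x i16> assuming little-endian lane order:
// the low half of wide element i becomes narrow lane 2*i.
Vec8x16 bitcastLE(const Vec4x32 &V) {
  Vec8x16 Out{};
  for (std::size_t I = 0; I < V.size(); ++I) {
    Out[2 * I] = static_cast<std::uint16_t>(V[I] & 0xFFFFu);
    Out[2 * I + 1] = static_cast<std::uint16_t>(V[I] >> 16);
  }
  return Out;
}

// Original pattern: trunc(srl(extract_elt(vec, c1), c2)) to i16.
std::uint16_t truncSrlExtract(const Vec4x32 &V, std::size_t C1, unsigned C2) {
  return static_cast<std::uint16_t>(V[C1] >> C2);
}

// Folded pattern: extract_elt(bitcast(vec), c3) with c3 = c1 * 2 + c2 / 16.
// Only meaningful when C2 is 0 or 16 in this model.
std::uint16_t extractBitcast(const Vec4x32 &V, std::size_t C1, unsigned C2) {
  const std::size_t C3 = C1 * 2 + C2 / 16;
  return bitcastLE(V)[C3];
}
```

Shifting by a whole narrow-lane width and truncating selects exactly one narrow lane, so the shift and truncate collapse into a different extract index.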

[X86] Use MCRegister in X86AsmParser. (llvm#108509)

a3ea018

[RISCV] Use CCValAssign::getCustomReg for fixed vector arguments/retu…

ee4582f

…rns with RVV. (llvm#108470) We need to insert an insert_subvector or extract_subvector, which feels pretty custom. This should make it easier to support fixed vector arguments for GISel.

[X86] Cleanup lowerShuffleToEXPAND arg layout. NFC.

4a9b6b0

Reorder the arg layout to match (most) other lowerShuffle* calls. Rename to lowerShuffleWithEXPAND to match other lowering cases where we lower to a single node.

[IRSim] Avoid repeated hash lookups (NFC) (llvm#108483)

ab06a18

[libc] implement clock_gettime using vDSO (llvm#108458)

99fe595

supersedes llvm#91805

[LLD][COFF] Add Support for ARM64EC Import Thunks (llvm#108460)

ea5d37f

ARM64EC import thunks function similarly to regular ARM64 thunks but use a mangled name and perform the call through the auxiliary IAT.

SchrodingerZhu and others added 26 commits September 13, 2024 11:10

[libc] fix build issue in overlay mode (llvm#108583)

a643836

[llvm-exegesis] Remove getter for RegNameToRegNo mapping

ff1de24

This patch removes the getter for the mentioned mapping. This was only kept around to keep things in sync for some downstream codebases (that didn't even end up needing it), so removing it now that it is not needed anymore.

[clang][analyzer][docs] Fix documentation of checker 'StackAddrAsyncE…

ffeb793

…scape' (NFC) (llvm#108586) The checker was indicated as a 'C' language checker but is only applicable to 'ObjC' code.

[libc] fix dependency path for vDSO (llvm#108591)

82987bd

[SandboxIR][PassRegistry] Parse pipeline string (llvm#108103)

f0f1b70

This patch implements a simple version of the pipeline parsing function. It currently only handles a single FPM and adds function passes to it.
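A minimal sketch of what parsing a flat pipeline string into a single function pass manager can look like is shown below. Everything in it is hypothetical: the `Pass` and `FunctionPassManager` types are stand-ins rather than the SandboxIR classes, and the comma-separated syntax and the name-to-pass registry are assumptions made for the example.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical pass type: a callable that mutates a "function", modelled
// here as a plain string.
using Pass = std::function<void(std::string &)>;

// Hypothetical stand-in for a function pass manager.
struct FunctionPassManager {
  std::vector<Pass> Passes;
  void run(std::string &F) const {
    for (const Pass &P : Passes)
      P(F);
  }
};

// Split Pipeline on commas, look each name up in Registry, and append the
// matching pass to a single FPM. Unknown (including empty) names throw.
FunctionPassManager parsePipeline(const std::string &Pipeline,
                                  const std::map<std::string, Pass> &Registry) {
  FunctionPassManager FPM;
  std::size_t Begin = 0;
  while (Begin <= Pipeline.size()) {
    std::size_t End = Pipeline.find(',', Begin);
    if (End == std::string::npos)
      End = Pipeline.size();
    const std::string Name = Pipeline.substr(Begin, End - Begin);
    const auto It = Registry.find(Name);
    if (It == Registry.end())
      throw std::invalid_argument("unknown pass: " + Name);
    FPM.Passes.push_back(It->second);
    Begin = End + 1;
  }
  return FPM;
}
```

Passes run in the order they appear in the string, and nesting or multiple pass managers are deliberately out of scope, matching the "single FPM" limitation stated above.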

[RISCV][Docs] Change Zvbb and Zvkb from 'Assembly Support' to Support…

8e2843b

…ed. NFC (llvm#108572) We have generic isel support for Zvkb and Zvbb.

[SandboxIR] Implement DSOLocalEquivalent (llvm#108473)

fffc7fb

This patch implements sandboxir::DSOLocalEquivalent mirroring llvm::DSOLocalEquivalent.

[X86] AMD Zen 5 Initial enablement (llvm#107964)

02e4186

This patch adds basic skeleton enablement for AMD's next-generation Zen 5 CPUs.

[CodeGen] Use DenseMap::operator[] (NFC) (llvm#108489)

b9d85b1

Once we modernize CopyInfo with default member initializations, Copies.insert({Unit, ...}) becomes equivalent to: Copies.try_emplace(Unit) which we can simplify further down to Copies[Unit].
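The equivalence can be illustrated with `std::unordered_map`, used here as a stand-in for `DenseMap` (an assumption made so the sketch needs only the standard library; it requires C++17 for `try_emplace`). The `CopyInfo` fields and the helper names are hypothetical.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical value type with default member initializers, standing in
// for the CopyInfo struct mentioned above.
struct CopyInfo {
  int DefReg = 0;
  bool Avail = true;
};

using CopyMap = std::unordered_map<std::string, CopyInfo>;

// Form 1: insert an explicitly default-constructed value.
CopyInfo &viaInsert(CopyMap &Copies, const std::string &Unit) {
  return Copies.insert({Unit, CopyInfo{}}).first->second;
}

// Form 2: try_emplace default-constructs the value only if the key is
// absent, and otherwise returns the existing entry.
CopyInfo &viaTryEmplace(CopyMap &Copies, const std::string &Unit) {
  return Copies.try_emplace(Unit).first->second;
}

// Form 3: operator[] performs the same lookup-or-default-insert.
CopyInfo &viaSubscript(CopyMap &Copies, const std::string &Unit) {
  return Copies[Unit];
}
```

All three return a reference to the same kind of entry. The simplification only works once the default-constructed value is the desired initial state, which is what the default member initializers provide.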

[NFC][clang-tidy] document fix to bugprone-return-const-ref-from-para…

4ef16e3

…meter (llvm#107641) Describe how the issue that is diagnosed by this check can be resolved. Namely, by adding an overload for the xvalue case (`&&` parameter). Fixes llvm#107600
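A small sketch of the pattern and of an `&&` overload follows. `Config` and `pick` are hypothetical names, and marking the overload as deleted is one possible way to write it, chosen here for illustration. An overload that returns by value would be another.

```cpp
#include <string>

struct Config {
  std::string Name;
};

// Flagged pattern: returning a const-reference parameter. A caller could
// pass a temporary, and the returned reference would then dangle.
const Config &pick(const Config &C) { return C; }

// Overload for the xvalue case. Deleting it turns `pick(Config{...})` into
// a compile error instead of a dangling reference at run time.
const Config &pick(Config &&) = delete;
```

With both overloads present, `pick(existing)` still binds to the `const &` version, while a call with a temporary selects the `&&` overload and is rejected at compile time.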

Add a comment in the SB API doc about keeping the SB API's lightweigh…

02d8813

…t. (llvm#108462)

[LiveDebugValues] Avoid repeated hash lookups (NFC) (llvm#108484)

3a27458

[AArch64][GISel] Scalarize i128 vector shifts.

7582308

Like most other i128 operations, this adds scalarization for i128 vector shifts. Which in turn allows a few other operations to legalize too.

[AArch64][GISel] Scalarize i128 ICmp and Select.

3a4b30e

Similar to other i128 bit operations, we scalarize any icmps or selects larger than 64 bits.

[Utils] add update-verify-tests.py (llvm#97369)

d4f41be

Adds a python script to automatically take output from a failed clang -verify test and update the test case(s) to expect the new behaviour.

TylerNowicki force-pushed the users/tylernowicki/coro-refactor6 branch from 31866b5 to ba9e523 Compare September 13, 2024 18:23

TylerNowicki closed this Sep 13, 2024
