This repository has been archived by the owner on Jan 20, 2024. It is now read-only.
Jc d2d memcpy #243
Closed
JonChesterfield wants to merge 10,000 commits into ROCm-Developer-Tools:main from JonChesterfield:jc_d2d_memcpy
…in the first function-attrs pass. Fix lit test too. Summary: Argument attributes such as NoAlias and ReadOnly can affect MemorySSA, and thus EarlyCSE, in the function simplification pipeline. https://reviews.llvm.org/D145210 adjusted the placement of PostOrderFunctionAttrs, so the inferred argument attributes were no longer available to the passes in that pipeline. This work (initiated by @nikic) unconditionally performs argument attribute inference in the first function-attrs pass. Reviewers: aeubanks and nikic Differential Revision: https://reviews.llvm.org/D156397 Change-Id: If9d1a1b165b708dddc03dfb4d33de2ee48e42844
Replace the ASO variant with upstream Debug.cpp Change-Id: I5a1b0ae8d49a9d8d7ab49ce7a37eb46bde9d8c1b
Change-Id: I98f9dbf2b938cfe1774bc25b22bb543f46027f6d
Change-Id: I63bbe1eb7b76198ea4d1663bf3be6041f8d3db40
…ssert messages Change-Id: Ia59e5bd3f9645213a15e68a959757749bef564aa
Change-Id: Id991d428db0c107790505e631257c3122e33281c
unxfails 3 tests xfails: clang/test/CodeGen/X86/sm3-error.c Change-Id: If8017b561cc0534a1c717119ecf861f8d6288d5a
Change-Id: If4d53d1e9fe69bd8d21eb155c0552acb03706d55
…upstream no longer uses named variables for master-worker handshaking in generic kernels Change-Id: I50ca6c010f4c0b7c58b706385ceb71cea32f7c28
Change-Id: I415b21f1ce259e2ec018765ed97748d715e83de6
Change-Id: Id4842d20619920792cfb76b939695427857bf139
… target launch and data transfer operations. Implemented RAII objects, initialized at target entry points, that invoke tool-supplied callbacks. Updated the status of target callbacks to implemented. Depends on D127365. Patch from John Mellor-Crummey, with contributions from Dhruva Chakrabarti and Jan-Patrick Lehr. Reviewed By: jdoerfert, dhruvachak, jplehr Differential Revision: https://reviews.llvm.org/D127367 Change-Id: Ic6fcee7059aa4e6237e81e2a702adca6a26bcdc6
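The RAII pattern described above can be sketched as follows. This is a minimal illustration, not the libomptarget code: the callback signature, `Endpoint`, `TargetCallbackGuard` and `hypotheticalTargetLaunch` are all hypothetical names. The constructor reports the start of the operation and the destructor reports its end, so the tool sees a matching pair on every exit path.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical callback signature: the tool is told whether an operation is
// beginning or ending and which target operation it is.
enum class Endpoint { Begin, End };
using ToolCallback = std::function<void(Endpoint, const std::string &)>;

// RAII guard: the constructor reports "begin" and the destructor reports
// "end", so every exit path from the entry point, including an exception,
// is reported to the tool.
class TargetCallbackGuard {
public:
  TargetCallbackGuard(ToolCallback Cb, std::string Op)
      : Cb(std::move(Cb)), Op(std::move(Op)) {
    assert(this->Cb && "a tool callback must be registered");
    this->Cb(Endpoint::Begin, this->Op);
  }
  ~TargetCallbackGuard() { Cb(Endpoint::End, Op); }

  // Copying would report the same operation twice.
  TargetCallbackGuard(const TargetCallbackGuard &) = delete;
  TargetCallbackGuard &operator=(const TargetCallbackGuard &) = delete;

private:
  ToolCallback Cb;
  std::string Op;
};

// Hypothetical target entry point: the guard lives for the whole call.
int hypotheticalTargetLaunch(const ToolCallback &Cb) {
  TargetCallbackGuard Guard(Cb, "target_launch");
  // ... launch the kernel here ...
  return 0;
}
```

Tying the callbacks to object lifetime means an early `return` added to an entry point later cannot silently skip the "end" notification.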
Change-Id: I1a79f3e6361fece67bee02fb55a62849d650b15d
Change-Id: Ib53689bd8e7eed86af1cff01a97eb3d6592fe9e4
The attribute changes were left out of Clang 17. Attributes that used to take a string literal now accept an unevaluated string literal instead, which means they reject numeric escape sequences and string literals with an encoding prefix, although the latter was already ill-formed in most cases. To reject numeric escape sequences we need to know that we are about to parse an unevaluated string literal before doing so, so we derive from Attrs.td which attribute parameters are expected to be string literals. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D156237 Change-Id: I8dcf4c4de75a3f7b089d04cf25b9a20682fa72ff
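As a rough illustration of the rule stated above (not Clang's implementation), the hypothetical helper below inspects the source spelling of a string-literal token and rejects an encoding prefix (`u8`, `u`, `U`, `L`) or a numeric escape (`\x41`, `\101`), while still accepting simple escapes such as `\n`. Raw string literals are deliberately out of scope for this sketch.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper, not Clang's API: Tok is the full spelling of a string
// literal token, including quotes and any prefix. Returns true when it would
// be acceptable as an unevaluated string literal under the rule above.
bool isAcceptableUnevaluatedStringLiteral(std::string_view Tok) {
  // Any encoding prefix (u8, u, U, L) means the token does not start with '"'.
  if (Tok.size() < 2 || Tok.front() != '"' || Tok.back() != '"')
    return false;
  std::string_view Body = Tok.substr(1, Tok.size() - 2);
  for (std::string_view::size_type I = 0; I < Body.size(); ++I) {
    if (Body[I] != '\\')
      continue;
    assert(I + 1 < Body.size() && "well-formed token cannot end in a lone backslash");
    char Next = Body[I + 1];
    // Hex (\x41) and octal (\101, \0) escapes are numeric escape sequences.
    if (Next == 'x' || (Next >= '0' && Next <= '7'))
      return false;
    ++I; // skip the escaped character, e.g. the 'n' in \n
  }
  return true;
}
```

The check has to run before the literal is converted to its execution encoding, which is why the parser needs to know up front that a parameter expects an unevaluated string.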
…source_symbol' and 'uuid' attributes as unevaluated. This is complementary to D156237: these attributes have custom parsing logic. Reviewed By: cor3ntin Differential Revision: https://reviews.llvm.org/D159024 Change-Id: Icb6d3e0f9ea02b4058a567e5c998be71a3aea7c2
Change-Id: Ie972cb4507ff7b20727e7d1ce7275b18786f2efb
Change-Id: Ia0a6077c7a6eaef44d8a915123e11b4cf24489b0
Change-Id: I3c8e3576cef0a1cbf77bc7e76c0af527938128de
Change-Id: I17ec52cc181708d1134949c66781d2ae0be8ba9f
Change-Id: Iad8bd30ae767ce015d6d39987c7e8e4b07d186df
Change-Id: Id69edef44c1f1fb3eefe9a71413a00c831ba10ee
Revert "[OpenMPOpt] Allow indirect calls in AAKernelInfoCallSite (#65836)" Change-Id: I790c81ab7d92e0f828e81535cb131c6c45248138
Only a subset of the fields of DbgVariable is meaningful at any time, and some fields are re-used for multiple purposes (for example, FrameIndexExprs is used with a throw-away frame index of 0 to hold a single DIExpression without needing to add another member). The exact invariants must be reverse-engineered by inspecting the actual use of the class, its imprecise and outdated doc-comment, and some asserts. Refactor DbgVariable into a sum type by inheriting from std::variant. This makes the active fields for any given state explicit and removes the need to re-use fields in disparate contexts. As a bonus, it seems to reduce the size on my x86_64 Linux box from 144 bytes to 96 bytes. There is some potential cost to `std::get`, as it must check the active alternative even when context or an assert makes the check redundant. To help ensure the compiler can optimize out these checks, the patch also adds a helper `get` method which uses the noexcept `std::get_if`. Some of the extra cost would also be avoided more cleanly by a refactor that exposes the alternative types in the public interface, which will come in another patch. Differential Revision: https://reviews.llvm.org/D158675
[NFC][AsmPrinter] Remove dead multi-MMI handling from DwarfFile::addScopeVariable Differential Revision: https://reviews.llvm.org/D158676
[NFC][AsmPrinter] Expose std::variant-ness of DbgVariable Differential Revision: https://reviews.llvm.org/D158677
[NFC][AsmPrinter] Use std::visit in constructVariableDIEImpl. This potentially has a slightly positive performance impact, as std::visit can be implemented as a `switch`-like jump rather than a series of `if`s. More importantly, the reader can be confident there is no overlap between the cases. Differential Revision: https://reviews.llvm.org/D158678 Change-Id: Ie5b1fead7b4a4407f73b295530e46e5ce37f638e
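The sum-type refactor described above can be sketched in isolation. The alternatives below (`SingleLocation`, `FrameIndexLocations`, `EntryValueLocation`) and the class name `DebugVariableSketch` are hypothetical stand-ins, not the real DbgVariable members. The sketch shows the three ingredients the commits mention: inheriting from `std::variant`, an unchecked `get` built on the noexcept `std::get_if`, and a `std::visit` with one branch per alternative.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical alternatives, loosely modelled on the states described above.
struct SingleLocation { std::string Expr; };
struct FrameIndexLocations { std::vector<int> FrameIndices; };
struct EntryValueLocation { std::uint64_t Register; };

using DebugVariableBase = std::variant<std::monostate, SingleLocation,
                                       FrameIndexLocations, EntryValueLocation>;

// Sum type: exactly one alternative is active at a time, so no field has to
// be re-used for an unrelated purpose.
class DebugVariableSketch : public DebugVariableBase {
public:
  using DebugVariableBase::DebugVariableBase; // inherit variant's constructors

  // std::get_if is noexcept and returns nullptr on a mismatch, which is only
  // asserted here; std::get would always check and may throw.
  template <typename T> T &get() noexcept {
    T *P = std::get_if<T>(static_cast<DebugVariableBase *>(this));
    assert(P && "requested alternative is not active");
    return *P;
  }

  // std::visit dispatches once on the active alternative; each alternative
  // is handled in exactly one branch, so the cases cannot overlap.
  std::string describe() const {
    return std::visit(
        [](const auto &Alt) -> std::string {
          using T = std::decay_t<decltype(Alt)>;
          if constexpr (std::is_same_v<T, std::monostate>) {
            return "empty";
          } else if constexpr (std::is_same_v<T, SingleLocation>) {
            return "single:" + Alt.Expr;
          } else if constexpr (std::is_same_v<T, FrameIndexLocations>) {
            return "frame-indices:" + std::to_string(Alt.FrameIndices.size());
          } else {
            // Fails to compile if a new alternative is added but not handled.
            static_assert(std::is_same_v<T, EntryValueLocation>);
            return "entry-value:" + std::to_string(Alt.Register);
          }
        },
        static_cast<const DebugVariableBase &>(*this));
  }
};
```

The casts to the base type sidestep older standard libraries in which `std::visit` did not accept classes derived from `std::variant`.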
Change-Id: Ifa714fcf388fb2cb35410a9b6e2e2f38257d07e6
- device-libs
  - Move amdgcn to lib/llvm/lib/clang/<ver>/lib/amdgcn
  - Create symlink amdgcn -> lib/llvm/lib/clang/<ver>/lib/amdgcn

Change-Id: I9d0715c966fd962bfcbda8815ab8966f780b2268
Change-Id: Ief661bcea6c0205a4c0ba51aded0bc2b74ad4b10
…ad slices
Second try at A-Wadhwani's https://reviews.llvm.org/D132096, which was reverted. The original patch had three issues:
* https://reviews.llvm.org/D134032, which bjope kindly fixed. That patch is merged into this one.
* [GHI #57796](llvm/llvm-project#57796). Fixed and added a test.
* [GHI #57821](llvm/llvm-project#57821). I believe this is undefined behavior and not the fault of the original patch. Please see the issue for more details.

Original diff summary: This patch adds additional vector types to be considered when doing promotion in SROA, based on the types of the store and load slices. This provides more promotion opportunities by potentially using an optimal "intermediate" vector type. For example, the following code would currently not be promoted to a vector, since `__m128i` is a `<2 x i64>` vector.
```
#include <cstring>      // std::memcpy
#include <immintrin.h>  // __m128i (x86 only)

__m128i packfoo0(int a, int b, int c, int d) {
  int r[4] = {a, b, c, d};
  __m128i rm;
  std::memcpy(&rm, r, sizeof(rm));
  return rm;
}
```
```
packfoo0(int, int, int, int):
        mov     dword ptr [rsp - 24], edi
        mov     dword ptr [rsp - 20], esi
        mov     dword ptr [rsp - 16], edx
        mov     dword ptr [rsp - 12], ecx
        movaps  xmm0, xmmword ptr [rsp - 24]
        ret
```
By also considering the types of the elements, we could find that the `<4 x i32>` type would be valid for promotion, hence removing the memory accesses for this function. In other words, we can explore new vector types with the same size but different element types, based on the load and store instructions from the Slices, which can provide more promotion opportunities.

Additionally, the step for removing duplicate elements from the `CandidateTys` vector was not using an equality comparator, which has been fixed.

Differential Revision: https://reviews.llvm.org/D143225 Change-Id: I5b75f0a6ca59bc55af5202b0cb9d1641072cc95c
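The candidate-type idea above can be illustrated with a toy model. This is not the SROA code: `VecTy` and `candidateVectorTypes` are hypothetical, and the sketch only enumerates candidates without checking whether every slice is compatible with one of them. For each scalar element width seen in a load or store slice, it forms a vector type of the same total size as the alloca, and it removes duplicates by comparing types for equality.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an LLVM vector type: <NumElts x iEltBits>.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
  bool operator==(const VecTy &O) const {
    return NumElts == O.NumElts && EltBits == O.EltBits;
  }
};

// For an alloca of AllocaBits bits, derive one candidate vector type per
// element width observed in the load/store slices, keeping only widths that
// tile the alloca exactly. Duplicates are dropped using operator==.
std::vector<VecTy> candidateVectorTypes(unsigned AllocaBits,
                                        const std::vector<unsigned> &SliceEltBits) {
  assert(AllocaBits > 0 && "alloca must have a non-zero size");
  std::vector<VecTy> Candidates;
  for (unsigned Bits : SliceEltBits) {
    if (Bits == 0 || AllocaBits % Bits != 0 || AllocaBits / Bits < 2)
      continue; // cannot form a multi-element vector of the same total size
    VecTy Ty{AllocaBits / Bits, Bits};
    if (std::find(Candidates.begin(), Candidates.end(), Ty) == Candidates.end())
      Candidates.push_back(Ty);
  }
  return Candidates;
}
```

For `packfoo0`, the 128-bit alloca sees four `i32` stores and a `<2 x i64>` access, so this model yields both `<4 x i32>` and `<2 x i64>` as candidates.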
Fix a crash when compiling Skia. See https://reviews.llvm.org/D143225#4180342 for more details Change-Id: I0779cbaa76f12ccf2327e234c19970ae9d3d2272
Change-Id: I1e4983691ff7eb6457484e8ccace2ded0084a4c1
…oad's config files. They will be added again if they are made publicly available. In the meantime, HSA is used to detect all kinds of GFX94* devices. Change-Id: If2fcd3b3d4fff66115f31202eced08d9472cb673
Fixed the assertion failure "Basic Block in function 'main' does not have terminator! label %land.end". It was caused by setting CodeGenIP prematurely on entry to emitTargetDataCalls: subsequent evaluation of the logical expression created new basic blocks, leaving CodeGenIP pointing to the wrong basic block. CodeGenIP is now set near the end of the function, just before generating the comparison of the logical expression result (from the if clause), which uses CodeGenIP to insert new IR. Fixes SWDEV-422794/AOMP issue #601. The test already exists in smoke-fail/issue601_if_clause. Change-Id: I792141db01b0f030705ec0742c9d9fb1255f036a
This implements the following event types:
* DeviceInitialize
* DeviceLoad
* Target
* TargetDataOp
* TargetSubmit

Add class equality operators. Adapt CTORs for more convenient manual usage. Fix errors in toString methods. Change-Id: Id335e412a3c90bdc5ea1f691290c1bc84012b51c
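The equality operators and `toString` methods mentioned above can be sketched on a single event record. Only the five event kinds come from the commit; the `TargetEvent` struct and its `DeviceNum` and `Bytes` fields are hypothetical and do not reflect the real OMPT test-framework classes. Equality lets a test compare an observed event with an expected one, and `toString` gives a readable message when they differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Event kinds taken from the list above; everything else is hypothetical.
enum class EventKind { DeviceInitialize, DeviceLoad, Target, TargetDataOp, TargetSubmit };

struct TargetEvent {
  EventKind Kind;
  int DeviceNum;
  std::uint64_t Bytes; // e.g. size of a data transfer; 0 when not applicable

  // Equality lets a test compare an observed event with an expected one.
  bool operator==(const TargetEvent &O) const {
    return Kind == O.Kind && DeviceNum == O.DeviceNum && Bytes == O.Bytes;
  }
  bool operator!=(const TargetEvent &O) const { return !(*this == O); }

  // Human-readable form for assertion messages and logs.
  std::string toString() const {
    static const char *const Names[] = {"DeviceInitialize", "DeviceLoad",
                                        "Target", "TargetDataOp", "TargetSubmit"};
    auto Idx = static_cast<std::size_t>(Kind);
    assert(Idx < sizeof(Names) / sizeof(Names[0]) && "unknown event kind");
    return std::string(Names[Idx]) + " device=" + std::to_string(DeviceNum) +
           " bytes=" + std::to_string(Bytes);
  }
};
```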
This patch gets us one step closer to being able to run check-openmp. Summary of changes:
- Enable <TRIPLE>-LTO tests for non-AMDGPU architectures.
- Compile OpenMP offloading tests using the AOMP pattern, which specifies -Xopenmp-target and -march.
- Enable the compiler selection to use the installed version of AOMP instead of using AOMP directly from the build folder.
- Enable running check-openmp for AOMP: `AOMP=<path to rocm folder> AOMP_GPU=gfx90a make check-openmp`
- Overall, this brings the number of check-openmp failures down from 243 to 108.

Change-Id: I14f4df348163a9d019527f001a9df3d8a4c305b4
Change-Id: Ibf46e633bebbce5ef6067564e1e8ef9d605e2cf2
….amdgcn.ballot Change-Id: I9016008de8ffaabee40870fe1254a7b1a1eb13c7
…ith llvm.amdgcn.ballot" This reverts commit ac84482. premature Change-Id: I8fe8a0d7e5b30715e33c99bda0ed236e11cd5b4a
Change-Id: I828359e27d379e1dd2c1dcdbda7d1ca09dcbad00
…ack to detect AMD GPUs whose PCI ID is not yet known.
- Added a new command-line argument, -hsa, which enables the HSA detection algorithm.
- Removed the method getRuntimeCapabilities (no call sites).
- TODO: I will move the method isHomogeniousSystemOf to the plugins. I don't see any reason for this method to be implemented in OffloadArch, and it doesn't work if the PCI ID of a GPU is unknown.

Change-Id: Ia0fd44f6d5786eaf513296ac4d731c00d92170d6
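The detection strategy above (a known-ID table first, HSA only for unrecognised devices) might look roughly like this. Everything here is hypothetical: `detectGfxName`, the `HsaQuery` callback type and its signature do not come from OffloadArch, and the HSA query is injected as a function so the sketch compiles without the ROCm runtime.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>

// Hypothetical HSA fallback: given the device being inspected, return its GFX
// name if the runtime can report one.
using HsaQuery = std::function<std::optional<std::string>(std::uint16_t)>;

// Look the PCI ID up in a table of known devices first; only when the ID is
// missing (e.g. a GPU newer than the table) fall back to querying HSA.
std::optional<std::string>
detectGfxName(std::uint16_t PciId,
              const std::map<std::uint16_t, std::string> &KnownIds,
              const HsaQuery &QueryHsa) {
  if (auto It = KnownIds.find(PciId); It != KnownIds.end())
    return It->second; // fast path: PCI ID is already known
  assert(QueryHsa && "an HSA fallback must be provided");
  return QueryHsa(PciId); // fallback for unlisted devices
}
```

In use, the table would hold real PCI IDs and the callback would wrap the HSA agent queries; the IDs in any test of this sketch are made up.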
Change-Id: I0a0c54ac90501f9b5d0b156259eeec021422c0a1
xfail: clang/test/Driver/cl-offload.cu xfail: llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx11.s Change-Id: Ibca6ffdf3e6b2b0fee56bfe40b391920b0025351
Change-Id: I9419e45a805a02321ec4c958ba8cf26b471d27cb
Change-Id: I9d34efc5eda2ab4ba2b0b57d2310c20fa755bfb1
Change-Id: Ie8605f374a310448a9b7641353842524b2fac184
Change-Id: I47fb7c186f71a990c4445f28869f39df6d8288d1
Change-Id: I1151032336b181c744802c2c2e710025fe14a8a4
Change-Id: Ie4bec760ec88f7762d84a74535576420422b6af5
Locally reverts d3921e4 "[OpenMP] Basic BumpAllocator for (AMD)GPUs (#69806)". Change-Id: Id512e729870279855744ce65bfc69e2155fb68ee
Change-Id: I9ba82077bfaf2baf4076840f90d133c617fa3605
Change-Id: I86584ffca31489ab151a64cda1e10b99347d4a1e
sigh
No description provided.