Allow specifying libcxx builder image. #110303

EricWF · 2024-09-27T17:26:29Z

This change attempts to shift the libc++ builders over to new backend
infrastructure that allows running an arbitrary container for the
libc++ job.

This has been a long time in the making, and support from github
and gke is finally at the point where it's possible (hopefully).

This change should also demonstrate another important property:
No Downtime Upgrades.

If this goes well, we'll be able to test the upgrade as a part
of the PR process, and then committing it to main should (ideally)
not break anything.

llvmbot · 2024-09-27T17:27:03Z

@llvm/pr-subscribers-lldb
@llvm/pr-subscribers-libcxx

@llvm/pr-subscribers-github-workflow

Author: Eric (EricWF)

Full diff: https://github.com/llvm/llvm-project/pull/110303.diff

1 Files Affected:

(modified) .github/workflows/libcxx-build-and-test.yaml (+11-8)

diff --git a/.github/workflows/libcxx-build-and-test.yaml b/.github/workflows/libcxx-build-and-test.yaml
index b5e60781e00064..64855dad7197da 100644
--- a/.github/workflows/libcxx-build-and-test.yaml
+++ b/.github/workflows/libcxx-build-and-test.yaml
@@ -49,7 +49,8 @@ env:
 jobs:
   stage1:
     if: github.repository_owner == 'llvm'
-    runs-on: libcxx-runners-8-set
+    runs-on: libcxx-runners-set
+    container: ghcr.io/libcxx/actions-builder:testing-2024-09-21
     continue-on-error: false
     strategy:
       fail-fast: false
@@ -84,7 +85,8 @@ jobs:
             **/crash_diagnostics/*
   stage2:
     if: github.repository_owner == 'llvm'
-    runs-on: libcxx-runners-8-set
+    runs-on: libcxx-runners-set
+    container: ghcr.io/libcxx/actions-builder:testing-2024-09-21
     needs: [ stage1 ]
     continue-on-error: false
     strategy:
@@ -160,20 +162,21 @@ jobs:
           'benchmarks',
           'bootstrapping-build'
         ]
-        machine: [ 'libcxx-runners-8-set' ]
+        machine: [ 'libcxx-runners-set' ]
         include:
         - config: 'generic-cxx26'
-          machine: libcxx-runners-8-set
+          machine: libcxx-runners-set
         - config: 'generic-asan'
-          machine: libcxx-runners-8-set
+          machine: libcxx-runners-set
         - config: 'generic-tsan'
-          machine: libcxx-runners-8-set
+          machine: libcxx-runners-set
         - config: 'generic-ubsan'
-          machine: libcxx-runners-8-set
+          machine: libcxx-runners-set
         # Use a larger machine for MSAN to avoid timeout and memory allocation issues.
         - config: 'generic-msan'
-          machine: libcxx-runners-8-set
+          machine: libcxx-runners-set
     runs-on: ${{ matrix.machine }}
+    container: ghcr.io/libcxx/actions-builder:testing-2024-09-21
     steps:
       - uses: actions/checkout@v4
       - name: ${{ matrix.config }}

ldionne · 2024-09-27T20:19:38Z

This is amazing!

About the tests, I am not certain why the transitive includes test started failing, but I ran into something similar in #109720. I think this may be how we're running awk or something like that (I wasn't able to reproduce).

ldionne · 2024-09-30T18:58:10Z

@EricWF I am trying to debug and fix this CI failure in #110554

github-actions · 2024-10-02T17:39:37Z

✅ With the latest revision this PR passed the C/C++ code formatter.

Since we don't generate a full dependency graph of headers, we can greatly simplify the script that parses the result of --trace-includes. At the same time, we also unify the mechanism for detecting whether a header is a public/C compat/internal/etc header with the existing mechanism in header_information.py. As a drive-by this fixes the headers_in_modulemap.sh.py test which had been disabled by mistake because it used its own way of determining the list of libc++ headers. By consistently using header_information.py to get that information, problems like this shouldn't happen anymore. This should also unblock #110303, which was blocked because of a brittle implementation of the transitive includes check which broke when the repository was cloned at a path like /path/__something/more.
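To illustrate the class of failure described above, here is a minimal sketch. It is not the actual libc++ script: it assumes a hypothetical check that treats any path containing `/__` as an internal header, and contrasts it with a check applied only to the path relative to the include root. The function names and the include-root layout are assumptions made for this example.

```python
from pathlib import PurePosixPath


def is_internal_brittle(header_path: str) -> bool:
    # Hypothetical brittle check: looks for "/__" anywhere in the absolute
    # path, so it also matches directories above the include root.
    return "/__" in header_path


def is_internal_relative(header_path: str, include_root: str) -> bool:
    # Hypothetical sturdier check: only inspect the components below the
    # include root, so the checkout location cannot influence the result.
    relative = PurePosixPath(header_path).relative_to(include_root)
    return any(part.startswith("__") for part in relative.parts)


# Checkout location shaped like the path mentioned above (illustrative).
root = "/path/__something/more/include/c++/v1"
public_header = root + "/vector"
internal_header = root + "/__algorithm/sort.h"

brittle_public = is_internal_brittle(public_header)
relative_public = is_internal_relative(public_header, root)
relative_internal = is_internal_relative(internal_header, root)
```

With a checkout under `/path/__something/more`, the brittle check misclassifies the public `vector` header as internal, while the relative check does not. The container workspace path that shows up in the error logs in this thread starts with `/__w/`, which is a path of this shape.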

EricWF · 2024-10-22T18:36:05Z

How do I constantly fudge up my git history....

Fixing and force-pushing shortly.

Michael137 · 2024-10-24T17:10:26Z

> Thanks for your fast help with this. Re-basing and rerunning now.

Np! Haven't merged it yet though. Just waiting for CI to pass

Michael137 · 2024-10-24T21:02:40Z

Hmm am I reading this right that the latest run still failed, despite the cherry-pick?
EDIT: Oh nvm, the change didn't seem to kick in yet:

 {
    "arguments": {
      "commandEscapePrefix": null,
      "disableASLR": true,
      "displayExtendedBacktrace": false,
      "enableAutoVariableSummaries": false,
      "enableSyntheticChildDebugging": false,
      "initCommands": [
        "settings clear -all",
        "settings set symbols.enable-external-lookup false",
        "settings set target.inherit-tcc true",
        "settings set target.disable-aslr false",

Ignore me

When running in constrained environments like docker, disabling ASLR might fail with errors like:
```
AssertionError: False is not true : launch failed (Cannot launch '/__w/.../lldb-dap/stackTrace/subtleFrames/TestDAP_subtleFrames.test_subtleFrames/a.out': personality set failed: Operation not permitted)
```
E.g., #110303

Hence we already run `settings set target.disable-aslr false` as part of the init-commands for the non-DAP tests (see #88312 and https://discourse.llvm.org/t/running-lldb-in-a-container/76801). But we never adjusted it for the DAP tests. As a result we get conflicting test logs like:
```
{
  "arguments": {
    "commandEscapePrefix": null,
    "disableASLR": true,
    ....
    "initCommands": [
      ...
      "settings set target.disable-aslr false",
```
Disabling ASLR by default in tests isn't useful (it's only really a debugging aid for users). So this patch sets `disableASLR=False` by default.
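The conflict in the log above can be spotted mechanically. The following is a minimal sketch, assuming the launch request is available as a dictionary shaped like the excerpt above; `has_aslr_conflict` is a hypothetical helper, not part of the LLDB test suite.

```python
def has_aslr_conflict(launch_request: dict) -> bool:
    """Return True when the request asks to disable ASLR while its
    init commands tell LLDB not to."""
    arguments = launch_request.get("arguments", {})
    # A missing field is treated as true here, matching the default in the
    # lldb-dap.cpp snippet quoted in this thread.
    wants_disable = bool(arguments.get("disableASLR", True))
    init_commands = arguments.get("initCommands", [])
    setting_off = "settings set target.disable-aslr false" in init_commands
    return wants_disable and setting_off


# Reduced versions of the logged request (other fields omitted).
conflicting = {
    "arguments": {
        "disableASLR": True,
        "initCommands": ["settings set target.disable-aslr false"],
    }
}
consistent = {
    "arguments": {
        "disableASLR": False,
        "initCommands": ["settings set target.disable-aslr false"],
    }
}

conflict_found = has_aslr_conflict(conflicting)
no_conflict_found = has_aslr_conflict(consistent)
```

The first request reproduces the mismatch from the log, where the launch argument and the init command disagree. The second is the state the patch aims for.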

Michael137 · 2024-10-25T11:07:15Z

FYI, had to adjust the flag in one other place. Feel free to rebase the branch on main. I merged the changes. Let me know if the CI still fails

ldionne · 2024-10-25T18:11:32Z

It looks like it's still failing with the latest run :-(

Michael137 · 2024-10-25T18:57:01Z

> It looks like it's still failing with the latest run :-(

Argh that's unfortunate. How about we skip this test in libc++ CI to unblock this PR and I'll open a github issue to re-enable the test?

@EricWF It's probably easiest if you just add --filter-out=TestDAP_subtleFrames.py to the following LIT invocation as part of this PR:

llvm-project/libcxx/utils/ci/run-buildbot, line 397 in 8c4bc1e:

    ${BUILD_DIR}/bin/llvm-lit -sv --param dotest-args='--category libc++' "${MONOREPO_ROOT}/lldb/test/API"

But if you prefer me doing it separately, let me know.
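As a rough illustration of what the suggested `--filter-out` pattern would do, the sketch below drops every test whose name matches a regular expression. It is a simplified model rather than lit's implementation: the search semantics are an assumption, and all test names except `TestDAP_subtleFrames.py` are made up for the example.

```python
import re


def filter_out(test_names, pattern):
    # Drop every test whose name matches the regex anywhere in the string
    # (search semantics are an assumption about how the pattern is applied).
    regex = re.compile(pattern)
    return [name for name in test_names if not regex.search(name)]


# Hypothetical test names; only TestDAP_subtleFrames.py comes from the thread.
tests = [
    "lldb-dap/stackTrace/subtleFrames/TestDAP_subtleFrames.py",
    "example/TestExampleOne.py",
    "example/TestExampleTwo.py",
]
remaining = filter_out(tests, r"TestDAP_subtleFrames\.py")
```

The trade-off raised in the replies applies to any filter of this kind: a test that is filtered out also stops reporting failures, so it needs a tracking issue to be re-enabled later.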

EricWF · 2024-10-25T20:50:57Z

> > It looks like it's still failing with the latest run :-(
>
> Argh that's unfortunate. How about we skip this test in libc++ CI to unblock this PR and I'll open a github issue to re-enable the test?
>
> @EricWF It's probably easiest if you just add --filter-out=TestDAP_subtleFrames.py to the following LIT invocation as part of this PR:
>
> llvm-project/libcxx/utils/ci/run-buildbot, line 397 in 8c4bc1e:
>
>     ${BUILD_DIR}/bin/llvm-lit -sv --param dotest-args='--category libc++' "${MONOREPO_ROOT}/lldb/test/API"
>
> But if you prefer me doing it separately, let me know.

I have concerns about using the run-buildbot file to hide failing tests. I'll hold off on this change a little longer.

Michael137 · 2024-10-25T21:02:16Z

> > > It looks like it's still failing with the latest run :-(
> >
> > Argh that's unfortunate. How about we skip this test in libc++ CI to unblock this PR and I'll open a github issue to re-enable the test?
> >
> > @EricWF It's probably easiest if you just add --filter-out=TestDAP_subtleFrames.py to the following LIT invocation as part of this PR:
> >
> > llvm-project/libcxx/utils/ci/run-buildbot, line 397 in 8c4bc1e:
> >
> >     ${BUILD_DIR}/bin/llvm-lit -sv --param dotest-args='--category libc++' "${MONOREPO_ROOT}/lldb/test/API"
> >
> > But if you prefer me doing it separately, let me know.
>
> I have concerns about using the run-buildbot file to hide failing tests. I'll hold off on this change a little longer.

That's fair. In that case, @walter-erquinigo @clayborg Do you have any ideas on how to best debug this?

Summary: the TestDAP_subtleFrames.py test is failing when run in a container (in a container):

AssertionError: False is not true : launch failed (Cannot launch '/__w/.../lldb-dap/stackTrace/subtleFrames/TestDAP_subtleFrames.test_subtleFrames/a.out': personality set failed: Operation not permitted)

Our theory was that this happened when trying to disable ASLR. So we're no longer doing that for the DAP tests. But we're still failing with the above.

I'll try to raise a draft PR that mimics this but with some additional LLDB logging.

Michael137 · 2024-10-28T16:56:01Z

Hmm so I opened a draft PR with this change and explicitly set disableASLR on the DAP server and the tests seemed to pass: #113891

With server patch: https://github.com/llvm/llvm-project/actions/runs/11552969549/job/32154860810?pr=113891
Without server patch: https://github.com/llvm/llvm-project/actions/runs/11557069616/job/32168537938?pr=113891

So it does look like this is still disableASLR related. I don't know why simply passing disableASLR to the server isn't doing the expected thing. Will have to investigate...

Michael137 · 2024-10-28T17:03:18Z

Ooh that's because it's hardcoded in the lldb-dap executable:

llvm-project/lldb/tools/lldb-dap/lldb-dap.cpp, lines 2103 to 2104 in f147437:

    if (GetBoolean(arguments, "disableASLR", true))
      flags |= lldb::eLaunchFlagDisableASLR;

Fix should be simple enough. Just need to always pass the disableASLR value from Python, regardless of whether it's set to True or False

More context can be found in llvm#110303.

For DAP tests running in constrained environments (e.g., Docker containers), disabling ASLR isn't allowed. So we set `disableASLR=False` (since llvm#113593). However, the `dap_server.py` will currently only forward the value of `disableASLR` to the DAP executable if it's set to `True`. If the DAP executable wasn't provided a `disableASLR` field it defaults to `true` too (https://github.com/llvm/llvm-project/blob/f14743794587db102c6d1b20f9c87a1ac20decfd/lldb/tools/lldb-dap/lldb-dap.cpp#L2103-L2104).

This means that passing `disableASLR=False` from the tests is currently not possible. This is also true for many of the other boolean arguments of `request_launch`. But this patch only addresses `disableASLR` for now since it's blocking a libc++ patch.
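The interaction described above can be reproduced in a few lines. This is a simplified sketch, not the real `dap_server.py` or lldb-dap code: `build_launch_args_old`, `build_launch_args_fixed` and `get_boolean` are hypothetical stand-ins that model only the "forward if truthy" client behaviour and the "missing means true" server default.

```python
def build_launch_args_old(disableASLR=True):
    # Models the behaviour described above: the field is only sent when it
    # is truthy, so an explicit False is silently dropped.
    args = {}
    if disableASLR:
        args["disableASLR"] = disableASLR
    return args


def build_launch_args_fixed(disableASLR=True):
    # Always forward the value so an explicit False reaches the server.
    return {"disableASLR": disableASLR}


def get_boolean(arguments, key, default):
    # Stand-in for the server-side lookup: a missing key yields the default.
    return bool(arguments.get(key, default))


# The server default is true, as in the quoted lldb-dap.cpp snippet.
old_result = get_boolean(
    build_launch_args_old(disableASLR=False), "disableASLR", True
)
fixed_result = get_boolean(
    build_launch_args_fixed(disableASLR=False), "disableASLR", True
)
```

In the old path the dropped field falls back to the server default, so ASLR is still disabled even though the test asked for `False`. Once the value is always forwarded, the server sees the explicit `False`.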

Michael137 · 2024-10-28T17:14:01Z

#113891

Michael137 · 2024-10-29T18:40:53Z

Just merged the fix. Let me know if you're still facing issues after the rebase

EricWF · 2024-10-31T19:28:09Z

@Michael137 Thanks for addressing this. I really appreciate it.

ldionne · 2024-11-04T16:07:08Z

@EricWF The CI failures are unrelated issues on main that have been fixed. It looks like everything is working now.

I'll let you merge this since you likely want to adjust the capacity and other stuff before or closely after you merge, but as far as I'm concerned this is good to go. Thanks a whole lot for this improvement!

EricWF · 2024-11-05T21:31:16Z

@ldionne Squashed and merged. I'll be watching the bots closely.

llvmbot added the libc++ and github:workflow labels Sep 27, 2024

EricWF requested a review from ldionne September 27, 2024 17:26

ldionne mentioned this pull request Sep 30, 2024

[libc++][WIP] Move to Github hosted builders #109720

Closed

This was referenced Oct 1, 2024

[libc++] Rewrite the transitive header checking machinery #110554

Merged

[libc++][CI] Upgrade compiler HEAD version to Clang-20 #108761

Draft

EricWF requested a review from a team as a code owner October 2, 2024 17:35

EricWF force-pushed the move-load-to-new-builders branch from e6e801a to 055dc12 Compare October 2, 2024 17:42

ldionne mentioned this pull request Oct 21, 2024

[libcxx][libc] Hand in Hand PoC with from_chars #91651

Merged

EricWF requested review from DeinAlptraum, daniel-grumberg, aaupov, maksfb, rafaelauler, ayermolo, dcci, lanza, bcardosolopes, Endilll and a team as code owners October 22, 2024 18:34

EricWF marked this pull request as draft October 22, 2024 18:37

EricWF removed the request for review from a team October 22, 2024 18:38

EricWF force-pushed the move-load-to-new-builders branch from 4b01f56 to ac41555 Compare October 24, 2024 17:34

EricWF requested a review from JDevlieghere as a code owner October 24, 2024 17:34

llvmbot added the lldb label Oct 24, 2024

EricWF force-pushed the move-load-to-new-builders branch from ac41555 to c557438 Compare October 25, 2024 14:05

EricWF force-pushed the move-load-to-new-builders branch from ed532af to 947e12d Compare October 25, 2024 21:06

EricWF removed the request for review from JDevlieghere October 25, 2024 21:08

Michael137 mentioned this pull request Oct 28, 2024

[lldb-dap] Always pass disableASLR to the DAP executable #113891

Merged

Merge branch 'main' into move-load-to-new-builders

08e4b24

EricWF merged commit 97262af into llvm:main Nov 5, 2024
59 of 61 checks passed
