Allow specifying libcxx builder image. #110303

Merged (2 commits) on Nov 5, 2024

Conversation

EricWF
Member

@EricWF EricWF commented Sep 27, 2024

This change attempts to shift the libc++ builders over to new backend
infrastructure that allows running an arbitrary container for the
libc++ job.

This has been a long time in the making, and support from GitHub
and GKE is finally at the point where it's possible (hopefully).

This change should also demonstrate another important property:
No Downtime Upgrades.

If this goes well, we'll be able to test the upgrade as a part
of the PR process, and then committing it to main should (ideally)
not break anything.

@llvmbot llvmbot added the libc++ and github:workflow labels Sep 27, 2024
@EricWF EricWF requested a review from ldionne September 27, 2024 17:26
@llvmbot

llvmbot commented Sep 27, 2024

@llvm/pr-subscribers-lldb
@llvm/pr-subscribers-libcxx

@llvm/pr-subscribers-github-workflow

Author: Eric (EricWF)

Changes

This change attempts to shift the libc++ builders over to new backend
infrastructure that allows running an arbitrary container for the
libc++ job.

This has been a long time in the making, and support from GitHub
and GKE is finally at the point where it's possible (hopefully).

This change should also demonstrate another important property:
No Downtime Upgrades.

If this goes well, we'll be able to test the upgrade as a part
of the PR process, and then committing it to main should (ideally)
not break anything.


Full diff: https://github.com/llvm/llvm-project/pull/110303.diff

1 Files Affected:

  • (modified) .github/workflows/libcxx-build-and-test.yaml (+11-8)
diff --git a/.github/workflows/libcxx-build-and-test.yaml b/.github/workflows/libcxx-build-and-test.yaml
index b5e60781e00064..64855dad7197da 100644
--- a/.github/workflows/libcxx-build-and-test.yaml
+++ b/.github/workflows/libcxx-build-and-test.yaml
@@ -49,7 +49,8 @@ env:
 jobs:
   stage1:
     if: github.repository_owner == 'llvm'
-    runs-on: libcxx-runners-8-set
+    runs-on: libcxx-runners-set
+    container: ghcr.io/libcxx/actions-builder:testing-2024-09-21
     continue-on-error: false
     strategy:
       fail-fast: false
@@ -84,7 +85,8 @@ jobs:
             **/crash_diagnostics/*
   stage2:
     if: github.repository_owner == 'llvm'
-    runs-on: libcxx-runners-8-set
+    runs-on: libcxx-runners-set
+    container: ghcr.io/libcxx/actions-builder:testing-2024-09-21
     needs: [ stage1 ]
     continue-on-error: false
     strategy:
@@ -160,20 +162,21 @@ jobs:
           'benchmarks',
           'bootstrapping-build'
         ]
-        machine: [ 'libcxx-runners-8-set' ]
+        machine: [ 'libcxx-runners-set' ]
         include:
         - config: 'generic-cxx26'
-          machine: libcxx-runners-8-set
+          machine: libcxx-runners-set
         - config: 'generic-asan'
-          machine: libcxx-runners-8-set
+          machine: libcxx-runners-set
         - config: 'generic-tsan'
-          machine: libcxx-runners-8-set
+          machine: libcxx-runners-set
         - config: 'generic-ubsan'
-          machine: libcxx-runners-8-set
+          machine: libcxx-runners-set
         # Use a larger machine for MSAN to avoid timeout and memory allocation issues.
         - config: 'generic-msan'
-          machine: libcxx-runners-8-set
+          machine: libcxx-runners-set
     runs-on: ${{ matrix.machine }}
+    container: ghcr.io/libcxx/actions-builder:testing-2024-09-21
     steps:
       - uses: actions/checkout@v4
       - name: ${{ matrix.config }}

@ldionne
Member

ldionne commented Sep 27, 2024

This is amazing!

About the tests, I am not certain why the transitive includes test started failing, but I ran into something similar in #109720. I think this may be related to how we're running awk or something like that (I wasn't able to reproduce it).

@ldionne
Member

ldionne commented Sep 30, 2024

@EricWF I am trying to debug and fix this CI failure in #110554


github-actions bot commented Oct 2, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

ldionne added a commit that referenced this pull request Oct 22, 2024
Since we don't generate a full dependency graph of headers, we can
greatly simplify the script that parses the result of --trace-includes.

At the same time, we also unify the mechanism for detecting whether a
header is a public/C compat/internal/etc header with the existing
mechanism in header_information.py.

As a drive-by this fixes the headers_in_modulemap.sh.py test which had
been disabled by mistake because it used its own way of determining
the list of libc++ headers. By consistently using header_information.py
to get that information, problems like this shouldn't happen anymore.

This should also unblock #110303, which was blocked because of
a brittle implementation of the transitive includes check which broke
when the repository was cloned at a path like /path/__something/more.
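
The failure mode described in this commit message can be illustrated with a minimal sketch. It assumes, purely for illustration, that a check classifies a header as internal when any path component starts with `__`; the actual libc++ script may have worked differently. Both function names and the header paths are hypothetical; only the checkout path shape `/path/__something/more` comes from the message above.

```python
from pathlib import PurePosixPath


def is_internal_brittle(header_path: str) -> bool:
    # Hypothetical brittle check: any "__"-prefixed component marks the
    # header as internal, including components of the checkout directory.
    return any(part.startswith("__") for part in PurePosixPath(header_path).parts)


def is_internal_robust(header_path: str, include_root: str) -> bool:
    # Only inspect components below the include root, so the location of
    # the checkout cannot influence the classification.
    relative = PurePosixPath(header_path).relative_to(include_root)
    return any(part.startswith("__") for part in relative.parts)


# Checkout path shaped like the one mentioned in the commit message.
INCLUDE_ROOT = "/path/__something/more/libcxx/include"
PUBLIC_HEADER = INCLUDE_ROOT + "/vector"
INTERNAL_HEADER = INCLUDE_ROOT + "/__algorithm/sort.h"

if __name__ == "__main__":
    print(is_internal_brittle(PUBLIC_HEADER))                # public header misclassified
    print(is_internal_robust(PUBLIC_HEADER, INCLUDE_ROOT))
    print(is_internal_robust(INTERNAL_HEADER, INCLUDE_ROOT))
```

Anchoring the classification to a known root, rather than to the absolute path, is one way to keep such a check independent of where the repository happens to be cloned.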
@EricWF
Member Author

EricWF commented Oct 22, 2024

How do I constantly fudge up my git history....

Fixing and force-pushing shortly.

EricWF pushed a commit to efcs/llvm-project that referenced this pull request Oct 22, 2024
@EricWF EricWF marked this pull request as draft October 22, 2024 18:37
@EricWF EricWF removed the request for review from a team October 22, 2024 18:38
@Michael137
Member

Thanks for your fast help with this. Re-basing and rerunning now.

Np! Haven't merged it yet though. Just waiting for CI to pass

@Michael137
Member

Michael137 commented Oct 24, 2024

Hmm am I reading this right that the latest run still failed, despite the cherry-pick?
EDIT: Oh nvm, the change didn't seem to kick in yet:

 {
    "arguments": {
      "commandEscapePrefix": null,
      "disableASLR": true,
      "displayExtendedBacktrace": false,
      "enableAutoVariableSummaries": false,
      "enableSyntheticChildDebugging": false,
      "initCommands": [
        "settings clear -all",
        "settings set symbols.enable-external-lookup false",
        "settings set target.inherit-tcc true",
        "settings set target.disable-aslr false",

Ignore me

Michael137 added a commit that referenced this pull request Oct 25, 2024
When running in constrained environments like docker, disabling ASLR
might fail with errors like:
```
AssertionError: False is not true : launch failed (Cannot launch
'/__w/.../lldb-dap/stackTrace/subtleFrames/TestDAP_subtleFrames.test_subtleFrames/a.out':
personality set failed: Operation not permitted)
```
E.g., #110303

Hence we already run `settings set target.disable-aslr false` as part of
the init-commands for the non-DAP tests (see
#88312 and
https://discourse.llvm.org/t/running-lldb-in-a-container/76801).

But we never adjusted it for the DAP tests. As a result we get
conflicting test logs like:
```
 {
    "arguments": {
      "commandEscapePrefix": null,
      "disableASLR": true,
     ....
      "initCommands": [
        ...
        "settings set target.disable-aslr false",
```

Disabling ASLR by default in tests isn't useful (it's only really a
debugging aid for users). So this patch sets `disableASLR=False` by
default.
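
The conflict shown in the test log above can be detected mechanically. The following is a small sketch, not part of the LLDB test suite: the helper name `aslr_settings_conflict` is hypothetical, and it only inspects the explicit `disableASLR` value and the `initCommands` list as they appear in the logged launch arguments.

```python
def aslr_settings_conflict(arguments: dict) -> bool:
    """Return True when the launch arguments explicitly request that ASLR
    be disabled while an init command requests the opposite."""
    wants_disable = arguments.get("disableASLR") is True
    init_keeps_aslr = any(
        command.strip() == "settings set target.disable-aslr false"
        for command in arguments.get("initCommands", [])
    )
    return wants_disable and init_keeps_aslr


# Launch arguments shaped like the log excerpt in the commit message.
LOGGED_ARGUMENTS = {
    "commandEscapePrefix": None,
    "disableASLR": True,
    "initCommands": [
        "settings clear -all",
        "settings set target.disable-aslr false",
    ],
}

if __name__ == "__main__":
    print(aslr_settings_conflict(LOGGED_ARGUMENTS))
    print(aslr_settings_conflict({**LOGGED_ARGUMENTS, "disableASLR": False}))
```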
@Michael137
Member

Michael137 commented Oct 25, 2024

FYI, had to adjust the flag in one other place. Feel free to rebase the branch on main. I merged the changes. Let me know if the CI still fails

@ldionne
Member

ldionne commented Oct 25, 2024

It looks like it's still failing with the latest run :-(

@Michael137
Member

Michael137 commented Oct 25, 2024

It looks like it's still failing with the latest run :-(

Argh that's unfortunate. How about we skip this test in libc++ CI to unblock this PR and I'll open a github issue to re-enable the test?

@EricWF It's probably easiest if you just add --filter-out=TestDAP_subtleFrames.py to the following LIT invocation as part of this PR:

${BUILD_DIR}/bin/llvm-lit -sv --param dotest-args='--category libc++' "${MONOREPO_ROOT}/lldb/test/API"

But if you prefer me doing it separately, let me know.
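
For readers unfamiliar with regex-based deselection, here is a simplified model of what a filter-out option does to a list of discovered tests. This is an illustrative sketch and not lit's implementation; the function name `filter_out` and the second test name are hypothetical.

```python
import re


def filter_out(test_names, pattern):
    # Drop every test whose name matches the regular expression anywhere.
    regex = re.compile(pattern)
    return [name for name in test_names if not regex.search(name)]


# Only TestDAP_subtleFrames.py comes from the thread; the other entry is
# a made-up placeholder.
TESTS = [
    "lldb-dap/stackTrace/subtleFrames/TestDAP_subtleFrames.py",
    "hypothetical/TestOther.py",
]

if __name__ == "__main__":
    for name in filter_out(TESTS, "TestDAP_subtleFrames.py"):
        print(name)
```

Because the pattern is a regular expression, the `.` in `TestDAP_subtleFrames.py` matches any character; that is harmless here but worth keeping in mind for broader patterns.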

@EricWF
Member Author

EricWF commented Oct 25, 2024

It looks like it's still failing with the latest run :-(

Argh that's unfortunate. How about we skip this test in libc++ CI to unblock this PR and I'll open a github issue to re-enable the test?

@EricWF It's probably easiest if you just add --filter-out=TestDAP_subtleFrames.py to the following LIT invocation as part of this PR:

${BUILD_DIR}/bin/llvm-lit -sv --param dotest-args='--category libc++' "${MONOREPO_ROOT}/lldb/test/API"

But if you prefer me doing it separately, let me know.

I have concerns about using the run-buildbot file to hide failing tests. I'll hold off on this change a little longer.

@Michael137
Member

It looks like it's still failing with the latest run :-(

Argh that's unfortunate. How about we skip this test in libc++ CI to unblock this PR and I'll open a github issue to re-enable the test?
@EricWF It's probably easiest if you just add --filter-out=TestDAP_subtleFrames.py to the following LIT invocation as part of this PR:

${BUILD_DIR}/bin/llvm-lit -sv --param dotest-args='--category libc++' "${MONOREPO_ROOT}/lldb/test/API"

But if you prefer me doing it separately, let me know.

I have concerns about using the run-buildbot file to hide failing tests. I'll hold off on this change a little longer.

That's fair. In that case, @walter-erquinigo @clayborg Do you have any ideas on how to best debug this?

Summary: the TestDAP_subtleFrames.py test is failing when run in a container (in a container):

AssertionError: False is not true : launch failed (Cannot launch '/__w/.../lldb-dap/stackTrace/subtleFrames/TestDAP_subtleFrames.test_subtleFrames/a.out': personality set failed: Operation not permitted)

Our theory was that this happened when trying to disable ASLR. So we're no longer doing that for the DAP tests. But we're still failing with the above.

I'll try to raise a draft PR that mimics this but with some additional LLDB logging.

@EricWF EricWF removed the request for review from JDevlieghere October 25, 2024 21:08
@Michael137
Member

Hmm so I opened a draft PR with this change and explicitly set disableASLR on the DAP server and the tests seemed to pass: #113891

With server patch: https://github.com/llvm/llvm-project/actions/runs/11552969549/job/32154860810?pr=113891
Without server patch: https://github.com/llvm/llvm-project/actions/runs/11557069616/job/32168537938?pr=113891

So it does look like this is still disableASLR related. I don't know why simply passing disableASLR to the server isn't doing the expected thing. Will have to investigate...

@Michael137
Member

Ooh that's because it's hardcoded in the lldb-dap executable:

if (GetBoolean(arguments, "disableASLR", true))
  flags |= lldb::eLaunchFlagDisableASLR;

Fix should be simple enough. Just need to always pass the disableASLR value from Python, regardless of whether it's set to True or False
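
The interaction between a default-to-true lookup and a client that only forwards truthy values can be sketched as follows. This is a simplified model, not the actual `dap_server.py` or `lldb-dap` code: `get_boolean` stands in for the C++ `GetBoolean` call above, and the two `build_launch_args_*` helpers are hypothetical names for the behavior before and after the proposed fix.

```python
def get_boolean(arguments: dict, key: str, default: bool) -> bool:
    # Stand-in for the server-side lookup: fall back to the default when
    # the key is absent from the request.
    return bool(arguments.get(key, default))


def build_launch_args_buggy(disable_aslr: bool) -> dict:
    # Client forwards the field only when it is True, so False is
    # indistinguishable from "not specified".
    arguments = {}
    if disable_aslr:
        arguments["disableASLR"] = disable_aslr
    return arguments


def build_launch_args_fixed(disable_aslr: bool) -> dict:
    # Client always forwards the value, whether True or False.
    return {"disableASLR": disable_aslr}


def server_disables_aslr(arguments: dict) -> bool:
    # Mirrors the snippet above: a missing field defaults to true.
    return get_boolean(arguments, "disableASLR", True)


if __name__ == "__main__":
    print(server_disables_aslr(build_launch_args_buggy(False)))  # request for False is lost
    print(server_disables_aslr(build_launch_args_fixed(False)))
```

With the conditional forwarding, asking for `False` yields an empty request and the server falls back to its default, which is why the tests kept trying to disable ASLR.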

Michael137 added a commit to Michael137/llvm-project that referenced this pull request Oct 28, 2024
More context can be found in
llvm#110303

For DAP tests running in constrained environments (e.g., Docker
containers), disabling ASLR isn't allowed. So we set `disableASLR=False`
(since llvm#113593).

However, the `dap_server.py` will currently only forward the value
of `disableASLR` to the DAP executable if it's set to `True`. If the
DAP executable wasn't provided a `disableASLR` field it defaults to
`true` too
(https://github.com/llvm/llvm-project/blob/f14743794587db102c6d1b20f9c87a1ac20decfd/lldb/tools/lldb-dap/lldb-dap.cpp#L2103-L2104).

This means that passing `disableASLR=False` from the tests is currently
not possible.

This is also true for many of the other boolean arguments of
`request_launch`. But this patch only addresses `disableASLR` for now
since it's blocking a libc++ patch.
@Michael137
Member

#113891

Michael137 added a commit to Michael137/llvm-project that referenced this pull request Oct 28, 2024
Michael137 added a commit that referenced this pull request Oct 29, 2024
More context can be found in
#110303

For DAP tests running in constrained environments (e.g., Docker
containers), disabling ASLR isn't allowed. So we set `disableASLR=False`
(since #113593).

However, the `dap_server.py` will currently only forward the value
of `disableASLR` to the DAP executable if it's set to `True`. If the
DAP executable wasn't provided a `disableASLR` field it defaults to
`true` too:
https://github.com/llvm/llvm-project/blob/f14743794587db102c6d1b20f9c87a1ac20decfd/lldb/tools/lldb-dap/lldb-dap.cpp#L2103-L2104

This means that passing `disableASLR=False` from the tests is currently
not possible.

This is also true for many of the other boolean arguments of
`request_launch`. But this patch only addresses `disableASLR` for now
since it's blocking a libc++ patch.
@Michael137
Member

Just merged the fix. Let me know if you're still facing issues after the rebase

@EricWF
Member Author

EricWF commented Oct 31, 2024

@Michael137 Thanks for addressing this. I really appreciate it.

NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
@ldionne
Member

ldionne commented Nov 4, 2024

@EricWF The CI failures are unrelated issues on main that have been fixed. It looks like everything is working now.

I'll let you merge this since you likely want to adjust the capacity and other stuff before or closely after you merge, but as far as I'm concerned this is good to go. Thanks a whole lot for this improvement!

@EricWF EricWF merged commit 97262af into llvm:main Nov 5, 2024
59 of 61 checks passed
@EricWF
Member Author

EricWF commented Nov 5, 2024

@ldionne Squashed and merged. I'll be watching the bots closely.

Labels
github:workflow, libc++, lldb

5 participants