[Infra] Migrate rest of linux builder workflows off GCP runners. #18511
Merged
Signed-off-by: saienduri <[email protected]>
Progress on #15332. This uses a new `cpubuilder_ubuntu_jammy_x86_64` dockerfile from https://github.com/iree-org/base-docker-images.

This stops using the remote cache that is hosted on GCP. Build time _without a cache_ is about 20 minutes on current runners, while build _with a cache_ is closer to 10 minutes. Build time without a cache is closer to 28-30 minutes on new runners. We can try adding back a cache using GitHub or our own hosted storage.

I tried to continue using the previous cache during this transition period, but the `gcloud` command needs to run on the host, and I'd like to stop using the `docker_run.sh` script. I'm hoping we can keep folding away this sort of complexity by having the build machines run a dockerfile that includes key environment components like utility tools and any needed authorization/secrets (see #18238).

ci-exactly: linux_x64_clang
Progress on #15332. I'm trying to get rid of the `docker_run.sh` scripts, replacing them with GitHub's `container:` feature. While local development flows _may_ want to use Docker like the CI workflows do, those scripts contained a lot of special handling and file mounting to be compatible with Bazel. Much of that is not needed for CMake and can be folded away, though the `--privileged` option needed here is one exception.

This stops using the remote cache that is hosted on GCP. We can try adding back a cache using GitHub or our own hosted storage as part of #18238.

| Job | Cache? | Runner cluster | Time | Logs |
| -- | -- | -- | -- | -- |
| ASan | Cache | GCP runners | 14 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10620030527/job/29438925064) |
| ASan | No cache | GCP runners | 28 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10605848397/job/29395467181) |
| ASan | Cache | Azure runners | (not configured yet) | |
| ASan | No cache | Azure runners | 35 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10621238709/job/29442788013?pr=18396) |
| | | | | |
| TSan | Cache | GCP runners | 12 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10612418711/job/29414025939) |
| TSan | No cache | GCP runners | 21 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10605848414/job/29395467002) |
| TSan | Cache | Azure runners | (not configured yet) | |
| TSan | No cache | Azure runners | 32 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10621238738/job/29442788341?pr=18396) |

ci-exactly: linux_x64_clang_asan
Following iree-org/base-docker-images#6, the new cpubuilder dockerfile should have all the software needed for ASan and TSan building + testing (specifically `clang-19` instead of just `clang-14`).

Progress on #15332. The only remaining uses of `gcr.io/iree-oss/base.*` are:

* `build_test_all_bazel` uses `gcr.io/iree-oss/base-bleeding-edge`
* `publish_website` uses `gcr.io/iree-oss/base`
* arm64 workflows use `gcr.io/iree-oss/base-arm64`
* `gcr.io/iree-oss/emscripten` (used by web test workflows) depends on `gcr.io/iree-oss/base`
Signed-off-by: saienduri <[email protected]>
Signed-off-by: Elias Joseph <[email protected]>
Implemented caching with Azure containers using sccache. This only works when merging from a branch.

ci-exactly: linux_x64_clang
Signed-off-by: Elias Joseph <[email protected]>
Signed-off-by: saienduri <[email protected]>
Signed-off-by: saienduri <[email protected]>
Signed-off-by: Elias Joseph <[email protected]> Signed-off-by: saienduri <[email protected]>
saienduri force-pushed the `shared/runner-cluster-migration` branch from 9f06194 to d43fc31 (September 12, 2024 18:25)
…ster-migration-add-sccache-to-workflows
ScottTodd reviewed Sep 12, 2024
This fixes the issue shown here: https://github.com/iree-org/iree/actions/runs/10800586282/job/29958939877#step:6:16, where `SCCACHE_AZURE_CONNECTION_STRING` is defined as an empty string instead of being left undefined.

It also introduces a config script, as discussed here: #18489 (comment).

We may still want to limit writing to the cache to `push` events.

ci-exactly: linux_x64_clang
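To illustrate the empty-versus-undefined distinction described above, here is a minimal Python sketch. It is not the config script from this PR: the function names `normalized_env` and `cache_enabled` are hypothetical, and the only name taken from the source is `SCCACHE_AZURE_CONNECTION_STRING`. The assumption is that a tool which sees the variable set (even to an empty string) will try to use it, so the safe move is to drop it from the environment before the build starts.

```python
import os

# Variable name taken from the commit message above.
CONNECTION_KEY = "SCCACHE_AZURE_CONNECTION_STRING"


def normalized_env(environ):
    """Return a copy of `environ` where an empty or whitespace-only
    connection string is removed, so it looks undefined to child processes."""
    env = dict(environ)
    if not env.get(CONNECTION_KEY, "").strip():
        env.pop(CONNECTION_KEY, None)
    return env


def cache_enabled(environ):
    """Hypothetical helper: the remote cache is usable only when a
    non-empty connection string is present."""
    return CONNECTION_KEY in normalized_env(environ)


if __name__ == "__main__":
    # Report what a build step launched from this process would see.
    print("remote cache enabled:", cache_enabled(os.environ))
```

A build wrapper could pass `normalized_env(os.environ)` as the `env` argument of `subprocess.run` so the build never sees an empty value.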
ScottTodd reviewed Sep 12, 2024
| Workflow | Un-cached | Cached |
| ------------- | ------------- | ------------- |
| Clang | 15m57s | 10m12s |
| Clang ASan | NA | 15m7s |
| Clang TSan | NA | 9m42s |
| Clang Debug | 12m19s | 8m15s |
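For the two workflows in the table above that have both measurements, the relative saving can be worked out with a short script. This is only an arithmetic sketch: the helper names `parse_duration` and `percent_saved` are made up for illustration, and the timings are copied from the table.

```python
import re


def parse_duration(text):
    """Convert a duration such as '15m57s' into whole seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(part) for part in match.groups())
    return minutes * 60 + seconds


def percent_saved(uncached, cached):
    """Percentage of wall time saved by the cached run relative to the un-cached run."""
    before = parse_duration(uncached)
    after = parse_duration(cached)
    return 100.0 * (before - after) / before


# Only rows with both an un-cached and a cached timing are included.
timings = {
    "Clang": ("15m57s", "10m12s"),
    "Clang Debug": ("12m19s", "8m15s"),
}

for workflow, (uncached, cached) in timings.items():
    print(f"{workflow}: {percent_saved(uncached, cached):.1f}% less wall time with the cache")
```

Both workflows come out at roughly a one-third reduction in build time with the cache enabled.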
ScottTodd reviewed Sep 13, 2024
Also testing these workflows with PRs from a fork.
This reverts commit 7388608.
ScottTodd approved these changes Sep 13, 2024
Some of these tests are taking 30-60 seconds on new runner machines under ASan/TSan, getting close to the 60 second timeout. Increase the timeout to 5 minutes. We could also do something TSan/ASan-specific here, but developers running the tests on slower systems can also benefit from these timeout changes.
ScottTodd requested review from stellaraccident and benvanik as code owners September 13, 2024 20:19
raikonenfnu pushed a commit to raikonenfnu/iree that referenced this pull request Sep 16, 2024
…e-org#18511)

This commit is part of this larger issue that is tracking our migration off the GCP runners, storage buckets, etc: iree-org#18238.

This builds on iree-org#18381, which migrated

* `linux_x86_64_release_packages`
* `linux_x64_clang_debug`
* `linux_x64_clang_tsan`

Here, we move the rest of the critical linux builder workflows off of the GCP runners:

* `linux_x64_clang`
* `linux_x64_clang_asan`

This also drops all CI usage of the GCP cache (`http://storage.googleapis.com/iree-sccache/ccache`). Some workflows now use sccache backed by Azure Blob Storage as a replacement. There are a few issues with this (mozilla/sccache#2258) that prevent us from providing read-only access to the cache in PRs created from forks, so **PRs from forks currently don't use the cache and will have slower builds**. We're covering for this slowdown by using larger runners, but if we can roll out caching to all builds then we might use runners with fewer cores.

Along with the changes to the cache, usage of Docker is rebased on images in the https://github.com/iree-org/base-docker-images/ repo and the `build_tools/docker/docker_run.sh` script is now only used by unmigrated workflows (`linux_arm64_clang` and `build_test_all_bazel`).

---------

Signed-off-by: saienduri <[email protected]>
Signed-off-by: Elias Joseph <[email protected]>
Co-authored-by: Scott Todd <[email protected]>
Co-authored-by: Elias Joseph <[email protected]>
This commit is part of this larger issue that is tracking our migration off the GCP runners, storage buckets, etc: #18238.

This builds on #18381, which migrated

* `linux_x86_64_release_packages`
* `linux_x64_clang_debug`
* `linux_x64_clang_tsan`

Here, we move the rest of the critical linux builder workflows off of the GCP runners:

* `linux_x64_clang`
* `linux_x64_clang_asan`

This also drops all CI usage of the GCP cache (`http://storage.googleapis.com/iree-sccache/ccache`). Some workflows now use sccache backed by Azure Blob Storage as a replacement. There are a few issues with this (mozilla/sccache#2258) that prevent us from providing read-only access to the cache in PRs created from forks, so **PRs from forks currently don't use the cache and will have slower builds**. We're covering for this slowdown by using larger runners, but if we can roll out caching to all builds then we might use runners with fewer cores.

Along with the changes to the cache, usage of Docker is rebased on images in the https://github.com/iree-org/base-docker-images/ repo and the `build_tools/docker/docker_run.sh` script is now only used by unmigrated workflows (`linux_arm64_clang` and `build_test_all_bazel`).
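The fork restriction described above can be pictured as a small decision function. The sketch below is an assumption about how such a check might look, not the logic used in these workflows: `is_fork_pr` and `should_use_cache` are hypothetical names, and the payload fields (`pull_request.head.repo.full_name`, `repository.full_name`) follow the shape of GitHub's `pull_request` webhook event. The repository name `someone/iree` in the example is invented.

```python
def is_fork_pr(event_name, payload):
    """Return True when the event is a pull request whose head branch
    lives in a different repository than the base (a fork)."""
    if event_name != "pull_request":
        return False
    head_repo = payload["pull_request"]["head"]["repo"]["full_name"]
    base_repo = payload["repository"]["full_name"]
    return head_repo != base_repo


def should_use_cache(event_name, payload):
    """Hypothetical policy: skip the remote cache for PRs from forks,
    since they cannot be given access to it."""
    return not is_fork_pr(event_name, payload)


if __name__ == "__main__":
    # Illustrative payload for a PR opened from a fork (names are invented).
    fork_payload = {
        "repository": {"full_name": "iree-org/iree"},
        "pull_request": {"head": {"repo": {"full_name": "someone/iree"}}},
    }
    print("use cache:", should_use_cache("pull_request", fork_payload))
```

In a real workflow the event name and payload would come from the runner (for example the `GITHUB_EVENT_NAME` variable and the JSON file at `GITHUB_EVENT_PATH`); they are passed in as plain arguments here so the sketch stays self-contained.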