[Infra] Migrate rest of linux builder workflows off GCP runners. #18511

Merged 22 commits on Sep 13, 2024

Commits on Aug 29, 2024

  1. make workflows use cluster

    Signed-off-by: saienduri <[email protected]>
    saienduri committed Aug 29, 2024
    5744a6c
  2. Migrate base clang workflow to new Dockerfile. (#18392)

    Progress on #15332. This uses a
    new `cpubuilder_ubuntu_jammy_x86_64` dockerfile from
    https://github.com/iree-org/base-docker-images.
    
    This stops using the remote cache that is hosted on GCP. Build time
    _without a cache_ is about 20 minutes on current runners, while build
    _with a cache_ is closer to 10 minutes. Build time without a cache is
    closer to 28-30 minutes on new runners. We can try adding back a cache
    using GitHub or our own hosted storage.
    
    I tried to continue using the previous cache during this transition
    period, but the `gcloud` command needs to run on the host, and I'd like
    to stop using the `docker_run.sh` script. I'm hoping we can keep folding
    away this sort of complexity by having the build machines run a
    dockerfile that includes key environment components like utility tools
    and any needed authorization/secrets (see
    #18238).
    
    ci-exactly: linux_x64_clang
    ScottTodd authored Aug 29, 2024
    125646f
  3. Rework how ASan and TSan workflows use Docker. (#18396)

    Progress on #15332. I'm trying to
    get rid of the `docker_run.sh` scripts, replacing them with GitHub's
    `container:` feature. While local development flows _may_ want to use
    Docker like the CI workflows do, those scripts contained a lot of
    special handling and file mounting to be compatible with Bazel. Much of
    that is not needed for CMake and can be folded away, though the
    `--privileged` option needed here is one exception.
    
    This stops using the remote cache that is hosted on GCP. We can try
    adding back a cache using GitHub or our own hosted storage as part of
    #18238.
    
    Job | Cache? | Runner cluster | Time | Logs
    -- | -- | -- | -- | --
    ASan | Cache | GCP runners | 14 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10620030527/job/29438925064)
    ASan | No cache | GCP runners | 28 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10605848397/job/29395467181)
    ASan | Cache | Azure runners | (not configured yet) |
    ASan | No cache | Azure runners | 35 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10621238709/job/29442788013?pr=18396)
    | | | | |
    TSan | Cache | GCP runners | 12 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10612418711/job/29414025939)
    TSan | No cache | GCP runners | 21 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10605848414/job/29395467002)
    TSan | Cache | Azure runners | (not configured yet) |
    TSan | No cache | Azure runners | 32 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10621238738/job/29442788341?pr=18396)
    
    ci-exactly: linux_x64_clang_asan
    ScottTodd authored Aug 29, 2024
    83195a2
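
The uncached rows of the #18396 timing table can be compared directly. The following is a minimal Python sketch that uses only the minute values from that table; the dictionary and function names are illustrative and are not part of the workflows.

```python
# Uncached build times in minutes, copied from the table in #18396.
NO_CACHE_MINUTES = {
    "ASan": {"GCP": 28, "Azure": 35},
    "TSan": {"GCP": 21, "Azure": 32},
}


def slowdown(job: str) -> float:
    """Return the Azure/GCP ratio of uncached build time for a job."""
    times = NO_CACHE_MINUTES[job]
    return times["Azure"] / times["GCP"]


if __name__ == "__main__":
    for job in NO_CACHE_MINUTES:
        print(f"{job}: {slowdown(job):.2f}x the GCP time without a cache")
```

A ratio above 1.0 means the uncached build took longer on the Azure runners than on the GCP runners for that job.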

Commits on Sep 3, 2024

  1. Use new cpubuilder dockerfile for ASan and TSan jobs. (#18403)

    Following iree-org/base-docker-images#6, the new
    cpubuilder dockerfile should have all the software needed for ASan and
    TSan building + testing (specifically `clang-19` instead of just
    `clang-14`).
    
    Progress on #15332. The only
    remaining uses of `gcr.io/iree-oss/base.*` are:
    
    * `build_test_all_bazel` uses `gcr.io/iree-oss/base-bleeding-edge`
    * `publish_website` uses `gcr.io/iree-oss/base`
    * arm64 workflows use `gcr.io/iree-oss/base-arm64`
    * `gcr.io/iree-oss/emscripten` (used by web test workflows) depends on
    `gcr.io/iree-oss/base`
    ScottTodd authored Sep 3, 2024
    e49a385

Commits on Sep 5, 2024

  1. change workflows to run on gha runner scale set

    Signed-off-by: saienduri <[email protected]>
    saienduri committed Sep 5, 2024
    9dc6b90

Commits on Sep 10, 2024

  1. added ccache

    Signed-off-by: Elias Joseph <[email protected]>
    Elias Joseph committed Sep 10, 2024
    1af8f7a
  2. Implemented caching with Azure containers using sccache (#18466)

    Implemented caching with Azure containers using sccache. This only works when merging from a branch.
    
    ci-exactly: linux_x64_clang
    Eliasj42 authored Sep 10, 2024
    6dc3610

Commits on Sep 12, 2024

  1. added sccache for asan, tsan, debug, and release_packages

    Signed-off-by: Elias Joseph <[email protected]>
    Elias Joseph committed Sep 12, 2024
    d3d0668
  2. make workflows use cluster

    Signed-off-by: saienduri <[email protected]>
    saienduri committed Sep 12, 2024
    3b0f15d
  3. Migrate base clang workflow to new Dockerfile. (#18392)

    Progress on #15332. This uses a
    new `cpubuilder_ubuntu_jammy_x86_64` dockerfile from
    https://github.com/iree-org/base-docker-images.
    
    This stops using the remote cache that is hosted on GCP. Build time
    _without a cache_ is about 20 minutes on current runners, while build
    _with a cache_ is closer to 10 minutes. Build time without a cache is
    closer to 28-30 minutes on new runners. We can try adding back a cache
    using GitHub or our own hosted storage.
    
    I tried to continue using the previous cache during this transition
    period, but the `gcloud` command needs to run on the host, and I'd like
    to stop using the `docker_run.sh` script. I'm hoping we can keep folding
    away this sort of complexity by having the build machines run a
    dockerfile that includes key environment components like utility tools
    and any needed authorization/secrets (see
    #18238).
    
    ci-exactly: linux_x64_clang
    Signed-off-by: saienduri <[email protected]>
    ScottTodd authored and saienduri committed Sep 12, 2024
    63df37b
  4. Rework how ASan and TSan workflows use Docker. (#18396)

    Progress on #15332. I'm trying to
    get rid of the `docker_run.sh` scripts, replacing them with GitHub's
    `container:` feature. While local development flows _may_ want to use
    Docker like the CI workflows do, those scripts contained a lot of
    special handling and file mounting to be compatible with Bazel. Much of
    that is not needed for CMake and can be folded away, though the
    `--privileged` option needed here is one exception.
    
    This stops using the remote cache that is hosted on GCP. We can try
    adding back a cache using GitHub or our own hosted storage as part of
    #18238.
    
    Job | Cache? | Runner cluster | Time | Logs
    -- | -- | -- | -- | --
    ASan | Cache | GCP runners | 14 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10620030527/job/29438925064)
    ASan | No cache | GCP runners | 28 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10605848397/job/29395467181)
    ASan | Cache | Azure runners | (not configured yet) |
    ASan | No cache | Azure runners | 35 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10621238709/job/29442788013?pr=18396)
    | | | | |
    TSan | Cache | GCP runners | 12 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10612418711/job/29414025939)
    TSan | No cache | GCP runners | 21 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10605848414/job/29395467002)
    TSan | Cache | Azure runners | (not configured yet) |
    TSan | No cache | Azure runners | 32 minutes | [logs](https://github.com/iree-org/iree/actions/runs/10621238738/job/29442788341?pr=18396)
    
    ci-exactly: linux_x64_clang_asan
    Signed-off-by: saienduri <[email protected]>
    ScottTodd authored and saienduri committed Sep 12, 2024
    491cdc0
  5. Use new cpubuilder dockerfile for ASan and TSan jobs. (#18403)

    Following iree-org/base-docker-images#6, the new
    cpubuilder dockerfile should have all the software needed for ASan and
    TSan building + testing (specifically `clang-19` instead of just
    `clang-14`).
    
    Progress on #15332. The only
    remaining uses of `gcr.io/iree-oss/base.*` are:
    
    * `build_test_all_bazel` uses `gcr.io/iree-oss/base-bleeding-edge`
    * `publish_website` uses `gcr.io/iree-oss/base`
    * arm64 workflows use `gcr.io/iree-oss/base-arm64`
    * `gcr.io/iree-oss/emscripten` (used by web test workflows) depends on
    `gcr.io/iree-oss/base`
    
    Signed-off-by: saienduri <[email protected]>
    ScottTodd authored and saienduri committed Sep 12, 2024
    eb7bd72
  6. change workflows to run on gha runner scale set

    Signed-off-by: saienduri <[email protected]>
    saienduri committed Sep 12, 2024
    3714cf1
  7. added ccache

    Signed-off-by: Elias Joseph <[email protected]>
    Signed-off-by: saienduri <[email protected]>
    Elias Joseph authored and saienduri committed Sep 12, 2024
    d43fc31
  8. Merge branch 'shared/runner-cluster-migration' into shared/runner-cluster-migration-add-sccache-to-workflows

    Eliasj42 authored Sep 12, 2024
    b846b6e
  9. Skip sccache when the github secret is not available. (#18499)

    This fixes the issue shown here:
    https://github.com/iree-org/iree/actions/runs/10800586282/job/29958939877#step:6:16
    where `SCCACHE_AZURE_CONNECTION_STRING` is defined as an empty string
    instead of being undefined. It also introduces a config script, as
    discussed here:
    #18489 (comment).
    
    We may still want to limit writing to the cache to `push` events.
    
    ci-exactly: linux_x64_clang
    ScottTodd authored Sep 12, 2024
    d0b8489
  10. 6bc2e73
  11. added sccache for asan, tsan, debug (#18489)

    | Workflow      | Un-cached      | Cached   |
    | ------------- | ------------- | ------------- |
    | Clang | 15m57s | 10m12s |
    | Clang ASan | NA | 15m7s |
    | Clang TSan | NA | 9m42s |
    | Clang Debug | 12m19s | 8m15s |
    Eliasj42 authored Sep 12, 2024
    cf91e58
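
To put the #18489 timings on one scale, the sketch below converts the `XmYs` durations to seconds and computes the share of build time saved by the cache. It covers only the two workflows that have both measurements in the table (ASan and TSan list "NA" for the un-cached run), and the helper names are hypothetical.

```python
import re

# (un-cached, cached) durations copied from the table in #18489.
TIMES = {
    "Clang": ("15m57s", "10m12s"),
    "Clang Debug": ("12m19s", "8m15s"),
}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '15m57s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = int(match.group(1)), int(match.group(2))
    return minutes * 60 + seconds


def percent_saved(uncached: str, cached: str) -> float:
    """Return the percentage of the un-cached time removed by caching."""
    before, after = to_seconds(uncached), to_seconds(cached)
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for workflow, (uncached, cached) in TIMES.items():
        print(f"{workflow}: {percent_saved(uncached, cached):.0f}% faster with cache")
```

With these inputs, both workflows save roughly a third of their un-cached build time.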

Commits on Sep 13, 2024

  1. Testing runner times from forked PR

    Elias Joseph committed Sep 13, 2024
    7388608
  2. Remove stale comments about cache from workflows. (#18522)

    Also testing these workflows with PRs from a fork.
    ScottTodd authored Sep 13, 2024
    dcf93d3
  3. Revert "Testing runner times from forked PR"

    This reverts commit 7388608.
    saienduri committed Sep 13, 2024
    04da67e
  4. Increase timeouts for tests that are slow under TSan.

    Some of these tests take 30-60 seconds on the new runner machines under ASan/TSan, which is close to the 60-second timeout. Increase the timeout to 5 minutes.
    
    We could also do something TSan/ASan-specific here, but developers running the tests on slower systems can also benefit from these timeout changes.
    ScottTodd committed Sep 13, 2024
    ef5f989
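
The fix in #18499 hinges on the difference between an undefined variable and one defined as an empty string. The config script from that commit is not reproduced here; the following Python sketch only illustrates the check, and `sccache_enabled` is a hypothetical helper name rather than anything in the repository.

```python
import os
from typing import Mapping, Optional


def sccache_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the connection string is set and non-empty."""
    if env is None:
        env = os.environ
    # An unset variable falls back to "", and a variable defined as an
    # empty string is "" as well, so both cases are treated as disabled.
    value = env.get("SCCACHE_AZURE_CONNECTION_STRING", "")
    return bool(value.strip())


if __name__ == "__main__":
    print("sccache enabled:", sccache_enabled())
```

Both the unset and the empty case return `False`, so a caller can skip sccache setup in either situation.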