Possible to use public readonly access with Azure storage? #2258

Open
ScottTodd opened this issue Sep 11, 2024 · 2 comments

@ScottTodd

We're trying to set up our GitHub project to use sccache with Azure Blob Storage to speed up our CMake builds in GitHub Actions running on pull_request and push events. We'd like for contributors sending pull requests from forks to be able to read from the shared cache without granting them write access.

  • We achieved this before using GCS storage and the ccache (not sccache) HTTP storage backend (https://ccache.dev/manual/4.10.2.html#_http_storage_backend), with trusted workflow runs passing a connection token to the bearer-token parameter in the URL and untrusted workflows instead passing the read-only parameter.
  • In sccache, S3 storage appears to support this via the SCCACHE_S3_NO_CREDENTIALS environment variable, documented here: https://github.com/mozilla/sccache/blob/main/docs/S3.md
  • We're switching some of our infrastructure to Azure now for ✨ reasons ✨ , but if it's missing fundamental features then we can make a case for using other options. I'm mainly looking for a clear picture of what those options are at the moment.
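The trusted/untrusted split described in the bullets above can be sketched as a small helper that picks cache settings per workflow run. This is a minimal sketch under assumptions: the function names, the example URL, and the `CACHE_TOKEN` secret name are hypothetical; only the `bearer-token` and `read-only` attributes (ccache HTTP backend) and the `SCCACHE_S3_NO_CREDENTIALS` variable (sccache S3 docs) come from the documentation linked above.

```python
import os


def ccache_remote_storage(base_url, bearer_token=None):
    """Build a ccache remote_storage value for the HTTP backend.

    Trusted runs pass a token and may write; untrusted runs get a
    read-only URL. Attributes are appended after '|' as in the ccache manual.
    """
    if bearer_token:
        return f"{base_url}|bearer-token={bearer_token}"
    return f"{base_url}|read-only=true"


def sccache_s3_env(bucket, region, trusted):
    """Build sccache S3 environment variables for one workflow run.

    Untrusted runs ask sccache not to load credentials at all. Trusted
    runs are assumed to get their AWS credentials from elsewhere.
    """
    env = {"SCCACHE_BUCKET": bucket, "SCCACHE_REGION": region}
    if not trusted:
        env["SCCACHE_S3_NO_CREDENTIALS"] = "true"
    return env


if __name__ == "__main__":
    # CACHE_TOKEN is a hypothetical secret name; it is unset on fork PRs.
    token = os.environ.get("CACHE_TOKEN")
    print(ccache_remote_storage("https://cache.example.com/ccache", token))
    print(sccache_s3_env("example-bucket", "us-east-1", trusted=bool(token)))
```

The point of the sketch is that the untrusted path never needs a secret, which is the property missing from the Azure backend.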

We configured an Azure Blob Storage container with "anonymous read access" then followed the docs here: https://github.com/mozilla/sccache/blob/main/docs/Azure.md. Here's what we've tried so far to get readonly / unauthenticated access to the shared cache:

  • Set the SCCACHE_AZURE_BLOB_CONTAINER and SCCACHE_AZURE_CONNECTION_STRING environment variables to real values.
    • This lets us read and write, but we're storing the connection string in a GitHub secret, which forks don't have access to, and we'd really prefer to keep access to such a broadly scoped key as limited as possible.
  • Set the SCCACHE_AZURE_CONNECTION_STRING environment variable to a connection string with AccountKey=${THE_SECRET_KEY_HERE}; omitted, hoping that would fall back to anonymous/readonly access.
    • That didn't appear to connect successfully - it looked like sccache fell back to using a local disk cache.
  • Write to the Azure storage using those SCCACHE_AZURE_BLOB_CONTAINER and SCCACHE_AZURE_CONNECTION_STRING environment variables then try to read from the storage by first downloading the files (e.g. with azcopy), then treating the downloaded folder as a local storage cache by following instructions at https://github.com/mozilla/sccache/blob/main/docs/Local.md.
    • This seemed like it could work, if the local and remote caches store the same files, but I actually observed 0 cache hits in my experiments. I'm not sure how to debug that - could the cache keys be different, because of file path differences or some other storage backend metadata?
  • Mount the remote directory using blobfuse2 (https://github.com/Azure/azure-storage-fuse) as suggested on Azure blob secondary storage ccache/ccache#1152, then use ccache or sccache pointed at the "local" directory.
    • blobfuse2 doesn't support either anonymous readonly access or concurrent reads/writes, so this didn't seem practical.
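One way to start debugging the 0 cache hits in the download-then-use-locally experiment is to compare the downloaded folder against a local cache directory that sccache populated itself for the same build. The sketch below makes no assumption about how sccache lays out either backend; it only compares file names and relative paths, and the function names are hypothetical.

```python
import os


def list_entries(root):
    """Return {file name: path relative to root} for every file under root."""
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            entries[name] = os.path.relpath(full, root)
    return entries


def compare_caches(downloaded_root, local_root):
    """Compare a downloaded remote cache with a locally populated one."""
    downloaded = list_entries(downloaded_root)
    local = list_entries(local_root)
    shared = sorted(set(downloaded) & set(local))
    return {
        # Same entry name in both trees.
        "shared_names": shared,
        # Same entry name, but stored under a different relative path.
        "different_paths": [n for n in shared if downloaded[n] != local[n]],
        "only_downloaded": sorted(set(downloaded) - set(local)),
        "only_local": sorted(set(local) - set(downloaded)),
    }
```

If no names are shared, the cache keys themselves likely differ between the two builds (for example because of paths or compiler differences). If names are shared but the paths differ, the two backends may use different directory layouts, and the downloaded tree would need rearranging before a local cache could find the entries.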

Am I missing something? Would it be possible to add direct support for public readonly access? Any suggestions for other things to try?

Thanks!

@Xuanwo
Collaborator

Xuanwo commented Sep 11, 2024

Hi, @ScottTodd. Thank you very much for this detailed issue. To support public access to Azure without credentials, we need work on both the opendal and sccache sides:

  • opendal should introduce a feature similar to allow_anonymous for azblob.
  • sccache should utilize this feature to enable AZBLOB_NO_CREDENTIALS.

Would you like to cross-post this issue to the opendal side too?
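To make the proposal concrete, here is a rough sketch of the decision such a flag could drive. It is not sccache or opendal code: `AZBLOB_NO_CREDENTIALS` does not exist yet, the function names are hypothetical, and the rule of rejecting credentials when the flag is set is an assumption modeled on the S3 option. Only the `key=value;key=value` connection string shape is taken from Azure itself.

```python
def parse_connection_string(value):
    """Split an Azure Storage connection string into a dict.

    Segments are separated by ';'. Each segment is split on the first '='
    only, because base64 account keys end in '=' padding.
    """
    parts = {}
    for segment in value.split(";"):
        if not segment.strip():
            continue
        key, sep, val = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key.strip()] = val
    return parts


def choose_access_mode(connection_string, no_credentials):
    """Pick an access mode for a hypothetical no-credentials option."""
    parts = parse_connection_string(connection_string)
    has_secret = "AccountKey" in parts or "SharedAccessSignature" in parts
    if no_credentials and has_secret:
        # Assumption: refuse ambiguous configurations rather than guess.
        raise ValueError("credentials were supplied together with no_credentials")
    if no_credentials:
        return "anonymous-read-only"
    if has_secret:
        return "read-write"
    raise ValueError("no credentials found and no_credentials is not set")
```

Under this sketch, the connection string with `AccountKey` omitted from the original report would only be accepted when the flag is set explicitly, rather than silently falling back to another cache.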

@ScottTodd
Author

> Would you like to cross-post this issue to the opendal side too?

I'm not familiar with opendal or the implementation details of sccache, so I wouldn't really know what to say there 😅

Another option we're considering is running our own server to use with sccache, possibly hosted on Azure close to our build machines, instead of using Azure Blob Storage. That would give us more direct control over endpoints, authentication, etc.
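For the self-hosted option, the access rule we would want from such a server is small: anonymous reads, token-gated writes. The sketch below shows only that rule. It is not a cache server, it does not implement whatever protocol sccache would speak to it, and the function and parameter names are hypothetical.

```python
import hmac

# Methods treated as read-only; everything else counts as a write.
READ_METHODS = {"GET", "HEAD"}


def is_allowed(method, authorization, write_token):
    """Return True if a request may proceed.

    Reads are open to everyone. Writes need an Authorization header of
    the form 'Bearer <write_token>'.
    """
    if method.upper() in READ_METHODS:
        return True
    if not write_token or authorization is None:
        return False
    expected = f"Bearer {write_token}"
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(authorization.encode(), expected.encode())
```

Trusted workflow runs would send the token from a secret, while runs from forks would send no header and still be able to read.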

ScottTodd added a commit to iree-org/iree that referenced this issue Sep 13, 2024
)

This commit is part of this larger issue that is tracking our migration
off the GCP runners, storage buckets, etc:
#18238.

This builds on #18381, which migrated
* `linux_x86_64_release_packages`
* `linux_x64_clang_debug`
* `linux_x64_clang_tsan`

Here, we move over the rest of the critical linux builder workflows off
of the GCP runners:
* `linux_x64_clang`
* `linux_x64_clang_asan`

This also drops all CI usage of the GCP cache
(`http://storage.googleapis.com/iree-sccache/ccache`). Some workflows
now use sccache backed by Azure Blob Storage as a replacement. There are
a few issues with this (mozilla/sccache#2258)
that prevent us from providing read-only access to the cache in PRs created
from forks, so **PRs from forks currently don't use the cache and will
have slower builds**. We're covering for this slowdown by using larger
runners, but if we can roll out caching to all builds then we might use
runners with fewer cores.

Along with the changes to the cache, usage of Docker is rebased on
images in the https://github.com/iree-org/base-docker-images/ repo and
the `build_tools/docker/docker_run.sh` script is now only used by
unmigrated workflows (`linux_arm64_clang` and `build_test_all_bazel`).

---------

Signed-off-by: saienduri <[email protected]>
Signed-off-by: Elias Joseph <[email protected]>
Co-authored-by: Scott Todd <[email protected]>
Co-authored-by: Elias Joseph <[email protected]>
raikonenfnu pushed a commit to raikonenfnu/iree that referenced this issue Sep 16, 2024
…e-org#18511)
