
CRAN Repository per package repository instead of a separate single repository storing multi-package #15

Closed
coatless opened this issue Jan 12, 2024 · 7 comments · Fixed by #17

Comments

@coatless
Contributor

Thanks again for working on the workflows!

I've read through the documentation on rwasm GitHub Actions and Getting Started. One of my takeaways for repository deployment is that it needs to live in a standalone repository geared toward hosting multiple R WASM binary packages. That is, we cannot pair the action with an already existing R package repository.

So, I went off and investigated whether the workflow action could be easily deployed inside an existing package repository that already has multiple workflows present (e.g. pkgdown, r-cmd-check, et cetera). The hope is to further lower the barrier to entry by chaining usethis::use_github_pages() (to set up GitHub Pages automatically) with usethis::use_github_action() (to retrieve the deployment action). I'm particularly eyeing a custom function inside of usethis (c.f. r-lib/usethis#1932).

However, after a couple of changes:

  1. modifying the existing workflow to create a package list directly from the DESCRIPTION file (e.g. packages: '.')
  2. enabling deployments from the main/master branch to the github-pages environment

I still couldn't get the action to publish the archive onto GitHub pages.

After multiple attempts, I switched over to directly working with the r-wasm/actions/build-rwasm action and managed to get the repository up and running by using:

# Workflow derived from https://github.com/r-wasm/actions/tree/v1/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    # Only build on main or master branch
    branches: [main, master]
  # Or when triggered manually
  workflow_dispatch: {}

name: Build WASM R package and Repo

jobs:
  deploy-cran-repo:
    # Only restrict concurrency for non-PR jobs
    concurrency:
      group: r-wasm-${{ github.event_name != 'pull_request' || github.run_id }}
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    permissions:
      contents: write
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Build WASM R packages
        uses: r-wasm/actions/build-rwasm@v1
        with:
          packages: "."
          repo-path: "_site"

      - name: Deploy wasm R packages to GitHub pages 🚀
        if: github.event_name != 'pull_request'
        uses: JamesIves/[email protected]
        with:
          clean: false
          branch: gh-pages
          folder: _site

I think a few folks would be interested in a workflow similar to the above. So, a few quick questions:

  1. Would you be okay to include this as an example under examples/?
  2. Do you see any glaring issues with moving toward a per-package webR CRAN repository compared to a multi-package single repository for the organization?
Demo script and screenshot showing working installation with webR REPL Editor
# Check if package `{demorwasmbinary}` is installed
"demorwasmbinary" %in% installed.packages()[,"Package"]
# Install the binary from a repository
webr::install(
  "demorwasmbinary", 
  repos = "https://tutorials.thecoatlessprofessor.com/webr-github-action-wasm-binaries/"
)
# Check to see if the function works
demorwasmbinary::in_webr()
# View help documentation
?demorwasmbinary::in_webr
Screenshot of the webR REPL editor showing an R package binary being downloaded from a repository outside of repo.r-wasm.org
@coatless
Contributor Author

After a bit of trial and error, the showstopping reason was that the actions/upload-pages-artifact and actions/deploy-pages actions do not mix well with deployments on the gh-pages branch. To get around this limitation, and to avoid overwriting the Pages deployment with separate actions, I set up a different "unified" workflow for both an R WASM package repository and a {pkgdown} website. The workflow follows from:

# Workflow derived from https://github.com/r-wasm/actions/tree/v1/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    # Only build on main or master branch
    branches: [main, master]
  # Or when triggered manually
  workflow_dispatch: {}

name: R WASM & {pkgdown} deploy

jobs:
  rwasmbuild:
    # Only restrict concurrency for non-PR jobs
    concurrency:
      group: r-wasm-${{ github.event_name != 'pull_request' || github.run_id }}
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    permissions:
      contents: write
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        
      # Build the local R package and structure the CRAN repository
      - name: Build WASM R packages
        uses: r-wasm/actions/build-rwasm@v1
        with:
          packages: "."
          repo-path: "_site"
      
      # Upload the CRAN repository for use in the next step
      # Make sure to set a retention day to avoid running into a cap
      - name: Upload build artifact
        uses: actions/upload-artifact@v3
        with:
          name: rwasmrepo
          path: |
            _site
          retention-days: 1

  pkgdown:
    runs-on: ubuntu-latest
    # Add a dependency on the prior job completing
    needs: rwasmbuild
    # Required for the gh-pages deployment action
    environment:
      name: github-pages
    # Only restrict concurrency for non-PR jobs
    concurrency:
      group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    permissions:
      # To download GitHub Packages within action
      repository-projects: read
      # For publishing to pages environment
      pages: write
      id-token: write
    steps:
      # Usual steps for generating a pkgdown website
      - uses: actions/checkout@v3

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::pkgdown, local::.
          needs: website
      # Change the build directory from `docs` to `_site`
      # For parity with where the R WASM package repository is setup
      - name: Build site
        run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE, dest_dir = "_site")
        shell: Rscript {0}
        
      # New material ---
      
      # Download the built R WASM CRAN repository from the prior step.
      # Extract it into the `_site` directory
      - name: Download build artifact
        uses: actions/download-artifact@v3
        with:
          name: rwasmrepo
          path: _site
      
      # Upload a tar file that will work with GitHub Pages
      # Make sure to set a retention day to avoid running into a cap
      # This artifact shouldn't be required after deployment onto Pages succeeds.
      - name: Upload Pages artifact
        uses: actions/upload-pages-artifact@v2
        with: 
          retention-days: 1
      
      # Use an Action deploy to push the artifact onto GitHub Pages
      # This requires the repository's Pages settings to be configured to deploy
      # via GitHub Actions, instead of using `docs/` or the `gh-pages` branch
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v2

Note: one part of interest may be to explicitly mention the retention-days option, as the R WASM package binaries tend to get a bit large and may eat into the 0.5 GB storage allowance for free users and/or the 2 GB allowance for Pro users.

You can see the full experiment over here.

@georgestagg
Member

Thanks for investigating this James!

We focused our efforts on the "separate CRAN-like repo" and "attaching artifacts to releases" workflows precisely because established packages will already have a pkgdown deployment in place, and a naive implementation of deploying Wasm binaries to GitHub Pages would clobber pkgdown documentation.

Your suggested workflow looks like a reasonable solution to that problem to me. @schloerke, can you see any potential issues with the method should someone want to deploy in this way? My feeling is that it would be nice to have a workflow that deploys a combined pkgdown website and wasm CRAN-like repo, even if it is not the default recommendation.

I think for long-term package reproducibility and sustainability, we'll want to continue to recommend using the release-file-system-image.yml workflow, since it attaches Wasm binaries to individual releases that will live on longer than any single snapshot of a CRAN-like repository. That particular workflow will also continue to be useful to us later when it comes to building and bundling pinned Wasm binaries with Shinylive. Though I guess, from the point of view of a package author, using both workflows in parallel should also work fine.

If Barret sees no potential issues, I don't see why this could not become an additional workflow file and associated example, and I'd be happy to take a look at a PR for review.

@schloerke
Collaborator

@coatless What do you want to gain by not using an extra CRAN repo? (And instead leverage the existing pkgdown website)

I am having trouble seeing the long-term benefit. Sure... great short-term benefit, as whichever version is there will be downloaded.


  1. Using pkgdown to host the asset would allow for the bleeding-edge bundle of your repo. But if many wasm packages are being used, you're offloading the responsibility of your CRAN repo to the user, who must manually collect each package's version for every app that is created. (Rather than pointing to a single extra CRAN instance to gain access to every package it holds.)

  2. By having the bundle located at a fixed URL on the website, it will be difficult to tell which version you are downloading. If a git SHA / pkg version is used in the URL, it feels as if you might as well use a release on GitHub.


George:

Though I guess from the point of view of a package author using both workflows in parallel should also work fine.

Correct, but I'm having trouble seeing the benefit of adding it to the website in the first place.


If we do add the example, I believe we could perform it in a single job, as the runners are of the same type. (Given you are OK with giving r-wasm extra permissions for the job to complete.)

@coatless
Contributor Author

coatless commented Feb 2, 2024

@georgestagg I had a feeling that was the case while trying to re-create the suggested approaches.

@schloerke thanks for the questions! See responses inline.

What do you want to gain by not using an extra CRAN repo? (And instead leverage the existing pkgdown website)

One less repository to monitor, no need to manually trigger updates with new releases, and a built-in webR compatibility check when PRs are sent into the main repo.

Sure... great short-term benefit, as whichever version is there will be downloaded.

I'm not sure that version history is retained in the current mono-org CRAN setup. That is, a build artifact for R 4.3 triggered now will not persist once R 4.4 is released/updated, as artifacts under the current deploy-cran-repo workflow are not retained. Moreover, retaining build artifacts for more than a day is likely to cause free users to quickly hit the storage cap (500 MB GitHub Packages storage). Pro users will have a little more breathing room (2 GB GitHub Packages storage).

https://docs.github.com/en/get-started/learning-about-github/githubs-plans

One other aspect is that the R version, and even the webR toolchain version, may not be backwards compatible. So, releases generated under, say, R 4.3 may not work if the user is on R 4.4. So even attaching a binary to the GitHub release may be problematic without emphasizing that it's for webR 0.2.2-R4.3.

Using pkgdown to host the asset would allow for the bleeding-edge bundle of your repo. But if many wasm packages are being used, you're offloading the responsibility of your CRAN repo to the user, who must manually collect each package's version for every app that is created. (Rather than pointing to a single extra CRAN instance to gain access to every package it holds.)

Not necessarily. As far as I can tell, between r-universe and the quarto-webr extension, we're advocating for users to use the package repository plus the repo.r-wasm.org repository for coverage.

# Specify where to search for the R WASM packages
list_of_repos = c(
    "https://gh-username.github.io/repo-name", 
    "https://username.r-universe.dev", 
    "https://repo.r-wasm.org/"
  )

# Set the repository URLs
options(
  repos = list_of_repos, # required for installed.packages()/available.packages()/... 
  webr_pkg_repos = list_of_repos # required for webr::install()
)

webr::install("pkg")

Or:

webr::install("pkg", repos = list_of_repos)

By having the bundle located at a fixed URL in the website, it will be difficult to tell which version you are downloading. If a git sha / pkg version is used in the URL, it feels as if you might as well use a release on GitHub.

I think the download URL can be exposed in the webr::install() function similar to install.packages():

Example of installing a package from CRAN using `install.packages()`, showing the download URL used, which could be mirrored in `webr::install()`

From what I can tell, the documentation is sparse on how to obtain a package from the release. If I had to guess, I think you're saying that installs should be routed through pak using:

pak::pak("github::org/[email protected]")

This would then grab the assets from the release? Given what I've seen, I'm not sure this is going to be widely adopted as the usual approach for R users is to just grab the development repo or CRAN package as-is, e.g.

remotes::install_github("org/repo")
install.packages("repo")

With this being said, I do want to highlight a potential issue in the documentation:

The {rwasm} and docs.r-wasm.org websites emphasize either the Get started with rwasm or Build R packages using GitHub Actions for building and deploying an R WASM package.

The approaches highlighted there omit discussion of a system filesystem image. I think this is likely because the filesystem image approach was a recent entrant in webR 0.2.2. Perhaps at the top of the Build R packages using GitHub Actions article, a note and link could be added redirecting users over to Mounting filesystem images, given its preferential status.

(P.S. How should webR be stylized? WebR, webR, webr?)

@georgestagg
Member

georgestagg commented Feb 2, 2024

@coatless Some comments & clarifications:

Moreover, retaining the build artifacts for more than a day is likely to cause free users to quickly hit the storage cap (500 MB GitHub Packages storage). Pro users will have a little bit more breathing room (2 GB GitHub Packages storage).

For clarity, I think this limit is only in place for private repositories? "GitHub Packages is free for public repositories"1.

One other aspect is that the R version and even the webR version toolkit may not be backwards compatible. So, releases generated under say 4.3 may not work if the user is in R 4.4.

This is true, though I'm unsure yet how often this will happen. Breaking updates in Emscripten have happened before, but if there were no ABI changes in both the R and Emscripten versions since release, then the older packages should load fine on newer webR.

Also, though not ideal, if one really needed access to an older Wasm package, they could downgrade webR. For reproducibility, it's better to have older packages available for download that require some work to use, rather than nothing at all.

From what I can tell, the documentation is sparse on how to obtain a package from the release.

GitHub does not serve release package URLs with CORS headers, so the usual way of fetching a direct URL will be blocked by the browser. Instead, the GitHub REST API must be used to grab the release binary.
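As a rough sketch of that REST API route (untested; "org/repo" and the asset filename below are hypothetical placeholders), the release metadata can be queried first, and each asset's API URL can then be requested with an `Accept: application/octet-stream` header to receive the binary itself:

```r
# Sketch only -- repository and asset names are hypothetical.
# 1. Query the release metadata via the GitHub REST API, which
#    (unlike direct browser-facing asset URLs) is usable cross-origin.
release <- jsonlite::fromJSON(
  "https://api.github.com/repos/org/repo/releases/latest"
)

# 2. Locate the asset's API URL by name; fetching that URL with
#    `Accept: application/octet-stream` yields the binary payload.
asset_url <- release$assets$url[release$assets$name == "pkg.data"]
```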

I am unsure right now if such an API request should be handled by webr::install or otherwise, but I would like the step to be automated in the future so that it is almost transparent for end users obtaining packages either directly in the webR REPL or for bundling with shinylive, and I plan to work on this fairly soon.

With this being said, I do want to highlight a potential issue in the documentation [...]

Just a minor note that the {rwasm} docs were originally written before the current iteration of the r-wasm/actions were available. It's likely that we'll be rewriting the recommendations within once we have some stable r-wasm/actions and recommended workflows. Sorry about the dissonance here; we're still in uncharted territory.


@schloerke My feeling is that this way of working by simply pushing the bleeding edge binaries to GH Pages is going to be "obvious" to enough power users like James that if we don't have a useful example here, those power users will just roll their own to avoid an extra repo.

For that reason, I am currently leaning towards including such an example, but not recommending it.

I believe we could perform it in a single job as the runners are of the same type. (Given you are OK with giving r-wasm extra permissions for the job to complete.)

I am OK with this if it is simple to implement, but we should add a switch so that users who don't want Wasm binaries in their pkgdown website can turn it off. The switch should be off by default.


(P.S. How should webR be stylized? WebR, webR, webr?)

For the system as a whole I recommend "webR" with a lowercase w, unless webR is at the start of a sentence or in Title Case, in which case that wins with "WebR". The {webr} supporting R package is in all lowercase.

WebAssembly should have an uppercase W and A. The shortened "Wasm" should have an uppercase W2, but "wasm" is universally used and understood.

I am not particularly precious about this, other than that it should never be stylised "WEBR".

Footnotes

  1. Written in the left hand panel at https://github.com/features/packages#pricing

  2. Ref: "WebAssembly (abbreviated Wasm) is a binary...", https://webassembly.org

@schloerke
Collaborator

To clarify my earlier comment of "But if many wasm packages are being used, you're offloading the responsibility of your CRAN repo to the user...":

The goal of r-universe is to make a CRAN repo for you. This is great! It achieves the same intent as our WASM repo suggestion. So if you have access to a universe, use it! My worry is about trying to import from many package repos (ex: 10 different repos).

Every Shiny app that uses these packages would start to look like:

# Specify where to search for the R WASM packages
list_of_repos = c(
    "https://gh-username.github.io/repo-name1", 
    "https://gh-username.github.io/repo-name2", 
    "https://gh-username.github.io/repo-name3", 
    "https://gh-username.github.io/repo-name4", 
    "https://gh-username.github.io/repo-name5", 
    "https://gh-username.github.io/repo-name6", 
    "https://gh-username.github.io/repo-name7", 
    "https://gh-username.github.io/repo-name8", 
    "https://gh-username.github.io/repo-name9", 
    "https://gh-username.github.io/repo-name10", 
    "https://username.r-universe.dev/", 
    "https://repo.r-wasm.org/"
  )

Using an r-universe or custom wasm-cran repo would prevent the need to load from each individual repo's CRAN. When using individual CRAN repos, each Shiny app will need to be updated when a new repo becomes available. Whereas if an r-universe or a single wasm-cran is used, then it becomes available to everything with a single update.


While typing this up, I had an aha! moment...

Making a wasm CRAN within each package repo does allow for a "github remote" style installation of a particular package, while we can keep the pinned package versions for r-universe / custom wasm-cran repo.

If this is the case, then the repo count should not grow dramatically for every app... maybe just a few, and that is ok!

One potential problem is that earlier CRAN repos will overshadow follow-up repos. This is the main driver for using a single custom repo location.

Ex: user/pkgA@main -> cran::pkgB; user/pkgC -> user/pkgB@feature_branch. If the repos are ordered such that pkgA's repo is first and pkgC's is second, then pkgB will be installed from the CRAN version, not the expected feature GH branch that pkgC needs.

  • user/pkgA wasm cran repo will contain pkgA@main, cran::pkgB
  • user/pkgC wasm cran repo will contain pkgC@main, pkgB@feature_branch

If the repos are set up as user/pkgA then user/pkgC, my belief is that pkgB will be installed from user/pkgA's repo, which installs the pkgB@main version, as pkgB is found in the first CRAN repository.

Whereas if a central repo were used, then the pkgB version could be resolved with a package install config.
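The shadowing described above can be sketched in terms of repository order (URLs hypothetical):

```r
# Sketch: repository order decides which copy of pkgB wins.
list_of_repos <- c(
  "https://user.github.io/pkgA",  # ships pkgA@main plus pkgB (CRAN version)
  "https://user.github.io/pkgC"   # ships pkgC@main plus pkgB@feature_branch
)

# Package resolution walks the repos in order, so pkgB is taken from the
# first repository that lists it -- pkgA's CRAN copy shadows pkgC's
# feature-branch build.
webr::install("pkgB", repos = list_of_repos)
```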


Second aha! moment...

@georgestagg What if we did not store CRAN available packages when making a package repo's CRAN? Continuing the example from above...

  • user/pkgA wasm cran repo will contain pkgA@main
  • user/pkgC wasm cran repo will contain pkgC@main, pkgB@feature_branch
  • user/pkgD wasm cran repo will contain pkgD@main, pkgB@other_branch

There would still be issues if two dependencies didn't match (ex: pkgC and pkgD can't be installed together), but that's a nightmare for another day that even package install configs have issues with.

By only storing the minimal packages within a package repo's CRAN, there is only a minimal/expected shadow over the remaining CRAN locations. I believe the set of packages being stored could be user-definable, defaulting to just the single package.


My feeling is that this way of working by simply pushing the bleeding edge binaries to GH Pages is going to be "obvious" to enough power users like James that if we don't have a useful example here, those power users will just roll their own to avoid an extra repo.

For that reason, I am currently leaning towards including such an example, but not recommending it.

Sounds good! I like this idea of the added benefit of shimming in dev versions where possible. It avoids having to make a release for every commit.

@coatless Could you create a PR with your second solution from above? I can then take it from there to try to combine the two jobs into one.

https://github.com/actions/upload-pages-artifact/blob/0191170de1016adb0df3e653dd87f50ff6687ad7/action.yml#L16 has the default retention time as 1 day already. I'm happy to have this hard coded throughout the actions repo to err on the safe side.

Thank you for driving discussion with us on the bleeding edge!

@georgestagg
Member

georgestagg commented Feb 2, 2024

What if we did not store CRAN available packages when making a package repo's CRAN?

Yes, we can make that change... and we probably should. Good idea.

The default sets dependencies = NA here. The NA means additionally add hard dependencies of the package to the CRAN-like repo. This made sense while I was developing {rwasm} and we couldn't pass multiple repos to webr::install(), but probably this should not be the default any more.

We can change the default in {rwasm} to dependencies = FALSE, so as to match rwasm::build(). That would mean that only the minimal package(s) that have been explicitly listed will be compiled and added to the CRAN-like repo, and the workflow would become as suggested above.
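A rough sketch of what that workflow step would then reduce to (the exact {rwasm} argument name here is assumed from the discussion above, not confirmed):

```r
# Sketch, not the confirmed {rwasm} API: add only the explicitly listed
# package to the CRAN-like repo, leaving hard dependencies to be resolved
# at install time from other repositories (e.g. repo.r-wasm.org).
rwasm::add_pkg(".", dependencies = FALSE)
```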
