ci: Improve caching #3211
Force-pushed from 67da5b7 to 742268e.
Indeed, loadtest is fixed that way. Not sure whether that's a one-time thing or not, but still a good sign. A few more observations, looking at https://github.com/PostgREST/postgrest/actions/caches. Since I'm not sure whether that's publicly available, some key facts:
test and style are the new split in this PR. common is the old cache from before. Clearly, splitting test and style does not make any difference in terms of cache size - everything in style is also in test. On the flip side, merging the loadtest into it is great: the cache is not really bigger. Overall, we only have 10 GB, so we need to use the cache a lot less. I think we should do the following:
This way we can use the 10 GB we have for:
By only storing caches when on main, we avoid having the main cache evicted when a PR changes dependencies. This will make CI fast for those PRs which only touch code; those will use the cache. PRs which touch dependencies or nix will be slower, because they only use cachix. This seems like the most useful thing to do here.

Edit: The above calculation does not take the "static" cache into account, which is another 1 GB. So the cache would already be at its maximum with those...

To avoid having to rebuild the dynamic postgrest build in every nix job before we run all the tests on it, we can additionally cache just the dist-newstyle folder between the "prepopulate" job and all the test jobs. This comes in at around 200 MB, so we should still have room for those, even when pushing them in PRs. We'd just need to make sure that those caches get some kind of priority over the main caches in terms of eviction, i.e. caches from main should always be evicted last, PR caches first. Will need to figure out whether that's possible.
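The store-only-on-main approach could be sketched with the split restore/save cache actions. This is a hypothetical workflow fragment; the paths and keys are only illustrative:

```yaml
# Restore on every branch and PR, so code-only changes still get cache hits.
- uses: actions/cache/restore@v4
  id: deps-cache
  with:
    path: ~/.cabal/store
    key: deps-${{ runner.os }}-${{ hashFiles('postgrest.cabal') }}
    restore-keys: deps-${{ runner.os }}-

# ... build and test steps ...

# Save only on main, so PRs that change dependencies cannot push the
# main caches out of the 10 GB limit.
- uses: actions/cache/save@v4
  if: github.ref == 'refs/heads/main'
  with:
    path: ~/.cabal/store
    key: deps-${{ runner.os }}-${{ hashFiles('postgrest.cabal') }}
```

The point of the split actions is that the save step can be gated on a condition independently of the restore step, which the combined actions/cache action does not allow.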
Force-pushed from 742268e to ffb7e24.
Everything seems to be taken care of, LGTM!
Ooh, this one's good. Should've thought of this myself. Looks like something
The current eviction policy is that the oldest caches are deleted first, so all we could do is to create the big Nix cache last somehow. Let's see if we do hit that limit first.
On the one hand, it makes sense for Nix at least; on the other hand, there were full Nix cache rebuilds, and from what I've noticed the cache size doesn't seem to increase significantly, if ever; it was 3-something GB all the time.
Speaking of Stack caches: somehow the Linux cache is 380 MB, macOS is 680 MB and Windows is 1 GB. I wonder why?
Yeah, that's something I wondered about, too. I tried stack locally, but I'm getting sizes of more than 2 GB for my. I think the difference is that stack would download GHC in those cases, and we'd cache that, too. Which, I think, doesn't make much sense: downloading and installing GHC via stack is by far the fastest piece. Caching the dependencies is the important bit, though. Maybe we can do better with some combination of ghcup and
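One way such a ghcup/stack combination might look, so that GHC itself stays out of the cache (a sketch; the version, paths and keys are assumptions):

```yaml
- name: Install GHC via ghcup instead of stack, so it is never cached
  run: ghcup install ghc 9.4.5 --set

- name: Tell stack to use the system GHC and never install its own
  run: |
    stack config set system-ghc true --global
    stack config set install-ghc false --global

# Only the dependency store is cached now, not the GHC toolchain.
- uses: actions/cache@v4
  with:
    path: ~/.stack
    key: stack-deps-${{ runner.os }}-${{ hashFiles('stack.yaml') }}
```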
The problem is that the caches from
Interestingly, all PR caches from yesterday have already been evicted while the main caches are still there. So maybe there is already some built-in priority between those.
Since all caches are evicted after 7 days anyway, we could just run a daily workflow to restore (and save?) the current caches on main. This way they'd always be "used last" and hopefully not be evicted instead of PR caches.
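The daily refresh could be a tiny scheduled workflow (hypothetical; the key is a placeholder for the real main cache keys):

```yaml
name: Keep main caches warm
on:
  schedule:
    - cron: '30 4 * * *'  # once a day, well within the 7-day window

jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      # Restoring a cache counts as using it, which should keep the
      # main caches at the back of the eviction queue.
      - uses: actions/cache/restore@v4
        with:
          path: ~/.cabal/store
          key: deps-main  # placeholder for the real main cache key
```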
Oh, were they? That'd be really useful. Were the corresponding branches deleted, or related PRs merged?
We could even set stack up with
Not really, they use different GHC versions. Stack is on GHC 9.4.5; Cabal on 9.4.8, 9.6.4 and 9.8.1.
Yeah, I think I've suggested it somewhere in the comments to nix-cache-related PRs. The eviction policy doesn't mention access time, but that seems like the best effort.
Ah, I thought we'd have that case with this PR. But this PR only re-uses caches from main, so that just kept those and deleted everything else...
Why do we stick to 9.4.5 for Stack, BTW? 9.4.8 has been in Stackage for a month or so.
Excellent point on caching GHC and dependencies in non-nix builds separately |
But it's not available on FreeBSD, yet:
So the idea would be to consistently:
We could probably also just throw out the Linux x64 cabal build with GHC 9.4.8. We are building with 9.4.8 in nix, with 9.4.5 via stack on various platforms, and with 9.4.8 on arm via Cabal. That should be enough for the 9.4.x series...
@develop7 do you see any way of "not storing the cache" with cache-nix-action? I think this is possible in v5, but we have reverted to v4. Maybe we should try v5 again - might not have been buggy at all, maybe we just had our caches invalidated for other reasons or so? |
@wolfgangwalther perhaps use
Ah, right, I remember why I looked at this and then didn't do it: this would require us to put the save part manually in each place where we currently use the setup-nix action. I didn't find a way to add a "post job hook" or something like that in a composite action, yet. However, v5 of cache-nix-action has the
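Lacking a post-job hook in composite actions, one possible workaround (sketched here with the generic actions/cache split, assuming the nix store is staged under a folder; the path and keys are illustrative) is to keep setup-nix restore-only and let each workflow end with an explicit save step:

```yaml
# Inside the composite setup-nix action: restore only, never save.
- uses: actions/cache/restore@v4
  with:
    path: /tmp/nix-cache   # assumed staging path for the nix store
    key: nix-${{ runner.os }}-${{ hashFiles('nix/**') }}

# At the end of each workflow that should persist the cache, an explicit
# save step stands in for the missing "post job hook":
- uses: actions/cache/save@v4
  if: github.ref == 'refs/heads/main'
  with:
    path: /tmp/nix-cache
    key: nix-${{ runner.os }}-${{ hashFiles('nix/**') }}
```

The downside is exactly the one described above: every workflow has to remember to call the save step itself.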
Force-pushed from ffb7e24 to 8ff19f4.
I looked at the "Build MacOS (Nix)" job a little bit, because it takes by far the most time to run (compared to other nix jobs). Both on main (with cache) and in this PR (without cache, because the key was changed). The numbers are:
Assuming the 2m 46s are the "base" time for nix setup without cache download, downloading the actions cache took around 11m here. Compare that to 12m 30s for cachix... and I start to wonder why we bother with caching this stuff via GitHub Actions at all. Cachix seems fast enough to me for that purpose. I can see the GitHub Actions cache being useful to:
But I seriously question how much we are gaining from caching nix in the actions cache again. As an experiment, I added a commit to replace the actions cache for this job with something else: basically a lookup of cachix, which doesn't always involve a full download. This should be very fast when nothing changes - and if it does, it would only download those dependencies that are required for the rebuild. Compare this to downloading everything every time...
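The cachix-only variant is essentially just a build with cachix configured as a substituter, so only missing store paths get fetched (a sketch; the cache name and flake attribute are assumptions):

```yaml
- uses: cachix/cachix-action@v14
  with:
    name: postgrest   # assumed cachix cache name
# nix only downloads the store paths that are not already present,
# instead of unpacking one big actions-cache archive every time:
- run: nix build .#postgrest
```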
Nix caching does shave off a minute or two per Nix-involving job, which shortens the feedback loop accordingly - usually to those two minutes, due to CI jobs running in parallel. For the macOS case, with a single cache-consumer job and a low-performing runner, it certainly does make sense to drop the cache altogether. This is my first approach to GitHub Actions and their cache in particular, and, being honest, it performs worse than I expected, especially on the Windows/Mac runners (download speed is barely 100 Mb/sec, down to an abysmal 20 MB/sec even, while on Linux runners it's 380 Mb/s sometimes). With dedicated infrastructure I expect the improvement to be more significant, but currently it might not be worth the effort.
So that didn't work at all in CI, while locally it was fine. I have no idea yet what's happening there.
This is done now - I was able to keep it all in the setup-nix action.
Force-pushed from 59fe0f7 to e736401.
Force-pushed from 112574c to 98b1cf3.
Force-pushed from 4900cc6 to 43a3187.
This restores caches on all branches and pull requests, but only stores them on the main branch and release branches. This prevents those caches from being evicted early once we hit the 10 GB limit.
This is a first step to split up the cabal and stack caches into separate pieces. Here we split the work folder, which just contains the postgrest-specific build artifacts, into a separate cache. More fine-grained caching should give us better cache hits and a much smaller upload size in the regular case, improving CI performance. Since the work folder caches are very small (about 30-40 MB), they are cached for PRs, too. This will allow the majority of PRs, which only change source code files but no dependencies, to still have their build files cached for additional commits.
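Split into two caches, this could look roughly like the following (illustrative keys and paths):

```yaml
# Dependency store: large, changes rarely, only saved on main.
- uses: actions/cache/restore@v4
  with:
    path: ~/.cabal/store
    key: deps-${{ runner.os }}-${{ hashFiles('postgrest.cabal') }}

# Work folder: small (30-40 MB), so it is saved on PRs too; keyed per
# commit with a fallback, so follow-up commits reuse the previous build.
- uses: actions/cache@v4
  with:
    path: dist-newstyle
    key: work-${{ runner.os }}-${{ github.sha }}
    restore-keys: work-${{ runner.os }}-
```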
This removes the GHC install from stack caches to reduce size.
This action ensures that the correct GHC and/or stack version is always installed in all environments. This solves problems where GHC or stack might not be available on newer macOS images anymore, or where ghcup is not available by default on our new custom GitHub runner on arm.
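Such a composite action might bootstrap ghcup first and then pin the exact versions (a hypothetical sketch; the inputs are assumptions, and the bootstrap call follows ghcup's documented installer):

```yaml
runs:
  using: composite
  steps:
    - name: Install ghcup if missing (e.g. on custom arm runners)
      shell: bash
      run: |
        if ! command -v ghcup >/dev/null; then
          curl --proto '=https' --tlsv1.2 -sSf https://get-ghcup.haskell.org |
            BOOTSTRAP_HASKELL_NONINTERACTIVE=1 sh
          echo "$HOME/.ghcup/bin" >> "$GITHUB_PATH"
        fi
    - name: Install and select the requested GHC and stack versions
      shell: bash
      run: |
        ghcup install ghc "${{ inputs.ghc }}" --set
        ghcup install stack "${{ inputs.stack }}" --set
```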
Force-pushed from 0139c84 to a510b8b.
I focused on improvements to the stack and cabal caching here and decided against trying to play with more nix-related GitHub Actions caching. Instead I will try #3364 (comment) to speed up the macOS nix job.
Plenty of text in the commit messages. Hopefully this also fixes the currently failing loadtest, but I'm not sure about that.
TODO:
- Add PR caching for postgrest-build
- Garbage collect nix/stack/cabal caches?
- Purge old caches proactively when saving new cache?