High memory usage when pulling large images with pre-existing layers. #24527
Comments
Do you have TMPDIR set, and does it point to a tmpfs file system? Downloaded image content will be written to TMPDIR (/var/tmp by default) and removed once the pull is complete, AFAIK.
I just checked and I don't set TMPDIR on any of the machines. Also, the memory used is allocated by the podman process itself; I've measured it using cgroups through a systemd scope unit. tmpfs would show up as part of the page cache, if I recall correctly.
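For reference, a quick way to check both of those things (a sketch; assumes GNU df):

```bash
# Is TMPDIR set, and is the effective temporary directory backed by tmpfs?
printenv TMPDIR || echo "TMPDIR is not set"
df -hT "${TMPDIR:-/var/tmp}"   # the Type column shows tmpfs vs. a disk-backed filesystem
```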
The memory starts rising only after the download completes, at the stage "Writing manifest to image destination", and grows progressively until it hits roughly the image size (I think); then it finishes. Here's how I ran it:
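A minimal sketch of a measurement along these lines, assuming cgroups v2 and a systemd user session; the unit name is arbitrary and this is not necessarily the exact command used here:

```bash
# Run the pull in its own transient scope so the kernel accounts its memory separately.
systemd-run --user --scope --unit=pull-test -p MemoryAccounting=yes \
    podman pull quay.io/fedora-ostree-desktops/kinoite:41

# In another terminal: find the scope's cgroup and watch what it is being charged.
# (memory.peak needs a reasonably recent kernel; memory.current is always present on v2.)
cg=$(systemctl --user show -p ControlGroup --value pull-test.scope)
watch -n1 cat "/sys/fs/cgroup${cg}/memory.current" "/sys/fs/cgroup${cg}/memory.peak"
```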
There is containers/storage#2055 for zstd-compressed images, but this image does seem to use gzip. There is a hidden
Yes, I’d like to see a profile. If we are doing anything stupid, that should show up very clearly in there at 13 GB. (The description “page cache” doesn’t immediately tell me whether this is private memory required by Podman, or just that we have written that many files, which will eventually be flushed, after which the page cache can be freed. I’m sure that’s trivial knowledge that can be looked up.)
I probably should have split that into its own paragraph. What I meant to say is that it's not tmpfs, since tmpfs isn't memory allocated to any single process, but rather is part of the page cache, which is separate from process memory. I have a profile (it only got to 6G); where should I paste it? Edit: for anyone stumbling upon this, I now know this isn't always true, as cgroups v2 also accounts for tmpfs usage, see https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html; it's tracked under
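For what it's worth, that split can be inspected directly in the pull's cgroup under cgroups v2 (a sketch, reusing the transient scope from the measurement sketch above; field names are from the linked kernel documentation):

```bash
# anon = private process memory, file = page cache, shmem = tmpfs/shared memory.
cg=$(systemctl --user show -p ControlGroup --value pull-test.scope)
grep -E '^(anon|file|shmem) ' "/sys/fs/cgroup${cg}/memory.stat"
```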
You can upload the file here in a GitHub comment, either by drag and drop or by clicking below the text box.
pull-profile-1.txt (this got to the full 13G; I ran the profiling with a different image before, so for clarity, this one is from pulling quay.io/fedora-ostree-desktops/kinoite:41)
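Assuming the attachment is a raw Go heap profile in pprof format (rather than already-rendered text), it can be summarized like this:

```bash
# Top allocation sites by in-use space (the default sample type for heap profiles).
go tool pprof -top -inuse_space pull-profile-1.txt
# Interactive graph / flame-graph view in a browser.
go tool pprof -http=:8080 pull-profile-1.txt
```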
So, I've figured out how to reproduce it from scratch and have a rough theory of how it happens. If you just download a large image like this from scratch, you're not going to see the 13G memory usage; you need to already have an image stored locally that shares a significant amount of blobs with the pulled image. (Complete speculation:) if I had to hazard a guess, it has something to do with the deduplication of blobs: they get loaded into memory for checksumming or something and then stick around a bit too long for some reason. Here's how I reproduce it now (the first image is FROM the kinoite one, so it already contains all its layers):
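A sketch of that shape of reproducer; `REGISTRY/derived-image:latest` stands in for any image built FROM the kinoite image and is not the actual image used here:

```bash
# 1. Populate local storage with the shared layers via the derived image (placeholder name).
podman pull REGISTRY/derived-image:latest
# 2. Pull the base image: its layers already exist locally, and this is the pull
#    whose memory climbs to roughly the image size.
podman pull quay.io/fedora-ostree-desktops/kinoite:41
```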
The first one is 66 zstd:chunked layers… there might be something to the deduplication theory? The second one is 65 gzip layers. Is that really that image, or some mirrored / converted version? Any non-default c/storage options? In particular, is
The second is literally that image; I haven't changed anything about the storage configuration. EDIT: I think I misunderstood you, the first one is "FROM quay.io/fedora-ostree-desktops/kinoite:41", but it is rebuilt with zstd:chunked and some other stuff (base image + the 1 squashed layer).
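For reference, one way to check how an image's layers are compressed (a sketch using skopeo and jq; not necessarily how it was determined here):

```bash
# Layer media types ending in "gzip" vs "zstd" show the compression. For a multi-arch
# image, skopeo may return a manifest list instead; inspect a per-arch manifest then.
skopeo inspect --raw docker://quay.io/fedora-ostree-desktops/kinoite:41 \
  | jq -r '.layers[].mediaType' | sort | uniq -c
```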
Note to self: In the profile,
That does look rather unexpected. Allocating 6 GB for Image.BigData would certainly explain a lot, but why?
@ver4a Thanks! Unconfirmed, just by reading the code: I think you are right, this does happen when a whole layer is reused from a previously-pulled image (but reused in a way which requires making a copy); there must be a consistent reproducer, but it’s not too likely to happen just on common pulls. With some combination of pushes, compression format changes, and re-pulls, I can certainly imagine a situation where this happens very frequently.

In particular, the hypothesis is that since containers/image@5567453 we have started pointing at temporary uncompressed versions of the layer data in

And it’s not just the memory usage: we actually unnecessarily store that 6 GB on disk (until the image is deleted). We should just drastically simplify this, and instead of recording all unaccounted-for blobs, special-case the one blob where that can happen, the config.
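The on-disk side of that can be eyeballed directly (a sketch, assuming the default rootless graph root and the overlay driver; the exact layout is a c/storage implementation detail):

```bash
# Image "big data" items are kept as files under the per-image directories in the
# graph root, so an unexpectedly large multi-GB entry stands out here.
du -sh ~/.local/share/containers/storage/overlay-images/*/ | sort -h | tail -n 5
```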
Correct me if I've missed something, but I actually think the reproducer is pretty simple, and the case where this happens also seems very easy to hit (to a smaller degree, though, since more layers are different). It also doesn't seem to require any special combination of actions. Take, for example, this, which I've just tried:
This got to 6G of memory usage. These two are basically the same image, but they were built 1 day apart, so some layers have changed. The case here is a simple update. Now yes, most users don't upgrade daily (I do), but I hit it because I rebuild my images every day, and so I pull a new upstream image to replace an ever so slightly out-of-date one. Update: I've just reproduced it on a fresh CoreOS install with no customizations, just by pulling these two images in this order.
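The same "simple update" case can also be simulated without building anything, e.g. by first pulling an older build of the tag by digest; the digest below is a placeholder for a previous day's build:

```bash
# Placeholder: substitute the digest of an earlier build of the same tag.
podman pull quay.io/fedora-ostree-desktops/kinoite@sha256:OLD_BUILD_DIGEST
# The current build now shares most layers with what is already stored locally,
# and this second pull is where memory climbs.
podman pull quay.io/fedora-ostree-desktops/kinoite:41
```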
I ran the test on multiple versions of FCOS:
40.20240416.3.1 (podman 5.0.1): reproduces
I've tried several older versions and those also don't have the issue, so it's somewhere between 4.9.4 and 5.0.1.
Update:
The bug depends on using the same content, but not exactly in the same “layer position”, e.g. if something was a layer 5 previously and now it is a layer 6 — or if it is still a layer 5, but a parent layer 4 has changed. … and, now that I think about it, yes, the “parent layer has changed” situation can happen very often.
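To make the "layer position" point concrete: a layer's identity in storage is derived from its parent chain as well as its own DiffID (the OCI chain-ID construction), so identical content under a changed parent gets a different ID. A rough illustration with stand-in digests:

```bash
# OCI chain ID (illustration, not podman code): the child's ID mixes in the parent's
# chain ID, so the same child content under a new parent yields a new chain ID.
diffid_parent="sha256:$(echo -n parent-v2 | sha256sum | cut -d' ' -f1)"        # stand-in DiffIDs
diffid_child="sha256:$(echo -n unchanged-child | sha256sum | cut -d' ' -f1)"
chain_parent="$diffid_parent"                      # a base layer's chain ID is its DiffID
chain_child="sha256:$(printf '%s %s' "$chain_parent" "$diffid_child" | sha256sum | cut -d' ' -f1)"
echo "$chain_child"   # changes whenever chain_parent changes, even though diffid_child does not
```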
Perfect, that narrows it down, thank you.
containers/image#2636 might be a fix, but I didn’t test that yet.
Confirming the guess at the cause, and that containers/image#2636 fixes this. Given the reproducer above, before/after of the second pull:
This is a workaround to deal with containers/podman#24527; once that's fixed, this can be dropped, but it might still be useful to avoid pulling multiple images at once, which I think might duplicate some work. Without this, the memory usage during pulls is currently too much to handle.
Issue Description
podman version:
When pulling a large image, podman allocates memory seemingly equal to the image size. For a 13G quay.io/fedora-ostree-desktops/kinoite:41, podman allocated ~13G of memory.
Steps to reproduce the issue
Steps to reproduce the issue (with rootless podman)
I can't reproduce this with rootful podman; that only floats around 100M.
Describe the results you received
Memory usage scaling with image size.
Describe the results you expected
Memory usage being more or less constant.
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
No response
Additional information
I'm able to reproduce this on Fedora Kinoite 41 and FCOS (stable) 40.20241019.3.0