Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allows setting volatile overlay mount option #5366

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

marcosnils
Copy link

given that buildkit is generally very IO intensive, some preliminary
tests have shown that adding the volatile option could yield up to
a 15% performance improvements in some cases.

ref: https://docs.kernel.org/filesystems/overlayfs.html#volatile-mount

Signed-off-by: Marcos Lilljedahl [email protected]

given that buildkit is generally very IO intensive, some preliminary
tests have shown that adding the volatile option could yield up to
a 15% performance improvements in some cases.

ref: https://docs.kernel.org/filesystems/overlayfs.html#volatile-mount

Signed-off-by: Marcos Lilljedahl <[email protected]>
@tonistiigi
Copy link
Member

Bit confused. What sets the volatile option for these mounts? Additionally it looks like containerd libs have some code to intentionally remove it before mounting atm. https://github.com/containerd/containerd/blob/v1.7.22/mount/temp.go#L66

Do we capture fsync error on unmount for this? If upperdir fails to sync, what's the point where the error would appear?

@marcosnils
Copy link
Author

What sets the volatile option for these mounts?

You can set this in containerd via snapshotter options.

  [plugins."io.containerd.snapshotter.v1.overlayfs"]
    root_path = ""
    upperdir_label = false
    mount_options = ["volatile"]

Additionally it looks like containerd libs have some code to intentionally remove it before mounting atm. https://github.com/containerd/containerd/blob/v1.7.22/mount/temp.go#L66

IIUC that's only for temp mounts, I see that WriteUpperDir uses those https://github.com/moby/buildkit/blob/master/util/overlay/overlay_linux.go#L115. Having said that, when running some black-box performance tests with Dagger. I was getting consistent ~+15% performance increase in build times when using the volatile option. Could it be possible that some other non-temp overlay mounts could be benefited from this setting?

I could have also been bad measure from the benchmark even though I ran the same test multiple times and took the p99 value out of all samples. I'll try re-running the tests again to see if I still get the same results.

@tonistiigi
Copy link
Member

You can set this in containerd via snapshotter options.

If we think this is safe(for buildkit's use-cases) then we should add it to all our overlay mounts, not only with containerd worker, and not requiring custom config from the user.

Still, this would need some clarification:

Do we capture fsync error on unmount for this? If upperdir fails to sync, what's the point where the error would appear?

@sipsma
Copy link
Collaborator

sipsma commented Oct 2, 2024

If we think this is safe(for buildkit's use-cases) then we should add it to all our overlay mounts, not only with containerd worker, and not requiring custom config from the user.

The scope of this change is much smaller and just allows Dagger to even try this in the first place without a full fork of buildkit. All it's doing is stopping one part of code that's looking for upperdir/lowerdir in buildkit from erroring out if it sees volatile in the list of options. The change is not turning volatile on anywhere, just removing an error from happening if a mount happens to have it.

Do we capture fsync error on unmount for this? If upperdir fails to sync, what's the point where the error would appear?

From the kernel docs Marcos linked to:

If any writeback error occurs on the upperdir’s filesystem after a volatile mount takes place, all sync functions will return an error. Once this condition is reached, the filesystem will not recover, and every subsequent sync call will return an error, even if the upperdir has not experience a new error since the last sync call.

@tonistiigi
Copy link
Member

@sipsma I get that but I much more like that we implement a feature in buildkit instead of half-complete updates that only work if library is called in specific context and can easily break as there isn't anything in BuildKit actually using it. If you want to split work up into multiple PRs and create follow-up issues then that fine if properly documented this way.

From the kernel docs Marcos linked to:

Do I understand correctly from this that in order for volatile option to be safe for buildkit we need to manually call fsync before we call umount for these mounts as that is the place where we can capture the error if some writes were broken. This shouldn't affect performance as afaik umount implicitly calls fsync anyway (but wouldn't let us capture the error).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants