rootfs: make pivot_root(2) dance handle initramfs case #4434
base: main
Changes from all commits
7f84857
17927cc
1f3707b
59b0465
0b283ea
363768c
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -105,7 +105,7 @@ jobs:
 - name: install deps
   run: |
     sudo apt update
-    sudo apt -y install libseccomp-dev sshfs uidmap
+    sudo apt -y install cpio libseccomp-dev qemu-kvm sshfs uidmap

 - name: install CRIU
   if: ${{ matrix.criu == '' }}
|
@@ -140,7 +140,7 @@ jobs:
 - name: Setup Bats and bats libs
   uses: bats-core/[email protected]
   with:
-    bats-version: 1.9.0
+    bats-version: 1.11.0
     support-install: false
     assert-install: false
     detik-install: false
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -202,10 +202,19 @@ func prepareRootfs(pipe *syncSocket, iConfig *initConfig) (err error) {
 		return err
 	}

-	if config.NoPivotRoot {
-		err = msMoveRoot(config.Rootfs)
-	} else if config.Namespaces.Contains(configs.NEWNS) {
+	if config.Namespaces.Contains(configs.NEWNS) {
 		err = pivotRoot(config.Rootfs)
+		if config.NoPivotRoot {
+			logrus.Warnf("--no-pivot is deprecated and may be removed or silently ignored in a future version of runc -- see <https://github.com/opencontainers/runc/issues/4435> for more details")
cyphar marked this conversation as resolved.
+			if err != nil {
+				// Always try to do pivot_root(2) because it's safe, and only fallback
+				// to the unsafe MS_MOVE+chroot(2) dance if pivot_root(2) fails.
+				logrus.Warnf("your container failed to start with pivot_root(2) (%v) -- please open a bug report to let us know about your usecase", err)
Review comment: End users may not understand who "us" refers to, since they don't execute runc directly. We could link https://github.com/opencontainers/runc/issues to be explicit, but we might then get reports about third-party products that we have never heard of.
Reply: We already provide a link to the tracking issue for this deprecation if you pass
+				err = msMoveRoot(config.Rootfs)
+			} else {
+				logrus.Warnf("despite setting --no-pivot, this container successfully started using pivot_root(2) -- consider removing the --no-pivot flag")
+			}
+		}
 	} else {
 		err = chroot()
 	}
|
@@ -1068,19 +1077,58 @@ func pivotRoot(rootfs string) error {
 	}
 	defer unix.Close(oldroot) //nolint: errcheck

-	newroot, err := unix.Open(rootfs, unix.O_DIRECTORY|unix.O_RDONLY, 0)
-	if err != nil {
-		return &os.PathError{Op: "open", Path: rootfs, Err: err}
-	}
-	defer unix.Close(newroot) //nolint: errcheck
-
 	// Change to the new root so that the pivot_root actually acts on it.
-	if err := unix.Fchdir(newroot); err != nil {
-		return &os.PathError{Op: "fchdir", Path: "fd " + strconv.Itoa(newroot), Err: err}
+	if err := os.Chdir(rootfs); err != nil {
+		return err
cyphar marked this conversation as resolved.
 	}

-	if err := unix.PivotRoot(".", "."); err != nil {
-		return &os.PathError{Op: "pivot_root", Path: ".", Err: err}
+	pivotErr := unix.PivotRoot(".", ".")
+	if errors.Is(pivotErr, unix.EINVAL) {
+		// If pivot_root(2) failed with -EINVAL, one of the possible reasons is
Review comment: There are six reasons that will cause
This PR tries to deal with the fourth one; would it cause unexpected behavior for the fifth one? (I may need to check once I have time.)
Reply: For this case, this patch has the same security issue like
Reply: How? After
We also set the parent to
The bind-mount of
The "magic" done is just creating a bind-mount of the root that can be used for
This is the only way of doing it (I spoke to the VFS maintainers a few weeks ago, and this is the only "right" way of doing it -- FWIW Lennart implied that systemd does this, but it seems they don't). It's a bit hard to understand what the problem is without an actual reproducer.
We can't deprecate it without having a solution for minikube and kata...
Reply: Marking it as deprecated does not mean we remove it right now; it only reminds users that they should not use it unless there is no other way to run their case, and informs them that we can't guarantee the container's security with
I'll send you an email ASAP.
+		// that we are in early boot and trying pivot_root on top of the
+		// initramfs (which isn't allowed because initramfs/rootfs doesn't have
+		// a parent mount).
+		//
+		// Traditionally, users were told to pass --no-pivot (which used chroot
+		// instead) but this is very insecure (even with the hardenings we've
+		// put into our chroot() wrapper).
+		//
+		// A much better solution is to create a bind-mount clone of / (which
+		// would have a parent) and then chroot into that clone so that we are
+		// properly rooted within a mount that has a parent mount. Then we can
+		// retry the pivot_root().
+
+		// Clone / on top of . to create a version of / that has a parent and
+		// so can be pivot-rooted.
+		if err := unix.Mount("/", ".", "", unix.MS_BIND|unix.MS_REC, ""); err != nil {
+			err := &os.PathError{Op: "make clone of / mount", Path: rootfs, Err: err}
+			return fmt.Errorf("error during fallback for failed pivot_root (%w): %w", pivotErr, err)
+		}
+		// Switch to the cloned mount. We have to use the full path here
+		// because we need to get the kernel to move us into the new mount
+		// (chdir(".") will keep us in the old non-cloned / mount).
+		if err := os.Chdir(rootfs); err != nil {
+			return fmt.Errorf("error during fallback for failed pivot_root (%w): switch to cloned mount: %w", pivotErr, err)
+		}
+		// Move the cloned mount to /.
+		if err := unix.Mount(".", "/", "", unix.MS_MOVE, ""); err != nil {
+			err := &os.PathError{Op: "move / clone mount to /", Path: rootfs, Err: err}
+			return fmt.Errorf("error during fallback for failed pivot_root (%w): %w", pivotErr, err)
+		}
+		// Update current->fs->root to be the cloned / (to be pivot_root'd).
+		if err := unix.Chroot("."); err != nil {
+			err := &os.PathError{Op: "chroot into cloned /", Path: rootfs, Err: err}
+			return fmt.Errorf("error during fallback for failed pivot_root (%w): %w", pivotErr, err)
+		}
+
+		// Go back to the container rootfs and retry pivot_root.
+		if err := os.Chdir(rootfs); err != nil {
+			return fmt.Errorf("error during fallback for failed pivot_root (%w): %w", pivotErr, err)
+		}
+		pivotErr = unix.PivotRoot(".", ".")
+	}
+	if pivotErr != nil {
+		return &os.PathError{Op: "pivot_root", Path: rootfs, Err: pivotErr}
 	}

 	// Currently our "." is oldroot (according to the current kernel code).
|
Review comment: The QEMU test should probably be executed on only a single GHA Ubuntu job.
Reply: We could do that, but the test itself runs quite quickly (1.6s on my machine), so I don't think it's worth making the test runner logic more complicated.