flaky unit tests (unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied) #4294

Open
kolyshkin opened this issue May 28, 2024 · 6 comments · May be fixed by #4452

@kolyshkin
Contributor

kolyshkin commented May 28, 2024

Description

From https://cirrus-ci.com/task/6471857094787072:

=== RUN TestInitJoinPID
exec_test.go:1471: unexpected error: unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied
--- FAIL: TestInitJoinPID (0.29s)

I've only seen it happen once. Filing for visibility.

@kolyshkin kolyshkin changed the title from "flake in" to "flake in TestInitJoinPID on CentOS 7" May 28, 2024
@kolyshkin
Contributor Author

CentOS 7 is gone (see #4333), so this one can be closed I guess.

@kolyshkin
Contributor Author

I saw this once on Ubuntu 24.04 now:

https://github.com/opencontainers/runc/actions/runs/10823144914/job/30028174423?pr=4358

...
=== RUN   TestSharedPidnsInitKill
--- PASS: TestSharedPidnsInitKill (0.18s)
=== RUN   TestInitJoinPID
    exec_test.go:1444: unexpected error: unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied
--- FAIL: TestInitJoinPID (0.14s)
=== RUN   TestInitJoinNetworkAndUser
--- PASS: TestInitJoinNetworkAndUser (0.33s)
=== RUN   TestTmpfsCopyUp
...

So it might be a genuine issue with the test case.

@kolyshkin kolyshkin reopened this Sep 12, 2024
@kolyshkin
Contributor Author

One more, this time in test (ubuntu-20.04, 1.23.x, criu-dev) and in a different test. From the logs:

=== RUN   TestSeccompPermitWriteMultipleConditions
    seccomp_test.go:251: |: unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied
--- FAIL: TestSeccompPermitWriteMultipleConditions (0.13s)

@kolyshkin kolyshkin changed the title from "flake in TestInitJoinPID on CentOS 7" to "flaky unit tests (unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied)" Sep 13, 2024
@kolyshkin
Contributor Author

Another failure in test (ubuntu-20.04, 1.23.x, -race). From the logs:

=== RUN   TestSeccompDenyWriteConditional
    seccomp_test.go:205: unexpected error: unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied
--- FAIL: TestSeccompDenyWriteConditional (0.14s)

@kolyshkin
Contributor Author

From https://github.com/opencontainers/runc/actions/runs/11300641540/job/31433805394?pr=4441 (cross-i386):

=== RUN   TestRootfsPropagationSharedMount
    exec_test.go:1288: unexpected error: unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied
--- FAIL: TestRootfsPropagationSharedMount (0.13s)

lifubang added a commit to lifubang/runc that referenced this issue Oct 17, 2024
In opencontainers#3987 (0e9a335), we may use a memfd copy of runc to start runc init.
Due to a Go stdlib bug, we need to add safeExe to the set of
ExtraFiles; otherwise it is possible for the stdlib to clobber the fd
during forkAndExecInChild1 and replace it with some other file that
might be malicious. This is less than ideal (because the descriptor
will be non-O_CLOEXEC); however, we have protections in "runc init" to
stop us from leaking extra file descriptors.
See <golang/go#61751>.

But because we have added safeExe to the set of ExtraFiles, if the
fd of safeExe is too small, the Go stdlib will dup3 it to another fd,
which causes the original fd to be closed. (opencontainers#4294)

Signed-off-by: lfbzhm <[email protected]>
lifubang added a commit to lifubang/runc that referenced this issue Oct 17, 2024
In opencontainers#3987 (0e9a335), we may use a memfd copy of runc to start runc init.
Due to a Go stdlib bug, we need to add safeExe to the set of
ExtraFiles; otherwise it is possible for the stdlib to clobber the fd
during forkAndExecInChild1 and replace it with some other file that
might be malicious. This is less than ideal (because the descriptor
will be non-O_CLOEXEC); however, we have protections in "runc init" to
stop us from leaking extra file descriptors.
See <golang/go#61751>.

There is a race when we open this memfd: if fd 5 or 6 was closed at
that time, it may be reused by the memfd.

But because we have added safeExe to the set of ExtraFiles, if the
fd of safeExe is not bigger than the stdio fds count + ExtraFiles count,
the Go stdlib will dup3 it to another fd, which causes the original fd
to be closed. (opencontainers#4294)

Signed-off-by: lfbzhm <[email protected]>
lifubang added a commit to lifubang/runc that referenced this issue Oct 17, 2024
In opencontainers#3987 (0e9a335), we may use a memfd copy of runc to start runc init.
Due to a Go stdlib bug, we need to add safeExe to the set of
ExtraFiles; otherwise it is possible for the stdlib to clobber the fd
during forkAndExecInChild1 and replace it with some other file that
might be malicious. This is less than ideal (because the descriptor
will be non-O_CLOEXEC); however, we have protections in "runc init" to
stop us from leaking extra file descriptors.
See <golang/go#61751>.

There is a race when we open this memfd: if fd 6 or 7 was closed at
that time, it may be reused by the memfd.

But because we have added safeExe to the set of ExtraFiles, if the
fd of safeExe is not bigger than the stdio fds count + ExtraFiles count,
the Go stdlib will dup3 it to another fd, which causes the original fd
to be closed. (opencontainers#4294)

Signed-off-by: lfbzhm <[email protected]>
lifubang added a commit to lifubang/runc that referenced this issue Oct 18, 2024
In opencontainers#3987 (0e9a335), we may use a memfd copy of runc to start runc init.
Due to a Go stdlib bug, we need to add safeExe to the set of
ExtraFiles; otherwise it is possible for the stdlib to clobber the fd
during forkAndExecInChild1 and replace it with some other file that
might be malicious. This is less than ideal (because the descriptor
will be non-O_CLOEXEC); however, we have protections in "runc init" to
stop us from leaking extra file descriptors.
See <golang/go#61751>.

There is a race when we open this memfd: if fd 6 or 7 was closed at
that time, it may be reused by the memfd.

Because we want to add safeExe to the set of ExtraFiles, if the fd of
safeExe is too small, the Go stdlib will dup3 it to another fd, or dup3
another fd onto this fd, which causes the fd-based cmd.Path to refer to
a random path. (issue: opencontainers#4294)

Signed-off-by: lfbzhm <[email protected]>
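
To make the clobbering described in the commit message above easier to follow, here is a minimal sketch in Go. It is not the stdlib code: `remap` is a hypothetical, simplified model of the two-pass fd shuffle done in the child before exec (it ignores the internal error pipe and closed slots), and the fd numbers and labels are made up for illustration. The assumption is that slot `i` of the file list must end up as child fd `i`, so a memfd that happens to sit at a low number such as 7 can be overwritten when another file is duplicated onto slot 7.

```go
package main

import "fmt"

// remap is a simplified, hypothetical model of the two-pass descriptor
// shuffle performed in the child before exec. table maps fd numbers to
// labels; files[i] is the parent fd that must end up as child fd i.
func remap(table map[int]string, files []int) map[int]string {
	out := make(map[int]string, len(table))
	for fd, name := range table {
		out[fd] = name
	}
	fd := append([]int(nil), files...)

	// Scratch fds start above both the slot count and the highest listed fd.
	next := len(fd)
	for _, f := range fd {
		if f > next {
			next = f
		}
	}
	next++

	// Pass 1: a source fd lower than its target slot is moved up first,
	// so that pass 2 does not overwrite it before it has been copied.
	for i := range fd {
		if fd[i] >= 0 && fd[i] < i {
			out[next] = out[fd[i]]
			fd[i] = next
			next++
		}
	}
	// Pass 2: duplicate every source onto its target slot. Whatever was
	// previously open at slot i is replaced.
	for i := range fd {
		if fd[i] != i {
			out[i] = out[fd[i]]
		}
	}
	return out
}

func main() {
	// Assumed layout: the memfd landed on fd 7, and it is listed last,
	// so its target slot in the child is 8.
	table := map[int]string{0: "stdin", 1: "stdout", 2: "stderr", 7: "memfd",
		10: "a", 11: "b", 12: "c", 13: "d", 14: "e"}
	files := []int{0, 1, 2, 10, 11, 12, 13, 14, 7}

	after := remap(table, files)
	fmt.Println("fd 7 now refers to:", after[7])
	fmt.Println("fd 8 now refers to:", after[8])
}
```

In this model the memfd survives at slot 8, but fd 7 ends up holding the unrelated file "e", so a path like /proc/self/fd/7 no longer points at the binary. If the memfd sits at a number above every target slot (say 20), its original number is left untouched.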
@lifubang
Member

This is a real bug in practice; I can reproduce it locally.
If we start 1000 containers in quick succession, we have a chance of reproducing it.
It reproduces especially easily with PR #4448 applied. I don't know why; maybe that PR makes runc start faster than before.

@lifubang lifubang added this to the 1.2.0 milestone Oct 18, 2024