-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ci: fix a race in TestExecIn and TestExecInTTY #4445
base: main
Are you sure you want to change the base?
Conversation
a6c839a
to
ee42499
Compare
Please add this into the commit message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should just not expect cat
in ps
output.
Or do a retry. |
I think because these are integration tests, so maybe we need to keep this to check we didn't exec into an unrelated container? |
Fixes: opencontainers#4437 We can use a chan to wait the output from init process. After we received the content, it means that the init process has started. Then we can exec into this container to use ps command to check the init process and the exec process are both exist. Signed-off-by: lifubang <[email protected]>
ee42499
to
f08116b
Compare
_ = stdoutR.Close() | ||
_ = stdoutW.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just defer the close, instead of a function? We are ignoring the error anyways.
I mean it for all the cases, not just this.
_ = stdinR.Close() | ||
defer stdinW.Close() //nolint: errcheck | ||
defer func() { | ||
_, _ = stdinW.Write([]byte("hello")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we write this? To stop the go routine? Can you add a comment or the message written can be self-explanatory instead of this hello?
If it's that, is it needed? Won't the close of the pipe already free the goroutine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proper fix belongs to func (c *Container) exec()
, which signals the runc init to go ahead and exec the real init process (cat
in our case). I see there is some complicated logic in there but it looks it's just to handle the error case.
What it does (in a normal, non-error) case is opens exec fifo, reads from it, and removes the exec fifo file.
The other side (runc init, see the tail of func (l *linuxStandardInit) Init()
):
- writes
0
to the exec fifo (after which the parent returns from container.Start and thus container is expected to be running); - runs the
StartContainer
hook; - calls
utils.UnsafeCloseFrom
- calls
system.Exec(name, l.config.Args, os.Environ())
The test execs the second process after (1) has happened, and if it manages to execute before (4) then we have this issue.
I guess that #4325 makes the race window smaller because os.Environ
takes some time (it actually copies the environment), but we still haven't merged that.
I wish there is a cheap way to find out if runc init
has completed. Maybe read /proc/$INIT_PID/exe
to see it's not /proc/self/exe
? Or something similar, ideally polling on something.
And, this can be added to func (c *Container) exec()
so when it returns we're sure the container is running.
Fixes: #4437
We can use a chan to wait the output from init process. After we received the content,
it means that the init process has started. Then we can exec into this container to use
ps command to check the init process and the exec process are both exist.
Signed-off-by: lifubang [email protected]