Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: fix a race in TestExecIn and TestExecInTTY #4445

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lifubang
Copy link
Member

@lifubang lifubang commented Oct 14, 2024

Fixes: #4437
We can use a chan to wait the output from init process. After we received the content,
it means that the init process has started. Then we can exec into this container to use
ps command to check the init process and the exec process are both exist.

Signed-off-by: lifubang [email protected]

@kolyshkin
Copy link
Contributor

We can use a chan to wait the output from init process, after we received the content, it means that the init process has started. Then we can exec into this container to use ps command.

Please add this into the commit message.

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just not expect cat in ps output.

@kolyshkin
Copy link
Contributor

Maybe we should just not expect cat in ps output.

Or do a retry.

@lifubang
Copy link
Member Author

Maybe we should just not expect cat in ps output.

I think because these are integration tests, so maybe we need to keep this to check we didn't exec into an unrelated container?

Fixes: opencontainers#4437
We can use a chan to wait the output from init process. After we received the content,
it means that the init process has started. Then we can exec into this container to use
ps command to check the init process and the exec process are both exist.

Signed-off-by: lifubang <[email protected]>
Comment on lines +40 to +41
_ = stdoutR.Close()
_ = stdoutW.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just defer the close, instead of a function? We are ignoring the error anyways.

I mean it for all the cases, not just this.

_ = stdinR.Close()
defer stdinW.Close() //nolint: errcheck
defer func() {
_, _ = stdinW.Write([]byte("hello"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we write this? To stop the go routine? Can you add a comment or the message written can be self-explanatory instead of this hello?

If it's that, is it needed? Won't the close of the pipe already free the goroutine?

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the proper fix belongs to func (c *Container) exec(), which signals the runc init to go ahead and exec the real init process (cat in our case). I see there is some complicated logic in there but it looks it's just to handle the error case.

What it does (in a normal, non-error) case is opens exec fifo, reads from it, and removes the exec fifo file.

The other side (runc init, see the tail of func (l *linuxStandardInit) Init()):

  1. writes 0 to the exec fifo (after which the parent returns from container.Start and thus container is expected to be running);
  2. runs the StartContainer hook;
  3. calls utils.UnsafeCloseFrom
  4. calls system.Exec(name, l.config.Args, os.Environ())

The test execs the second process after (1) has happened, and if it manages to execute before (4) then we have this issue.

I guess that #4325 makes the race window smaller because os.Environ takes some time (it actually copies the environment), but we still haven't merged that.

I wish there is a cheap way to find out if runc init has completed. Maybe read /proc/$INIT_PID/exe to see it's not /proc/self/exe? Or something similar, ideally polling on something.

And, this can be added to func (c *Container) exec() so when it returns we're sure the container is running.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ci: race in TestExecIn
3 participants