Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for a container to be running #1900

Merged
merged 1 commit into from
Oct 24, 2024
Merged

Conversation

erthalion
Copy link
Contributor

@erthalion erthalion commented Oct 21, 2024

Description

TestProcessListeningOnPort has became flaky in a nasty way, leaving no logs
from the flask container. Most likely reason is that it takes some time to
start the container, while the client method ContainerCreate doesn't wait for
"running" status.

Make the test more robust by waiting until it reports running status. It's been
done using ContainerExecInspect, not ContainerWait, because the latter one
isn't suitable despite the name. ContainerWait [1] could only wait for a
non-running state, and is designed to wait until a container has finished it's
job.

Checklist

  • Investigated and inspected CI test results
  • Updated documentation accordingly

Automated testing

  • Added unit tests
  • Added integration tests
  • Added regression tests

Testing Performed

Manual testing, running the flaky test.

@erthalion erthalion requested a review from a team as a code owner October 21, 2024 15:04
@erthalion erthalion marked this pull request as draft October 21, 2024 15:04
Copy link
Collaborator

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the errors on CI, it seems the proposed changes are locking themselves and timing out on all containers being spin up.

tick := time.Tick(tickSeconds)
timer := time.After(timeout)

for {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a Retry method that is currently unused, maybe we can move it to the common package and use that here?

func Retry(f retryable) (output string, err error) {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, just noticed the changes are inside the executor package, so it could be used directly if we wanted to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original Retry was number based, but in this context time based retry is more appropriate. I've moved the ticker implementation into the retry module.

for {
select {
case <-tick:
inspect, err := d.client.ContainerExecInspect(ctx, resp.ID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to directly check if there was an error here:

Suggested change
inspect, err := d.client.ContainerExecInspect(ctx, resp.ID)
inspect, err := d.client.ContainerExecInspect(ctx, resp.ID)
if err != nil {
log.Info("Failed to inspect %s: %s", startConfig.Name, err)
continue
}

Then we can remove it from err from the log message on line 148, since it's currently causing some noise:

2024/10/21 15:41:38 INFO: Wait for container container-stats to start, %!w(errdefs.errNotFound=***0xc000616a08***)
2024/10/21 15:41:39 INFO: Wait for container container-stats to start, %!w(errdefs.errNotFound=***0xc000616b58***)
2024/10/21 15:41:40 INFO: Wait for container container-stats to start, %!w(errdefs.errNotFound=***0xc000616d08***)
2024/10/21 15:41:41 INFO: Wait for container container-stats to start, %!w(errdefs.errNotFound=***0xc000616fa8***)

Co-authored-by: Mauro Ezequiel Moltrasio <[email protected]>
@erthalion erthalion marked this pull request as ready for review October 23, 2024 15:04
Copy link
Collaborator

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Could you edit the PR description a bit before merging? So we can have some context there if we ever need it.

//
// Note that the caller is responsible for reporting outstanding errors in
// ticker function
func RetryWithTimeout(f retryable, timeoutMsg error) (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, we might want to add an argument for specifying the timeout, but for now it's good enough.

@erthalion erthalion merged commit f17e9a4 into master Oct 24, 2024
99 of 104 checks passed
@erthalion erthalion deleted the feature/wait-for-containers branch October 24, 2024 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants