-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait for a container to be running #1900
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the errors on CI, it seems the proposed changes are locking themselves and timing out on all containers being spin up.
tick := time.Tick(tickSeconds) | ||
timer := time.After(timeout) | ||
|
||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a Retry
method that is currently unused, maybe we can move it to the common
package and use that here?
func Retry(f retryable) (output string, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, just noticed the changes are inside the executor package, so it could be used directly if we wanted to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original Retry was number based, but in this context time based retry is more appropriate. I've moved the ticker implementation into the retry module.
for { | ||
select { | ||
case <-tick: | ||
inspect, err := d.client.ContainerExecInspect(ctx, resp.ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to directly check if there was an error here:
inspect, err := d.client.ContainerExecInspect(ctx, resp.ID) | |
inspect, err := d.client.ContainerExecInspect(ctx, resp.ID) | |
if err != nil { | |
log.Info("Failed to inspect %s: %s", startConfig.Name, err) | |
continue | |
} |
Then we can remove it from err
from the log message on line 148, since it's currently causing some noise:
2024/10/21 15:41:38 INFO: Wait for container container-stats to start, %!w(errdefs.errNotFound=***0xc000616a08***)
2024/10/21 15:41:39 INFO: Wait for container container-stats to start, %!w(errdefs.errNotFound=***0xc000616b58***)
2024/10/21 15:41:40 INFO: Wait for container container-stats to start, %!w(errdefs.errNotFound=***0xc000616d08***)
2024/10/21 15:41:41 INFO: Wait for container container-stats to start, %!w(errdefs.errNotFound=***0xc000616fa8***)
81b627a
to
15ba21e
Compare
Co-authored-by: Mauro Ezequiel Moltrasio <[email protected]>
15ba21e
to
49f2d17
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Could you edit the PR description a bit before merging? So we can have some context there if we ever need it.
// | ||
// Note that the caller is responsible for reporting outstanding errors in | ||
// ticker function | ||
func RetryWithTimeout(f retryable, timeoutMsg error) ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, we might want to add an argument for specifying the timeout, but for now it's good enough.
Description
TestProcessListeningOnPort has became flaky in a nasty way, leaving no logs
from the flask container. Most likely reason is that it takes some time to
start the container, while the client method ContainerCreate doesn't wait for
"running" status.
Make the test more robust by waiting until it reports running status. It's been
done using ContainerExecInspect, not ContainerWait, because the latter one
isn't suitable despite the name. ContainerWait [1] could only wait for a
non-running state, and is designed to wait until a container has finished it's
job.
Checklist
Automated testing
Testing Performed
Manual testing, running the flaky test.