Wait for a container to be running #1900

erthalion · 2024-10-21T15:04:02Z

Description

TestProcessListeningOnPort has become flaky in a nasty way, leaving no logs
from the flask container. The most likely cause is that the container takes
some time to start, while the client method ContainerCreate doesn't wait for
it to reach the "running" status.

Make the test more robust by waiting until the container reports the running
status. This is done using ContainerExecInspect rather than ContainerWait,
because the latter isn't suitable despite its name: ContainerWait [1] can only
wait for a non-running state, and is designed to wait until a container has
finished its job.

Checklist

Investigated and inspected CI test results
Updated documentation accordingly

Automated testing

Added unit tests
Added integration tests
Added regression tests

Testing Performed

Manual testing: ran the flaky test.

Molter73

Looking at the errors on CI, it seems the proposed changes are getting stuck and timing out on every container being spun up.

Molter73 · 2024-10-21T16:07:35Z

integration-tests/pkg/executor/executor_docker_api.go

+	tick := time.Tick(tickSeconds)
+	timer := time.After(timeout)
+
+	for {


We have a Retry method that is currently unused; maybe we can move it to the common package and use it here?

collector/integration-tests/pkg/executor/retry.go

Line 19 in 19eafba

func Retry(f retryable) (output string, err error) {

Actually, I just noticed the changes are inside the executor package, so it could be used directly if we wanted to.

The original Retry was count-based, but in this context a time-based retry is more appropriate. I've moved the ticker implementation into the retry module.

Molter73 · 2024-10-21T16:11:59Z

integration-tests/pkg/executor/executor_docker_api.go

+	for {
+		select {
+		case <-tick:
+			inspect, err := d.client.ContainerExecInspect(ctx, resp.ID)


We might want to directly check if there was an error here:

Suggested change

-			inspect, err := d.client.ContainerExecInspect(ctx, resp.ID)
+			inspect, err := d.client.ContainerExecInspect(ctx, resp.ID)
+			if err != nil {
+				log.Info("Failed to inspect %s: %s", startConfig.Name, err)
+				continue
+			}

Then we can remove err from the log message on line 148, since it's currently causing some noise:

2024/10/21 15:41:38 INFO: Wait for container container-stats to start, %!w(errdefs.errNotFound={0xc000616a08})
2024/10/21 15:41:39 INFO: Wait for container container-stats to start, %!w(errdefs.errNotFound={0xc000616b58})
2024/10/21 15:41:40 INFO: Wait for container container-stats to start, %!w(errdefs.errNotFound={0xc000616d08})
2024/10/21 15:41:41 INFO: Wait for container container-stats to start, %!w(errdefs.errNotFound={0xc000616fa8})

Co-authored-by: Mauro Ezequiel Moltrasio <[email protected]>

Molter73

LGTM!

Could you edit the PR description a bit before merging, so that we have some context there if we ever need it?

Molter73 · 2024-10-24T08:30:49Z

integration-tests/pkg/executor/retry.go

+//
+// Note that the caller is responsible for reporting outstanding errors in
+// ticker function
+func RetryWithTimeout(f retryable, timeoutMsg error) (


In the future, we might want to add an argument for specifying the timeout, but for now it's good enough.

erthalion requested a review from a team as a code owner October 21, 2024 15:04

erthalion marked this pull request as draft October 21, 2024 15:04

openshift-ci bot added the do-not-merge/work-in-progress label Oct 21, 2024

Molter73 reviewed Oct 21, 2024

erthalion force-pushed the feature/wait-for-containers branch from 81b627a to 15ba21e October 23, 2024 12:42

Wait for a container to be running

49f2d17

Co-authored-by: Mauro Ezequiel Moltrasio <[email protected]>

erthalion force-pushed the feature/wait-for-containers branch from 15ba21e to 49f2d17 October 23, 2024 13:12

erthalion marked this pull request as ready for review October 23, 2024 15:04

openshift-ci bot removed the do-not-merge/work-in-progress label Oct 23, 2024

Molter73 approved these changes Oct 24, 2024

erthalion merged commit f17e9a4 into master Oct 24, 2024
99 of 104 checks passed

erthalion deleted the feature/wait-for-containers branch October 24, 2024 13:34

