Wait for a stable cluster in the suite #51

Open
wants to merge 1 commit into
base: main
from


@rhmdnd (Collaborator) commented Aug 21, 2024

Some remediations are more invasive than others and make changes to the
cluster that take time to propagate through the system. Before the
suite starts running subsequent scans, it should wait for the cluster to
become stable, so that we know the remediations at least applied properly,
or at the very least didn't make things worse.

```
@@ -673,6 +673,14 @@ func (ctx *e2econtext) waitForMachinePoolUpdate(t *testing.T, name string) {
	}
}

func (ctx *e2econtext) waitForStableCluster() error {
	_, err := exec.Command("oc", "adm", "wait-for-stable-cluster", "--minimum-stable-period=2m").Output()
```
@rhmdnd (Collaborator Author) commented Aug 21, 2024


The assumption here is that we don't care about the command output, only that the command doesn't time out waiting for a stable cluster.

Using a client library here instead would be nice because it might give us more useful error messages without having to parse raw output.


@xiaojiey commented Aug 22, 2024


Per the testing for PR ComplianceAsCode/content#12220, the remediation took about 25-30 minutes to finish on a 6-node cluster. Until then, the ingress or apiserver operators remain in an updating state.

Collaborator


Could the cluster be modified to have a faster rollout? The Machine Config Operator used to have such an option.

Collaborator Author


Yeah, that's a significant increase in our testing time. I'll dig around to see if there is a way to speed this up.

3 participants