🌱 MachineSet controller: delete Bootstrap object when creating InfraMachine object failed #11211

Open
wants to merge 3 commits into
base: main

goushicui
Contributor

What this PR does / why we need it:
If the Bootstrap object referenced by bootstrapRef is not deleted after creating the InfraMachine object fails, the Bootstrap object is leaked.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 20, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vincepri for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the do-not-merge/needs-area PR is missing an area label label Sep 20, 2024
@k8s-ci-robot
Contributor

Hi @goushicui. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 20, 2024
Member

@chrischdi chrischdi left a comment

/area machineset

/ok-to-test

Thanks for opening this.

Could we also add a unit test for this?

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. area/machineset Issues or PRs related to machinesets and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. do-not-merge/needs-area PR is missing an area label labels Sep 20, 2024
@sbueringer sbueringer changed the title delete bootstrapRef when create infraRef failed 🌱 MachineSet: controller delete Bootstrap object when creating InfraMachine object failed Sep 23, 2024
@sbueringer sbueringer changed the title 🌱 MachineSet: controller delete Bootstrap object when creating InfraMachine object failed 🌱 MachineSet controller: delete Bootstrap object when creating InfraMachine object failed Sep 23, 2024
Comment on lines +547 to 551
log.Error(err, "Failed to cleanup bootstrap configuration object after Machine creation error", bootstrapRef.Kind, klog.KRef(bootstrapRef.Namespace, bootstrapRef.Name))
}
}
conditions.MarkFalse(ms, clusterv1.MachinesCreatedCondition, clusterv1.InfrastructureTemplateCloningFailedReason, clusterv1.ConditionSeverityError, err.Error())
return ctrl.Result{}, errors.Wrapf(err, "failed to clone infrastructure machine from %s %s while creating a machine",
Member

We might want to add this error to the previous one and return both, instead of just logging it.

@@ -541,6 +541,12 @@ func (r *Reconciler) syncReplicas(ctx context.Context, cluster *clusterv1.Cluste
},
})
if err != nil {
// Cleanup the bootstrap resource if we can't create the InfraMachine; or we might risk to leak it.
Member

@goushicui I think what Vince was suggesting is something like this:

			if err != nil {
				var deleteErr error
				if bootstrapRef != nil {
					// Cleanup the bootstrap resource if we can't create the InfraMachine; or we might risk to leak it.
					if err := r.Client.Delete(ctx, util.ObjectReferenceToUnstructured(*bootstrapRef)); err != nil && !apierrors.IsNotFound(err) {
						deleteErr = errors.Wrapf(err, "failed to cleanup %s %s after %s creation failed", bootstrapRef.Kind, klog.KRef(bootstrapRef.Namespace, bootstrapRef.Name), ms.Spec.Template.Spec.InfrastructureRef.Kind)
					}
				}
				conditions.MarkFalse(ms, clusterv1.MachinesCreatedCondition, clusterv1.InfrastructureTemplateCloningFailedReason, clusterv1.ConditionSeverityError, err.Error())
				return ctrl.Result{}, kerrors.NewAggregate([]error{errors.Wrapf(err, "failed to clone infrastructure machine from %s %s while creating a machine",
					ms.Spec.Template.Spec.InfrastructureRef.Kind,
					klog.KRef(ms.Spec.Template.Spec.InfrastructureRef.Namespace, ms.Spec.Template.Spec.InfrastructureRef.Name)), deleteErr})
			}
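
The suggestion above returns both errors in one aggregate, and it relies on the aggregate skipping a nil deleteErr when the cleanup succeeds. As a minimal, self-contained sketch of that pattern, the following uses the standard library's errors.Join (which also discards nil values) in place of kerrors.NewAggregate. The helper name combine and the two sentinel errors are hypothetical and exist only for illustration; they are not part of the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the real failures.
var (
	errClone  = errors.New("template not found")
	errDelete = errors.New("failed to cleanup bootstrap object")
)

// combine is a hypothetical helper: it wraps the primary clone error and
// joins it with an optional cleanup error. errors.Join discards nil values,
// so a nil deleteErr leaves only the wrapped clone error in the result.
func combine(cloneErr, deleteErr error) error {
	wrapped := fmt.Errorf("failed to clone infrastructure machine: %w", cloneErr)
	return errors.Join(wrapped, deleteErr)
}

func main() {
	// Cleanup succeeded: only the clone error is reported.
	fmt.Println(combine(errClone, nil))

	// Cleanup failed too: both errors are reported, one per line.
	fmt.Println(combine(errClone, errDelete))

	// The original clone error is still discoverable through the join.
	fmt.Println(errors.Is(combine(errClone, errDelete), errClone))
}
```

Returning the joined error, rather than only logging the cleanup failure, lets the caller see both problems in the reconcile result.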

@@ -541,6 +541,12 @@ func (r *Reconciler) syncReplicas(ctx context.Context, cluster *clusterv1.Cluste
},
})
if err != nil {
Member

Let's please add unit test coverage
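
As a rough sketch of what such coverage could check, the following models the behavior with a hypothetical in-memory store rather than the real controller-runtime fake client: creating the infrastructure object is forced to fail, and the check is that the bootstrap object no longer exists afterwards. All names here (fakeStore, newFakeStore, createMachineObjects, errNotFound) are assumptions made for illustration and do not appear in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for an API "not found" error.
var errNotFound = errors.New("not found")

// fakeStore is a hypothetical in-memory stand-in for the API client.
type fakeStore struct {
	objects    map[string]bool // objects that currently exist
	failCreate map[string]bool // names whose creation should fail
}

// newFakeStore returns a store in which creating any of the given names fails.
func newFakeStore(failing ...string) *fakeStore {
	s := &fakeStore{objects: map[string]bool{}, failCreate: map[string]bool{}}
	for _, name := range failing {
		s.failCreate[name] = true
	}
	return s
}

func (s *fakeStore) Create(name string) error {
	if s.failCreate[name] {
		return fmt.Errorf("create %s: simulated failure", name)
	}
	s.objects[name] = true
	return nil
}

func (s *fakeStore) Delete(name string) error {
	if !s.objects[name] {
		return errNotFound
	}
	delete(s.objects, name)
	return nil
}

// createMachineObjects mirrors the flow under review in simplified form:
// create the bootstrap object, then the infra object, and delete the
// bootstrap object again if the infra object cannot be created.
func createMachineObjects(s *fakeStore, bootstrap, infra string) error {
	if err := s.Create(bootstrap); err != nil {
		return err
	}
	if err := s.Create(infra); err != nil {
		var deleteErr error
		if dErr := s.Delete(bootstrap); dErr != nil && !errors.Is(dErr, errNotFound) {
			deleteErr = dErr
		}
		return errors.Join(err, deleteErr)
	}
	return nil
}

func main() {
	s := newFakeStore("infra-1")
	err := createMachineObjects(s, "bootstrap-1", "infra-1")
	fmt.Println("error returned:", err != nil)
	fmt.Println("bootstrap leaked:", s.objects["bootstrap-1"])
}
```

A real test in the MachineSet controller package would presumably drive syncReplicas against a fake client instead; this sketch only illustrates the assertion that matters, namely that no bootstrap object is left behind after the infrastructure creation fails.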

Labels
area/machineset Issues or PRs related to machinesets cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

5 participants