
Handle HSM for HFC #351

Open

iurygregory wants to merge 1 commit into master from hfc-handleAvailable

Conversation

iurygregory

No description provided.

openshift-ci bot commented May 8, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: iurygregory
Once this PR has been reviewed and has the lgtm label, please assign dtantsur for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@honza (Member) commented May 8, 2024:

Is this a downstream-only change?

@iurygregory (Author):

@honza nope, it's just to test downstream first in the setup Jad has (I have a thread in Slack about it).

@honza (Member) commented May 8, 2024:

/hold

@openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on May 8, 2024
@rhjanders:

/retest

Review thread on the saveHostFirmwareComponents hunk:

```go
hfc.Status.Updates = hfc.Spec.Updates
t := metav1.Now()
hfc.Status.LastUpdated = &t
return nil
```
Member:

Without calling Update() this doesn't have any effect.

@iurygregory (Author):

You are talking about having the reconciler call Status().Update(), right? As in r.Status().Update(info.ctx, info.hfc).
Any ideas on how to access the HostFirmwareComponentsReconciler? Or is there another way to call it?

@hroyrh:

Since this function is in the baremetalhost controller, you can either define it as a method on the BMH reconciler or call Status().Update() in the caller function.
By the way, at line 1750, are we sure that all the updates have been applied by this point?
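
A minimal sketch of the second option, assuming the caller sits in the baremetalhost controller where `r` is the BareMetalHostReconciler and `info` carries the context and the HostFirmwareComponents object (as in the snippet quoted above); this is an illustration of the suggestion, not the PR's actual code:

```go
// Sketch only: persist the status fields that saveHostFirmwareComponents
// sets (Updates, LastUpdated) by having the caller issue the status write.
saveHostFirmwareComponents(hfc, info)
if err := r.Status().Update(info.ctx, info.hfc); err != nil {
	return actionError{errors.Wrap(err, "could not update hostfirmwarecomponents status")}
}
```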

@iurygregory (Author):

@hroyrh ohhh, that makes some sense!
Regarding line 1750: since this is in saveHostFirmwareComponents, I should probably rethink where I'm calling it in actionPreparing (L1174 at the moment). Maybe I should move it to L118, so we would be sure that there is nothing else left to do and we would return actionComplete right after it.

Review thread on the actionPreparing hunk:

```diff
@@ -1168,6 +1169,10 @@ func (r *BareMetalHostReconciler) actionPreparing(prov provisioner.Provisioner,
 	if err != nil {
 		return actionError{errors.Wrap(err, "could not save the host provisioning settings")}
 	}
+	if hfc != nil {
+		info.log.Info("saving hostfirmwarecomponents updates into status")
+		saveHostFirmwareComponents(hfc, info)
+	}
```
Member:

This updates the status when manual cleaning starts, but what if it fails?

I suspect we only want to update this after cleaning has succeeded, but we also have to think about what happens if updating the status fails.
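
A hypothetical sketch of the ordering the reviewer describes, using the conventions visible in the hunk above (provResult, actionContinue, and actionComplete are the controller's existing result types; the exact placement is an assumption):

```go
// Only record the firmware-component updates once manual cleaning has
// actually finished; while the provisioner still reports work in progress,
// leave the status untouched and requeue.
if provResult.Dirty {
	return actionContinue{provResult.RequeueAfter}
}
if hfc != nil {
	info.log.Info("saving hostfirmwarecomponents updates into status")
	saveHostFirmwareComponents(hfc, info)
}
return actionComplete{}
```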

@iurygregory (Author):

Right! Yeah, I agree we only want to update if it succeeded.
Do you mean updating the status in case an error happened during cleaning, or something else?


@iurygregory (Author):

Thanks @hroyrh

Member:

I mean if the call to Update() fails, do we get the chance to try again? Or will we restart the whole manual cleaning process or something weird like that?

@hroyrh:

But we don't have a way to remember that the Update failed in the last Reconcile, right? So do we have to check whether the config changes were already applied, by fetching the current Ironic node, and if yes, simply run Update again rather than the whole manual cleaning process, as you mentioned?

@iurygregory (Author):

One way could be what @hroyrh mentioned, I think...
We can check whether the information Ironic provides about Components is different (if there was a firmware update, something would be different in the Ironic DB).
Does it make sense?
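
A hypothetical sketch of that check; GetFirmwareComponents is an assumed provisioner helper (name and signature are assumptions), while the reflect.DeepEqual comparison and the Status().Update call come from the discussion above:

```go
// Fetch what Ironic currently reports for the components and retry only the
// status write when it differs from what the status already records,
// instead of restarting the whole manual cleaning process.
components, err := prov.GetFirmwareComponents() // assumed helper
if err != nil {
	return actionError{errors.Wrap(err, "could not get firmware components from ironic")}
}
if !reflect.DeepEqual(info.hfc.Status.Components, components) {
	info.hfc.Status.Components = components
	if err := r.Status().Update(info.ctx, info.hfc); err != nil {
		return actionError{errors.Wrap(err, "could not update hostfirmwarecomponents status")}
	}
}
```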

Review thread on the updateHostFirmware hunk:

```diff
@@ -223,9 +223,7 @@ func (r *HostFirmwareComponentsReconciler) updateHostFirmware(info *rhfcInfo, co

 	// Update Status if has changed
 	if dirty {
-		info.log.Info("Status for HostFirmwareComponents changed")
-		info.hfc.Status = *newStatus.DeepCopy()
```
Member:

Without this you're no longer writing the components read from ironic.

@iurygregory (Author):

Ok

@iurygregory force-pushed the hfc-handleAvailable branch 2 times, most recently from 94b0778 to 84f1818, on May 16, 2024 at 03:34
@openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on May 17, 2024
@openshift-merge-robot removed the needs-rebase label on Jun 21, 2024
@iurygregory force-pushed the hfc-handleAvailable branch 10 times, most recently from 57db30c to a903ba9, on June 28, 2024 at 01:57
Review thread on the newStatus.Components hunk:

```go
newStatus.Components = make([]metal3api.FirmwareComponentStatus, len(components))
for i := range info.hfc.Status.Components {
	components[i].DeepCopyInto(&newStatus.Components[i])
}
```
@iurygregory (Author):

I've changed to this approach to see if it would help, but maybe it's missing a guard condition before doing this, checking if !reflect.DeepEqual(info.hfc.Status.Components, components)?
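
A minimal sketch of that guard, assuming it sits where the hunk above copies the components. Note that ranging over components, rather than info.hfc.Status.Components, keeps the loop bounds consistent with the slice being copied:

```go
// Rebuild newStatus.Components only when the components read from Ironic
// actually differ from what the status already holds.
if !reflect.DeepEqual(info.hfc.Status.Components, components) {
	newStatus.Components = make([]metal3api.FirmwareComponentStatus, len(components))
	for i := range components {
		components[i].DeepCopyInto(&newStatus.Components[i])
	}
	dirty = true
}
```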

@iurygregory force-pushed the hfc-handleAvailable branch 4 times, most recently from cba0e20 to 6e930a0, on July 2, 2024 at 16:21
@iurygregory (Author):

I've updated this PR to address some of the upstream comments from 1793 and 1821. I'm still struggling to figure out how to properly update the Components information with the newer information Ironic has about the firmware.

@iurygregory force-pushed the hfc-handleAvailable branch 2 times, most recently from 89a6138 to 934b7a3, on July 25, 2024 at 18:13
@iurygregory (Author):

@dtantsur @zaneb
This approach works!
In Scenario 1 at least 😅: when you add a BMH + HostFirmwareComponents CRD, the firmware is updated in preparing. When the host reaches available, the newer info for the component is not yet in status.components; we need to scale up the machine set to make the BMH go to provisioning, and after some time during this phase the information gets updated.

I'm going to test Scenario 2, where we have a provisioned BMH and we need to scale down and scale up.

- Fixed the comments provided in the upstream review
- Investigated how to update the newer firmware information in Status after the update
openshift-ci bot commented Jul 26, 2024:

@iurygregory: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-metal-ipi-serial-ipv4 | dcb99c2 | link | true | /test e2e-metal-ipi-serial-ipv4 |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
7 participants