
[occm] Add Openstack server hostId as k8s node label #2579

Open
chess-knight opened this issue Apr 22, 2024 · 22 comments · May be fixed by #2628
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@chess-knight

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened:
This issue follows a discussion in kubernetes-sigs/cluster-api-provider-openstack#1605. It is a request to automatically label nodes with the underlying hostId, so that, for example, workloads can be scheduled on different physical hosts. It can serve as a topology differentiator when all other topology labels are the same.

Anything else we need to know?:
Issue kubernetes/cloud-provider#67 is closed and potentially resolved by kubernetes/kubernetes#123223. Based on kubernetes/cloud-provider#67 (comment), AdditionalLabels with hostId information can now be added to InstanceMetadata:

return &cloudprovider.InstanceMetadata{
	ProviderID:    i.makeInstanceID(&server.Server),
	InstanceType:  instanceType,
	NodeAddresses: addresses,
	Zone:          server.AvailabilityZone,
	Region:        i.region,
}, nil
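For illustration, here is a minimal, self-contained sketch of what passing hostId through AdditionalLabels could look like. The types are reduced stand-ins for the real cloudprovider/gophercloud structs, and the label key `node.openstack.org/host-id` is a hypothetical placeholder, not a key defined by this project:

```go
package main

import "fmt"

// Server is a reduced stand-in for the Nova server struct,
// keeping only the fields that matter for this sketch.
type Server struct {
	AvailabilityZone string
	HostID           string
}

// InstanceMetadata mirrors the shape of cloudprovider.InstanceMetadata,
// reduced to the fields relevant here.
type InstanceMetadata struct {
	Zone             string
	AdditionalLabels map[string]string
}

// hostIDLabel is a hypothetical label key; the real key would be
// decided during implementation.
const hostIDLabel = "node.openstack.org/host-id"

// instanceMetadata shows where the hostId label could be attached:
// only set the label when the server actually reports a hostId.
func instanceMetadata(server Server) *InstanceMetadata {
	labels := map[string]string{}
	if server.HostID != "" {
		labels[hostIDLabel] = server.HostID
	}
	return &InstanceMetadata{
		Zone:             server.AvailabilityZone,
		AdditionalLabels: labels,
	}
}

func main() {
	md := instanceMetadata(Server{AvailabilityZone: "nova", HostID: "abc123"})
	fmt.Println(md.AdditionalLabels[hostIDLabel])
}
```

The guard against an empty HostID matters because, depending on the cloud's API microversion, the field may be absent from the server response.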

AFAIK this information is already available in the server struct, so it should not be much work.

One potential issue that I see is live migration (see e.g. #1801), where OCCM will have to update the label because the hostId will change.

Environment:

  • openstack-cloud-controller-manager (or other related binary) version:
  • OpenStack version:
  • Others:
@dulek
Contributor

dulek commented Apr 26, 2024

This seems like a valid request from my point of view. @mdbooth, what do you think?

@chess-knight: Are you planning to contribute implementation?

@dulek
Contributor

dulek commented Apr 26, 2024

@gryf, @stephenfin: This sounds like a low hanging fruit you can grab to get up to speed with CPO code.

@zetaab
Member

zetaab commented May 25, 2024

Is hostId really available to a normal user in OpenStack? I don't have access to OpenStack right now, but if I remember correctly, a normal user cannot see that?

@chess-knight
Author

According to our research in SovereignCloudStack/issues#540, hostId should be available to all users. You can see in the nova code that it is supported from API version 2.62: https://opendev.org/openstack/nova/commit/c2f7d6585818c04e626aa4b6c292e5c2660cb8b3. hostId is different from host: host is available only to admins, but hostId is a hash of (project_id + host), so it can be exposed to all users.
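To make the privacy property concrete, here is a rough Go sketch of the hashing idea. The authoritative implementation is the linked nova Python code; the exact input serialization below is a simplification for illustration:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hostIDFor illustrates the idea behind nova's hostId: an opaque,
// per-project SHA-224 hash derived from the project ID and the
// compute host name. Because the project ID is mixed in, two tenants
// on the same physical host see different hostIds, and the admin-only
// host name cannot be recovered from the value.
func hostIDFor(projectID, host string) string {
	if host == "" {
		return ""
	}
	sum := sha256.Sum224([]byte(projectID + host))
	return fmt.Sprintf("%x", sum)
}

func main() {
	// Same host, different projects -> different, uncorrelatable hostIds.
	fmt.Println(hostIDFor("project-a", "compute-1"))
	fmt.Println(hostIDFor("project-b", "compute-1"))
}
```

Within one project the value is stable for a given host, which is exactly what makes it usable as a topology differentiator.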

@zetaab
Member

zetaab commented May 27, 2024

Actually, I can see both hostId and host_id, which have the same value. Perhaps that label could be added if it exists. This could be done in https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/openstack/instancesv2.go#L136-L142 via AdditionalLabels: https://github.com/kubernetes/cloud-provider/blob/d2f5e75a5fd6d31f75b1519fe20879b1ab5347b8/cloud.go#L300

@chess-knight
Author

> Actually, I can see both hostId and host_id, which have the same value. Perhaps that label could be added if it exists. This could be done in https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/openstack/instancesv2.go#L136-L142 via AdditionalLabels: https://github.com/kubernetes/cloud-provider/blob/d2f5e75a5fd6d31f75b1519fe20879b1ab5347b8/cloud.go#L300

Yes, exactly as I originally wrote in the issue, thanks.
As @dulek said, it shouldn't be hard to implement this, but IMO discussion is needed here, on what should/will happen when live migration happens. When we initially set this custom label, it may not be correct later because the underlying host might change. So, should we consider this custom label only as "initial_host_id"? Or is there anything that we can do to reconcile it?

@zetaab
Member

zetaab commented May 28, 2024

We are not using live migrations at all, so it's difficult to say how it would behave.

@mdbooth
Contributor

mdbooth commented May 28, 2024

IMHO the node controller should update node labels to reflect their current reality, i.e. a live migration will trigger node relabelling the next time the node is reconciled. Most things currently running on the Node are unlikely to act on it, but:

  • Newly scheduled workloads can act on it
  • Existing workloads at least have the possibility to notice that they're in the wrong place now
  • We should document the known limitations and recommend an alternative (node reprovisioning)

We should also validate this decision with the cloud-provider folks in case there are caveats we're not aware of.

@mdbooth
Contributor

mdbooth commented May 28, 2024

I'm very much in favour of adding a HostID label. I can confirm that HostID is unprivileged. It's an opaque value which can't be used to determine anything about the host including its name.

@artificial-intelligence

> I'm very much in favour of adding a HostID label. I can confirm that HostID is unprivileged. It's an opaque value which can't be used to determine anything about the host including its name.

Hi,

Could you - or anybody - please clarify: you say this HostID can't be used to determine anything about the host, but from the name I would suppose it's some kind of unique(?) distinct value per host, no?

So can't I at least infer that when the HostID changes, the underlying host has changed?

If the answer to the above is "no" and you can't infer anything from this ID, I don't see where adding it would bring any benefit, at least for our use case in the Sovereign Cloudstack project.

So any clarification around this would be highly appreciated, thanks!

@mdbooth
Contributor

mdbooth commented Jun 7, 2024

> I'm very much in favour of adding a HostID label. I can confirm that HostID is unprivileged. It's an opaque value which can't be used to determine anything about the host including its name.

> could you - or anybody - please clarify: You say this HostID can't be used to determine anything about the host, but from the name I would suppose it's some kind of unique(?) distinct value per host, no?

It's a sha224 of project_id and hostname: https://github.com/openstack/nova/blob/7dc4b1ea627d864a0ee2745cc9de4336fc0ba7b5/nova/utils.py#L1028-L1043

So hostID can't be compared between tenants.

> So can't I at least infer, that when the HostID changes, the underlying host has changed?
>
> If the answer to the above is "no", so you can't infer anything from this ID, I don't see where adding it would bring any benefit, at least for our use case in the Sovereign Cloudstack project.
>
> So any clarification around this would be highly appreciated, thanks!

@stephenfin may be able to confirm that hostID will change if a VM live migrates, but I'm pretty sure that it would.

In general, k8s isn't going to handle a live migration well because we don't continuously reconcile the placement of things which have already been scheduled.

I think the value of HostID to a kubernetes cluster is the ability to schedule Pods on different underlying hypervisors. This means that an end-user can ensure their workload can survive a maintenance outage of a single hypervisor.
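As an illustration of that use case, a Deployment could spread replicas across hypervisors via pod anti-affinity once such a label exists. This is a hypothetical usage sketch: the label key node.openstack.org/host-id is an assumption, not a key defined by this project or by #2628.

```yaml
# Hypothetical: assumes OCCM labels each Node with its OpenStack hostId
# under the (assumed) key node.openstack.org/host-id.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              # Spread replicas across distinct hypervisors, not just
              # distinct nodes: two nodes sharing a hostId count as one
              # topology domain.
              topologyKey: node.openstack.org/host-id
      containers:
        - name: web
          image: nginx
```

With topologyKey set to the host-id label rather than kubernetes.io/hostname, the scheduler refuses to co-locate two replicas on worker nodes that share the same underlying hypervisor.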

@chess-knight
Author

> I think the value of HostID to a kubernetes cluster is the ability to schedule Pods on different underlying hypervisors. This means that an end-user can ensure their workload can survive a maintenance outage of a single hypervisor.

I am thinking of the following scenario:

  1. OCCM will add the correct hostId label when the node starts
  2. Live migration happens
  3. The hostId label is now incorrect, so worker nodes with different hostId labels can be placed on the same host. And when the user relies on this incorrect label, the pods can end up on the same host.

@mdbooth
Contributor

mdbooth commented Jun 7, 2024

> I think the value of HostID to a kubernetes cluster is the ability to schedule Pods on different underlying hypervisors. This means that an end-user can ensure their workload can survive a maintenance outage of a single hypervisor.
>
> I am thinking of the following scenario:
>
>   1. OCCM will add the correct hostId label when the node starts
>   2. Live migration happens
>   3. hostId label is incorrect now so worker nodes with different hostId can be placed on the same host. And when the user uses this incorrect label, the pods can end up on the same host.

I expect it to be updated.

However, live migrating k8s hosts already violates the scheduling constraints of everything which was running on them. Live migrating k8s workers is not a good idea if it can be avoided. Simply draining the node and shutting it down during maintenance is preferable.

@chess-knight
Author

> I think the value of HostID to a kubernetes cluster is the ability to schedule Pods on different underlying hypervisors. This means that an end-user can ensure their workload can survive a maintenance outage of a single hypervisor.
>
> I am thinking of the following scenario:
>
>   1. OCCM will add the correct hostId label when the node starts
>   2. Live migration happens
>   3. hostId label is incorrect now so worker nodes with different hostId can be placed on the same host. And when the user uses this incorrect label, the pods can end up on the same host.
>
> However, live migrating k8s hosts already violates the scheduling constraints of everything which was running on it. Live migrating k8s workers is not a good idea if it can be avoided. Simply draining the node and shutting it down during maintenance is preferable.

I agree with you that live-migrating a k8s node is not a good idea.

> I expect it to be updated.

But which controller should be responsible for that? Right now, I am not aware of any.

@mdbooth
Contributor

mdbooth commented Jun 12, 2024

> But which controller should be responsible for that? Right now, I am not aware of any.

This issue is requesting that the OpenStack CCM set it, so the OpenStack CCM would also be responsible for updating it. IIRC there is now a mechanism for returning arbitrary node labels, but I don't recall what it is.

@chess-knight
Author

> But which controller should be responsible for that? Right now, I am not aware of any.

> This PR is requesting that OpenStack CCM sets it, so OpenStack CCM would also be responsible for updating it. IIRC there is now a mechanism for returning arbitrary node labels, but I don't recall what it is.

Do you mean AdditionalLabels in the InstanceMetadata mentioned in this issue or something else? If so, I still don't know how these labels can be updated.

@chess-knight
Author

chess-knight commented Jul 19, 2024

I created a PR so the discussion can move on. Can someone try it, please? You can use the registry.scs.community/occm-rh/openstack-cloud-controller-manager:v1.30.0-8-ga00ee1d8 image.

@chess-knight
Author

#2628 is approved. Should we merge it immediately and close this issue, or does someone want to take another look? I am not able to test migrations where the host-id will change. I only tried deleting the label with a kubectl command, and this additional label is not reconciled back into place, so I assume that live migration will have the same effect (a wrong host-id label afterwards).
Maybe we can just document this new host-id label with my observations and proceed...

@kayrus
Contributor

kayrus commented Aug 1, 2024

@chess-knight I agree, the label must be reconciled and updated once the node is live-migrated.

@chess-knight
Author

> @chess-knight I agree, the label must be reconciled and updated once the node is live-migrated.

I am not sure if OCCM is capable of doing that. Maybe, after all, we should go back to the original issue and implement it in CAPO, where machine reconciliation happens (I hope). CAPO could introduce e.g. a node.cluster.x-k8s.io/host-id label, set it on the Machine object, and it would be propagated to the k8s Node labels.
Or we just document this limitation, something like: host-id serves as the "initial" host-id without any guarantees in the future.

@mdbooth
Contributor

mdbooth commented Sep 5, 2024

@chess-knight As well as reconciling new Nodes, the node controller resyncs nodes periodically: https://github.com/kubernetes/kubernetes/blob/03fe89c2339a1582733649faab5f5df471f65f09/staging/src/k8s.io/cloud-provider/controllers/node/node_controller.go#L191-L198

However, it looks like that job:

  • Ignores tainted nodes. These are handled separately by sync(), which sets all values.
  • Only sets Addresses

It sounds like, if we want to continuously reconcile zone information, that should be a discussion with the cloud-provider folks. Maybe @aojea can let us know if this has been discussed before, and if not, the best place to start the discussion.

My view: Kubernetes doesn't expect zone information to change, and in general will not respond to changes in zone information. We should advise users that there are alternatives which will give better behaviour. Despite that, zone information can still change, and so occasionally it will: an example is a managed cloud service where the user has no influence over the migration of workloads. By updating the zone information on the Node when it does change, we:

  • at least make it detectable
  • allow new workloads to be created in the correct place

I believe we should continuously reconcile zone information, including node labels.

For now, this is an edge case. Let's return HostID in the instance metadata as is done by #2628. This is an immediate win for anybody wanting to schedule with hypervisor anti-affinity. The problem of continuous reconciliation is somewhat independent, as it covers more than just the HostID label.

@chess-knight
Author

Hi @mdbooth,
thank you for your review.

> @chess-knight As well as reconciling new Nodes, the node controller resyncs nodes periodically: https://github.com/kubernetes/kubernetes/blob/03fe89c2339a1582733649faab5f5df471f65f09/staging/src/k8s.io/cloud-provider/controllers/node/node_controller.go#L191-L198
>
> However, it looks like that job:
>
>   • Ignores tainted nodes. These are handled separately by sync(), which sets all values.
>   • Only sets Addresses

Interestingly, the code comment there suggests it should "reconcile the nodes addresses and labels", not only addresses.

> It sounds like if we want to continuously reconcile zone information that should be a discussion with the cloud-provider folks. Maybe @aojea can let us know if this has been discussed before, and if not the best place to start the discussion.
>
> My view: Kubernetes doesn't expect zone information to change, and in general will not respond to changes in zone information. We should advise users that there are alternatives which will give better behaviour. Despite that, zone information can still change, which means it will occasionally change. An example is a managed cloud service where the user has no influence over the migration of workloads. By updating the zone information on the Node when it does change we:
>
>   • at least make it detectable
>   • allow new workloads to be created in the correct place
>
> I believe we should continuously reconcile zone information, including node labels.
>
> For now, this is an edge case. Lets return HostID in the instance metadata as is done by #2628. This is an immediate win for anybody wanting to schedule with hypervisor anti-affinity. The problem of continuous reconciliation is somewhat independent as it covers more than just the HostID label.

I agree that updating labels needs to be discussed. Maybe it can be a configurable on/off behaviour. Do you think I should also write some docs about the HostID label, so users are aware of it?
