Expand failure domains beyond region/zone #234
Comments
This is tied to the following issue: #179
@akutz can you please elaborate more on that feature? You mention "region/zone tags". Does this imply vSphere tags or K8s labels?
@embano1 those are good questions. You can already apply regions/zones to Clusters, Datacenters, host groups (by way of folders), and ResourcePools. Is there something else you had in mind? You can also define a hierarchy of regions/zones, where the leaf-most nodes override higher-level ones. Datastores can't be tagged, since regions/zones apply to where pods run (i.e. Clusters, Datacenters, host groups, and ResourcePools); the admin needs to make sure those regions/zones have access to the datastores you need. Maybe expand on this a little more?
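For context, this is roughly how region/zone tag categories get wired into the provider configuration today. This is only a minimal sketch: the exact file layout and field names may differ between provider versions, and `k8s-region`/`k8s-zone` are example category names, not defaults.

```yaml
# vsphere.conf (YAML flavor) -- illustrative sketch, not the exact schema of this project.
# The provider reads these tag *categories*; the admin attaches tags from these
# categories to Datacenters, Clusters, host groups (folders), or ResourcePools.
# The most specific (leaf-most) tagged object wins when the hierarchy overlaps.
labels:
  region: k8s-region   # tag category whose tags become the node's region label
  zone: k8s-zone       # tag category whose tags become the node's zone label
```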
/priority important-longterm
/lifecycle frozen
ping @pdaigle
This just came up in some discussions we were having internally. We'd potentially like to be able to add something like a "host" parameter to the failure domain. The reason: say we have an edge cluster with 5 hosts running 5 VMs (one Kubernetes node each), and one of those hosts fails; the VM is brought back up automatically on another host via HA rules. Now one host is running 2 Kubernetes nodes. With the current topology, an application with 2 replicas can end up with both replicas on the same physical host. If that host also has an issue, the application fails completely until it is rescheduled. Being able to set a hardware host-level affinity would have safely redistributed that application instead. We also have some scenarios using PCI devices where, for optimization, multiple VMs run on the same host but are still part of the same cluster; host-level separation would be useful there as well, while still maintaining zone/region concept parity with cloud providers.
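To illustrate, if a host-level failure domain surfaced a per-host node label, spreading replicas across physical hosts would come down to standard anti-affinity. This is only a sketch: `topology.example.com/host` is a hypothetical label name chosen for illustration, not something the provider sets today, and the image is a placeholder.

```yaml
# Hedged sketch: spread 2 replicas across distinct physical hosts, assuming a
# hypothetical host-level topology label were populated on each node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: edge-app
  template:
    metadata:
      labels:
        app: edge-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: edge-app
              topologyKey: topology.example.com/host  # hypothetical host-level domain
      containers:
        - name: app
          image: registry.example.com/edge-app:latest  # placeholder image
```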
Another use case: when running OpenShift Data Foundation, it is recommended to use vSphere host anti-affinity rules so that the Ceph failure domains are distributed across different physical chassis/hypervisors. With rack topology keys, which may not match zone/region tags, we could instead use a rack failure domain for Ceph: https://github.com/rook/rook/blob/master/Documentation/ceph-cluster-crd.md#osd-topology
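For reference, Rook derives OSD topology from node labels (per the linked doc), so a host- or rack-aware failure domain in the provider would only need to populate a label such as `topology.rook.io/rack` on each node instead of the admin labeling nodes by hand. A sketch of what those labels could look like (names of the node and the tag values are illustrative):

```yaml
# Sketch: node labels that Rook's OSD topology can consume. A host-level failure
# domain could set the rack label automatically from the physical hypervisor.
apiVersion: v1
kind: Node
metadata:
  name: worker-0
  labels:
    topology.kubernetes.io/region: region-a
    topology.kubernetes.io/zone: zone-1
    topology.rook.io/rack: rack-esxi-host-01   # maps to a physical chassis/hypervisor
```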
/kind feature
Failure domains should be expanded beyond region/zone tags to include the following: