Expand failure domains beyond region/zone #234

Open
akutz opened this issue Aug 16, 2019 · 8 comments
Labels
kind/feature · lifecycle/frozen · priority/important-longterm

Milestone
Next

Comments

akutz (Member) commented Aug 16, 2019

/kind feature

Failure domains should be expanded beyond region/zone tags to include the following (an illustrative config sketch follows the list):

  • Clusters
  • Datacenters
  • Datastores
  • Host groups
  • ResourcePools
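For illustration, one way this could surface is alongside the existing [Labels] section of the CPI's vsphere.conf, which today only names the region and zone tag categories. The extra keys below are purely hypothetical names to make the request concrete, not a proposed schema:

```ini
[Labels]
# Existing tag-category mappings used for region/zone discovery.
region = k8s-region
zone   = k8s-zone

# Hypothetical additions sketching the failure domains listed above
# (names are illustrative only; nothing like this exists today):
# cluster      = k8s-cluster
# datacenter   = k8s-datacenter
# datastore    = k8s-datastore
# hostgroup    = k8s-hostgroup
# resourcepool = k8s-resourcepool
```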
k8s-ci-robot added the kind/feature label on Aug 16, 2019
davidvonthenen (Contributor) commented

This is tied to the following issue: #179

embano1 (Member) commented Aug 21, 2019

@akutz can you please elaborate on this feature?

You mention "region/zone tags". Does this imply vSphere tags or K8s labels?
Are we saying that, besides the mapping of vSphere tags for zones/regions, we also want additional fields (in the CPI configuration file) for clusters, datacenters, etc.? I assume this is for additional labels on the workers to enable better placement decisions, correct?
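For reference, with the current zones support the flow is one-way: the CPI configuration names two vSphere tag categories, and the tag values found on the inventory objects are surfaced as node labels on the workers. A rough sketch of the result, assuming the beta failure-domain label keys in use at the time (node name and values are examples):

```yaml
# Illustrative labels on a worker node once region/zone tags are resolved.
apiVersion: v1
kind: Node
metadata:
  name: worker-01
  labels:
    failure-domain.beta.kubernetes.io/region: region-a
    failure-domain.beta.kubernetes.io/zone: zone-1
```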

davidvonthenen (Contributor) commented

@embano1 those are good questions. You can already apply regions/zones to Clusters, Datacenters, Host groups (by way of folders), and also ResourcePools. Is there something else you had in mind? You can also arrange regions/zones in a hierarchy, where the leaf-most nodes override higher-level constructs/nodes.

Datastores you can't, since regions/zones apply to where pods run (i.e. Clusters, Datacenters, Host groups, and ResourcePools); the admin needs to make sure those regions/zones have access to the datastores you need. Could you expand on this a little more?
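To make the tagging model above concrete, a rough govc sketch of attaching region/zone tags at different inventory levels (category, tag, and inventory-path names are examples; as noted, the leaf-most attachment wins):

```sh
# Create the tag categories the CPI config refers to (example names).
govc tags.category.create -d "Kubernetes region" k8s-region
govc tags.category.create -d "Kubernetes zone" k8s-zone

# Create a region tag and a zone tag.
govc tags.create -c k8s-region region-a
govc tags.create -c k8s-zone zone-1

# Attach the region at the datacenter and the zone at the cluster;
# lower-level (leaf-most) attachments override higher-level ones.
govc tags.attach -c k8s-region region-a /dc-1
govc tags.attach -c k8s-zone zone-1 /dc-1/host/cluster-1
```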

frapposelli (Member) commented

/priority important-longterm
/lfecycle frozen

k8s-ci-robot added the priority/important-longterm label on Sep 4, 2019
frapposelli (Member) commented

/lifecycle frozen

k8s-ci-robot added the lifecycle/frozen label on Sep 4, 2019
frapposelli added this to the Next milestone on Sep 4, 2019
akutz (Member, Author) commented Dec 17, 2019

ping @pdaigle

jordanrinke commented

This just came up in some discussions we were having internally. We want to be able to add something like a "host" parameter to the failure domain.

The reason: say we have an edge cluster with 5 nodes running 5 VMs, one of those hosts fails, and its VM is automatically brought back up on another host via HA rules. Now we have one host with two Kubernetes nodes on it. With the current topology it is possible for an application with 2 replicas to end up with both of its VMs on that single physical host. If that host also has an issue, the application fails completely until it is rescheduled. Being able to set a hardware host-level affinity rule would have safely redistributed that application instead.

We also have scenarios using PCI devices where, for optimization, multiple VMs run on the same host but are still part of the same cluster; host-level separation would be useful there as well, while still maintaining zone/region concept parity with the public clouds.
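If the CPI ever exposed a host-level failure domain as a node label, the scenario above could be expressed with ordinary pod anti-affinity. The topology key and names below are invented purely for illustration; no such label exists today:

```yaml
# Hypothetical: assumes the CPI published a per-ESXi-host node label
# such as "topology.example.com/host". Illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: edge-app
  template:
    metadata:
      labels:
        app: edge-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: edge-app
              topologyKey: topology.example.com/host  # hypothetical host-level key
      containers:
        - name: app
          image: registry.example.com/edge-app:latest
```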

QuingKhaos commented

Another use case: running OpenShift Data Foundation, where the recommendation is to use vSphere host anti-affinity rules to ensure the Ceph failure domains are distributed across different physical chassis/hypervisors. With rack topology keys, which may not match the zone/region tags, we could use a rack failure domain for Ceph: https://github.com/rook/rook/blob/master/Documentation/ceph-cluster-crd.md#osd-topology
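To make the Rook/Ceph case concrete: the OSD topology support linked above keys off node labels, so without CPI support the rack layer has to be labeled by hand. A minimal sketch, assuming the rack-level label key described in the linked Rook documentation; node and rack names are examples:

```sh
# Manually label workers with their physical rack/hypervisor placement,
# since the CPI only populates region/zone today.
kubectl label node worker-01 topology.rook.io/rack=rack-1
kubectl label node worker-02 topology.rook.io/rack=rack-2
```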
