diff --git a/design-proposals/configurable-features.md b/design-proposals/configurable-features.md
new file mode 100644
index 00000000..60319817
--- /dev/null
+++ b/design-proposals/configurable-features.md
@@ -0,0 +1,155 @@
+# Overview
+
+With the introduction
+of [KubeVirt Feature Lifecycle](https://github.com/kubevirt/community/blob/main/design-proposals/feature-lifecycle.md)
+policy, features reaching General Availability (GA) need to drop their use of feature gates. This also applies to
+configurable features that we may want to disable.
+
+## Motivation
+
+Users or developers may want certain features to be in a given state, for example to make the best use of given
+resources, or for compliance reasons: some features may expose sensitive information from the host to the virtual machines (VMs)
+or add additional containers to the launcher pod that are not required by the user. The behavior of other features
+might be changed by editing configurables, e.g. the maximum number of CPU sockets allowed for each VM can be configured.
+
+Before the introduction
+of [KubeVirt Feature Lifecycle](https://github.com/kubevirt/community/blob/main/design-proposals/feature-lifecycle.md)
+policy, many feature gates remained after a feature's graduation to GA with the sole purpose of acting as a switch for the
+feature. Generally speaking, this is a bad practice, because feature gates should be reserved for controlling a feature
+until it reaches maturity, i.e., GA. Therefore, if a developer wants to provide the ability to tune/change
+the state of the feature, configurables exposed in the KubeVirt CR should be provided. This should be
+accomplished while achieving [eventual consistency](https://en.wikipedia.org/wiki/Eventual_consistency). This forces
+us to avoid checking the feature state in webhooks and to move the feature state control closer to the
+responsible code.
+Moreover, it has to be decided how the system should behave if a virtual machine instance (VMI)
+requires a feature in a state different from what was configured in the KubeVirt CR, or what should happen if the
+configuration of a feature in use is changed (see matrix below).
+
+## Goals
+
+- Establish how the feature enablement switch should work.
+- Describe how the system should react in these scenarios:
+  - A feature in KubeVirt is set to state A and a VMI requests the feature to be in state B.
+  - A feature in KubeVirt is set to state A, there are running VMIs using the feature in state A, and the feature is
+    changed in KubeVirt to state B.
+  - A feature in KubeVirt is set to state A, and pending VMIs want to use it.
+  - A feature in KubeVirt is set to state A, and running VMIs using the feature in state B want to live migrate.
+- Graduate as many features as possible from feature gates to configurables.
+
+## Non Goals
+
+- Describe how features protected with feature gates should work.
+
+## Definition of Users
+
+Development contributors.
+
+Cluster administrators.
+
+## User Stories
+
+As a developer, I want to make a given feature configurable.
+
+As a cluster administrator, I want to be able to change the cluster-wide state of a feature by editing configurables.
+
+As a VM owner, I want to use a given feature.
+
+## Repos
+
+Kubevirt/Kubevirt
+
+# Design
+
+If a developer wants to make a feature configurable, they need to do so by adding new fields to the KubeVirt CR
+under `spec.configuration`.
+
+
+> **NOTE:** The inclusion of these new KubeVirt API fields should be carefully considered and justified. Feature
+> configurables should be avoided as much as possible.
+
+
+This is the current list of GA'd features present in KubeVirt/KubeVirt which are still using feature gates and are shown as
+configurables in [HCO](https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/controllers/operands/kubevirt.go#L166-L174):
+
+- DownwardMetrics
+- Root (not sure about this one)
+- DisableMDEVConfiguration
+- PersistentReservation
+- AutoResourceLimitsGate
+- AlignCPUs
+
+This is the current list of GA'd features present in KubeVirt/KubeVirt which are still using feature gates and are always
+enabled by [HCO](https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/controllers/operands/kubevirt.go#L125-L142):
+
+- CPUManager
+- Snapshot
+- HotplugVolumes
+- GPU
+- HostDevices
+- NUMA
+- VMExport
+- DisableCustomSELinuxPolicy
+- KubevirtSeccompProfile
+- HotplugNICs
+- VMPersistentState
+- NetworkBindingPlugins
+- VMLiveUpdateFeatures
+
+Please note that only feature gates included in KubeVirt/KubeVirt are listed here.
+
+## API Examples
+The proposed configuration field for a given feature in the KubeVirt CR may look like:
+
+```yaml
+apiVersion: kubevirt.io/v1
+kind: KubeVirt
+[...]
+spec:
+  certificateRotateStrategy: {}
+  configuration:
+    feature-A: {}
+[...]
+```
+The VM object may or may not include a configuration field inside the relevant spec.
+
+## Interactions with VMI requests
+
+If the VM exposes a configuration field to request the feature in addition to the one in the KubeVirt CR, the system may
+encounter some inconsistent states that should be handled in the following way:
+
+- If the feature is set to state A in the KubeVirt CR and the VMI is requesting the feature in state B, the VMI must
+  stay in the Pending state. The VM status should be updated with a status message highlighting the reason(s) for the
+  Pending state.
+- Feature status checks should only be performed during the scheduling process, not at runtime.
+  Therefore, feature
+  status changes in the KubeVirt CR should not affect running VMIs. Moreover, the VMI should still be able to live
+  migrate, preserving its original feature state.
+- Optionally, the KubeVirt CR change request could be rejected if running VMIs are using the
+  feature in a given state. However, by default the request should be accepted.
+
+## Scalability
+
+Swapping the feature state should not meaningfully affect cluster resource usage.
+
+## Update/Rollback Compatibility
+
+The feature enablement should not affect forward or backward compatibility once the feature is GA. Before GA, it should
+honor the [feature stages](https://github.com/kubevirt/community/blob/main/design-proposals/feature-lifecycle.md#releases)
+guidelines.
+
+## Functional Testing Approach
+
+The unit and functional testing frameworks should cover the relevant scenarios for each feature.
+
+# Implementation Phases
+
+The feature status check should be placed in the VMI reconciliation loop. In this way, the feature status evaluation stays
+close to the VMI scheduling process, and KubeVirt can reconcile itself if it is temporarily out of sync.
+
+For existing features that transition from using feature gates to set the feature status to using configurable
+fields, the change is acceptable, but it should be marked as a breaking change and documented. Moreover, all feature
+gates should be evaluated to determine if they need to be dropped and transitioned to configurables.
+
+## About implementing the checking logic in the VM controller
+
+The check in the VM controller could be added to let the user know if a VM has requested a feature in a state which
+is different from what is specified in the KubeVirt CR. The VM controller will update the VM status, showing a status message
+highlighting the misconfiguration.
\ No newline at end of file
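
The interaction rules in the proposal (a state mismatch keeps a not-yet-running VMI in Pending, while running VMIs are left untouched) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from KubeVirt: `FeatureState`, `Decision`, and `EvaluateFeature` are hypothetical names, and treating a VMI that does not request any state as admissible is an assumption the proposal does not spell out.

```go
package main

import "fmt"

// FeatureState is a hypothetical type for the state of a configurable feature.
type FeatureState string

const (
	StateA FeatureState = "A"
	StateB FeatureState = "B"
)

// Decision is a hypothetical result of the feature status check.
type Decision struct {
	Proceed bool   // false means the VMI should stay in Pending
	Message string // status message explaining why the VMI is held back
}

// EvaluateFeature sketches the check that the proposal places in the VMI
// reconciliation loop:
//   - running VMIs are never affected by the cluster-wide state, so they
//     keep their original feature state (including across live migration);
//   - a VMI that requests no particular state is let through (assumption);
//   - a not-yet-running VMI whose requested state differs from the state in
//     the KubeVirt CR is held back with an explanatory message.
func EvaluateFeature(feature string, cluster FeatureState, requested *FeatureState, running bool) Decision {
	if running || requested == nil || *requested == cluster {
		return Decision{Proceed: true}
	}
	return Decision{
		Proceed: false,
		Message: fmt.Sprintf(
			"feature %q is set to %q in the KubeVirt CR but the VMI requests %q",
			feature, cluster, *requested,
		),
	}
}
```

A reconciler would call `EvaluateFeature` once per feature before scheduling and, when `Proceed` is false, copy `Message` into the VM/VMI status instead of rejecting the object in a webhook, which keeps the behavior eventually consistent if the KubeVirt CR is changed later.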