Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement distributed resizing #195

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,8 @@ Note that the external-resizer does not scale with more replicas. Only one exter

* `--timeout <duration>`: Timeout of all calls to CSI driver. It should be set to value that accommodates majority of `ControllerExpandVolume` calls. 10 seconds is used by default.

* `--node-deployment`: Enables the resizer sidecar to handle resize operations for the volumes local to the node on which it is deployed. Off by default.

* `-kube-api-burst <int>` : Burst to use while communicating with the kubernetes apiserver. Defaults to 10. (default 10).

* `-kube-api-qps <float>` : QPS to use while communicating with the kubernetes apiserver. Defaults to 5.0. (default 5).
Expand Down Expand Up @@ -97,6 +99,16 @@ Note that the external-resizer does not scale with more replicas. Only one exter

* All glog / klog arguments are supported, such as `-v <log level>` or `-alsologtostderr`.


### Distributed Resizing

The distributed resizing feature is provided to handle resize operations for local volumes. To use this functionality, the resizer sidecar should be deployed along with the csi driver on each node so that every node manages the resize operations only for the volumes local to that node. This feature can be enabled by setting the following command line option to true:

* `--node-deployment`: Enables the resizer sidecar to handle resize operations for the volumes local to the node on which it is deployed. Off by default.

Other than this, the NODE_NAME environment variable must be set where the CSI resizer sidecar is deployed. The value of NODE_NAME should be the name of the node where the sidecar is running.


### HTTP endpoint

The external-resizer optionally exposes an HTTP endpoint at address:port specified by `--http-endpoint` argument. When set, these two paths are exposed:
Expand Down
18 changes: 17 additions & 1 deletion cmd/csi-resizer/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,8 @@ var (

handleVolumeInUseError = flag.Bool("handle-volume-inuse-error", true, "Flag to turn on/off capability to handle volume in use error in resizer controller. Defaults to true if not set.")

enableNodeDeployment = flag.Bool("node-deployment", false, "Enables deploying the sidecar controller together with a CSI driver on nodes to manage resizing for node-local volumes.")

featureGates map[string]bool

version = "unknown"
Expand Down Expand Up @@ -104,6 +106,19 @@ func main() {
klog.Fatal(err)
}

// If distributed resizing is enabled and leaderElection is also set to true, return
if *enableNodeDeployment && *enableLeaderElection {
klog.Error("Leader election cannot happen when node-deployment is set to true")
os.Exit(1)
}

if *enableNodeDeployment {
node := os.Getenv("NODE_NAME")
if node == "" {
klog.Fatal("The NODE_NAME environment variable must be set when using --node-deployment.")
}
}

var config *rest.Config
var err error
if *master != "" || *kubeConfig != "" {
Expand Down Expand Up @@ -156,7 +171,8 @@ func main() {
*timeout,
kubeClient,
informerFactory,
driverName)
driverName,
*enableNodeDeployment)
Copy link
Contributor

@pohly pohly Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one potential scalability problem here: if there are now as many resizers running in the cluster as there are nodes, then the apiserver has to keep all of them informed about PV and PVC updates.

external-provisioner has the same problem, but it cannot really be avoided there because support for immediate provisioning depends on all provisioners seeing a new PVC.

I'm just wondering whether something better can be done for the other sidecars. The code below already checks for the util.VolumeSelectedNodeKey label. It would be possible to set up the PVC informer so that it does server-side filtering. It's doable, I just don't have a code example at hand.

That leaves the PV, though. external-provisioner would have to be modified to set a similar (the same?) label there. Such a label may even be useful for the provisioner. This code here does client-side filtering of PVs:
https://github.com/kubernetes-csi/external-provisioner/blob/3b752c36ca71fbf5d7310bb6a568b93844062189/pkg/controller/controller.go#L1132-L1145

That could be replaced with server-side filtering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that as well but didn't want to 'kick against the pricks' as they say with how the others were implemented. I also noted that we are not currently doing server-side filtering on the driver either (which I found odd).

Given my limited golang skill set I would definitely need hand-holding on proper code to manage the watches/filters should we go that route.

if err != nil {
klog.Fatal(err.Error())
}
Expand Down
4 changes: 2 additions & 2 deletions pkg/controller/controller_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -233,7 +233,7 @@ func TestController(t *testing.T) {
pvcInformer := informerFactory.Core().V1().PersistentVolumeClaims()
podInformer := informerFactory.Core().V1().Pods()

csiResizer, err := resizer.NewResizerFromClient(client, 15*time.Second, kubeClient, informerFactory, driverName)
csiResizer, err := resizer.NewResizerFromClient(client, 15*time.Second, kubeClient, informerFactory, driverName, false)
if err != nil {
t.Fatalf("Test %s: Unable to create resizer: %v", test.Name, err)
}
Expand Down Expand Up @@ -365,7 +365,7 @@ func TestResizePVC(t *testing.T) {
pvcInformer := informerFactory.Core().V1().PersistentVolumeClaims()
podInformer := informerFactory.Core().V1().Pods()

csiResizer, err := resizer.NewResizerFromClient(client, 15*time.Second, kubeClient, informerFactory, driverName)
csiResizer, err := resizer.NewResizerFromClient(client, 15*time.Second, kubeClient, informerFactory, driverName, false)
if err != nil {
t.Fatalf("Test %s: Unable to create resizer: %v", test.Name, err)
}
Expand Down
2 changes: 1 addition & 1 deletion pkg/controller/expand_and_recover_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,7 @@ func TestExpandAndRecover(t *testing.T) {

kubeClient, informerFactory := fakeK8s(initialObjects)

csiResizer, err := resizer.NewResizerFromClient(client, 15*time.Second, kubeClient, informerFactory, driverName)
csiResizer, err := resizer.NewResizerFromClient(client, 15*time.Second, kubeClient, informerFactory, driverName, false)
if err != nil {
t.Fatalf("Test %s: Unable to create resizer: %v", test.name, err)
}
Expand Down
24 changes: 16 additions & 8 deletions pkg/resizer/csi_resizer.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ import (
"context"
"errors"
"fmt"
"os"
"strconv"
"time"

Expand Down Expand Up @@ -47,7 +48,7 @@ func NewResizerFromClient(
timeout time.Duration,
k8sClient kubernetes.Interface,
informerFactory informers.SharedInformerFactory,
driverName string) (Resizer, error) {
driverName string, enableNodeDeployment bool) (Resizer, error) {

supportControllerService, err := supportsPluginControllerService(csiClient, timeout)
if err != nil {
Expand Down Expand Up @@ -81,18 +82,20 @@ func NewResizerFromClient(
}

return &csiResizer{
name: driverName,
client: csiClient,
timeout: timeout,
name: driverName,
client: csiClient,
timeout: timeout,
enableNodeDeployment: enableNodeDeployment,

k8sClient: k8sClient,
}, nil
}

type csiResizer struct {
name string
client csi.Client
timeout time.Duration
name string
client csi.Client
timeout time.Duration
enableNodeDeployment bool

k8sClient kubernetes.Interface
}
Expand All @@ -105,6 +108,7 @@ func (r *csiResizer) Name() string {
// Resizer will resize the volume if it is CSI volume or is migration enabled in-tree volume
func (r *csiResizer) CanSupport(pv *v1.PersistentVolume, pvc *v1.PersistentVolumeClaim) bool {
resizerName := pvc.Annotations[util.VolumeResizerKey]
selectedNode := pvc.Annotations[util.VolumeSelectedNodeKey]
// resizerName will be CSI driver name when CSI migration is enabled
// otherwise, it will be in-tree plugin name
// r.name is the CSI driver name, return true only when they match
Expand All @@ -119,7 +123,11 @@ func (r *csiResizer) CanSupport(pv *v1.PersistentVolume, pvc *v1.PersistentVolum
return false
}
if source.Driver != r.name {
klog.V(4).Infof("Skip resize PV %s for resizer %s", pv.Name, source.Driver)
klog.V(4).Infof("Skip resize PV %s for resizer %s non-matching driver", pv.Name, source.Driver)
return false
}
if r.enableNodeDeployment && selectedNode != os.Getenv("NODE_NAME") {
klog.V(4).Infof("Skip resize PV %s for resizer %s non-matching node", pv.Name, source.Driver)
return false
}
return true
Expand Down
6 changes: 3 additions & 3 deletions pkg/resizer/csi_resizer_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@ func TestNewResizer(t *testing.T) {
client := csi.NewMockClient("mock", c.SupportsNodeResize, c.SupportsControllerResize, c.SupportsPluginControllerService, c.SupportsControllerSingleNodeMultiWriter)
driverName := "mock-driver"
k8sClient, informerFactory := fakeK8s()
resizer, err := NewResizerFromClient(client, 0, k8sClient, informerFactory, driverName)
resizer, err := NewResizerFromClient(client, 0, k8sClient, informerFactory, driverName, false)
if err != c.Error {
t.Errorf("Case %d: Unexpected error: wanted %v, got %v", i, c.Error, err)
}
Expand Down Expand Up @@ -160,7 +160,7 @@ func TestResizeMigratedPV(t *testing.T) {
client := csi.NewMockClient(driverName, true, true, true, true)
client.SetCheckMigratedLabel()
k8sClient, informerFactory := fakeK8s()
resizer, err := NewResizerFromClient(client, 0, k8sClient, informerFactory, driverName)
resizer, err := NewResizerFromClient(client, 0, k8sClient, informerFactory, driverName, false)
if err != nil {
t.Fatalf("Failed to create resizer: %v", err)
}
Expand Down Expand Up @@ -428,7 +428,7 @@ func TestCanSupport(t *testing.T) {
driverName := tc.driverName
client := csi.NewMockClient(driverName, true, true, true, true)
k8sClient, informerFactory := fakeK8s()
resizer, err := NewResizerFromClient(client, 0, k8sClient, informerFactory, driverName)
resizer, err := NewResizerFromClient(client, 0, k8sClient, informerFactory, driverName, false)
if err != nil {
t.Fatalf("Failed to create resizer: %v", err)
}
Expand Down
3 changes: 2 additions & 1 deletion pkg/util/events.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,5 +27,6 @@ const (
const (
// If CSI migration is enabled, the value will be CSI driver name
// Otherwise, it will be in-tree storage plugin name
VolumeResizerKey = "volume.kubernetes.io/storage-resizer"
VolumeResizerKey = "volume.kubernetes.io/storage-resizer"
VolumeSelectedNodeKey = "volume.kubernetes.io/selected-node"
)