Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e: use helm to install out-of-tree cloud-provider-azure #2209

Conversation

jackfrancis
Copy link
Contributor

@jackfrancis jackfrancis commented Mar 31, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR updates the "external" (or out-of-tree) cloud-provider-azure templates and test implementation so that we use the official helm chart (see this PR: kubernetes-sigs/cloud-provider-azure#1306).

This approach has the following advantages:

  • rather than define static versions of cloud-provider-azure to install, we rely upon the helm chart to choose the right version for the version of Kubernetes running in the cluster
  • no longer rely upon ClusterResourceSet CRD, which is alpha and may not be enabled on your cluster-api mgmt cluster

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

replace ClusterResourceSet with helm for external cloud-provider-azure templates

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 31, 2022
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 31, 2022
@jackfrancis
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

Copy link
Contributor

@jsturtevant jsturtevant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern with this is we now have two ways to install cluster addons on until the new proposal lands and goes forward. This just adds some complexity from maintenace and possible confusion. We are also suggesting using helm for most out of the box clusters. Is that what we desire? I don't have any strong objections just pointing it out.

We also probably need to add docs on how to do this post cluster creation.

@@ -450,6 +450,10 @@ var _ = Describe("Workload cluster creation", func() {
WaitForClusterIntervals: e2eConfig.GetIntervals(specName, "wait-cluster"),
WaitForControlPlaneIntervals: e2eConfig.GetIntervals(specName, "wait-control-plane"),
WaitForMachinePools: e2eConfig.GetIntervals(specName, "wait-machine-pool-nodes"),
ControlPlaneWaiters: clusterctl.ControlPlaneWaiters{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, didn't know about this!

test/e2e/cloud-provider-azure.go Outdated Show resolved Hide resolved
test/e2e/cloud-provider-azure.go Outdated Show resolved Hide resolved
@jackfrancis jackfrancis force-pushed the e2e-external-cloud-provider-helm branch 2 times, most recently from fa8fd21 to 7313554 Compare April 1, 2022 20:05
@jackfrancis
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

docs/book/src/topics/cloud-provider-config.md Outdated Show resolved Hide resolved
docs/book/src/topics/cloud-provider-config.md Outdated Show resolved Hide resolved
if err != nil {
return err
}
if len(n.Items) == (int(*input.ConfigCluster.WorkerMachineCount) + int(*input.ConfigCluster.ControlPlaneMachineCount)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't run for windows currently, but if it did we would need to include WINDOWS_WORKER_MACHINE_COUNT here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this brings up a good question. On this test, for example, this WorkerMachineCount value seems to apply to each node pool (not total number of expected worker nodes):

https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/v1.2.1/test/e2e/azure_test.go#L216

And indeed, this value seems to be used to populate the named env var that kustomize uses to configure a single value which by convention is shared across all MachinePools/MachineDeployments:

https://github.com/kubernetes-sigs/cluster-api/blob/v1.1.3/cmd/clusterctl/client/config.go#L436

This check is really here to know that we're ready to reliably perform a helm install against the cluster. We can simply check for 1 Running node to do that.

This will work for a Windows-enabled cluster.

Do we want to just convert the reference "external-cloud-provider" template to have a Windows MachineDeployment as well? @CecileRobertMichon for thoughts as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should not change the existing external cloud provider template in this PR but instead change all the reference templates to use external cloud provider as we had started doing in #1889 in a follow PR. That will include various windows flavors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense

@jackfrancis jackfrancis force-pushed the e2e-external-cloud-provider-helm branch 3 times, most recently from ca7138b to 1fc368f Compare April 2, 2022 00:34
@jackfrancis
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@jackfrancis jackfrancis force-pushed the e2e-external-cloud-provider-helm branch from 1fc368f to 60d5a04 Compare April 4, 2022 17:17
p := helmGetter.All(settings)
valueOpts := &helmVals.Options{}
valueOpts.Values = []string{fmt.Sprintf("infra.clusterName=%s", input.ClusterProxy.GetName())}
if imageRegistryFromEnv := os.Getenv("IMAGE_REGISTRY"); imageRegistryFromEnv != "" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CecileRobertMichon @jsturtevant the below is a proposed solution to ensuring these changes are back-compat with upstream cloud-provider-azure test-infra jobs that rely upon capz to test out-of-tree. This is how the current tests currently work:

  1. TEST_CCM=true is set when calling scripts/ci-entrypoint.sh:
  1. capz maintains a script to set the CI-built image data here:

These env vars aren't as specifically named as we'd ideally want, but re-using them as-is allows us to reduce the change surface area (don't have to update capz CI scripts and/or test-infra job defs).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the out of tree cloud provider PR tests and conformance tests don't run CAPZ e2e so they won't call InstallCloudProviderAzureHelmChart as far as I know. We may need to 1) modify ci-entrypoint.sh directly to do the helm install for out of tree cases, and 2) change conformance_test.go to use this custom control plane waiter too when testing out of tree

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have to worry about the conformance stuff yet (in this PR), it seems the only template flavors that it currently uses are default and windows (not the external templates).

When we eventually get to the part of this effort where we're testing out-of-tree by default, then yeah, the conformance ControlPlaneWaiters will need to get updated as well.

@jackfrancis
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@jackfrancis jackfrancis force-pushed the e2e-external-cloud-provider-helm branch from 60d5a04 to 20b2855 Compare April 4, 2022 21:18
@jackfrancis
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@k8s-ci-robot k8s-ci-robot requested a review from lzhecheng April 4, 2022 22:40
@jackfrancis jackfrancis force-pushed the e2e-external-cloud-provider-helm branch from 20b2855 to fb2b32e Compare April 4, 2022 23:02
@@ -152,6 +154,11 @@ create_cluster
# export the target cluster KUBECONFIG if not already set
export KUBECONFIG="${KUBECONFIG:-${PWD}/kubeconfig}"

# install cloud-provider-azure components, if using out-of-tree
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CecileRobertMichon @lzhecheng incorporated the helm stuff into the ci-entrypoint.sh script itself. The way this will work is basically the same way I implemented this in the Tiltfile (wait for kubeconfig secret, wait for successful get nodes). Reference:

# Wait for the kubeconfig to become available.
timeout --foreground 300 bash -c "while ! kubectl get secrets | grep $(CLUSTER_NAME)-kubeconfig; do sleep 1; done"
# Get kubeconfig and store it locally.
kubectl get secrets $(CLUSTER_NAME)-kubeconfig -o json | jq -r .data.value | base64 --decode > ./kubeconfig
timeout --foreground 600 bash -c "while ! kubectl --kubeconfig=./kubeconfig get nodes | grep control-plane; do sleep 1; done"

@jackfrancis
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@jackfrancis jackfrancis force-pushed the e2e-external-cloud-provider-helm branch from fb2b32e to bb81770 Compare April 4, 2022 23:19
Namespace: input.ConfigCluster.Namespace,
}
return client.Get(ctx, key, secret)
}, 20*time.Minute, 5*time.Second).Should(Succeed())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be using a config interval variable here instead of hardcoding to 20 minutes? same question for intervals below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reusing the WaitForControlPlaneIntervals var everywhere here

arguably you can tune these down per retryable operations, but it's debatable whether or not it's worth the effort

@jackfrancis jackfrancis force-pushed the e2e-external-cloud-provider-helm branch from 90026ff to 2527091 Compare April 8, 2022 22:30
@jackfrancis
Copy link
Contributor Author

/retest

@jackfrancis
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

@jackfrancis
Copy link
Contributor Author

/retest

@CecileRobertMichon
Copy link
Contributor

/lgtm
/assign @jsturtevant

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 11, 2022
@CecileRobertMichon
Copy link
Contributor

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 11, 2022
@CecileRobertMichon
Copy link
Contributor

/assign @mboersma

James is OOF this week

Copy link
Contributor

@mboersma mboersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few questions, but this looks good.

@@ -331,9 +332,13 @@ def deploy_worker_templates(template, substitutions):
yaml = yaml.replace("${" + substitution + "}", value)

yaml = yaml.replace('"', '\\"') # add escape character to double quotes in yaml
flavor_name = os.path.basename(flavor)
flavor_cmd = "RANDOM=$(bash -c 'echo $RANDOM'); CLUSTER_NAME=" + flavor.replace("windows", "win") + "-$RANDOM; make generate-flavors; echo \"" + yaml + "\" > ./.tiltbuild/" + flavor + "; cat ./.tiltbuild/" + flavor + " | " + envsubst_cmd + " | " + kubectl_cmd + " apply -f - && echo \"Cluster \'$CLUSTER_NAME\' created, don't forget to delete\""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to suggest this would be better as an f-string, but then I remembered this is Starlark.

@@ -236,6 +241,9 @@ verify-tiltfile: ## Verify Tiltfile format.

##@ Development:

.PHONY: install-tools # populate hack/tools/bin
install-tools: $(ENVSUBST) $(KUSTOMIZE) $(KUBECTL) $(HELM) $(GINKGO)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also mildly opposed to this change if it adds any noticeable time to testing targets. Could we maybe have the new target install just the tools that are actually shared, or is this still a maintenance headache?

install-tools: $(ENVSUBST) $(KUSTOMIZE) $(KUBECTL)

test-e2e-run: generate-e2e-templates install-tools $(GINKGO)

docs/book/src/developers/kubernetes-developers.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 14, 2022
@jackfrancis jackfrancis force-pushed the e2e-external-cloud-provider-helm branch from 7251cc7 to f8c166d Compare April 14, 2022 23:15
@jackfrancis
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-e2e-optional

Copy link
Contributor

@mboersma mboersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 15, 2022
@CecileRobertMichon
Copy link
Contributor

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 15, 2022
@k8s-ci-robot k8s-ci-robot merged commit b06af49 into kubernetes-sigs:main Apr 15, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.3 milestone Apr 15, 2022
@jackfrancis jackfrancis deleted the e2e-external-cloud-provider-helm branch April 15, 2022 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants