diff --git a/.gitignore b/.gitignore index 143651a..edc0074 100644 --- a/.gitignore +++ b/.gitignore @@ -60,6 +60,8 @@ ssh_key_id_rsa.pub # kubeconfig /kubeone-kubeconfig +/kubeone-demo-kubeconfig +/kubeone-dev-kubeconfig /kubeconfig # helm temporary values files diff --git a/README.md b/README.md index 78a4fad..9ba82a9 100644 --- a/README.md +++ b/README.md @@ -75,8 +75,8 @@ The final result is a fully functioning, highly available, autoscaling Kubernete | Component | Type | Description | | --- | --- | --- | | [Cilium](https://cilium.io/) | Networking | An open-source, cloud native and eBPF-based Kubernetes CNI that is providing, securing and observing network connectivity between container workloads | -| [Longhorn](https://longhorn.io/) | Storage (Default) | Highly available persistent storage for Kubernetes, provides cloud-native block storage with backup functionality | -| [vCloud CSI driver](https://github.com/vmware/cloud-director-named-disk-csi-driver) | Storage (Alternative) | Container Storage Interface (CSI) driver for VMware Cloud Director | +| [vCloud CSI driver](https://github.com/vmware/cloud-director-named-disk-csi-driver) | Storage (Default) | Container Storage Interface (CSI) driver for VMware Cloud Director | +| [Longhorn](https://longhorn.io/) | Storage (Alternative) | Highly available persistent storage for Kubernetes, provides cloud-native block storage with backup functionality | | [Machine-Controller](https://github.com/kubermatic/machine-controller) | Compute | Dynamic creation of Kubernetes worker nodes on VMware Cloud Director | | [Ingress NGINX](https://kubernetes.github.io/ingress-nginx/) | Routing | Provides HTTP traffic routing, load balancing, SSL termination and name-based virtual hosting | | [Cert Manager](https://cert-manager.io/) | Certificates | Cloud-native, automated TLS certificate management and [Let's Encrypt](https://letsencrypt.org/) integration for Kubernetes | @@ -251,9 +251,9 @@ Here are some examples for possible 
cluster size customizations: | Worker | Maximum number of VMs | `cluster_autoscaler_max_replicas` | `15` | | Worker | vCPUs | `worker_cpus` | `4` | | Worker | Memory (in MB) | `worker_memory` | `16384` | -| Worker | Disk size (in GB) | `worker_disk_size_gb` | `180` | +| Worker | Disk size (in GB) | `worker_disk_size_gb` | `150` | -> **Note**: The more worker nodes you have, the smaller the disk size gets that they need in order to distribute and cover all your `PersistentVolume` needs. This is why the worker nodes in the *Large* cluster example actually have a smaller disk than in the *Medium* example. +> **Note**: If you are using the Longhorn storage class, the more worker nodes you have, the smaller the disk each of them needs in order to distribute and cover all your `PersistentVolume` needs. This is why the worker nodes in the *Large* cluster example actually have a smaller disk than in the *Medium* example. If you don't intend to use Longhorn volumes and mostly rely on the vCloud-CSI, you can reduce your worker disks to less than 100 GB each, for example. Set the amount of control plane nodes to either be 1, 3 or 5. They have to be an odd number for the quorum to work correctly, and anything above 5 is not really that beneficial anymore. For a highly-available setup usually the perfect number of control plane nodes is `3`. @@ -287,8 +287,10 @@ The other file of interest is the main configuration file of KubeOne itself, [ku Please refer to the [Kubermatic KubeOne - v1beta2 API Reference](https://docs.kubermatic.com/kubeone/v1.6/references/kubeone-cluster-v1beta2/) for a full list of all configuration settings available. -The `kubeone.yaml` provided in this repository should mostly already have sensible defaults and only really needs to be adjusted if you want to make use of the vCloud-CSI for volumes on Kubernetes and set it as your default storage-class.
It is currently not set as default since you will need to open up a Service Request with Swisscom first in order to request your API user being able to upload OVF templates while preserving the `ExtraConfig: disk.EnableUUID=true` parameter. By default API users on DCS+ unfortunately do not have the necessary permissions unless explicitely requested. Without that permission the uploaded OS template and any VMs created based on it will not allow the vCloud-CSI to detect attached disks by UUID, and thus not function properly. For this reason it is not set as the default storage-class. -If you are sure your API user has the necessary permission, then you can uncomment and modify the `default-storage-class` addon in `kubeone.yaml`, you will need to adjust the `storageProfile` of the `default-storage-class`: +The `kubeone.yaml` provided in this repository should mostly already have sensible defaults and only really needs to be adjusted if you either don't want to use the vCloud-CSI as the default storage-class for volumes on Kubernetes, or if you need to adjust the `storageProfile` to match your Swisscom DCS+ storage. + +Before you can use the vCloud-CSI you will first need to open a Service Request with Swisscom to request that your API user be able to upload OVF templates while preserving the `ExtraConfig: disk.EnableUUID=true` parameter. By default API users on DCS+ unfortunately do not have the necessary permissions unless explicitly requested. Without that permission the uploaded OS template and any VMs created based on it will not allow the vCloud-CSI to detect attached disks by UUID, and thus it will not function properly.
+If you are sure your API user has the necessary permission, then all that is left to do is to modify the `default-storage-class` addon in `kubeone.yaml` and adjust the `storageProfile` of the `default-storage-class`: ```yaml addons: addons: @@ -298,11 +300,9 @@ addons: ``` Please adjust the `storageProfile` to one of the storage policies available to you in your Swisscom DCS+ data center. You can view the storage policies from the DCS+ UI by clicking on **Data Centers** -> **Storage** -> **Storage Policies**. -In order to not conflict with Longhorn you will also have to edit `deployments/longhorn.sh` and change the value of `persistence:defaultClass` to `false`. Otherwise Longhorn and vCloud-CSI storage classes would both claim to be the default at the same time! - -> **Note**: When using the vCloud-CSI you must adjust the `storageProfile`, or it is highly likely that *PersistentVolumes* will not work! Also make sure that your API user has the necessary **"vApp > Preserve ExtraConfig Elements during OVA Import and Export"** permission! +> **Note**: When using the vCloud-CSI you must adjust the `storageProfile` and have the additional permissions for OVF upload on your user/API accounts, or *PersistentVolumes* will not work! Make sure that your API user has the necessary **"vApp > Preserve ExtraConfig Elements during OVA Import and Export"** permission! -If you do not want to go through the trouble of having to request these extra permission for your API users, then you simply don't need to do any modifications to `kubeone.yaml`. By default this repository will also install Longhorn on your cluster and use it provide volumes. +If you do not want to go through the trouble of having to request these extra permissions for your API users, then you simply don't need to deploy the vCloud-CSI. To disable it, go into `kubeone.yaml` and comment out the `csi-vmware-cloud-director` and `default-storage-class` addons.
This repository will then automatically configure Longhorn to be the default storage class on your cluster and use it to provide volumes. ### Installation @@ -557,12 +557,10 @@ Run the Helm deployment again once the chart rollback is successful and it is no Due to the nature of Longhorn and how it distributes volume replicas, it might happen that the draining and eviction of a Kubernetes node can get blocked. Longhorn tries to keep all its volumes (and their replicas) in a *`Healthy`* state and thus can block node eviction. -If you noticed that the cluster-autoscaler or machine-controller cannot remove an old node, scale down to fewer nodes, or a node remaining seemingly forever being stuck in an unschedulable state, then it might be because there are Longhorn volume replicas on those nodes. +If you use Longhorn as your default storage class instead of the vCloud-CSI and you notice that the cluster-autoscaler or machine-controller cannot remove an old node or scale down to fewer nodes, or that a node remains seemingly forever stuck in an unschedulable state, then it might be because there are Longhorn volume replicas on those nodes. To fix the issue, login to the Longhorn UI (check further [above](#longhorn) on how to do that), go to the *"Node"* tab, click on the hamburger menu of the affected node and then select *"Edit Node and Disks"*. In the popup menu you can then forcefully disable *"Node Scheduling"* and enable *"Eviction Requested"*. This will instruct Longhorn to migrate the remaining volume replicas to other available nodes, thus freeing up Kubernetes to fully drain and remove the old node. -If you don't want to deal with the hassle of blocked node evictions at all, you could disable and remove the Longhorn deployment completely and instead use the [vCloud-CSI](https://github.com/vmware/cloud-director-named-disk-csi-driver) as an alternative storage provider. See the section about how to configure [`kubeone.yaml`](#kubeone) for caveats.
- ## Q&A ### Why have shell scripts for deployments? diff --git a/kubeone.version.json b/kubeone.version.json index abeffc5..2485a2b 100644 --- a/kubeone.version.json +++ b/kubeone.version.json @@ -2,18 +2,18 @@ "kubeone": { "major": "1", "minor": "6", - "gitVersion": "1.6.1", - "gitCommit": "525565563dffd7cd97b8d9652b878fc32ee32fe2", + "gitVersion": "1.6.2", + "gitCommit": "184adc3b7d0c1e2e7630ded518cbfdfab7300755", "gitTreeState": "", - "buildDate": "2023-03-23T14:27:44Z", - "goVersion": "go1.19.6", + "buildDate": "2023-04-14T11:20:23Z", + "goVersion": "go1.19.8", "compiler": "gc", "platform": "linux/amd64" }, "machine_controller": { "major": "1", "minor": "56", - "gitVersion": "v1.56.0", + "gitVersion": "v1.56.2", "gitCommit": "", "gitTreeState": "", "buildDate": "", diff --git a/kubeone.yaml b/kubeone.yaml index 4b8b6d9..bfb5ec7 100644 --- a/kubeone.yaml +++ b/kubeone.yaml @@ -2,7 +2,7 @@ apiVersion: kubeone.k8c.io/v1beta2 kind: KubeOneCluster versions: - kubernetes: "1.26.3" + kubernetes: "1.26.4" cloudProvider: vmwareCloudDirector: diff --git a/machines/kubeone-worker-pool.yml b/machines/kubeone-worker-pool.yml index d9432a9..e6a52b4 100644 --- a/machines/kubeone-worker-pool.yml +++ b/machines/kubeone-worker-pool.yml @@ -59,9 +59,9 @@ spec: distUpgradeOnBoot: false sshPublicKeys: - | - ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAACAQCipOCiyYaAXFDGY4F6bDuafSBax+iXEID338Mms7movZvv37DVYOktbCx0OyoWoYNXmm1w3s7MqqnEQYtYzB/qNWRkm2dBTRqvw8bMuvEa0srfo5sX/g7EuljsvpKG7rYoZXNk7+7lU7Bx4RRi2K7fKrQ8e30Mi9yjai3QHK5G8NTo0gapzdReb9NiTPofW39G3jm7U2B5gqzpbleUyrxfuNEv6iyayR7UXLcgCeEPH0vAhnXKnPgFSSL0dO8FbDUXvWCZlNmkDG8c18iRSfclHDqG2y9Nw7bd2sQnGM/z3mrAdlVlWgj9Vtx2OC/xGB1dBLwRuukiOT9rDGN/f4U+f2hwXgIr8LWVfKJqYbXf8ICePdw1O+iA9pDqIj7T3CbSumqL4+cmmZhea7Xp7Udy9Bf83Zl0NIibu6oidD/UNCcD9zCkdkHKAY28jZq4qSgHRA31hB25Fk2PpSHDdmGI2IyaGx8V4N92J4f5nYD/CkVDVLxmtcBD+FgXrwzResEJR0ftn9xjjP+SmE8iiW7MwqRil86EOsaQ0Po3vG2x7JTsQJrxwhf2nC2v6dhcaLjTsl6BwOq95+JzRHrniOhXe2sIl4AZRdKwjxADUYU0f7IiH5Ef+BwA8n5jMG0P5fha/S0BQhbhhvwvlb3UZGlirzTjFytdPQ/yz/ouhN+iFQ== jamesclonk@cluster-admin + ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC2QKTfntKlsoo4ELBZgBLvIshQht2hzaQ7AGmJU2cxnt6cen7mwkqHkYA+ZoBBSi5DxcAGv1M5PX7ICiW8vdkHjStQzLXvXeov7YYKR9+WXBPhw5MF6O/PqW30Vf40EnRWdr0mlTcm/8BYai4DyJDNLayOxkefsVfhUDw42/nWK2uHEA/RzCLrYQhp4A6fP2YWXoGWNHzHuOfY1rea+TZkDStPcJQj07Qlvqpb9wk7O9VxaaxcsJbHQDpFzW4nWFYzb2AJUBbnQZoRDlx0GAYSVyEPXRIL7mpTUHKyZeLD5b46xvWyhynMGOfMChtihGX2ITdidLzd5WXs162pt0rhf/Y0aAwbgDN0R1UfS8l7dLiNojj4sZcZ717eEjaCGZUKBeNftHvKnF+G8YZHB3eOR2ryoNT2S5ZlwabJiOrP3V51us2KMlkvdeI81VeVLU+C39MY/jGipAkIsQil+XKLtw7HVSEZ+gH/3MmFeGrOM5wbiIatum15J0IGxGTma+E= admin@kubernetes versions: - kubelet: 1.26.3 + kubelet: 1.26.4 status: {} --- diff --git a/terraform/output.json b/terraform/output.json index 16fb9cf..e1e8bdd 100644 --- a/terraform/output.json +++ b/terraform/output.json @@ -199,7 +199,7 @@ "distUpgradeOnBoot": false }, "sshPublicKeys": [ - "ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAACAQCipOCiyYaAXFDGY4F6bDuafSBax+iXEID338Mms7movZvv37DVYOktbCx0OyoWoYNXmm1w3s7MqqnEQYtYzB/qNWRkm2dBTRqvw8bMuvEa0srfo5sX/g7EuljsvpKG7rYoZXNk7+7lU7Bx4RRi2K7fKrQ8e30Mi9yjai3QHK5G8NTo0gapzdReb9NiTPofW39G3jm7U2B5gqzpbleUyrxfuNEv6iyayR7UXLcgCeEPH0vAhnXKnPgFSSL0dO8FbDUXvWCZlNmkDG8c18iRSfclHDqG2y9Nw7bd2sQnGM/z3mrAdlVlWgj9Vtx2OC/xGB1dBLwRuukiOT9rDGN/f4U+f2hwXgIr8LWVfKJqYbXf8ICePdw1O+iA9pDqIj7T3CbSumqL4+cmmZhea7Xp7Udy9Bf83Zl0NIibu6oidD/UNCcD9zCkdkHKAY28jZq4qSgHRA31hB25Fk2PpSHDdmGI2IyaGx8V4N92J4f5nYD/CkVDVLxmtcBD+FgXrwzResEJR0ftn9xjjP+SmE8iiW7MwqRil86EOsaQ0Po3vG2x7JTsQJrxwhf2nC2v6dhcaLjTsl6BwOq95+JzRHrniOhXe2sIl4AZRdKwjxADUYU0f7IiH5Ef+BwA8n5jMG0P5fha/S0BQhbhhvwvlb3UZGlirzTjFytdPQ/yz/ouhN+iFQ== jamesclonk@cluster-admin\n" + "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC2QKTfntKlsoo4ELBZgBLvIshQht2hzaQ7AGmJU2cxnt6cen7mwkqHkYA+ZoBBSi5DxcAGv1M5PX7ICiW8vdkHjStQzLXvXeov7YYKR9+WXBPhw5MF6O/PqW30Vf40EnRWdr0mlTcm/8BYai4DyJDNLayOxkefsVfhUDw42/nWK2uHEA/RzCLrYQhp4A6fP2YWXoGWNHzHuOfY1rea+TZkDStPcJQj07Qlvqpb9wk7O9VxaaxcsJbHQDpFzW4nWFYzb2AJUBbnQZoRDlx0GAYSVyEPXRIL7mpTUHKyZeLD5b46xvWyhynMGOfMChtihGX2ITdidLzd5WXs162pt0rhf/Y0aAwbgDN0R1UfS8l7dLiNojj4sZcZ717eEjaCGZUKBeNftHvKnF+G8YZHB3eOR2ryoNT2S5ZlwabJiOrP3V51us2KMlkvdeI81VeVLU+C39MY/jGipAkIsQil+XKLtw7HVSEZ+gH/3MmFeGrOM5wbiIatum15J0IGxGTma+E= admin@kubernetes\n" ] }, "replicas": 3 diff --git a/tools/install_tools.sh b/tools/install_tools.sh index a389f54..cbdaf3c 100755 --- a/tools/install_tools.sh +++ b/tools/install_tools.sh @@ -86,22 +86,22 @@ if [ ${OS} == "Darwin" ]; then echo "-> downloading binaries for Apple Silicon MacOSX ..." 
install_tool "kubectl" "https://storage.googleapis.com/kubernetes-release/release/v1.25.8/bin/darwin/arm64/kubectl" "6519e273017590bd8b193d650af7a6765708f1fed35dcbcaffafe5f33872dfb4" install_tool "jq" "https://github.com/stedolan/jq/releases/download/jq-1.6/jq-osx-amd64" "5c0a0a3ea600f302ee458b30317425dd9632d1ad8882259fcaf4e9b868b2b1ef" - install_tool_from_zipfile "kubeone" "kubeone" "https://github.com/kubermatic/kubeone/releases/download/v1.6.1/kubeone_1.6.1_darwin_arm64.zip" "caad36ea534741204abc7bc13e97b3744676ebab6885cc30c87be9eef6d53138" + install_tool_from_zipfile "kubeone" "kubeone" "https://github.com/kubermatic/kubeone/releases/download/v1.6.2/kubeone_1.6.2_darwin_arm64.zip" "6119c779cfef51ceb50d8ca6ccad55a8409fbcc75046a76c9db40197ec85b773" install_tool_from_zipfile "terraform" "terraform" "https://releases.hashicorp.com/terraform/1.2.9/terraform_1.2.9_darwin_arm64.zip" "98f73281fd89a4bac7426149b9f2de8df492eb660b9441f445894dd112fd2c5c" install_tool_from_tarball "darwin-arm64/helm" "helm" "https://get.helm.sh/helm-v3.10.3-darwin-arm64.tar.gz" "b5176d9b89ff43ac476983f58020ee2407ed0cbb5b785f928a57ff01d2c63754" else echo "-> downloading binaries for Intel MacOSX ..." 
install_tool "kubectl" "https://storage.googleapis.com/kubernetes-release/release/v1.25.8/bin/darwin/amd64/kubectl" "4fc94a62065d25f8048272da096e1c5e3bd22676752fb3a24537e4ad62a33382" install_tool "jq" "https://github.com/stedolan/jq/releases/download/jq-1.6/jq-osx-amd64" "5c0a0a3ea600f302ee458b30317425dd9632d1ad8882259fcaf4e9b868b2b1ef" - install_tool_from_zipfile "kubeone" "kubeone" "https://github.com/kubermatic/kubeone/releases/download/v1.6.1/kubeone_1.6.1_darwin_arm64.zip" "caad36ea534741204abc7bc13e97b3744676ebab6885cc30c87be9eef6d53138" - install_tool_from_zipfile "terraform" "terraform" "https://releases.hashicorp.com/terraform/1.2.9/terraform_1.2.9_darwin_arm64.zip" "98f73281fd89a4bac7426149b9f2de8df492eb660b9441f445894dd112fd2c5c" + install_tool_from_zipfile "kubeone" "kubeone" "https://github.com/kubermatic/kubeone/releases/download/v1.6.2/kubeone_1.6.2_darwin_amd64.zip" "ac4b003da67aa9ee900421be353259b82364ff9dc5180502939ab9afbf0bb5cf" + install_tool_from_zipfile "terraform" "terraform" "https://releases.hashicorp.com/terraform/1.2.9/terraform_1.2.9_darwin_amd64.zip" "4b7b4179653c5d501818d8523575e86e60f901506b986d035f2aa6870a810f24" install_tool_from_tarball "darwin-amd64/helm" "helm" "https://get.helm.sh/helm-v3.10.3-darwin-amd64.tar.gz" "8f422d213a9f3530fe516c8b69be74059d89b9954b1afadb9ae6dc81edb52615" fi else echo "-> downloading binaries for Linux ..." 
install_tool "kubectl" "https://storage.googleapis.com/kubernetes-release/release/v1.25.8/bin/linux/amd64/kubectl" "80e70448455f3d19c3cb49bd6ff6fc913677f4f240d368fa2b9f0d400c8cd16e" install_tool "jq" "https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64" "af986793a515d500ab2d35f8d2aecd656e764504b789b66d7e1a0b727a124c44" - install_tool_from_zipfile "kubeone" "kubeone" "https://github.com/kubermatic/kubeone/releases/download/v1.6.1/kubeone_1.6.1_linux_amd64.zip" "d4df1bd4581988837567ae881808cb8b92b5fde38de8794e251562766cd24ccb" + install_tool_from_zipfile "kubeone" "kubeone" "https://github.com/kubermatic/kubeone/releases/download/v1.6.2/kubeone_1.6.2_linux_amd64.zip" "3586b92e0c8e7a18384ffccfa160faf25290ecf86828419df71720947f82fdb6" install_tool_from_zipfile "terraform" "terraform" "https://releases.hashicorp.com/terraform/1.2.9/terraform_1.2.9_linux_amd64.zip" "70fa1a9c71347e7b220165b9c06df0a55f5af57dad8135f14808b343d1b5924a" install_tool_from_tarball "linux-amd64/helm" "helm" "https://get.helm.sh/helm-v3.10.3-linux-amd64.tar.gz" "cc5223b23fd2ccdf4c80eda0acac7a6a5c8cdb81c5b538240e85fe97aa5bc3fb" fi