diff --git a/architecture-decision-record/026-Managed-Prometheus.md b/architecture-decision-record/026-Managed-Prometheus.md index deb0459c..face2c2a 100644 --- a/architecture-decision-record/026-Managed-Prometheus.md +++ b/architecture-decision-record/026-Managed-Prometheus.md @@ -16,12 +16,12 @@ Use [Amazon Managed Service for Prometheus](https://aws.amazon.com/prometheus/) It's good operational practice to have good 'observability'. This includes monitoring, achieved by regular checking the metrics, or health numbers, of the containers running. The timeseries data which is collected can be shown as graphs or other indicators in a dashboard, and evaluated against rules which trigger alerts to the operators. Typical use by operators include: -* to become familiar with the typical quantity of resources consumed by their software -* to be alerted to deteriorating health, so that they can fix it, before it becomes an incident -* being alerted to an incident, to be able to react quickly, not just when users flag it -* during an incident getting an at-a-glance overview of where problems exist -* after an incident to understand what went wrong, and help review the actions taken during the response -* reviewing long-term patterns of health +- to become familiar with the typical quantity of resources consumed by their software +- to be alerted to deteriorating health, so that they can fix it, before it becomes an incident +- being alerted to an incident, to be able to react quickly, not just when users flag it +- during an incident getting an at-a-glance overview of where problems exist +- after an incident to understand what went wrong, and help review the actions taken during the response +- reviewing long-term patterns of health ### Choice of Prometheus @@ -35,9 +35,9 @@ So overall we are happy to stick with Prometheus. Prometheus is setup to monitor the whole of Cloud Platform, including: -* Tenant containers -* Tenant AWS resources -* Kubernetes cluster. kube-prometheus +- Tenant containers +- Tenant AWS resources +- Kubernetes cluster. kube-prometheus Prometheus is configured to store 24h worth of data, which is enough to support most use cases. The data is also sent on to Thanos, which efficiently stores 1 year of metrics data, and makes it available for queries using the same PromQL syntax. @@ -47,18 +47,19 @@ Alertmanager uses the Prometheus data when evaluating its alert rules. The Prometheus container has not run smoothly in recent months: -* **Performance (resolved)** - There were some serious performance issues - alert rules were taking too long to evaluate against the Prometheus data, however this was successfully alleviated by increasing the disk iops, so is not a remaining concern. +- **Performance (resolved)** - There were some serious performance issues - alert rules were taking too long to evaluate against the Prometheus data, however this was successfully alleviated by increasing the disk iops, so is not a remaining concern. -* **Custom node group** - Being a single Prometheus instance for monitoring the entire platform, it consumes a lot of resources. We've put it on a dedicated node, so it has the full resources. And it needs more memory than other nodes, which means it needs a custom node group, which is a bit of extra management overhead. +- **Custom node group** - Being a single Prometheus instance for monitoring the entire platform, it consumes a lot of resources. We've put it on a dedicated node, so it has the full resources. 
And it needs more memory than other nodes, which means it needs a custom node group, which is a bit of extra management overhead. -* **Scalability** - Scaling in this vertical way is not ideal - scaling up is not smooth and eventually we'll hit a limit of CPU/memory/iops. There are options to shard - see below. +- **Scalability** - Scaling in this vertical way is not ideal - scaling up is not smooth and eventually we'll hit a limit of CPU/memory/iops. There are options to shard - see below. We also need to address: -* **Management overhead** - Managed cloud services are generally preferred to self-managed because the cost tends to be amortized over a large customer base and be far cheaper than in-house staff. And people with ops skills are at a premium. The management overhead is: - * for each of Prometheus, kube-prometheus +- **Management overhead** - Managed cloud services are generally preferred to self-managed because the cost tends to be amortized over a large customer base and be far cheaper than in-house staff. And people with ops skills are at a premium. The management overhead is: -* **High availability** - We have a single instance of Prometheus, simply because we've not got round to choosing and implementing a HA arrangement yet. This risks periods of outage where we don't collect metrics data. Although the impact on the use cases is not likely to be very disruptive, there is some value in fixing this up. + - for each of Prometheus, kube-prometheus + +- **High availability** - We have a single instance of Prometheus, simply because we've not got round to choosing and implementing a HA arrangement yet. This risks periods of outage where we don't collect metrics data. Although the impact on the use cases is not likely to be very disruptive, there is some value in fixing this up. ### Options for addressing the concerns @@ -82,21 +83,20 @@ Resilience: AMP is relatively isolated against cluster issues. The data kept in Lock-in: the configuration syntax and other interfaces are the same or similar to our existing self-hosted Prometheus, so we maintain low lock-in / migration cost. - ### Existing install -The 'monitoring' namespace is configured in [components terraform](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components/components.tf#L115-L138) calling the [cloud-platform-terraform-monitoring module](https://github.com/ministryofjustice/cloud-platform-terraform-monitoring). This [installs](https://github.com/ministryofjustice/cloud-platform-terraform-monitoring/blob/main/prometheus.tf#L88) the [kube-prometheus-stack Helm chart](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/README.md) / [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) (among other things). +The 'monitoring' namespace is configured in [components terraform](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components/components.tf#L115-L138) calling the [cloud-platform-terraform-monitoring module](https://github.com/ministryofjustice/cloud-platform-terraform-monitoring). 
This [installs](https://github.com/ministryofjustice/cloud-platform-terraform-monitoring/blob/main/prometheus.tf#L88) the [kube-prometheus-stack Helm chart](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/README.md) / [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) (among other things). [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) contains a number of things: -* [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator) - adds kubernetes-native wrappers for managing Prometheus - * CRDs for install: Prometheus, Alertmanager, Grafana, ThanosRuler - * CRDs for configuring: ServiceMonitor, PodMonitor, Probe, PrometheusRule, AlertmanagerConfig - - allows specifying monitoring targets using kubernetes labels -* Kubernetes manifests -* Grafana dashboards -* Prometheus rules -* example configs for: node_exporter, scrape targets, alerting rules for cluster issues +- [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator) - adds kubernetes-native wrappers for managing Prometheus + - CRDs for install: Prometheus, Alertmanager, Grafana, ThanosRuler + - CRDs for configuring: ServiceMonitor, PodMonitor, Probe, PrometheusRule, AlertmanagerConfig + - allows specifying monitoring targets using kubernetes labels +- Kubernetes manifests +- Grafana dashboards +- Prometheus rules +- example configs for: node_exporter, scrape targets, alerting rules for cluster issues High Availability - not implemented (yet). @@ -105,8 +105,8 @@ https://github.com/ministryofjustice/cloud-platform/issues/1749#issue-587058014 Prometheus config is held in k8s resources: -* ServiceMonitor -* PrometheusRule - alerting +- ServiceMonitor +- PrometheusRule - alerting ## How it would work with AMP @@ -122,23 +122,23 @@ Storage: - you can throw as much data at it. Instead there is a days limit of 15 Alertmanager: -* AMP has an Alertmanager-compatible option, which we'd use with the same rules -* Sending alerts would need to us to configure: create SNS topic that forwards to user Slack channels +- AMP has an Alertmanager-compatible option, which we'd use with the same rules +- Sending alerts would need us to configure an SNS topic that forwards to user Slack channels Grafana: -* Amazon Managed Grafana has no terraform support yet so just setup in AWS console. So in the meantime we stick with self-managed Grafana, which works fine. +- Amazon Managed Grafana has no terraform support yet, so it would have to be set up in the AWS console. In the meantime we stick with self-managed Grafana, which works fine. Prometheus web interface - previously AMP was headless, but now it comes with the web interface Prometheus Rules and Alerts: -* In our existing cluster: - * we get ~3500 Prometheus rules from: https://github.com/kubernetes-monitoring/kubernetes-mixin - * kube-prometheus compiles it to JSON and applies it to the cluster -* So for our new cluster: - * we need to do the same thing for our new cluster. But let's avoid using kube-prometheus. Just copy what it does. 
- * when we upgrade the prometheus version, we'll manually [run the jsonnet config generation](https://github.com/kubernetes-monitoring/kubernetes-mixin#generate-config-files), and paste the resulting rules into our terraform module e.g.: https://github.com/ministryofjustice/cloud-platform-terraform-amp/blob/main/example/rules.tf +- In our existing cluster: + - we get ~3500 Prometheus rules from: https://github.com/kubernetes-monitoring/kubernetes-mixin + - kube-prometheus compiles it to JSON and applies it to the cluster +- So for our new cluster: + - we need to do the same thing for our new cluster. But let's avoid using kube-prometheus. Just copy what it does. + - when we upgrade the prometheus version, we'll manually [run the jsonnet config generation](https://github.com/kubernetes-monitoring/kubernetes-mixin#generate-config-files), and paste the resulting rules into our terraform module e.g.: https://github.com/ministryofjustice/cloud-platform-terraform-amp/blob/main/example/rules.tf ### Still to figure out @@ -152,8 +152,8 @@ Look at scale and costs. Ingestion: $1 for 10m samples Prices (Ireland): -* EU-AMP:MetricSampleCount - $0.35 per 10M metric samples for the next 250B metric samples -* EU-AMP:MetricStorageByteHrs - $0.03 per GB-Mo for storage above 10GB +- EU-AMP:MetricSampleCount - $0.35 per 10M metric samples for the next 250B metric samples +- EU-AMP:MetricStorageByteHrs - $0.03 per GB-Mo for storage above 10GB #### Region @@ -163,10 +163,10 @@ AMP is not released in the London region yet (at the time of writing, 3/11/21). We should check our usage of these related components, and if we still need them in the new cluster: -* CloudWatch exporter -* Node exporter -* ECR exporter -* Pushgateway +- CloudWatch exporter +- Node exporter +- ECR exporter +- Pushgateway #### Showing alerts @@ -178,4 +178,4 @@ Or maybe we can give users read-only access to the console, for their team's SNS #### Workspace as a service? -We could offer users a Prometheus workspace to themselves - a full monitoring stack that they fully control. Just a terraform module they can run. Maybe this is better for everyone, than a centralized one, or just for some specialized users - do some comparison? +We could offer users a Prometheus workspace to themselves - a full monitoring stack that they fully control. Just a terraform module they can run. Maybe this is better for everyone, than a centralized one, or just for some specialized users - do some comparison? diff --git a/runbooks/source/add-concourse-to-cluster.html.md.erb b/runbooks/source/add-concourse-to-cluster.html.md.erb index 0fe30cc9..dba51a97 100644 --- a/runbooks/source/add-concourse-to-cluster.html.md.erb +++ b/runbooks/source/add-concourse-to-cluster.html.md.erb @@ -40,8 +40,8 @@ terraform plan -var "enable_oidc_associate=false" terraform apply -var "enable_oidc_associate=false" ``` -- Go to [`cloud-platform-infrastructure/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components` directory](https://github.com/ministryofjustice/cloud-platform-infrastructure/tree/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components). - Amend the following file and remove the count line from the [concourse module](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components/components.tf#L2). 
+- Go to [`cloud-platform-infrastructure/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components` directory](https://github.com/ministryofjustice/cloud-platform-infrastructure/tree/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components). + Amend the following file and remove the count line from the [concourse module](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components/components.tf#L2). - Apply the terraform module to your test cluster ``` diff --git a/runbooks/source/add-new-receiver-alert-manager.html.md.erb b/runbooks/source/add-new-receiver-alert-manager.html.md.erb index 31beb346..bae59452 100644 --- a/runbooks/source/add-new-receiver-alert-manager.html.md.erb +++ b/runbooks/source/add-new-receiver-alert-manager.html.md.erb @@ -22,7 +22,7 @@ You must have the below details from the development team. ## Creating a new receiver set -1. Fill in the template with the details provided from development team and add the array to [`terraform.tfvars`](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components/terraform.tfvars) file. +1. Fill in the template with the details provided from development team and add the array to [`terraform.tfvars`](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components/terraform.tfvars) file. The `terraform.tfvars` file is encrypted so you have to `git-crypt unlock` to view the contents of the file. Check [git-crypt documentation in user guide](https://user-guide.cloud-platform.service.justice.gov.uk/documentation/other-topics/git-crypt-setup.html#git-crypt) for more information on how to setup git-crypt. diff --git a/runbooks/source/auth0-rotation.html.md.erb b/runbooks/source/auth0-rotation.html.md.erb index 9a4fd274..40976ccd 100644 --- a/runbooks/source/auth0-rotation.html.md.erb +++ b/runbooks/source/auth0-rotation.html.md.erb @@ -34,7 +34,7 @@ $ terraform apply ## 2) Apply changes within components (terraform) -Execute `terraform plan` inside [`cloud-platform-infrastructure/terraform/aws-accounts/cloud-platform-aws/vpc/eks` directory](https://github.com/ministryofjustice/cloud-platform-infrastructure/tree/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components) +Execute `terraform plan` inside [`cloud-platform-infrastructure/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components` directory](https://github.com/ministryofjustice/cloud-platform-infrastructure/tree/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components) to ensure changes match resources below, if they do, apply them: ``` @@ -63,9 +63,9 @@ In order to verify that the changes were successfully applied, follow the checkl ## 4) Update Manager cluster within components (terraform) Our pipelines read auth0 credentials from a K8S secret inside the manager cluster. 
This secret is updated through concourse's TF module variable called `tf_provider_auth0_client_secret` and `tf_provider_auth0_client_id` in -[cloud-platform-infrastructure/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components/terraform.tfvars](https://github.com/ministryofjustice/cloud-platform-infrastructure/tree/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components/terraform.tfvars) +[cloud-platform-infrastructure/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components/terraform.tfvars](https://github.com/ministryofjustice/cloud-platform-infrastructure/tree/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components/terraform.tfvars) -Switch to manager cluster and Execute `terraform plan` inside [`cloud-platform-infrastructure/terraform/aws-accounts/cloud-platform-aws/vpc/eks` directory](https://github.com/ministryofjustice/cloud-platform-infrastructure/tree/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components) +Switch to manager cluster and execute `terraform plan` inside [`cloud-platform-infrastructure/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components` directory](https://github.com/ministryofjustice/cloud-platform-infrastructure/tree/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components) to ensure changes match resources below, if they do, apply them: ``` diff --git a/runbooks/source/container-images.html.md.erb b/runbooks/source/container-images.html.md.erb index 29c7e15e..d9585e73 100644 --- a/runbooks/source/container-images.html.md.erb +++ b/runbooks/source/container-images.html.md.erb @@ -129,7 +129,7 @@ This depends on several factors, some of them are: | docker.io/grafana/grafana:10.4.0 | 🟠 | v11.1.0| [v11.1.0](https://github.com/grafana/grafana/releases/tag/v11.1.0) | [60.4.0](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/Chart.yaml#L26) | | ministryofjustice/prometheus-ecr-exporter:0.2.0 | 🟢 | managed by us | n/a | [0.4.0](https://github.com/ministryofjustice/cloud-platform-helm-charts/blob/main/prometheus-ecr-exporter/Chart.yaml#L5) | | ghcr.io/nerdswords/yet-another-cloudwatch-exporter:v0.61.2 | 🟢 | v0.61.2 | [v0.61.2](https://github.com/nerdswords/yet-another-cloudwatch-exporter/releases) | [0.38.0](https://github.com/nerdswords/helm-charts/releases) -| quay.io/kiwigrid/k8s-sidecar:1.26.1 | 🟢 | v1.26.4 | [v1.26.4](https://github.com/kiwigrid/k8s-sidecar/releases/tag/1.26.4) | [60.4.0](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/Chart.yaml#L26) | +| quay.io/kiwigrid/k8s-sidecar:1.26.1 | 🟢 | v1.26.2 | [v1.26.2](https://github.com/kiwigrid/k8s-sidecar/releases/tag/1.26.2) | [60.4.0](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/Chart.yaml#L26) | | quay.io/oauth2-proxy/oauth2-proxy:v7.6.0 | 🟢 | v7.6.0 | [v7.6.0](https://github.com/oauth2-proxy/oauth2-proxy/releases/tag/v7.6.0) | [7.7.7](https://github.com/oauth2-proxy/manifests/releases/tag/oauth2-proxy-7.7.7) | | quay.io/prometheus-operator/prometheus-config-reloader:v0.72.0 | 🟢 | v0.75.0 | [v0.75.0](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.73.0) | [60.4.0](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/Chart.yaml#L26) | | quay.io/prometheus-operator/prometheus-operator:v0.72.0 | 🟢 | v0.75.0 | [v0.75.0](https://github.com/prometheus-operator/prometheus-operator/releases/tag/v0.75.0) | 
[60.4.0](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/Chart.yaml#L26) | diff --git a/runbooks/source/creating-a-live-like.html.md.erb b/runbooks/source/creating-a-live-like.html.md.erb index 69f78561..e720af76 100644 --- a/runbooks/source/creating-a-live-like.html.md.erb +++ b/runbooks/source/creating-a-live-like.html.md.erb @@ -26,7 +26,7 @@ to the configuration similar to the live cluster. ## Installing live components and test applications -1. In [terraform/aws-accounts/cloud-platform-aws/vpc/eks/components] enable the following components: +1. In [terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components] enable the following components: * cluster_autoscaler * large_nodegroup * kibana_proxy @@ -80,4 +80,4 @@ See documentation for upgrading a [cluster](upgrade-eks-cluster.html). [cluster build pipeline]: https://concourse.cloud-platform.service.justice.gov.uk/teams/main/pipelines/create-cluster [terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/cluster.tf -[terraform/aws-accounts/cloud-platform-aws/vpc/eks/components]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components +[terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components]: https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components diff --git a/runbooks/source/delete-cluster.html.md.erb b/runbooks/source/delete-cluster.html.md.erb index 18adfec6..654f026d 100644 --- a/runbooks/source/delete-cluster.html.md.erb +++ b/runbooks/source/delete-cluster.html.md.erb @@ -79,7 +79,7 @@ Then, from the root of a checkout of the `cloud-platform-infrastructure` reposit these commands to destroy all cluster components, and delete the terraform workspace: ``` -$ cd terraform/aws-accounts/cloud-platform-aws/vpc/eks/components +$ cd terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components $ terraform init $ terraform workspace select ${cluster} $ terraform destroy diff --git a/runbooks/source/disaster-recovery-scenarios.html.md.erb b/runbooks/source/disaster-recovery-scenarios.html.md.erb index c17895ce..d74a3454 100644 --- a/runbooks/source/disaster-recovery-scenarios.html.md.erb +++ b/runbooks/source/disaster-recovery-scenarios.html.md.erb @@ -253,7 +253,7 @@ Plan: 7 to add, 0 to change, 0 to destroy. In this scenario, terraform state can be restored from the remote_state stored in the terraform backend S3 bucket. -For example [eks/components](https://github.com/ministryofjustice/cloud-platform-infrastructure/tree/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components) state is stored in "aws-accounts/cloud-platform-aws/vpc/eks/components" s3 bucket as defined [here-eks](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components/main.tf/#L5-L14). 
+For example [eks/core/components](https://github.com/ministryofjustice/cloud-platform-infrastructure/tree/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components) state is stored in "aws-accounts/cloud-platform-aws/vpc/eks/core/components" s3 bucket as defined [here-eks](https://github.com/ministryofjustice/cloud-platform-infrastructure/blob/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components/main.tf/#L5-L14). Access the S3 bucket where the effected terraform state is stored. From the list of terraform.tfstate file versions, identify the file before the state got removed and download as terraform.tfstate. Upload the file again, this will set uploaded file as latest version. diff --git a/runbooks/source/divergence-error.html.md.erb b/runbooks/source/divergence-error.html.md.erb index d2516eb1..504ccf07 100644 --- a/runbooks/source/divergence-error.html.md.erb +++ b/runbooks/source/divergence-error.html.md.erb @@ -26,7 +26,8 @@ To run the same `terraform plan` command as the pipeline does: | divergence-kops |terraform/cloud-platform | | divergence-k8s-components | terraform/cloud-platform-components | | divergence-eks | terraform/cloud-platform-eks | -| divergence-eks-components | terraform/cloud-platform-eks/components | +| divergence-eks-core | terraform/cloud-platform-eks/core | +| divergence-eks-components | terraform/cloud-platform-eks/core/components | | divergence-networking | terraform/cloud-platform-network | * Check you're using the correct terraform workspace (e.g. `terraform workspace select live`) diff --git a/runbooks/source/upgrade-cluster-components.html.md.erb b/runbooks/source/upgrade-cluster-components.html.md.erb index 529b3cef..7b3abb8f 100644 --- a/runbooks/source/upgrade-cluster-components.html.md.erb +++ b/runbooks/source/upgrade-cluster-components.html.md.erb @@ -90,7 +90,7 @@ make run-tests 7. Once the testing is complete and integration tests are passed, create a PR to be reviewed by the team and have the module unit tests passed. After the PR is approved, merge the changes to the main branch of the module and make a release. - 8. Change the module release tag in the eks/components folder of [cloud-platform-infrastructure repo] and raise a PR. + 8. Change the module release tag in the eks/core/components folder of [cloud-platform-infrastructure repo] and raise a PR. Verify the terraform plan from the [cloud-platform-infrastructure plan pipeline] and get it reviewed by the team. 9. Once approved, merge the PR and monitor the [cloud-platform-infrastructure apply pipeline] when applying the changes. diff --git a/runbooks/source/upgrade-terraform-version.html.md.erb b/runbooks/source/upgrade-terraform-version.html.md.erb index ee294346..76887546 100644 --- a/runbooks/source/upgrade-terraform-version.html.md.erb +++ b/runbooks/source/upgrade-terraform-version.html.md.erb @@ -119,7 +119,7 @@ When all namespaces in the cloud-platform-environments repository are using the ### Infrastructure state files The Infrastructure state we have in the Cloud Platform is structured in a tree related to its dependency, -so for example, the [components](https://github.com/ministryofjustice/cloud-platform-infrastructure/tree/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/components) state (in the output below) relies heavily on the directory above and so on. 
+so for example, the [components](https://github.com/ministryofjustice/cloud-platform-infrastructure/tree/main/terraform/aws-accounts/cloud-platform-aws/vpc/eks/core/components) state (in the output below) depends heavily on the directory above it, and so on up the tree. Here is a snapshot of how our directory looks but this is likely to change: ```