Adding a new environment to the IDF is a fairly complex process, mostly because there's a lot of bootstrapping required.
The first step is to create a PR to the idf_deploy repository adding the new environment to the foundation 1-org-b tfvars list.
Note that this is a list rather than a set, and Terraform tracks list elements by position: if you change any of the already-present items, you will effectively rename existing environments, and great sadness and confusion will arise. Do not be tempted to add your environment in the logical place (whether that's alphabetical, or in sorted importance order, least to most); your new environment goes at the bottom. Edit the data structure as in the sketch below.
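As a concrete illustration, here is a hypothetical version of that list; the variable name and folder names are made up and will differ from the real 1-org-b tfvars, but the append-at-the-bottom rule is the point:

```hcl
# Hypothetical sketch of the foundation 1-org-b tfvars list; the variable
# and folder names are illustrative, not the real contents of idf_deploy.
folders = [
  "science-platform-dev",    # existing entries: do not touch or reorder
  "science-platform-int",
  "science-platform-stable",
  "science-platform-demo",   # the new environment goes here, at the bottom
]
```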
Create a PR, carefully read the output of "Terraform Plan" when the build action runs, correct the PR if necessary, and when the plan is only creating new folders and not modifying anything existing, merge it.
Next you will create a PR to add new actions for each of the items you will want to create in your new project. Go to the GitHub Actions directory, pick the environment that is most like the one you wish to create (in my case, the Dev RSP was the most like the Demo RSP I was building), and copy and edit all the files to reflect the new name. In the case of the RSP, you will create YAML files for each of:
- alertdb
- cloudsql
- firestore
- gke
- proj
(You may not need all of these, but you will certainly want all of `cloudsql`, `gke`, and `proj`.)
Change the existing environment name to the new environment name.
Finally, as part of the bootstrapping process, the credential secrets cannot simply be renamed. For instance, where the source environment uses `${{ secrets.PIPELINE_RSP_DEV_PROJECT }}`, the copied workflow will need `${{ secrets.GOOGLE_CREDENTIALS }}` for the moment. That's because you haven't created the project or any of its associated service accounts and security keys yet, so it's necessary to bootstrap with broader admin credentials.
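For example, a copied workflow's credentials setting might look like the following during bootstrap (a hedged sketch; the real workflow files may wire credentials in differently):

```yaml
# Hypothetical fragment of the copied demo workflow file.
env:
  # Bootstrap with broad admin credentials for now. Once the demo project,
  # its service accounts, and their keys exist, swap this back to a
  # per-environment secret such as ${{ secrets.PIPELINE_RSP_DEMO_PROJECT }}.
  GOOGLE_CREDENTIALS: ${{ secrets.GOOGLE_CREDENTIALS }}
```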
Once again, test this PR by making sure the Terraform plan output looks OK, and then merge the PR.
Decide which environment you need to modify. In my case it's the Science Platform. You will effectively be performing the same set of operations you did with the GitHub Actions: copy an existing environment and change it to fit.
You must start with the base environment file, as this will create the project that will give you a project ID you will need for all the rest of the environment files.
This and the following instructions will only be correct for adding a new Science Platform deployment. Other types of environment (e.g. qserv) will be analogous, but specifics will differ.
I'm creating a `demo` environment from `dev`, so:
- The first step is to copy the `dev` base file to the `demo` base file.
- Second, reset the serial number at the bottom to `1`. Not, strictly speaking, needed, but it would be sloppy not to.
- Next, change the string `dev` to `demo` everywhere inside it.
- Fourth, replace the folder ID at the top with the ID of the folder that was created when you applied the first PR.
- Finally, pick new subnets that are not yet in use; you're going to need two. They start at `10.128.0.0/23`, and the next unused one is (at the time of writing) `10.168.0.0/23`. It is necessary to search all the environments, not just the type you're creating, to determine what is first unused (that is, all the projects, qserv, roundtable, whatever, each have their own subnets). The first one becomes the `subnet_ip` key for the first subnet. The `secondary_ranges` subnet should increase the second octet by one and use the entire 16 bits of possible address space (so, for instance, if you picked `10.168.0.0/23` for `subnet_ip`, the `kubernetes-pods` range should get `10.169.0.0/16`). The `kubernetes-services` range should use the first subnet range, but the third octet should be `16` and the width should be `20`: in this example, `10.168.16.0/20`. (A hedged tfvars sketch follows this list.)
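Putting the example ranges above together, the networking values in the demo base file might look something like this. The `subnet_ip` and `secondary_ranges` keys come from the text; the surrounding structure is modeled on the terraform-google-modules network module and may not match the real file exactly, and the folder ID and region are illustrative:

```hcl
# Folder created by the first PR (the ID here is hypothetical).
folder_id = "folders/000000000000"

subnets = [
  {
    subnet_name   = "subnet-demo"
    subnet_ip     = "10.168.0.0/23"   # first /23 unused across ALL environments
    subnet_region = "us-central1"     # region is illustrative
  },
]

secondary_ranges = {
  "subnet-demo" = [
    {
      range_name    = "kubernetes-pods"
      ip_cidr_range = "10.169.0.0/16"   # second octet + 1, full /16
    },
    {
      range_name    = "kubernetes-services"
      ip_cidr_range = "10.168.16.0/20"  # same base, third octet 16, width /20
    },
  ]
}
```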
Applying this PR will fail with 400 errors about Identity Pools. This is happening because you do not yet have a cluster for which Workload Identity would make sense. That indicates that we should move the Workload Identity resource creation into the GKE provisioning tfvars steps.
However, the project itself will be created. You will need that project ID for subsequent steps.
The next stage will take place in the Foundation configuration. The Pipeline Serviceaccounts `main.tf` will need GKE and project accounts added, and like everything else here, this basically boils down to "replace 'dev' with 'demo'"; but now that there is a project ID (e.g. `science-platform-demo-9e05`), you can plug that in to the `project_roles` definitions.
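A hedged sketch of what one of the new entries might look like, using the terraform-google-modules service-accounts module's `project_roles` convention; the module name, account name, and role here are illustrative rather than copied from idf_deploy:

```hcl
# Hypothetical addition to the Pipeline Serviceaccounts main.tf for demo,
# modeled on the existing dev entries.
module "pipeline_rsp_demo_project" {
  source     = "terraform-google-modules/service-accounts/google"
  project_id = var.project_id
  names      = ["pipeline-rsp-demo-project"]
  project_roles = [
    # Grant a role on the newly created project ID from the base apply;
    # the role shown is a placeholder.
    "science-platform-demo-9e05=>roles/editor",
  ]
}
```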
Additionally, outputs.tf, variables.tf, and terraform.tfvars will require the addition of variables describing your new environment. You should also update readme.md with the new service accounts and email addresses you are creating. All of this is just "copy an existing definition and replace the environment name".
Check this PR; when "Terraform plan" gives what look like the right results, merge it.
At this point a project administrator will need to temporarily enable the ability to extract service account keys. The two service accounts just created should have their keys extracted and put into GitHub secrets, as something like `PIPELINE_RSP_DEMO_PROJECT` and `PIPELINE_RSP_DEMO_GKE`.
This is not the pattern we ultimately want. A future set of improvements to idf_deploy will replace this with keyless authentication.
Do the same for the additional resources you plan to deploy. For a minimal Science Platform, start with the dev GKE file and copy it to the demo GKE file. This one is simpler: all you need to do is replace the environment name and reset the serial, assuming your base and target environments have similar sets of resource requirements.
Repeat the process with the `cloudsql` resources, where you will also need to update `project_id`. I did not need a butler database yet, so I commented out that section as well. The org policy will have to be manually removed and re-added if a butler registry is needed: the process is remove the policy, make the change, re-enable it.
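The edits to the copied cloudsql file amount to something like this (a sketch; everything besides `project_id` is illustrative):

```hcl
# Point the copied cloudsql tfvars at the new project.
project_id = "science-platform-demo-9e05"

# The butler database section stays commented out until it is needed.
# Re-enabling it requires the org-policy dance described above:
# remove the policy, make the change, then re-add the policy.
# butler_instance_name = "butler-demo"   # hypothetical key name
```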
If you want other resources (or you are adding something besides a science platform instance, which has different requirements), edit those files in an analogous manner.
Again, create a PR, examine the Terraform output, and when happy, merge the PR.
Go back to the Workflows and update the `proj` and `gke` workflows to use the new keys just created, rather than `GOOGLE_CREDENTIALS`.
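This is the inverse of the bootstrap step earlier; again a hedged sketch of the workflow fragment, assuming the credentials are passed via an `env` entry:

```yaml
env:
  # Now that the per-environment service accounts and keys exist, replace
  # the broad admin credentials with the scoped secret (this example is
  # for the gke workflow; proj would use PIPELINE_RSP_DEMO_PROJECT).
  GOOGLE_CREDENTIALS: ${{ secrets.PIPELINE_RSP_DEMO_GKE }}
```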
Merge this PR. (If you do this earlier, the resource creation will not happen as it should, because the SAs don't have correct permissions.)
It looks like the reduced-scope tokens lack some permissions. The base environment hits permission errors creating a cluster-scoped service account, and cloudsql fails with "Identity Pool does not exist", which I believe comes from the failures in the base; we saw those errors earlier, but that was when we didn't have a cluster.
That's because the service account needs the Service Account Admin role. For the moment, grant it manually; this goes on the backlog of things we need to fix.