From e9abee69b0a8f49e91ce1dd2cfa4f38f749e20e2 Mon Sep 17 00:00:00 2001
From: Mike Cole
Date: Thu, 11 Jul 2024 00:31:29 +0000
Subject: [PATCH] added blog post

---
 ...ving-chocolatey-au-not-updating-issues.md} |  0
 ...azure-app-services-slot-before-swapping.md | 34 +++++++++++++++++++
 2 files changed, 34 insertions(+)
 rename _posts/{2024-06-26-resolving-chocolatey-au-not-updating-issues.md => 2024-07-05-resolving-chocolatey-au-not-updating-issues.md} (100%)
 create mode 100644 _posts/2024-07-10-warming-azure-app-services-slot-before-swapping.md

diff --git a/_posts/2024-06-26-resolving-chocolatey-au-not-updating-issues.md b/_posts/2024-07-05-resolving-chocolatey-au-not-updating-issues.md
similarity index 100%
rename from _posts/2024-06-26-resolving-chocolatey-au-not-updating-issues.md
rename to _posts/2024-07-05-resolving-chocolatey-au-not-updating-issues.md
diff --git a/_posts/2024-07-10-warming-azure-app-services-slot-before-swapping.md b/_posts/2024-07-10-warming-azure-app-services-slot-before-swapping.md
new file mode 100644
index 0000000..bd9e70e
--- /dev/null
+++ b/_posts/2024-07-10-warming-azure-app-services-slot-before-swapping.md
@@ -0,0 +1,34 @@
+---
+layout: post
+title: Warming Up An Azure App Service Slot Before Swapping
+date: 2024-07-10
+tags: azure dev-ops github-action
+meta-description: When doing an Azure App Service slot swap deployment, it's important to warm up the staging slot before swapping for more reliable behavior.
+---
+
+Slot swapping is an easy way to do zero-downtime deployments to Azure App Service. The basic workflow is to deploy your app to the staging slot in Azure, then swap that slot with production. Azure magic ensures that the application will
+cut over to the new version without disruption, as long as you've architected your application in a way that supports this. When reading about the `az webapp deployment slot swap` command, you'll find that one of the first steps is *supposed* to warm up the slot
+automatically. In reality, what we found was a lengthy operation (20+ minutes) followed by a failure with no helpful details. As part of the troubleshooting process I added a step to ping the staging slot after deployment, and I saw that it
+resulted in intermittent errors, mostly `503 Service Unavailable`. This was happening on several different systems. I ended up adding a few retries and saw that the site would eventually return `200 OK` after a minute or so, and that the slot swap
+step would then work reliably within 2 minutes. It seems that when your site does not immediately return `200 OK` at the start of a slot swap, the command freaks out.
+
+In our deployment GitHub Actions jobs, I added the following step to manually warm up the site. The `warmup_url` should point to a page on your *staging* slot; we have it pointing to a health check endpoint so we can verify stability before swapping. `--connect-timeout 30` gives each attempt up to 30 seconds to connect, while `--retry 4` and `--retry-delay 30` retry up to 4 times with a 30-second delay between attempts. The `-f` argument makes curl treat HTTP error responses as failures so they trigger a retry instead of being reported as success, and `--retry-all-errors` is an
+aggressive form of retry that ensures a retry on pretty much anything that isn't a 2xx success code. The second `curl` statement (using `--fail-with-body`) makes one final request; if it still receives an HTTP error, it prints the response body and fails the step, which fails the entire job.
+
+```yaml
+- name: Warmup
+  if: inputs.warmup_url != ''
+  run: |
+    curl \
+      --connect-timeout 30 \
+      --retry 4 \
+      --retry-delay 30 \
+      -f \
+      --retry-all-errors \
+      ${{ inputs.warmup_url }}
+    curl \
+      --fail-with-body \
+      ${{ inputs.warmup_url }}
+```
+
+After adding this to all of our slot swap deployments, we consistently saw more reliable behavior and no more frustrating 20-minute black hole failures.
\ No newline at end of file
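
For reference, here is a rough sketch of the swap step that would run after the warmup above. This is not pulled from the post's actual workflow: the `azure/login@v2` usage, the `AZURE_CREDENTIALS` secret, the `inputs.resource_group` and `inputs.app_name` values, and the `staging`/`production` slot names are all placeholders to adapt to your own setup.

```yaml
- name: Azure login
  uses: azure/login@v2
  with:
    creds: ${{ secrets.AZURE_CREDENTIALS }}

- name: Swap slots
  run: |
    # Swap the warmed-up staging slot into production
    az webapp deployment slot swap \
      --resource-group ${{ inputs.resource_group }} \
      --name ${{ inputs.app_name }} \
      --slot staging \
      --target-slot production
```

Because the warmup step fails the job when the staging slot never comes up healthy, a swap step placed after it only ever runs against a slot that has already returned `200 OK`.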