Endpoint Network Switcharoo

This runbook explains how to take a new set of virtual machines (VMs) running upgraded software for one of our endpoints, deploy them to the endpoint, and then remove the existing set of VMs running outdated software from that endpoint, ideally with zero downtime for our customers.

Contents

  1. Prerequisites
  2. Steps
  3. Next Steps
  4. See Also

Prerequisites

This runbook is based on several assumptions that must be met.

  1. This upgrade has been approved by all relevant stakeholders.
  2. You have an Amazon Web Services (AWS) IAM user account with the necessary permissions to perform the upgrade(s) in the evm-testnet and/or evm-mainnet AWS accounts.
  3. The new virtual machine(s):
    1. Are in the correct availability zones (AZs),
    2. Have been initialized with upgraded software,
    3. Are prepared to pass health checks if they are functioning correctly, and
    4. Will fail health checks if they are not functioning correctly.
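
The availability zone prerequisite can be checked with a script before you touch the target group. The following is a minimal sketch, not an established part of this runbook. It assumes `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`), and the function name `instances_outside_azs`, the instance IDs, and the AZ names are hypothetical.

```python
def instances_outside_azs(ec2, instance_ids, expected_azs):
    """Return {instance_id: az} for every instance that is NOT in expected_azs.

    `ec2` is assumed to be a boto3 EC2 client; only describe_instances is used.
    Pagination is ignored because the list of new VMs is expected to be short.
    """
    resp = ec2.describe_instances(InstanceIds=list(instance_ids))
    misplaced = {}
    for reservation in resp["Reservations"]:
        for instance in reservation["Instances"]:
            az = instance["Placement"]["AvailabilityZone"]
            if az not in expected_azs:
                misplaced[instance["InstanceId"]] = az
    return misplaced
```

An empty result means every new VM is in one of the expected zones. A non-empty result lists the VMs to fix before continuing.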

Steps

The steps below register the upgraded VMs with the endpoint's target group, wait for them to pass health checks, and then deregister the VMs running outdated software.

  1. Log in to the AWS web console.
  2. Switch to the intended region in AWS. 1
  3. Perform all smoke tests on the existing endpoint to verify everything is working before making any changes. 2
    • To make sure your traffic is served by the endpoint in this region, connect through a public virtual private network (VPN) server located in this part of the world for the smoke tests. You can disconnect after the smoke tests.
  4. Perform all relevant smoke tests on each new virtual machine individually to verify that it is working as expected.
  5. Navigate to EC2 > Target Groups > ${TARGET_GROUP_NAME} > Targets > Register targets. 3
    • For example, if you are upgrading the testnet RPC API in the Asia-Pacific datacenter, ${TARGET_GROUP_NAME} might be evm-testnet-ap-api-tg.
  6. Under "Available instances," check the instances with the upgraded software. 4
    • Following the previous example, evm-testnet-ap-api-vm-1-v0.4.1 and evm-testnet-ap-api-vm-2-v0.4.1.
  7. Click "Include as pending below." 5
  8. Verify the upgraded virtual machines (VMs) appear in "Review targets," then click "Register pending targets." 6
  9. The "Health status" column of the new instances will be "initial." 7 Wait for the "Health status" column of the new instances to change from "initial" to "healthy." 8
    • This took about one minute for me.
    • If the health status changes to "unhealthy," or anything besides "healthy," stop here and escalate the situation! We need to investigate further.
  10. Under "Registered targets," check the VMs that are running the outdated software. 9
  11. Click "Deregister." A
  12. The "Health status" column of the old instances will be "draining." B Wait until these old instances disappear. C
    • This took a long time for me, maybe ten minutes.
    • While the instances are draining, AWS stops routing new requests to them and lets in-flight requests finish before removing them.
  13. Perform a smoke test on the public endpoint to verify everything is working. D
    • To make sure your traffic is served by the endpoint in this region, connect through a public virtual private network (VPN) server located in this part of the world for the smoke test. You can disconnect after the smoke test.
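
Steps 5 through 12 can also be scripted. The sketch below is one possible translation, not a tested replacement for the console procedure. It assumes `elbv2` is a boto3 Elastic Load Balancing v2 client and that you pass the target group's ARN and EC2 instance IDs (the API does not accept the target group name or instance Name tags directly). The function names, polling interval, and poll limit are hypothetical.

```python
import time


def _states(elbv2, target_group_arn):
    """Map each currently registered target ID to its health state."""
    resp = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    return {
        d["Target"]["Id"]: d["TargetHealth"]["State"]
        for d in resp["TargetHealthDescriptions"]
    }


def switch_targets(elbv2, target_group_arn, new_ids, old_ids,
                   poll_seconds=15, max_polls=120, sleep=time.sleep):
    """Register new_ids, wait until healthy, then deregister old_ids and wait for draining."""
    # Steps 5-8: register the upgraded instances.
    elbv2.register_targets(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": i} for i in new_ids],
    )

    # Step 9: wait for "initial" -> "healthy". Any other state means stop and escalate.
    for _ in range(max_polls):
        states = _states(elbv2, target_group_arn)
        # A target that is not listed yet is treated as still initializing.
        current = {i: states.get(i, "initial") for i in new_ids}
        bad = {i: s for i, s in current.items() if s not in ("initial", "healthy")}
        if bad:
            raise RuntimeError(f"Stop and escalate, unexpected health status: {bad}")
        if all(s == "healthy" for s in current.values()):
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError("New targets did not become healthy in time")

    # Steps 10-11: deregister the instances running outdated software.
    elbv2.deregister_targets(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": i} for i in old_ids],
    )

    # Step 12: wait until the old instances are no longer listed (or report "unused").
    for _ in range(max_polls):
        states = _states(elbv2, target_group_arn)
        if all(states.get(i, "unused") == "unused" for i in old_ids):
            return
        sleep(poll_seconds)
    raise TimeoutError("Old targets are still draining")
```

With real credentials, the client would typically be created with `boto3.client("elbv2", region_name=...)` for the region being upgraded. The old instances are deregistered only after every new instance reports "healthy", which mirrors the stop-and-escalate rule in step 9. The smoke tests in steps 3, 4, and 13 are not covered by this sketch.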

Repeat this process for the other regions.

Next Steps

You may want to leave the outdated virtual machines (VMs) up for 12-24 hours in case there is an issue with the upgrade and you need to roll back. The rollback process is the same as the upgrade process, except that the VMs running the previous version are registered and the VMs running the upgraded version are deregistered. After a sufficient amount of time with no reported issues using the upgraded software, the VMs running outdated software should be terminated to minimize costs.
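
To illustrate that a rollback is the upgrade with the two sets of VMs swapped, here is a small sketch. The `SwitchPlan` type, the `safe_to_terminate` helper, the 24-hour default, and the `v0.4.0` instance names are hypothetical and only serve the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SwitchPlan:
    """Which instances to register with, and deregister from, a target group."""
    target_group: str
    register: tuple
    deregister: tuple

    def rollback(self):
        """Same process in reverse: re-add the previous version, remove the current one."""
        return SwitchPlan(
            target_group=self.target_group,
            register=self.deregister,
            deregister=self.register,
        )


def safe_to_terminate(switched_at, now, hold=timedelta(hours=24)):
    """True once the hold window (assumed 24 hours here) has passed since the switch."""
    return now - switched_at >= hold


upgrade = SwitchPlan(
    target_group="evm-testnet-ap-api-tg",
    register=("evm-testnet-ap-api-vm-1-v0.4.1", "evm-testnet-ap-api-vm-2-v0.4.1"),
    # Hypothetical names for the VMs running the previous version.
    deregister=("evm-testnet-ap-api-vm-1-v0.4.0", "evm-testnet-ap-api-vm-2-v0.4.0"),
)
undo = upgrade.rollback()

switched_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
too_early = safe_to_terminate(switched_at, switched_at + timedelta(hours=6))
late_enough = safe_to_terminate(switched_at, switched_at + timedelta(hours=24))
```

Here `undo.register` holds the previous-version VMs and `undo.deregister` holds the upgraded ones, so the rollback follows the same steps with the roles reversed. The outdated VMs should only be terminated once the hold window has passed with no reported issues.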

See Also