docs(sage-monorepo): document how to create an EC2 for remote development #2614

Merged (8 commits) on Apr 5, 2024.

# Developing on a remote host

## Introduction

Team members who develop locally may not benefit from the same compute resources. The most notable
resources that can impact the productivity of developers are the number and frequency of the CPU
cores, the memory available and the internet speed. The worst case is when a machine does not have
the resources to run the apps that the team develops, for example when not enough memory is
available. At other times, a task may take many times longer to complete on a computer with fewer
CPU resources.

Moreover, working remotely means that developers no longer benefit from the same internet speed,
either because of the quality of the internet connection available at their location or because the
speed is shared among the members of a household. As a result, tasks that involve downloading or
uploading artifacts, like pulling or pushing Docker images, may take significantly longer to
complete.

This page describes how to set up an environment that enables developers to use VS Code while using
the compute resources of a remote host.

## Motivation

To illustrate the benefit of developing on a remote host, this table summarizes the local compute
resources available to the developers of OpenChallenges in 2023. The same information is displayed
for two types of Amazon EC2 instances and one type of GitHub Codespace instance that were selected
as candidate alternative development environments for the team members. The table also includes the
runtimes in seconds of different tasks such as linting or testing all the projects included in the
monorepo (the method used to generate these results is described in the next section).

| | Shirou | Rin | Sakura | m5.2xlarge | t3a.xlarge | 4-core Codespace | 8-core Codespace |
| ------------------------------------------------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ---------------- | ---------------- |
| On-Demand Cost ($/day) | n/a | n/a | n/a | 9.2 | 3.6 | 8.64 (1,2) | 17.28 (1,2) |
| On-Demand Cost ($/year) | n/a | n/a | n/a | 3363.8 | 1317.5 | 3153.6 (1,2) | 6307.2 (1,2) |

(1) GitHub Codespaces stop automatically after 1h of inactivity. A codespace used by a full-time
engineer (8h/day), without taking vacation into account for the sake of simplicity, would cost 8
hours/day * 5 days/week * 52 weeks * $0.36/hour (4-core) = $748/year (see [Codespaces pricing]).
Similarly, the cost for an 8-core codespace would become $1496/year. In addition, GitHub bills $0.07
per GB of storage per month, regardless of whether the codespace is running or stopped. Pricing
valid on 2023-12-31.

(2) GitHub offers core hours and storage. For example, a Free user can use a 2-core instance for 60
hours per month for free or an 8-core instance for 15 hours. You will be notified by email when you
have used 75%, 90%, and 100% of your included quotas.
- Free users: 120 core hours/month and 15 GB month of storage
- Pro users: 180 core hours/month and 20 GB month of storage
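
The figures of footnote (2) follow from the fact that a machine with N cores consumes N core hours
per hour of uptime, so a monthly quota lasts the quota divided by the core count. A minimal sketch;
the function name `included_hours` is hypothetical and the result is rounded down to whole hours.

```shell
# Hypothetical helper: hours per month covered by the included core hours.
# A machine with N cores consumes N core hours per hour of uptime.
# Usage: included_hours CORE_HOURS_PER_MONTH MACHINE_CORES
included_hours() {
  local core_hours="$1" cores="$2"
  echo $(( core_hours / cores ))  # integer division rounds down
}

included_hours 120 2  # Free plan, 2-core machine
included_hours 120 8  # Free plan, 8-core machine
included_hours 180 4  # Pro plan, 4-core machine
```

These calls print 60, 15 and 45.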

!!! note

    Developers were asked to measure runtimes and internet speeds while keeping open the
    applications that they usually run when they develop (e.g. Spotify, several instances of VS
    Code, a browser with many tabs open). This could be one reason why the runtimes reported by one
    developer are larger than those reported by another developer who has fewer compute resources
    available.

The table below shows, for each task, how many times faster each machine completed it compared to
the slowest runtime for that task (denoted by "1.0").

| | Shirou | Rin | Sakura | m5.2xlarge | t3a.xlarge |
| ------------------------------------------------------ | ------------ | ------------ | ------------ | ------------ | ------------ |

… instance. This table illustrates well the diversity in compute resources available to the
developers, and how relying on remote hosts like EC2 instances can provide a better working
environment to developers.
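
The normalization used in the speed-up table (slowest runtime = 1.0) divides the slowest runtime of
a task by each machine's runtime for that task. The sketch below applies it to one task; the
function name `relative_speedups` and the runtimes passed to it are hypothetical, not measured
values.

```shell
# Hypothetical helper: normalize the runtimes (in seconds) of one task so that
# the slowest machine gets 1.0; each value is slowest_runtime / runtime.
# Usage: relative_speedups RUNTIME...
relative_speedups() {
  printf '%s\n' "$@" | awk '
    { t[NR] = $1 }
    NR == 1 || $1 > max { max = $1 }
    END {
      sep = ""
      for (i = 1; i <= NR; i++) {
        printf "%s%.1f", sep, max / t[i]
        sep = " "
      }
      print ""
    }'
}

relative_speedups 120 60 30  # example runtimes, not measured values
```

This prints `1.0 2.0 4.0`: the third machine completed the task four times faster than the slowest
one.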

### Collecting OS info and benchmarking tasks

Runtimes are obtained from [this
commit](https://github.com/Sage-Bionetworks/sage-monorepo/tree/25f2292388d9e71bf46ba137aa530aefb571deab).

The compute resources are identified with the following commands.

```console
$ nproc
$ cat /proc/cpuinfo
$ cat /proc/meminfo
```
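
When comparing machines, it can help to reduce the output of these commands to a few values. The
sketch below assumes a Linux host with x86-style `/proc/cpuinfo` entries; the function names are
hypothetical, and the optional file argument only exists so that the functions can be pointed at a
saved copy of these files.

```shell
# Hypothetical helpers to summarize the compute resources of a Linux host.
# /proc/meminfo reports MemTotal in kB (actually KiB), so divide twice by 1024.
# Usage: mem_total_gib [MEMINFO_FILE]
mem_total_gib() {
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Prints the first "model name" entry (present on x86 hosts).
# Usage: cpu_model [CPUINFO_FILE]
cpu_model() {
  awk -F': ' '/^model name/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}
```

On the host itself, `echo "$(nproc) CPUs, $(mem_total_gib) GiB, $(cpu_model)"` prints a one-line
summary.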

Runtimes are averaged over 10 runs that follow a warmup run using
[hyperfine](https://github.com/sharkdp/hyperfine).

```console
$ hyperfine --warmup 1 --runs 10 'nx run-many --all --target=lint --skip-nx-cache'
$ hyperfine --warmup 1 --runs 10 'nx run-many --all --target=build --skip-nx-cache'
$ hyperfine --warmup 1 --runs 10 'nx run-many --all --target=test --skip-nx-cache'
$ hyperfine --warmup 1 --runs 10 'nx test api --skip-nx-cache'
$ hyperfine --warmup 1 --runs 10 'nx test web-ui --skip-nx-cache'
```
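
To keep the raw measurements next to the averages, the three monorepo-wide benchmarks can be run in
a loop with hyperfine's `--export-json` option, which writes the statistics of each benchmark (mean,
standard deviation, individual run times) to a file. A sketch; the function name `bench_targets`
and the output file names are hypothetical.

```shell
# Hypothetical helper: run the lint, build and test benchmarks from the
# monorepo root and save hyperfine's results as hyperfine-<target>.json.
bench_targets() {
  local target
  for target in lint build test; do
    hyperfine --warmup 1 --runs 10 \
      --export-json "hyperfine-${target}.json" \
      "nx run-many --all --target=${target} --skip-nx-cache"
  done
}
```

With `jq` installed, `jq '.results[0].mean' hyperfine-lint.json` prints the mean runtime in seconds.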

Internet speeds are measured with [speedtest-cli](https://www.speedtest.net/apps/cli).

```console
$ speedtest
```

## Preparing the remote host - AWS EC2

This section describes how to instantiate an AWS EC2 instance as the remote host. The steps outlined
below assume that you have access to the Sage AWS Service Catalog.

### Creating the EC2 instance

1. Log in to the [Service Catalog](https://sc.sageit.org) with your Synapse credentials.
2. From the list of Products, select **EC2: Linux Docker**. On the Product page, click on **Launch
product** in the upper-right corner.
3. On the next page, fill out the wizard as follows:
   - **Provisioned product name**
     - Name: `{GitHub username}-devcontainers-{yyyymmdd}`
     - Example: `tschaffter-devcontainers-20240404`
   - **Parameters**
     - EC2 Instance Type: `t3a.2xlarge`
     - Base Image: `AmazonLinuxDocker` (leave default)
     - Disk Size: 80
   - **Manage tags**
     - `CostCenter`: Select the Cost Center associated with your project
   - **Enable event notifications**: SKIP - DO NOT MODIFY
4. Click on **Launch product**. Your instance will take anywhere between 3 and 5 minutes to deploy.
   You can either wait on this page until "EC2Instance" shows up on the list under **Resources**, or
   you can leave and come back at a later time.
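
The naming convention of step 3 can be generated from the terminal. A minimal sketch; `product_name`
is a hypothetical helper, and the date defaults to today in `yyyymmdd` format.

```shell
# Hypothetical helper: build "{GitHub username}-devcontainers-{yyyymmdd}".
# Usage: product_name GITHUB_USERNAME [YYYYMMDD]
product_name() {
  local user="$1" day="${2:-$(date +%Y%m%d)}"
  printf '%s-devcontainers-%s\n' "$user" "$day"
}

product_name tschaffter 20240404  # the example from step 3
```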

### Stopping the EC2 instance

You don't need to do this now as part of this tutorial. This section serves as a reminder that AWS
charges for every hour the EC2 instance is running. As soon as you know that you will no longer need
the instance for the rest of the day, open the Service Catalog to stop it.

1. Open the Service Catalog, then select **Provisioned products**.
2. Select the EC2 instance.
3. Click on the button **Actions** > **Service actions** > **Stop**.
4. Confirm the action.

After a few seconds, the EC2 instance will be stopped.

!!! note

    AWS still charges for the storage used by the EC2 instance even when the instance is stopped.
    Consider destroying the EC2 instance when you decide that you will no longer need it.

### Connecting to the EC2 instance with AWS Console

We will now use the AWS Console to open a terminal on the EC2 instance and set up your public SSH
key.

!!! note

    This section assumes that you have already created an SSH key pair (public and private keys) on
    the local machine from which you run VS Code.

1. Open the Service Catalog, then select **Provisioned products**.
2. In the section **Resources**, click on the link for "EC2Instance".
3. Select the checkbox of the newly created EC2 instance.
4. Click on the button **Actions** > **Connect**.
5. Click on **Connect**.

### Configuring the SSH public key on the EC2 instance

6. Log in as the user `ec2-user` and move to its home directory.
   ```console
   $ sudo -s
   # su ec2-user
   $ cd
   ```
7. Create the folder `~/.ssh` (if needed).
   ```console
   $ mkdir -p ~/.ssh
   $ chmod 700 ~/.ssh
   ```
8. Create the file `~/.ssh/authorized_keys` (if needed).
   ```console
   $ touch ~/.ssh/authorized_keys
   $ chmod 644 ~/.ssh/authorized_keys
   ```
9. Copy and paste your public SSH key at the end of `~/.ssh/authorized_keys`.
10. Click on the button **Terminate** to terminate the session and confirm the action.
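
Steps 7 to 9 can also be combined into one idempotent function that is safe to run again, for
example after reconnecting. This is a sketch to run as `ec2-user` on the instance; the function name
`add_authorized_key` is hypothetical, and the optional second argument (a home directory) is only
there to make it easy to try on a scratch directory.

```shell
# Hypothetical helper: create ~/.ssh and ~/.ssh/authorized_keys with the
# permissions used above, then append the public key if it is not there yet.
# Usage: add_authorized_key "PUBLIC_KEY_LINE" [HOME_DIR]
add_authorized_key() {
  local key="$1" home="${2:-$HOME}"
  local ssh_dir="$home/.ssh"
  local auth="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  touch "$auth" && chmod 644 "$auth"
  # -x matches whole lines and -F disables regex, so the key is compared literally
  grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
}
```

For example, `add_authorized_key 'ssh-ed25519 AAAA... you@laptop'`, where the quoted string is the
content of your local public key file.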

### Configuring SSH on the local machine

This section describes how to create a profile for the EC2 instance in your local `~/.ssh/config`
file.

!!! note

    This section assumes that you have already created an SSH key pair (public and private keys) on
    the local machine from which you run VS Code.

First, you need to identify the private IP address of the EC2 instance.

1. Open the Service Catalog, then select **Provisioned products**.
2. In the section **Outputs**, the private IP address is the value associated with
   "EC2InstancePrivateIpAddress".

Then, on your local machine:

1. Create the file `~/.ssh/config` (if needed).
   ```console
   $ touch ~/.ssh/config
   $ chmod 600 ~/.ssh/config
   ```
2. Add the following content to your local `~/.ssh/config`.
   ```console
   Host {alias}
     HostName {private ip}
     User ec2-user
     IdentityFile {path to your private SSH key, e.g. ~/.ssh/id_rsa}
   ```
   where the placeholder values `{...}` should be replaced with the correct values.
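
The two steps above can be scripted so that the entry is only added once. A sketch under these
assumptions: the function name `add_ssh_host` is hypothetical, an existing alias is detected by an
exact `Host {alias}` line, and the IP address in the example is a placeholder for the value of
"EC2InstancePrivateIpAddress".

```shell
# Hypothetical helper: append a Host entry for the EC2 instance to an SSH
# config file, unless a "Host <alias>" line already exists.
# Usage: add_ssh_host ALIAS PRIVATE_IP IDENTITY_FILE [CONFIG_FILE]
add_ssh_host() {
  local name="$1" ip="$2" key="$3" config="${4:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$config")"
  touch "$config" && chmod 600 "$config"
  if grep -qxF "Host ${name}" "$config"; then
    echo "Host ${name} is already defined in ${config}" >&2
    return 1
  fi
  cat >> "$config" <<EOF
Host ${name}
  HostName ${ip}
  User ec2-user
  IdentityFile ${key}
EOF
}

# Example (placeholder IP address):
# add_ssh_host devcontainers 10.0.0.12 ~/.ssh/id_rsa
```

Afterwards, `ssh -G devcontainers` prints the configuration that ssh resolves for the alias, which
is a quick way to check the entry before connecting.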

### Connecting to the EC2 instance with VS Code

1. Open VS Code.
2. Install the VS Code extension pack "Remote Development".
3. Open the command palette with `Ctrl+Shift+P`.
4. Run `Remote-SSH: Connect to Host...` and select the host.
5. Answer the prompts.

You are now connected to the EC2 instance! 🚀
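
The same connection can also be opened from a local terminal with the `code` command-line
interface, which accepts a `vscode-remote://ssh-remote+{alias}{path}` folder URI through its
`--folder-uri` option. A sketch, assuming `code` is on your `PATH`; the helper name
`remote_folder_uri` and the remote folder path are hypothetical.

```shell
# Hypothetical helper: build the folder URI that VS Code's Remote - SSH
# extension uses for a folder on an SSH host (alias from ~/.ssh/config).
# Usage: remote_folder_uri ALIAS [ABSOLUTE_PATH]
remote_folder_uri() {
  local host="$1" path="${2:-/home/ec2-user}"
  printf 'vscode-remote://ssh-remote+%s%s\n' "$host" "$path"
}

# Install the "Remote Development" extension pack, then open the folder:
# code --install-extension ms-vscode-remote.vscode-remote-extensionpack
# code --folder-uri "$(remote_folder_uri devcontainers /home/ec2-user)"
```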

!!! tip

    Please remember to stop the EC2 instance at the end of your working day to save on costs.

### Next

Go to the section XXX for instructions on how to set up your environment to contribute to Sage
Monorepo.

## Preparing the remote host - GitHub Codespace

This section describes how to open your fork of Sage Monorepo in a GitHub Codespaces instance.

!!! note

    In practice, we prefer to develop in an EC2 instance created from the Service Catalog for
    security and budget reasons (see the instructions above). A GitHub Codespace has proven
    occasionally useful for quick tests that require a fresh environment, as one of the benefits of
    Codespaces is that they can be created and destroyed faster than EC2 instances.

1. Open your browser and go to [GitHub Codespaces].
2. Click on **New codespace**.
3. Enter the information requested:
   - **Repository**: Select your fork of the monorepo
   - **Branch**: Select the default branch
   - **Dev container configuration**: Select the dev container definition
   - **Region**: Select your preferred region
   - **Machine type**: Select the machine type
4. Click on **Create codespace**.
5. Wait for the codespace to be created.
6. Configure the monorepo and install its dependencies (see README).
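
Steps 1 to 4 can alternatively be performed with the GitHub CLI (`gh codespace create`). A sketch,
assuming `gh` is installed and authenticated; the helper name `create_codespace` is hypothetical,
and the machine type must be one of the names offered to your account (when `--machine` is omitted,
`gh` typically asks you to pick one interactively).

```shell
# Hypothetical helper: create a codespace for your fork from the terminal.
# Usage: create_codespace OWNER/REPO BRANCH MACHINE_TYPE [DEVCONTAINER_PATH]
create_codespace() {
  local repo="$1" branch="$2" machine="$3" devcontainer="${4:-}"
  local args=(codespace create --repo "$repo" --branch "$branch" --machine "$machine")
  # Only pass a dev container definition when one is given
  if [ -n "$devcontainer" ]; then
    args+=(--devcontainer-path "$devcontainer")
  fi
  gh "${args[@]}"
}
```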