
Fix deploy #808

Merged — 6 commits merged into develop on Oct 24, 2019

Conversation

@kshepard (Member) commented Oct 22, 2019

Overview

There were several problems with running a deployment, which have been fixed here:

  • Need to restart PostgreSQL at the beginning of the DB tasks, since it may be in a bad state if a previous provision attempt failed
  • The certbot PPA is no longer available for this version of Ubuntu, so it needed to be installed manually
  • The production.j2 template had a couple of problems: the CIDR ranges for single IPs were invalid, and the js_html5mode variables weren't set up correctly
  • Migrations weren't being run in production, so the database wasn't being initialized properly
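A restart task like the one described in the first bullet might look roughly like this in Ansible (a sketch only; the exact service name and playbook placement are assumptions, not code from this PR):

```yaml
# Hypothetical sketch: run at the start of the DB tasks so that a
# PostgreSQL instance left in a bad state by a failed provision
# attempt is brought back to a known-good state first.
- name: Restart PostgreSQL before running database tasks
  service:
    name: postgresql
    state: restarted
  become: true
```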

Checklist

  • PR has a descriptive enough title to be useful in changelogs

Demo

(demo GIF: driver-deploy-fix)

Testing Instructions

Unfortunately, the only real way to test this is to do a full deploy from scratch.

  • Clone the repo into a brand new directory and checkout this branch
  • Run: touch gradle/data/driver.keystore
  • Go to the DRIVER AWS account CloudFormation page, and create a new stack from the template in: deployment/demo-cfn-template.yaml
  • Once the resources are created, run ./scripts/generate_deployment_config and plug in the values of the public/private IP addresses for the three instances
  • Create a Route53 A record that points to the public IP of the web machine, using the domain name you selected in the previous step
  • Open the production group_vars file in an editor and change the following (note -- some of these are probably not strictly necessary, but it's what I did, so better safe than sorry):
    • Set app_version and docker_image_tag to "2.0.4"
    • Uncomment the web_js_nominatim_key line and add a working value
    • Uncomment the monit_allow_password line and add a working value
    • Add a new language: - { id: 'fr', label: 'Français', rtl: false } (this was added in the 2.0.4 tag, so good to verify)
    • Add a valid forecast_io_api_key
    • Add a valid google_analytics_id
  • Use ssh-add to add the relevant PEM key
  • Run: ansible-galaxy install -r deployment/ansible/roles.yml
  • Run: ansible-playbook -i deployment/ansible/inventory/production --user=ubuntu deployment/ansible/database.yml deployment/ansible/app.yml deployment/ansible/celery.yml
    • Note: you may want to connect to each machine via SSH first. I've noticed Ansible isn't great at prompting you to accept host keys from multiple machines at once.
  • It should run to completion with no errors 🎉
  • Go to https://<your_subdomain>.roadsafety.io, and verify that you can log in
  • Delete your CloudFormation stack when you're satisfied

Closes #807
Closes #786
Closes #781

 * The single-ip CIDR values were invalid: needed to add `/32`
 * The `js_html5mode*` values weren't being used properly to populate
   the `web*` equivalents, and they ended up being blank.
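To see why the bare single-IP form was invalid, Python's standard `ipaddress` module can illustrate the normalization. `to_cidr` below is a hypothetical helper for illustration, not code from this PR:

```python
import ipaddress


def to_cidr(ip: str) -> str:
    """Normalize a bare single IP to explicit CIDR notation (/32 for IPv4).

    Some consumers of a rendered template (e.g. pg_hba.conf-style access
    rules) require an explicit netmask, so '10.0.0.5' must be written as
    '10.0.0.5/32'.
    """
    # ip_network() on a bare address yields a one-address network.
    return str(ipaddress.ip_network(ip))


print(to_cidr("10.0.0.5"))     # -> 10.0.0.5/32
print(to_cidr("10.0.0.5/32"))  # idempotent: already explicit
```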
@ddohler (Contributor) left a comment


I haven't had a chance to actually test this and may not get to it today, but all of the changes here look like they fix the relevant problems. I'm still planning to test this tomorrow, but if time becomes short I'm comfortable merging as-is.

@@ -51,6 +51,5 @@
 - name: Run Django migrations
   command: >
     /usr/bin/docker exec -i driver-app ./manage.py migrate
-  when: developing or staging
Contributor

I think this makes sense; the potential risk is that for more complex migrations, we might not want migrations to run automatically upon updating the app. However, based on the current usage patterns, I think it's probably better to assume that users may not know how to run migrations manually, so I think this is a good change. We'll need to account for this when writing future migrations, to ensure that they can all be run through a simple ./manage.py migrate.

@kshepard (Member Author) commented Oct 23, 2019

Yeah, the main problem here is that without running the migrations, a fresh deploy will fail when attempting to add Windshaft access roles (since the tables referenced by the role don't exist). So at a minimum, we'd need to ensure migrations are run automatically under at least some condition (the absence of those tables?). But regardless, I think running them automatically here is what most users will want in the general case. Definitely need to keep it in mind for future migrations though.
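The conditional approach floated here could be approximated with a pair of Ansible tasks along these lines (a hypothetical sketch, not part of this PR; `migrate --check` requires a Django version that supports it, and the `driver-app` container name follows the usage elsewhere in the playbook):

```yaml
# Hypothetical: skip the migrate step when there is nothing to apply.
# 'migrate --check' exits non-zero if unapplied migrations exist
# (available in newer Django releases).
- name: Check for unapplied migrations
  command: >
    /usr/bin/docker exec -i driver-app ./manage.py migrate --check
  register: migrate_check
  failed_when: false

- name: Run Django migrations
  command: >
    /usr/bin/docker exec -i driver-app ./manage.py migrate
  when: migrate_check.rc != 0
```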

  get_url: >
    url=https://dl.eff.org/certbot-auto
    dest=/usr/local/bin/certbot
    mode=0755
Contributor

This should be able to be changed to:

get_url:
  url: https://dl.eff.org/certbot-auto
  dest: /usr/local/bin/certbot
  mode: 0755

Contributor

Some other things I thought of here (no changes required, necessarily):

  • I noticed that this script is auto-self-updating. There is a --no-self-upgrade flag we could use to prevent that, but there doesn't seem to be any way to pin to a specific version. If we allow it to self-upgrade, that could potentially cause problems with existing instances when it tries to renew certificates if there's some kind of breaking change. On the other hand, if we disable self-upgrading, it'll still update to the latest version of certbot itself, so that doesn't seem much better, and there could still be cross-instance variation due to instances being deployed at different times. In summary, using a PPA seems preferable, so hopefully we'll have a chance to do an Ubuntu upgrade soon, but none of the options here seems clearly better.
  • This looks like it checks for and installs dependencies upon execution, so it might be worth adding another step here to run it with --os-packages-only and --install-only in order to keep any errors from happening during the Use CertBot to obtain certificate stage, which could be confusing. Not a big deal though, if anything is going to happen it'll come soon after this. I don't necessarily like the sound of it trying to self-update every time we renew a certificate, so that's another point in favor of getting back to a PPA.
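The pre-install idea in the second bullet could be sketched as an extra Ansible task (hypothetical; the flag combination is untested here, and it may be cleaner as two invocations):

```yaml
# Hypothetical: install certbot's OS packages and virtualenv up front,
# and keep it from upgrading itself during later runs, so errors surface
# here rather than during the certificate-obtaining stage.
- name: Pre-install certbot dependencies
  command: >
    /usr/local/bin/certbot --os-packages-only --install-only
    --no-self-upgrade --non-interactive
```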

Member Author

Good call on the get_url syntax; I copied it over from another project that was using the older syntax. Updated.

And I agree that the auto-self-updating isn't the most ideal, and I'd also highly prefer switching back to the PPA when we're able to. Since it takes a couple hours to test this out, in the interest of saving time, would you be able to test out adding some of those flags when you're running through the instance setup, and push a commit if it works out?

Contributor

Will do!

@ddohler (Contributor) left a comment

Tested, everything works great! I added a commit to explicitly install certbot and its dependencies.

One thing I noticed this morning is that this PR is targeted at master, but it should probably be targeted at develop in order to follow our standard deployment process.

@kshepard kshepard changed the base branch from master to develop October 24, 2019 13:14
@kshepard (Member Author)

Excellent. Thanks for testing it out and making that change.

@kshepard kshepard merged commit 0c918ac into develop Oct 24, 2019
@kshepard kshepard deleted the feature/fix-deploy branch October 24, 2019 13:25