
Fix deploy #808

Merged — 6 commits merged into develop on Oct 24, 2019

Conversation

@kshepard (Member) commented Oct 22, 2019

Overview

There were several problems with running a deployment, which have been fixed here:

  • Need to restart PostgreSQL at the beginning of the DB tasks, since it may be in a bad state if a previous provision attempt failed
  • The certbot PPA is no longer available for this version of Ubuntu, so it needed to be installed manually
  • The production.j2 template had a couple of problems: the CIDR ranges for single IPs were invalid, and the js_html5mode variables weren't set up correctly
  • Migrations weren't being run in production, so the database wasn't being initialized properly
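A restart task like the one described in the first bullet might look roughly like this in Ansible (a sketch only; the exact service name and playbook placement are assumptions, not code from this PR):

```yaml
# Hypothetical sketch: run at the start of the DB tasks so that a
# PostgreSQL instance left in a bad state by a failed provision
# attempt is brought back to a known-good state first.
- name: Restart PostgreSQL before running database tasks
  service:
    name: postgresql
    state: restarted
  become: true
```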

Checklist

  • PR has a descriptive enough title to be useful in changelogs

Demo

(demo GIF: driver-deploy-fix)

Testing Instructions

Unfortunately, the only real way to test this is to do a full deploy from scratch.

  • Clone the repo into a brand new directory and checkout this branch
  • Run: touch gradle/data/driver.keystore
  • Go to the DRIVER AWS account CloudFormation page, and create a new stack from the template in: deployment/demo-cfn-template.yaml
  • Once the resources are created, run ./scripts/generate_deployment_config and plug in the values of the public/private IP addresses for the three instances
  • Create a Route53 A record that points to the public IP of the web machine, using the domain name you selected in the previous step
  • Open the production group_vars file in an editor and change the following (note -- some of these are probably not strictly necessary, but it's what I did, so better safe than sorry):
    • Set app_version and docker_image_tag to "2.0.4"
    • Uncomment the web_js_nominatim_key line and add a working value
    • Uncomment the monit_allow_password line and add a working value
    • Add a new language: - { id: 'fr', label: 'Français', rtl: false } (this was added in the 2.0.4 tag, so good to verify)
    • Add a valid forecast_io_api_key
    • Add a valid google_analytics_id
  • Use ssh-add to add the relevant PEM key
  • Run: ansible-galaxy install -r deployment/ansible/roles.yml
  • Run: ansible-playbook -i deployment/ansible/inventory/production --user=ubuntu deployment/ansible/database.yml deployment/ansible/app.yml deployment/ansible/celery.yml
    • Note: you may want to connect to each machine via SSH first. I've noticed Ansible isn't great at prompting you to accept host keys from multiple machines at once.
  • It should run to completion with no errors 🎉
  • Go to https://<your_subdomain>.roadsafety.io, and verify that you can log in
  • Delete your CloudFormation stack when you're satisfied

Closes #807
Closes #786
Closes #781

 * The single-ip CIDR values were invalid: needed to add `/32`
 * The `js_html5mode*` values weren't being used properly to populate
   the `web*` equivalents, and they ended up being blank.
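To see why the bare single-IP form was invalid, Python's standard `ipaddress` module can illustrate the normalization. `to_cidr` below is a hypothetical helper for illustration, not code from this PR:

```python
import ipaddress


def to_cidr(ip: str) -> str:
    """Normalize a bare single IP to explicit CIDR notation (/32 for IPv4).

    Some consumers of a rendered template (e.g. pg_hba.conf-style access
    rules) require an explicit netmask, so '10.0.0.5' must be written as
    '10.0.0.5/32'.
    """
    # ip_network() on a bare address yields a one-address network.
    return str(ipaddress.ip_network(ip))


print(to_cidr("10.0.0.5"))     # -> 10.0.0.5/32
print(to_cidr("10.0.0.5/32"))  # idempotent: already explicit
```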
@ddohler (Contributor) left a comment


I haven't had a chance to actually test this and may not get to it today, but all of the changes here look like they fix the relevant problems. I'm still planning to test this tomorrow, but if time becomes short I'm comfortable merging as-is.

@@ -51,6 +51,5 @@
 - name: Run Django migrations
   command: >
     /usr/bin/docker exec -i driver-app ./manage.py migrate
-  when: developing or staging
Contributor

I think this makes sense; the potential risk is that for more complex migrations, we might not want migrations to run automatically upon updating the app. However, based on the current usage patterns, I think it's probably better to assume that users may not know how to run migrations manually, so I think this is a good change. We'll need to account for this when writing future migrations, to ensure that they can all be run through a simple ./manage.py migrate.

@kshepard (Member Author) commented Oct 23, 2019

Yeah, the main problem here is that without running the migrations, a fresh deploy will fail when attempting to add Windshaft access roles (since the tables referenced by the role don't exist). So at a minimum, we'd need to ensure migrations are run automatically under at least some condition (the absence of those tables?). But regardless, I think running them automatically here is what most users will want in the general case. Definitely need to keep it in mind for future migrations though.
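The conditional approach floated here could be approximated with a pair of Ansible tasks along these lines (a hypothetical sketch, not part of this PR; `migrate --check` requires a Django version that supports it, and the `driver-app` container name follows the usage elsewhere in the playbook):

```yaml
# Hypothetical: skip the migrate step when there is nothing to apply.
# 'migrate --check' exits non-zero if unapplied migrations exist
# (available in newer Django releases).
- name: Check for unapplied migrations
  command: >
    /usr/bin/docker exec -i driver-app ./manage.py migrate --check
  register: migrate_check
  failed_when: false

- name: Run Django migrations
  command: >
    /usr/bin/docker exec -i driver-app ./manage.py migrate
  when: migrate_check.rc != 0
```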

  get_url: >
    url=https://dl.eff.org/certbot-auto
    dest=/usr/local/bin/certbot
    mode=0755
Contributor

This should be able to be changed to:

get_url:
  url: https://dl.eff.org/certbot-auto
  dest: /usr/local/bin/certbot
  mode: 0755

Contributor

Some other things I thought of here (no changes required, necessarily):

  • I noticed that this script is auto-self-updating. There is a --no-self-upgrade flag we could use to prevent that, but there doesn't seem to be any way to pin to a specific version. If we allow it to self-upgrade, that could potentially cause problems with existing instances when it tries to renew certificates if there's some kind of breaking change. On the other hand, if we disable self-upgrading, it'll still update to the latest version of certbot itself, so that doesn't seem much better, and there could still be cross-instance variation due to instances being deployed at different times. In summary, using a PPA seems preferable, so hopefully we'll have a chance to do an Ubuntu upgrade soon, but none of the options here seems clearly better.
  • This looks like it checks for and installs dependencies upon execution, so it might be worth adding another step here to run it with --os-packages-only and --install-only in order to keep any errors from happening during the Use CertBot to obtain certificate stage, which could be confusing. Not a big deal though, if anything is going to happen it'll come soon after this. I don't necessarily like the sound of it trying to self-update every time we renew a certificate, so that's another point in favor of getting back to a PPA.
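The pre-install idea in the second bullet could be sketched as an extra Ansible task (hypothetical; the flag combination is untested here, and it may be cleaner as two invocations):

```yaml
# Hypothetical: install certbot's OS packages and virtualenv up front,
# and keep it from upgrading itself during later runs, so errors surface
# here rather than during the certificate-obtaining stage.
- name: Pre-install certbot dependencies
  command: >
    /usr/local/bin/certbot --os-packages-only --install-only
    --no-self-upgrade --non-interactive
```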

Member Author

Good call on the get_url syntax; I copied it over from another project that was using the older syntax. Updated.

And I agree that the auto-self-updating isn't the most ideal, and I'd also highly prefer switching back to the PPA when we're able to. Since it takes a couple hours to test this out, in the interest of saving time, would you be able to test out adding some of those flags when you're running through the instance setup, and push a commit if it works out?

Contributor

Will do!

@ddohler (Contributor) left a comment

Tested, everything works great! I added a commit to explicitly install certbot and its dependencies.

One thing I noticed this morning is that this PR is targeted at master, but it should probably be targeted at develop in order to follow our standard deployment process.

@kshepard kshepard changed the base branch from master to develop October 24, 2019 13:14
@kshepard (Member Author)

Excellent. Thanks for testing it out and making that change.

@kshepard kshepard merged commit 0c918ac into develop Oct 24, 2019
@kshepard kshepard deleted the feature/fix-deploy branch October 24, 2019 13:25