Cloud (AWS) management and deployment for Argos.
It provisions and configures the infrastructure and handles deployment of application updates.
```
$ virtualenv-2.7 ~/envs/argos.cloud --no-site-packages
$ source ~/envs/argos.cloud/bin/activate
$ pip install -r requirements.txt
```
You will need to supply a few configuration files and keys:
**Required**
- The AWS key you use to authenticate on EC2 goes into `keys/`. It must have the same name as the `key_name` you specify in `playbooks/group_vars/all.yml`. For example, if `key_name=foobar`, you must have your key at `keys/foobar.pem`. Ansible needs this to access your instances and configure them.
- Configure `playbooks/group_vars/all.yml` to your needs.
- Configure `playbooks/roles/app/templates/app_config.py` to your needs (this contains your application settings).
- Configure `playbooks/roles/app/templates/celery_config.py` to your needs (this contains your Celery settings).
- If you don't already have one, create a Boto config file at `~/.boto` with your AWS access credentials, e.g.:

  ```
  [Credentials]
  aws_access_key_id = YOURACCESSKEY
  aws_secret_access_key = YOURSECRETKEY
  ```
**Optional**
- Configure other playbook files (`playbooks/roles/*/files/`).
- Configure provisioning variables in `playbooks/roles/*/vars/` as needed.
You might also want to add this to your SSH config to help prevent timeouts/broken pipes for especially long-running tasks:
```
# ~/.ssh/config
ServerAliveInterval 120
TCPKeepAlive no
```
The most important functionality is accessed via the `manage.py` script.
For example:
```
# Activate the virtualenv
$ source ~/envs/argos.cloud/bin/activate

# Commission new staging environment application infrastructure
$ python manage.py staging commission

# Deploy new changes to it
$ python manage.py staging deploy

# Update the infrastructure
$ python manage.py staging update

# Decommission it (i.e. dismantle the infrastructure)
$ python manage.py staging decommission

# Clean it (i.e. delete the image and its instance)
$ python manage.py staging clean

# For other options, see
$ python manage.py -h
```
You can test that the provisioning works with the testing script:
```
# ./test <role>
# Example:
$ ./test app
```
This sets up a Vagrant VM (using a base Ubuntu 13.10 image) and then provisions it with the Ansible playbook for the specified role.
If the needs of the Argos application change, for the most part you won't need to modify anything in the `cloud/` directory (unless more comprehensive infrastructural changes are necessary). If it's a matter of a few additional packages, for example, you should only need to modify the playbooks in `playbooks/` or the global configuration options in `playbooks/group_vars/all.yml`.
If you need to change infrastructural resources, look in `formations/`. Note that all the formation templates except for `formations/image.json` are merged into a single mega-template, so take care not to use conflicting resource logical ids (i.e. don't call a resource `Ec2Instance` in multiple templates; better to be specific, such as `KnowledgeEc2Instance` and `BrokerEc2Instance`). The upside to this is that you can refer to variables/resources/parameters/etc. in the other templates.
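For illustration, the merging might look something like the following sketch (the actual logic lives in the `cloud` module; the file layout here is an assumption):

```python
# Illustrative sketch only: merge all formations/*.json templates (except
# image.json) into one CloudFormation template. Conflicting logical ids
# would silently overwrite one another, hence the naming advice above.
import glob
import json

def merge_formations(formations_dir='formations'):
    merged = {'AWSTemplateFormatVersion': '2010-09-09',
              'Parameters': {}, 'Resources': {}, 'Outputs': {}}
    for path in glob.glob('{0}/*.json'.format(formations_dir)):
        if path.endswith('image.json'):
            continue  # the image template is kept separate
        with open(path) as f:
            template = json.load(f)
        for section in ('Parameters', 'Resources', 'Outputs'):
            merged[section].update(template.get(section, {}))
    return merged
```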
The application infrastructure runs on Amazon Web Services.
It consists of:
- a database server (AWS RDS)
- application servers (eventually in an autoscaling group behind a load balancer; during prototyping, just a single instance. See below.)
- a knowledge server (running Apache Fuseki and Stanford NER)
- a broker server (manages distributed tasks) [currently disabled]
- worker servers (in an autoscaling group) [currently disabled]
Infrastructure creation is handled by AWS's CloudFormation in the `cloud` module. Server configuration is handled by Ansible in `playbooks/`.
At a high level, the commissioning process works like so:
First, an image is baked, if one by the specified name (`<app>-image`) doesn't already exist. The image is environment- and configuration-agnostic: it doesn't store any sensitive information, it can be reused across environments, and you don't need to worry about it if configuration options change. This is probably the longest-running step.
In greater detail:
An image instance is created, which is used to generate a base image for the application and worker instances. For this image, all necessary application packages are installed, but no configuration files are copied over. The instance is automatically deleted after the image has been created from it. Images are environment-agnostic, that is, they don't load any environment-specific information. Thus this image can be reused for different environments' infrastructures.
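As a rough illustration of the "bake only if missing" check, here is a sketch using boto (the region and the app name `argos` are assumptions; the real logic lives in the `cloud` module):

```python
# Sketch: look for an existing AMI named "<app>-image" before baking a new one.
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')
images = conn.get_all_images(owners=['self'], filters={'name': 'argos-image'})
if images:
    image_id = images[0].id   # reuse the existing image
else:
    image_id = None           # no image yet: bake one (see formations/image.json)
```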
Then the infrastructure is created. That is, the instances, RDS instances, autoscaling groups, etc are all made. Where appropriate, the baked image is used.
As a courtesy the commission function will also try to deploy to this infrastructure (see below).
In greater detail:
- The database (RDS) is created.
- The broker instance is created [currently disabled].
- An application instance is created, using the baked image as the base (instead of an autoscaling group of application instances; see below).
- An autoscaling group is created for worker instances, using the baked image as the base [currently disabled].
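Roughly, the CloudFormation side of this amounts to launching the merged template as a stack, as in the following boto sketch (the stack name, region, file name, and parameters are illustrative assumptions; `manage.py` and the `cloud` module drive the real flow):

```python
# Sketch: launch the merged formations template as a CloudFormation stack.
import boto.cloudformation

conn = boto.cloudformation.connect_to_region('us-east-1')
with open('merged_template.json') as f:   # hypothetical output of the template merge
    template_body = f.read()
conn.create_stack('argos-staging',
                  template_body=template_body,
                  parameters=[('KeyName', 'foobar')])
```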
Once the infrastructure is up and running it is likely that your future interactions with it will primarily be deployment.
For instance, if you need to update some configuration values or ensure the application is synced with the latest changes in git.
You can make your changes to the config or to the app's git repo and then just run the deploy function. This will run the playbooks for the servers to update them.
Note that the deployment with Ansible is able to locate and identify all hosts by standardized tags (set during the commissioning process) thanks to the EC2 dynamic inventory script. Thus you don't need to, for instance, explicitly set the database host for the application config; the playbooks are set up so that Ansible will find the hostname itself and fill it in.
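The lookup the dynamic inventory performs is roughly equivalent to this boto sketch (region and tag values are assumptions for illustration):

```python
# Sketch: find the knowledge server for the staging environment by its tags,
# which is essentially what the EC2 dynamic inventory script does for Ansible.
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')
reservations = conn.get_all_instances(filters={'tag:App': 'argos',
                                               'tag:Env': 'staging',
                                               'tag:Group': 'knowledge'})
hosts = [i.public_dns_name for r in reservations
         for i in r.instances if i.state == 'running']
```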
This just deletes everything that was created during the commissioning process, except for the image (unless its deletion is specified).
If you hit a snag while generating the image, you may want to clean things up. This command will do that for you by removing the image instance (if it still exists) and the image itself (if it still exists).
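In boto terms, cleaning amounts to something like the following sketch (region and image name are assumptions; the real implementation lives in the `cloud` module):

```python
# Sketch: terminate the leftover image instance and deregister the baked AMI.
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')

# Terminate the image instance, if any (located by its Name tag).
reservations = conn.get_all_instances(filters={'tag:Name': 'argos-image'})
instance_ids = [i.id for r in reservations for i in r.instances
                if i.state != 'terminated']
if instance_ids:
    conn.terminate_instances(instance_ids=instance_ids)

# Deregister the baked image, if it exists.
for image in conn.get_all_images(owners=['self'], filters={'name': 'argos-image'}):
    conn.deregister_image(image.id)
```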
For the `cloud` module and the playbooks to coordinate, standardized names and tags are used. The `cloud` module sets up the infrastructure resources with these tags, and then Ansible can use these tags to locate the resources and get their hostnames/IPs.
Every resource is tagged with the following:
- Name: `<app>-<env>-<group>`
- Group: `<group>`
- Env: `<env>`
- App: `<app>`

With the exception of the image instance, which is tagged as just `Name: <app>-image`.
The application image is environment-agnostic (no environment-specific data is baked into it) so it can be reused across environments.
To further ensure coordination between the playbooks and the `cloud` module, the `cloud` module uses configuration values from the playbooks' "global" variables in `playbooks/group_vars/all.yml`.
- `<app>` is set as `app_name` in the config (`playbooks/group_vars/all.yml`).
- `<group>` is typically set by default in the CloudFormation templates (see `formations/`).
- `<env>` is passed in when commissioning new infrastructure (see `manage.py`).

Then the `Name` is generated by combining these values (as shown above) in the CloudFormation templates (`formations/`).
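For reference, the Name tag in a template typically comes together via an `Fn::Join`, roughly like this (shown as a Python dict mirroring the CloudFormation JSON; the parameter names and group value here are assumptions):

```python
# Sketch of a Tags entry in a formations/ template: Name = <app>-<env>-<group>.
name_tag = {
    'Key': 'Name',
    'Value': {'Fn::Join': ['-', [{'Ref': 'AppName'},   # <app>, from app_name
                                 {'Ref': 'EnvName'},   # <env>, passed in at commission
                                 'app']]}              # <group>, set in the template
}
```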
For now, application autoscaling groups are on hold. Instead, a single application instance is used (it could be a high-powered instance). For prototyping this is OK. It will be one of the things to revisit if the infrastructure needs to scale (other things include setting up the app's clustering to be offloaded to workers, adding a worker autoscaling group, adding a broker instance, and so on).

Having just a single instance means everything can be configured with Ansible's default push mode, instead of figuring out the best way to use its pull mode (which would make it so that autoscaling instances configure themselves as they are spun up). I was having a lot of trouble coming up with a satisfying design for this autoscaling-plus-pull-mode system. I was considering creating a master instance (as a git server) as part of the infrastructure, itself configured via Ansible, such that configs are copied to it and it generates git repositories for the autoscaling roles (app, worker). Then the init script for the autoscaling groups would just involve an `ansible-pull` command against the proper git repository. But this seems overly complicated and might not work the way I'm thinking it might.

So for now a single app instance is all that's needed, and it is much more easily managed.
Eventually the heavier processing should be offloaded to workers (in an autoscaling group) managed by a broker instance. Particularly taxing processes such as clustering and summarization could potentially be handled by workers.