This repository contains an Ansible playbook for launching JupyterHub for ACCY 570: Data Analytics Foundations for Accountancy and ACCY 571: Statistical Analyses for Accountancy classes at the University of Illinois.
See also the INFO490 setup.
The setup is inspired by the compmodels class but there are some major differences:
- Shibboleth authentication: JupyterHub runs behind Shibboleth (via Apache).
- Consul: Consul serves as the back-end discovery service for the Swarm cluster.
- Instead of creating users on the host system and using the systemuser Docker image, we change the ownership of the files on the host system to the jupyter user and mount the appropriate directory onto the singleuser Docker image.
- CentOS, instead of Ubuntu.
When a user accesses the server, the following happens behind the scenes:
- First, they go to the main URL for the server.
- This URL actually points to an Apache proxy server, which authenticates the TLS connection and proxies the connection to Shibboleth.
- After students are authenticated by Shibboleth, they are redirected to the JupyterHub instance running on the hub server.
- The hub server is both an NFS server (to serve users' home directories) and the JupyterHub server. JupyterHub runs in a docker container called jupyterhub.
- When they access their server, JupyterHub creates a new docker container on one of the node servers running an IPython notebook server. This docker container is called "jupyter-username", where "username" is the user's username.
- As users open IPython notebooks and run them, they are actually communicating with one of the node servers. The URL still appears the same, because the connection is first being proxied to the hub server via the proxy server, and then proxied a second time to the node server via the JupyterHub proxy.
- Users have access to their home directory, because each node server is also an NFS client with the filesystem mounted at /home.
Any management system benefits from being run near the machines being managed. If you are running Ansible in a cloud, consider running it from a machine inside that cloud. In most cases this will work better than on the open Internet.
$ git clone https://github.com/edwardjkim/jupyterhub-accounting
$ cd jupyterhub-accounting
$ sudo yum update
$ sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/$releasever/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
$ sudo yum install docker-engine
$ sudo service docker start
On CentOS 7.x,
$ sudo yum install python python-devel gcc openssl-devel
$ sudo yum install epel-release
$ sudo yum install python-pip
$ sudo pip install paramiko PyYAML Jinja2 httplib2 six pycrypto
$ git clone https://github.com/ansible/ansible.git --recursive
$ cd ./ansible
$ source ./hacking/env-setup
If you log out of a session, you have to run source ./hacking/env-setup again when you log back in, so you might want to add cd ./ansible && source ./hacking/env-setup to .bashrc.
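The .bashrc suggestion above can be made repeatable. The following is a minimal sketch and not part of this repository: the function name add_env_setup and the paths passed to it are assumptions. It writes an absolute path, so the shell does not have to cd into the checkout at every login, and it appends the line only when it is not already present.

```shell
# Hypothetical helper: add the env-setup line to a shell startup file once.
# Usage: add_env_setup STARTUP_FILE ANSIBLE_CHECKOUT_DIR
add_env_setup() {
    rc_file="$1"        # startup file to modify, e.g. "$HOME/.bashrc"
    ansible_dir="$2"    # absolute path to the Ansible checkout (assumption)
    line="source \"$ansible_dir/hacking/env-setup\""
    touch "$rc_file"
    # -F matches the text literally, -x requires a whole-line match,
    # so running the helper twice does not duplicate the line.
    if ! grep -qxF "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}
```

For example, if Ansible was cloned into your home directory, you could run add_env_setup "$HOME/.bashrc" "$HOME/ansible" once after the clone.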
Use the example YAML files to change your server configurations.
$ cp inventory.example inventory
$ vim inventory
$ cp users.yml.example users.yml
$ vim users.yml
$ cp vars.yml.example vars.yml
$ vim vars.yml
You will need to generate three key and certificate pairs: one for the web server, one for Shibboleth, and one for the Docker sockets.
Get signed SSL certificates from a certificate authority and edit host_vars.
$ cp host_vars/example host_vars/proxy_server
$ vim host_vars/proxy_server
You can also generate and use self-signed certificates, but self-signed certificates may not work with some web browsers (e.g. Safari).
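If you go the self-signed route for testing, one possible way to produce the pair is with the openssl command line tool. This is a sketch under assumptions: the helper name make_self_signed, the output file names, the 2048-bit RSA key, and the 365-day lifetime are illustrative choices, not something this repository prescribes.

```shell
# Hypothetical helper: create a self-signed key and certificate for testing.
# Usage: make_self_signed HOSTNAME OUTPUT_DIR
make_self_signed() {
    host="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    # -x509 emits a certificate instead of a signing request,
    # -nodes leaves the private key unencrypted,
    # -subj sets the subject without prompting.
    openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
        -subj "/CN=$host" \
        -keyout "$out_dir/$host-key.pem" \
        -out "$out_dir/$host-cert.pem"
    # Keep the private key readable by its owner only.
    chmod 600 "$out_dir/$host-key.pem"
}
```

Running make_self_signed your.host.name certificates would write your.host.name-key.pem and your.host.name-cert.pem into the certificates directory. Their contents can then be used wherever the CA-signed certificate and key would otherwise go.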
Generate a key and certificate to be used by the Shibboleth service provider (SP). Note that this is different from the web server certificate.
In the command below, we use the keygen.sh script provided by Shibboleth. your.host.name is the hostname you chose for your entityID. The command creates a key and certificate pair, sp-key.pem and sp-cert.pem.
See Setting up Shibboleth for U of I.
$ ./script/keygen.sh -o certificates -h your.host.name -e https://your.host.name/shibboleth -y 10
Use the SP's certificate sp-cert.pem to register with iTrust.
You'll need to generate SSL/TLS certificates for the hub server and node servers. To do this, you can use the keymaster docker container. First, set up the certificates directory, password, and certificate authority:
$ mkdir certificates
$ touch certificates/password
$ chmod 600 certificates/password
$ cat /dev/urandom | head -c 128 | base64 > certificates/password
$ KEYMASTER="sudo docker run --rm -v $(pwd)/certificates/:/certificates/ cloudpipe/keymaster"
$ ${KEYMASTER} ca
Then, to generate a keypair for a server:
$ ${KEYMASTER} signed-keypair -n server1 -h server1.website.com -p both -s IP:192.168.0.1
For example, if you have the following in inventory:
jupyterhub_host ansible_user=root ansible_host=123.456.78.90 private_ip=123.456.78.90
run
$ ${KEYMASTER} signed-keypair -n jupyterhub_host -h 123.456.78.90 -p both -s IP:123.456.78.90
This generates PEM files in the certificates directory. Use ca.pem, jupyterhub_host-cert.pem, and jupyterhub_host-key.pem to fill in the docker_ca_cert, docker_tls_cert, and docker_tls_key fields in the host_vars files.
You'll need to generate keypairs for the hub server and for each of the node servers.
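With several node servers, typing one keymaster command per host gets repetitive. The sketch below only prints the commands so they can be reviewed before anything is run. The function name keypair_commands, the "name ip" input format, and the 192.0.2.x addresses are assumptions made for illustration; substitute the names and private IPs from your own inventory.

```shell
# Hypothetical helper: print one keymaster command per "name ip" line on stdin.
# Nothing is executed; the output is meant to be reviewed first.
keypair_commands() {
    while read -r name ip; do
        # Skip blank lines and comment lines.
        case "$name" in ''|'#'*) continue ;; esac
        # '${KEYMASTER}' is printed literally so the generated lines can be
        # pasted into the shell where KEYMASTER was defined.
        printf '%s signed-keypair -n %s -h %s -p both -s IP:%s\n' \
            '${KEYMASTER}' "$name" "$ip" "$ip"
    done
}

# Example input (placeholder addresses):
keypair_commands <<'EOF'
jupyterhub_host 192.0.2.10
jupyterhub_node1 192.0.2.11
EOF
```

Each printed line has the same shape as the signed-keypair command shown earlier, with the host name and IP filled in.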
Don't forget to edit the host_vars files.
$ cp host_vars/example host_vars/jupyterhub_host
$ vim host_vars/jupyterhub_host
You can also use the script/assemble_certs script to automatically copy the generated certs and keys into the host_vars files. Open script/assemble_certs in a text editor, modify the name_map dictionary if necessary, and run ./script/assemble_certs.
Some files, such as SSL certificates and vars.yml, should not be stored in plain text.
$ ansible-vault encrypt vars.yml
$ ansible-vault encrypt host_vars/proxy_server
$ ansible-vault encrypt host_vars/jupyterhub_host
$ ansible-vault encrypt host_vars/jupyterhub_node1
$ ansible-vault encrypt host_vars/jupyterhub_node2
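Before committing or copying these files anywhere, it can help to confirm that none of them was left in plain text. The sketch below relies on the fact that files written by ansible-vault begin with a header line starting with $ANSIBLE_VAULT; the function names is_vault_encrypted and check_vault_files are assumptions, not part of this repository.

```shell
# Hypothetical helper: succeed if the file starts with the ansible-vault header.
is_vault_encrypted() {
    # Encrypted files begin with a line such as "$ANSIBLE_VAULT;1.1;AES256".
    head -n 1 "$1" 2>/dev/null | grep -q '^\$ANSIBLE_VAULT;'
}

# Hypothetical helper: report every argument that is not vault-encrypted.
# Returns 0 only if all listed files are encrypted.
check_vault_files() {
    status=0
    for f in "$@"; do
        if ! is_vault_encrypted "$f"; then
            echo "NOT ENCRYPTED: $f"
            status=1
        fi
    done
    return $status
}
```

For example, check_vault_files vars.yml host_vars/* prints the files that still need ansible-vault encrypt and returns a non-zero status if there are any.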
Generate a key pair on the nfs_server machine:
$ ssh-keygen -t rsa
Press Enter at the prompts to create a password-less SSH key with the default settings.
Transfer it to the system that will host your backups:
$ ssh-copy-id root@backupHost
Test that you can now log in without a password from your nfs_server by issuing:
$ ssh -oHostKeyAlgorithms='ssh-rsa' root@backupHost
We can use GPG for extra security and encryption. The commands will store our keys in a hidden directory at /root/.gnupg/:
$ gpg --gen-key
Use the key to define the gpg_key and gpg_pass variables in vars.yml.
If you specified ansible_user=root in inventory, you need to be able to log into the VM as root.
If you see the following message,
ubuntu@deploy:~/jupyterhub-accounting$ ssh root@<VM IP address>
Please login as the user "centos" rather than the user "root".
edit the /root/.ssh/authorized_keys file in the node VM and, on each line, remove everything that comes before the key itself, which starts with the key type (for example, ssh-rsa).
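The authorized_keys edit can also be done with sed instead of by hand. This is a sketch that assumes the keys are RSA keys, so each entry contains the text "ssh-rsa " exactly once; the function name strip_key_options is hypothetical. It reads from standard input and writes to standard output, so the original file is not modified until you choose to replace it.

```shell
# Hypothetical helper: drop the options that precede the key on each line.
# Assumes RSA keys; everything up to the last "ssh-rsa " on a line is removed.
strip_key_options() {
    sed 's/^.*\(ssh-rsa \)/\1/'
}
```

For example, strip_key_options < /root/.ssh/authorized_keys > /root/authorized_keys.new writes a cleaned copy that you can inspect before moving it into place. Lines that already start with ssh-rsa pass through unchanged.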
$ ./script/deploy
This shell script will ask for the SSH passwords and the ansible-vault password.