Merge pull request #303 from pneerincx/fix/cleanup
Fix and cleanup
marieke-bijlsma authored Aug 10, 2020
2 parents 788d4b8 + b5f37d3 commit 51c3a95
Showing 47 changed files with 257 additions and 248 deletions.
3 changes: 2 additions & 1 deletion .ansible-lint
@@ -3,5 +3,6 @@ exclude_paths:
- '~/.ansible' # Exclude external playbooks.
skip_list:
# We explicitly use latest combined with other tech to pin versions (e.g. Spacewalk).
- '403' # "Package installs should not use latest".
- '403' # "Package installs should not use latest."
- '701' # "No 'galaxy_info' found in meta/main.yml of a role."
...
4 changes: 2 additions & 2 deletions .circleci/config.yml
@@ -27,7 +27,7 @@ jobs:
cat lint_results
errors=$(grep -c '^[0-9]* [A-Z].*' lint_results)
echo '###############################################'
printf 'Counted %d ansible-lint errors.' ${errors:-0}
printf 'Counted %d ansible-lint errors.\n' ${errors:-0}
echo '###############################################'
if (( errors > 18 )); then /bin/false; fi
if (( errors > 1 )); then /bin/false; fi
...
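The CircleCI step above greps findings out of the ansible-lint report and fails the build when the count exceeds the threshold, which this commit tightens from 18 to 1 (and adds the missing trailing newline to the `printf`). A minimal standalone sketch of that counting logic, run against a fabricated `lint_results` file (file name and contents are illustrative, not from the repo):

```shell
# Simulated ansible-lint report: one line matching the "NNN Message" error
# pattern, one line that must not be counted.
printf '%s\n' '403 Package installs should not use latest' 'some other output' > lint_results
errors=$(grep -c '^[0-9]* [A-Z].*' lint_results)
printf 'Counted %d ansible-lint errors.\n' "${errors:-0}"
# Fail the build when more than one error remains.
if [ "${errors:-0}" -gt 1 ]; then exit 1; fi
```

Note that `grep -c` counts matching lines rather than matches, which is what makes it usable as an error tally here.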
8 changes: 4 additions & 4 deletions README.md
@@ -123,9 +123,9 @@ Deploying a fully functional virtual cluster from scratch involves the following
3. Configure Ansible settings including the vault.
To create a new virtual cluster you will need ```group_vars``` and an inventory for that HPC cluster:
To create a new virtual cluster you will need ```group_vars``` and an static inventory for that HPC cluster:
* See the ```*_hosts.ini``` files for existing clusters for examples to create a new ```[name-of-the-cluster]*_hosts.ini```.
* See the ```static_inventories/*_hosts.ini``` files for existing clusters for examples to create a new ```[name-of-the-cluster]*_hosts.ini```.
* Create a ```group_vars/[name-of-the-cluster]/``` folder with a ```vars.yml```.
You'll find and example ```vars.yml``` file in ```group_vars/template/```.
To generate a new ```secrets.yml``` with new random passwords for the various daemons/components and encrypt this new ```secrets.yml``` file:
@@ -196,7 +196,7 @@ Deploying a fully functional virtual cluster from scratch involves the following
Some examples for the *Talos* development cluster:
* Configure the dynamic inventory and jumphost for the *Talos* test cluster:
```bash
export AI_INVENTORY='talos_hosts.ini'
export AI_INVENTORY='static_inventories/talos_hosts.ini'
export AI_PROXY='reception'
export ANSIBLE_VAULT_IDENTITY_LIST='[email protected]/vault_pass.txt.all, [email protected]/vault_pass.txt.talos'
```
@@ -206,7 +206,7 @@ Deploying a fully functional virtual cluster from scratch involves the following
. ./lor-init
lof-config talos
```
* Firstly
* Firstly,
* Create local admin accounts, which can then be used to deploy the rest of the playbook.
* Deploy the signed hosts keys.
Without local admin accounts we'll need to use either a ```root``` account for direct login or the default user account of the image used to create the VMs.
118 changes: 55 additions & 63 deletions cluster.yml
@@ -2,9 +2,8 @@
# Order of deployment required to prevent chicken versus the egg issues:
# 0. For all deployment phases:
# export AI_PROXY="${jumphost_name}"
# export AI_INVENTORY="${cluster_name}_hosts.ini"
# export AI_INVENTORY="static_inventories/${cluster_name}_hosts.ini"
# ANSIBLE_VAULT_PASSWORD_FILE=".vault_pass.txt.${cluster_name}"
#
# 1. Use standard CentOS cloud image user 'centos' or 'root' user and without host key checking:
# export ANSIBLE_HOST_KEY_CHECKING=False
# ansible-playbook -i inventory.py -u centos -l 'jumphost,cluster' single_role_playbooks/admin-users.yml
@@ -17,14 +16,29 @@
# ansible-playbook -i inventory.py -u [admin_account] cluster.yml
# This will configure:
# A. Jumphost first as it is required to access the other machines.
# B. SAI as it is required to
# * configure layout on shared storage devices used by other machines.
# * configure Slurm control and Slurm database.
# C. DAI
# D. UI
# E. Compute nodes
# F. Documentation server
# B. Basic roles for all cluster machines part 1:
# * Roles that do NOT require regular accounts or groups to be present.
# C. An LDAP with regular user accounts, which may be required for additional roles.
# (E.g. a chmod or chgrp for a file/folder requires the corresponding user or group to be present.)
# D. Basic roles for all cluster machines part 2:
# * Roles that DO depend on regular accounts and groups.
# E. SAI as it is required to:
# * Configure layout on shared storage devices used by other machines.
# * Configure Slurm control and Slurm database.
# F. DAI
# G. UI
# H. Compute nodes
# I. Documentation server
#

#
# Dummy play to ping jumphosts and establish a persisting SSH connection
# before trying to connect to the machines behind the jumphost,
# which may otherwise fail when SSH connection multiplexing is used.
#
- name: 'Dummy play to ping jumphosts and establish a persistent SSH connection.'
hosts: jumphost

- name: 'Sanity checks before we start.'
hosts: all
pre_tasks:
@@ -47,7 +61,7 @@
- sshd
- node_exporter
- {role: geerlingguy.security, become: true}
- grafana_proxy
- {role: grafana_proxy, when: ansible_hostname == 'airlock'}
- regular-users
tasks:
- name: 'Install cron job to reboot jumphost regularly to activate kernel updates.'
@@ -61,28 +75,45 @@
cron_file: reboot
become: true

- name: 'B. Roles for SAIs.'
- name: 'B. Basic roles for all cluster machines part 1.'
hosts:
- sys-admin-interface
- cluster
roles:
- admin-users
- ssh_host_signer
- ssh_known_hosts
- spacewalk_client
- logins
- figlet_motd
- mount-volume
- ldap
- node_exporter
- static-hostname-lookup
- cluster
- sshd
- resolver
- shared_storage
- coredumps

- name: 'C. Create LDAP account server.'
hosts:
- ldap-server
roles:
- role: openldap
when:
- use_ldap | default(true, true) | bool
- create_ldap | default(false, true) | bool

- name: 'D. Basic roles for all cluster machines part 2.'
hosts:
- cluster
roles:
- ldap # client
- sshd
- regular-users
- shared_storage

- hosts: slurm-management
- name: 'E. Roles for SAIs.'
hosts:
- sys-admin-interface
roles:
- mount-volume
- slurm-management
- prom_server
- grafana
@@ -94,70 +125,31 @@
hostname_node0: "{{ ansible_hostname }}"
ip_node0: "{{ ansible_default_ipv4['address'] }}"

- name: 'C. Roles for DAIs.'
- name: 'F. Roles for DAIs.'
hosts: deploy-admin-interface
roles:
- admin-users
- ssh_host_signer
- ssh_known_hosts
- spacewalk_client
- logins
- figlet_motd
- mount-volume
- build-environment
- ldap
- node_exporter
- static-hostname-lookup
- cluster
- sshd
- resolver
- shared_storage
- regular-users
- envsync

- name: 'D. Roles for UIs.'
- name: 'G. Roles for UIs.'
hosts: user-interface
roles:
- admin-users
- ssh_host_signer
- ssh_known_hosts
- spacewalk_client
- logins
- figlet_motd
- build-environment
- ldap
- node_exporter
- static-hostname-lookup
- cluster
- sshd
- resolver
- shared_storage
- slurm_exporter
- slurm-client
- regular-users
- sudoers
- subgroup_directories
- role: fuse-layer
when: fuse_mountpoint is defined and fuse_mountpoint | length >= 1

- name: 'E. Roles for compute nodes.'
- name: 'H. Roles for compute nodes.'
hosts: compute-vm
roles:
- admin-users
- ssh_host_signer
- ssh_known_hosts
- spacewalk_client
- logins
- figlet_motd
- mount-volume
- ldap
- node_exporter
- static-hostname-lookup
- cluster
- sshd
- resolver
- shared_storage
- slurm-client
- regular-users

- name: 'F. Roles for documentation servers.'
- name: 'I. Roles for documentation servers.'
hosts:
- docs
roles:
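The new dummy play at the top of `cluster.yml` establishes the SSH connection to the jumphost before Ansible contacts any machine behind it, so that with OpenSSH connection multiplexing the later `ProxyJump`-style hops reuse an already-open master connection instead of racing to create one. A hypothetical `~/.ssh/config` fragment showing the kind of multiplexing setup this protects (the `reception` alias is the Talos jumphost mentioned in the README; the `ControlPath` location is an assumption):

```
Host reception
    # One master connection is opened on first use and then shared by
    # all subsequent sessions for 10 minutes of idle time.
    ControlMaster auto
    ControlPath ~/.ssh/tmp/%r@%h:%p
    ControlPersist 600
```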
1 change: 0 additions & 1 deletion galaxy-requirements.yml
@@ -1,7 +1,6 @@
---
- src: geerlingguy.firewall
version: 2.4.0
- src: geerlingguy.postfix
- src: geerlingguy.repo-epel
- src: geerlingguy.security
...
2 changes: 1 addition & 1 deletion group_vars/boxy-cluster/vars.yml
@@ -2,7 +2,7 @@
slurm_cluster_name: 'boxy'
slurm_cluster_domain: 'hpc.rug.nl'
stack_prefix: 'bx'
uri_ldap: 172.23.40.249
ldap_uri: ldap://172.23.40.249
ldap_base: ou=umcg,o=asds
ldap_binddn: cn=clusteradminumcg,o=asds
regular_groups:
3 changes: 1 addition & 2 deletions group_vars/fender-cluster/vars.yml
@@ -30,8 +30,7 @@ ui_ethernet_interfaces:
ssh_host_signer_ca_private_key: "{{ ssh_host_signer_ca_keypair_dir }}/ca-key-production-ebi"
use_ldap: yes
create_ldap: yes
uri_ldap: fd-dai
uri_ldaps: fd-dai
ldap_uri: ldap://fd-dai
ldap_port: 389
ldaps_port: 636
ldap_base: dc=hpc,dc=rug,dc=nl
3 changes: 1 addition & 2 deletions group_vars/gearshift-cluster/vars.yml
@@ -59,8 +59,7 @@ ui_ethernet_interfaces:
ssh_host_signer_ca_private_key: "{{ ssh_host_signer_ca_keypair_dir }}/umcg-hpc-ca"
use_ldap: yes
create_ldap: no
uri_ldap: '172.23.40.249'
uri_ldaps: 'comanage-in.id.rug.nl'
ldap_uri: 'ldap://172.23.40.249'
ldap_port: '389'
ldaps_port: '636'
ldap_base: 'ou=research,o=asds'
9 changes: 1 addition & 8 deletions group_vars/hyperchicken-cluster/vars.yml
@@ -30,8 +30,7 @@ ui_ethernet_interfaces:
ssh_host_signer_ca_private_key: "{{ ssh_host_signer_ca_keypair_dir }}/umcg-hpc-development-ca"
use_ldap: yes
create_ldap: yes
uri_ldap: hc-dai
uri_ldaps: hc-dai
ldap_uri: ldap://hc-dai
ldap_port: 389
ldaps_port: 636
ldap_base: dc=hpc,dc=rug,dc=nl
@@ -65,9 +64,6 @@ nameservers: [
local_admin_groups:
- 'admin'
- 'docker'
- 'solve-rd'
- 'umcg-atd'
- 'depad'
local_admin_users:
- 'centos'
- 'egon'
@@ -77,9 +73,6 @@ local_admin_users:
- 'morris'
- 'pieter'
- 'wim'
- 'umcg-atd-dm'
- 'solve-rd-dm'
- 'envsync'
envsync_user: 'envsync'
envsync_group: 'depad'
hpc_env_prefix: '/apps'
3 changes: 1 addition & 2 deletions group_vars/marvin-cluster/vars.yml
@@ -30,8 +30,7 @@ ui_ethernet_interfaces:
ssh_host_signer_ca_private_key: "{{ ssh_host_signer_ca_keypair_dir }}/ca-key-production-ebi"
use_ldap: yes
create_ldap: yes
uri_ldap: mv-dai
uri_ldaps: mv-dai
ldap_uri: ldap://mv-dai
ldap_port: 389
ldaps_port: 636
ldap_base: dc=ejp,dc=rd,dc=nl
3 changes: 1 addition & 2 deletions group_vars/nibbler-cluster/vars.yml
@@ -27,8 +27,7 @@ ui_ethernet_interfaces:
ssh_host_signer_ca_private_key: "{{ ssh_host_signer_ca_keypair_dir }}/umcg-hpc-development-ca"
use_ldap: yes
create_ldap: no
uri_ldap: ldap.pilot.scz.lab.surf.nl
uri_ldaps: ldap.pilot.scz.lab.surf.nl
ldap_uri: ldap://ldap.pilot.scz.lab.surf.nl
ldap_port: 636
ldaps_port: 636
ldap_base: o=ElixirNL,dc=pilot-clients,dc=scz,dc=lab,dc=surf,dc=nl
3 changes: 1 addition & 2 deletions group_vars/talos-cluster/vars.yml
@@ -45,8 +45,7 @@ ui_ethernet_interfaces:
ssh_host_signer_ca_private_key: "{{ ssh_host_signer_ca_keypair_dir }}/umcg-hpc-development-ca"
use_ldap: yes
create_ldap: no
uri_ldap: '172.23.40.249'
uri_ldaps: 'comanage-in.id.rug.nl'
ldap_uri: 'ldap://172.23.40.249'
ldap_port: '389'
ldaps_port: '636'
ldap_base: 'ou=umcg,o=asds'
4 changes: 3 additions & 1 deletion roles/cluster/tasks/build_lustre_client.yml
@@ -5,5 +5,7 @@
dest: '/tmp/lustre-client-dkms-2.11.0-1.el7.src.rpm'

- name: 'Build the Lustre client.'
command: rpmbuild --rebuild --without servers /tmp/lustre-client-dkms-2.11.0-1.el7.src.rpm
command:
cmd: 'rpmbuild --rebuild --without servers /tmp/lustre-client-dkms-2.11.0-1.el7.src.rpm'
creates: '/tmp/lustre-client-dkms-2.11.0-1.el7.src.rpm.rebuild'
...
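The refactored task above adds a `creates:` argument, so `rpmbuild` is skipped on reruns once its rebuild marker file exists. The semantics of that guard can be sketched in plain shell (the marker name and the stand-in `build` function are illustrative; the real task checks the `.rebuild` file under `/tmp`):

```shell
# Idempotency sketch of Ansible's `creates:` argument: only run the command
# when the marker file does not exist yet; otherwise report a no-op.
marker='lustre-client.rebuild.marker'
build() { echo 'rpmbuild runs'; touch "${marker}"; }
if [ -e "${marker}" ]; then
    echo 'skipped: already built'   # roughly what Ansible reports as not changed
else
    build
fi
```

Running the snippet twice prints `rpmbuild runs` the first time and `skipped: already built` the second, which is exactly the rerun behavior the `creates:` change buys.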
12 changes: 8 additions & 4 deletions roles/cluster/tasks/main.yml
@@ -50,8 +50,10 @@
become: true

- name: Check if rsync >= 3.1.2 is installed on the managed hosts.
shell: |
rsync --version 2>&1 | head -n 1 | sed 's|^rsync *version *\([0-9\.]*\).*$|\1|' | tr -d '\n'
shell:
cmd: |
set -o pipefail
rsync --version 2>&1 | head -n 1 | sed 's|^rsync *version *\([0-9\.]*\).*$|\1|' | tr -d '\n'
args:
warn: no
changed_when: false
@@ -66,8 +68,10 @@
failed_when: 'rsync_version_managed_host is failed or (rsync_version_managed_host.stdout is version_compare("3.1.2", operator="<"))'

- name: Check if rsync >= 3.1.2 is installed on the control host.
shell: |
rsync --version 2>&1 | head -n 1 | sed 's|^rsync *version *\([0-9\.]*\).*$|\1|' | tr -d '\n'
shell:
cmd: |
set -o pipefail
rsync --version 2>&1 | head -n 1 | sed 's|^rsync *version *\([0-9\.]*\).*$|\1|' | tr -d '\n'
args:
warn: no
changed_when: false
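The rsync check above now runs under `set -o pipefail`, so a failure of `rsync --version` itself is no longer masked by the exit status of the trailing `head`/`sed`/`tr` stages. The extraction pipeline can be exercised on a canned version banner (the sample text is an assumption modeled on typical rsync output; the real task pipes the command itself):

```shell
# pipefail is a bash/ksh feature; guard it so the sketch also runs under
# shells that lack the option.
set -o pipefail 2>/dev/null || true
# Simulated first line of `rsync --version`.
banner='rsync  version 3.1.2  protocol version 31'
version=$(printf '%s\n' "${banner}" \
    | head -n 1 \
    | sed 's|^rsync *version *\([0-9\.]*\).*$|\1|' \
    | tr -d '\n')
echo "${version}"   # → 3.1.2
```

The extracted string is then compared in the tasks with Ansible's `version_compare` test against the required minimum of 3.1.2.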
