Skip to content

Commit

Permalink
Merge pull request #1316 from millingw/20240229-millingw-malcolm-hand…
Browse files Browse the repository at this point in the history
…over-notes

added handover notes
  • Loading branch information
Zarquan authored Feb 29, 2024
2 parents 10a3c5a + 3050c74 commit a44fe52
Showing 1 changed file with 157 additions and 0 deletions.
157 changes: 157 additions & 0 deletions notes/millingw/20240229-01-malcolm-handover-notes.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
Cient VM Details:
Vanilla Ubuntu VM on EIDF cloud (hereafter refered to as 'the client VM')
Authorisation issues encounted when using podman with normal EIDF freeipa - authenticated account.
To get round this, created local unix user malcolm, with sudo access
Used local user malcolm for all subsequent deployment.

Reserve development cluster using shared document at https://github.com/wfau/gaia-dmp/wiki/Cloud-assignments

Github setup:
Followed guide at https://gist.github.com/xirixiz/b6b0c6f4917ce17a90e00f9b60566278
Private, public keys stored on client VM under malcolm account's .ssh directory

Created own millingw fork of gaia-dmp via github gui
On client VM, clone fork into ~malcolm
cd ~malcolm
git clone [email protected]:millingw/gaia-dmp.git

Logged into arcus horizon interface.
Created application credentials for blue, data
Warning! horizon lists credentials for all projects, although each credential only applies to a specific project.
Name credentials clearly, blue_credential, data_credential - lost time unnecessarily due to not realising I was overwriting credentials

Broadly followed (with a few caveats) notes at https://github.com/wfau/gaia-dmp/blob/c978fd081ed4d3f25367155e9d31ef297512a1c8/notes/zrq/20240207-02-test-deploy.txt

Contents of aglais.env in ~malcolm on client VM

AGLAIS_REPO='[email protected]:millingw/gaia-dmp.git'
AGLAIS_CODE="${HOME}/gaia-dmp"
PATH="${PATH}:${AGLAIS_CODE}/bin"
export PATH

Contents of clouds.yaml in ~malcolm

clouds:
iris-gaia-blue:
auth:
auth_url: https://arcus.openstack.hpc.cam.ac.uk:5000
application_credential_id: "4515530e7cea40f89a4c057f658fe8bc"
application_credential_secret: <****>
region_name: "RegionOne"
interface: "public"
identity_api_version: 3
auth_type: "v3applicationcredential"

iris-gaia-data:
auth:
auth_url: https://arcus.openstack.hpc.cam.ac.uk:5000
application_credential_id: "4825e2f9d651464b8e75d1731447e313"
application_credential_secret: <*******>
region_name: "RegionOne"
interface: "public"
identity_api_version: 3
auth_type: "v3applicationcredential"


Warning:
Have to run ssh-agent commands each time opening a new terminal on the client VM!
Perhaps due to some weirdness with the VM authentication ...

Followed client software installation

source "${HOME:?}/aglais.env"

check $AGLAIS_CODE does point to where we have gaia-dmp checked out

sanity check live host - don't proceed if it appears we are targeting live host!

sourcing aglais.env should put all scripts on the path, if not then something has gone wrong

create the client, first time may take a little while as things are downloaded / cached

ansi-client 'blue'
(following instructions all assume executing inside ansi-client)
Warning: on the client VM, exiting ansi-client leaves a listener hogging port 3000, need to manually kill before restarting the client)



run the deploy - this does a complete teardown and rebuild of the targeted environment. can take a while.
source /deployments/hadoop-yarn/bin/deploy.sh

Note: this went through a period of not working for some days, appearing like internal routing errors between the different arcus hardware nodes
If the zepellin jars fail to be downloaded then the problem has reappeared
Note: the arcus metadata server can have a bad day, in which case errors may appear in deleting / creating ceph mounts ...

Import the test users and run tests:

source /deployments/admin/bin/create-user-tools.sh
import-test-users

This creates a set of canned users on the cluster, with ceph volumes and example notebooks
any issues with ceph will show up at this point

(Could have followed instructions to run benchmarks, but didn't due to versioning issue causing known errors)

If we got here, things are looking ok

enable the user creation tools, and start to create our known users

source /deployments/admin/bin/create-user-tools.sh
import-live-users

live users are created, but only in this instance (ie red, green unaffected)
verbose output file created in /tmp/live-users.json
various tools to inspect the users and the unix / ceph resources

list-username /tmp/live-users.json
list-usernames /tmp/live-users.json
less /tmp/live-users.json
list-linux-info /tmp/live-users.json
list-shiro-info /tmp/live-users.json
list-shiro-full /tmp/live-users.json
list-ceph-info /tmp/live-users.json
list-note-copy /tmp/live-users.json

To add a new user:
In a new terminal on the client VM, cd to /home/malcolm/gaia-dmp/deployments/common/users
Followed Dave's notes to create branch on millingw fork
https://github.com/wfau/gaia-dmp/blob/master/notes/zrq/20240212-01-git-branch.txt

edit live-users.yml file
add a new user at the end, e.g. (pick next sequential linuxuid)
- name: "MIllingworth"
type: "live"
linuxuid: 10032

save, got back to terminal running ansi-client and rerun import-live-users
need to watch output, this is the only opportunity to see what password has been set for the new user
assuming successful, go to other terminal, commit the branch and do a pull request into main

at this point, user is live on our node but not on live system
need the user's passhass
in the ansi-client window, display the new user's details
list-shiro-full /tmp/live-users.json
copy the passhash field for the new user at the end of the output

now we add the details to the live system
ssh to the controller on data
ssh [email protected]
edit the file 'passhashes' to add the user's accountname and passhash copied from the list-shiro-full output

Finally, send introductory email to the user, following form of https://github.com/wfau/gaia-dmp/blob/master/docs/emails/welcome-email.txt and including new password from output above

To remove a rogue user:
ssh [email protected]
connect to the mysql instance and remove the passhash row containing the user's name
remove (or preferably comment out with note) the user from the live users file
rerun create live users










0 comments on commit a44fe52

Please sign in to comment.