-
Notifications
You must be signed in to change notification settings - Fork 5
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1316 from millingw/20240229-millingw-malcolm-hand…
…over-notes added handover notes
- Loading branch information
Showing
1 changed file
with
157 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
Cient VM Details: | ||
Vanilla Ubuntu VM on EIDF cloud (hereafter refered to as 'the client VM') | ||
Authorisation issues encounted when using podman with normal EIDF freeipa - authenticated account. | ||
To get round this, created local unix user malcolm, with sudo access | ||
Used local user malcolm for all subsequent deployment. | ||
|
||
Reserve development cluster using shared document at https://github.com/wfau/gaia-dmp/wiki/Cloud-assignments | ||
|
||
Github setup: | ||
Followed guide at https://gist.github.com/xirixiz/b6b0c6f4917ce17a90e00f9b60566278 | ||
Private, public keys stored on client VM under malcolm account's .ssh directory | ||
|
||
Created own millingw fork of gaia-dmp via github gui | ||
On client VM, clone fork into ~malcolm | ||
cd ~malcolm | ||
git clone [email protected]:millingw/gaia-dmp.git | ||
|
||
Logged into arcus horizon interface. | ||
Created application credentials for blue, data | ||
Warning! horizon lists credentials for all projects, although each credential only applies to a specific project. | ||
Name credentials clearly, blue_credential, data_credential - lost time unnecessarily due to not realising I was overwriting credentials | ||
|
||
Broadly followed (with a few caveats) notes at https://github.com/wfau/gaia-dmp/blob/c978fd081ed4d3f25367155e9d31ef297512a1c8/notes/zrq/20240207-02-test-deploy.txt | ||
|
||
Contents of aglais.env in ~malcolm on client VM | ||
|
||
AGLAIS_REPO='[email protected]:millingw/gaia-dmp.git' | ||
AGLAIS_CODE="${HOME}/gaia-dmp" | ||
PATH="${PATH}:${AGLAIS_CODE}/bin" | ||
export PATH | ||
|
||
Contents of clouds.yaml in ~malcolm | ||
|
||
clouds: | ||
iris-gaia-blue: | ||
auth: | ||
auth_url: https://arcus.openstack.hpc.cam.ac.uk:5000 | ||
application_credential_id: "4515530e7cea40f89a4c057f658fe8bc" | ||
application_credential_secret: <****> | ||
region_name: "RegionOne" | ||
interface: "public" | ||
identity_api_version: 3 | ||
auth_type: "v3applicationcredential" | ||
|
||
iris-gaia-data: | ||
auth: | ||
auth_url: https://arcus.openstack.hpc.cam.ac.uk:5000 | ||
application_credential_id: "4825e2f9d651464b8e75d1731447e313" | ||
application_credential_secret: <*******> | ||
region_name: "RegionOne" | ||
interface: "public" | ||
identity_api_version: 3 | ||
auth_type: "v3applicationcredential" | ||
|
||
|
||
Warning: | ||
Have to run ssh-agent commands each time opening a new terminal on the client VM! | ||
Perhaps due to some weirdness with the VM authentication ... | ||
|
||
Followed client software installation | ||
|
||
source "${HOME:?}/aglais.env" | ||
|
||
check $AGLAIS_CODE does point to where we have gaia-dmp checked out | ||
|
||
sanity check live host - don't proceed if it appears we are targeting live host! | ||
|
||
sourcing aglais.env should put all scripts on the path, if not then something has gone wrong | ||
|
||
create the client, first time may take a little while as things are downloaded / cached | ||
|
||
ansi-client 'blue' | ||
(following instructions all assume executing inside ansi-client) | ||
Warning: on the client VM, exiting ansi-client leaves a listener hogging port 3000, need to manually kill before restarting the client) | ||
|
||
|
||
|
||
run the deploy - this does a complete teardown and rebuild of the targeted environment. can take a while. | ||
source /deployments/hadoop-yarn/bin/deploy.sh | ||
|
||
Note: this went through a period of not working for some days, appearing like internal routing errors between the different arcus hardware nodes | ||
If the zepellin jars fail to be downloaded then the problem has reappeared | ||
Note: the arcus metadata server can have a bad day, in which case errors may appear in deleting / creating ceph mounts ... | ||
|
||
Import the test users and run tests: | ||
|
||
source /deployments/admin/bin/create-user-tools.sh | ||
import-test-users | ||
|
||
This creates a set of canned users on the cluster, with ceph volumes and example notebooks | ||
any issues with ceph will show up at this point | ||
|
||
(Could have followed instructions to run benchmarks, but didn't due to versioning issue causing known errors) | ||
|
||
If we got here, things are looking ok | ||
|
||
enable the user creation tools, and start to create our known users | ||
|
||
source /deployments/admin/bin/create-user-tools.sh | ||
import-live-users | ||
|
||
live users are created, but only in this instance (ie red, green unaffected) | ||
verbose output file created in /tmp/live-users.json | ||
various tools to inspect the users and the unix / ceph resources | ||
|
||
list-username /tmp/live-users.json | ||
list-usernames /tmp/live-users.json | ||
less /tmp/live-users.json | ||
list-linux-info /tmp/live-users.json | ||
list-shiro-info /tmp/live-users.json | ||
list-shiro-full /tmp/live-users.json | ||
list-ceph-info /tmp/live-users.json | ||
list-note-copy /tmp/live-users.json | ||
|
||
To add a new user: | ||
In a new terminal on the client VM, cd to /home/malcolm/gaia-dmp/deployments/common/users | ||
Followed Dave's notes to create branch on millingw fork | ||
https://github.com/wfau/gaia-dmp/blob/master/notes/zrq/20240212-01-git-branch.txt | ||
|
||
edit live-users.yml file | ||
add a new user at the end, e.g. (pick next sequential linuxuid) | ||
- name: "MIllingworth" | ||
type: "live" | ||
linuxuid: 10032 | ||
|
||
save, got back to terminal running ansi-client and rerun import-live-users | ||
need to watch output, this is the only opportunity to see what password has been set for the new user | ||
assuming successful, go to other terminal, commit the branch and do a pull request into main | ||
|
||
at this point, user is live on our node but not on live system | ||
need the user's passhass | ||
in the ansi-client window, display the new user's details | ||
list-shiro-full /tmp/live-users.json | ||
copy the passhash field for the new user at the end of the output | ||
|
||
now we add the details to the live system | ||
ssh to the controller on data | ||
ssh [email protected] | ||
edit the file 'passhashes' to add the user's accountname and passhash copied from the list-shiro-full output | ||
|
||
Finally, send introductory email to the user, following form of https://github.com/wfau/gaia-dmp/blob/master/docs/emails/welcome-email.txt and including new password from output above | ||
|
||
To remove a rogue user: | ||
ssh [email protected] | ||
connect to the mysql instance and remove the passhash row containing the user's name | ||
remove (or preferably comment out with note) the user from the live users file | ||
rerun create live users | ||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|