Testing connection issues on Arcus
Zarquan committed Feb 27, 2024
1 parent 0eacb2c commit dd9d011
Showing 4 changed files with 645 additions and 0 deletions.
179 changes: 179 additions & 0 deletions notes/zrq/20240224-01-arcus-tests.txt
@@ -0,0 +1,179 @@
#
# <meta:header>
# <meta:licence>
# Copyright (c) 2024, ROE (http://www.roe.ac.uk/)
#
# This information is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This information is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# </meta:licence>
# </meta:header>
#
#zrq-notes-time
#zrq-notes-indent
#zrq-notes-crypto
#zrq-notes-ansible
#zrq-notes-osformat
#zrq-notes-zeppelin
#

Target:

Working with Paul Browne to diagnose issues with the Arcus cloud.
https://github.com/wfau/gaia-dmp/issues/1308
https://ucam-rcs.atlassian.net/servicedesk/customer/portal/4/HPCSSUP-67058

Result:

Work in progress ...

# -----------------------------------------------------
# Create a new branch for our test deployments.
#[user@desktop]

branchname=investigations

source "${HOME:?}/aglais.env"
pushd "${AGLAIS_CODE}"

newbranch=$(date '+%Y%m%d')-zrq-${branchname:?}

git checkout master

git checkout -b "${newbranch:?}"

git push --set-upstream 'origin' "$(git branch --show-current)"

popd


# -----------------------------------------------------
# Repair the DNS record for the red deployment.
#[user@desktop]

source "${HOME:?}/aglais.env"
ansi-client 'red'

source /deployments/admin/bin/create-user-tools.sh
ducktoken=$(getsecret 'devops.duckdns.token')

ipaddress=128.232.226.223
curl "https://www.duckdns.org/update/${cloudname:?}/${ducktoken:?}/${ipaddress:?}"
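
#
# A minimal sketch for checking the result of the update above.
# DuckDNS replies with a body of 'OK' on success and 'KO' on failure.
# The 'duckdns_ok' helper is hypothetical, it is not part of the deployment tools.
#

```shell
# Succeed only if the DuckDNS response body is exactly 'OK'.
# 'duckdns_ok' is a hypothetical helper, added here for illustration.
duckdns_ok() {
    local response="${1:-}"
    [ "${response}" = "OK" ]
}

# Possible usage, reusing the variables set in the step above:
#
#   response=$(curl --silent "https://www.duckdns.org/update/${cloudname:?}/${ducktoken:?}/${ipaddress:?}")
#   duckdns_ok "${response}" || echo "DuckDNS update failed [${response}]"
```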


# -----------------------------------------------------
# Transfer Paul's ssh key onto the three key machines.
#[user@desktop]

sshkey="ssh-rsa AAAA....Irhz"

echo "sshkey [${sshkey}]"

echo "${sshkey}" > /tmp/pfb29.cam.ac.uk.pub

cat /tmp/pfb29.cam.ac.uk.pub

scp /tmp/pfb29.cam.ac.uk.pub \
[email protected]:.ssh/pfb29.cam.ac.uk.pub

scp /tmp/pfb29.cam.ac.uk.pub \
[email protected]:.ssh/pfb29.cam.ac.uk.pub

scp /tmp/pfb29.cam.ac.uk.pub \
[email protected]:.ssh/pfb29.cam.ac.uk.pub

scp /tmp/pfb29.cam.ac.uk.pub \
[email protected]:.ssh/pfb29.cam.ac.uk.pub


ssh [email protected]
ssh [email protected]
ssh [email protected]
ssh [email protected]


cd .ssh
cp authorized_keys authorized_keys.old

cat pfb29.cam.ac.uk.pub >> authorized_keys

cat authorized_keys
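
#
# The append above is unconditional, so repeating it adds a duplicate entry.
# A possible sketch of an idempotent version, assuming the public key file
# contains a single line. The 'append_key' helper is hypothetical.
#

```shell
# Append a public key to an authorized_keys file only if it is not already there.
# 'append_key' is a hypothetical helper, added here for illustration.
append_key() {
    local pubfile="${1:?}"
    local authfile="${2:?}"
    # Make sure the target exists with the permissions sshd expects.
    touch "${authfile}"
    chmod 600 "${authfile}"
    # -x matches whole lines, -F treats the key as a fixed string,
    # -f reads the pattern (the key) from the public key file.
    if grep -qxFf "${pubfile}" "${authfile}"
    then
        echo "key already present"
    else
        cat "${pubfile}" >> "${authfile}"
        echo "key added"
    fi
}

# Possible usage on each target machine:
#
#   append_key "${HOME}/.ssh/pfb29.cam.ac.uk.pub" "${HOME}/.ssh/authorized_keys"
```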


ssh [email protected]
ssh [email protected]
ssh [email protected]



ssh data.gaia-dmp.uk "date ; hostname"

curl --head 'https://object.arcus.openstack.hpc.cam.ac.uk/swift/v1/AUTH_e216e6b502134b6185380be6ccd0bf09/archive/zeppelin-0.10.1-gaia-dmp-0.1.tar.gz'


# -----------------------------------------------------

#
# Repeat the tests on Sunday 25th February.
#

ssh desktop
[user@desktop]

ssh [email protected]

[fedora@iris-gaia-red-20240223-zeppelin ~]$

ssh data.gaia-dmp.uk "date ; hostname"

Sun 25 Feb 2024 10:35:45 PM UTC
iris-gaia-data-20220411-gitstore

curl --head 'https://object.arcus.openstack.hpc.cam.ac.uk/swift/v1/AUTH_e216e6b502134b6185380be6ccd0bf09/archive/zeppelin-0.10.1-gaia-dmp-0.1.tar.gz'

HTTP/1.1 200 OK
Content-Length: 1716996866
Accept-Ranges: bytes
....


ssh [email protected]

[fedora@iris-gaia-green-20231027-zeppelin ~]$

ssh data.gaia-dmp.uk "date ; hostname"

Sun 25 Feb 22:38:00 UTC 2024
iris-gaia-data-20220411-gitstore

curl --head 'https://object.arcus.openstack.hpc.cam.ac.uk/swift/v1/AUTH_e216e6b502134b6185380be6ccd0bf09/archive/zeppelin-0.10.1-gaia-dmp-0.1.tar.gz'

HTTP/1.1 200 OK
Content-Length: 1716996866
Accept-Ranges: bytes
....
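
#
# The same HEAD request is checked by eye on each deployment.
# A minimal sketch for turning that into a pass/fail check by reading the
# status code from the first header line. The 'http_status' and 'head_ok'
# helpers are hypothetical, they are not part of the original notes.
#

```shell
# Read HTTP response headers on stdin and print the status code
# from the first line, e.g. '200' from 'HTTP/1.1 200 OK'.
http_status() {
    awk 'NR == 1 { print $2 }'
}

# Succeed only if the headers on stdin report a 200 status.
head_ok() {
    [ "$(http_status)" = "200" ]
}

# Possible usage, with 'archiveurl' set to the object store URL tested above:
#
#   curl --silent --head "${archiveurl:?}" | head_ok && echo "PASS" || echo "FAIL"
```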


ssh [email protected]

Blue is broken.
One VM from two days ago is stuck in 'deleting'.

Why did blue work yesterday,
and why does it fail today?







87 changes: 87 additions & 0 deletions notes/zrq/20240225-01-arcus-tests.txt
@@ -0,0 +1,87 @@
#
# <meta:header>
# <meta:licence>
# Copyright (c) 2024, ROE (http://www.roe.ac.uk/)
#
# This information is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This information is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# </meta:licence>
# </meta:header>
#
#zrq-notes-time
#zrq-notes-indent
#zrq-notes-crypto
#zrq-notes-ansible
#zrq-notes-osformat
#zrq-notes-zeppelin
#

Target:

Test to see if the platform is working today.

Result:

Work in progress ...


# -----------------------------------------------------
# From previous notes [notes/zrq/20240213-01-bash-dash.txt]
# Clean deploy and import our test users.
#[user@desktop]

source "${HOME:?}/aglais.env"
ansi-client 'blue'

source /deployments/hadoop-yarn/bin/deploy.sh

> aglais:
>   status:
>     deployment:
>       type: hadoop-yarn
>       conf: zeppelin-54.86-spark-6.26.43
>       name: iris-gaia-blue-20240225
>       date: 20240225T225235
>       hostname: zeppelin.gaia-dmp.uk
>   spec:
>     openstack:
>       cloud:
>         base: arcus
>         name: iris-gaia-blue


source /deployments/admin/bin/create-user-tools.sh
import-test-users

> ....
> ....

> "msg": "
> Error mounting /user/Thozzt:
> 2024-02-25T23:42:39.818+0000 7f1c266afec0 -1
> auth: error parsing file /etc/ceph/ceph.client.iris-gaia-blue-user-Thozzt-rw.keyring:
> error setting modifier for [client.iris-gaia-blue-user-Thozzt-rw] type=key val=null:
> Malformed input [buffer:3]
> 2024-02-25T23:42:39.818+0000 7f1c266afec0 -1 auth:
> failed to load /etc/ceph/ceph.client.iris-gaia-blue-user-Thozzt-rw.keyring:
> (5) Input/output error\nmount error:
> no mds server is up or the cluster is laggy
> "

#
# Main deployment looks OK, but lots of errors with CephFS mounts.
#
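
#
# The error above reports 'type=key val=null' while parsing the keyring.
# A possible sketch for finding the affected keyring files, assuming that
# the broken files contain a 'key = ' line with an empty or literal 'null'
# value. That is an inference from the error message, not something checked
# in these notes. The 'keyring_ok' helper is hypothetical.
#

```shell
# Succeed only if the keyring file has a 'key = value' line
# with a value that is neither empty nor the literal 'null'.
# 'keyring_ok' is a hypothetical helper, added here for illustration.
keyring_ok() {
    local keyring="${1:?}"
    awk '
        # Match the "key = value" line inside the keyring section.
        /^[[:space:]]*key[[:space:]]*=/ {
            value = $0
            # Strip everything up to the first "=", the value itself
            # is base64 and may end with "=" padding.
            sub(/^[^=]*=[[:space:]]*/, "", value)
            sub(/[[:space:]]+$/, "", value)
            if (value != "" && value != "null") { found = 1 }
        }
        END { exit (found ? 0 : 1) }
    ' "${keyring}"
}

# Possible usage on the Zeppelin node:
#
#   for keyring in /etc/ceph/ceph.client.*.keyring
#   do
#       keyring_ok "${keyring}" || echo "bad keyring [${keyring}]"
#   done
```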



