Thanks to mseitzer for the original script!
This Python script lets you check for free Nvidia GPUs on remote servers. Additional features include listing the type of each GPU and who is using it. The idea is to speed up finding a free GPU in institutions that share multiple GPU servers.
The script works by using your account to SSH into the servers and running `nvidia-smi`.
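As a rough illustration of that approach, here is a minimal sketch that runs `nvidia-smi` over SSH and parses the result. It is not the script's actual implementation: the function names `parse_gpu_csv` and `query_server`, the choice of query fields, and the timeout are all assumptions made for this example. The `--query-gpu` and `--format=csv,noheader,nounits` options are standard `nvidia-smi` options.

```python
import subprocess

# Fields requested from nvidia-smi; this selection is an assumption for the sketch.
QUERY = "index,name,utilization.gpu,memory.used,memory.total"
NVIDIA_SMI = f"nvidia-smi --query-gpu={QUERY} --format=csv,noheader,nounits"


def parse_gpu_csv(text):
    """Parse nvidia-smi CSV output into a list of dicts, one per GPU.

    Assumes every numeric field is reported as a plain integer.
    """
    gpus = []
    for line in text.strip().splitlines():
        index, name, util, used, total = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "utilization": int(util),
            "memory_used": int(used),
            "memory_total": int(total),
        })
    return gpus


def query_server(server, timeout=10):
    """Run nvidia-smi on `server` over SSH and return the parsed GPU list."""
    result = subprocess.run(
        ["ssh", server, NVIDIA_SMI],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Keeping the parsing separate from the SSH call makes it possible to test the parser on captured output without a server.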
- Show all free GPUs across servers
- Show all current users of all GPUs (`-l` or `--list`)
- Show all GPUs used by yourself (`-m` or `--me`)
- Resolve usernames to real names (`-f` or `--finger`)
- Show GPU utilization and memory usage per user per GPU (`-U` or `--utilization`)
- Filter users by the type of their processes (`-c` or `--cuda` for CUDA processes, `-g` or `--graphical` for graphical processes; default: both)
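To show how these flags fit together, here is a hedged `argparse` sketch of such a command line. The flag names come from the list above; everything else (the `build_parser` and `process_types` helpers, the help strings, and the `ssh_user` destination for `-s`) is hypothetical and may differ from the real script.

```python
import argparse


def build_parser():
    """Build a parser with the flags documented in this README (sketch only)."""
    parser = argparse.ArgumentParser(description="Check for free GPUs on remote servers")
    parser.add_argument("servers", nargs="*", help="server addresses to check")
    parser.add_argument("-l", "--list", action="store_true", help="list users of all GPUs")
    parser.add_argument("-m", "--me", action="store_true", help="show GPUs used by yourself")
    parser.add_argument("-f", "--finger", action="store_true", help="resolve real names")
    parser.add_argument("-U", "--utilization", action="store_true", help="show utilization")
    parser.add_argument("-c", "--cuda", action="store_true", help="only CUDA processes")
    parser.add_argument("-g", "--graphical", action="store_true", help="only graphical processes")
    # The README only names the short option; the destination name is an assumption.
    parser.add_argument("-s", dest="ssh_user", default=None, help="user name on the servers")
    return parser


def process_types(args):
    """Return the selected process types; giving neither flag selects both."""
    if not args.cuda and not args.graphical:
        return {"cuda", "graphical"}
    selected = set()
    if args.cuda:
        selected.add("cuda")
    if args.graphical:
        selected.add("graphical")
    return selected
```

With this layout, combined short flags such as `-lU` are handled by `argparse` without extra code.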
- python3
- SSH access to some Linux servers with Nvidia GPUs
- If the server you connect to uses a different user name than your local name, you either have to specify your name on the servers using the `-s` option, or set up access as described in setup for convenience.
To check for free GPUs on one or more servers, add their addresses after the script name. You might need to enter your password. To avoid that, follow the steps in setup for convenience.
> ./gpu_monitor.py myserver.com
Server myserver.com:
GPU 1, GeForce RTX 2080 Ti
If you have a set of servers that you check regularly, specify them in the file `servers.txt`, one address per line.
Once you have done that, running just `./gpu_monitor.py` checks all servers specified in this file by default.
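The fallback to `servers.txt` could look roughly like the sketch below. The helper names `parse_server_list` and `load_servers` are hypothetical, and two behaviors are assumptions: blank lines are skipped, and the file is looked up relative to the current directory.

```python
from pathlib import Path


def parse_server_list(text):
    """Return the non-empty lines of `text`, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_servers(cli_servers, path="servers.txt"):
    """Prefer addresses given on the command line; otherwise read the file."""
    if cli_servers:
        return list(cli_servers)
    file = Path(path)
    if not file.exists():
        # No arguments and no file: nothing to check.
        return []
    return parse_server_list(file.read_text())
```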
If you want to list all GPUs and who currently uses them, you can use the `-l` flag:
> ./gpu_monitor.py -lU myserver.com
Server myserver.com:
GPU 0 (GeForce RTX 2080 Ti, 0%, 23/10986 MiB): Used by gdm (23 MiB)
GPU 1 (GeForce RTX 2080 Ti, 33%, 4933/10989 MiB): Used by joe (4933 MiB)
GPU 3 (GeForce RTX 2080 Ti, 0%, 0/10989 MiB): Free
If you just want to see the GPUs used by yourself, you can use the `--me` flag.
This requires that your local user name is the same as your user name on the server, or that you specify the name using the `-s` flag.
> ./gpu_monitor.py --me myserver.com
Server myserver.com:
GPU 3 (GeForce RTX 2080 Ti): Used by joe
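The user-name rule for `--me` can be sketched as follows. This is only an illustration: `remote_user_name` and `gpus_used_by` are hypothetical helpers, and the mapping from GPU index to user names is an assumed data layout, not the script's internal format.

```python
import getpass


def remote_user_name(ssh_user=None):
    """Use the name given via -s if present, otherwise the local user name."""
    return ssh_user or getpass.getuser()


def gpus_used_by(gpu_users, user):
    """Return the sorted GPU indices on which `user` has a process.

    `gpu_users` maps a GPU index to the list of user names using that GPU.
    """
    return sorted(index for index, users in gpu_users.items() if user in users)
```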
Finally, if you also want to see the real names of users, you can use the `-f` flag.
This uses Linux's `finger` command.
> ./gpu_monitor.py -fU myserver.com
Server myserver.com:
GPU 0 (GeForce RTX 2080 Ti, 0%, 23/10986 MiB): Used by gdm (Gnome Display Manager, 23 MiB)
GPU 1 (GeForce RTX 2080 Ti, 33%, 4933/10989 MiB): Used by joe (Joe Average, 4933 MiB)
GPU 3 (GeForce RTX 2080 Ti, 0%, 0/10989 MiB): Free
If you want to avoid having to enter your password all the time, you can set up an SSH key to log in to your server. If you have already done this, you can skip these steps.
1. Open a terminal and run `cd .ssh`.
2. Run `ssh-keygen` and follow the instructions. It might be a good idea not to use the default file but to specify a filename reflecting the servers you are connecting to.
3. Run `ssh-copy-id <user>@<server>`, where `<user>@<server>` is the server you want to connect to. If you chose a different filename for your key, you need to pass the filename with the `-i` option.
4. Repeat step 3 for every server you want to connect to (not necessary if you have a shared home directory on all the servers).
5. Try to connect to the server using `ssh <user>@<server>`. The first time you connect, it should ask you for the password of the SSH key. If you are asked for the password multiple times, you might need to manually activate your SSH key using `ssh-add <path_to_ssh_key>`. If it still does not work, continue with the next steps.
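To verify that key-based login works without typing anything, one option is to call `ssh` with `BatchMode=yes`, a standard OpenSSH option that makes `ssh` fail instead of prompting. The sketch below is an illustration and not part of the script; the helper names and the timeout values are assumptions. A key protected by a passphrase only passes this check once it has been loaded with `ssh-add`.

```python
import subprocess


def batch_login_command(target):
    """Build an ssh command that never prompts and runs the no-op `true` remotely."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", target, "true"]


def can_login_without_password(target):
    """Return True if `ssh <target>` succeeds without any interactive prompt."""
    try:
        result = subprocess.run(
            batch_login_command(target), capture_output=True, timeout=15
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung.
        return False
    return result.returncode == 0
```

For example, `can_login_without_password("myusername@myserver.com")` returning `False` suggests the key has not been copied to the server or has not been activated yet.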
The following steps show how to avoid having to give your user name when you use the script (or SSH).
1. Go to the folder `.ssh` in your home directory and open the file `config`. If it is not there, create it.
2. Add something like this:
Host myserver.com
User myusername
If you are connecting to multiple servers under the same domain, you can also use `Host *.mydomain.com` to indicate that you are using the same user name for all of them.
3. If you have an SSH key with a different name, also add the line `IdentityFile path_to_ssh_key` after the `User` line.