CHEAT-SHEET.md

File metadata and controls

108 lines (82 loc) · 3.01 KB

Common Simple Tasks on DEIS-MCC

Jobs are run through the Slurm cluster-management software. Generally, jobs are run via either srun (for a binary) or sbatch (for executing a bash script).

See also the sbatch documentation or the srun documentation

Interactive Shell

srun --partition naples -n1 --mem 1G --pty bash

will give you a single CPU and 1G of memory.

Common Options

  • --mem sets the allocated memory and supports unit suffixes (e.g. --mem 15G for 15 gigabytes)
  • --exclusive allocates an entire node, including all of its memory and CPUs
  • --partition selects a specific partition (should be set to naples unless you know what you are doing)
  • --time limits the execution time; use the d-hh:mm:ss format (days-hours:minutes:seconds) or plain hh:mm:ss.
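Building the --time value by hand is error-prone when the limit is computed in a script. A minimal sketch, assuming the limit is known in seconds; slurm_time is a hypothetical helper name, not a Slurm command:

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): convert a number of seconds
# into the d-hh:mm:ss form accepted by --time.
slurm_time() {
    local total="$1"
    local days=$(( total / 86400 ))
    local hours=$(( (total % 86400) / 3600 ))
    local minutes=$(( (total % 3600) / 60 ))
    local seconds=$(( total % 60 ))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$minutes" "$seconds"
}

# 3900 seconds is 1 hour and 5 minutes.
slurm_time 3900    # prints 0-01:05:00
```

It could then be used as srun --partition naples -n1 --mem 1G --time "$(slurm_time 3900)" --pty bash.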

Sbatch

You can set default sbatch options by placing #SBATCH directives at the top of your script, directly after the shebang line:

#!/bin/bash
#SBATCH --time=1:05:00
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=FAIL
#SBATCH --partition=naples
#SBATCH --mem=15000
##SBATCH --mem=64G

echo "hello world"

Assuming the script above is saved as helloworld.sh, running sbatch helloworld.sh allocates 15000 MB (about 15G) of memory on the naples partition and emails pgj if the job fails. The job is forcefully terminated after 1 hour and 5 minutes. While experimenting with sbatch options, use a double # (as in ##SBATCH) to comment out a directive.
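Since plain bash treats #SBATCH lines as ordinary comments, a job script can be smoke-tested locally before it is submitted. The following is a sketch under that assumption; the temporary directory and the 1G override are illustrative choices, and the mail options are left out on purpose:

```shell
#!/bin/bash
# Write a job script to a temporary directory (hypothetical location).
workdir=$(mktemp -d)
cat > "$workdir/helloworld.sh" <<'EOF'
#!/bin/bash
#SBATCH --time=1:05:00
#SBATCH --partition=naples
#SBATCH --mem=15000

echo "hello world"
EOF

# To plain bash the #SBATCH lines are comments, so this runs locally.
bash "$workdir/helloworld.sh"

# Submit only where sbatch exists. Options given on the command line
# take precedence over the #SBATCH directives in the script.
if command -v sbatch >/dev/null 2>&1; then
    sbatch --mem=1G "$workdir/helloworld.sh"
fi
```

Overriding on the command line is handy for one-off experiments, because the script itself stays unchanged.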

See running jobs

You can see all running jobs with

squeue

To see only your jobs

squeue -u $(whoami)

To investigate more details about the job, use

scontrol show jobid=$JOBID

where $JOBID is one of the IDs given by squeue.

Cancel Job(s)

You cancel a job by running

scancel $JOBID

where $JOBID is the ID given by, e.g., squeue.

To cancel a range of jobs (say, job IDs 100 to 900), you can use bash brace expansion in a one-liner:

scancel {100..900}
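Brace expansion only works with literal numbers, because bash expands braces before it substitutes variables. When the range bounds are held in variables, seq is one alternative. A small sketch, with first and last as hypothetical variable names and the scancel call left as a dry run:

```shell
#!/bin/bash
first=100
last=105

# Brace expansion happens before variable expansion, so this stays literal.
echo {$first..$last}        # prints {100..105}

# seq accepts variables and prints one ID per line.
job_ids=$(seq "$first" "$last")
echo $job_ids               # prints 100 101 102 103 104 105

# Dry run: remove the leading echo to actually cancel the jobs.
echo scancel $job_ids
```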

You can also cancel all of your jobs by

scancel --user=$(whoami)

Measuring time of a process

You can conveniently use /usr/bin/time to measure the performance of your binary. Just prepend it, with a format string, to your call:

/usr/bin/time -f "@@@%e,%M@@@" echo "hello timing"

This will output the following:

hello timing
@@@0.00,1960@@@

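The text between the @@@ markers is the elapsed wall-clock time in seconds (%e) followed by the maximum resident set size in kilobytes (%M).

Because /usr/bin/time writes its report to stderr, use &> to capture both the program output and the timing in a file: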

/usr/bin/time -f "@@@%e,%M@@@" echo "hello timing" &> filename

You can conveniently pick this up with grep as follows:

grep -oP "(?<=@@@).*(?=@@@)" filename

which will give you 0.00,1960.
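To split the record into its two fields, something like the following could be used. It is only a sketch: parse_timing is a hypothetical helper, and it uses sed instead of grep -P so that it also works where grep lacks PCRE support.

```shell
#!/bin/bash
# Hypothetical helper: report the last @@@seconds,kilobytes@@@ record in a file.
parse_timing() {
    local record
    # Keep only the text between the markers; take the last match in the file.
    record=$(sed -n 's/.*@@@\(.*\)@@@.*/\1/p' "$1" | tail -n 1)
    [ -n "$record" ] || return 1          # no marker found
    # Text before the first comma is the time, text after it is the memory.
    echo "seconds=${record%%,*} max_rss_kb=${record#*,}"
}

# Demo using the sample output from this section.
logfile=$(mktemp)
printf 'hello timing\n@@@0.00,1960@@@\n' > "$logfile"
parse_timing "$logfile"                   # prints seconds=0.00 max_rss_kb=1960
rm -f "$logfile"
```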

Cancel all jobs on hold by failed dependency

To cancel all of your jobs that are stuck with the reason DependencyNeverSatisfied, run

squeue -u $(whoami) | grep DependencyNeverSatisfied | awk '{print $1}' | xargs scancel
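Grepping the default squeue table can also match job names that happen to contain the search word. A possibly more robust variant asks squeue for just the job ID and the reason and compares the reason exactly. This is a sketch: never_satisfied_ids is a hypothetical helper name, and xargs -r (skip the command on empty input) is a GNU extension.

```shell
#!/bin/bash
# Hypothetical helper: read "JOBID REASON" lines on stdin and print the
# IDs whose reason is exactly DependencyNeverSatisfied.
never_satisfied_ids() {
    awk '$2 == "DependencyNeverSatisfied" { print $1 }'
}

# Only query Slurm where squeue is installed.
if command -v squeue >/dev/null 2>&1; then
    # -h drops the header line; -o "%i %r" prints job ID and reason only.
    squeue -u "$(whoami)" -h -o "%i %r" | never_satisfied_ids | xargs -r scancel
fi
```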