IMSE 8410 HPC Clusters

IMSE 8410 Advanced Computational Systems and Data Engineering

High Performance Computing (HPC) Clusters

"It does not fit on my laptop. Now what?"

Objectives

In this module we will learn about High Performance Computing (HPC) clusters, and specifically about the SLURM cluster workload manager. By the end of this module, students should be able to do the following:

Describe the basics of a cluster and a workload manager.
Determine what resources are available on the cluster and their status (nodes, partitions, jobs and the queue, and priority) by using sinfo, squeue, scontrol, sshare, and sacctmgr.
Show the status and resource utilization of previous jobs using sacct.
Show summary information about cluster utilization by user and account using sreport.
Use the srun command to run interactive jobs.
Create simple jobfiles and use sbatch to submit them.
Specify job resources (time, memory, cores, nodes, tasks).
Run multi-core and multi-node test jobs. Understand the difference between 'multi-task', 'multi-core', and 'multi-node' jobs and how to specify them.
Use the SLURM_ environment variables to show information about the current job and task.
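
As a rough illustration of the jobfile, resource, and SLURM_ variable objectives above, here is a minimal Python sketch. The helper names `render_jobfile` and `slurm_info`, the default resource values, and the `srun hostname` payload are assumptions made for this example and are not part of the course material. The `#SBATCH` options and the SLURM_ variable names are standard ones, but partition names and resource limits differ between clusters, so check your site's documentation before submitting anything.

```python
import os


def render_jobfile(job_name, time="00:05:00", mem="1G",
                   nodes=1, ntasks=1, cpus_per_task=1):
    """Return the text of a minimal sbatch jobfile (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time}",                    # wall-clock limit (HH:MM:SS)
        f"#SBATCH --mem={mem}",                      # memory per node
        f"#SBATCH --nodes={nodes}",                  # more than 1: multi-node job
        f"#SBATCH --ntasks={ntasks}",                # more than 1: multi-task job
        f"#SBATCH --cpus-per-task={cpus_per_task}",  # more than 1: multi-core tasks
        "",
        "# Assumed payload: each task reports the node it runs on.",
        "srun hostname",
    ]
    return "\n".join(lines) + "\n"


def slurm_info(env=None):
    """Collect a few SLURM_ variables; values are 'unset' outside a job."""
    env = os.environ if env is None else env
    names = [
        "SLURM_JOB_ID",         # id of the current job
        "SLURM_JOB_NODELIST",   # nodes allocated to the job
        "SLURM_NTASKS",         # total number of tasks
        "SLURM_CPUS_PER_TASK",  # set only when --cpus-per-task was given
        "SLURM_PROCID",         # rank of the current task
    ]
    return {name: env.get(name, "unset") for name in names}


if __name__ == "__main__":
    # Print a jobfile for 2 nodes, 4 tasks, and 2 cores per task.
    print(render_jobfile("demo", nodes=2, ntasks=4, cpus_per_task=2))
    # Show what the current process can see of its SLURM environment.
    for name, value in slurm_info().items():
        print(f"{name}={value}")
```

Saving the rendered text to a file (for example `demo.job`, a hypothetical name) and running `sbatch demo.job` would submit it. Calling `slurm_info()` from a script launched with `srun` inside that job should then report real values instead of `unset`.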

Reading

Required Reading

High Performance Computing: Modern Systems and Practices (Chapter 5, The Essential Resource Management)
HPC Carpentry (Scheduler): https://hpc-carpentry.github.io/hpc-intro/13-scheduler/index.html

Additional Resources

SLURM Workload Manager: https://slurm.schedmd.com/quickstart.html
HPC Carpentry: https://hpc-carpentry.github.io/hpc-intro/
