From b7093feda280085fafe7980837ae0a16dc9ba06e Mon Sep 17 00:00:00 2001 From: hyandt Date: Tue, 3 Oct 2023 13:58:41 -0600 Subject: [PATCH 1/4] Initial partitions table --- docs/Documentation/Systems/Kestrel/running.md | 26 +++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/docs/Documentation/Systems/Kestrel/running.md b/docs/Documentation/Systems/Kestrel/running.md index 3cc6192e0..987bb05d7 100644 --- a/docs/Documentation/Systems/Kestrel/running.md +++ b/docs/Documentation/Systems/Kestrel/running.md @@ -1,3 +1,29 @@ --- title: Running on Kestrel --- +# Kestrel Job Partitions and Scheduling Policies +*Learn about job partitions and policies for scheduling jobs on Eagle.* + +## Partitions + +Kestrel nodes are associated with one or more partitions. Each partition is associated with one or more job characteristics, which include run time, per-node memory requirements, per-node local scratch disk requirements, and whether graphics processing units (GPUs) are needed. + +Jobs will be automatically routed to the appropriate partitions by Slurm based on node quantity, walltime, hardware features, and other aspects specified in the submission. Jobs will have access to the largest number of nodes, thus shortest wait, **if the partition is not specified during job submission.** + +The following table summarizes the partitions on Eagle. + +| Partition Name | Description | Limits | Placement Condition | +| -------------- | ------------- | ------ | ------------------- | +| ```debug``` | Nodes dedicated to developing and
troubleshooting jobs. Debug nodes
with each of the non-standard
hardware configurations are available.
The node-type distribution is:
- 4 GPU nodes
- 2 Bigmem nodes
- 7 standard nodes
- **13 total nodes** | 1 job with a
max of 2 nodes
per user
01:00:00 max walltime | ```-p debug```
or
```--partition=debug``` | +|```short``` | Nodes that prefer jobs with walltimes <= 4 hours | No partition limit.
No limit per user. | ```--time <= 4:00:00```
```--mem <= 85248 (1800 nodes)```
```--mem <= 180224 (720 nodes)```| +| ```standard``` | Nodes that prefer jobs with walltimes <= 2 days | 2100 nodes total
1050 nodes per user | ```--time <= 2-00```
```--mem <= 85248 (1800 nodes)```
```--mem <= 180224 (720 nodes)```| +| ```long``` | Nodes that prefer jobs with walltimes > 2 days
*Maximum walltime of any job is 10 days*| 525 nodes total
262 nodes per user| ```--time <= 10-00```
```--mem <= 85248 (1800 nodes)```
```--mem <= 180224 (720 nodes)```| +|```bigmem``` | Nodes that have 768 GB of RAM | 90 nodes total
45 nodes per user | ```--mem > 180224``` | + + +Use the option listed above on the ```srun```, ```sbatch```, or ```salloc``` command or in your job script to specify what resources your job requires. Sample job scripts and the syntax for specifying the queue are available on the [sample job scripts page](./sample_sbatch.md). + +## Job Scheduling Policies +The [system configuration page](https://www.nrel.gov/hpc/eagle-system-configuration.html) lists the four categories that Eagle nodes exhibit based on their hardware features. No single user can have jobs running on more than half of the nodes from each hardware category. For example, the maximum quantity of data and analysis visualization (DAV) nodes a single job can use is 25. + +Also learn how [jobs are prioritized](./eagle_job_priorities.md). \ No newline at end of file From 2b7071fd3728930a7b14373d3988722af47812e1 Mon Sep 17 00:00:00 2001 From: hyandt Date: Tue, 17 Oct 2023 12:19:50 -0600 Subject: [PATCH 2/4] partitions table limits --- docs/Documentation/Systems/Kestrel/running.md | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/docs/Documentation/Systems/Kestrel/running.md b/docs/Documentation/Systems/Kestrel/running.md index 987bb05d7..1f1d083fc 100644 --- a/docs/Documentation/Systems/Kestrel/running.md +++ b/docs/Documentation/Systems/Kestrel/running.md @@ -2,7 +2,8 @@ title: Running on Kestrel --- # Kestrel Job Partitions and Scheduling Policies -*Learn about job partitions and policies for scheduling jobs on Eagle.* + +*Learn about job partitions and policies for scheduling jobs on Kestrel.* ## Partitions @@ -10,20 +11,24 @@ Kestrel nodes are associated with one or more partitions. Each partition is ass Jobs will be automatically routed to the appropriate partitions by Slurm based on node quantity, walltime, hardware features, and other aspects specified in the submission. Jobs will have access to the largest number of nodes, thus shortest wait, **if the partition is not specified during job submission.** -The following table summarizes the partitions on Eagle. +The following table summarizes the partitions on Kestrel. | Partition Name | Description | Limits | Placement Condition | | -------------- | ------------- | ------ | ------------------- | -| ```debug``` | Nodes dedicated to developing and
troubleshooting jobs. Debug nodes
with each of the non-standard
hardware configurations are available.
The node-type distribution is:
- 4 GPU nodes
- 2 Bigmem nodes
- 7 standard nodes
- **13 total nodes** | 1 job with a
max of 2 nodes
per user
01:00:00 max walltime | ```-p debug```
or
```--partition=debug``` | -|```short``` | Nodes that prefer jobs with walltimes <= 4 hours | No partition limit.
No limit per user. | ```--time <= 4:00:00```
```--mem <= 85248 (1800 nodes)```
```--mem <= 180224 (720 nodes)```| -| ```standard``` | Nodes that prefer jobs with walltimes <= 2 days | 2100 nodes total
1050 nodes per user | ```--time <= 2-00```
```--mem <= 85248 (1800 nodes)```
```--mem <= 180224 (720 nodes)```| +| ```debug``` | Nodes dedicated to developing and
troubleshooting jobs. Debug nodes with each of the non-standard hardware configurations are available. The node-type distribution is:
- 2 Bigmem nodes
- 2 nodes with 1.7 TB NVMe
- 4 standard nodes
- **8 total nodes** | - 1 job with a
max of 2 nodes
per user
- 01:00:00 max walltime | ```-p debug```
or
```--partition=debug``` | +|```short``` | Nodes that prefer jobs with walltimes <= 4 hours | No partition limit.
No limit per user. | ```--time <= 4:00:00```
```--mem <= 250000```
```--tmp<=1700000 (256 nodes)```
2106 nodes | +| ```standard``` | Nodes that prefer jobs with walltimes <= 2 days | 2100 nodes total
1050 nodes per user | ```--mem <= 250000```
```--tmp<=1700000````| +|```bigmem``` | Nodes that have 2 TB of RAM with jobs of walltimes <= 2 days | 8 nodes total
45 nodes per user | ```--mem > 2000000``` ```--time <= 2-00`` ```--tmp > 5800000``` | +|```bigmeml``` | Nodes that have 2 TB of RAM and jobs with walltimes >2 days | 4 nodes total
3 nodes per user | ```--mem > 180224``` | | ```long``` | Nodes that prefer jobs with walltimes > 2 days
*Maximum walltime of any job is 10 days*| 525 nodes total
262 nodes per user| ```--time <= 10-00```
```--mem <= 85248 (1800 nodes)```
```--mem <= 180224 (720 nodes)```| -|```bigmem``` | Nodes that have 768 GB of RAM | 90 nodes total
45 nodes per user | ```--mem > 180224``` | + + + Use the option listed above on the ```srun```, ```sbatch```, or ```salloc``` command or in your job script to specify what resources your job requires. Sample job scripts and the syntax for specifying the queue are available on the [sample job scripts page](./sample_sbatch.md). ## Job Scheduling Policies -The [system configuration page](https://www.nrel.gov/hpc/eagle-system-configuration.html) lists the four categories that Eagle nodes exhibit based on their hardware features. No single user can have jobs running on more than half of the nodes from each hardware category. For example, the maximum quantity of data and analysis visualization (DAV) nodes a single job can use is 25. +The [Kestrel system configuration page](https://www.nrel.gov/hpc/kestrel-system-configuration.html) lists the four categories that Kestrel nodes exhibit based on their hardware features. No single user can have jobs running on more than half of the nodes from each hardware category. For example, the maximum quantity of data and analysis visualization (DAV) nodes a single job can use is 25. Also learn how [jobs are prioritized](./eagle_job_priorities.md). \ No newline at end of file From f0328fafa29b77e6382bf4c204222ec714e8077d Mon Sep 17 00:00:00 2001 From: hyandt Date: Tue, 17 Oct 2023 14:33:50 -0600 Subject: [PATCH 3/4] formatting --- docs/Documentation/Systems/Kestrel/running.md | 24 ++++++++++--------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/docs/Documentation/Systems/Kestrel/running.md b/docs/Documentation/Systems/Kestrel/running.md index 1f1d083fc..ce43a3513 100644 --- a/docs/Documentation/Systems/Kestrel/running.md +++ b/docs/Documentation/Systems/Kestrel/running.md @@ -11,19 +11,17 @@ Kestrel nodes are associated with one or more partitions. Each partition is ass Jobs will be automatically routed to the appropriate partitions by Slurm based on node quantity, walltime, hardware features, and other aspects specified in the submission. Jobs will have access to the largest number of nodes, thus shortest wait, **if the partition is not specified during job submission.** -The following table summarizes the partitions on Kestrel. +The following table summarizes the partitions on Kestrel: + | Partition Name | Description | Limits | Placement Condition | | -------------- | ------------- | ------ | ------------------- | -| ```debug``` | Nodes dedicated to developing and
troubleshooting jobs. Debug nodes with each of the non-standard hardware configurations are available. The node-type distribution is:
- 2 Bigmem nodes
- 2 nodes with 1.7 TB NVMe
- 4 standard nodes
- **8 total nodes** | - 1 job with a
max of 2 nodes
per user
- 01:00:00 max walltime | ```-p debug```
or
```--partition=debug``` | -|```short``` | Nodes that prefer jobs with walltimes <= 4 hours | No partition limit.
No limit per user. | ```--time <= 4:00:00```
```--mem <= 250000```
```--tmp<=1700000 (256 nodes)```
2106 nodes | -| ```standard``` | Nodes that prefer jobs with walltimes <= 2 days | 2100 nodes total
1050 nodes per user | ```--mem <= 250000```
```--tmp<=1700000````| -|```bigmem``` | Nodes that have 2 TB of RAM with jobs of walltimes <= 2 days | 8 nodes total
45 nodes per user | ```--mem > 2000000``` ```--time <= 2-00`` ```--tmp > 5800000``` | -|```bigmeml``` | Nodes that have 2 TB of RAM and jobs with walltimes >2 days | 4 nodes total
3 nodes per user | ```--mem > 180224``` | -| ```long``` | Nodes that prefer jobs with walltimes > 2 days
*Maximum walltime of any job is 10 days*| 525 nodes total
262 nodes per user| ```--time <= 10-00```
```--mem <= 85248 (1800 nodes)```
```--mem <= 180224 (720 nodes)```| - - - +| ```debug``` | Nodes dedicated to developing and
troubleshooting jobs. Debug nodes
with each of the non-standard
hardware configurations are available.
The node-type distribution is:
- 2 Bigmem nodes
- 2 nodes with 1.7 TB NVMe
- 4 standard nodes
- **8 total nodes** | 1 job with a
max of 2 nodes
per user
01:00:00 max walltime | ```-p debug```
or
```--partition=debug``` | +|```short``` | Nodes that prefer jobs with walltimes <= 4 hours | 2016 nodes total.
No limit per user. | ```--time <= 4:00:00```
```--mem <= 250000```
```--tmp<=1700000 (256 nodes)```| +| ```standard``` | Nodes that prefer jobs with walltimes <= 2 days. | 2106 nodes total.
1050 nodes per user. | ```--mem <= 250000```
```--tmp <= 1700000```| +| ```long``` | Nodes that prefer jobs with walltimes > 2 days.
*Maximum walltime of any job is 10 days*| 525 nodes total
262 nodes per user| ```--time <= 10-00```
```--mem <= 250000```
```--tmp <= 1700000 (256 nodes)```| +|```bigmem``` | Nodes that have 2 TB of RAM and 5.8 TB NVMe local disk. | 8 nodes total
4 nodes per user | ```--mem > 250000```
```--time <= 2-00```
```--tmp > 1700000 ``` | +|```bigmeml``` | Bigmem nodes that prefer jobs with walltimes > 2 days.
*Maximum walltime of any job is 10 days* | 4 nodes total
3 nodes per user | ```--mem > 250000```
```--time > 2-00```
```--tmp > 1700000 ``` | Use the option listed above on the ```srun```, ```sbatch```, or ```salloc``` command or in your job script to specify what resources your job requires. Sample job scripts and the syntax for specifying the queue are available on the [sample job scripts page](./sample_sbatch.md). @@ -31,4 +29,8 @@ Use the option listed above on the ```srun```, ```sbatch```, or ```salloc``` com ## Job Scheduling Policies The [Kestrel system configuration page](https://www.nrel.gov/hpc/kestrel-system-configuration.html) lists the four categories that Kestrel nodes exhibit based on their hardware features. No single user can have jobs running on more than half of the nodes from each hardware category. For example, the maximum quantity of data and analysis visualization (DAV) nodes a single job can use is 25. -Also learn how [jobs are prioritized](./eagle_job_priorities.md). \ No newline at end of file +Also learn how [jobs are prioritized](./eagle_job_priorities.md). + + +## Job Submission Recommendations + From 4620c123b55abe8311794ef80861c834220cbb66 Mon Sep 17 00:00:00 2001 From: hyandt Date: Wed, 18 Oct 2023 16:49:35 -0600 Subject: [PATCH 4/4] partitions table --- docs/Documentation/Systems/Kestrel/running.md | 37 +++++++++++++++---- 1 file changed, 30 insertions(+), 7 deletions(-) diff --git a/docs/Documentation/Systems/Kestrel/running.md b/docs/Documentation/Systems/Kestrel/running.md index ce43a3513..7ce67483f 100644 --- a/docs/Documentation/Systems/Kestrel/running.md +++ b/docs/Documentation/Systems/Kestrel/running.md @@ -7,30 +7,53 @@ title: Running on Kestrel ## Partitions -Kestrel nodes are associated with one or more partitions. Each partition is associated with one or more job characteristics, which include run time, per-node memory requirements, per-node local scratch disk requirements, and whether graphics processing units (GPUs) are needed. +Kestrel nodes are associated with one or more partitions. Each partition is associated with one or more job characteristics, which include run time, per-node memory requirements, and per-node local scratch disk requirements. Jobs will be automatically routed to the appropriate partitions by Slurm based on node quantity, walltime, hardware features, and other aspects specified in the submission. Jobs will have access to the largest number of nodes, thus shortest wait, **if the partition is not specified during job submission.** +The [Kestrel system configuration page](https://www.nrel.gov/hpc/kestrel-system-configuration.html) lists the four categories that Kestrel nodes exhibit based on their hardware features. + The following table summarizes the partitions on Kestrel: | Partition Name | Description | Limits | Placement Condition | | -------------- | ------------- | ------ | ------------------- | | ```debug``` | Nodes dedicated to developing and
troubleshooting jobs. Debug nodes
with each of the non-standard
hardware configurations are available.
The node-type distribution is:
- 2 Bigmem nodes
- 2 nodes with 1.7 TB NVMe
- 4 standard nodes
- **8 total nodes** | 1 job with a
max of 2 nodes
per user
01:00:00 max walltime | ```-p debug```
or
```--partition=debug``` | -|```short``` | Nodes that prefer jobs with walltimes <= 4 hours | 2016 nodes total.
No limit per user. | ```--time <= 4:00:00```
```--mem <= 250000```
```--tmp<=1700000 (256 nodes)```| +|```short``` | Nodes that prefer jobs with walltimes <= 4 hours | 2016 nodes total.
No limit per user. | ```--time <= 4:00:00```
```--mem <= 250000```
```--tmp <= 1700000 (256 nodes)```| | ```standard``` | Nodes that prefer jobs with walltimes <= 2 days. | 2106 nodes total.
1050 nodes per user. | ```--mem <= 250000```
```--tmp <= 1700000```| | ```long``` | Nodes that prefer jobs with walltimes > 2 days.
*Maximum walltime of any job is 10 days*| 525 nodes total
262 nodes per user| ```--time <= 10-00```
```--mem <= 250000```
```--tmp <= 1700000 (256 nodes)```| |```bigmem``` | Nodes that have 2 TB of RAM and 5.8 TB NVMe local disk. | 8 nodes total
4 nodes per user | ```--mem > 250000```
```--time <= 2-00```
```--tmp > 1700000``` |
|```bigmeml``` | Bigmem nodes that prefer jobs with walltimes > 2 days.
*Maximum walltime of any job is 10 days* | 4 nodes total
3 nodes per user | ```--mem > 250000```
```--time > 2-00```
```--tmp > 1700000``` |


Use the options listed above with the ```srun```, ```sbatch```, or ```salloc``` command, or in your job script, to specify what resources your job requires.

!!! note
    For now, more information on Slurm and job submission script examples can be found under the [Eagle Running Jobs section](../Eagle/Running/index.md).

## Job Submission Recommendations

#### OpenMP

When running codes with OpenMP enabled, we recommend manually setting one of the following environment variables:

```
export OMP_PROC_BIND=spread     # for codes not built with Intel compilers

export KMP_AFFINITY=balanced    # for codes built with Intel compilers
```

You may need to export these variables even if you are not running your job with threading, i.e., with `OMP_NUM_THREADS=1`.

#### Scaling

Currently, some applications on Kestrel do not scale as well as expected. For these applications, we recommend:

1. Submitting jobs with the fewest number of nodes possible.
2. For hybrid MPI/OpenMP codes, requesting more threads per task than you tend to request on Eagle. This may yield performance improvements.
3. Building and running with Intel MPI or Cray MPICH, rather than Open MPI.
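
To tie the partition options and OpenMP recommendations together, here is a minimal sketch of a batch script for a hybrid MPI/OpenMP job. The account string, node and task counts, and executable name (```my_hybrid_app```) are placeholders rather than Kestrel-specific values; adjust them for your allocation and application, and see the Eagle running-jobs pages linked above for fuller examples.

```
#!/bin/bash
#SBATCH --account=<your_allocation>    # placeholder: replace with your project handle
#SBATCH --job-name=hybrid_example
#SBATCH --nodes=2                      # keep node counts small while scaling issues persist
#SBATCH --ntasks-per-node=26           # example MPI rank count; tune for your application
#SBATCH --cpus-per-task=4              # several OpenMP threads per rank
#SBATCH --time=4:00:00                 # <= 4 hours, so Slurm can route the job to short

# Thread-affinity settings recommended above
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PROC_BIND=spread       # for codes not built with Intel compilers
# export KMP_AFFINITY=balanced    # use instead for codes built with Intel compilers

srun ./my_hybrid_app
```

Because no ```--partition``` flag is given, Slurm routes the job based on the requested walltime, memory, and local disk, which, per the table above, generally gives access to the largest pool of nodes and the shortest wait.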