From f30f4b0179ff6fbfedbe5d34d2f82a196efea4df Mon Sep 17 00:00:00 2001 From: Kenneth Hoste Date: Thu, 7 Sep 2023 11:08:20 +0200 Subject: [PATCH 01/11] add documentation on EESSI test suite (WIP) --- docs/test-suite.md | 361 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 361 insertions(+) create mode 100644 docs/test-suite.md diff --git a/docs/test-suite.md b/docs/test-suite.md new file mode 100644 index 000000000..96a6f6ad8 --- /dev/null +++ b/docs/test-suite.md @@ -0,0 +1,361 @@ +# EESSI test suite + +[toc] + +## Installation + +### Requirements + +The EESSI test suite requires Python >= 3.6 and [ReFrame](https://reframe-hpc.readthedocs.io). + +### Installing Reframe (incl. `hpctestlib`) + +You need to make sure that [ReFrame](https://reframe-hpc.readthedocs.io) is available - that is, the `reframe` command should work: + +```bash +reframe --version +``` + +General instructions for installing ReFrame are available in the [ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/started.html). +The EESSI test suite requires ReFrame v4.0 or newer. + +#### `hpctestlib` ReFrame component + +The EESSI test suite requires the [`hpctestlib`](https://github.com/reframe-hpc/reframe/tree/develop/hpctestlib) component of ReFrame, +which is currently not included in a standard installation. + +We recommend installing ReFrame using [EasyBuild](https://easybuild.io/), +or using a ReFrame installation that is available in EESSI (pilot version 2023.06, or newer). + +For example: + +```bash +source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash +module load ReFrame/4.2.0 +``` + +To check whether the `hpctestlib` component of ReFrame is available, +try importing the Python package: + +```bash +python3 -c 'import hpctestlib' +``` + +### Installing the EESSI test suite + +To install the EESSI test suite, you can either use `pip` or clone the GitHub repository directly: + +#### Using `pip` + +```bash +pip install git+https://github.com/EESSI/test-suite.git +``` + +#### Cloning the repository + +```bash +git clone https://github.com/EESSI/test-suite EESSI-test-suite +cd EESSI-test-suite +export PYTHONPATH=$PWD:$PYTHONPATH +``` + +#### Check installation + +To check whether the EESSI test suite installed correctly, +try importing the `eessi.testsuite` Python package: + +```bash +python3 -c 'import eessi.testsuite' +``` + +## Configuration + +Before you can run the EESSI test suite, you need to create a configuration file for ReFrame that is specific to the system on which the tests will be run. + +Example configuration files are available [in the `EESSI/test-suite` GitHub repository in the `config` subdirectory](https://github.com/EESSI/test-suite/tree/main/config), which you can use as a template to create your own. + +### Configuring ReFrame + +We recommend configuring ReFrame by setting a couple of `$RFM_*` environment variables, to avoid that you need to include particular options to the `reframe` command over and over again. 
+ +#### ReFrame configuration file (`$RFM_CONFIG_FILES`) + +*(see also [ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_CONFIG_FILES))* + +``` +export RFM_CONFIG_FILES=$HOME/EESSI-test-suite/config/example.py +``` + +#### Search path for tests (`$RFM_CHECK_SEARCH_PATH`) + +*(see also [ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_CHECK_SEARCH_PATH))* + +``` +export RFM_CHECK_SEARCH_PATH=$HOME/EESSI-test-suite/eessi/testsuite/tests +export RFM_CHECK_SEARCH_RECURSIVE=1 +``` + +**FIXME** explain why recursive needs to be enabled + +### System configuration file + +**FIXME** see Vega as reference example? + +* partitions (incl. features, access, launcher, scheduler, name + partition (cfr. `--system`)) + +### Auto-detection of processor information + +You can let ReFrame [auto-detect the processor information](https://reframe-hpc.readthedocs.io/en/stable/configure.html#proc-autodetection) for your system. + +ReFrame will automatically use auto-detection if the `partitions` section +of you configuration file does not specify `processor` information for a +particular partition, and `remote_detect` is enabled. + +To trigger the auto-detection of processor information, it is sufficient to +let ReFrame list the available tests: + +``` +reframe --list +``` + +ReFrame will store the processor information for your system in `~/.reframe/topology/-/processor.json`. + +#### Note + +If you are using Slurm, you may need to temporarily change the launcher to `srun` in your configuration for auto-detection of processor information to work correctly. + +See the [example AWS configuration file](https://github.com/EESSI/test-suite/blob/main/config/aws_citc.py), and [ReFrame issue #2926](https://github.com/reframe-hpc/reframe/issues/2926) for more information. + +In addition, auto-detection does not work if ReFrame was installed directly +from PyPI, see [ReFrame issue #2914](https://github.com/reframe-hpc/reframe/issues/2914). +**FIXME** auto-detection also doesn't work with installation in EasyBuild/EESSI + +## Running tests + +### Listing available tests + +To list the tests that are available in the EESSI test suite, +use `reframe --list` (or `reframe -L` for short). + +If you have properly [configured ReFrame](#Configuring-ReFrame), you should +see a (potentially long) list of checks in the output: + +``` +$ reframe --list +... +[List of matched checks] +- ... +Found 1234 tests +``` + +**FIXME Kenneth** checks are only generated for available modules + +### Performing a dry-run + +To perform a dry run of the EESSI test suite, use `reframe --dry-run`: + +``` +$ reframe --dry-run +... +[==========] Running 1234 check(s) + +[----------] start processing checks +[ DRY ] GROMACS_EESSI ... +``` + +**FIXME Kenneth** explain why this can be useful, contrast with `--list` (which doesn't take into account partitions) + +### Running the (full) test suite + +To actually run the (full) EESSI test suite and let ReFrame +produce a performance report, use `reframe --run --performance-report`. + +We recommend filtering the tests that will be run however, [see below](#Filtering-tests). + +### ReFrame output and log files + +**FIXME** + +- `--prefix` to control where output goes +- +- relation with common logging setup +- ReFrame log, perf log, output dirs, staging dirs, ... 
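+
+Until this section is fleshed out, here is a minimal sketch of keeping all ReFrame-generated files in one place. The directory names below are ReFrame defaults and may differ depending on your configuration:
+
+```bash
+# write all ReFrame-generated files under a dedicated directory
+reframe --prefix $HOME/reframe_runs --run
+# afterwards, job scripts, stdout/stderr and performance logs end up under
+# $HOME/reframe_runs (e.g. in the output/, stage/ and perflogs/ subdirectories)
+```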
+ +### Filtering tests + +By default, ReFrame will automatically generate checks for each system partition, +based on the tests available in the EESSI test suite, available software modules, and tags defined in the EESSI test suite. + +To avoid being overwhelmed by checks, it is recommend to +[apply filters](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#test-filtering) so ReFrame only generates the checks you are interested in. + +#### Filtering by test name + +**FIXME** `--name` + +#### Filtering by system (partition) + +**FIXME** Cover both for specific system/partition + +By default, ReFrame will generate checks for each system partition +that is listed in your configuration file. + +To only let ReFrame checks for a particular system partition, +you can use the [`--system` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-system). + +For example, to let ReFrame only generate checks for the `part_one` partition +of the system named `example`, use: + +``` +reframe --system example:part_one ... +``` + +Use the `--dry-run` option to check the impact of this. + +#### Filtering by tags + +To filter tests using one or more tags, you can use the [`--tag` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-0). + +Using `--list-tags` you can get a list of known tags. + +To check the impact of this on generated checks by ReFrame, use `--list`. + +##### `CI` tag + +For each software that is supported by the test suite, +a small test is tagged with `CI` to indicate it can be used in a Continuous Integration (CI) environment. + +Hence, you can use this tag to let ReFrame only generate checks for small test cases: + +``` +reframe --tag CI +``` + +For example: + +``` +$ reframe --name GROMACS --tag CI +... +FIXME OUTPUT +``` + +##### `scale` tags + +The EESSI test suite defines a set of custom tags that control the *scale* +of tests, that is how many resources will be used for running it. + +| tag name | description | +|:--------:|-------------| +| `1_core` | using a single CPU core, or single GPU | +| `2_cores` | using 2 CPU cores, or 2 GPUs | +| `4_cores` | using 4 CPU cores, or 4 GPUs | +| `1_8_node` | using 1/8th of a node (12.5% of available cores/GPUs, 1 at minimum) | +| `1_4_node` | using a quarter of a node (25% of available cores/GPUs, 1 at minimum) | +| `1_2_node` | using half of a node (50% of available cores/GPUs, 1 at minimum) | +| `1_node` | using a full node (all available cores/GPUs) | +| `2_nodes` | using 2 full nodes | +| `4_nodes` | using 4 full nodes | +| `8_nodes` | using 8 full nodes | +| `16_nodes` | using 16 full nodes | + +##### Using multiple tags + +To filter tests using multiple tags, you can: + +* use `|` as separator to indicate that one of the specified tags must match (logical OR, for example `--tag='1_core|2_cores'`); +* use the `--tag` option multiple times to indicate that all specified tags must match (logical AND, for example `--tag CI --tag 1_core`); + +#### Filtering by modules + +**FIXME** This is not really filtering, but overriding default behaviour (see also https://github.com/EESSI/test-suite#changing-the-default-test-behavior-on-the-cmd-line), should use `--name` instead - add warning that this is advanced usage + +By default, ReFrame will generate checks for each available software module +that can be used to run a particular test (for example, all available GROMACS modules will be used once to run each GROMACS test). 
+ +To only run the tests with specific modules, use the `--setvar modules=...` option. + +You can use the `--list` option to check the impact on checks that ReFrame generates. + +For example: + +``` +reframe --setvar modules=GROMACS/2021.3-foss-2021a --list +``` + +### Example commands + +#### Running all GROMACS tests on 4 cores + +``` +reframe --name GROMACS --tag 4_cores --run +``` + +#### Running all GROMACS tests using a specific GROMACS module + +``` +reframe --setvar modules=GROMACS/2021.3-foss-2021a --run +``` + +## Available tests + +The EESSI test suite currently includes tests for: + +* [GROMACS](#GROMACS) +* [TensorFlow](#TensorFlow) + +For a complete overview of all available tests in the EESSI test suite, see . + +### GROMACS + +using GROMACS test in ReFrame test library + +https://www.hecbiosim.ac.uk/access-hpc/benchmarks + +Example run: + + +### TensorFlow + +Example run: + +``` +[ReFrame Setup] + version: 4.2.0 + command: '/readonly/dodrio/apps/RHEL8/zen2-ib/software/ReFrame/4.2.0/bin/reframe --config-file config/vsc_hortense.py --checkpath eessi/testsuite/tests/apps/tensorflow --name TensorFlow/2.11 --tag 1_core --system hortense:cpu_rome_512gb --run --performance-report' + launched by: vsc46128@login55.dodrio.os + working directory: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite' + settings files: '', 'config/vsc_hortense.py' + check search path: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/eessi/testsuite/tests/apps/tensorflow' + stage directory: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/stage' + output directory: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/output' + log files: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.log', '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.out' + +[==========] Running 1 check(s) +[==========] Started on Mon Aug 28 10:12:38 2023 + +[----------] start processing checks +[ [32mRUN [0m ] TENSORFLOW_EESSI %scale=1_core %module_name=TensorFlow/2.11.0-foss-2022a %device_type=cpu /af8226d5 @hortense:cpu_rome_512gb+default +[ [32m OK[0m ] (1/1) TENSORFLOW_EESSI %scale=1_core %module_name=TensorFlow/2.11.0-foss-2022a %device_type=cpu /af8226d5 @hortense:cpu_rome_512gb+default +P: perf: 2770.757396498742 img/s (r:0, l:None, u:None) +[----------] all spawned checks have finished + +[ [32m PASSED [0m ] Ran 1/1 test case(s) from 1 check(s) (0 failure(s), 0 skipped, 0 aborted) +[==========] Finished on Mon Aug 28 10:16:32 2023 + +========================================================================================================================================================= +PERFORMANCE REPORT +--------------------------------------------------------------------------------------------------------------------------------------------------------- +[TENSORFLOW_EESSI %scale=1_core %module_name=TensorFlow/2.11.0-foss-2022a %device_type=cpu /af8226d5 @hortense:cpu_rome_512gb:default] + num_cpus_per_task: 1 + num_tasks_per_node: 1 + num_tasks: 1 + performance: + - perf: 2770.757396498742 img/s (r: 0 img/s l: -inf% u: +inf%) +--------------------------------------------------------------------------------------------------------------------------------------------------------- +Log file(s) saved in '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.log', '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.out' +``` + +## Release notes + +v0.1.0 +- ... 
From 42154b28f2d7bc37b68a5bdb16c77c4514f16136 Mon Sep 17 00:00:00 2001 From: Kenneth Hoste Date: Wed, 20 Sep 2023 20:46:28 +0200 Subject: [PATCH 02/11] update draft of test suite docs --- docs/test-suite-full.md | 599 ++++++++++++++++++++++++++++++++++++++++ docs/test-suite.md | 361 ------------------------ 2 files changed, 599 insertions(+), 361 deletions(-) create mode 100644 docs/test-suite-full.md delete mode 100644 docs/test-suite.md diff --git a/docs/test-suite-full.md b/docs/test-suite-full.md new file mode 100644 index 000000000..023456f14 --- /dev/null +++ b/docs/test-suite-full.md @@ -0,0 +1,599 @@ +# EESSI test suite + +## Installation + +### Requirements + +The EESSI test suite requires Python >= 3.6 and [ReFrame](https://reframe-hpc.readthedocs.io). + +### Installing Reframe (incl. `hpctestlib`) + +You need to make sure that [ReFrame](https://reframe-hpc.readthedocs.io) is available - that is, the `reframe` command should work: + +```bash +reframe --version +``` + +General instructions for installing ReFrame are available in the [ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/started.html). +The EESSI test suite requires ReFrame v4.3.3 (or newer). + +#### `hpctestlib` ReFrame component + +The EESSI test suite requires the [`hpctestlib`](https://github.com/reframe-hpc/reframe/tree/develop/hpctestlib) component of ReFrame, +which is currently not included in a standard installation of ReFrame. + +We recommend installing ReFrame using [EasyBuild](https://easybuild.io/) (version 4.8.1, or newer), +or using a ReFrame installation that is available in EESSI (pilot version 2023.06, or newer). + +For example (using EESSI): + +```bash +source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash +module load ReFrame/4.2.0 +``` + +To check whether the `hpctestlib` component of ReFrame is available, +try importing the Python package: + +```bash +python3 -c 'import hpctestlib' +``` + +### Installing the EESSI test suite + +To install the EESSI test suite, you can either use `pip` or clone the GitHub repository directly: + +#### Using `pip` + +```bash +pip install git+https://github.com/EESSI/test-suite.git +``` + +#### Cloning the repository + +```bash +git clone https://github.com/EESSI/test-suite $HOME/EESSI-test-suite +cd EESSI-test-suite +export PYTHONPATH=$PWD:$PYTHONPATH +``` + +#### Check installation + +To check whether the EESSI test suite installed correctly, +try importing the `eessi.testsuite` Python package: + +```bash +python3 -c 'import eessi.testsuite' +``` + +## Configuration + +Before you can run the EESSI test suite, you need to create a configuration file for ReFrame that is specific to the system on which the tests will be run. + +Example configuration files are available [in the `EESSI/test-suite` GitHub repository in the `config` subdirectory](https://github.com/EESSI/test-suite/tree/main/config), which you can use as a template to create your own. + +### Configuring ReFrame environment variables + +We recommend setting a couple of `$RFM_*` environment variables to configure ReFrame, to avoid needing to include particular options to the `reframe` command over and over again. 
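+
+Once the variables described in the subsections below are set, a quick sanity check is to ask ReFrame to print the configuration it has picked up (`--show-config` is a standard ReFrame option):
+
+```bash
+reframe --show-config
+```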
+ +#### ReFrame configuration file (`$RFM_CONFIG_FILES`) + +*(see also [`RFM_CONFIG_FILES` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_CONFIG_FILES))* + +Define `$RFM_CONFIG_FILES` to tell ReFrame which configuration file to use, for example: + +```bash +export RFM_CONFIG_FILES=$HOME/EESSI-test-suite/config/example.py +``` + +#### Search path for tests (`$RFM_CHECK_SEARCH_PATH`) + +*(see also [`RFM_CHECK_SEARCH_PATH` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_CHECK_SEARCH_PATH))* + +Define `$RFM_CHECK_SEARCH_PATH` to tell ReFrame which directory to search for tests. In addition, define `$RFM_CHECK_SEARCH_RECURSIVE` to ensure that ReFrame searches `$RFM_CHECK_SEARCH_PATH` recursively (i.e. so that also tests in subdirectories are found). + +For example: + +```bash +export RFM_CHECK_SEARCH_PATH=$HOME/EESSI-test-suite/eessi/testsuite/tests +export RFM_CHECK_SEARCH_RECURSIVE=1 +``` + +#### ReFrame prefix (`$RFM_PREFIX`) + +*(see also [`RFM_PREFIX` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_PREFIX))* + +Define `$RFM_PREFIX` to tell ReFrame where to store the files it produces. E.g. + +``` +export RFM_PREFIX=$HOME/reframe_runs +``` +This involves: + +* test output directories (which contain e.g. the job script, stderr and stdout for each of the test jobs) +* staging directories (unless otherwise specified by `staging`, see below); +* performance logs; + +If our common logging configuration ([see "Logging" below](#logging)) is used, the regular ReFrame log file will also end up in the location specified by `$RFM_PREFIX`. + +Note that the default is for ReFrame to use the current directory as prefix. We recommend setting a prefix so that logs are not scattered around and nicely appended for each run. + +### The ReFrame configuration file + +In order for ReFrame to run tests on your system, it needs to know some properties about your system. For example, it needs to know what kind of scheduler you have, which partitions the system has, how to submit to those partitions, etc. All of this has to be described in a *ReFrame configuration file* (see also the section on `$RFM_CONFIG_FILES` above). + +The [official ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/configure.html) provides the full description on configuring ReFrame for your site. However, there are some configuration settings that are specifically required for the EESSI test suite. Also, there are a large amount of settings you can configure, which makes the official documentation potentially a bit overwhelming. + +Here, we will describe how to create a configuration file that works with the EESSI test suite, starting from an [example configuration file `settings_example.py`](https://github.com/EESSI/test-suite/tree/main/config/settings_example.py), which defines the most common configuration settings. You can look at other example configurations in the [config directory](https://github.com/EESSI/test-suite/tree/main/config/) for more inspiration. + +#### Python imports + +The EESSI test suite standardizes a few string-based values, as well as the logging format used by ReFrame. 
Every ReFrame configuration file used for running the EESSI test suite should therefore start with the following import statements: + +```python +from eessi.testsuite.common_config import common_logging_config +from eessi.testsuite.constants import * +``` + +#### High-level system info (`systems`) + +First, we describe the system at its highest level through the [`systems`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#systems) keyword. Note that you can define multiple systems in a single configuration file (`systems` is a Python list value). We recommend defining just a single system in each configuration file, as it makes the configuration file a bit easier to digest (for humans). + +An example of the `systems` section of the configuration file would be: + +```python +site_configuration = { + 'systems': [ + # We could list multiple systems. Here, we just define one + { + 'name': 'example', + 'descr': 'Example cluster', + 'modules_system': 'lmod', + 'hostnames': ['*'], + 'stagedir': f'/some/shared/dir/{os.environ.get("USER")}/reframe_output/staging', + 'partitions': [...], + } + ] +} +``` + +The most common configuration items defined at this level are: + +- [`name`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.name): The name of the system. Pick whatever makes sense for you. +- [`descr`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.descr): Description of the system. Again, pick whatever you like. +- [`modules_system`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.modules_system): The modules system used on your system. EESSI provides modules in `lmod` format (no need to change this, unless you want to run tests from the EESSI test suite with non-EESSI modules). +- [`hostnames`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.hostnames): The names of the hosts on which you will run the ReFrame command, as regular expression. Using these names, ReFrame can automatically determine which of the listed configurations in the `systems` list to use, which is useful if you're defining multiple systems in a single configuration file. If you follow our recommendation to limit yourself to one system per configuration file, simply define `'hostnames': ['*']`. +- [`prefix`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.prefix): Directory prefix for a ReFrame run on this system. Any directories or files produced by ReFrame will use this prefix, if not specified otherwise. We don't recommend setting `prefix`, but instead to set the environment variable `$RFM_PREFIX`, as our common logging configuration (see description below) can pick up on it. +- [`stagedir`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.stagedir): A shared directory that is available on all nodes that will execute ReFrame tests. This is used for storing (temporary) files related to the test. Typically, you want to set this to a path on a (shared) scratch filesystem. Defining this is optional: the default is a '`stage`' directory inside the `prefix` directory. +- [`partitions`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions): Details on system partitions, see below. + + + +#### System partitions (`systems.partitions`) + +The next step is to add the system partitions to the configuration files. 
This is again a Python list, as a system can have multiple partitions.
+
+The [`partitions`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions) section of the system config for a system with two Slurm partitions (one CPU partition and one GPU partition) could look something like this:
+
+```python
+site_configuration = {
+    'systems': [
+        {
+            ...
+            'partitions': [
+                {
+                    'name': 'cpu_partition',
+                    'descr': 'CPU partition',
+                    'scheduler': 'slurm',
+                    'prepare_cmds': ['source /cvmfs/pilot.eessi-hpc.org/latest/init/bash'],
+                    'launcher': 'mpirun',
+                    'access': ['-p cpu'],
+                    'environs': ['default'],
+                    'max_jobs': 4,
+                    'features': [FEATURES[CPU]],
+                },
+                {
+                    'name': 'gpu_partition',
+                    'descr': 'GPU partition',
+                    'scheduler': 'slurm',
+                    'prepare_cmds': ['source /cvmfs/pilot.eessi-hpc.org/latest/init/bash'],
+                    'launcher': 'mpirun',
+                    'access': ['-p gpu'],
+                    'environs': ['default'],
+                    'max_jobs': 4,
+                    'resources': [
+                        {
+                            'name': '_rfm_gpu',
+                            'options': ['--gpus-per-node={num_gpus_per_node}'],
+                        }
+                    ],
+                    'devices': [
+                        {
+                            'type': DEVICE_TYPES[GPU],
+                            'num_devices': 4,
+                        }
+                    ],
+                    'features': [
+                        FEATURES[CPU],
+                        FEATURES[GPU],
+                    ],
+                    'extras': {
+                        GPU_VENDOR: GPU_VENDORS[NVIDIA],
+                    },
+                },
+            ]
+        }
+    ]
+}
+```
+
+The most common configuration items defined at this level are:
+
+- [`name`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.name): The name of the partition. Pick anything you like.
+- [`descr`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.descr): Description of the partition. Pick anything you like.
+- [`scheduler`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.scheduler): The scheduler used to submit to this partition, for example `slurm`. All valid options can be found [in the ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.scheduler).
+- [`launcher`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.launcher): The parallel launcher used on this partition, for example `mpirun` or `srun`. All valid options can be found [in the ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.launcher).
+- [`access`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.access): A list of arguments that you would normally pass to the scheduler when submitting to this partition (for example '`-p cpu`' for submitting to a Slurm partition called `cpu`). If supported by your scheduler, we recommend _not_ exporting the submission environment (for example by using '`--export=None`' with Slurm). This avoids test failures due to environment variables set in the submission environment that are passed down to submitted jobs.
+- [`prepare_cmds`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.prepare_cmds): Commands to execute at the start of every job that runs a test. If your batch scheduler does not export the environment of the submit host, this is typically where you can initialize the EESSI environment.
+- [`environs`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.environs): The names of the programming environments (to be defined later in the configuration file) that may be used on this partition.
A programming environment is required for tests that are compiled first, before they can run. The EESSI test suite however only tests existing software installations, so no compilation (or specific programming environment) is needed. Simply specify `'environs': ['default']`, since ReFrame requires _a_ default environment to be defined.
+- [`max_jobs`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.max_jobs): The maximum number of jobs ReFrame is allowed to submit in parallel. Some batch systems limit how many jobs users are allowed to have in the queue. You can use this to make sure ReFrame doesn't exceed that limit.
+- [`resources`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#custom-job-scheduler-resources): This field defines how additional resources can be requested in a batch job. Specifically, on a GPU partition, you have to define a resource with the name `_rfm_gpu`. The `options` field should then contain the argument to be passed to the batch scheduler in order to request a certain number of GPUs _per node_. This could be different for different batch schedulers. For example, for Slurm, one would specify:
+```python
+'resources': [
+    {
+        'name': '_rfm_gpu',
+        'options': ['--gpus-per-node={num_gpus_per_node}'],
+    }
+],
+```
+**FIXME Kenneth: check if resources description is clear**
+- [`processor`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.processor): **FIXME Kenneth link to autodetection section** We recommend NOT defining this field, unless CPU autodetection (see below) is not working for you. The EESSI test suite relies on information about your processor topology to run. Using CPU autodetection is the easiest way to ensure that _all_ processor-related information needed by the EESSI test suite is defined. Only if CPU autodetection is failing for you do we advise you to set the [`processor` field](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.processor) in the partition configuration as an alternative. Although additional fields might be used by future EESSI tests, at this point you'll have to specify _at least_ the following fields:
+    ```python
+    'processor': {
+        'num_cpus': 64,             # Total number of CPU cores in a node
+        'num_sockets': 2,           # Number of sockets in a node
+        'num_cpus_per_socket': 32,  # Number of CPU cores per socket
+        'num_cpus_per_core': 1,     # Number of hardware threads per CPU core
+    }
+    ```
+- [`features`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.features): The `features` field is used by the EESSI test suite to run tests on a partition _only_ if it supports a certain _feature_ (for example if GPUs are available). Feature names are standardized in the EESSI test suite in the [`eessi.testsuite.constants.FEATURES`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py) dictionary.
+  Typically, you want to define `features: [FEATURES[CPU]]` for CPU-based partitions, and `features: [FEATURES[GPU]]` for GPU-based partitions. The first tells the EESSI test suite that this partition can only run CPU-based tests, whereas the second indicates that this partition can only run GPU-based tests.
+  You _can_ define a single partition to have _both_ the CPU and GPU features (since `features` is a Python list).
However, since the CPU-based tests will not ask your batch scheduler for GPU resources, this _may_ fail on batch systems that force you to ask for at least one GPU on GPU-based nodes. Also, running CPU-only code on a GPU node is typically considered bad practice, so testing that is usually not relevant.
+- [`devices`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.devices): This field specifies information on devices (for example GPUs) present in the partition. Device types are standardized in the EESSI test suite in the [`eessi.testsuite.constants.DEVICE_TYPES`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py) dictionary. This is used by the EESSI test suite to determine how many of these devices it can/should use per node.
+  Typically, there is no need to define `devices` for CPU partitions.
+  For GPU partitions, you want to define something like:
+  ```python
+  'devices': [
+      {
+          'type': DEVICE_TYPES[GPU],
+          'num_devices': 4,  # or however many GPUs you have per node
+      }
+  ],
+  ```
+- [`extras`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.extras): This field specifies extra information on the partition, such as the GPU vendor. Valid fields for `extras` are standardized as constants in [`eessi.testsuite.constants`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py) (for example `GPU_VENDOR`). This is used by the EESSI test suite to decide if a partition can run a test that _specifically_ requires a certain brand of GPU.
+  Typically, there is no need to define `extras` for CPU partitions.
+  For GPU partitions, you typically want to specify the GPU vendor, for example:
+  ```python
+  'extras': {
+      GPU_VENDOR: GPU_VENDORS[NVIDIA]
+  }
+  ```
+
+Note that as more tests are added to the EESSI test suite, the use of `features`, `devices` and `extras` by the EESSI test suite may be extended, which may require an update of your configuration file to define newly recognized fields.
+
+!!! note
+
+    ReFrame partitions are _virtual_ entities: they may or may not correspond to a partition as it is configured in your batch system. One might for example have a single partition in the batch system, but configure it as two separate partitions in the ReFrame configuration file based on additional constraints that are passed to the scheduler, see for example the [AWS CitC example configuration](https://github.com/EESSI/test-suite/blob/main/config/aws_citc.py). The EESSI test suite (and more generally: ReFrame) assumes the hardware _within_ a partition defined in the ReFrame configuration file is _homogeneous_.
+
+#### Environments
+
+ReFrame needs a programming environment to be defined in its configuration file for tests that need to be compiled before they are run. While we don't have such tests in the EESSI test suite, ReFrame requires _some_ programming environment to be defined:
+
+```python
+site_configuration = {
+    ...
+    'environments': [
+        {
+            'name': 'default',  # Note: needs to match whatever we set for 'environs' in the partition
+            'cc': 'cc',
+            'cxx': '',
+            'ftn': '',
+        }
+    ]
+}
+```
+Note that the `name` here needs to match whatever we specified for the `environs` property of the partitions.
+
+#### Logging
+
+ReFrame allows a large degree of control over what gets logged, and where. For convenience, we have created a common logging configuration in `eessi.testsuite.common_config` that provides a reasonable default.
It can be used by defining:
+```python
+site_configuration = {
+    ...
+    'logging': common_logging_config(),
+}
+```
+If combined with setting `$RFM_PREFIX`, the output, performance logs, and regular ReFrame log all end up in the directory specified by `$RFM_PREFIX`. This is the setup we recommend.
+
+Alternatively, a prefix can be passed to `common_logging_config(prefix)`, which controls where the regular ReFrame log ends up. Note that the performance logs do not respect that argument: they will still end up in the standard ReFrame prefix (by default the current directory, unless otherwise set with `$RFM_PREFIX` or `--prefix`).
+
+#### Auto-detection of processor information
+
+You can let ReFrame [auto-detect the processor information](https://reframe-hpc.readthedocs.io/en/stable/configure.html#proc-autodetection) for your system.
+
+ReFrame will automatically use auto-detection if two conditions are true:
+
+1. the `partitions` section of your configuration file does not specify `processor` information for a particular partition (as per our recommendation in the previous section);
+2. `remote_detect` is enabled in the `general` configuration.
+
+To enable `remote_detect` in the `general` part of the configuration file:
+```python
+site_configuration = {
+    ...
+    'general': [
+        {
+            'remote_detect': True
+        }
+    ]
+}
+```
+
+To trigger the auto-detection of processor information, it is sufficient to let ReFrame list the available tests:
+
+```
+reframe --list
+```
+
+ReFrame will store the processor information for your system in `~/.reframe/topology/<system>-<partition>/processor.json`.
+
+##### Note
+
+Two important bugs were resolved in ReFrame's CPU autodetect functionality [in version 4.3.3](https://github.com/reframe-hpc/reframe/pull/2978). _We strongly recommend you use `ReFrame >= 4.3.3`_.
+
+If you are using `ReFrame < 4.3.3`, you may encounter two issues:
+
+1. ReFrame will try to use the parallel launcher command configured for each partition (e.g. `mpirun`) when doing the remote autodetection. If there is no system version of `mpirun` available, that will fail. See [ReFrame issue #2926](https://github.com/reframe-hpc/reframe/issues/2926).
+2. CPU autodetection only worked when using a clone of the ReFrame repository, _not_ when it was installed with `pip` or EasyBuild (as is also the case for the ReFrame shipped with EESSI). See [ReFrame issue #2914](https://github.com/reframe-hpc/reframe/issues/2914).
+
+
+## Running tests
+
+### Listing available tests
+
+To list the tests that are available in the EESSI test suite, use `reframe --list` (or `reframe -l` for short).
+
+If you have properly [configured ReFrame](#Configuring-ReFrame), you should see a (potentially long) list of checks in the output:
+
+```
+$ reframe --list
+...
+[List of matched checks]
+- ...
+Found 1234 tests
+```
+
+**FIXME Kenneth** checks are only generated for available modules
+
+### Performing a dry-run
+
+To perform a dry run of the EESSI test suite, use `reframe --dry-run`:
+
+```
+$ reframe --dry-run
+...
+[==========] Running 1234 check(s)
+
+[----------] start processing checks
+[ DRY ] GROMACS_EESSI ...
+```
+
+**FIXME Kenneth** explain why this can be useful, contrast with `--list` (which doesn't take into account partitions)
+
+### Running the (full) test suite
+
+To actually run the (full) EESSI test suite and let ReFrame produce a performance report, use `reframe --run --performance-report`.
+
+However, we recommend filtering the tests that will be run, [see below](#Filtering-tests).
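+
+For example, to run only the checks that carry the `CI` tag at the single-core scale and get a performance report at the end (the tag names used here are the ones defined by the EESSI test suite and described below; adjust the filters to your needs):
+
+```bash
+reframe --tag CI --tag 1_core --run --performance-report
+```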
+ +### ReFrame output and log files + +**FIXME** + +- `--prefix` to control where output goes +- relation with common logging setup +- ReFrame log, perf log, output dirs, staging dirs, ... +- example of output files and where they can be found + + +### Filtering tests + +By default, ReFrame will automatically generate checks for each system partition, +based on the tests available in the EESSI test suite, available software modules, and tags defined in the EESSI test suite. + +To avoid being overwhelmed by checks, it is recommend to +[apply filters](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#test-filtering) so ReFrame only generates the checks you are interested in. + +#### Filtering by test name + +**FIXME** `--name` + +#### Filtering by system (partition) + +**FIXME** Cover both for specific system/partition + +By default, ReFrame will generate checks for each system partition +that is listed in your configuration file. + +To only let ReFrame checks for a particular system partition, +you can use the [`--system` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-system). + +For example, to let ReFrame only generate checks for the `part_one` partition +of the system named `example`, use: + +``` +reframe --system example:part_one ... +``` + +Use the `--dry-run` option to check the impact of this. + +#### Filtering by tags + +To filter tests using one or more tags, you can use the [`--tag` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-0). + +Using `--list-tags` you can get a list of known tags. + +To check the impact of this on generated checks by ReFrame, use `--list`. + +##### `CI` tag + +For each software that is supported by the test suite, +a small test is tagged with `CI` to indicate it can be used in a Continuous Integration (CI) environment. + +Hence, you can use this tag to let ReFrame only generate checks for small test cases: + +``` +reframe --tag CI +``` + +For example: + +``` +$ reframe --name GROMACS --tag CI +... +FIXME OUTPUT +``` + +##### `scale` tags + +The EESSI test suite defines a set of custom tags that control the *scale* +of tests, that is how many resources will be used for running it. 
+ +| tag name | description | +|:--------:|-------------| +| `1_core` | using a single CPU core, or single GPU | +| `2_cores` | using 2 CPU cores, or 2 GPUs | +| `4_cores` | using 4 CPU cores, or 4 GPUs | +| `1_8_node` | using 1/8th of a node (12.5% of available cores/GPUs, 1 at minimum) | +| `1_4_node` | using a quarter of a node (25% of available cores/GPUs, 1 at minimum) | +| `1_2_node` | using half of a node (50% of available cores/GPUs, 1 at minimum) | +| `1_node` | using a full node (all available cores/GPUs) | +| `2_nodes` | using 2 full nodes | +| `4_nodes` | using 4 full nodes | +| `8_nodes` | using 8 full nodes | +| `16_nodes` | using 16 full nodes | + +##### Using multiple tags + +To filter tests using multiple tags, you can: + +* use `|` as separator to indicate that one of the specified tags must match (logical OR, for example `--tag='1_core|2_cores'`); +* use the `--tag` option multiple times to indicate that all specified tags must match (logical AND, for example `--tag CI --tag 1_core`); + +#### Filtering by modules + +**FIXME** This is not really filtering, but overriding default behaviour (see also https://github.com/EESSI/test-suite#changing-the-default-test-behavior-on-the-cmd-line), should use `--name` instead - add warning that this is advanced usage + +By default, ReFrame will generate checks for each available software module +that can be used to run a particular test (for example, all available GROMACS modules will be used once to run each GROMACS test). + +To only run the tests with specific modules, use the `--setvar modules=...` option. + +You can use the `--list` option to check the impact on checks that ReFrame generates. + +For example: + +``` +reframe --setvar modules=GROMACS/2021.3-foss-2021a --list +``` + +### Overriding test parameters (ADVANCED) + +- use of `--setvar` +- recommend to only do this for specific tests, like `--setvar GROMACS_EESSI.modules=GROMACS/2021.6-foss-2022a` + +### Example commands + +#### Running all GROMACS tests on 4 cores + +``` +reframe --name GROMACS --tag 4_cores --run --performance-report +``` + +**FIXME** explain options being used + +#### Running all GROMACS tests using a specific GROMACS module + +``` +reframe --setvar modules=GROMACS/2021.3-foss-2021a --run +``` + +**FIXME use `--name` to filter** + +#### Re-running a specific test (using hash) + +**FIXME** + +## Available tests + +The EESSI test suite currently includes tests for: + +* [GROMACS](#GROMACS) +* [TensorFlow](#TensorFlow) + +For a complete overview of all available tests in the EESSI test suite, see . 
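+
+To see which concrete checks these applications translate into on your system, you can ask ReFrame to list them per application (assuming ReFrame is configured as described above; the generated checks depend on which software modules are available on your system):
+
+```bash
+reframe --name GROMACS --list
+reframe --name TensorFlow --list
+```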
+ +### GROMACS + +using GROMACS test in ReFrame test library + +https://www.hecbiosim.ac.uk/access-hpc/benchmarks + + +### TensorFlow + +- minimal TensorFlow version +- info on workload being run + +## Example run + +``` +[ReFrame Setup] + version: 4.2.0 + command: '/readonly/dodrio/apps/RHEL8/zen2-ib/software/ReFrame/4.2.0/bin/reframe --config-file config/vsc_hortense.py --checkpath eessi/testsuite/tests/apps/tensorflow --name TensorFlow/2.11 --tag 1_core --system hortense:cpu_rome_512gb --run --performance-report' + launched by: vsc46128@login55.dodrio.os + working directory: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite' + settings files: '', 'config/vsc_hortense.py' + check search path: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/eessi/testsuite/tests/apps/tensorflow' + stage directory: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/stage' + output directory: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/output' + log files: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.log', '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.out' + +[==========] Running 1 check(s) +[==========] Started on Mon Aug 28 10:12:38 2023 + +[----------] start processing checks +[ RUN  ] TENSORFLOW_EESSI %scale=1_core %module_name=TensorFlow/2.11.0-foss-2022a %device_type=cpu /af8226d5 @hortense:cpu_rome_512gb+default +[  OK ] (1/1) TENSORFLOW_EESSI %scale=1_core %module_name=TensorFlow/2.11.0-foss-2022a %device_type=cpu /af8226d5 @hortense:cpu_rome_512gb+default +P: perf: 2770.757396498742 img/s (r:0, l:None, u:None) +[----------] all spawned checks have finished + +[  PASSED  ] Ran 1/1 test case(s) from 1 check(s) (0 failure(s), 0 skipped, 0 aborted) +[==========] Finished on Mon Aug 28 10:16:32 2023 + +========================================================================================================================================================= +PERFORMANCE REPORT +--------------------------------------------------------------------------------------------------------------------------------------------------------- +[TENSORFLOW_EESSI %scale=1_core %module_name=TensorFlow/2.11.0-foss-2022a %device_type=cpu /af8226d5 @hortense:cpu_rome_512gb:default] + num_cpus_per_task: 1 + num_tasks_per_node: 1 + num_tasks: 1 + performance: + - perf: 2770.757396498742 img/s (r: 0 img/s l: -inf% u: +inf%) +--------------------------------------------------------------------------------------------------------------------------------------------------------- +Log file(s) saved in '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.log', '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.out' +``` + +## Release notes + +v0.1.0 +- ... diff --git a/docs/test-suite.md b/docs/test-suite.md deleted file mode 100644 index 96a6f6ad8..000000000 --- a/docs/test-suite.md +++ /dev/null @@ -1,361 +0,0 @@ -# EESSI test suite - -[toc] - -## Installation - -### Requirements - -The EESSI test suite requires Python >= 3.6 and [ReFrame](https://reframe-hpc.readthedocs.io). - -### Installing Reframe (incl. `hpctestlib`) - -You need to make sure that [ReFrame](https://reframe-hpc.readthedocs.io) is available - that is, the `reframe` command should work: - -```bash -reframe --version -``` - -General instructions for installing ReFrame are available in the [ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/started.html). 
-The EESSI test suite requires ReFrame v4.0 or newer. - -#### `hpctestlib` ReFrame component - -The EESSI test suite requires the [`hpctestlib`](https://github.com/reframe-hpc/reframe/tree/develop/hpctestlib) component of ReFrame, -which is currently not included in a standard installation. - -We recommend installing ReFrame using [EasyBuild](https://easybuild.io/), -or using a ReFrame installation that is available in EESSI (pilot version 2023.06, or newer). - -For example: - -```bash -source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash -module load ReFrame/4.2.0 -``` - -To check whether the `hpctestlib` component of ReFrame is available, -try importing the Python package: - -```bash -python3 -c 'import hpctestlib' -``` - -### Installing the EESSI test suite - -To install the EESSI test suite, you can either use `pip` or clone the GitHub repository directly: - -#### Using `pip` - -```bash -pip install git+https://github.com/EESSI/test-suite.git -``` - -#### Cloning the repository - -```bash -git clone https://github.com/EESSI/test-suite EESSI-test-suite -cd EESSI-test-suite -export PYTHONPATH=$PWD:$PYTHONPATH -``` - -#### Check installation - -To check whether the EESSI test suite installed correctly, -try importing the `eessi.testsuite` Python package: - -```bash -python3 -c 'import eessi.testsuite' -``` - -## Configuration - -Before you can run the EESSI test suite, you need to create a configuration file for ReFrame that is specific to the system on which the tests will be run. - -Example configuration files are available [in the `EESSI/test-suite` GitHub repository in the `config` subdirectory](https://github.com/EESSI/test-suite/tree/main/config), which you can use as a template to create your own. - -### Configuring ReFrame - -We recommend configuring ReFrame by setting a couple of `$RFM_*` environment variables, to avoid that you need to include particular options to the `reframe` command over and over again. - -#### ReFrame configuration file (`$RFM_CONFIG_FILES`) - -*(see also [ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_CONFIG_FILES))* - -``` -export RFM_CONFIG_FILES=$HOME/EESSI-test-suite/config/example.py -``` - -#### Search path for tests (`$RFM_CHECK_SEARCH_PATH`) - -*(see also [ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_CHECK_SEARCH_PATH))* - -``` -export RFM_CHECK_SEARCH_PATH=$HOME/EESSI-test-suite/eessi/testsuite/tests -export RFM_CHECK_SEARCH_RECURSIVE=1 -``` - -**FIXME** explain why recursive needs to be enabled - -### System configuration file - -**FIXME** see Vega as reference example? - -* partitions (incl. features, access, launcher, scheduler, name + partition (cfr. `--system`)) - -### Auto-detection of processor information - -You can let ReFrame [auto-detect the processor information](https://reframe-hpc.readthedocs.io/en/stable/configure.html#proc-autodetection) for your system. - -ReFrame will automatically use auto-detection if the `partitions` section -of you configuration file does not specify `processor` information for a -particular partition, and `remote_detect` is enabled. - -To trigger the auto-detection of processor information, it is sufficient to -let ReFrame list the available tests: - -``` -reframe --list -``` - -ReFrame will store the processor information for your system in `~/.reframe/topology/-/processor.json`. 
- -#### Note - -If you are using Slurm, you may need to temporarily change the launcher to `srun` in your configuration for auto-detection of processor information to work correctly. - -See the [example AWS configuration file](https://github.com/EESSI/test-suite/blob/main/config/aws_citc.py), and [ReFrame issue #2926](https://github.com/reframe-hpc/reframe/issues/2926) for more information. - -In addition, auto-detection does not work if ReFrame was installed directly -from PyPI, see [ReFrame issue #2914](https://github.com/reframe-hpc/reframe/issues/2914). -**FIXME** auto-detection also doesn't work with installation in EasyBuild/EESSI - -## Running tests - -### Listing available tests - -To list the tests that are available in the EESSI test suite, -use `reframe --list` (or `reframe -L` for short). - -If you have properly [configured ReFrame](#Configuring-ReFrame), you should -see a (potentially long) list of checks in the output: - -``` -$ reframe --list -... -[List of matched checks] -- ... -Found 1234 tests -``` - -**FIXME Kenneth** checks are only generated for available modules - -### Performing a dry-run - -To perform a dry run of the EESSI test suite, use `reframe --dry-run`: - -``` -$ reframe --dry-run -... -[==========] Running 1234 check(s) - -[----------] start processing checks -[ DRY ] GROMACS_EESSI ... -``` - -**FIXME Kenneth** explain why this can be useful, contrast with `--list` (which doesn't take into account partitions) - -### Running the (full) test suite - -To actually run the (full) EESSI test suite and let ReFrame -produce a performance report, use `reframe --run --performance-report`. - -We recommend filtering the tests that will be run however, [see below](#Filtering-tests). - -### ReFrame output and log files - -**FIXME** - -- `--prefix` to control where output goes -- -- relation with common logging setup -- ReFrame log, perf log, output dirs, staging dirs, ... - -### Filtering tests - -By default, ReFrame will automatically generate checks for each system partition, -based on the tests available in the EESSI test suite, available software modules, and tags defined in the EESSI test suite. - -To avoid being overwhelmed by checks, it is recommend to -[apply filters](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#test-filtering) so ReFrame only generates the checks you are interested in. - -#### Filtering by test name - -**FIXME** `--name` - -#### Filtering by system (partition) - -**FIXME** Cover both for specific system/partition - -By default, ReFrame will generate checks for each system partition -that is listed in your configuration file. - -To only let ReFrame checks for a particular system partition, -you can use the [`--system` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-system). - -For example, to let ReFrame only generate checks for the `part_one` partition -of the system named `example`, use: - -``` -reframe --system example:part_one ... -``` - -Use the `--dry-run` option to check the impact of this. - -#### Filtering by tags - -To filter tests using one or more tags, you can use the [`--tag` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-0). - -Using `--list-tags` you can get a list of known tags. - -To check the impact of this on generated checks by ReFrame, use `--list`. - -##### `CI` tag - -For each software that is supported by the test suite, -a small test is tagged with `CI` to indicate it can be used in a Continuous Integration (CI) environment. 
- -Hence, you can use this tag to let ReFrame only generate checks for small test cases: - -``` -reframe --tag CI -``` - -For example: - -``` -$ reframe --name GROMACS --tag CI -... -FIXME OUTPUT -``` - -##### `scale` tags - -The EESSI test suite defines a set of custom tags that control the *scale* -of tests, that is how many resources will be used for running it. - -| tag name | description | -|:--------:|-------------| -| `1_core` | using a single CPU core, or single GPU | -| `2_cores` | using 2 CPU cores, or 2 GPUs | -| `4_cores` | using 4 CPU cores, or 4 GPUs | -| `1_8_node` | using 1/8th of a node (12.5% of available cores/GPUs, 1 at minimum) | -| `1_4_node` | using a quarter of a node (25% of available cores/GPUs, 1 at minimum) | -| `1_2_node` | using half of a node (50% of available cores/GPUs, 1 at minimum) | -| `1_node` | using a full node (all available cores/GPUs) | -| `2_nodes` | using 2 full nodes | -| `4_nodes` | using 4 full nodes | -| `8_nodes` | using 8 full nodes | -| `16_nodes` | using 16 full nodes | - -##### Using multiple tags - -To filter tests using multiple tags, you can: - -* use `|` as separator to indicate that one of the specified tags must match (logical OR, for example `--tag='1_core|2_cores'`); -* use the `--tag` option multiple times to indicate that all specified tags must match (logical AND, for example `--tag CI --tag 1_core`); - -#### Filtering by modules - -**FIXME** This is not really filtering, but overriding default behaviour (see also https://github.com/EESSI/test-suite#changing-the-default-test-behavior-on-the-cmd-line), should use `--name` instead - add warning that this is advanced usage - -By default, ReFrame will generate checks for each available software module -that can be used to run a particular test (for example, all available GROMACS modules will be used once to run each GROMACS test). - -To only run the tests with specific modules, use the `--setvar modules=...` option. - -You can use the `--list` option to check the impact on checks that ReFrame generates. - -For example: - -``` -reframe --setvar modules=GROMACS/2021.3-foss-2021a --list -``` - -### Example commands - -#### Running all GROMACS tests on 4 cores - -``` -reframe --name GROMACS --tag 4_cores --run -``` - -#### Running all GROMACS tests using a specific GROMACS module - -``` -reframe --setvar modules=GROMACS/2021.3-foss-2021a --run -``` - -## Available tests - -The EESSI test suite currently includes tests for: - -* [GROMACS](#GROMACS) -* [TensorFlow](#TensorFlow) - -For a complete overview of all available tests in the EESSI test suite, see . 
- -### GROMACS - -using GROMACS test in ReFrame test library - -https://www.hecbiosim.ac.uk/access-hpc/benchmarks - -Example run: - - -### TensorFlow - -Example run: - -``` -[ReFrame Setup] - version: 4.2.0 - command: '/readonly/dodrio/apps/RHEL8/zen2-ib/software/ReFrame/4.2.0/bin/reframe --config-file config/vsc_hortense.py --checkpath eessi/testsuite/tests/apps/tensorflow --name TensorFlow/2.11 --tag 1_core --system hortense:cpu_rome_512gb --run --performance-report' - launched by: vsc46128@login55.dodrio.os - working directory: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite' - settings files: '', 'config/vsc_hortense.py' - check search path: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/eessi/testsuite/tests/apps/tensorflow' - stage directory: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/stage' - output directory: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/output' - log files: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.log', '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.out' - -[==========] Running 1 check(s) -[==========] Started on Mon Aug 28 10:12:38 2023 - -[----------] start processing checks -[ [32mRUN [0m ] TENSORFLOW_EESSI %scale=1_core %module_name=TensorFlow/2.11.0-foss-2022a %device_type=cpu /af8226d5 @hortense:cpu_rome_512gb+default -[ [32m OK[0m ] (1/1) TENSORFLOW_EESSI %scale=1_core %module_name=TensorFlow/2.11.0-foss-2022a %device_type=cpu /af8226d5 @hortense:cpu_rome_512gb+default -P: perf: 2770.757396498742 img/s (r:0, l:None, u:None) -[----------] all spawned checks have finished - -[ [32m PASSED [0m ] Ran 1/1 test case(s) from 1 check(s) (0 failure(s), 0 skipped, 0 aborted) -[==========] Finished on Mon Aug 28 10:16:32 2023 - -========================================================================================================================================================= -PERFORMANCE REPORT ---------------------------------------------------------------------------------------------------------------------------------------------------------- -[TENSORFLOW_EESSI %scale=1_core %module_name=TensorFlow/2.11.0-foss-2022a %device_type=cpu /af8226d5 @hortense:cpu_rome_512gb:default] - num_cpus_per_task: 1 - num_tasks_per_node: 1 - num_tasks: 1 - performance: - - perf: 2770.757396498742 img/s (r: 0 img/s l: -inf% u: +inf%) ---------------------------------------------------------------------------------------------------------------------------------------------------------- -Log file(s) saved in '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.log', '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.out' -``` - -## Release notes - -v0.1.0 -- ... 
From 48799cd9a17f514add35bde6eb8ec486bec2bf57 Mon Sep 17 00:00:00 2001 From: Kenneth Hoste Date: Thu, 21 Sep 2023 17:57:12 +0200 Subject: [PATCH 03/11] break up test suite docs into separate pages for installation + configuration, usage, and release notes --- docs/test-suite-full.md | 599 ------------------ docs/test-suite/index.md | 12 + docs/test-suite/installation-configuration.md | 497 +++++++++++++++ docs/test-suite/release-notes.md | 21 + docs/test-suite/usage.md | 307 +++++++++ mkdocs.yml | 5 + 6 files changed, 842 insertions(+), 599 deletions(-) delete mode 100644 docs/test-suite-full.md create mode 100644 docs/test-suite/index.md create mode 100644 docs/test-suite/installation-configuration.md create mode 100644 docs/test-suite/release-notes.md create mode 100644 docs/test-suite/usage.md diff --git a/docs/test-suite-full.md b/docs/test-suite-full.md deleted file mode 100644 index 023456f14..000000000 --- a/docs/test-suite-full.md +++ /dev/null @@ -1,599 +0,0 @@ -# EESSI test suite - -## Installation - -### Requirements - -The EESSI test suite requires Python >= 3.6 and [ReFrame](https://reframe-hpc.readthedocs.io). - -### Installing Reframe (incl. `hpctestlib`) - -You need to make sure that [ReFrame](https://reframe-hpc.readthedocs.io) is available - that is, the `reframe` command should work: - -```bash -reframe --version -``` - -General instructions for installing ReFrame are available in the [ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/started.html). -The EESSI test suite requires ReFrame v4.3.3 (or newer). - -#### `hpctestlib` ReFrame component - -The EESSI test suite requires the [`hpctestlib`](https://github.com/reframe-hpc/reframe/tree/develop/hpctestlib) component of ReFrame, -which is currently not included in a standard installation of ReFrame. - -We recommend installing ReFrame using [EasyBuild](https://easybuild.io/) (version 4.8.1, or newer), -or using a ReFrame installation that is available in EESSI (pilot version 2023.06, or newer). - -For example (using EESSI): - -```bash -source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash -module load ReFrame/4.2.0 -``` - -To check whether the `hpctestlib` component of ReFrame is available, -try importing the Python package: - -```bash -python3 -c 'import hpctestlib' -``` - -### Installing the EESSI test suite - -To install the EESSI test suite, you can either use `pip` or clone the GitHub repository directly: - -#### Using `pip` - -```bash -pip install git+https://github.com/EESSI/test-suite.git -``` - -#### Cloning the repository - -```bash -git clone https://github.com/EESSI/test-suite $HOME/EESSI-test-suite -cd EESSI-test-suite -export PYTHONPATH=$PWD:$PYTHONPATH -``` - -#### Check installation - -To check whether the EESSI test suite installed correctly, -try importing the `eessi.testsuite` Python package: - -```bash -python3 -c 'import eessi.testsuite' -``` - -## Configuration - -Before you can run the EESSI test suite, you need to create a configuration file for ReFrame that is specific to the system on which the tests will be run. - -Example configuration files are available [in the `EESSI/test-suite` GitHub repository in the `config` subdirectory](https://github.com/EESSI/test-suite/tree/main/config), which you can use as a template to create your own. - -### Configuring ReFrame environment variables - -We recommend setting a couple of `$RFM_*` environment variables to configure ReFrame, to avoid needing to include particular options to the `reframe` command over and over again. 
- -#### ReFrame configuration file (`$RFM_CONFIG_FILES`) - -*(see also [`RFM_CONFIG_FILES` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_CONFIG_FILES))* - -Define `$RFM_CONFIG_FILES` to tell ReFrame which configuration file to use, for example: - -```bash -export RFM_CONFIG_FILES=$HOME/EESSI-test-suite/config/example.py -``` - -#### Search path for tests (`$RFM_CHECK_SEARCH_PATH`) - -*(see also [`RFM_CHECK_SEARCH_PATH` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_CHECK_SEARCH_PATH))* - -Define `$RFM_CHECK_SEARCH_PATH` to tell ReFrame which directory to search for tests. In addition, define `$RFM_CHECK_SEARCH_RECURSIVE` to ensure that ReFrame searches `$RFM_CHECK_SEARCH_PATH` recursively (i.e. so that also tests in subdirectories are found). - -For example: - -```bash -export RFM_CHECK_SEARCH_PATH=$HOME/EESSI-test-suite/eessi/testsuite/tests -export RFM_CHECK_SEARCH_RECURSIVE=1 -``` - -#### ReFrame prefix (`$RFM_PREFIX`) - -*(see also [`RFM_PREFIX` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_PREFIX))* - -Define `$RFM_PREFIX` to tell ReFrame where to store the files it produces. E.g. - -``` -export RFM_PREFIX=$HOME/reframe_runs -``` -This involves: - -* test output directories (which contain e.g. the job script, stderr and stdout for each of the test jobs) -* staging directories (unless otherwise specified by `staging`, see below); -* performance logs; - -If our common logging configuration ([see "Logging" below](#logging)) is used, the regular ReFrame log file will also end up in the location specified by `$RFM_PREFIX`. - -Note that the default is for ReFrame to use the current directory as prefix. We recommend setting a prefix so that logs are not scattered around and nicely appended for each run. - -### The ReFrame configuration file - -In order for ReFrame to run tests on your system, it needs to know some properties about your system. For example, it needs to know what kind of scheduler you have, which partitions the system has, how to submit to those partitions, etc. All of this has to be described in a *ReFrame configuration file* (see also the section on `$RFM_CONFIG_FILES` above). - -The [official ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/configure.html) provides the full description on configuring ReFrame for your site. However, there are some configuration settings that are specifically required for the EESSI test suite. Also, there are a large amount of settings you can configure, which makes the official documentation potentially a bit overwhelming. - -Here, we will describe how to create a configuration file that works with the EESSI test suite, starting from an [example configuration file `settings_example.py`](https://github.com/EESSI/test-suite/tree/main/config/settings_example.py), which defines the most common configuration settings. You can look at other example configurations in the [config directory](https://github.com/EESSI/test-suite/tree/main/config/) for more inspiration. - -#### Python imports - -The EESSI test suite standardizes a few string-based values, as well as the logging format used by ReFrame. 
Every ReFrame configuration file used for running the EESSI test suite should therefore start with the following import statements: - -```python -from eessi.testsuite.common_config import common_logging_config -from eessi.testsuite.constants import * -``` - -#### High-level system info (`systems`) - -First, we describe the system at its highest level through the [`systems`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#systems) keyword. Note that you can define multiple systems in a single configuration file (`systems` is a Python list value). We recommend defining just a single system in each configuration file, as it makes the configuration file a bit easier to digest (for humans). - -An example of the `systems` section of the configuration file would be: - -```python -site_configuration = { - 'systems': [ - # We could list multiple systems. Here, we just define one - { - 'name': 'example', - 'descr': 'Example cluster', - 'modules_system': 'lmod', - 'hostnames': ['*'], - 'stagedir': f'/some/shared/dir/{os.environ.get("USER")}/reframe_output/staging', - 'partitions': [...], - } - ] -} -``` - -The most common configuration items defined at this level are: - -- [`name`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.name): The name of the system. Pick whatever makes sense for you. -- [`descr`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.descr): Description of the system. Again, pick whatever you like. -- [`modules_system`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.modules_system): The modules system used on your system. EESSI provides modules in `lmod` format (no need to change this, unless you want to run tests from the EESSI test suite with non-EESSI modules). -- [`hostnames`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.hostnames): The names of the hosts on which you will run the ReFrame command, as regular expression. Using these names, ReFrame can automatically determine which of the listed configurations in the `systems` list to use, which is useful if you're defining multiple systems in a single configuration file. If you follow our recommendation to limit yourself to one system per configuration file, simply define `'hostnames': ['*']`. -- [`prefix`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.prefix): Directory prefix for a ReFrame run on this system. Any directories or files produced by ReFrame will use this prefix, if not specified otherwise. We don't recommend setting `prefix`, but instead to set the environment variable `$RFM_PREFIX`, as our common logging configuration (see description below) can pick up on it. -- [`stagedir`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.stagedir): A shared directory that is available on all nodes that will execute ReFrame tests. This is used for storing (temporary) files related to the test. Typically, you want to set this to a path on a (shared) scratch filesystem. Defining this is optional: the default is a '`stage`' directory inside the `prefix` directory. -- [`partitions`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions): Details on system partitions, see below. - - - -#### System partitions (`systems.partitions`) - -The next step is to add the system partitions to the configuration files. 
This is again a Python list, as a system can have multiple partitions. - -The [`partitions`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions) section of the system config for a system with two Slurm partitions (one CPU partition, and one GPU partition) could for example look something like this: - -```python -site_configuration = { - 'systems': [ - { - ... - 'partitions': [ - { - 'name': 'cpu_partition', - 'descr': 'CPU partition' - 'scheduler': 'slurm', - 'prepare_cmds': ['source /cvmfs/pilot.eessi-hpc.org/latest/init/bash'], - 'launcher': 'mpirun', - 'access': ['-p cpu'], - 'environs': ['default'], - 'max_jobs': 4, - 'features': [FEATURES[CPU]], - }, - { - 'name': 'gpu_partition', - 'descr': 'GPU partition' - 'scheduler': 'slurm', - 'prepare_cmds': ['source /cvmfs/pilot.eessi-hpc.org/latest/init/bash'], - 'launcher': 'mpirun', - 'access': ['-p gpu'], - 'environs': ['default'], - 'max_jobs': 4, - 'resources': [ - { - 'name': '_rfm_gpu', - 'options': ['--gpus-per-node={num_gpus_per_node}'], - } - ], - 'devices': [ - { - 'type': DEVICE_TYPES[GPU], - 'num_devices': 4, - } - ], - 'features': [ - FEATURES[CPU], - FEATURES[GPU], - ], - 'extras': { - GPU_VENDOR: GPU_VENDORS[NVIDIA], - }, - }, - ] - } - ] -} -``` - -The most common configuration items defined at this level are: - -- [`name`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.name): The name of the partition. Pick anything you like. -- [`descr`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.descr): Description of the partition. Pick anything you like. -- [`scheduler`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.scheduler): The scheduler used to submit to this partition, for example `slurm`. All valid options can be found [in the ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.scheduler). -- [`launcher`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.launcher): The parallel launcher used on this partition, for example `mpirun` or `srun`. All valid options can be found [in the ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.launcher). -- [`access`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.access): A list of arguments that you would normally pass to the scheduler when submitting to this partition (for example '`-p cpu`' for submitting to a Slurm partition called `cpu`). If supported by your scheduler, we recommend to _not_ export the submission environment (for example by using '`--export=None`' with Slurm). This avoids test failures due to environment variables set in the submission environment that are passed down to submitted jobs. -- [`prepare_cmds`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.prepare_cmds): Commands to execute at the start of every job that runs a test. If your batch scheduler does not export the environment of the submit host, this is typically where you can initialize the EESSI environment. -- [`environs`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.environs): The names of the programming environments (to be defined later in the configuration file) that may be used on this partition. 
A programming environment is required for tests that are compiled first, before they can run. The EESSI test suite however only tests existing software installations, so no compilation (or specific programming environment) is needed. Simply specify `'environs': ['default']`, since ReFrame requires _a_ default environment to be defined. -- [`max_jobs`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.max_jobs): The maximum amount of jobs ReFrame is allowed to submit in parallel. Some batch systems limit how many jobs users are allowed to have in the queue. You can use this to make sure ReFrame doesn't exceed that limit. -- [`resources`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#custom-job-scheduler-resources) This field defines how additional resources can be requested in a batch job. Specifically, on a GPU partition, you have to define a resource with the name `_rfm_gpu`. The `options` field should then contain the argument to be passed to the batch scheduler in order to request a certain number of GPUs _per node_. This could be different for different batch schedulers. For example, for SLURM, one would specify: -```python= -'resources': [ - { - 'name': '_rfm_gpu', - 'options': ['--gpus-per-node={num_gpus_per_node}'], - } -], -``` -**FIXME Kenneth: check if resources description is clear** -- [`processor`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.processor): **FIXME Kenneth link to autodetection section** We recommend to NOT define this field, unless CPU autodetection (see below) is not working for you. The EESSI test suite relies on information about your processor topology to run. Using CPU autodetection is the easiest way to ensure that _all_ processor-related information needed by the EESSI test suite are defined. Only if CPU autodetection is failing for you do we advice you to set the [`processor` field](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.processor) in the partition configuration as an alternative. Although additional fields might be used by future EESSI tests, at this point you'll have to specify _at least_ the following fields: - ```python - 'processor': { - 'num_cpus': 64, # Total number of CPU cores in a node - 'num_sockets': 2, # Number of sockets in a node - 'num_cpus_per_socket': 32, # Number of CPU cores per socket - 'num_cpus_per_core': 1, # Number of hardware threads per CPU core - } - ``` -- [`features`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.features): The `features` field is used by the EESSI test suite to run tests _only_ on a partition if it supports a certain _feature_ (for example if GPUs are available). Feature names are standardized in the EESSI test suite in [`eessi.testsuite.constants.FEATURES`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py) dictionary. - Typically, you want to define `features: [FEATURES[CPU]]` for CPU based partitions, and `features: [FEATURES[GPU]]` for GPU based partitions. The first tells the EESSI test suite that this partition can only run CPU-based tests, whereas second indicates that this partition can only run GPU-based tests. - You _can_ define a single partition to have _both_ the CPU and GPU features (since `features` is a Python list). 
However, since the CPU-based tests will not ask your batch scheduler for GPU resources, this _may_ fail on batch systems that force you to ask for at least one GPU on GPU-based nodes. Also, running CPU-only code on a GPU node is typically considered bad practice, thus testing its functionality is typically not relevant. -- [`devices`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.devices): This field specifies information on devices (for example) present in the partition. Device types are standardized in the EESSI test suite in the [`eessi.testsuite.constants.DEVICE_TYPES`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py) dictionary. This is used by the EESSI test suite to determine how many of these devices it can/should use per node. - Typically, there is no need to define `devices` for CPU partitions. - For GPU partitions, you want to define something like: - ```python - 'devices': { - 'type': DEVICE_TYPES[GPU], - 'num_devices': 4, # or however many GPUs you have per node - } - ``` -- [`extras`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.extras): This field specifies extra information on the partition, such as the GPU vendor. Valid fields for `extras` are standardized as constants in [`eessi.testsuite.constants`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py) (for example `GPU_VENDOR`). This is used by the EESSI test suite to decide if a partition can run a test that _specifically_ requires a certain brand of GPU. - Typically, there is no need to define `extras` for CPU partitions. - For GPU partitions, you typically want to specify the GPU vendor, for example: - ```python - 'extras': { - GPU_VENDOR: GPU_VENDORS[NVIDIA] - } - ``` - -Note that as more tests are added to the EESSI test suite, the use of `features`, `devices` and `extras` by the EESSI test suite may be extended, which may require an update of your configuration file to define newly recognized fields. - -!!! note - - Another thing to note is that ReFrame partitions are _virtual_ entities: they may or may not correspond to a partition as it is configured in your batch system. One might for example have a single partition in the batch system, but configure it as two separate partitions in the ReFrame configuration file based on additional constraints that are passed to the scheduler, see for example the [AWS CitC example configuration](https://github.com/EESSI/test-suite/blob/main/config/aws_citc.py). The EESSI test suite (and more generally: ReFrame) assumes the hardware _within_ a partition defined in the ReFrame configuration file is _homogeneous_. - -#### Environments - -ReFrame needs a programming environment to be defined in its configuration file for tests that need to be compiled before they are run. While we don't have such tests in the EESSI test suite, ReFrame requires _some_ programming environment to be defined: - -```python -site_configuration = { - ... - 'environments': [ - { - 'name': 'default', # Note: needs to match whatever we set for 'environs' in the partition - 'cc': 'cc', - 'cxx': '', - 'ftn': '', - } - ] -} -``` -Note that the `name` here needs to match whatever we specified for the `environs` property of the partitions. - -#### Logging - -ReFrame allows a large degree of control over what gets logged, and where. For convenience, we have created a common logging configuration in `eessi.testsuite.common_config` that provides a reasonable default. 
It can be used by defining: -```python -site_configuration = { - ... - 'logging': common_logging_config(), -} -``` -If combined by setting `RFM_PREFIX`, the output, performance log, and regular ReFrame logs all end up in the directory specified by `RFM_PREFIX`. This is the setup we would recommend. - -Alternatively, a prefix can be passed to `common_logging_config(prefix)` which will control where the regular ReFrame log ends up. Note that the performance logs do not respect that argument: they will still end up in the standard ReFrame prefix (by default the current directory, unless otherwise set with `RFM_prefix` or `--prefix`). - -#### Auto-detection of processor information - -You can let ReFrame [auto-detect the processor information](https://reframe-hpc.readthedocs.io/en/stable/configure.html#proc-autodetection) for your system. - -ReFrame will automatically use auto-detection if two conditions are true: -1. the `partitions` section of you configuration file does not specify `processor` information for a particular partition (as per our recommendation in the previous section) -2. `remote_detect` is enabled in the `general` configuration - -To enable `remote_detect` in the `general` part of the configation file: -```python -site_configuration = { - ... - 'general': [ - { - 'remote_detect': True - } - ] -} -``` - -To trigger the auto-detection of processor information, it is sufficient to -let ReFrame list the available tests: - -``` -reframe --list -``` - -ReFrame will store the processor information for your system in `~/.reframe/topology/-/processor.json`. - -##### Note - -Two important bugs were resolved in ReFrame's CPU autodetect functionality [in version 4.3.3](https://github.com/reframe-hpc/reframe/pull/2978). _We strongly recommend you use `ReFrame >= 4.3.3`_. - -If you are using `ReFrame < 4.3.3`, you may encounter two issues: -1. ReFrame will try to use the parallel launcher command configured for each partition (e.g. `mpirun`) when doing the remote autodetect. If there is no system-version of `mpirun` available, that will fail. See [ReFrame issue #2926](https://github.com/reframe-hpc/reframe/issues/2926). -2. CPU autodetection only worked when using a clone of the ReFrame repository, _not_ when it was installed with `pip` or `EasyBuild` (as is also the case for the ReFrame shipped with EESSI). See [ReFrame issue #2914](https://github.com/reframe-hpc/reframe/issues/2914). - - -## Running tests - -### Listing available tests - -To list the tests that are available in the EESSI test suite, -use `reframe --list` (or `reframe -L` for short). - -If you have properly [configured ReFrame](#Configuring-ReFrame), you should -see a (potentially long) list of checks in the output: - -``` -$ reframe --list -... -[List of matched checks] -- ... -Found 1234 tests -``` - -**FIXME Kenneth** checks are only generated for available modules - -### Performing a dry-run - -To perform a dry run of the EESSI test suite, use `reframe --dry-run`: - -``` -$ reframe --dry-run -... -[==========] Running 1234 check(s) - -[----------] start processing checks -[ DRY ] GROMACS_EESSI ... -``` - -**FIXME Kenneth** explain why this can be useful, contrast with `--list` (which doesn't take into account partitions) - -### Running the (full) test suite - -To actually run the (full) EESSI test suite and let ReFrame -produce a performance report, use `reframe --run --performance-report`. - -We recommend filtering the tests that will be run however, [see below](#Filtering-tests). 
- -### ReFrame output and log files - -**FIXME** - -- `--prefix` to control where output goes -- relation with common logging setup -- ReFrame log, perf log, output dirs, staging dirs, ... -- example of output files and where they can be found - - -### Filtering tests - -By default, ReFrame will automatically generate checks for each system partition, -based on the tests available in the EESSI test suite, available software modules, and tags defined in the EESSI test suite. - -To avoid being overwhelmed by checks, it is recommend to -[apply filters](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#test-filtering) so ReFrame only generates the checks you are interested in. - -#### Filtering by test name - -**FIXME** `--name` - -#### Filtering by system (partition) - -**FIXME** Cover both for specific system/partition - -By default, ReFrame will generate checks for each system partition -that is listed in your configuration file. - -To only let ReFrame checks for a particular system partition, -you can use the [`--system` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-system). - -For example, to let ReFrame only generate checks for the `part_one` partition -of the system named `example`, use: - -``` -reframe --system example:part_one ... -``` - -Use the `--dry-run` option to check the impact of this. - -#### Filtering by tags - -To filter tests using one or more tags, you can use the [`--tag` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-0). - -Using `--list-tags` you can get a list of known tags. - -To check the impact of this on generated checks by ReFrame, use `--list`. - -##### `CI` tag - -For each software that is supported by the test suite, -a small test is tagged with `CI` to indicate it can be used in a Continuous Integration (CI) environment. - -Hence, you can use this tag to let ReFrame only generate checks for small test cases: - -``` -reframe --tag CI -``` - -For example: - -``` -$ reframe --name GROMACS --tag CI -... -FIXME OUTPUT -``` - -##### `scale` tags - -The EESSI test suite defines a set of custom tags that control the *scale* -of tests, that is how many resources will be used for running it. 
- -| tag name | description | -|:--------:|-------------| -| `1_core` | using a single CPU core, or single GPU | -| `2_cores` | using 2 CPU cores, or 2 GPUs | -| `4_cores` | using 4 CPU cores, or 4 GPUs | -| `1_8_node` | using 1/8th of a node (12.5% of available cores/GPUs, 1 at minimum) | -| `1_4_node` | using a quarter of a node (25% of available cores/GPUs, 1 at minimum) | -| `1_2_node` | using half of a node (50% of available cores/GPUs, 1 at minimum) | -| `1_node` | using a full node (all available cores/GPUs) | -| `2_nodes` | using 2 full nodes | -| `4_nodes` | using 4 full nodes | -| `8_nodes` | using 8 full nodes | -| `16_nodes` | using 16 full nodes | - -##### Using multiple tags - -To filter tests using multiple tags, you can: - -* use `|` as separator to indicate that one of the specified tags must match (logical OR, for example `--tag='1_core|2_cores'`); -* use the `--tag` option multiple times to indicate that all specified tags must match (logical AND, for example `--tag CI --tag 1_core`); - -#### Filtering by modules - -**FIXME** This is not really filtering, but overriding default behaviour (see also https://github.com/EESSI/test-suite#changing-the-default-test-behavior-on-the-cmd-line), should use `--name` instead - add warning that this is advanced usage - -By default, ReFrame will generate checks for each available software module -that can be used to run a particular test (for example, all available GROMACS modules will be used once to run each GROMACS test). - -To only run the tests with specific modules, use the `--setvar modules=...` option. - -You can use the `--list` option to check the impact on checks that ReFrame generates. - -For example: - -``` -reframe --setvar modules=GROMACS/2021.3-foss-2021a --list -``` - -### Overriding test parameters (ADVANCED) - -- use of `--setvar` -- recommend to only do this for specific tests, like `--setvar GROMACS_EESSI.modules=GROMACS/2021.6-foss-2022a` - -### Example commands - -#### Running all GROMACS tests on 4 cores - -``` -reframe --name GROMACS --tag 4_cores --run --performance-report -``` - -**FIXME** explain options being used - -#### Running all GROMACS tests using a specific GROMACS module - -``` -reframe --setvar modules=GROMACS/2021.3-foss-2021a --run -``` - -**FIXME use `--name` to filter** - -#### Re-running a specific test (using hash) - -**FIXME** - -## Available tests - -The EESSI test suite currently includes tests for: - -* [GROMACS](#GROMACS) -* [TensorFlow](#TensorFlow) - -For a complete overview of all available tests in the EESSI test suite, see . 
- -### GROMACS - -using GROMACS test in ReFrame test library - -https://www.hecbiosim.ac.uk/access-hpc/benchmarks - - -### TensorFlow - -- minimal TensorFlow version -- info on workload being run - -## Example run - -``` -[ReFrame Setup] - version: 4.2.0 - command: '/readonly/dodrio/apps/RHEL8/zen2-ib/software/ReFrame/4.2.0/bin/reframe --config-file config/vsc_hortense.py --checkpath eessi/testsuite/tests/apps/tensorflow --name TensorFlow/2.11 --tag 1_core --system hortense:cpu_rome_512gb --run --performance-report' - launched by: vsc46128@login55.dodrio.os - working directory: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite' - settings files: '', 'config/vsc_hortense.py' - check search path: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/eessi/testsuite/tests/apps/tensorflow' - stage directory: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/stage' - output directory: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/output' - log files: '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.log', '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.out' - -[==========] Running 1 check(s) -[==========] Started on Mon Aug 28 10:12:38 2023 - -[----------] start processing checks -[ RUN  ] TENSORFLOW_EESSI %scale=1_core %module_name=TensorFlow/2.11.0-foss-2022a %device_type=cpu /af8226d5 @hortense:cpu_rome_512gb+default -[  OK ] (1/1) TENSORFLOW_EESSI %scale=1_core %module_name=TensorFlow/2.11.0-foss-2022a %device_type=cpu /af8226d5 @hortense:cpu_rome_512gb+default -P: perf: 2770.757396498742 img/s (r:0, l:None, u:None) -[----------] all spawned checks have finished - -[  PASSED  ] Ran 1/1 test case(s) from 1 check(s) (0 failure(s), 0 skipped, 0 aborted) -[==========] Finished on Mon Aug 28 10:16:32 2023 - -========================================================================================================================================================= -PERFORMANCE REPORT ---------------------------------------------------------------------------------------------------------------------------------------------------------- -[TENSORFLOW_EESSI %scale=1_core %module_name=TensorFlow/2.11.0-foss-2022a %device_type=cpu /af8226d5 @hortense:cpu_rome_512gb:default] - num_cpus_per_task: 1 - num_tasks_per_node: 1 - num_tasks: 1 - performance: - - perf: 2770.757396498742 img/s (r: 0 img/s l: -inf% u: +inf%) ---------------------------------------------------------------------------------------------------------------------------------------------------------- -Log file(s) saved in '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.log', '/dodrio/scratch/projects/gadminforever/vsc46128/test-suite/reframe_20230828_101237.out' -``` - -## Release notes - -v0.1.0 -- ... diff --git a/docs/test-suite/index.md b/docs/test-suite/index.md new file mode 100644 index 000000000..be67c25d8 --- /dev/null +++ b/docs/test-suite/index.md @@ -0,0 +1,12 @@ +# EESSI test suite + +The [EESSI test suite](https://github.com/EESSI/test-suite) is a collection of tests that are run using +[ReFrame](https://reframe-hpc.readthedocs.io/). +It is used to check whether the software installations included in the [EESSI software layer](../software_layer) +are working and performing as expected. + +To get started, you should look into the [installation and configuration guidelines](installation-configuration.md) first. 
+ +For more information on using the EESSI test suite, see [here](usage.md). + +See also [release notes for the EESSI test suite](release-notes.md). diff --git a/docs/test-suite/installation-configuration.md b/docs/test-suite/installation-configuration.md new file mode 100644 index 000000000..78e58929c --- /dev/null +++ b/docs/test-suite/installation-configuration.md @@ -0,0 +1,497 @@ +# Installing and configuring the EESSI test suite + +This page covers the installation and configuration of the [EESSI test suite](https://github.com/EESSI/test-suite). + +For information on *using* the test suite, see [here](usage.md). + + +## Installation { #installation } + +### Requirements { #requirements } + +The EESSI test suite requires Python >= 3.6 and [ReFrame](https://reframe-hpc.readthedocs.io) v4.3.3 (or newer). + +??? note "(for more details on the ReFrame version requirement, click here)" + + Two important bugs were resolved in ReFrame's CPU autodetect functionality [in version 4.3.3](https://github.com/reframe-hpc/reframe/pull/2978). + + _We strongly recommend you use `ReFrame >= 4.3.3`_. + + If you are using an older version of ReFrame, you may encounter some issues: + + * ReFrame will try to use the parallel launcher command configured for each partition (e.g. `mpirun`) when doing + the remote autodetect. If there is no system-version of `mpirun` available, that will fail + (see [ReFrame issue #2926](https://github.com/reframe-hpc/reframe/issues/2926)). + * CPU autodetection only worked when using a clone of the ReFrame repository, _not_ when it was installed + with `pip` or `EasyBuild` (as is also the case for the ReFrame shipped with EESSI) + (see [ReFrame issue #2914](https://github.com/reframe-hpc/reframe/issues/2914)). + + +### Installing Reframe (incl. test library) + +You need to make sure that [ReFrame](https://reframe-hpc.readthedocs.io) is available - that is, the `reframe` command should work: + +```bash +reframe --version +``` + +General instructions for installing ReFrame are available in the [ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/started.html). + +#### ReFrame test library (`hpctestlib`) + +The EESSI test suite requires that the [ReFrame test library (`hpctestlib`)](https://reframe-hpc.readthedocs.io/en/stable/hpctestlib.html) +is available, which is currently not included in a standard installation of ReFrame. + +We recommend installing ReFrame using [EasyBuild](https://easybuild.io/) (version 4.8.1, or newer), +or using a ReFrame installation that is available in EESSI (pilot version 2023.06, or newer). 
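+
+For example, using EasyBuild (a minimal sketch, assuming an easyconfig is available for the ReFrame version you want, e.g. `ReFrame-4.3.3.eb`; the exact module name depends on the easyconfig you install):
+
+```bash
+# let EasyBuild install ReFrame (and any missing dependencies), then load the generated module;
+# $HOME/.local/easybuild is EasyBuild's default installation prefix, adjust the path if you use a different prefix
+eb ReFrame-4.3.3.eb --robot
+module use $HOME/.local/easybuild/modules/all
+module load ReFrame/4.3.3
+```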
+ +For example (using EESSI): + +```bash +source /cvmfs/pilot.eessi-hpc.org/versions/2023.06/init/bash +module load ReFrame/4.2.0 +``` + +To check whether the ReFrame test library is available, try importing a submodule of the `hpctestlib` Python package: + +```bash +python3 -c 'import hpctestlib.sciapps.gromacs' +``` + +### Installing the EESSI test suite + +To install the EESSI test suite, you can either use `pip` or clone the GitHub repository directly: + +#### Using `pip` { #pip-install } + +```bash +pip install git+https://github.com/EESSI/test-suite.git +``` + +#### Cloning the repository + +```bash +git clone https://github.com/EESSI/test-suite $HOME/EESSI-test-suite +cd EESSI-test-suite +export PYTHONPATH=$PWD:$PYTHONPATH +``` + +#### Verify installation + +To check whether the EESSI test suite installed correctly, +try importing the `eessi.testsuite` Python package: + +```bash +python3 -c 'import eessi.testsuite' +``` + + +## Configuration + +Before you can run the EESSI test suite, you need to create a configuration file for ReFrame that is specific to the system on which the tests will be run. + +Example configuration files are available [n the `config` subdirectory of the `EESSI/test-suite` GitHub repository](https://github.com/EESSI/test-suite/tree/main/config), +which you can use as a template to create your own. + +### Configuring ReFrame environment variables + +We recommend setting a couple of `$RFM_*` environment variables to configure ReFrame, to avoid needing to include particular options to the `reframe` command over and over again. + +#### ReFrame configuration file (`$RFM_CONFIG_FILES`) { #RFM_CONFIG_FILES } + +*(see also [`RFM_CONFIG_FILES` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_CONFIG_FILES))* + +Define the `$RFM_CONFIG_FILES` environment variable to instruct ReFrame which configuration file to use, for example: + +```bash +export RFM_CONFIG_FILES=$HOME/EESSI-test-suite/config/example.py +``` + +Alternatively, you can use the `--config-file` (or `-C`) `reframe` option. + +See the [section on the ReFrame configuration file](#reframe-config-file) below for more information. + +#### Search path for tests (`$RFM_CHECK_SEARCH_PATH`) + +*(see also [`RFM_CHECK_SEARCH_PATH` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_CHECK_SEARCH_PATH))* + +Define the `$RFM_CHECK_SEARCH_PATH` environment variable to tell ReFrame which directory to search for tests. + +In addition, define `$RFM_CHECK_SEARCH_RECURSIVE` to ensure that ReFrame searches `$RFM_CHECK_SEARCH_PATH` recursively +(i.e. so that also tests in subdirectories are found). + +For example: + +```bash +export RFM_CHECK_SEARCH_PATH=$HOME/EESSI-test-suite/eessi/testsuite/tests +export RFM_CHECK_SEARCH_RECURSIVE=1 +``` + +Alternatively, you can use the `--checkpath` (or `-c`) and `--recursive` (or `-R`) `reframe` options. + +#### ReFrame prefix (`$RFM_PREFIX`) { #RFM_PREFIX } + +*(see also [`RFM_PREFIX` in ReFrame docs](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#envvar-RFM_PREFIX))* + +Define the `$RFM_PREFIX` environment variable to tell ReFrame where to store the files it produces. E.g. + +``` +export RFM_PREFIX=$HOME/reframe_runs +``` + +This involves: + +* test output directories (which contain e.g. 
the job script, stderr and stdout for each of the test jobs) +* staging directories (unless otherwise specified by `staging`, see below); +* performance logs; + +Note that the default is for ReFrame to use the current directory as prefix. +We recommend setting a prefix so that logs are not scattered around and nicely appended for each run. + +If our [common logging configuration](#logging) is used, the regular ReFrame log file will +also end up in the location specified by `$RFM_PREFIX`. + +!!! warning + + Using the `--prefix` option in your `reframe` command is *not* equivalent to setting `$RFM_PREFIX`, + since our [common logging configuration](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/common_config.py) + only picks up on the `$RFM_PREFIX` environment variable to determine the location for the ReFrame log file. + +### ReFrame configuration file { #reframe-config-file } + +In order for ReFrame to run tests on your system, it needs to know some properties about your system. +For example, it needs to know what kind of job scheduler you have, which partitions the system has, +how to submit to those partitions, etc. +All of this has to be described in a *ReFrame configuration file* (see also the [section on `$RFM_CONFIG_FILES` above](#RFM_CONFIG_FILES)). + +The [official ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/configure.html) provides the full +description on configuring ReFrame for your site. However, there are some configuration settings that are specifically +required for the EESSI test suite. Also, there are a large amount of configuration settings available in ReFrame, +which makes the official documentation potentially a bit overwhelming. + +Here, we will describe how to create a configuration file that works with the EESSI test suite, starting from an +[example configuration file `settings_example.py`](https://github.com/EESSI/test-suite/tree/main/config/settings_example.py), +which defines the most common configuration settings. + +You can look at other example configurations in the [config directory](https://github.com/EESSI/test-suite/tree/main/config/) for more inspiration. + +#### Python imports + +The EESSI test suite standardizes a few string-based values as constants, as well as the logging format used by ReFrame. +Every ReFrame configuration file used for running the EESSI test suite should therefore start with the following import statements: + +```python +from eessi.testsuite.common_config import common_logging_config +from eessi.testsuite.constants import * +``` + +#### High-level system info (`systems`) + +First, we describe the system at its highest level through the +[`systems`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#systems) keyword. + +You can define multiple systems in a single configuration file (`systems` is a Python list value). +We recommend defining just a single system in each configuration file, as it makes the configuration file a bit easier to digest (for humans). + +An example of the `systems` section of the configuration file would be: + +```python +site_configuration = { + 'systems': [ + # We could list multiple systems. 
Here, we just define one + { + 'name': 'example', + 'descr': 'Example cluster', + 'modules_system': 'lmod', + 'hostnames': ['*'], + 'stagedir': f'/some/shared/dir/{os.environ.get("USER")}/reframe_output/staging', + 'partitions': [...], + } + ] +} +``` + +The most common configuration items defined at this level are: + +- [`name`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.name): + The name of the system. Pick whatever makes sense for you. +- [`descr`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.descr): + Description of the system. Again, pick whatever you like. +- [`modules_system`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.modules_system): + The modules system used on your system. EESSI provides modules in `lmod` format. There is no need to change this, + unless you want to run tests from the EESSI test suite with non-EESSI modules. +- [`hostnames`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.hostnames): + The names of the hosts on which you will run the ReFrame command, as regular expression. Using these names, + ReFrame can automatically determine which of the listed configurations in the `systems` list to use, which is useful + if you're defining multiple systems in a single configuration file. If you follow our recommendation to limit + yourself to one system per configuration file, simply define `'hostnames': ['*']`. +- [`prefix`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.prefix): + Prefix directory for a ReFrame run on this system. Any directories or files produced by ReFrame will use this prefix, + if not specified otherwise. + We recommend setting the `$RFM_PREFIX` environment variable rather than specifying `prefix` in + your configuration file, so our [common logging configuration](#logging) can pick up on it + (see also [`$RFM_PREFIX`](#RFM_PREFIX)). +- [`stagedir`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.stagedir): A shared directory that is available on all nodes that will execute ReFrame tests. This is used for storing (temporary) files related to the test. Typically, you want to set this to a path on a (shared) scratch filesystem. Defining this is optional: the default is a '`stage`' directory inside the `prefix` directory. +- [`partitions`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions): Details on system partitions, see below. + + + +#### System partitions (`systems.partitions`) { #partitions } + +The next step is to add the system partitions to the configuration files, +which is also specified as a Python list since a system can have multiple partitions. + +The [`partitions`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions) +section of the configuration for a system with two [Slurm](https://slurm.schedmd.com/) partitions (one CPU partition, +and one GPU partition) could for example look something like this: + +```python +site_configuration = { + 'systems': [ + { + ... 
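+            # system-level settings ('name', 'descr', 'hostnames', ...) are elided here, see the previous example
+            # each dictionary in 'partitions' describes one partition that ReFrame can submit test jobs to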
+            'partitions': [
+                {
+                    'name': 'cpu_partition',
+                    'descr': 'CPU partition',
+                    'scheduler': 'slurm',
+                    'prepare_cmds': ['source /cvmfs/pilot.eessi-hpc.org/latest/init/bash'],
+                    'launcher': 'mpirun',
+                    'access': ['-p cpu'],
+                    'environs': ['default'],
+                    'max_jobs': 4,
+                    'features': [FEATURES[CPU]],
+                },
+                {
+                    'name': 'gpu_partition',
+                    'descr': 'GPU partition',
+                    'scheduler': 'slurm',
+                    'prepare_cmds': ['source /cvmfs/pilot.eessi-hpc.org/latest/init/bash'],
+                    'launcher': 'mpirun',
+                    'access': ['-p gpu'],
+                    'environs': ['default'],
+                    'max_jobs': 4,
+                    'resources': [
+                        {
+                            'name': '_rfm_gpu',
+                            'options': ['--gpus-per-node={num_gpus_per_node}'],
+                        }
+                    ],
+                    'devices': [
+                        {
+                            'type': DEVICE_TYPES[GPU],
+                            'num_devices': 4,
+                        }
+                    ],
+                    'features': [
+                        FEATURES[CPU],
+                        FEATURES[GPU],
+                    ],
+                    'extras': {
+                        GPU_VENDOR: GPU_VENDORS[NVIDIA],
+                    },
+                },
+            ]
+        }
+    ]
+}
+```
+
+The most common configuration items defined at this level are:
+
+- [`name`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.name):
+  The name of the partition. Pick anything you like.
+- [`descr`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.descr):
+  Description of the partition. Again, pick whatever you like.
+- [`scheduler`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.scheduler):
+  The scheduler used to submit to this partition, for example `slurm`. All valid options can be found
+  [in the ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.scheduler).
+- [`launcher`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.launcher):
+  The parallel launcher used on this partition, for example `mpirun` or `srun`. All valid options can be found
+  [in the ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.launcher).
+- [`access`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.access):
+  A list of arguments that you would normally pass to the scheduler when submitting to this partition
+  (for example '`-p cpu`' for submitting to a Slurm partition called `cpu`).
+  If supported by your scheduler, we recommend _not_ exporting the submission environment
+  (for example by using '`--export=None`' with Slurm). This avoids test failures due to environment variables set
+  in the submission environment that are passed down to submitted jobs.
+- [`prepare_cmds`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.prepare_cmds):
+  Commands to execute at the start of every job that runs a test. If your batch scheduler does not export
+  the environment of the submit host, this is typically where you can initialize the EESSI environment.
+- [`environs`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.environs):
+  The names of the *programming environments* (to be defined later in the configuration file via [`environments`](#environments))
+  that may be used on this partition. A programming environment is required for tests that are compiled first,
+  before they can run. However, the EESSI test suite only tests existing software installations, so no compilation
+  (or specific programming environment) is needed. Simply specify `'environs': ['default']`,
+  since ReFrame requires that *a* default environment is defined.
+- [`max_jobs`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.max_jobs):
+  The maximum number of jobs ReFrame is allowed to submit in parallel. Some batch systems limit how many jobs users
+  are allowed to have in the queue. You can use this to make sure ReFrame doesn't exceed that limit.
+- [`resources`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#custom-job-scheduler-resources):
+  This field defines how additional resources can be requested in a batch job. Specifically, on a GPU partition,
+  you have to define a resource with the name '`_rfm_gpu`'. The `options` field should then contain the argument to be
+  passed to the batch scheduler in order to request a certain number of GPUs _per node_, which could be different for
+  different batch schedulers. For example, when using Slurm you would specify:
+  ```python
+  'resources': [
+      {
+          'name': '_rfm_gpu',
+          'options': ['--gpus-per-node={num_gpus_per_node}'],
+      },
+  ],
+  ```
+- [`processor`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.processor):
+  We recommend *not* defining this field, unless [CPU autodetection](#cpu-auto-detection) is not working for you.
+  The EESSI test suite relies on information about your processor topology to run. Using CPU autodetection is the
+  easiest way to ensure that _all_ processor-related information needed by the EESSI test suite is defined.
+  Only if CPU autodetection is failing for you do we advise you to set the `processor` field in the partition configuration
+  as an alternative. Although additional fields might be used by future EESSI tests, at this point you'll have to
+  specify _at least_ the following fields:
+  ```python
+  'processor': {
+      'num_cpus': 64,  # Total number of CPU cores in a node
+      'num_sockets': 2,  # Number of sockets in a node
+      'num_cpus_per_socket': 32,  # Number of CPU cores per socket
+      'num_cpus_per_core': 1,  # Number of hardware threads per CPU core
+  }
+  ```
+- [`features`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.features):
+  The `features` field is used by the EESSI test suite to run tests _only_ on a partition if it supports a certain
+  _feature_ (for example if GPUs are available). Feature names are standardized in the EESSI test suite in the
+  [`eessi.testsuite.constants.FEATURES`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py)
+  dictionary.
+  Typically, you want to define `features: [FEATURES[CPU]]` for CPU-based partitions, and `features: [FEATURES[GPU]]`
+  for GPU-based partitions. The first tells the EESSI test suite that this partition can only run CPU-based tests,
+  whereas the second indicates that this partition can only run GPU-based tests.
+  You _can_ define a single partition to have _both_ the CPU and GPU features (since `features` is a Python list).
+  However, since the CPU-based tests will not ask your batch scheduler for GPU resources, this _may_ fail on batch
+  systems that force you to ask for at least one GPU on GPU-based nodes. Also, running CPU-only code on a GPU node is
+  typically considered bad practice, so testing that functionality is typically not relevant.
+- [`devices`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.devices): This field specifies information on devices (for example GPUs) present in the partition.
Device types are standardized in the EESSI test suite in the [`eessi.testsuite.constants.DEVICE_TYPES`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py) dictionary. This is used by the EESSI test suite to determine how many of these devices it can/should use per node. + Typically, there is no need to define `devices` for CPU partitions. + For GPU partitions, you want to define something like: + ```python + 'devices': { + 'type': DEVICE_TYPES[GPU], + 'num_devices': 4, # or however many GPUs you have per node + } + ``` +- [`extras`](https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.systems.partitions.extras): This field specifies extra information on the partition, such as the GPU vendor. Valid fields for `extras` are standardized as constants in [`eessi.testsuite.constants`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py) (for example `GPU_VENDOR`). This is used by the EESSI test suite to decide if a partition can run a test that _specifically_ requires a certain brand of GPU. + Typically, there is no need to define `extras` for CPU partitions. + For GPU partitions, you typically want to specify the GPU vendor, for example: + ```python + 'extras': { + GPU_VENDOR: GPU_VENDORS[NVIDIA] + } + ``` + +Note that as more tests are added to the EESSI test suite, the use of `features`, `devices` and `extras` by the EESSI test suite may be extended, which may require an update of your configuration file to define newly recognized fields. + +!!! note + + Keep in mind that ReFrame partitions are _virtual_ entities: they may or may not correspond to a partition as it is + configured in your batch system. One might for example have a single partition in the batch system, but configure + it as two separate partitions in the ReFrame configuration file based on additional constraints that are passed to + the scheduler, see for example the [AWS CitC example configuration](https://github.com/EESSI/test-suite/blob/main/config/aws_citc.py). + + The EESSI test suite (and more generally, ReFrame) assumes the hardware _within_ a partition defined in the ReFrame configuration file is _homogeneous_. + +#### Environments { #environments } + +ReFrame needs a programming environment to be defined in its configuration file for tests that need to be compiled before they are run. While we don't have such tests in the EESSI test suite, ReFrame requires _some_ programming environment to be defined: + +```python +site_configuration = { + ... + 'environments': [ + { + 'name': 'default', # Note: needs to match whatever we set for 'environs' in the partition + 'cc': 'cc', + 'cxx': '', + 'ftn': '', + } + ] +} +``` + +!!! note + + The `name` here needs to match whatever we specified for [the `environs` property of the partitions](#partitions). + +#### Logging + +ReFrame allows a large degree of control over what gets logged, and where. For convenience, we have created a common logging +configuration in [`eessi.testsuite.common_config`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/common_config.py) +that provides a reasonable default. It can be used by importing `common_logging_config` and calling it as a function +to define the '`logging` setting: +```python +from eessi.testsuite.common_config import common_logging_config + +site_configuration = { + ... 
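+    # reuse the default logging configuration provided by the EESSI test suite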
+    'logging': common_logging_config(),
+}
+```
+When combined with setting the [`$RFM_PREFIX` environment variable](#RFM_PREFIX), the output, performance log, and
+regular ReFrame logs will all end up in the directory specified by `$RFM_PREFIX`, which is the setup we recommend.
+
+Alternatively, a prefix can be passed as an argument like `common_logging_config(prefix)`, which will control where
+the regular ReFrame log ends up. Note that the performance logs do *not* respect this prefix: they will still end up
+in the standard ReFrame prefix (by default the current directory, unless otherwise set with `$RFM_PREFIX` or `--prefix`).
+
+#### Auto-detection of processor information { #cpu-auto-detection }
+
+You can let ReFrame [auto-detect the processor information](https://reframe-hpc.readthedocs.io/en/stable/configure.html#proc-autodetection) for your system.
+
+ReFrame will automatically use auto-detection when two conditions are met:
+
+1. The [`partitions` section of your configuration file](#partitions) does *not* specify `processor` information for a
+   particular partition (as per our recommendation [in the previous section](#partitions));
+2. The `remote_detect` option is enabled in the `general` part of the configuration, as follows:
+   ```python
+   site_configuration = {
+       'systems': ...,
+       'logging': ...,
+       'general': [
+           {
+               'remote_detect': True,
+           }
+       ]
+   }
+   ```
+
+To trigger the auto-detection of processor information, it is sufficient to
+let ReFrame list the available tests:
+
+```
+reframe --list
+```
+
+ReFrame will store the processor information for your system in `~/.reframe/topology/<system>-<partition>/processor.json`.
+
+### Verifying your ReFrame configuration
+
+To verify the ReFrame configuration, you can [query the configuration using `--show-config`](https://reframe-hpc.readthedocs.io/en/stable/configure.html#querying-configuration-options).
+
+To see the full configuration, use:
+
+```bash
+reframe --show-config
+```
+
+To only show the configuration of a particular system partition, you can use the [`--system` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-system).
+To query a specific setting, you can pass an argument to `--show-config`.
+
+For example, to show the configuration of the `gpu` partition of the `example` system:
+
+```bash
+reframe --system example:gpu --show-config systems/0/partitions
+```
+
+You can drill down further to only show the value of a particular configuration setting.
+
+For example, to only show the `launcher` value for the `gpu` partition of the `example` system:
+
+```bash
+reframe --system example:gpu --show-config systems/0/partitions/@gpu/launcher
+```
diff --git a/docs/test-suite/release-notes.md b/docs/test-suite/release-notes.md
new file mode 100644
index 000000000..dd2254164
--- /dev/null
+++ b/docs/test-suite/release-notes.md
@@ -0,0 +1,21 @@
+# Release notes for the EESSI test suite
+
+## 0.1.0
+
+Version 0.1.0 is the first release of the EESSI test suite.
+
+It includes:
+
+* A well-structured `eessi.testsuite` Python package that provides [constants](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/constants.py),
+  [utilities](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/utils.py),
+  [hooks](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/hooks.py),
+  and [tests](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/tests/),
+  which can be [installed with "`pip install`"](installation-configuration.md#pip-install).
+* Tests for [GROMACS](usage.md#gromacs) and [TensorFlow](usage.md#tensorflow) in [`eessi.testsuite.tests.apps`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/tests/apps)
+  that leverage the functionality provided by `eessi.testsuite.*`.
+* Examples of [ReFrame configuration files](installation-configuration.md#reframe-config-file) for various systems in
+  the [`config` subdirectory](https://github.com/EESSI/test-suite/tree/main/config).
+* A [`common_logging_config()`](installation-configuration.md#logging) function to facilitate the ReFrame logging configuration.
+* A set of standard *device types* and *features* that can be used in the [`partitions` section of the ReFrame configuration file](installation-configuration.md#partitions).
+* A set of [*tags* (`CI` + `scale`) that can be used to filter checks](usage.md#filter-tag).
+* [Scripts](https://github.com/EESSI/test-suite/tree/main/scripts) that show how to run the test suite.
diff --git a/docs/test-suite/usage.md b/docs/test-suite/usage.md
new file mode 100644
index 000000000..f69652f20
--- /dev/null
+++ b/docs/test-suite/usage.md
@@ -0,0 +1,307 @@
+# Using the EESSI test suite
+
+This page covers the usage of the [EESSI test suite](https://github.com/EESSI/test-suite).
+
+We assume you have already [installed and configured](installation-configuration.md) the EESSI test suite on your
+system.
+
+## Listing available tests
+
+To list the tests that are available in the EESSI test suite,
+use `reframe --list` (or `reframe -L` for short).
+
+If you have properly [configured ReFrame](installation-configuration.md), you should
+see a (potentially long) list of checks in the output:
+
+```
+$ reframe --list
+...
+[List of matched checks]
+- ...
+Found 123 check(s)
+```
+
+!!! note
+    When using `--list`, checks are only generated based on available modules.
+
+    The system partitions specified in your ReFrame configuration file are *not* taken into account when using `--list`.
+
+    So, if `--list` produces an overview of 50 checks, and you have 4 system partitions in your configuration file,
+    actually running the test suite may result in (up to) 200 checks being executed.
+
+## Performing a dry run { #dry-run }
+
+To perform a dry run of the EESSI test suite, use `reframe --dry-run`:
+
+```
+$ reframe --dry-run
+...
+[==========] Running 1234 check(s)
+
+[----------] start processing checks
+[ DRY ] GROMACS_EESSI ...
+...
+[----------] all spawned checks have finished
+
+[ PASSED ] Ran 1234/1234 test case(s) from 1234 check(s) (0 failure(s), 0 skipped, 0 aborted)
+```
+
+!!! note
+
+    When using `--dry-run`, the system partitions listed in your ReFrame configuration file are also taken into
+    account when generating checks, in addition to available modules and test parameters, which is *not* the case when using `--list`.
+
+## Running the (full) test suite
+
+To actually run the (full) EESSI test suite and let ReFrame
+produce a performance report, use `reframe --run --performance-report`.
+
+We strongly recommend filtering the checks that will be run by using additional options
+like `--system`, `--name`, `--tag` (see the ['Filtering tests' section below](#filtering-tests)),
+and doing a [dry run](#dry-run) first to make sure that the generated checks correspond to what you have in mind.
+
+## ReFrame output and log files
+
+ReFrame will generate various output and log files:
+
+* a general ReFrame log file with debug logging on the ReFrame run (incl.
+  selection of tests, generating checks,
+  test results, etc.);
+* stage directories for each generated check, in which the checks are run;
+* output directories for each generated check, which include the test output;
+* performance log files for each test, which include performance results for the test runs.
+
+We strongly recommend controlling where these files go by using the [common logging configuration that
+is provided by the EESSI test suite in your ReFrame configuration file](installation-configuration.md#logging)
+and setting [`$RFM_PREFIX`](installation-configuration.md#RFM_PREFIX).
+
+If you do, and if you use [ReFrame v4.3.3 or more newer](installation-configuration.md#requirements),
+you should find the output and log files at:
+
+* general ReFrame log file at `$RFM_PREFIX/logs/reframe__.log`;
+* stage directories in `$RFM_PREFIX/stage////`;
+* output directories in `$RFM_PREFIX/output////`;
+* performance log files in `$RFM_PREFIX/perflogs////`.
+
+In the stage and output directories, there will be a subdirectory for each check that was run,
+tagged with a unique hash (like `d3adb33f`) that is determined based on the specific parameters for that check
+(see the [ReFrame documentation for more details on the test naming scheme](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#test-naming-scheme)).
+
+## Filtering tests { #filtering-tests }
+
+By default, ReFrame will automatically generate checks for each system partition,
+based on the tests available in the EESSI test suite, available software modules,
+and tags defined in the EESSI test suite.
+
+To avoid being overwhelmed by checks, it is recommended to [apply filters](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#test-filtering)
+so ReFrame only generates the checks you are interested in.
+
+### Filtering by test name { #filter-name }
+
+You can filter checks based on the full test name using the [`--name` option (or `-n`)](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-n),
+which includes the value for all test parameters.
+
+Here's an example of a full test name:
+
+```
+GROMACS_EESSI %benchmark_info=HECBioSim/Crambin %nb_impl=cpu %scale=1_node %module_name=GROMACS/2023.1-foss-2022a /d3adb33f @example:gpu+default
+```
+
+To let ReFrame only generate checks for GROMACS, you can use:
+
+```bash
+reframe --name GROMACS
+```
+
+To only run GROMACS checks with a particular version of GROMACS, you can use `--name` to only retain specific `GROMACS`
+modules:
+
+```bash
+reframe --name %module_name=GROMACS/2023.1
+```
+
+Likewise, you can filter on any part of the test name.
+
+You can also select one specific check using the corresponding [test hash](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#test-naming-scheme),
+which is also part of the full test name (see `/d3adb33f` in the example above). For example:
+
+```bash
+reframe --name /d3adb33f
+```
+
+The argument passed to `--name` is interpreted as a Python regular expression, so you can use wildcards like `.*`,
+character ranges like `[0-9]`, use `^` to specify that the pattern should match from the start of the test name, etc.
+
+Use `--list` or `--dry-run` to check the impact of using the `--name` option.
+
+### Filtering by system (partition) { #filter-system-partition }
+
+By default, ReFrame will generate checks for each system partition that is listed in your configuration file.
+
+To let ReFrame only generate checks for a particular system or system partition,
+you can use the [`--system` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-system).
+
+For example:
+
+* To let ReFrame only generate checks for the system named `example`, use:
+  ```
+  reframe --system example ...
+  ```
+* To let ReFrame only generate checks for the `gpu` partition of the system named `example`, use:
+  ```
+  reframe --system example:gpu ...
+  ```
+
+Use `--dry-run` to check the impact of using the `--system` option.
+
+
+### Filtering by tags { #filter-tag }
+
+To filter tests using one or more tags, you can use the [`--tag` option](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-0).
+
+Using `--list-tags` you can get a list of known tags.
+
+To check the impact of this on the checks generated by ReFrame, use `--list` or `--dry-run`.
+
+#### `CI` tag
+
+For each software package that is included in the EESSI test suite,
+a small test is tagged with `CI` to indicate it can be used in a Continuous Integration (CI) environment.
+
+Hence, you can use this tag to let ReFrame only generate checks for small test cases:
+
+```
+reframe --tag CI
+```
+
+For example:
+
+```
+$ reframe --name GROMACS --tag CI
+...
+```
+
+#### `scale` tags
+
+The EESSI test suite defines a set of custom tags that control the *scale* of checks,
+which specify how many cores/nodes should be used for running a check.
+
+| tag name | description |
+|:--------:|-------------|
+| `1_core` | using a single CPU core, or single GPU |
+| `2_cores` | using 2 CPU cores, or 2 GPUs |
+| `4_cores` | using 4 CPU cores, or 4 GPUs |
+| `1_8_node` | using 1/8th of a node (12.5% of available cores/GPUs, 1 at minimum) |
+| `1_4_node` | using a quarter of a node (25% of available cores/GPUs, 1 at minimum) |
+| `1_2_node` | using half of a node (50% of available cores/GPUs, 1 at minimum) |
+| `1_node` | using a full node (all available cores/GPUs) |
+| `2_nodes` | using 2 full nodes |
+| `4_nodes` | using 4 full nodes |
+| `8_nodes` | using 8 full nodes |
+| `16_nodes` | using 16 full nodes |
+
+#### Using multiple tags
+
+To filter tests using multiple tags, you can:
+
+* use `|` as separator to indicate that *one* of the specified tags must match (logical OR, for example `--tag='1_core|2_cores'`);
+* use the `--tag` option multiple times to indicate that *all* specified tags must match (logical AND, for example `--tag CI --tag 1_core`).
+
+## Overriding test parameters *(advanced)*
+
+You can override test parameters using the [`--setvar` option (or `-S`)](https://reframe-hpc.readthedocs.io/en/stable/manpage.html#cmdoption-S).
+
+This can be done either globally (for all tests), or only for specific tests (which is recommended when using `--setvar`).
+
+For example, to run all GROMACS checks with a specific GROMACS module, you can use:
+
+```
+reframe --setvar GROMACS_EESSI.modules=GROMACS/2023.1-foss-2022a ...
+```
+
+!!! warning
+
+    We do not recommend using `--setvar`, since it is quite easy to make unintended changes to test parameters
+    this way that can result in broken checks.
+
+    You should try filtering tests using the [`--name`](#filter-name) or [`--tag`](#filter-tag) options instead.
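+
+If you do use `--setvar`, it is a good idea to do a [dry run](#dry-run) first to verify its effect.
+As a quick sanity check (assuming a GROMACS module with the exact name used above is available on your system),
+you can combine the override shown above with name filtering and `--dry-run`:
+
+```
+reframe --dry-run --name GROMACS --setvar GROMACS_EESSI.modules=GROMACS/2023.1-foss-2022a ...
+```
+
+Inspecting the generated checks in the dry run output lets you confirm that only the intended checks are affected
+before doing an actual run.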
+ + +## Example commands + +### Running all GROMACS tests on 4 cores on the `cpu` partition + +``` +reframe --run --system example:cpu --name GROMACS --tag 4_cores --performance-report +``` + +### List all checks for TensorFlow 2.11 using a single node + +``` +reframe --list --name %module_name=TensorFlow/2.11 --tag 1_node +``` + +### Dry run of TensorFlow CI checks on a quarter node (on all system partitions) + +``` +reframe --dry-run --name 'TensorFlow.*CUDA' --tag 1_4_node --tag CI +``` + +## Available tests { #available-tests } + +The EESSI test suite currently includes tests for: + +* [GROMACS](#gromacs) +* [TensorFlow](#tensorflow) + +For a complete overview of all available tests in the EESSI test suite, see the +[`eessi/testsuite/tests` subdirectory in the `EESSI/test-suite` GitHub repository](https://github.com/EESSI/test-suite/tree/main/eessi/testsuite/tests). + +### GROMACS { #gromacs } + +Several tests for [GROMACS](https://www.gromacs.org), a software package to perform molecular dynamics simulations, +are included, which use the systems included in the [HECBioSim benchmark suite](https://www.hecbiosim.ac.uk/access-hpc/benchmarks): + +* `Crambin` (20K atom system) +* `Glutamine-Binding-Protein` (61K atom system) +- `hEGFRDimer` (465K atom system) +- `hEGFRDimerSmallerPL` (465K atom system, only 10k steps) +- `hEGFRDimerPair` (1.4M atom system) +- `hEGFRtetramerPair` (3M atom system) + +It is implemented in [`tests/apps/gromacs.py`](https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/tests/apps/gromacs.py), +on top of the GROMACS test that is included in the [ReFrame test library `hpctestlib`](https://reframe-hpc.readthedocs.io/en/stable/hpctestlib.html). + +To run this GROMACS test with all HECBioSim systems, use: + +```bash +reframe --run --name GROMACS +``` + +To run this GROMACS test only for a specific HECBioSim system, use for example: + +```bash +reframe --run --name 'GROMACS.*HECBioSim/hEGFRDimerPair' +``` + +To run this GROMACS test with the smallest HECBioSim system (`Crambin`), you can use the `CI` tag: + +```bash +reframe --run --name GROMACS --tag CI +``` + +### TensorFlow { #tensorflow } + +A test for [TensorFlow](https://www.tensorflow.org), a machine learning framework, is included, +which is based on the ["Multi-worker training with Keras" TensorFlow tutorial](https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras). + +It is implemented in [`tests/apps/tensorflow/`](https://github.com/EESSI/test-suite/tree/main/eessi/testsuite/tests/apps/tensorflow). + +!!! warning + This test requires TensorFlow v2.11 or newer, using an older TensorFlow version will not work! 
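+
+Before running it, you may want to double-check which TensorFlow modules the generated checks will pick up on your
+system, for example by reusing the `%module_name=` filter described [above](#filter-name) (adjust the version to
+whatever is available on your system):
+
+```
+reframe --list --name %module_name=TensorFlow/2.11
+```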
+ +To run this TensorFlow test, use: + +```bash +reframe --run --name TensorFlow +``` diff --git a/mkdocs.yml b/mkdocs.yml index e9112f689..fbb1cf657 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -26,6 +26,11 @@ nav: - software_layer/cpu_targets.md - software_layer/build_nodes.md - software_layer/adding_software.md + - Test suite: + - Overview: test-suite/index.md + - Installation & configuration: test-suite/installation-configuration.md + - Usage: test-suite/usage.md + - Release notes: test-suite/release-notes.md - Build-test-deploy bot: bot.md - Pilot repository: pilot.md - Getting access to EESSI: From 1a0cfb179c434cb11c013998fa7c01b0fe9c5837 Mon Sep 17 00:00:00 2001 From: Kenneth Hoste Date: Thu, 21 Sep 2023 17:59:59 +0200 Subject: [PATCH 04/11] clean up software_testing page, link to test-suite overview page --- docs/software_testing.md | 62 ++-------------------------------------- mkdocs.yml | 1 - 2 files changed, 2 insertions(+), 61 deletions(-) diff --git a/docs/software_testing.md b/docs/software_testing.md index 3a0cde29d..1cf8c38bb 100644 --- a/docs/software_testing.md +++ b/docs/software_testing.md @@ -1,61 +1,3 @@ -#Software testing +# Software testing -**WARNING: development of the software test suite has only just started and is a work in progress. This page describes how the test suite _will_ be designed, but many things are not implemented yet and the design may still change.** - -##Description of the software test suite - -###Framework -The EESSI project uses the [ReFrame framework](https://reframe-hpc.readthedocs.io/en/stable/index.html) for software testing. ReFrame is designed particularly for testing HPC software and thus has well integrated support for interacting with schedulers, as well as various launchers for MPI programs. - -###Test variants -The EESSI software stack can be used in various ways, e.g. by using the [container](../pilot/#accessing-the-eessi-pilot-repository-through-singularity) or when the CVMFS software stack is mounted natively. This means the commands that need to be run to test an application are different in both cases. Similarly, systems may have different hardware (CPUs v.s. GPUs, system size, etc). Thus, tests - e.g. a GROMACS test - may have different variants: one designed to run on CPUs, one on GPUs, one designed to run through the container, etc. - -The main goal of the EESSI test suite is to test the software stack on systems that have the EESSI CVMFS mounted natively. Some tests may also have variants that can run the same test through the container, but note that this setup is technically much more difficult. Thus, the main focus is on tests that run with a native CVMFS mount of the EESSI stack. - -By default, ReFrame runs all test variants it find. Thus, in our test suite, we prespecify a number of tags that can be used to select an appropriate subset of tests for your system. We recognize the following tags: - -- container: tests that use the EESSI container to run the software. E.g. one variant of our GROMACS test uses `singularity exec` to launch the EESSI container, load the GROMACS module, and run the GROMACS test. -- `native`: tests that rely on the EESSI software stack being available through the modules system. E.g. one variant of the GROMACS test loads the GROMACS module and runs the GROMACS test. -- `singlecore`: tests designed to run on a single core -- `singlenode`: tests designed to run on a single (multicore) node (note: may still use MPI for multiprocessing) -- `small`: tests designed to run on 2-8 nodes. 
-- `large`: tests designed to run on >9 nodes. -- `cpu`: test designed to run on CPU. -- `gpu`, gpu_nvidia, gpu_amd: test designed to run on GPUs / nvidia GPUs / AMD GPUs. - -##How to run the test suite - -### General requirements - -- A copy of the `tests` directory from [software repository](https://github.com/EESSI/software-layer) - -### Requirements for container-based tests -Specifically for container-based tests, there are some requirements on the host system: - -- An installation of ReFrame -- An MPI installation (to launch MPI tests) or PMIx-based launcher (e.g. SLURM compiled with PMIx support) -- Singularity - -The container based tests will use a so-called shared alien CVMFS cache to store temporary data. In addition, they use a local CVMFS cache for speed. For this reason, the container tests need to be pointed to one directory that is shared between nodes on your system, and one directory that is node-specific (preferably a local disk). The `shared_alien_cache_minimal.sh` script that is part of the test suite defines these, and sets up the correct CVMFS configuration. You will have to adapt the `SHAREDSPACE` and `LOCALSPACE` variables in that script for your system, and point them to a shared and node-local directory. - -### Setting up a ReFrame configuration file -Once the prerequisites have been met, you'll need to create a ReFrame configuration file that matches your system (see the [ReFrame documentation](https://reframe-hpc.readthedocs.io/en/stable/configure.html)). If you want to use the container-based tests, you *have* to define a partition programming environment called `container` and make sure it loads any modules needed to provide the MPI installation and singularity command. For an example configuration file, check the `tests/reframe/config/settings.py` in the [software-layer repository](https://github.com/EESSI/software-layer). Other than (potential) adaptations to the `container` environment, you should only really need to change the `systems` part. - -### Adapting the tests to your system -For now, you will have to adapt the number of tasks specified in full-node tests to match the number of cores your machine has in a single node (in the future, you should be able to do this through the reframe configuration file). To do so, change all `self.num_tasks_per_node` you find in the various tests to that core count (unless they are 1, in which case the test specifically intended for only 1 process per node). - - -### An example run -In this example, we assume your current directory is the `tests/reframe` folder. To list e.g. all single node, cpu-based application tests on a system that has the EESSI software environment available natively, you execute: -``` -reframe --config-file=config/settings.py --checkpath eessi-checks/applications/ -l -t native -t single -t cpu -``` -(assuming you adapted the config file in `config/settings.py` for your system). This should list the tests that are selected based on the provided tags. 
To run the tests, change the `-l` argument into a `-r`: -``` -reframe --config-file=config/settings.py --checkpath eessi-checks/applications/ -l -t native -t single -t cpu --performance-report -``` -To run the same tests with using the EESSI container, run: -``` -reframe --config-file=config/settings.py --checkpath eessi-checks/applications/ -l -t container -t single -t cpu --performance-report -``` -Note that not all tests necessarily have implementations to run using the EESSI container: the primary focus of the test suite is for HPC sites to check the performance of their software suite. Such sites should have CVMFS mounted natively for optimal performance anyway. +**This page has been replaced with [test-suite](test-suite/index.md), update your bookmarks!** diff --git a/mkdocs.yml b/mkdocs.yml index fbb1cf657..175854a4c 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -41,7 +41,6 @@ nav: - using_eessi/setting_up_environment.md - using_eessi/basic_commands.md - using_eessi/eessi_demos.md - - Software testing: software_testing.md - Meetings: - Overview: meetings.md - Community meeting (Sept'22): meetings/2022-09-amsterdam.md From 85f5c6a917ee30ed7abc7dd4b252b82c230d0b9c Mon Sep 17 00:00:00 2001 From: Kenneth Hoste Date: Mon, 2 Oct 2023 09:01:32 +0200 Subject: [PATCH 05/11] clarify "available" modules in section on --list option Co-authored-by: Sam Moors --- docs/test-suite/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/test-suite/usage.md b/docs/test-suite/usage.md index f69652f20..d33cf6d7f 100644 --- a/docs/test-suite/usage.md +++ b/docs/test-suite/usage.md @@ -22,7 +22,7 @@ Found 123 check(s) ``` !!! note - When using `--list`, checks are only generated based on available modules. + When using `--list`, checks are only generated based on modules that are available in the system where the `reframe` command is invoked. The system partitions specified in your ReFrame configuration file are *not* taken into account when using `--list`. From 45755754d1d41e79089fca60bbef3e1d6c0a5a75 Mon Sep 17 00:00:00 2001 From: Kenneth Hoste Date: Mon, 2 Oct 2023 09:02:01 +0200 Subject: [PATCH 06/11] add recommendation to now use --prefix option Co-authored-by: Sam Moors --- docs/test-suite/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/test-suite/usage.md b/docs/test-suite/usage.md index d33cf6d7f..a1b527609 100644 --- a/docs/test-suite/usage.md +++ b/docs/test-suite/usage.md @@ -72,7 +72,7 @@ ReFrame will generate various output and log files: We strongly recommend controlling where these files go by using the [common logging configuration that is provided by the EESSI test suite in your ReFrame configuration file](installation-configuration.md#logging) -and setting [`$RFM_PREFIX`](installation-configuration.md#RFM_PREFIX). +and setting [`$RFM_PREFIX`](installation-configuration.md#RFM_PREFIX) (avoid using the cmd line option `--prefix`). 
If you do, and if you use [ReFrame v4.3.3 or more newer](installation-configuration.md#requirements), you should find the output and log files at: From 477ecbb537e98741a2e759da1a72c72bba3cc878 Mon Sep 17 00:00:00 2001 From: Kenneth Hoste Date: Mon, 2 Oct 2023 09:02:26 +0200 Subject: [PATCH 07/11] clarify "quarter" node Co-authored-by: Sam Moors --- docs/test-suite/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/test-suite/usage.md b/docs/test-suite/usage.md index a1b527609..d6ac023f0 100644 --- a/docs/test-suite/usage.md +++ b/docs/test-suite/usage.md @@ -241,7 +241,7 @@ reframe --run --system example:cpu --name GROMACS --tag 4_cores --performance-re reframe --list --name %module_name=TensorFlow/2.11 --tag 1_node ``` -### Dry run of TensorFlow CI checks on a quarter node (on all system partitions) +### Dry run of TensorFlow CI checks on a quarter (1/4) of a node (on all system partitions) ``` reframe --dry-run --name 'TensorFlow.*CUDA' --tag 1_4_node --tag CI From 9ac32dc1aeb4b57596cca3c2651cdd21ed81f10b Mon Sep 17 00:00:00 2001 From: Kenneth Hoste Date: Mon, 2 Oct 2023 09:02:50 +0200 Subject: [PATCH 08/11] fix table for `scale` tags: only 1 GPU is requested Co-authored-by: Sam Moors --- docs/test-suite/usage.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/test-suite/usage.md b/docs/test-suite/usage.md index d6ac023f0..a12d4a995 100644 --- a/docs/test-suite/usage.md +++ b/docs/test-suite/usage.md @@ -188,9 +188,9 @@ which specify many cores/nodes should be used for running a check. | tag name | description | |:--------:|-------------| -| `1_core` | using a single CPU core, or single GPU | -| `2_cores` | using 2 CPU cores, or 2 GPUs | -| `4_cores` | using 4 CPU cores, or 4 GPUs | +| `1_core` | using 1 CPU core and 1 GPU (if running a GPU test) | +| `2_cores` | using 2 CPU cores and 1 GPU (if running a GPU test) | +| `4_cores` | using 4 CPU cores, or 4 GPUs and 1 GPU (if running a GPU test) | | `1_8_node` | using 1/8th of a node (12.5% of available cores/GPUs, 1 at minimum) | | `1_4_node` | using a quarter of a node (25% of available cores/GPUs, 1 at minimum) | | `1_2_node` | using half of a node (50% of available cores/GPUs, 1 at minimum) | From 7ba66498cd9f3789e441c259946753777cf72df9 Mon Sep 17 00:00:00 2001 From: Kenneth Hoste Date: Mon, 2 Oct 2023 09:39:38 +0200 Subject: [PATCH 09/11] fix description for `4_cores` scale tag Co-authored-by: Sam Moors --- docs/test-suite/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/test-suite/usage.md b/docs/test-suite/usage.md index a12d4a995..aeda63bb2 100644 --- a/docs/test-suite/usage.md +++ b/docs/test-suite/usage.md @@ -190,7 +190,7 @@ which specify many cores/nodes should be used for running a check. 
|:--------:|-------------| | `1_core` | using 1 CPU core and 1 GPU (if running a GPU test) | | `2_cores` | using 2 CPU cores and 1 GPU (if running a GPU test) | -| `4_cores` | using 4 CPU cores, or 4 GPUs and 1 GPU (if running a GPU test) | +| `4_cores` | using 4 CPU cores and 1 GPU (if running a GPU test) | | `1_8_node` | using 1/8th of a node (12.5% of available cores/GPUs, 1 at minimum) | | `1_4_node` | using a quarter of a node (25% of available cores/GPUs, 1 at minimum) | | `1_2_node` | using half of a node (50% of available cores/GPUs, 1 at minimum) | From dfacaa313eb347d6e3a67f0bf6f5bb8cdf4e73ef Mon Sep 17 00:00:00 2001 From: Kenneth Hoste Date: Wed, 4 Oct 2023 15:04:43 +0200 Subject: [PATCH 10/11] fix typo Co-authored-by: Caspar van Leeuwen <33718780+casparvl@users.noreply.github.com> --- docs/test-suite/installation-configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/test-suite/installation-configuration.md b/docs/test-suite/installation-configuration.md index 78e58929c..d07480ffe 100644 --- a/docs/test-suite/installation-configuration.md +++ b/docs/test-suite/installation-configuration.md @@ -90,7 +90,7 @@ python3 -c 'import eessi.testsuite' Before you can run the EESSI test suite, you need to create a configuration file for ReFrame that is specific to the system on which the tests will be run. -Example configuration files are available [n the `config` subdirectory of the `EESSI/test-suite` GitHub repository](https://github.com/EESSI/test-suite/tree/main/config), +Example configuration files are available in the `config` subdirectory of the `EESSI/test-suite` GitHub repository](https://github.com/EESSI/test-suite/tree/main/config), which you can use as a template to create your own. ### Configuring ReFrame environment variables From 35812868ebfaad96dce0b03231ef2d24ca737366 Mon Sep 17 00:00:00 2001 From: Kenneth Hoste Date: Wed, 4 Oct 2023 15:09:50 +0200 Subject: [PATCH 11/11] fix typos in README caught by codespell --- talks/20210119_EESSI_behind_the_scenes/README.md | 2 +- talks/20210202_CernVM_Workshop/README.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/talks/20210119_EESSI_behind_the_scenes/README.md b/talks/20210119_EESSI_behind_the_scenes/README.md index 9300d35ed..afac14678 100644 --- a/talks/20210119_EESSI_behind_the_scenes/README.md +++ b/talks/20210119_EESSI_behind_the_scenes/README.md @@ -77,7 +77,7 @@ prepared with the help from Terje Kvernes (@terjekv). We should ask the CVMFS developers about this too (see also https://cvmfs.readthedocs.io/en/stable/apx-security.html). -* Q: LTS for Gentoo? Lifetime? Major upgrade -> EasyBuild complete rebuild? How long can we re-use the previous "trees"? +* Q: LTS for Gentoo? Lifetime? Major upgrade -> EasyBuild complete rebuild? How long can we reuse the previous "trees"? * A: (question answered on stream, see recording). Short answer: we haven't decided this yet. diff --git a/talks/20210202_CernVM_Workshop/README.md b/talks/20210202_CernVM_Workshop/README.md index a703b2c07..e04071e79 100644 --- a/talks/20210202_CernVM_Workshop/README.md +++ b/talks/20210202_CernVM_Workshop/README.md @@ -12,7 +12,7 @@ https://indico.cern.ch/event/885212/overview * A: Jülich and CSCS are examples for large centers which are part of EESSI (not sure if they are part of PRACE or EuroHPC). * A: LUMI has shown signs of interest. 
-* Q (Valentin Volkl): Key4HEP is already using Gitlab CI for publising to CVMFS and would be interested in a GitHub PR based workflow as envisaged by EESSI, interested in collaboration; perhaps the Github action runner can help. +* Q (Valentin Volkl): Key4HEP is already using Gitlab CI for publishing to CVMFS and would be interested in a GitHub PR based workflow as envisaged by EESSI, interested in collaboration; perhaps the Github action runner can help. * A: We would be very interested to discuss this more. We also expect that the CVMFS ephemeral publish container would help. * Q (Dave Dykstra): Rocky Linux already uses building based on github PRs.