Add LAMMPS tests to EESSI test-suite #131

laraPPr · 2024-04-02T14:56:23Z

No description provided.

eessi/testsuite/tests/apps/lammps/data.rhodo

eessi/testsuite/tests/apps/lammps/lammps.py

laraPPr · 2024-04-04T10:11:51Z

These tests are CPU-only. To run LAMMPS on GPUs you need the executable lmp_machine instead of lmp; see https://docs.lammps.org/Speed_gpu.html

eessi/testsuite/tests/apps/lammps/lammps.py

laraPPr · 2024-04-17T14:47:47Z

The tests are now compatible with the LAMMPS package in EESSI and with https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/l/LAMMPS/LAMMPS-2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1.eb.

So I think that it is ready for review.

boegel · 2024-04-17T16:27:55Z

@laraPPr Can we somehow avoid including the *.rhodo files in the repository?
One of them is rather big (though it's basically a text file, but one with 191k lines 😅)

laraPPr · 2024-04-17T17:50:04Z

> @laraPPr Can we somehow avoid including the *.rhodo files in the repository?
> One of them is rather big (though it's basically a text file, but one with 191k lines 😅)

We discussed this in the last test-suite sync and agreed that it was okay for this one. I've already minimized its impact as much as possible by using a ReFrame option so that the file is not copied to the stage directory. But we can discuss it again tomorrow.

boegel · 2024-04-18T11:47:25Z

Can you add a small README file that mentions where we got the input files, when they were obtained, from which version, what the SHA256 checksum is, etc.?

eessi/testsuite/tests/apps/lammps/lammps.py

smoors · 2024-08-05T18:54:41Z

eessi/testsuite/tests/apps/lammps/lammps.py

+
+    @performance_function('img/s')
+    def perf(self):
+        regex = r'^(?P<perf>[.0-9]+)% CPU use with [0-9]+ MPI tasks x [0-9]+ OpenMP threads'


this doesn't look right, this is the %CPU usage.

performance should be one of the following:

Performance: 0.823 ns/day, 29.175 hours/ns, 4.761 timesteps/s, 152.338 katom-step/s

Performance: 205379.307 tau/day, 475.415 timesteps/s, 15.213 Matom-step/s

This then?

your line comes from the lj test, while mine comes from the rhodo test.
let's take timesteps/s, as this unit is available for both tests?

I've now added tau/day for lj and ns/day for rhodo, but I can also take timesteps/s.

i just checked, and both tau/day for lj and ns/day for rhodo scale in exactly the same way as timesteps/s, so it doesn't really matter.

i do have a slight preference for timesteps/s as it is easy to understand. otherwise, can you add a comment explaining what exactly tau/day means?

I also do not know what tau/day is, and Google does not have a ready explanation, so I changed it to timesteps/s for both lj and rhodo.
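For reference, a minimal sketch of pulling timesteps/s out of either "Performance:" line quoted above. It uses only Python's `re` module rather than ReFrame, and the pattern and the helper name `timesteps_per_second` are illustrative assumptions, not the code in this PR.

```python
import re

# "Performance:" lines quoted in this thread (rhodo first, then lj).
RHODO_LINE = "Performance: 0.823 ns/day, 29.175 hours/ns, 4.761 timesteps/s, 152.338 katom-step/s"
LJ_LINE = "Performance: 205379.307 tau/day, 475.415 timesteps/s, 15.213 Matom-step/s"

# Hypothetical pattern: skip whatever fields come first and capture the
# number directly in front of "timesteps/s", the unit both tests report.
TIMESTEPS_RE = re.compile(
    r"^Performance: .*?(?P<perf>[.0-9]+) timesteps/s", re.MULTILINE
)


def timesteps_per_second(output: str) -> float:
    """Return the timesteps/s value from LAMMPS output (illustrative helper)."""
    match = TIMESTEPS_RE.search(output)
    if match is None:
        raise ValueError("no 'Performance:' line with timesteps/s found")
    return float(match.group("perf"))
```

Because the pattern does not depend on tau/day or ns/day, the same helper would serve both the lj and the rhodo test.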

eessi/testsuite/tests/apps/lammps/lammps.py

laraPPr · 2024-08-06T08:53:03Z

Just noticed something going really wrong when testing LAMMPS, related to issue #132:

ERROR on proc 0: Cannot open input script in.rhodo: No such file or directory (src/lammps.cpp:542)
Last command: (unknown)

I'm also seeing this one, but I'm not sure whether I'm triggering it myself: I don't see it when there are no duplicate modules, and I cannot find this error in any of the test reports from the CI that I am running (TypeError: can only join an iterable).

sbatch: error: QOSMaxSubmitJobPerUserLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

EDIT: The sbatch: error: output triggers a bug in ReFrame versions older than 4.3 (I am using 4.2); see reframe-hpc/reframe#2885

smoors · 2024-08-08T10:19:53Z

seems to work well, though haven't tested on GPUs, our GPU nodes are too busy at the moment.

[ RUN      ] EESSI_LAMMPS_rhodo %scale=1_8_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /90d91dda @hydra:zen4+default
[ RUN      ] EESSI_LAMMPS_lj %scale=1_8_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /28a69ba4 @hydra:zen4+default
[       OK ] (1/2) EESSI_LAMMPS_lj %scale=1_8_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /28a69ba4 @hydra:zen4+default
P: perf: 562.325 timesteps/s (r:0, l:None, u:None)
==> setup: 0.116s compile: 0.010s run: 49.726s sanity: 0.022s performance: 0.002s total: 49.982s
[       OK ] (2/2) EESSI_LAMMPS_rhodo %scale=1_8_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /90d91dda @hydra:zen4+default
P: perf: 33.704 timesteps/s (r:0, l:None, u:None)
==> setup: 0.126s compile: 0.008s run: 57.376s sanity: 0.016s performance: 0.004s total: 57.656s
[----------] all spawned checks have finished

[  PASSED  ] Ran 2/2 test case(s) from 2 check(s) (0 failure(s), 0 skipped, 0 aborted)
[==========] Finished on Thu Aug  8 11:06:54 2024 

===================================================================================================================================================================================================================
PERFORMANCE REPORT
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[EESSI_LAMMPS_rhodo %scale=1_8_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /90d91dda @hydra:zen4:default]
  num_tasks_per_node: 8
  num_tasks: 8
  num_cpus_per_task: 1
  performance:
    - perf: 33.704 timesteps/s (r: 0 timesteps/s l: -inf% u: +inf%)
[EESSI_LAMMPS_lj %scale=1_8_node %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos /28a69ba4 @hydra:zen4:default]
  num_tasks_per_node: 8
  num_tasks: 8
  num_cpus_per_task: 1
  performance:
    - perf: 562.325 timesteps/s (r: 0 timesteps/s l: -inf% u: +inf%)

smoors · 2024-08-08T10:22:56Z

the only thing i'm still missing is an energy sanity check. i would use TotEng (total energy) for this.
for lj the total energy seems to be always exactly the same after 100 steps:

   Step          Temp          E_pair         E_mol          TotEng         Press
         0   1.44          -6.7733681      0             -4.6134356     -5.0197073
       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105

for rhodo, the total energy is also very stable, with only the last digit showing slight differences:

------------ Step            100 ----- CPU =    0.4625225 (sec) -------------
TotEng   =    -25290.7300 KinEng   =     21591.9085 Temp     =       301.0906
PotEng   =    -46882.6385 E_bond   =      2567.9807 E_angle  =     10781.9571
E_dihed  =      5198.7492 E_impro  =       216.7864 E_vdwl   =     -1902.6618
E_coul   =    206659.5228 E_long   =   -270404.9730 Press    =         6.7407
Volume   =    308134.2285
Volume   =    308134.2285
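A rough sketch of what such an energy check could look like, written with plain Python instead of ReFrame sanity functions. The helper names and the tolerances are assumptions for illustration; the reference values are the step-100 TotEng numbers shown above.

```python
import re

# Thermo output excerpts as quoted above.
LJ_THERMO = """\
   Step          Temp          E_pair         E_mol          TotEng         Press
         0   1.44          -6.7733681      0             -4.6134356     -5.0197073
       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105
"""

RHODO_THERMO = """\
------------ Step            100 ----- CPU =    0.4625225 (sec) -------------
TotEng   =    -25290.7300 KinEng   =     21591.9085 Temp     =       301.0906
"""


def lj_total_energy(output: str, step: int = 100) -> float:
    """Read TotEng (5th column) from the one-line thermo row of a given step."""
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[0] == str(step):
            return float(fields[4])
    raise ValueError(f"no thermo row found for step {step}")


def rhodo_total_energy(output: str, step: int = 100) -> float:
    """Read TotEng from the multi-line thermo block of a given step."""
    pattern = rf"Step\s+{step}\s.*\nTotEng\s+=\s+(?P<energy>-?[.0-9]+)"
    match = re.search(pattern, output)
    if match is None:
        raise ValueError(f"no thermo block found for step {step}")
    return float(match.group("energy"))


def energy_ok(value: float, reference: float, tolerance: float) -> bool:
    """True if the energy lies within an absolute tolerance of the reference."""
    return abs(value - reference) <= tolerance


# Assumed tolerances: tight for lj, looser for rhodo where the last digit varies.
LJ_OK = energy_ok(lj_total_energy(LJ_THERMO), -4.6223613, 1e-6)
RHODO_OK = energy_ok(rhodo_total_energy(RHODO_THERMO), -25290.7300, 1e-3)
```

An absolute tolerance is used here because only the last printed digit of the rhodo energy was seen to vary; a relative tolerance would work equally well.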

casparvl · 2024-08-08T11:56:36Z

I see Sam gave great comments on the content. I'm not familiar with LAMMPS, but at least checked that the runs succeeded.

All tests succeeded, both on Karolina:

[----------] all spawned checks have finished

[  PASSED  ] Ran 26/26 test case(s) from 26 check(s) (0 failure(s), 0 skipped, 0 aborted)
[==========] Finished on Thu Aug  8 13:34:46 2024+0200
Log file(s) saved in '/home/it4i-casparl/EESSI/reframe_runs/logs/reframe_20240808_133124.log'

And on Snellius:

[----------] all spawned checks have finished

[  FAILED  ] Ran 52/52 test case(s) from 26 check(s) (1 failure(s), 0 skipped, 0 aborted)
[==========] Finished on Thu Aug  8 13:44:38 2024+0200

The failure was due to a node failure, a rerun of that particular test succeeded. So... everything looks good from my side :)

casparvl · 2024-08-08T12:15:02Z

Still to do: add a check that a CUDA build is built with Kokkos; this test requires it.

laraPPr · 2024-08-08T12:18:05Z

eessi/testsuite/tests/apps/lammps/lammps.py

+        num_default = 0  # If this test already has executable opts, they must have come from the command line
+        hooks.check_custom_executable_opts(self, num_default=num_default)
+        if not self.has_custom_executable_opts:
+            # should also check if the lammps is installed with kokkos.


Reminder for me to also look at this one again
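One possible shape for that check, as a standalone sketch. The function name `lammps_executable_opts` and the idea of detecting Kokkos from the module name are assumptions for illustration, not what the PR implements; the command-line switches themselves (`-k on g N -sf kk` for Kokkos, `-sf gpu -pk gpu N` for the GPU package) are the ones described in the LAMMPS documentation.

```python
def lammps_executable_opts(input_file, device_type, module_name, gpus_per_node=1):
    """Build LAMMPS command-line options for a CPU or GPU run (illustrative).

    Assumption: a Kokkos-enabled build can be recognised by 'kokkos' in the
    module name; any other GPU run falls back to the GPU package.
    """
    opts = ["-in", input_file]
    if device_type != "gpu":
        # CPU runs need no accelerator switches.
        return opts
    if "kokkos" in module_name.lower():
        # Enable Kokkos with N GPUs per node and apply the kk suffix.
        return opts + ["-k", "on", "g", str(gpus_per_node), "-sf", "kk"]
    # GPU package: apply the gpu suffix and set the number of GPUs.
    return opts + ["-sf", "gpu", "-pk", "gpu", str(gpus_per_node)]
```

For example, calling it with `device_type="gpu"` and the module `LAMMPS/2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1` would select the Kokkos switches.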

casparvl · 2024-08-08T13:00:56Z

Karolina, LAMMPS/2Aug2023_update2-foss-2023a-kokkos

1 rank, rhodo:

------------ Step            100 ----- CPU =     37.46569 (sec) -------------
TotEng   =    -25290.7299 KinEng   =     21591.9085 Temp     =       301.0906

128 ranks, rhodo:

------------ Step            100 ----- CPU =    0.5564088 (sec) -------------
TotEng   =    -25290.7302 KinEng   =     21591.9084 Temp     =       301.0906

16*128 ranks, rhodo:

------------ Step            100 ----- CPU =    0.4018642 (sec) -------------
TotEng   =    -25290.7300 KinEng   =     21591.9085 Temp     =       301.0906

1 rank, lj:

       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105

128 ranks, lj:

       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105

16*128 ranks, lj:

       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105

Snellius, LAMMPS/2Aug2023_update2-foss-2023a-kokkos

1 rank, rhodo:

------------ Step            100 ----- CPU =     30.80414 (sec) -------------
TotEng   =    -25290.7299 KinEng   =     21591.9085 Temp     =       301.0906

128 ranks, rhodo:

------------ Step            100 ----- CPU =    0.4229304 (sec) -------------
TotEng   =    -25290.7302 KinEng   =     21591.9084 Temp     =       301.0906

16*128 ranks, rhodo:

------------ Step            100 ----- CPU =     47.82593 (sec) -------------
TotEng   =    -25290.7300 KinEng   =     21591.9085 Temp     =       301.0906

1 rank, lj:

       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105

128 ranks, lj:

       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105

16*128 ranks, lj:

       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105

casparvl · 2024-08-08T13:37:58Z

Just as another check, I ran with an older version of LAMMPS (LAMMPS/23Jun2022-foss-2022a-kokkos):

128 ranks, lj:

       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105

128 ranks, rhodo:

------------ Step            100 ----- CPU =    0.4350029 (sec) -------------
TotEng   =    -25290.7300 KinEng   =     21591.9085 Temp     =       301.0906

Seems the total energy is consistent across versions. But, I am getting:

WARNING: skipping evaluation of performance variable 'perf': not enough matches of pattern '^Performance: [.0-9]+ tau/day, (?P<perf>[.0-9]+) timesteps/s,' in file 'rfm_job.out' so as to extract item 0

I'm not entirely sure why, because I do see:

Performance: 1200769.669 tau/day, 2779.559 timesteps/s

And that seems to match...?

casparvl · 2024-08-08T13:42:03Z

With yet another module, LAMMPS/2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1, on H100 GPUs:
4 ranks (gpus), lj:

       100   0.7574531     -5.7585055      0             -4.6223613      0.20726105

4 ranks (gpus), rhodo:

------------ Step            100 ----- CPU =    0.5430899 (sec) -------------
TotEng   =    -25290.7302 KinEng   =     21591.9084 Temp     =       301.0906

So, also consistent results for TotEng on GPU.
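To put a number on "consistent", here is a small sketch that computes the spread of the rhodo TotEng values at step 100 reported in this thread (CPU runs on both systems, the older LAMMPS version, and the H100 run). The variable names are illustrative only.

```python
# rhodo TotEng at step 100, as reported in the comments above.
RHODO_TOTENG = [
    -25290.7300,  # hydra
    -25290.7299, -25290.7302, -25290.7300,  # Karolina
    -25290.7299, -25290.7302, -25290.7300,  # Snellius
    -25290.7300,  # LAMMPS/23Jun2022-foss-2022a-kokkos
    -25290.7302,  # H100 GPUs
]

# Absolute spread between the largest and smallest reported value.
spread = max(RHODO_TOTENG) - min(RHODO_TOTENG)

# Spread relative to the magnitude of the mean energy.
mean = sum(RHODO_TOTENG) / len(RHODO_TOTENG)
relative_spread = spread / abs(mean)

print(f"absolute spread: {spread:.4f}")
print(f"relative spread: {relative_spread:.2e}")
```

The absolute spread comes out at roughly 3e-4, so any tolerance for the energy check would need to be at least that large.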

laraPPr · 2024-08-08T13:47:39Z

> Just as another check, I ran with an older version of LAMMPS (LAMMPS/23Jun2022-foss-2022a-kokkos):
>
> 128 ranks, lj:
>        100   0.7574531     -5.7585055      0             -4.6223613      0.20726105
> 128 ranks, rhodo:
> ------------ Step            100 ----- CPU =    0.4350029 (sec) -------------
> TotEng   =    -25290.7300 KinEng   =     21591.9085 Temp     =       301.0906
> Seems the total energy is consistent across versions. But, I am getting:
> WARNING: skipping evaluation of performance variable 'perf': not enough matches of pattern '^Performance: [.0-9]+ tau/day, (?P<perf>[.0-9]+) timesteps/s,' in file 'rfm_job.out' so as to extract item 0
> I'm not entirely sure why, because I do see:
> Performance: 1200769.669 tau/day, 2779.559 timesteps/s
> And that seems to match...?

It's the comma; I can remove that one.

casparvl · 2024-08-08T13:58:14Z

Ah, the trailing comma, you're right!
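A quick way to see the effect, using only Python's `re` module: the older LAMMPS version ends the line after "timesteps/s", so the pattern that requires a trailing comma finds nothing. The helper `extract` is hypothetical; the two patterns are the one from the warning above and the same one without the final comma.

```python
import re

# Output line from LAMMPS/23Jun2022 (no atom-step field at the end).
OLD_OUTPUT = "Performance: 1200769.669 tau/day, 2779.559 timesteps/s"
# Output line from LAMMPS/2Aug2023_update2 (extra Matom-step/s field).
NEW_OUTPUT = "Performance: 205379.307 tau/day, 475.415 timesteps/s, 15.213 Matom-step/s"

WITH_COMMA = r"^Performance: [.0-9]+ tau/day, (?P<perf>[.0-9]+) timesteps/s,"
WITHOUT_COMMA = r"^Performance: [.0-9]+ tau/day, (?P<perf>[.0-9]+) timesteps/s"


def extract(pattern, text):
    """Return the captured performance value, or None if nothing matches."""
    match = re.search(pattern, text, re.MULTILINE)
    return float(match.group("perf")) if match else None


for name, pattern in (("with comma", WITH_COMMA), ("without comma", WITHOUT_COMMA)):
    print(name, extract(pattern, OLD_OUTPUT), extract(pattern, NEW_OUTPUT))
```

Dropping the comma keeps the match on the newer output while also accepting the shorter line from the older version.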

casparvl · 2024-08-08T14:02:08Z

I also tried to run on GPUs. Works fine, except multinode rhodo - but that seems to be a problem in our UCX stack https://bugs.launchpad.net/ubuntu/+source/ucx/+bug/2055222
I'm not sure why I'm not encountering that for the lj case, but the error is so clearly the one in that bug report, that I don't believe the issue is in this test itself.

eessi/testsuite/tests/apps/lammps/lammps.py

laraPPr · 2024-08-09T14:21:14Z

Added the energy check and added support for LAMMPS built with the GPU package (no Kokkos).

casparvl

Since this is pure MPI, it's probably good to call hooks.set_compact_process_binding.
Also, please call req_memory_per_node and specify how much memory the test needs. This makes sure the test gets skipped on systems with insufficient memory.
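As a conceptual illustration only (this is not how the EESSI hooks are implemented), the two requests boil down to something like the following. Both helpers, `compact_core_ranges` and `fits_on_node`, and the numbers in the example are hypothetical.

```python
def compact_core_ranges(num_tasks_per_node, num_cpus_per_task):
    """Consecutive core IDs each local rank would get under a compact binding."""
    return [
        list(range(rank * num_cpus_per_task, (rank + 1) * num_cpus_per_task))
        for rank in range(num_tasks_per_node)
    ]


def fits_on_node(required_mem_mib, node_mem_mib):
    """True if the node has enough memory; otherwise the test would be skipped."""
    return required_mem_mib <= node_mem_mib


# Pure MPI with 8 tasks per node and 1 core per task, as in the hydra run above.
bindings = compact_core_ranges(num_tasks_per_node=8, num_cpus_per_task=1)

# Made-up numbers: a test needing 2 GiB on a node offering 256 GiB.
run_test = fits_on_node(required_mem_mib=2048, node_mem_mib=262144)
```

With one core per task, a compact binding simply places rank i on core i, which keeps neighbouring ranks close together on the node.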

casparvl

Ok, did two more successful runs: on Snellius & Karolina. All good. Going in, thanks @laraPPr !

lara added 2 commits March 28, 2024 12:07

add lammps files for running bench mark

9a86786

add lammps.py test

b80e3f2

laraPPr marked this pull request as draft April 2, 2024 14:56

laraPPr commented Apr 2, 2024

eessi/testsuite/tests/apps/lammps/data.rhodo

laraPPr commented Apr 2, 2024

eessi/testsuite/tests/apps/lammps/lammps.py

laraPPr commented Apr 2, 2024

eessi/testsuite/tests/apps/lammps/lammps.py

laraPPr commented Apr 2, 2024

eessi/testsuite/tests/apps/lammps/lammps.py

lara added 8 commits April 3, 2024 14:09

implement lammps sanity check

f07c81a

update assign_tasks_per_compute_unit

d0818d0

fix line too long

78c3780

clean up code

332c5fd

remove example scripts

3e2f1af

update source files and add rhodo test

b7d1548

clean up lammps.py

0656d2f

clean up lammps.py

e777ddb

laraPPr commented Apr 5, 2024

eessi/testsuite/tests/apps/lammps/lammps.py

laraPPr commented Apr 5, 2024

eessi/testsuite/tests/apps/lammps/lammps.py

vsc46128 vscuser added 7 commits April 16, 2024 20:10

add gpu support to lj test

8a8f838

add gpu support to lj test

afe68b1

update rhodo test

868375a

make rhod test compatible with kokkos gpu package

36a5dbf

fix style

cd93e6c

fix style

fbffb2d

fix style

aaadf7d

laraPPr marked this pull request as ready for review April 17, 2024 14:47

smoors reviewed Aug 5, 2024

eessi/testsuite/tests/apps/lammps/lammps.py

smoors reviewed Aug 5, 2024

vsc46128 vscuser added 3 commits August 7, 2024 13:53

get other Performance idicators

fbadd82

Merge branch 'main' into LAMMPS_test

acd7da0

use timestep/s as a performance metric

4cbacb5

add check for total energy

03ed65d

laraPPr commented Aug 8, 2024

eessi/testsuite/tests/apps/lammps/lammps.py

vsc46128 vscuser added 5 commits August 9, 2024 13:47

add check for total energy of lj test

15fb164

add check for total energy of rhodo test

dd8a821

add test for GPU package

65f0c3b

add test for GPU package

e6b4b5d

add test for GPU package

7bd0131

casparvl requested changes Aug 12, 2024

vsc46128 vscuser added 2 commits August 12, 2024 15:39

add binding hook

d2bf54d

add memory usage to lammps test

f6b2c5a

casparvl approved these changes Aug 14, 2024

casparvl merged commit 74d0d82 into EESSI:main Aug 14, 2024
10 checks passed
