Sleep example with Hydra

This example implements a task that sleeps for {runtime} seconds while holding a NumPy array of {memory} megabytes in memory. The runtime can be measured with timeit and the memory usage with memray.
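A minimal sketch of such a task, written with only the standard library so it runs without extra dependencies. A bytearray stands in for the NumPy array, the names sleep_task and measure_runtime are hypothetical and not taken from src/sleep_hydra/main.py, and the megabyte size is an assumption. Memory profiling with memray is not shown here.

```python
import time
import timeit

MEGABYTE = 1_000_000  # assumption: one megabyte is 10**6 bytes


def sleep_task(runtime: float, memory: int) -> int:
    """Hold `memory` megabytes in memory while sleeping for `runtime` seconds."""
    buffer = bytearray(memory * MEGABYTE)  # stand-in for the NumPy array
    time.sleep(runtime)
    return len(buffer)  # size in bytes, so callers can check the allocation


def measure_runtime(runtime: float, memory: int) -> float:
    """Time a single execution of the task with timeit, in seconds."""
    return timeit.timeit(lambda: sleep_task(runtime, memory), number=1)


if __name__ == "__main__":
    print(f"elapsed: {measure_runtime(0.1, 1):.3f} s")
```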

Because the multirun output folder has a fixed structure, you can parse it with a script and plot the benchmarking metrics. The example script scripts/sleep_plots.py creates plots in the multirun folder for sweeps that were executed with a benchmark option.
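As a rough illustration of parsing that structure, the sketch below walks one sweep folder multirun/{DAY}/{TIME}/ and reads the overrides Hydra saved for each numbered job in .hydra/overrides.yaml. The function names are hypothetical, and the parser is deliberately simplified: it assumes one unquoted `- key=value` entry per line, so a real script would more likely load the file with a YAML library.

```python
from pathlib import Path


def read_overrides(run_dir: Path) -> list[str]:
    """Read the override strings saved for one run (simplified parser)."""
    overrides_file = run_dir / ".hydra" / "overrides.yaml"
    overrides = []
    for line in overrides_file.read_text().splitlines():
        line = line.strip()
        if line.startswith("- "):
            overrides.append(line[2:])
    return overrides


def collect_sweep(sweep_dir: Path) -> dict[int, list[str]]:
    """Map each job number in a sweep folder to its overrides."""
    jobs = {}
    for run_dir in sweep_dir.iterdir():
        # Job folders are named 0, 1, 2, ...; skip any other entry.
        if run_dir.is_dir() and run_dir.name.isdigit():
            jobs[int(run_dir.name)] = read_overrides(run_dir)
    return dict(sorted(jobs.items()))


if __name__ == "__main__":
    # Path taken from the sequential multirun example in this README.
    print(collect_sweep(Path("multirun/2023-10-20/11-10-42")))
```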

ml swap cluster/donphan
# launch interactive session
ACTIVATE=src/sleep_hydra SCRIPT=bash srun --pty runner.pbs
# execute Hydra like normal
python src/sleep_hydra/main.py +sweep='{runtime: 1, memory: 1},{runtime: 2, memory: 10},{runtime: 3, memory: 100}' task.runtime='${sweep.runtime}' task.memory='${sweep.memory}' benchmark=all hydra/launcher=joblib -m
python scripts/sleep_plots.py

Note that the configuration task.runtime=1,2,3 benchmark=runtime,memory would require 6 tasks and fail on an interactive cluster with a queue limit of 5 tasks.
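The task counts follow from how the sweep is expanded: comma-separated values are combined as a Cartesian product, while the +sweep override lists complete parameter sets. The short sketch below only reproduces that arithmetic and does not call Hydra; the variable names are illustrative.

```python
from itertools import product

runtimes = [1, 2, 3]
benchmarks = ["runtime", "memory"]

# Comma-separated sweeps are combined as a Cartesian product: 3 x 2 = 6 jobs.
grid_jobs = list(product(runtimes, benchmarks))

# The +sweep override lists complete parameter sets instead: 3 jobs.
paired_jobs = [
    {"runtime": 1, "memory": 1},
    {"runtime": 2, "memory": 10},
    {"runtime": 3, "memory": 100},
]

print(len(grid_jobs), len(paired_jobs))  # prints "6 3"
```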

The runtime plot shows the 3 tasks with increasing sleep length.

The memory plot shows the 3 tasks with increasing memory usage.

Different launch options explained

Run locally one task

Note that by default Hydra stores the configuration, logs and output of each run in a unique folder outputs/{DAY}/{TIME}/. The output folder contains a .hydra/ folder specifying the complete configuration used for the run and any output file written to cfg.output_dir.

python src/sleep_hydra/main.py

OUTPUT:

Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/outputs/2023-10-20/11-09-33
[2023-10-20 11:09:34,986][__main__][INFO] - Process ID 1595501 executed task {'runtime': 1, 'memory': 10} in 1.000080016994616 seconds

Run locally multiple tasks sequentially

Note that the multiple runs now each have an output folder in multirun/{DAY}/{TIME}/. All process IDs are the same, as the tasks run sequentially in the same process.

python src/sleep_hydra/main.py task.runtime=1,2,3 -m

OUTPUT:

[2023-10-20 11:10:43,219][HYDRA] Launching 3 jobs locally
[2023-10-20 11:10:43,219][HYDRA]        #0 : task.runtime=1
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/multirun/2023-10-20/11-10-42/0
[2023-10-20 11:10:44,407][__main__][INFO] - Process ID 1597253 executed task {'runtime': 1, 'memory': 10} in 1.0000840639986563 seconds
[2023-10-20 11:10:44,411][HYDRA]        #1 : task.runtime=2
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/multirun/2023-10-20/11-10-42/1
[2023-10-20 11:10:46,501][__main__][INFO] - Process ID 1597253 executed task {'runtime': 2, 'memory': 10} in 2.0000894780096132 seconds
[2023-10-20 11:10:46,503][HYDRA]        #2 : task.runtime=3
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/multirun/2023-10-20/11-10-42/2
[2023-10-20 11:10:49,586][__main__][INFO] - Process ID 1597253 executed task {'runtime': 3, 'memory': 10} in 3.0000770180195104 seconds

Run locally multiple tasks in parallel

Note that all process IDs are different, as each task runs in its own process.

python src/sleep_hydra/main.py task.runtime=1,2,3 hydra/launcher=joblib -m

OUTPUT:

[2023-10-20 11:12:03,084][HYDRA] Joblib.Parallel(n_jobs=-1,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 3 jobs
[2023-10-20 11:12:03,084][HYDRA] Launching jobs, sweep output dir : multirun/2023-10-20/11-11-57
[2023-10-20 11:12:03,085][HYDRA]        #0 : task.runtime=1
[2023-10-20 11:12:03,085][HYDRA]        #1 : task.runtime=2
[2023-10-20 11:12:03,085][HYDRA]        #2 : task.runtime=3
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/multirun/2023-10-20/11-11-57/0
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/multirun/2023-10-20/11-11-57/1
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/multirun/2023-10-20/11-11-57/2
[2023-10-20 11:12:04,881][__main__][INFO] - Process ID 1598945 executed task {'runtime': 1, 'memory': 10} in 1.0003418159903958 seconds
[2023-10-20 11:12:05,912][__main__][INFO] - Process ID 1598949 executed task {'runtime': 2, 'memory': 10} in 2.000080882018665 seconds
[2023-10-20 11:12:06,913][__main__][INFO] - Process ID 1598952 executed task {'runtime': 3, 'memory': 10} in 3.000079121993622 seconds
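The difference in process IDs between the sequential and parallel runs can be reproduced without Hydra. In this sketch, plain subprocess calls stand in for the worker processes that joblib starts (the loky backend itself is not used), and both function names are hypothetical.

```python
import os
import subprocess
import sys


def pids_sequential(n_tasks: int) -> list[int]:
    """Run every task in the current process, as in a sequential multirun."""
    return [os.getpid() for _ in range(n_tasks)]


def pids_separate_processes(n_tasks: int) -> list[int]:
    """Start one child Python process per task and collect its process ID."""
    children = [
        subprocess.Popen(
            [sys.executable, "-c", "import os; print(os.getpid())"],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(n_tasks)
    ]
    # All children are started before any result is collected.
    return [int(child.communicate()[0]) for child in children]


if __name__ == "__main__":
    print("sequential:", pids_sequential(3))
    print("separate  :", pids_separate_processes(3))
```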

Run on Slurm with the local test config

Note that by default no logging or print statements are shown; they are stored in .submitit/ in the output directory, next to the output of each run. submitit_local uses subprocess to run the tasks locally, so the functionality and output are similar to joblib.

python src/sleep_hydra/main.py task.runtime=1,2,3 hydra/launcher=submitit_local -m

OUTPUT:

[2023-10-20 11:12:57,639][HYDRA] Submitit 'local' sweep output dir : multirun/2023-10-20/11-12-57
[2023-10-20 11:12:57,641][HYDRA]        #0 : task.runtime=1
[2023-10-20 11:12:57,645][HYDRA]        #1 : task.runtime=2
[2023-10-20 11:12:57,653][HYDRA]        #2 : task.runtime=3

Run on Slurm using the HPC job scheduler

This can only be executed on the HPC cluster.

python src/sleep_hydra/main.py task.runtime=1,2,3 hydra/launcher=submitit_slurm -m

OUTPUT:

[2023-10-20 11:15:04,918][HYDRA] Submitit 'slurm' sweep output dir : multirun/2023-10-20/11-15-04
[2023-10-20 11:15:04,920][HYDRA]        #0 : task.runtime=1
[2023-10-20 11:15:04,924][HYDRA]        #1 : task.runtime=2
[2023-10-20 11:15:04,928][HYDRA]        #2 : task.runtime=3

Specifying resources

For a full list of settings, see the HPC documentation. To see available parameters, run:

python src/sleep_hydra/main.py hydra/launcher=submitit_slurm --cfg hydra -p hydra.launcher

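Slurm job with 2 CPUs and 4 GB of RAM (mem_gb takes a plain number of gigabytes, and the launcher is only used in multirun mode, hence -m):

python src/sleep_hydra/main.py hydra/launcher=submitit_slurm hydra.launcher.cpus_per_task=2 hydra.launcher.mem_gb=4 -m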

Benchmarking overhead

We can benchmark both runtime and memory in two ways besides benchmark=runtime and benchmark=memory:

  • benchmark=all will run the task twice, once with timeit and once with memray.

# seconds
[(3, 3.000507133983774), (2, 2.0004843850038014), (1, 1.0004661950224545)]
# megabytes
[(100, 95.39), (10, 9.559), (1, 0.999444)]
  • benchmark=hybrid will run the task once with timeit inside the memray wrapper.
# seconds
[(3, 3.013979399984237), (2, 2.013261186017189), (1, 1.015038490993902)]
# megabytes
[(100, 96.392), (10, 10.561), (1, 1.001554)]
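The two modes can be sketched as follows. This is not the repository's implementation: tracemalloc from the standard library stands in for memray, a bytearray stands in for the NumPy array, and all function names plus the megabyte size are assumptions made for illustration.

```python
import time
import timeit
import tracemalloc

MEGABYTE = 1_000_000  # assumption: one megabyte is 10**6 bytes


def task(runtime: float, memory: int) -> None:
    buffer = bytearray(memory * MEGABYTE)  # stand-in for the NumPy array
    time.sleep(runtime)
    del buffer


def traced(func):
    """Run func under tracemalloc; return (result, peak in megabytes)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / MEGABYTE


def benchmark_all(runtime: float, memory: int):
    """Two executions: one timed, one memory-traced."""
    seconds = timeit.timeit(lambda: task(runtime, memory), number=1)
    _, peak_mb = traced(lambda: task(runtime, memory))
    return seconds, peak_mb


def benchmark_hybrid(runtime: float, memory: int):
    """One execution: the timer runs inside the memory tracer."""
    return traced(lambda: timeit.timeit(lambda: task(runtime, memory), number=1))


if __name__ == "__main__":
    print("all   :", benchmark_all(0.1, 10))
    print("hybrid:", benchmark_hybrid(0.1, 10))
```

Both functions return a (seconds, peak megabytes) pair, so the timing cost of tracing shows up as the difference between the two first elements.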

The overhead of benchmark=hybrid is negligible for large tasks (roughly 15 ms of runtime and at most about 1 MB of memory in the measurements above). It is more important to run each task multiple times and report the variance.
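One way to collect repeated measurements is timeit.repeat combined with the statistics module. The helper name repeated_runtime is hypothetical, and the sketch reports the sample standard deviation rather than the variance itself.

```python
import statistics
import time
import timeit


def repeated_runtime(func, repeats: int = 5) -> tuple[float, float]:
    """Time func `repeats` times; return (mean, sample standard deviation)."""
    samples = timeit.repeat(func, number=1, repeat=repeats)
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    mean, stdev = repeated_runtime(lambda: time.sleep(0.1))
    print(f"{mean:.4f} s +/- {stdev:.4f} s over 5 runs")
```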