
Running srun, sbatch, salloc from within pyxis #31

Open
itzsimpl opened this issue Oct 28, 2020 · 9 comments

@itzsimpl

I would like to use Enroot containers to provide toolchain environments for Slurm, i.e. as a sort of substitute for lmod modules. A typical example is NVIDIA container images, which can contain source code with multi-step workflows.
My question is: is it possible to generate Slurm jobs from within a pyxis/enroot container?

@flx42
Member

flx42 commented Oct 28, 2020

It might be possible, but honestly I haven't tried.

You will likely need to have the same Slurm version inside the container as on the cluster (or bind-mount the binaries/libraries). You want to run as non-remapped root, and you might need to bind-mount some more files from the host (I don't think slurmd uses a UNIX domain socket, so at least it should be fine on that side).

If it fails initially, using strace might help to discover which files srun/sbatch/salloc are trying to open inside the container environment.
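For instance, something like the following could surface the missing paths (a sketch; it assumes strace is installed in the image, and the srun target command is illustrative):

# Run inside the container: trace file lookups that fail with ENOENT
# to find host paths that still need to be bind-mounted.
strace -f -e trace=openat,access srun hostname 2>&1 | grep ENOENT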

@itzsimpl
Author

itzsimpl commented Oct 29, 2020

I did a quick test, but at the moment it seems infeasible, as one needs to bind-mount too many things. For example:

  • /etc/passwd
  • /etc/slurm/
  • /usr/local/lib/slurm
  • /usr/local/bin
  • /usr/lib/x86_64-linux-gnu/libmunge.so.2
  • /var/run/munge/munge.socket.2

With this I was able to at least run sinfo, but I stopped there.
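For reference, a rough sketch of how those bind mounts could be expressed with pyxis flags (paths taken from the list above; the image name is illustrative):

# Run sinfo in a container with the host's Slurm client pieces mounted in:
srun --container-image=ubuntu:20.04 \
     --container-mounts=/etc/passwd:/etc/passwd,/etc/slurm:/etc/slurm,/usr/local/lib/slurm:/usr/local/lib/slurm,/usr/local/bin:/usr/local/bin,/usr/lib/x86_64-linux-gnu/libmunge.so.2:/usr/lib/x86_64-linux-gnu/libmunge.so.2,/var/run/munge:/var/run/munge \
     sinfo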

Is there any plan to add official support?

A related question: is there a plan to support passing the enroot container to sbatch?

@flx42
Member

flx42 commented Oct 29, 2020

Is there any plan to add official support?

No, not right now, sorry, because most of the work can be done in the container image (e.g. by installing the same stack / scripts inside the container image). You could also write a custom enroot hook to mount everything that is needed; I don't think this should be done by pyxis.

A related question: is there a plan to support passing the enroot container to sbatch?

This has been requested a few times, so we are considering it. I can't tell you for sure if it will happen, or when.

Thanks.

@itzsimpl
Author

Could you clarify what you mean by "installing the same stack / scripts inside the container image"?

Support for sbatch would be really awesome, as it is the only command that supports the --array parameter, and many existing scripts use it.

One highly specific example (which I am playing with, just to give some perspective) is the Kaldi toolkit (https://github.com/kaldi-asr/kaldi). Sure, one can run it from inside the container image (https://ngc.nvidia.com/catalog/containers/nvidia:kaldi), started with a single srun command that requests resources (CPUs+GPUs) for the entire duration of the run.

I would say this is not good practice, as for half of the training time only CPUs are in use. Most of the scripts, however, have already been written to support GridEngine/Slurm; they generate srun or sbatch commands. So, to take full advantage of the cluster and not hog resources when not necessary, one would need to be able to run srun from inside an Enroot container (to place a subtask into the queue), or be able to pass --container-image to sbatch and run the top-level script from the shell.

@flx42
Member

flx42 commented Oct 29, 2020

Could you clarify what you mean by "installing the same stack / scripts inside the container image"?

I mean that you could craft a custom container image with the same Slurm libraries, binaries, and configuration as the ones installed on your cluster. I guess your Slurm version doesn't change often, so it might be fine.
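A hedged sketch of what that could look like during the image build (the version number and install prefix are assumptions; match your site's install):

# Build and install the cluster's exact Slurm version inside the image:
SLURM_VERSION=20.02.5
curl -LO https://download.schedmd.com/slurm/slurm-${SLURM_VERSION}.tar.bz2
tar xjf slurm-${SLURM_VERSION}.tar.bz2
cd slurm-${SLURM_VERSION}
./configure --prefix=/usr/local --sysconfdir=/etc/slurm && make -j && make install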

So, to take full advantage of the cluster and not hog resources when not necessary, one would need to be able to run srun from inside an Enroot container (to place a subtask into the queue), or be able to pass --container-image to sbatch and run the top-level script from the shell.

I see, we have similar use cases, but we took a different approach: the sbatch script uses srun --container-image to run the containerized task, and if it needs to schedule a follow-up job, it does so after this job has completed, for instance with sbatch --dependency=afterok:${SLURM_JOB_ID} next_task.sh
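A minimal sketch of that pattern (image name, script names, and resources are illustrative):

#!/bin/bash
#SBATCH --job-name=train
#SBATCH --gres=gpu:8

# Run the containerized task through pyxis.
srun --container-image=nvcr.io/nvidia/kaldi:latest ./run_step.sh

# Chain the next stage once this job completes successfully.
sbatch --dependency=afterok:${SLURM_JOB_ID} next_task.sh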

@3XX0
Member

3XX0 commented Oct 29, 2020

FWIW, the following is an enroot config that should do the job (I used it in the past); you can convert it to enroot system configuration files and have Slurm be injected automatically into all your containers.

# Discover the host's Slurm and munge setup.
readonly srun_cmd=$(command -v srun)
readonly slurm_conf="/etc/slurm/slurm.conf"
readonly slurm_plugin_dir=$(scontrol show config | awk '/PluginDir/{print $3}')
readonly slurm_plugstack_dir="/etc/slurm/plugstack.conf.d"
readonly slurm_user=$(scontrol show config | awk '/SlurmUser/{print $3}')
readonly libpmix_path=$(ldconfig -p | awk '/libpmix/{print $4; exit}')
readonly libhwloc_path=$(ldconfig -p | awk '/libhwloc/{print $4; exit}')
readonly libmunge_path=$(ldconfig -p | awk '/libmunge/{print $4; exit}')
# Munge socket path (this site stores it in AccountingStoragePass).
readonly munge_sock_path=$(awk -F= '/AccountingStoragePass/{print $2}' "${slurm_conf}")

mounts() {
   # Bind-mount the Slurm client, its configuration, plugins, and the
   # libraries it links against, at the same paths inside the container.
   echo "${srun_cmd} ${srun_cmd}"
   echo "${slurm_conf%/*} ${slurm_conf%/*}"
   echo "${slurm_plugin_dir} ${slurm_plugin_dir}"
   # Mount each plugin path listed in the plugstack configuration files.
   awk '{print $2" "$2}' "${slurm_plugstack_dir}"/*
   # Note: the target drops libpmix's last version suffix (e.g. .so.2 -> .so).
   echo "${libpmix_path} ${libpmix_path%.*}"
   echo "${libhwloc_path} ${libhwloc_path}"
   echo "${libmunge_path} ${libmunge_path}"
   echo "${munge_sock_path} ${munge_sock_path}"
}

environ() {
   # Make the mounted libraries resolvable and forward the job's SLURM_* variables.
   echo "LD_LIBRARY_PATH=${libmunge_path%/*}:${libpmix_path%/*}:${libhwloc_path%/*}"
   env | grep SLURM || :
}

hooks() {
   # Add the SlurmUser entry (scontrol prints it as "name(uid)") to the
   # container's passwd file.
   getent passwd "${slurm_user%(*}" >> ${ENROOT_ROOTFS}/etc/passwd
}
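To convert this into enroot system configuration (directories per enroot's sysconf layout; the file names below are assumptions), the output of mounts() maps onto an fstab-style file, environ() onto an environment file, and hooks() onto an executable hook script:

# e.g. under /etc/enroot (ENROOT_SYSCONF):
#   mounts.d/99-slurm.fstab   <- "src dst" lines produced by mounts()
#   environ.d/99-slurm.env    <- lines produced by environ()
#   hooks.d/99-slurm.sh       <- script containing the hooks() body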

flx42 mentioned this issue Aug 4, 2021
@dr-br

dr-br commented Aug 24, 2021

We are very happy with #55, but now users want to run multi-node jobs with sbatch.
@3XX0, could you please comment on #31 (comment) regarding how to set that up?
Thanks!

@flx42
Member

flx42 commented Aug 24, 2021

I don't think this is something we can support reliably unless we get https://bugs.schedmd.com/show_bug.cgi?id=12230, OR some kind of API compatibility guarantee, OR you build your containers with the same version of Slurm as what is installed on the cluster (i.e. non-portable containers).

@tf-nv

tf-nv commented Mar 15, 2024

I have a use case for srun inside a pyxis container as well. There is a framework that dynamically builds an srun command and launches it. The framework has intricate dependencies and has to be launched from within a container itself. However, srun is not available inside that container, so the dynamic srun command cannot be launched.
