
Running srun, sbatch, salloc from within pyxis #31

Open
itzsimpl opened this issue Oct 28, 2020 · 9 comments

@itzsimpl

I would like to use Enroot containers to provide toolchain environments for Slurm, i.e. as a sort of substitute for lmod modules. A typical example is NVIDIA container images, which can contain source code with multi-step workflows.
My question is: is it possible to generate Slurm jobs from within a pyxis/enroot container?

@flx42
Member

flx42 commented Oct 28, 2020

It might be possible, but honestly I haven't tried.

You will likely need to have the same Slurm version inside the container as on the cluster (or bind-mount the binaries/libraries). You want to run as non-remapped root, and you might need to bind-mount some more files from the host (I don't think slurmd uses a UNIX domain socket, so at least it should be fine on that side).

If it fails initially, using strace might help to discover which files srun/sbatch/salloc are trying to open inside the container environment.
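For instance, something like the following could surface the missing paths (a sketch; it assumes strace is installed in the image, and the srun target command is illustrative):

# Run inside the container: trace file lookups that fail with ENOENT
# to find host paths that still need to be bind-mounted.
strace -f -e trace=openat,access srun hostname 2>&1 | grep ENOENT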

@itzsimpl
Author

itzsimpl commented Oct 29, 2020

I did a quick test, but at the moment it seems infeasible, as one needs to bind-mount too many things. For example:

  • /etc/passwd
  • /etc/slurm/
  • /usr/local/lib/slurm
  • /usr/local/bin
  • /usr/lib/x86_64-linux-gnu/libmunge.so.2
  • /var/run/munge/munge.socket.2

With this I was able to at least run sinfo, but I stopped there.
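For reference, a rough sketch of how those bind mounts could be expressed with pyxis flags (paths taken from the list above; the image name is illustrative):

# Run sinfo in a container with the host's Slurm client pieces mounted in:
srun --container-image=ubuntu:20.04 \
     --container-mounts=/etc/passwd:/etc/passwd,/etc/slurm:/etc/slurm,/usr/local/lib/slurm:/usr/local/lib/slurm,/usr/local/bin:/usr/local/bin,/usr/lib/x86_64-linux-gnu/libmunge.so.2:/usr/lib/x86_64-linux-gnu/libmunge.so.2,/var/run/munge:/var/run/munge \
     sinfo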

Is there any plan to add official support?

A related question: is there a plan to support passing the enroot container to sbatch?

@flx42
Member

flx42 commented Oct 29, 2020

Is there any plan to add official support?

No, not right now, sorry, because most of the work can be done in the container image (e.g. by installing the same stack / scripts inside the container image). You could also write a custom enroot hook to mount everything that is needed; I don't think this should be done by pyxis.

A related question: is there a plan to support passing the enroot container to sbatch?

This has been requested a few times, so we are considering it. I can't tell you for sure if it will happen, or when.

Thanks.

@itzsimpl
Author

Could you clarify what you mean by "installing the same stack / scripts inside the container image"?

Support for sbatch would be really awesome, as it is the only command that supports the --array parameter, and many existing scripts use it.

One highly specific example (which I am playing with, just to give some perspective) is the Kaldi toolkit (https://github.com/kaldi-asr/kaldi). Sure, one can run it from inside the container image (https://ngc.nvidia.com/catalog/containers/nvidia:kaldi), started with a single srun command that requests resources (CPUs+GPUs) for the entire duration of the run.

I would say this is not good practice, as for half of the training time only CPUs are in use. Most of the scripts, however, have already been written to support GridEngine/Slurm; they generate srun or sbatch commands. So, to take full advantage of the cluster and not hog resources when not necessary, one would need to be able to run srun from inside an Enroot container (to place a subtask into the queue), or be able to pass --container-image to sbatch and run the top-level script from the shell.

@flx42
Member

flx42 commented Oct 29, 2020

Could you clarify what you mean by "installing the same stack / scripts inside the container image"?

I mean that you could craft a custom container image with the same Slurm libraries, binaries, and configuration as the ones installed on your cluster. I guess your Slurm version doesn't change often, so it might be fine.
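A hedged sketch of what that could look like during the image build (the version number and install prefix are assumptions; match your site's install):

# Build and install the cluster's exact Slurm version inside the image:
SLURM_VERSION=20.02.5
curl -LO https://download.schedmd.com/slurm/slurm-${SLURM_VERSION}.tar.bz2
tar xjf slurm-${SLURM_VERSION}.tar.bz2
cd slurm-${SLURM_VERSION}
./configure --prefix=/usr/local --sysconfdir=/etc/slurm && make -j && make install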

So, to take full advantage of the cluster and not hog resources when not necessary, one would need to be able to run srun from inside an Enroot container (to place a subtask into the queue), or be able to pass --container-image to sbatch and run the top-level script from the shell.

I see, we have similar use cases, but we took a different approach: the sbatch script uses srun --container-image to run the containerized task, and if it needs to schedule a follow-up job, it does so after this job has completed, for instance with sbatch --dependency=afterok:${SLURM_JOB_ID} next_task.sh
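A minimal sketch of that pattern (image name, script names, and resources are illustrative):

#!/bin/bash
#SBATCH --job-name=train
#SBATCH --gres=gpu:8

# Run the containerized task through pyxis.
srun --container-image=nvcr.io/nvidia/kaldi:latest ./run_step.sh

# Chain the next stage once this job completes successfully.
sbatch --dependency=afterok:${SLURM_JOB_ID} next_task.sh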

@3XX0
Member

3XX0 commented Oct 29, 2020

FWIW, the following is an enroot config that should do the job (I used it in the past); you can convert it to enroot system configuration files and have Slurm be injected automatically into all your containers.

# Discover the host's Slurm and munge setup.
readonly srun_cmd=$(command -v srun)
readonly slurm_conf="/etc/slurm/slurm.conf"
readonly slurm_plugin_dir=$(scontrol show config | awk '/PluginDir/{print $3}')
readonly slurm_plugstack_dir="/etc/slurm/plugstack.conf.d"
readonly slurm_user=$(scontrol show config | awk '/SlurmUser/{print $3}')
readonly libpmix_path=$(ldconfig -p | awk '/libpmix/{print $4; exit}')
readonly libhwloc_path=$(ldconfig -p | awk '/libhwloc/{print $4; exit}')
readonly libmunge_path=$(ldconfig -p | awk '/libmunge/{print $4; exit}')
# Munge socket path (this site stores it in AccountingStoragePass).
readonly munge_sock_path=$(awk -F= '/AccountingStoragePass/{print $2}' "${slurm_conf}")

mounts() {
   # Bind-mount the Slurm client, its configuration, plugins, and the
   # libraries it links against, at the same paths inside the container.
   echo "${srun_cmd} ${srun_cmd}"
   echo "${slurm_conf%/*} ${slurm_conf%/*}"
   echo "${slurm_plugin_dir} ${slurm_plugin_dir}"
   # Mount each plugin path listed in the plugstack configuration files.
   awk '{print $2" "$2}' "${slurm_plugstack_dir}"/*
   # Note: the target drops libpmix's last version suffix (e.g. .so.2 -> .so).
   echo "${libpmix_path} ${libpmix_path%.*}"
   echo "${libhwloc_path} ${libhwloc_path}"
   echo "${libmunge_path} ${libmunge_path}"
   echo "${munge_sock_path} ${munge_sock_path}"
}

environ() {
   # Make the mounted libraries resolvable and forward the job's SLURM_* variables.
   echo "LD_LIBRARY_PATH=${libmunge_path%/*}:${libpmix_path%/*}:${libhwloc_path%/*}"
   env | grep SLURM || :
}

hooks() {
   # Add the SlurmUser entry (scontrol prints it as "name(uid)") to the
   # container's passwd file.
   getent passwd "${slurm_user%(*}" >> ${ENROOT_ROOTFS}/etc/passwd
}
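To convert this into enroot system configuration (directories per enroot's sysconf layout; the file names below are assumptions), the output of mounts() maps onto an fstab-style file, environ() onto an environment file, and hooks() onto an executable hook script:

# e.g. under /etc/enroot (ENROOT_SYSCONF):
#   mounts.d/99-slurm.fstab   <- "src dst" lines produced by mounts()
#   environ.d/99-slurm.env    <- lines produced by environ()
#   hooks.d/99-slurm.sh       <- script containing the hooks() body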

flx42 mentioned this issue Aug 4, 2021
@dr-br

dr-br commented Aug 24, 2021

We are very happy with #55, but now users want to run multi-node jobs with sbatch.
@3XX0, could you please comment on #31 (comment) regarding how to set that up?
Thanks!

@flx42
Member

flx42 commented Aug 24, 2021

I don't think this is something we can support reliably unless we get https://bugs.schedmd.com/show_bug.cgi?id=12230, OR some kind of API compatibility guarantee, OR you build your containers with the same version of Slurm as what is installed on the cluster (i.e. non-portable containers).

@tf-nv

tf-nv commented Mar 15, 2024

I have a use case for srun inside a pyxis container as well. There is a framework that dynamically builds an srun command and launches it. The framework has intricate dependencies and has to be launched from within a container itself. However, srun is not available inside that container, so the dynamic srun command cannot be launched.
