discuss launchers (mpirun and mpiexec) #6

Open
jeffhammond opened this issue Nov 18, 2022 · 4 comments

@jeffhammond
Member

Problem

There is no guarantee of compatibility across launchers (e.g. mpirun).

Proposal

We should not try to solve this problem, because it can be solved without additional specification.

Existing practice allows for Slurm, PBS, etc. to launch MPI programs compiled with either MPICH or Open-MPI.

The launchers shipped with these libraries do not interoperate, but it is straightforward for a third-party tool to solve this by wrapping the existing launchers.

Changes to the Text

Impact on Implementations

No additional work is required, since existing third-party launchers are supported by MPICH and Open-MPI.

Impact on Users

Some users may complain if we do not solve this thoroughly.

References and Pull Requests

https://github.com/jeffhammond/blog/blob/main/MPI_Needs_ABI_Part_3.md

@jeffhammond jeffhammond changed the title discuss launchers discuss launchers (mpirun and mpiexec) Nov 18, 2022
@jedbrown

> For users who expect to use mpirun or mpiexec, a hack is to figure out what launcher the program expects and then invoke it. In this design, mpiexec can be a shell script that calls strings or some other introspection method on the binary and figures out if it's MPICH or Open-MPI or Intel MPI or MVAPICH2, and then calls the implementation specific mpiexec. This is not an elegant method but it probably works for a lot of users, and isn't any worse than the mess we have right now.

I think that was written in a different context, but if we have a standard ABI, then there will be no strings and you can run the binaries with any library (assuming dynamic linking). Using a Hydra launcher would presumably ensure that the MPICH library is used, and similarly for the ORTE launcher.

Static linking is another matter, rendering ABI moot. Of course it would be ideal if Hydra and ORTE launchers could settle on a standard protocol that resource managers use (PMIx or whatever) to talk to the executable. I agree that's out of scope here.
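For concreteness, the introspection hack quoted at the top of this comment could look roughly like the sketch below. It is an illustration only: the marker byte patterns in `MARKERS` and the launcher names in `LAUNCHERS` are assumptions chosen for the example, not a reliable fingerprint of any implementation, and the quote describes a shell script where this uses Python.

```python
import os
import re
import sys

# Assumed marker table: byte patterns guessed to appear in a binary linked
# against each implementation. Real binaries may not contain these.
MARKERS = [
    ("openmpi", (b"ompi_mpi_comm_world", b"libopen-pal")),
    ("mpich", (b"libmpich", b"MPICH")),
]

# Assumed names of the implementation-specific launchers on PATH.
LAUNCHERS = {"openmpi": "mpiexec.openmpi", "mpich": "mpiexec.mpich"}

# Runs of at least four printable ASCII bytes, similar to `strings`.
PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")


def guess_implementation(data):
    """Return the first implementation whose marker appears in data, else None."""
    found = PRINTABLE.findall(data)
    for name, markers in MARKERS:
        if any(marker in text for text in found for marker in markers):
            return name
    return None


def build_command(binary_path, launcher_args):
    """Build the argv for the launcher that the binary appears to expect."""
    with open(binary_path, "rb") as f:
        impl = guess_implementation(f.read())
    if impl is None:
        raise RuntimeError(f"cannot tell which MPI {binary_path} was built with")
    return [LAUNCHERS[impl], *launcher_args, binary_path]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: wrapper.py [launcher args...] ./app
    *args, binary = sys.argv[1:]
    cmd = build_command(binary, args)
    os.execvp(cmd[0], cmd)
```

With a standard ABI the markers disappear, which is the point made above: the wrapper has nothing left to inspect, and the choice of launcher decides the library instead.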

@gonzalobg

> We should not try to solve this problem, because it can be solved without additional specification.

+1. We can always try to solve this later, if this turns out to be a problem.

@jeffhammond
Member Author

> For users who expect to use mpirun or mpiexec, a hack is to figure out what launcher the program expects and then invoke it. In this design, mpiexec can be a shell script that calls strings or some other introspection method on the binary and figures out if it's MPICH or Open-MPI or Intel MPI or MVAPICH2, and then calls the implementation specific mpiexec. This is not an elegant method but it probably works for a lot of users, and isn't any worse than the mess we have right now.
>
> I think that was written in a different context, but if we have a standard ABI, then there will be no strings and you can run the binaries with any library (assuming dynamic linking). Using a Hydra launcher would presumably ensure that the MPICH library is used, and similarly for the ORTE launcher.

That's an interesting way to look at it (and I like it). It's a nice situation if users understand that the launcher prescribes the implementation to be used, since that means we don't have to solve the universal launcher problem.

The place where things get interesting is singleton initialization, where the application is started without a launcher and then spawns processes. Today, this does not always work because some implementations can't create multiple processes unless certain environment variables are set. This is a solvable problem, but it is not a high enough priority for most implementations to care.
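To make the environment-variable dependence concrete, here is a minimal sketch of how a process might guess whether it was started by a launcher or as a singleton. The variable names are ones that common launchers are known to export, but the list is illustrative and incomplete, the labels are my own, and real implementations use their own internal checks rather than anything like this.

```python
import os

# Variables that some launchers export to each rank. Illustrative only;
# a given site or version may set others or none of these.
LAUNCHER_HINTS = {
    "PMI_RANK": "PMI (e.g. MPICH Hydra)",
    "PMIX_RANK": "PMIx",
    "OMPI_COMM_WORLD_RANK": "Open MPI mpirun",
    "SLURM_PROCID": "Slurm srun",
}


def launch_context(env=None):
    """Return labels for the launcher hints found in env, or ["singleton"]."""
    env = os.environ if env is None else env
    hits = [label for var, label in LAUNCHER_HINTS.items() if var in env]
    return hits or ["singleton"]


if __name__ == "__main__":
    print(launch_context())
```

A process that reports `singleton` here is the case described above: nothing in its environment tells the library how to reach a process manager, so spawning has to bootstrap one from scratch.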

@jedbrown

Regarding singletons, I think laptop installations would have a default (managed by apt alternatives, modules, and similar) and resource managers would select (at sbatch level) based on modules or explicit parameters. I anticipate this environment management being uniformly easier than current practice.
