CUDA support #71

Open
ocaisa opened this issue Feb 10, 2021 · 7 comments

@ocaisa
Member

ocaisa commented Feb 10, 2021

I was experimenting with CUDA support within EESSI and ran into the issue that, when using a CUDA toolkit compiled with the EESSI stack, the CUDA driver libraries from the host are not seen by the executables created by nvcc. This is because the executables look for the driver libraries in the prefix, where they do not exist. There are a few viable solutions:
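
For reference, a minimal probe (a sketch, assuming Python is available inside the prefix environment) that shows whether the driver library can be resolved at all from within the compatibility layer:

```python
import ctypes

# Minimal sketch (assumption: run from within the EESSI/Gentoo Prefix environment)
# to show the failure mode: the prefix loader only searches its own paths, so the
# host's CUDA driver library may not be resolvable.
try:
    libcuda = ctypes.CDLL("libcuda.so.1")
except OSError as err:
    print(f"libcuda.so.1 not found by the prefix loader: {err}")
else:
    rc = libcuda.cuInit(0)  # CUresult: 0 (CUDA_SUCCESS) means the driver initialised
    print(f"cuInit returned {rc}")
```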

@peterstol
Contributor

For systems with glibc in the prefix, using /usr/lib64/nvidia as the location may be a good choice.

On my Bright Computing system, the CUDA libraries can simply be linked from /cm/local/apps/cuda/libs/current/lib64.
Creating the symlinks requires admin privileges, though, and the location can vary between systems.
Could EESSI provide different CUDA versions and use the one matching the kernel driver, in a similar way to how the architecture is matched with Archspec?
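
Something along those lines could be sketched by querying the driver for the CUDA version it supports and picking a matching toolkit; the mapping below is purely illustrative, not an authoritative compatibility table:

```python
import ctypes

# Illustrative sketch of Archspec-style matching for CUDA: ask the host driver
# which CUDA version it supports and pick a toolkit accordingly. The mapping is
# a made-up example, not an authoritative driver/toolkit compatibility list.
TOOLKIT_FOR_DRIVER = {
    11000: "CUDA/11.0",  # hypothetical module names
    11020: "CUDA/11.2",
}

libcuda = ctypes.CDLL("libcuda.so.1")  # assumes the host's libcuda is already reachable
version = ctypes.c_int(0)
libcuda.cuDriverGetVersion(ctypes.byref(version))  # e.g. 11020 for a CUDA 11.2 driver

supported = [v for v in TOOLKIT_FOR_DRIVER if v <= version.value]
print(TOOLKIT_FOR_DRIVER[max(supported)] if supported else "no matching CUDA toolkit")
```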

@bedroge
Collaborator

bedroge commented Feb 17, 2021

Bart reported on Slack yesterday that the libcuda.so of newer CUDA versions no longer links against the versioned libnvidia-fatbinaryloader.so.XXX.YY:

Removed libnvidia-fatbinaryloader.so from the driver package. This functionality is now built into other driver libraries.

This means that, with these newer versions, we could also just make symlinks to the host's libcuda. The only annoying thing is that the location can apparently differ between distros, so I guess we would need something like a variant symlink that allows a site to override the location of libcuda.so if necessary.
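
As an illustration of those distro differences, a small sketch that probes a few commonly used locations (the list is an assumption, not exhaustive):

```python
from pathlib import Path

# Probe a few locations where the host's libcuda.so.1 is commonly installed;
# this list is an assumption and certainly not exhaustive, which is exactly why
# a site-overridable (variant-symlink-style) location would be needed.
CANDIDATE_DIRS = [
    "/usr/lib64",                               # e.g. RHEL/CentOS
    "/usr/lib/x86_64-linux-gnu",                # e.g. Debian/Ubuntu
    "/usr/lib64/nvidia",
    "/cm/local/apps/cuda/libs/current/lib64",   # Bright Computing, as mentioned above
]

found = [d for d in CANDIDATE_DIRS if (Path(d) / "libcuda.so.1").exists()]
print(f"host libcuda.so.1 found in: {found or 'none of the candidate directories'}")
```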

@ocaisa
Member Author

ocaisa commented Feb 17, 2021

I don't understand this stuff well enough, but I think a symlink might only get us out of jail for pure CUDA code? Comparing the lib64 directory to the stubs directory inside a CUDA toolkit installation, it looks like you also need libnvidia-ml.so.

I took a look at the OpenGL configuration for JSC and if we want to use visualisation capabilities on the available GPU, we would also need some of the other driver libraries.
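
One way to see which driver-side libraries a toolkit expects is to list the stubs it ships; a sketch, with the toolkit path being a hypothetical example:

```python
from pathlib import Path

# The stub libraries shipped with a CUDA toolkit (libcuda.so, libnvidia-ml.so, ...)
# are exactly the ones that have to come from the host driver at run time, so
# listing them gives a rough inventory of what needs to be exposed.
cuda_home = Path("/path/to/CUDA/11.x")  # hypothetical toolkit installation prefix
stubs = cuda_home / "targets" / "x86_64-linux" / "lib" / "stubs"

for lib in sorted(stubs.glob("lib*.so")):
    print(lib.name)
```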

@bedroge
Collaborator

bedroge commented Feb 17, 2021

Yes, you probably need more, but I guess the same thing could be done for those? The advantage of that approach could be that it would, for instance, work out of the box on all systems that have these libraries in /usr/lib64. On other systems, the variable for the variant symlink would have to be overridden in the CVMFS configuration.

@ocaisa
Member Author

ocaisa commented Feb 17, 2021

We could probably learn a lot from what happens with containers and GPUs, since they have to deal with the same issues around passing through driver libraries.

@ocaisa
Member Author

ocaisa commented Apr 8, 2021

Thanks to #91, the 2021.03 release of EESSI can successfully compile and run CUDA code if symlinks to the driver libraries are placed in /opt/eessi/lib.
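
For completeness, a hedged sketch of that symlink step (the host directory is an example and differs per system; creating the links typically needs admin privileges):

```python
import os
from pathlib import Path

# Expose the host's driver libraries under /opt/eessi/lib so executables built
# with the EESSI stack can resolve them. The host directory is an example; it
# differs between systems (see the candidate locations earlier in this thread).
host_dir = Path("/usr/lib64")        # assumption: where the host driver libraries live
eessi_dir = Path("/opt/eessi/lib")   # location mentioned above
eessi_dir.mkdir(parents=True, exist_ok=True)  # typically requires admin privileges

for name in ("libcuda.so.1", "libnvidia-ml.so.1"):
    target = host_dir / name
    link = eessi_dir / name
    if target.exists() and not link.exists():
        os.symlink(target, link)
        print(f"{link} -> {target}")
```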

@bedroge bedroge assigned ocaisa, bedroge and boegel and unassigned bedroge Nov 15, 2021
poksumdo pushed a commit to poksumdo/compatibility-layer that referenced this issue Jun 8, 2023
Add jsonschema as test dependency for archspec