Now that containers are mainstream, it would be great to move off of Python packaging for NVIDIA artifacts and instead install them on the system (i.e., in the container, not in a conda environment, virtualenv, etc.).
Problem Description

The precompiled version of `torch` manages some of its dependencies on its own (e.g. `cuda`, `nccl`, `cudnn`). Say we install `torch` via `pip`, requesting version `2.2.0`:

It will go and download `nvidia-nccl-cu12==2.19.3`, as shown in the following log:

So here's the issue: the `nccl` downloaded here is compiled using `cuda12.3`, while `torch` uses `cuda12.1`.

Although the versions are inconsistent, it actually works (at least I haven't had any problems so far), so I thought I'd ask here whether this inconsistency could be hiding some problems I'm not aware of.
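As a quick sanity check, both versions can be read from `torch` itself (a sketch, assuming a PyPI-installed CUDA build of `torch` 2.2.0):

```python
import torch

# CUDA version this torch build was compiled against, e.g. "12.1"
# for the cu121 wheels (None on CPU-only builds)
print("torch built with CUDA:", torch.version.cuda)

# Version of the NCCL library torch is linked against, as a tuple,
# e.g. (2, 19, 3) from the nvidia-nccl-cu12 wheel
if torch.cuda.is_available():
    print("linked NCCL:", torch.cuda.nccl.version())
```

Note that `torch.version.cuda` reports what `torch` was compiled with, not what the bundled `nccl` was compiled with, which is exactly the mismatch described above.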
By the way, we can use `nccl-tests` to verify the version of `cuda` that `nccl` was compiled with: with the `NCCL_DEBUG=INFO` option set, `nccl-tests` prints the `cuda` version used at compile time when it runs:
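A minimal sketch of that check (the `CUDA_HOME` path and the single-GPU test arguments are assumptions; adjust them for your setup):

```shell
# Build nccl-tests (assumes the CUDA toolkit and an NCCL install are
# present on the system; point CUDA_HOME/NCCL_HOME at your paths)
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests
make CUDA_HOME=/usr/local/cuda

# NCCL_DEBUG=INFO makes NCCL log its version banner at startup,
# which includes the CUDA version it was compiled with
NCCL_DEBUG=INFO ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 1
```

Grepping the INFO output for the NCCL version line is enough to compare the compile-time CUDA version against `torch.version.cuda`.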