Add LocalCUDA Cluster (do not merge) #2432
Conversation
Disclaimer: I am not a GPU expert at all. It feels like … This is linked to #2430 (comment): when a job starts running in our cluster, …
@cmgreen210 should you find yourself with some free time, I'd be curious if the following would have worked for your use case with multiprocessing:

```python
from dask.distributed import LocalCUDACluster, Client, progress

cluster = LocalCUDACluster()
client = Client(cluster)
futures = client.map(your_function, arg_sequence)
progress(futures)
```

I think that this should naively handle the things that you ran into, but I wouldn't be surprised if I've left something out or this breaks in some other way. If you have an opportunity to break this and provide feedback I would find that valuable. No pressure though if you're busy.
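As a purely illustrative sketch (not part of the original comment), `your_function` and `arg_sequence` could look like the following; the function only reports which GPU the worker was restricted to, and a real workload would do actual GPU work instead:

```python
import os

# Hypothetical task for the client.map call above: report which GPU this
# worker was pinned to via CUDA_VISIBLE_DEVICES. A real task would allocate
# and compute on that GPU (e.g. with numba or cupy) instead.
def your_function(x):
    gpu = os.environ.get("CUDA_VISIBLE_DEVICES", "unset")
    return f"processed {x} on GPU {gpu}"

# Hypothetical argument sequence.
arg_sequence = list(range(8))
```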
Interesting @mrocklin, I'll give it a shot when I have time.
distributed/deploy/cuda.py (Outdated)

```python
yield [
    self._start_worker(
        **self.worker_kwargs, env={"CUDA_VISIBLE_DEVICES": str(i)}
```
Just a note: while this will target the correct GPU with each worker, it will prevent workers from seeing other GPUs and prevent using CUDA IPC. If you want to use CUDA IPC with 2 GPUs you'd want something like:

```
CUDA_VISIBLE_DEVICES=0,1 ...
CUDA_VISIBLE_DEVICES=1,0 ...
```
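As an aside (my own sketch, not from the thread): one way to build such per-worker CUDA_VISIBLE_DEVICES strings is to rotate the device list so each worker's own GPU comes first while the others stay visible for CUDA IPC. The helper name below is hypothetical:

```python
# Hypothetical helper: rotate the device list so that worker `worker_index`
# sees its own GPU first but still sees all other GPUs, which keeps
# CUDA IPC between devices possible.
def cuda_visible_devices(worker_index, n_gpus):
    devices = [(worker_index + i) % n_gpus for i in range(n_gpus)]
    return ",".join(str(d) for d in devices)

# With 2 GPUs this reproduces the example above:
print(cuda_visible_devices(0, 2))  # "0,1"
print(cuda_visible_devices(1, 2))  # "1,0"
```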
Done. Thanks @kkraus14!
previously we relied on thread/core count for this
OK, I have a small … Where should this go? I can push this up to the dask github org, but I'd be happier to have it move into the …
+1. I also agree with keeping this LocalCudaCluster separate. Would be really nice to see a fully distributed CUDA cluster in the future as well (I certainly don't mind contributing / helping to maintain). Makes me wonder if this presents a good opportunity to build a repository focused on developer tooling within the RAPIDS ecosystem.
Closing in favor of https://github.com/mrocklin/dask-cuda
Thanks all for the comments.
@mrocklin, I've been so slammed the past 2 weeks and I would really like to make use of this (specifically for py.test within dask-cuml). What is the verdict on the new home for this? Are we moving it into RAPIDS?
You can pip install it from GitHub today:

```
pip install git+https://github.com/mrocklin/dask-cuda
```

I think that we should wait until the conversation in dask/governance#4 resolves before moving it to a GitHub organization like rapidsai.
This should not be merged. If this goes well then I'll move this to some other repository.
In the meantime, if people are able to try this out and give feedback that would be welcome.
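For anyone who does try it out, here is a minimal smoke-test sketch (mine, not from the PR). It assumes LocalCUDACluster is importable from distributed.deploy.cuda, matching the file path touched by this PR, and simply checks which GPU each worker was pinned to:

```python
import os

from dask.distributed import Client
# Assumed import path, based on the distributed/deploy/cuda.py file in this PR.
from distributed.deploy.cuda import LocalCUDACluster


def visible_devices():
    # Each worker should report only the GPU(s) it was restricted to.
    return os.environ.get("CUDA_VISIBLE_DEVICES", "unset")


if __name__ == "__main__":
    cluster = LocalCUDACluster()
    client = Client(cluster)
    # Run the check on every worker; expect a distinct device per worker.
    print(client.run(visible_devices))
```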