
Add LocalCUDA Cluster (do not merge) #2432

Closed · wants to merge 4 commits

Conversation

mrocklin
Member

This should not be merged. If this goes well then I'll move this to some other repository.

In the meantime, if people are able to try this out and give feedback, that would be welcome.

pip install git+https://github.com/mrocklin/distributed@cuda-cluster --upgrade
In [1]: from dask.distributed import LocalCUDACluster, Client

In [2]: cluster = LocalCUDACluster()

In [3]: client = Client(cluster)

In [4]: import os

In [5]: def f():
   ...:     return os.environ['CUDA_VISIBLE_DEVICES']
   ...:

In [6]: client.run(f)
Out[6]:
{'tcp://127.0.0.1:33502': '4',
 'tcp://127.0.0.1:35447': '6',
 'tcp://127.0.0.1:37706': '2',
 'tcp://127.0.0.1:38728': '3',
 'tcp://127.0.0.1:39490': '7',
 'tcp://127.0.0.1:40090': '1',
 'tcp://127.0.0.1:42862': '5',
 'tcp://127.0.0.1:43920': '0'}
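For reference, the one-GPU-per-worker behavior shown above amounts to starting each worker process with its own `CUDA_VISIBLE_DEVICES` value. A minimal sketch of the idea (`make_worker_env` and `worker_envs` are illustrative names, not part of the dask API):

```python
# Sketch: one worker per GPU, each pinned to its device via the
# CUDA_VISIBLE_DEVICES environment variable.

def make_worker_env(gpu_index):
    """Environment overrides for the worker that should own GPU `gpu_index`."""
    return {"CUDA_VISIBLE_DEVICES": str(gpu_index)}

def worker_envs(n_gpus):
    """One environment dict per worker, covering GPUs 0..n_gpus-1."""
    return [make_worker_env(i) for i in range(n_gpus)]

print(worker_envs(3))
# [{'CUDA_VISIBLE_DEVICES': '0'}, {'CUDA_VISIBLE_DEVICES': '1'}, {'CUDA_VISIBLE_DEVICES': '2'}]
```

Inside each worker, `os.environ['CUDA_VISIBLE_DEVICES']` then reports only that worker's device, which is exactly what `client.run(f)` shows above.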

@mrocklin mrocklin mentioned this pull request Dec 19, 2018
@lesteve
Member

lesteve commented Dec 20, 2018

Disclaimer: I am not a GPU expert at all. It feels, though, like if CUDA_VISIBLE_DEVICES is already set, you should only be able to use the devices listed in it.

This is linked to #2430 (comment): when a job starts running in our cluster, CUDA_VISIBLE_DEVICES is already set. Using other devices would mean "stealing" other jobs' GPUs and possibly making those jobs crash (for example, by having them run out of GPU memory).
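This suggestion can be sketched as: honor a pre-set CUDA_VISIBLE_DEVICES rather than enumerating all physical GPUs. A hypothetical helper, not actual dask code (`visible_gpu_indices` is an illustrative name):

```python
import os

def visible_gpu_indices(env=None, default_count=0):
    """Return the GPU indices this process is allowed to use.

    If CUDA_VISIBLE_DEVICES is set (e.g. by a cluster job scheduler),
    use only the devices it lists; otherwise fall back to assuming
    `default_count` local GPUs are available.
    """
    if env is None:
        env = os.environ.get("CUDA_VISIBLE_DEVICES")
    if env is None:
        return [str(i) for i in range(default_count)]
    return [d.strip() for d in env.split(",") if d.strip()]

print(visible_gpu_indices("4,6,2"))  # ['4', '6', '2']
```

A LocalCUDACluster built on top of such a helper would start one worker per *allowed* device instead of one per physical device, avoiding the GPU-stealing problem described above.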

@mrocklin
Member Author

@cmgreen210 should you find yourself with some free time I'd be curious if the following would have worked for your use case with multiprocessing:

pip install git+https://github.com/mrocklin/distributed@cuda-cluster --upgrade
from dask.distributed import LocalCUDACluster, Client, progress
cluster = LocalCUDACluster()
client = Client(cluster)

futures = client.map(your_function, arg_sequence)
progress(futures)

I think that this should naively handle the things that you ran into, but I wouldn't be surprised if I've left something out or this breaks in some other way. If you have an opportunity to break this and provide feedback I would find that valuable. No pressure though if you're busy.

@cmgreen210


Interesting @mrocklin, I'll give it a shot when I have time.


yield [
    self._start_worker(
        **self.worker_kwargs, env={"CUDA_VISIBLE_DEVICES": str(i)}

@kkraus14
Member
Just a note: while this will target the correct GPU with each worker, it will prevent workers from seeing other GPUs and prevent using CUDA IPC. If you'd want to use CUDA IPC with 2 GPUs you'd want something like:

CUDA_VISIBLE_DEVICES=0,1 ...
CUDA_VISIBLE_DEVICES=1,0 ...
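That rotation scheme can be sketched as follows: each worker lists its own GPU first, followed by all the others, so every worker targets its own device while still being able to see (and use CUDA IPC with) the rest. `rotated_devices` is an illustrative helper, not dask API:

```python
def rotated_devices(worker_index, n_gpus):
    """CUDA_VISIBLE_DEVICES value for one worker: its own GPU first,
    then the remaining GPUs in cyclic order, so peer devices stay
    visible for CUDA IPC."""
    return ",".join(str((worker_index + j) % n_gpus) for j in range(n_gpus))

for i in range(2):
    print(f"worker {i}: CUDA_VISIBLE_DEVICES={rotated_devices(i, 2)}")
# worker 0: CUDA_VISIBLE_DEVICES=0,1
# worker 1: CUDA_VISIBLE_DEVICES=1,0
```

Because CUDA numbers devices in the order they appear in CUDA_VISIBLE_DEVICES, "device 0" inside each worker is always its own GPU, which is what makes the rotation (rather than a bare single index) work.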

@mrocklin
Member Author
Done. Thanks @kkraus14!

@mrocklin
Member Author

mrocklin commented Jan 2, 2019

OK, I have a small dask-cuda repository locally that has this functionality (and also the consideration that @lesteve brought up earlier about respecting CUDA_VISIBLE_DEVICES that may be given to our process).

Where should this go? I can push this up to the dask github org, but I'd be happier to have it move into the rapidsai org (I suspect that rapids devs are more likely to do maintenance on this than Dask devs). If the answer is rapidsai then I'll need someone else to make the repository on github (blank repo ideally, no commits) and give me permissions.

@cjnolet

cjnolet commented Jan 4, 2019

+1. I also agree with keeping this LocalCudaCluster separate. Would be really nice to see a fully distributed CUDA cluster in the future as well (I certainly don't mind contributing / helping to maintain).

Makes me wonder if this is presenting a good opportunity to build a repository focused on developer tooling within the RAPIDS ecosystem.

@mrocklin
Member Author

mrocklin commented Jan 8, 2019

Closing in favor of https://github.com/mrocklin/dask-cuda

@mrocklin mrocklin closed this Jan 8, 2019
@mrocklin
Member Author

mrocklin commented Jan 8, 2019

Thanks all for the comments

@mrocklin mrocklin deleted the cuda-cluster branch January 8, 2019 16:47
@cjnolet

cjnolet commented Jan 8, 2019

@mrocklin, I've been so slammed the past 2 weeks and I would really like to make use of this (specifically for py.test within dask-cuml). What is the verdict on the new home for this? Are we moving it into RAPIDS?

@mrocklin
Member Author

mrocklin commented Jan 8, 2019 via email
