
How to run the code on multiple GPUs? #507

Open
williamyuanv0 opened this issue Apr 18, 2022 · 1 comment

Comments

@williamyuanv0

Hi kengz,
I'm having trouble running the code on multiple GPUs. In the __init__ of class ConvNet in conv.py, the code assigns the device as follows:
self.to(self.device)
How can this be extended to multiple GPUs, either inside ConvNet's __init__ or for an instantiation of ConvNet?
When I try to use torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0) to assign the net to multiple GPUs, some public methods and attributes defined on class ConvNet are no longer accessible after conv_model = torch.nn.DataParallel(conv_model, device_ids=[1, 2, 3, 4]).
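For example (a minimal standalone sketch, not SLM Lab code, assuming at least two GPUs are available):

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    """Stand-in for ConvNet, with one extra public method."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)

    def forward(self, x):
        return self.conv(x)

    def custom_method(self, out):
        return out.mean()

net = TinyConvNet().to('cuda:0')                   # DataParallel wants params on device_ids[0]
wrapped = nn.DataParallel(net, device_ids=[0, 1])  # sketch: assumes GPUs 0 and 1 exist
out = wrapped(torch.randn(4, 3, 32, 32, device='cuda:0'))
# wrapped.custom_method(out)  -> AttributeError: 'DataParallel' object has no attribute 'custom_method'
value = wrapped.module.custom_method(out)          # still reachable, but only via .module
```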

@kengz
Owner

kengz commented Apr 21, 2022

Hey @JZHOU0, SLM Lab wasn't written with distributed training across GPUs in mind. However, I think you could do it like this:

  1. Write your own extension of the conv net class, something like this https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html, and have it consume a new key passed in from net_spec for GPU assignment as you need (see the sketch after this list).
  2. Specify your custom net class in your net_spec with "type": "YourConvNet", along with the other net spec values.
And the algorithm should just be able to pick it up. Depending on the algorithm, the loss computation may use data from different devices, so you'd need to make sure the correct device transfer happens in your net class implementation. But again, certain things might break when you're training something this big across devices, so definitely watch out for that. Let me know how it goes!
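For the loss side, I mean something like this (illustrative only; the function name and device choice are placeholders):

```python
import torch.nn.functional as F

def loss_on_one_device(pred, target, device='cuda:1'):
    """With DataParallel the net output lands on output_device, while targets from
    memory may sit elsewhere; move both onto one device before computing the loss."""
    return F.mse_loss(pred.to(device), target.to(device))
```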
