
[BUG] Training on multiple gpus #2499

Open
KornkamonGib opened this issue Aug 14, 2024 · 1 comment
Labels
bug Something isn't working · gpu Question or bug occurring with gpu

Comments

@KornkamonGib

I have 4 GPUs and am trying to train a TFT model by setting pl_trainer_kwargs as follows:

pl_trainer_kwargs = {
    "accelerator": "gpu",
    "strategy": "ddp_notebook",
    "devices": 4,
    "callbacks": callbacks,
}

This raises the following error: RuntimeError: Lightning can't create new processes if CUDA is already initialized. Did you manually call torch.cuda.* functions, have moved the model to the device, or allocated memory on the GPU any other way? Please remove any such calls, or change the selected strategy. You will have to restart the Python kernel.
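
For context, here is a minimal sketch of how pl_trainer_kwargs is typically passed to a Darts TFTModel. The exact model, data, and callbacks from the original report are not shown, so the series, callbacks, and chunk lengths below are placeholder assumptions:

```python
from darts.models import TFTModel
from darts.utils.timeseries_generation import linear_timeseries

# Placeholder data and callbacks; the original report does not show these.
series = linear_timeseries(length=400)
callbacks = []

pl_trainer_kwargs = {
    "accelerator": "gpu",
    "strategy": "ddp_notebook",
    "devices": 4,
    "callbacks": callbacks,
}

model = TFTModel(
    input_chunk_length=24,
    output_chunk_length=12,
    add_relative_index=True,  # avoids needing explicit future covariates
    n_epochs=1,
    pl_trainer_kwargs=pl_trainer_kwargs,
)

# In a notebook with 4 GPUs and the "ddp_notebook" strategy,
# this fit call is where the RuntimeError above is reported.
model.fit(series)
```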

@KornkamonGib added the bug (Something isn't working) and triage (Issue waiting for triaging) labels on Aug 14, 2024
@madtoinou added the gpu (Question or bug occurring with gpu) label and removed the triage (Issue waiting for triaging) label on Aug 15, 2024
@madtoinou
Collaborator

Hi @KornkamonGib,

Can you try the solution described in #1945 and use "devices":"auto"? What about "strategy":"auto"?
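
For reference, the suggested change would look roughly like this, keeping the callbacks from the original snippet and leaving everything else unchanged:

```python
pl_trainer_kwargs = {
    "accelerator": "gpu",
    "strategy": "auto",    # instead of "ddp_notebook"; see also the fix discussed in #1945
    "devices": "auto",     # instead of 4
    "callbacks": callbacks,
}
```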

Sharing a minimal reproducible example would make it easier to investigate and propose a solution.
