add train on cpu #226

Open

vincehass wants to merge 1 commit into main
Conversation

vincehass

Add training on cpu and config file

@abdulfatir
Contributor

@vincehass Thank you for the PR. I think it would be better, both for repo organization and for reviewing, if you made the modifications directly in the original script to allow CPU training. What are the key changes? The optimizer? You could add a check: if the model is on the CPU, switch the default optimizer name.

@vincehass
Author

Sure, I can create a new branch if you like. The key changes are:

Detailed Explanation of the Changes

  1. Optimizer Name:

    • In train_cpu.py:

      optim: str = "adamw_torch",

      The optimizer is set to "adamw_torch", the standard PyTorch AdamW implementation, which is suitable for CPU training.

    • In train.py:

      optim: str = "adamw_torch_fused",

      The optimizer is set to "adamw_torch_fused", a fused AdamW implementation designed to take advantage of GPU capabilities, particularly for mixed-precision training. It can provide better performance on NVIDIA GPUs but requires CUDA.

  2. Device Handling:

    • In Both Scripts:

      Both scripts check for the availability of CUDA and set the device accordingly. However, train_cpu.py does not explicitly adjust the optimizer based on the device.

    • Current Device Check:

      device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

      This line selects the training device: CUDA if a GPU is available, otherwise Metal Performance Shaders (MPS) on Apple devices, otherwise CPU.

  3. Suggested Code Modification:

    • To handle the optimizer based on the device, add a conditional assignment that picks the optimizer name depending on whether training runs on CUDA:

      # Check which accelerator is available
      device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

      # Fused AdamW requires CUDA; fall back to plain AdamW otherwise
      optim = 'adamw_torch_fused' if device == 'cuda' else 'adamw_torch'

Suggested Changes

  • Device Check:

    The line device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu' determines the device to be used for training.

  • Optimizer Assignment:

    The line optim = 'adamw_torch_fused' if device == 'cuda' else 'adamw_torch' introduces a conditional assignment for the optimizer (see the sketch below for how it could fit into the main script):
      • If the device is cuda (indicating a GPU), it assigns 'adamw_torch_fused', which is optimized for GPU training.
      • Otherwise (cpu or mps), it assigns 'adamw_torch', which works without CUDA.
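
To make this concrete, here is a minimal, self-contained sketch of how the conditional default could be wired into the main training script. It assumes the script accepts the optimizer as a string, as with Hugging Face TrainingArguments; the helper name resolve_optim is illustrative, not something that exists in the repo:

    from typing import Optional

    import torch

    def resolve_optim(requested: Optional[str] = None) -> str:
        """Pick an optimizer name suited to the available hardware.

        The fused AdamW implementation ("adamw_torch_fused") requires CUDA;
        on CPU or Apple MPS, fall back to the plain "adamw_torch".
        """
        if requested is not None:
            return requested  # an explicit user choice always wins
        return "adamw_torch_fused" if torch.cuda.is_available() else "adamw_torch"

    # Usage: resolve the default once, before constructing TrainingArguments
    optim = resolve_optim()
    print(f"Selected optimizer: {optim}")

Keeping the resolution in one helper would let train.py retain a single code path instead of maintaining a separate train_cpu.py.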

@abdulfatir
Contributor

@vincehass was the response written by an LLM? 😄

My understanding from your response is that only the optimizer needs special handling for CPU training to work? Please feel free to modify the PR to update the main training script instead.

@abdulfatir
Contributor

@vincehass any update on this?
