add train on cpu #226

Open

vincehass wants to merge 1 commit into main
Conversation

vincehass

Add training on cpu and config file

@abdulfatir
Contributor

@vincehass Thank you for the PR. I think it would be better, both for repo organization and for reviewing, if you made the modifications directly in the original script to allow CPU training. What are the key changes? The optimizer? You could add a check: if the model is on the CPU, switch the default optimizer name.

@vincehass
Author

Sure, I can create a new branch if you like. The key changes are:

Detailed Explanation of the Changes

  1. Optimizer Name:

    • In train_cpu.py:

      optim: str = "adamw_torch",

      The optimizer is set to "adamw_torch", the standard PyTorch AdamW implementation, which is suitable for CPU training.

    • In train.py:

      optim: str = "adamw_torch_fused",

      The optimizer is set to "adamw_torch_fused", a fused AdamW implementation designed to take advantage of GPU capabilities, particularly for mixed-precision training. It can provide better performance on NVIDIA GPUs but requires CUDA.

  2. Device Handling:

    • In Both Scripts:

      Both scripts check for the availability of CUDA and set the device accordingly. However, train_cpu.py does not explicitly adjust the optimizer based on the device.

    • Current Device Check:

      device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

      This line selects the training device: CUDA if a GPU is available, otherwise Metal Performance Shaders (MPS) on Apple devices, otherwise CPU.

  3. Suggested Code Modification:

    • To handle the optimizer based on the device, add a conditional assignment that picks the optimizer name depending on whether training runs on CUDA:

      # Check which accelerator is available
      device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

      # Fused AdamW requires CUDA; fall back to plain AdamW otherwise
      optim = 'adamw_torch_fused' if device == 'cuda' else 'adamw_torch'

Suggested Changes

  • Device Check:

    The line device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu' determines the device to be used for training.

  • Optimizer Assignment:

    The line optim = 'adamw_torch_fused' if device == 'cuda' else 'adamw_torch' introduces a conditional assignment for the optimizer (see the sketch below for how it could fit into the main script):
      • If the device is cuda (indicating a GPU), it assigns 'adamw_torch_fused', which is optimized for GPU training.
      • Otherwise (cpu or mps), it assigns 'adamw_torch', which works without CUDA.
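
To make this concrete, here is a minimal, self-contained sketch of how the conditional default could be wired into the main training script. It assumes the script accepts the optimizer as a string, as with Hugging Face TrainingArguments; the helper name resolve_optim is illustrative, not something that exists in the repo:

    from typing import Optional

    import torch

    def resolve_optim(requested: Optional[str] = None) -> str:
        """Pick an optimizer name suited to the available hardware.

        The fused AdamW implementation ("adamw_torch_fused") requires CUDA;
        on CPU or Apple MPS, fall back to the plain "adamw_torch".
        """
        if requested is not None:
            return requested  # an explicit user choice always wins
        return "adamw_torch_fused" if torch.cuda.is_available() else "adamw_torch"

    # Usage: resolve the default once, before constructing TrainingArguments
    optim = resolve_optim()
    print(f"Selected optimizer: {optim}")

Keeping the resolution in one helper would let train.py retain a single code path instead of maintaining a separate train_cpu.py.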

@abdulfatir
Contributor

@vincehass was the response written by an LLM? 😄

My understanding from your response is that only the optimizer needs special handling for CPU training to work? Please feel free to modify the PR to update the main training script instead.

@abdulfatir
Contributor

@vincehass any update on this?
