[Bug Report] #337
Comments
I can't reproduce this on my local machine, but I also don't have multiple GPUs. Does this only happen when using multiple GPUs?
Yes, I didn't encounter this error when using just one GPU. It occurs when I use multiple GPUs with a larger context size and latent size, which causes higher GPU memory usage.
Previously, the same error occurred when training a sparse autoencoder of the same size.
I tried lowering the learning rate under the same conditions, but the same error occurred in the same place.
Update here - I believe the root cause of this issue might be the PR last week which added
@Yoon-Jeong-ho Is this fixed in the most recent version of SAELens (4.0.9)? Thanks for the fix @callummcdougall!
If you are submitting a bug report, please fill in the following details and use the tag [bug].
Describe the bug
I encountered a RuntimeError during training with sae_lens. The error appears to be caused by a device mismatch between the tensors being operated on and the index tensors used to index them (CPU vs. CUDA).
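For illustration only (this is not the code from the issue), the following minimal PyTorch sketch reproduces the class of failure being described: an indexing operation whose data tensor lives on CUDA while its index tensor lives on the CPU. The tensor names, shapes, and the use of `torch.gather` are assumptions made for the example.

```python
import torch

# Minimal sketch (not the reporter's code): ops such as torch.gather expect
# the index tensor to be on the same device as the data tensor, so mixing a
# CUDA data tensor with CPU indices raises a device-mismatch RuntimeError.
device = "cuda" if torch.cuda.is_available() else "cpu"

activations = torch.randn(8, 16, device=device)               # data on the GPU
top_idx = torch.topk(activations.cpu(), k=4, dim=-1).indices  # indices on the CPU

# On a CUDA machine the next line fails with something like:
#   RuntimeError: Expected all tensors to be on the same device, ...
# gathered = torch.gather(activations, -1, top_idx)

# The usual fix is to move the indices onto the data tensor's device first.
gathered = torch.gather(activations, -1, top_idx.to(activations.device))
print(gathered.shape)  # torch.Size([8, 4])
```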
Error message
Code example
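The original reproduction code was not captured here. As a stand-in, below is a hedged sketch of the kind of sae_lens training run being described (larger context and latent sizes, model spread across multiple GPUs), using the `LanguageModelSAERunnerConfig` / `SAETrainingRunner` entry points from the sae-lens 3.x tutorials. The model, hook point, dataset path, and hyperparameter values are placeholders, and the exact field names (in particular the multi-GPU kwargs) may differ between sae-lens and transformer-lens versions.

```python
import torch
from sae_lens import LanguageModelSAERunnerConfig, SAETrainingRunner

# Hypothetical reproduction sketch, not the code from this issue.
# Field names follow the sae-lens 3.x tutorial config; values are placeholders.
cfg = LanguageModelSAERunnerConfig(
    model_name="gpt2",                     # placeholder model
    hook_name="blocks.8.hook_resid_pre",   # placeholder hook point
    hook_layer=8,
    d_in=768,
    dataset_path="NeelNanda/openwebtext-tokenized-9b",  # placeholder dataset
    is_dataset_tokenized=True,
    expansion_factor=16,                   # larger latent sizes triggered the error
    context_size=256,                      # larger context sizes triggered the error
    lr=3e-4,
    training_tokens=100_000_000,
    device="cuda" if torch.cuda.is_available() else "cpu",
    # Multi-GPU: TransformerLens can shard the model across devices via
    # n_devices (assumed kwarg; check your transformer-lens version).
    model_from_pretrained_kwargs={"n_devices": max(torch.cuda.device_count(), 1)},
)

sae = SAETrainingRunner(cfg).run()
```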
System Info
Python : 3.11.9
CUDA : 12.4
GPU : NVIDIA RTX A6000
PyTorch : 2.0.1
Ubuntu : 20.04.1 LTS
sae-lens : 3.22.2
torch : 2.4.1
transformer-lens : 2.7.0
Checklist