Error when using multi-GPU training: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered #254
Comments
@Guokr233 This might be a problem with TensorFlow itself, or your environment may not be set up correctly. Did you use Anaconda3 (or Miniconda)? Also, make sure the CUDA driver is installed correctly on your machine.
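A minimal sketch for collecting the versions that matter here (TensorFlow, the CUDA/cuDNN versions TensorFlow was built against, visible GPUs, and the driver version) so they can be compared and pasted into the issue. `collect_env` is a hypothetical helper, not part of this project. It assumes TensorFlow 2.3 or later for `tf.sysconfig.get_build_info()` and reads the driver version from `nvidia-smi` only if it is on `PATH`.

```python
import platform
import shutil
import subprocess
import sys


def collect_env():
    """Gather version info for a bug report (hypothetical helper, a sketch)."""
    info = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "tensorflow": None,
        "built_cuda": None,   # CUDA version TensorFlow was compiled against
        "built_cudnn": None,  # cuDNN version TensorFlow was compiled against
        "gpus": [],
        "driver": None,
    }
    try:
        import tensorflow as tf  # optional: fields stay None if not installed
    except ImportError:
        tf = None
    if tf is not None:
        info["tensorflow"] = tf.__version__
        build = tf.sysconfig.get_build_info()  # TF >= 2.3; keys absent on CPU builds
        info["built_cuda"] = build.get("cuda_version")
        info["built_cudnn"] = build.get("cudnn_version")
        info["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    # The installed driver version is reported by nvidia-smi, if available.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
        if result.returncode == 0 and result.stdout.strip():
            info["driver"] = result.stdout.strip().splitlines()[0]
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

If `built_cuda` or `built_cudnn` differ from what is installed in the conda environment, that mismatch would be worth ruling out before digging into multi-GPU behavior.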
I'm creating the environment via conda. It's a really weird bug.
I created the environment through conda and upgraded to tensorflow-gpu 2.8, CUDA 11.2, and cuDNN 8.4.0, but I still get this error. It seems I can only train on a single GPU, which is slow.
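To fall back to one GPU, or to try subsets of the four cards, a common approach is to limit which devices CUDA exposes before TensorFlow is imported. The sketch below does that from Python; `restrict_gpus` is a hypothetical helper, while `CUDA_VISIBLE_DEVICES` and `CUDA_DEVICE_ORDER` are standard CUDA environment variables.

```python
import os


def restrict_gpus(indices):
    """Expose only the listed GPU indices to CUDA (hypothetical helper).

    Call this before importing TensorFlow: CUDA reads these variables when
    it initializes, so changing them after TensorFlow has set up the GPUs
    has no effect on the running process.
    """
    # Number GPUs in PCI bus order so the indices match nvidia-smi.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    value = ",".join(str(int(i)) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value
```

`restrict_gpus([0])` reproduces the single-GPU setup, while runs with `restrict_gpus([0, 1])` or `restrict_gpus([2, 3])` might help narrow down whether the crash depends on the number of GPUs or on one particular card. Setting the same variables in the shell before launching training is equivalent.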
@Guokr233 Let me recheck TensorFlow's MirroredStrategy to see whether anything has changed.
I've also encountered the same problem. I followed the solutions given in this issue, but they didn't work: I also tried this solution, but again it didn't work:
@Guokr233 Can you try this solution:
I am trying to train a Conformer model for Chinese. When I train on four RTX 2080 Ti GPUs, the following error occurs partway through an epoch: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered. The point at which it occurs varies from run to run. The problem does not occur when I train on a single GPU. Please help.
This is my environment:
Below is my config.yml: