Is batch size per GPU important to reproduce the accuracy reported in the paper? #35
Comments
Hi @Gus-Guo, batch size is not important for the method itself, since there is no batch component in the loss. However, I expect that changing the batch size may require you to adjust other parameters accordingly, such as the learning rate or the EMA range. That said, running out of memory in the middle of training is not great. Would you mind sharing some logs, including:
Thank you
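For concreteness, here is a minimal sketch of the kind of adjustment mentioned above: linearly rescaling the learning rate and stretching the EMA-momentum schedule when the per-GPU batch size changes. The function names, the base learning rate, and the EMA range below are illustrative assumptions, not values taken from the repository's configs.

```python
# Minimal sketch (not from the ijepa codebase): linearly rescale the learning
# rate when the effective batch size changes, and stretch the EMA momentum
# schedule over the new number of optimizer steps.

def scaled_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: lr grows/shrinks with the effective batch size."""
    return base_lr * new_batch_size / base_batch_size

def ema_momentum_schedule(start, end, total_steps):
    """Linearly interpolate the EMA momentum from `start` to `end` over training."""
    return [start + (end - start) * step / max(total_steps - 1, 1)
            for step in range(total_steps)]

# Example: reference config uses 128 images per GPU; dropping to 112 per GPU
# shrinks the effective batch, so the learning rate shrinks proportionally and
# the schedules run over a larger number of steps per epoch.
num_gpus = 16
images_per_epoch = 1_281_167                 # ImageNet-1k, for illustration only
base_lr, base_bs, new_bs = 1e-3, 128, 112    # hypothetical values

lr = scaled_lr(base_lr, base_bs, new_bs)                    # 8.75e-4
steps_per_epoch = images_per_epoch // (new_bs * num_gpus)   # more steps than with 128
momentum = ema_momentum_schedule(0.996, 1.0, steps_per_epoch * 300)

print(f"scaled lr: {lr:.2e}, steps/epoch: {steps_per_epoch}")
```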
Mido, thank you very much for your reply. My training config is as follows: data:
Part of my training logs where it ran out of memory:
INFO:root:rank [0],[Mon Jul 3 14:54:28 2023]: [2, 190] loss: 0.104 masks: 68.8 44.1 [wd: 4.00e-02] [lr: 2.26e-04] [mem: 7.07e+04] (5109.5 ms)
File "/opt/tiger/test_merlin_demo/src/train_iter.py", line 419, in forward_context
It's surprising that it's happening already more than an epoch into training. Not sure if there are other processes running on your GPU, but could you try changing
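Since the log above already prints a [mem: ...] counter, one way to tell whether another process is occupying the same card is to also log what PyTorch sees as free versus total device memory. A minimal sketch, assuming a recent PyTorch with `torch.cuda.mem_get_info`; the helper name and call site are hypothetical, not part of the repository.

```python
# Hypothetical helper (not part of the repository): report how much memory this
# process holds versus what is free on the whole device, to spot memory taken
# by other processes on the same GPU.
import torch

def log_gpu_memory(tag=""):
    if not torch.cuda.is_available():
        return
    device = torch.cuda.current_device()
    allocated = torch.cuda.memory_allocated(device) / 2**30  # tensors held by this process
    reserved = torch.cuda.memory_reserved(device) / 2**30    # cached by PyTorch's allocator
    free, total = (x / 2**30 for x in torch.cuda.mem_get_info(device))
    print(f"[{tag}] allocated={allocated:.1f}GiB reserved={reserved:.1f}GiB "
          f"free={free:.1f}GiB / total={total:.1f}GiB")
    # If (total - free) is much larger than `reserved`, another process is
    # likely occupying memory on the same GPU (cross-check with `nvidia-smi`).

# e.g. call log_gpu_memory(f"epoch {epoch} iter {itr}") inside the training loop
```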
Thank you very much. I will give it a try.
Hi, I recently ran the vit-14-ep300 config on 16 A100 GPUs. But since the GPUs would run out of memory in the middle of training, I decreased the batch size from 128 to 112 per GPU. However, I then obtained a lower linear-probe accuracy (about -2%). Is it important to preserve the batch size of 128 per GPU? Thank you very much!
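Since the trigger for the question was an out-of-memory failure at 128 images per GPU, one possible workaround, not discussed in this thread, is gradient accumulation: run two micro-batches of 64 per optimizer step so the effective per-GPU batch stays at 128 while peak activation memory roughly halves. This is a generic, self-contained PyTorch sketch, not the repository's training loop; the model, data, and loss below are stand-ins.

```python
# Generic gradient-accumulation sketch (not the repository's training loop):
# keep the effective batch at 128 per GPU by accumulating two micro-batches of
# 64 before each optimizer step.
import torch
import torch.nn as nn

model = nn.Linear(256, 256)                                # stand-in for the encoder/predictor
optimizer = torch.optim.AdamW(model.parameters(), lr=8.75e-4)

accum_steps = 2                                            # 2 x 64 = 128 samples per optimizer step
micro_batches = [torch.randn(64, 256) for _ in range(4)]   # stand-in for the data loader

optimizer.zero_grad()
for itr, x in enumerate(micro_batches):
    loss = model(x).pow(2).mean()                          # stand-in for the prediction loss
    (loss / accum_steps).backward()                        # average gradients across micro-batches
    if (itr + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```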