
Dataloader distributed or not #67

Open
zhd2rng opened this issue Apr 3, 2023 · 0 comments


zhd2rng commented Apr 3, 2023

Hi, I was checking the logging file, i.e., hrnet_w48_contrast_lr1x_hrnet_contrast_t0.1.log. The epoch and iteration counts seem to be computed as if training were on a single GPU, even though it is a 4-GPU job.

Basically, for a 4-GPU job with bs=8 per GPU, one epoch of Cityscapes with 2975 training images should take 93 iterations if the dataloader is distributed across all GPUs. But in the log, one epoch has 4 times as many iterations. This raises the question of whether the dataloader is distributed over multiple GPUs, and how iterations per epoch are counted. It also affects how the learning rate schedule and warm-up iterations should be configured.
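For reference, here is a quick sanity check of the iteration math under both interpretations (a sketch assuming partial batches round up, i.e. drop_last=False; the numbers come from the setup described above):

```python
import math

# Quantities from the issue: Cityscapes fine training set, 4 GPUs, bs=8 per GPU.
num_images = 2975
batch_per_gpu = 8
num_gpus = 4

# With a distributed sampler, each GPU sees ~1/num_gpus of the data,
# so an epoch corresponds to global-batch-size worth of steps:
distributed_iters = math.ceil(num_images / (batch_per_gpu * num_gpus))

# If each GPU instead iterates over the full dataset independently
# (no distributed sampler), every rank logs 4x as many steps per epoch:
non_distributed_iters = math.ceil(num_images / batch_per_gpu)

print(distributed_iters)      # 93
print(non_distributed_iters)  # 372, i.e. 4x more
```

If the logged epoch length matches the second number, the dataloader is most likely not sharded across ranks (e.g. no `torch.utils.data.distributed.DistributedSampler`), and warm-up/decay iteration counts in the config would need to be scaled accordingly.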

Thanks.
