I used this code on torch.nn.BatchNorm2d like this:

import torch
bn = torch.nn.BatchNorm2d(10)
sum(p.numel() for p in bn.parameters() if p.requires_grad)
The last line returns 20, but torch.nn.BatchNorm2d also has running (moving) mean and variance as parameters, doesn't it?
So I thought the correct number of parameters for torch.nn.BatchNorm2d(10) would be:
the number of weight parameters = 10
the number of bias parameters = 10
the number of running mean parameters = 10
the number of running var parameters = 10
that is, 10 * 4 = 40.
So I would appreciate it if you could explain this! Thank you!
Hi @hello-friend1242954
Weight and bias are parameters of the BN layer (they are updated during backpropagation). Running mean and variance are computed during the forward pass; that's why, I think, they are not considered parameters (since they do not require gradients). https://d2l.ai/chapter_convolutional-modern/batch-norm.html#training-deep-networks
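To make this concrete, here is a minimal check (assuming a default torch.nn.BatchNorm2d, i.e. affine=True and track_running_stats=True) showing that weight and bias are registered as parameters, while the running statistics are registered as buffers and therefore never show up in .parameters():

import torch

bn = torch.nn.BatchNorm2d(10)

# Trainable parameters: weight and bias, 10 elements each
for name, p in bn.named_parameters():
    print(name, p.numel(), p.requires_grad)   # weight 10 True / bias 10 True

# Buffers: stored in the state_dict but excluded from .parameters()
for name, b in bn.named_buffers():
    print(name, b.numel())   # running_mean 10 / running_var 10 / num_batches_tracked 1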
Thank you for the reply!
I agree that whether they count as parameters should be judged by whether they require gradients,
but the running statistics also undoubtedly take up some static memory/storage, right?
So I think the definition of 'parameters' is rather ambiguous.
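For what it's worth, one can count everything a default BatchNorm2d(10) actually stores, parameters and buffers together (a small sketch; note that PyTorch also keeps a scalar num_batches_tracked buffer, so the total comes to 41 rather than 40):

import torch

bn = torch.nn.BatchNorm2d(10)

n_params = sum(p.numel() for p in bn.parameters())   # weight + bias = 20
n_buffers = sum(b.numel() for b in bn.buffers())      # running_mean + running_var + num_batches_tracked = 21

print(n_params, n_buffers, n_params + n_buffers)       # 20 21 41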
Thank you for the clear explanation!!
The number of parameters of each module is calculated by the following code:
flops-counter.pytorch/ptflops/pytorch_engine.py, lines 110 to 112 at commit 5f2a45f
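The embedded snippet does not render here; as a rough sketch based on the one-liner quoted at the top of this issue, the counting presumably looks something like this (the function name below is hypothetical):

def count_trainable_parameters(module):
    # Sum the element counts of all tensors returned by .parameters() that
    # take part in backpropagation; buffers such as running_mean and
    # running_var are not returned by .parameters() and so are not counted.
    return sum(p.numel() for p in module.parameters() if p.requires_grad)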