
Question on the parameter count presented in Table 1 #3

Closed · Nikolai10 opened this issue Apr 13, 2023 · 4 comments

@Nikolai10

Hello @jmliu206,

Thank you very much for sharing your interesting work.

Could you please explain in more detail how exactly you calculated the parameter counts given in Table 1? According to your paper, your small model should have 44.96M parameters, while I get about 76M when running your code. I have created a Colab notebook to reproduce this result:

https://colab.research.google.com/drive/1KdwoC1i-TYMtc3akyuX83exipynKEE4v?usp=sharing

I used the default setting with C=128, so probably I am just missing a detail here.
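For reference, the count in the notebook boils down to something like this (a minimal sketch; the `TCM` class name and import path are assumptions based on this repository, not verified):

```python
# Minimal parameter-count sketch (PyTorch). The `TCM` class name and its
# import path are assumptions based on this repository, not verified.
from models import TCM  # hypothetical import path

model = TCM(N=128)  # default setting used in the notebook

# Total number of trainable parameters, in millions.
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{num_params / 1e6:.2f} M")  # ~76 M with N=128
```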

I was also a bit surprised by the reported number of model parameters for SwinT-ChARM. According to Zhu et al., it has a total of 32.6M parameters (their Table 3), whereas you report 60.55M.

It would be great if you could provide further insights here.

Thanks in advance,
Nikolai

@jmliu206 (Owner)

I apologize for the confusion. In the code, N corresponds to half of C in the paper: the C channels are split in two, with N channels fed into the Transformer branch and N channels fed into the CNN branch. Therefore, for the small model (C=128), N should be set to 64.
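To illustrate the split (a sketch of the idea only, not the actual repository code):

```python
# Sketch of the channel split described above: the C channels from the
# paper are divided into two halves of N channels each, one per branch.
import torch

C = 128      # paper-level channel count (small model)
N = C // 2   # code-level N: channels fed to each branch

x = torch.randn(1, C, 16, 16)
x_trans, x_cnn = torch.chunk(x, 2, dim=1)  # N channels to the Transformer
assert x_trans.shape[1] == N == x_cnn.shape[1]  # ... and N to the CNN
```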

Regarding SwinT-ChARM: since the official code was not available when we finished our work, we reproduced the method from the paper. The difference between our implementation and the open-source code is likely in the slice transform. Since the paper does not specify the output channels of the intermediate convolutional layers, we used the same convolutional layers as in our own method, which may account for the higher parameter count.

Sorry for any confusion caused. I will update the README to clarify both points.

@Nikolai10 (Author)

Thanks a lot for your help. Now I get the following values with the DeepSpeed profiler (get_model_profile()):

| N   | FLOPs    | MACs     | Params  |
|-----|----------|----------|---------|
| 64  | 441.38 G | 215.32 G | 45.18 M |
| 96  | 865.73 G | 425.09 G | 59.13 M |
| 128 | 1454.5 G | 717.08 G | 76.57 M |

The parameter counts now closely match the reported numbers. The FLOPs, however, are approximately twice as high as the reported numbers. Do you have any idea why? How are you profiling your model?

I also used an RTX 3090 GPU.
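For completeness, the numbers above came from a call along these lines (a sketch; the input resolution and the `TCM` import path are assumptions):

```python
# Sketch of the DeepSpeed profiling call; the 256x256 input resolution
# and the `TCM` import path are assumptions, not verified settings.
from deepspeed.profiling.flops_profiler import get_model_profile
from models import TCM  # hypothetical import path

model = TCM(N=64).cuda().eval()

flops, macs, params = get_model_profile(
    model=model,
    input_shape=(1, 3, 256, 256),  # (batch, channels, height, width)
    print_profile=False,
    as_string=True,
)
print(flops, macs, params)  # e.g. "441.38 G", "215.32 GMACs", "45.18 M"
```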

Thanks again!

@jmliu206 (Owner) commented Apr 14, 2023

For all methods in Table 1, we used flops-counter.pytorch to calculate complexity. Specifically, the FLOPs we report are actually MACs. In many CV papers FLOPs and MACs are used interchangeably, and some versions of the profiling packages also conflate the two. Since one multiply-accumulate counts as roughly two floating-point operations (one multiplication plus one addition), this would explain why your FLOP numbers are about twice ours; see the sketch after the links below.

I think the following references might be helpful:
open-mmlab/mmcv#785 (comment)
sovrasov/flops-counter.pytorch#16 (comment)
https://github.com/sovrasov/flops-counter.pytorch/blob/1ad0ed1999620c0170e5854dde39805d30d9b6aa/sample.py#L36
https://github.com/Lyken17/pytorch-OpCounter/tree/160004dd1535323d71763c93482d2a8f5f260301
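
A call to flops-counter.pytorch (the ptflops package) looks roughly like this (a sketch; the input resolution and the `TCM` import path are assumptions):

```python
# Sketch of a flops-counter.pytorch (ptflops) call. Note that the value
# it reports is a MAC count; a FLOP count is roughly twice as large.
from ptflops import get_model_complexity_info
from models import TCM  # hypothetical import path

model = TCM(N=64).eval()

macs, params = get_model_complexity_info(
    model,
    (3, 256, 256),  # input resolution is an assumption
    as_strings=True,
    print_per_layer_stat=False,
)
print(macs, params)  # the GMac figure is what Table 1 reports as "FLOPs"
```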

@Nikolai10 (Author)

Thank you very much :)
