Question on the parameter count presented in Table 1 #3
I apologize for the misunderstanding. In the code, N refers to 1/2 of C in the paper, since it is the number of channels fed into the Transformer branch and the CNN branch respectively. Therefore, for the small model, N should be set to 64. Regarding SwinT-ChARM, the open-source code was not available when we finished our work, so we reproduced the method from the paper. One difference between our implementation and the open-source code may be the slice transform: since the paper does not specify the output channels of its middle convolutional layers, we used the same convolutional layers as in our own method. Sorry for any confusion caused. I'll update the README to clarify both points.
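The effect of the N-vs-C mix-up can be sketched in a few lines. This is a hypothetical, framework-free illustration (not the paper's actual layers): convolution parameters scale roughly quadratically with channel width, so setting N = C = 128 instead of N = C/2 = 64 inflates the count substantially.

```python
# Hypothetical sketch: parameter count of a single k x k convolution.
# Conv params scale roughly quadratically with channel width, so using
# N = C = 128 instead of the intended N = C/2 = 64 inflates the model.

def conv_params(c_in, c_out, k=3, bias=True):
    """Parameters of one k x k conv: weights (c_out * c_in * k * k) + bias."""
    return c_out * (c_in * k * k) + (c_out if bias else 0)

# One illustrative 3x3 conv at each width:
small = conv_params(64, 64)    # N = C/2 = 64 (intended small model)
large = conv_params(128, 128)  # N mistakenly set to C = 128

print(small, large)  # the wider layer has roughly 4x the parameters
```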
Thanks a lot for your help. Now I get the following values with the DeepSpeed profiler (get_model_profile()):
The number of parameters is now quite similar to the reported numbers. The FLOPs, however, are approximately twice as high as the reported numbers. Do you have any idea why? How are you profiling your model? I also used an RTX 3090 GPU. Thanks again!
For all methods in Table 1, we use flops-counter.pytorch to calculate complexity. I think the following reference might be helpful:
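A plausible explanation for the factor of two (an assumption about the two tools' conventions, not verified against their internals): complexity counters often report multiply-accumulate operations (MACs), while other profilers report FLOPs counting the multiply and the add separately, i.e. FLOPs = 2 x MACs. A minimal sketch of the convention:

```python
# Sketch of the MACs-vs-FLOPs convention (assumed, not tool-verified):
# one multiply-accumulate (MAC) = one multiply + one add = 2 FLOPs,
# so a FLOPs-reporting profiler shows ~2x a MACs-reporting counter.

def conv_macs(c_in, c_out, h, w, k=3):
    """MACs for a stride-1, padded k x k convolution on an h x w feature map."""
    return c_in * c_out * k * k * h * w

macs = conv_macs(64, 64, 256, 256)
flops = 2 * macs  # the "FLOPs = 2 x MACs" convention

print(macs, flops)
```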
Thank you very much :) |
Hello @jmliu206,
thank you very much for providing your interesting work.
Could you please explain in more detail how exactly you calculate the parameter number given in Table 1? According to your paper, your small model should have a parameter count of 44.96M, while I get about 76M when testing your code. I have created a colab to reproduce this result:
https://colab.research.google.com/drive/1KdwoC1i-TYMtc3akyuX83exipynKEE4v?usp=sharing
I have used the default setting with C=128; probably I am just missing some detail here...
I was also a bit surprised by the reported number of model parameters for SwinT-ChARM. According to Zhu et al., they have a total of 32.6M (Table 3), whereas you report 60.55M.
It would be great if you could provide further insights here.
Thanks in advance,
Nikolai
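For reference, a parameter count like the ones in Table 1 is typically obtained by summing the weights of every layer. The layer stack below is purely hypothetical (not the paper's architecture); it only illustrates the bookkeeping.

```python
# Hedged, framework-free sketch of a "Table 1"-style parameter count:
# sum weight + bias counts over a (hypothetical) stack of conv layers.

layers = [  # (c_in, c_out, kernel) for a toy conv stack
    (3, 128, 5),
    (128, 128, 3),
    (128, 192, 3),
]

def total_params(layers):
    # weights: c_out * c_in * k * k, plus c_out bias terms per layer
    return sum(co * ci * k * k + co for ci, co, k in layers)

print(total_params(layers))
```

In a framework such as PyTorch the same total is conventionally read off with `sum(p.numel() for p in model.parameters())`, which is the number to compare against the reported millions.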