questions about MATE-KD #2

Open
jinxinglu opened this issue Aug 18, 2022 · 0 comments

Hi, MATE-KD is excellent work on KD for NLP. I have a couple of questions about the code for this paper.

In Section 4.1 of the paper, the authors say that two different teacher models (RoBERTa-large and BERT-base) are used in the two steps, but the code appears to use only one teacher model. Is that right?

On the other hand, shouldn't the two steps be trained separately? The code shows that in the training procedure, the generator's parameters are updated for 10 steps, and then the student model is updated for 100 steps. That seems weird to me.
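For what it's worth, the alternating schedule I'm describing could be sketched like this (the step counts and the names `gen_steps`/`student_steps` are my assumptions from reading the training loop, not the actual MATE-KD code):

```python
def alternating_schedule(total_steps, gen_steps=10, student_steps=100):
    """Yield which model is updated at each training step.

    ASSUMPTION: 10 generator-update steps followed by 100 student-update
    steps, repeated in a cycle, as described in the issue text above.
    """
    cycle = gen_steps + student_steps
    for step in range(total_steps):
        # Position within the current generator/student cycle.
        pos = step % cycle
        yield "generator" if pos < gen_steps else "student"


if __name__ == "__main__":
    schedule = list(alternating_schedule(220))
    # First 10 steps train the generator, next 100 train the student,
    # then the cycle repeats.
    print(schedule[:3], "...", schedule[108:112])
```

So the two "steps" from the paper are interleaved within one training loop rather than run as two separate training phases, which is what I find confusing.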
