questions about MATE-KD #2

Open
jinxinglu opened this issue Aug 18, 2022 · 0 comments

Hi, MATE-KD is excellent work on KD for NLP. I have a couple of questions about the code for this paper.

In Section 4.1 of the paper, the authors say that two different teacher models (RoBERTa-large and BERT-base) are used in the two steps, but the code appears to use only one teacher model. Is that right?

On the other hand, shouldn't the two steps be trained separately? The code shows that in the training procedure, the generator's parameters are updated for 10 steps, and then the student model is updated for 100 steps. That seems weird to me.
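For what it's worth, the alternating schedule I'm describing could be sketched like this (the step counts and the names `gen_steps`/`student_steps` are my assumptions from reading the training loop, not the actual MATE-KD code):

```python
def alternating_schedule(total_steps, gen_steps=10, student_steps=100):
    """Yield which model is updated at each training step.

    ASSUMPTION: 10 generator-update steps followed by 100 student-update
    steps, repeated in a cycle, as described in the issue text above.
    """
    cycle = gen_steps + student_steps
    for step in range(total_steps):
        # Position within the current generator/student cycle.
        pos = step % cycle
        yield "generator" if pos < gen_steps else "student"


if __name__ == "__main__":
    schedule = list(alternating_schedule(220))
    # First 10 steps train the generator, next 100 train the student,
    # then the cycle repeats.
    print(schedule[:3], "...", schedule[108:112])
```

So the two "steps" from the paper are interleaved within one training loop rather than run as two separate training phases, which is what I find confusing.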
