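Here we use fastNLP to reproduce several well-known models for Matching tasks, aiming to match the performance reported in the original papers. The evaluation metric for all of these tasks is accuracy (%).
The reproduced models are (ordered by paper publication date):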
- CNTN: model code; training code. Paper: Convolutional Neural Tensor Network Architecture for Community-based Question Answering.
- ESIM: model code; training code. Paper: Enhanced LSTM for Natural Language Inference.
- DIIN: model code (still in progress); training code (still in progress). Paper: Natural Language Inference over Interaction Space.
- MwAN: model code; training code. Paper: Multiway Attention Networks for Modeling Sentence Pairs.
- BERT: model code; training code. Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
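Results reproduced with fastNLP vs. results reported in the papers; the number before "vs" is the fastNLP result.
'-' means that we have not yet reproduced the result or that the original paper did not report it.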
model name | SNLI | MNLI | RTE | QNLI | Quora |
---|---|---|---|---|---|
CNTN code; paper | 77.79 vs - | 63.29/63.16(dev) vs - | 57.04(dev) vs - | 62.38(dev) vs - | - |
ESIM code; paper | 88.13(glove) vs 88.0(glove)/88.7(elmo) | 77.78/76.49 vs 72.4/72.1* | 59.21(dev) vs - | 76.97(dev) vs - | - |
DIIN paper | - vs 88.0 | - vs 78.8/77.8 | - | - | - vs 89.06 |
MwAN code; paper | 87.9 vs 88.3 | 77.3/76.7(dev) vs 78.5/77.7 | - | 74.6(dev) vs - | 85.6 vs 89.12 |
BERT (BASE version) code; paper | 90.6 vs - | - vs 84.6/83.4 | 67.87(dev) vs 66.4 | 90.97(dev) vs 90.5 | - |
*72.4/72.1 is the ESIM result from the official MNLI reimplementation; the original ESIM paper did not report results on the MNLI dataset.
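Every number in the table above is an accuracy expressed as a percentage, that is, the share of sentence pairs whose predicted label equals the gold label. As a minimal sketch of that metric (plain Python, not the fastNLP metric class; the function name `accuracy_percent` and the toy labels are hypothetical and used only for illustration):

```python
def accuracy_percent(predictions, labels):
    """Return the percentage of predictions that equal the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted label matches the gold label.
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical toy data, not taken from any of the datasets above.
gold = ["entailment", "neutral", "contradiction", "neutral"]
pred = ["entailment", "neutral", "neutral", "neutral"]
print(f"{accuracy_percent(pred, gold):.2f}")  # 3 of 4 pairs are correct
```

For MNLI, the two values separated by "/" would be this same quantity computed separately on the matched and mismatched evaluation sets.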
Performance on Test set:
model name | ESIM | DIIN | MwAN | GPT1.0 | BERT-Large+SRL | MT-DNN |
---|---|---|---|---|---|---|
performance | 88.0 | 88.0 | 88.3 | 89.9 | 91.3 | 91.6 |
Performance on Test set:
model name | CNTN | ESIM | DIIN | MwAN | BERT-Base | BERT-Large |
---|---|---|---|---|---|---|
performance | 77.79 | 88.13 | - | 87.9 | 90.6 | 91.16 |
Performance on Test set (matched/mismatched):
model name | ESIM | DIIN | MwAN | GPT1.0 | BERT-Base | MT-DNN |
---|---|---|---|---|---|---|
performance | 72.4/72.1 | 78.8/77.8 | 78.5/77.7 | 82.1/81.4 | 84.6/83.4 | 87.9/87.4 |
Performance on Test set (matched/mismatched):
model name | CNTN | ESIM | DIIN | MwAN | BERT-Base |
---|---|---|---|---|---|
performance | 63.29/63.16(dev) | 77.78/76.49 | - | 77.3/76.7(dev) | - |
Still in progress.
Performance on Test set:
model name | BiLSTM | BiLSTM + Attn | BiLSTM + ELMo | BiLSTM + Attn + ELMo |
---|---|---|---|---|
performance | 74.6 | 74.3 | 75.5 | 79.8 |
*These LSTM-based baselines were implemented and tested by the official QNLI team.
model name | GPT1.0 | BERT-Base | BERT-Large | MT-DNN |
---|---|---|---|---|
performance | 87.4 | 90.5 | 92.7 | 96.0 |
Performance on Dev set:
model name | CNTN | ESIM | DIIN | MwAN | BERT |
---|---|---|---|---|---|
performance | 62.38 | 76.97 | - | 74.6 | - |
Still in progress.