Reproducing Models for Matching Tasks

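Here we use fastNLP to reproduce several well-known models for matching tasks, aiming to match the performance reported in the papers. The evaluation metric for all of these tasks is accuracy (%).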
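Since every number in this README is an accuracy in percent, a minimal sketch of that metric may help when comparing your own runs. This is plain Python written for illustration, not the evaluation code used in this repository; the function name `accuracy_percent` and the sample labels are hypothetical.

```python
def accuracy_percent(predictions, gold_labels):
    """Return the share of predictions that equal the gold label, as a percentage."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have the same length")
    if not gold_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted label matches the gold label.
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return 100.0 * correct / len(gold_labels)


# Hypothetical three-way NLI labels, for illustration only.
gold = ["entailment", "neutral", "contradiction", "neutral"]
pred = ["entailment", "neutral", "neutral", "neutral"]
print(f"{accuracy_percent(pred, gold):.2f}")  # prints 75.00
```

For MNLI, the same computation would be run twice, once on the matched set and once on the mismatched set, which is why those cells carry two numbers.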

The models covered, ordered by paper publication date, are CNTN, ESIM, DIIN, MwAN, and BERT.

Datasets and Summary of Reproduction Results

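Each cell shows the fastNLP reproduction result vs. the result reported in the paper; the fastNLP result comes first.

'-' means that we have not yet reproduced the result or that the original paper did not report it.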

| model name | SNLI | MNLI | RTE | QNLI | Quora |
| --- | --- | --- | --- | --- | --- |
| CNTN (code; paper) | 77.79 vs - | 63.29/63.16(dev) vs - | 57.04(dev) vs - | 62.38(dev) vs - | - |
| ESIM (code; paper) | 88.13(glove) vs 88.0(glove)/88.7(elmo) | 77.78/76.49 vs 72.4/72.1* | 59.21(dev) vs - | 76.97(dev) vs - | - |
| DIIN (paper) | - vs 88.0 | - vs 78.8/77.8 | - | - | - vs 89.06 |
| MwAN (code; paper) | 87.9 vs 88.3 | 77.3/76.7(dev) vs 78.5/77.7 | - | 74.6(dev) vs - | 85.6 vs 89.12 |
| BERT (BASE version) (code; paper) | 90.6 vs - | - vs 84.6/83.4 | 67.87(dev) vs 66.4 | 90.97(dev) vs 90.5 | - |

*The ESIM result of 72.4/72.1 comes from the official MNLI reproduction; the original ESIM paper does not report results on the MNLI dataset.
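Cells in the summary table above follow the pattern "fastNLP result vs paper result", with '-' for a missing value and optional annotations such as `(dev)`, `(glove)`, or a trailing `*`. The sketch below shows one way to read such a cell and compute the gap between the reproduced and the reported numbers. The helper names `parse_cell` and `gaps` are hypothetical and are not part of fastNLP or of this repository.

```python
import re

# Matches plain decimal numbers such as 88.13 or 72.4.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_cell(cell):
    """Split an 'ours vs paper' cell into two lists of floats.

    A missing side (written '-') yields an empty list. Annotations such as
    '(dev)', '(glove)' or a trailing '*' are ignored.
    """
    ours, _, paper = cell.strip().partition(" vs ")
    return (
        [float(x) for x in NUMBER.findall(ours)],
        [float(x) for x in NUMBER.findall(paper)],
    )


def gaps(cell):
    """Return reproduced-minus-reported differences, paired position by position."""
    ours, paper = parse_cell(cell)
    return [round(o - p, 2) for o, p in zip(ours, paper)]


# The ESIM / MNLI cell (matched/mismatched) from the summary table.
print(parse_cell("77.78/76.49 vs 72.4/72.1*"))
print(gaps("77.78/76.49 vs 72.4/72.1*"))
```

Pairing by position works for the MNLI matched/mismatched cells. A cell such as the ESIM / SNLI one, where the paper side lists two embedding variants, should be read by hand rather than through `gaps`.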

Per-Dataset Reproduction Results and Comparison with Other Major Models

SNLI

Link to SNLI leaderboard

Performance on Test set:

| model name | ESIM | DIIN | MwAN | GPT1.0 | BERT-Large+SRL | MT-DNN |
| --- | --- | --- | --- | --- | --- | --- |
| performance | 88.0 | 88.0 | 88.3 | 89.9 | 91.3 | 91.6 |

Results reproduced with fastNLP

Performance on Test set:

| model name | CNTN | ESIM | DIIN | MwAN | BERT-Base | BERT-Large |
| --- | --- | --- | --- | --- | --- | --- |
| performance | 77.79 | 88.13 | - | 87.9 | 90.6 | 91.16 |

MNLI

Link to MNLI main page

Performance on Test set (matched/mismatched):

| model name | ESIM | DIIN | MwAN | GPT1.0 | BERT-Base | MT-DNN |
| --- | --- | --- | --- | --- | --- | --- |
| performance | 72.4/72.1 | 78.8/77.8 | 78.5/77.7 | 82.1/81.4 | 84.6/83.4 | 87.9/87.4 |

Results reproduced with fastNLP

Performance on Test set (matched/mismatched):

| model name | CNTN | ESIM | DIIN | MwAN | BERT-Base |
| --- | --- | --- | --- | --- | --- |
| performance | 63.29/63.16(dev) | 77.78/76.49 | - | 77.3/76.7(dev) | - |

RTE

Still in progress.

QNLI

From GLUE baselines

Link to GLUE leaderboard

Performance on Test set:

LSTM-based

| model name | BiLSTM | BiLSTM + Attn | BiLSTM + ELMo | BiLSTM + Attn + ELMo |
| --- | --- | --- | --- | --- |
| performance | 74.6 | 74.3 | 75.5 | 79.8 |

*These LSTM-based baselines were implemented and tested officially by the QNLI maintainers.

Transformer-based

| model name | GPT1.0 | BERT-Base | BERT-Large | MT-DNN |
| --- | --- | --- | --- | --- |
| performance | 87.4 | 90.5 | 92.7 | 96.0 |

Results reproduced with fastNLP

Performance on Dev set:

| model name | CNTN | ESIM | DIIN | MwAN | BERT |
| --- | --- | --- | --- | --- | --- |
| performance | 62.38 | 76.97 | - | 74.6 | - |

Quora

Still in progress.