
# Reference


## Baseline Models

| Model | Parameters | Category |
|---|---|---|
| BERT base | 110 M | base |
| BERT large | 340 M | large |
| OpenAI GPT | 110 M | base |
| GPT-2 | 117 M | weird large |
| XLM | >= 295 M | super large |
| XLNet base | 110 M | base |
| XLNet large | 340 M | large |
| RoBERTa base | 125 M | base |
| RoBERTa large | 355 M | large |
| DistilBERT | 60 M | small |

Super large models cannot fit on the P100s available on HPC. "Weird large" models are base-sized models that consume memory like a large one.
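
As a rough back-of-the-envelope illustration of the memory pressure (not from the original page, and ignoring activations, which usually dominate at long sequence lengths and larger batch sizes), fp32 fine-tuning with Adam needs roughly 16 bytes per parameter:

```python
def fp32_adam_param_memory_gb(num_params: float) -> float:
    """Rough lower bound for fp32 Adam fine-tuning: 4 bytes each for the
    weights, gradients, and the two Adam moment buffers (16 bytes/param).
    Activations and framework overhead come on top of this."""
    return num_params * 16 / 1024 ** 3

for name, params in [("base (110M)", 110e6), ("large (340M)", 340e6)]:
    print(f"{name}: ~{fp32_adam_param_memory_gb(params):.1f} GB before activations")
```

Assuming the 16 GB P100 variant, a large model already commits around 5 GB to parameters and optimizer state alone, leaving limited headroom for activations.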

## Baseline Scores

| Model | aNLI | HellaSwag | PIQA | SIQA | Config | Commit |
|---|---|---|---|---|---|---|
| BERT (bert-base-cased) | 63.32 | 37.83 | 65.29 | 60.33 | | commit |
| BERT (bert-large-cased) | 66.28 | 43.84 | 68.67 | 65 | | commit |
| RoBERTa (roberta-base) | 71.54 | 58.51 | 48.03 | 69.09 | | commit |
| RoBERTa (roberta-large) | 84.39 | 82.42 | 76.96 | 77.12 | | commit |
| XLNet (xlnet-base-cased) | 68.15 | 52.99 | 52.94 | 65.79 | | commit |
| XLNet (xlnet-large-cased) | 80.16 | 80.38 | 69.27 | 75.23 | | commit |
| GPT (openai-gpt) | 64.23 | 38.15 | 67.11 | 61.73 | | commit |
| GPT-2 (gpt2) | 53.46 | 26.52 | 48.05 | 35.16 | | commit |
| DistilBERT (distilbert-base-uncased) | 60.17 | 35.57 | 64.96 | 52.92 | | commit |
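
The identifiers in parentheses are pretrained checkpoints from the HuggingFace model hub. As a rough illustration (not this repo's training code), one way to load such a checkpoint for a multiple-choice task with a recent version of the transformers library looks like the sketch below; the example text is hypothetical, and the multiple-choice head is randomly initialized until fine-tuned.

```python
import torch
from transformers import AutoModelForMultipleChoice, AutoTokenizer

checkpoint = "roberta-base"  # any identifier from the table above
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMultipleChoice.from_pretrained(checkpoint)  # head is untrained

context = "He poured the water into the glass."            # hypothetical example
choices = ["The glass became full.", "The glass became empty."]

# Pair the context with every choice, then add a batch dimension:
# the model expects input_ids of shape (batch, num_choices, seq_len).
enc = tokenizer([context] * len(choices), choices, padding=True, return_tensors="pt")
batch = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**batch).logits          # shape (1, num_choices)
print("predicted choice:", logits.argmax(dim=-1).item())
```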

## Fine-tuning Time Reference

With two P100s on HPC, fine-tuning a model takes roughly the following amount of time.

| Task | Base Model (3 epochs) | Large Model (3 epochs) |
|---|---|---|
| aNLI | 1–2 hrs | ~7 hrs |
| hellaswag | 6–8 hrs | 24 hrs |
| physicaliqa | 1 hr | 3–4 hrs |
| socialiqa | 1 hr | 4–5 hrs |
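
For planning a full sweep, these figures can be combined directly; a small sketch (using the midpoint of each range from the table above):

```python
# Rough total wall-clock estimate for fine-tuning one base and one large model
# on all four tasks, taking the midpoint of each range in the table above.
base_hours = {"anli": 1.5, "hellaswag": 7, "physicaliqa": 1, "socialiqa": 1}
large_hours = {"anli": 7, "hellaswag": 24, "physicaliqa": 3.5, "socialiqa": 4.5}

total = sum(base_hours.values()) + sum(large_hours.values())
print(f"~{total:.0f} hours on two P100s for one base + one large model on all tasks")
```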