
# Reference


## Baseline Models

| Model | Parameters | Category |
|---|---|---|
| BERT base | 110 M | base |
| BERT large | 340 M | large |
| OpenAI GPT | 110 M | base |
| GPT-2 | 117 M | weird large |
| XLM | >= 295 M | super large |
| XLNet base | 110 M | base |
| XLNet large | 340 M | large |
| RoBERTa base | 125 M | base |
| RoBERTa large | 355 M | large |
| DistilBERT | 60 M | small |

Super large models cannot fit on the P100s available on HPC. "Weird large" models are base-sized models that consume memory like a large one.
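
As a rough back-of-the-envelope illustration of the memory pressure (not from the original page, and ignoring activations, which usually dominate at long sequence lengths and larger batch sizes), fp32 fine-tuning with Adam needs roughly 16 bytes per parameter:

```python
def fp32_adam_param_memory_gb(num_params: float) -> float:
    """Rough lower bound for fp32 Adam fine-tuning: 4 bytes each for the
    weights, gradients, and the two Adam moment buffers (16 bytes/param).
    Activations and framework overhead come on top of this."""
    return num_params * 16 / 1024 ** 3

for name, params in [("base (110M)", 110e6), ("large (340M)", 340e6)]:
    print(f"{name}: ~{fp32_adam_param_memory_gb(params):.1f} GB before activations")
```

Assuming the 16 GB P100 variant, a large model already commits around 5 GB to parameters and optimizer state alone, leaving limited headroom for activations.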

## Baseline Scores

| Model | aNLI | HellaSwag | PIQA | SIQA | Config | Commit |
|---|---|---|---|---|---|---|
| BERT (bert-base-cased) | 63.32 | 37.83 | 65.29 | 60.33 | | commit |
| BERT (bert-large-cased) | 66.28 | 43.84 | 68.67 | 65 | | commit |
| RoBERTa (roberta-base) | 71.54 | 58.51 | 48.03 | 69.09 | | commit |
| RoBERTa (roberta-large) | 84.39 | 82.42 | 76.96 | 77.12 | | commit |
| XLNet (xlnet-base-cased) | 68.15 | 52.99 | 52.94 | 65.79 | | commit |
| XLNet (xlnet-large-cased) | 80.16 | 80.38 | 69.27 | 75.23 | | commit |
| GPT (openai-gpt) | 64.23 | 38.15 | 67.11 | 61.73 | | commit |
| GPT-2 (gpt2) | 53.46 | 26.52 | 48.05 | 35.16 | | commit |
| DistilBERT (distilbert-base-uncased) | 60.17 | 35.57 | 64.96 | 52.92 | | commit |
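
The identifiers in parentheses are pretrained checkpoints from the HuggingFace model hub. As a rough illustration (not this repo's training code), one way to load such a checkpoint for a multiple-choice task with a recent version of the transformers library looks like the sketch below; the example text is hypothetical, and the multiple-choice head is randomly initialized until fine-tuned.

```python
import torch
from transformers import AutoModelForMultipleChoice, AutoTokenizer

checkpoint = "roberta-base"  # any identifier from the table above
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMultipleChoice.from_pretrained(checkpoint)  # head is untrained

context = "He poured the water into the glass."            # hypothetical example
choices = ["The glass became full.", "The glass became empty."]

# Pair the context with every choice, then add a batch dimension:
# the model expects input_ids of shape (batch, num_choices, seq_len).
enc = tokenizer([context] * len(choices), choices, padding=True, return_tensors="pt")
batch = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**batch).logits          # shape (1, num_choices)
print("predicted choice:", logits.argmax(dim=-1).item())
```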

## Fine-tuning Time Reference

With two P100s on HPC, fine-tuning a model takes roughly the following amount of time.

| Task | Base Model (3 epochs) | Large Model (3 epochs) |
|---|---|---|
| aNLI | 1–2 hrs | ~7 hrs |
| hellaswag | 6–8 hrs | 24 hrs |
| physicaliqa | 1 hr | 3–4 hrs |
| socialiqa | 1 hr | 4–5 hrs |
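
For planning a full sweep, these figures can be combined directly; a small sketch (using the midpoint of each range from the table above):

```python
# Rough total wall-clock estimate for fine-tuning one base and one large model
# on all four tasks, taking the midpoint of each range in the table above.
base_hours = {"anli": 1.5, "hellaswag": 7, "physicaliqa": 1, "socialiqa": 1}
large_hours = {"anli": 7, "hellaswag": 24, "physicaliqa": 3.5, "socialiqa": 4.5}

total = sum(base_hours.values()) + sum(large_hours.values())
print(f"~{total:.0f} hours on two P100s for one base + one large model on all tasks")
```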