Question Answering NLP Task:

Gaining More from Less Data in out-of-domain Question Answering Models

We propose text augmentation techniques for the Question Answering (QA) task in NLP that use synonym-based augmentation with stochasticity on out-of-domain datasets (DuoRC, RACE, and RelationExtraction), which are set to be 400 times smaller than the in-domain datasets (SQuAD, NewsQA, and NaturalQuestions). We address ways to extract more generalizable information from out-of-domain or otherwise scarce datasets using large pre-trained models such as BERT or its variant DistilBERT, which is used here, with the further aim of supporting QA applications across domains. We find that augmenting scarce QA datasets in the ways described improves generalization and yields better performance on out-of-domain data, although not all augmentation strategies are equally effective.
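To make the idea of synonym-based augmentation with stochasticity concrete, here is a minimal sketch. It is not the project's actual implementation: the toy `SYNONYMS` table, the `augment_context` helper, the replacement probability `p`, and the choice to leave the answer span untouched are all assumptions made for illustration.

```python
import random

# Hypothetical toy synonym table (an assumption for this sketch);
# a real run would more likely draw synonyms from a thesaurus.
SYNONYMS = {
    "movie": ["film", "picture"],
    "big": ["large", "huge"],
    "quick": ["fast", "rapid"],
}


def augment_context(context, answer, synonyms, p=0.3, rng=None):
    """Swap words for a random synonym with probability p.

    The answer span is copied through unchanged so that an extractive QA
    label stays valid. Returns the new context and the new character
    offset of the answer within it.
    """
    if rng is None:
        rng = random.Random()
    start = context.find(answer)
    if start < 0:
        raise ValueError("answer text not found in context")
    before = context[:start]
    after = context[start + len(answer):]

    def swap(text):
        # Splitting and joining on a single space preserves spacing exactly.
        # Words with attached punctuation simply do not match the table, and
        # replacements are lowercase; both are limits of this sketch.
        out = []
        for word in text.split(" "):
            options = synonyms.get(word.lower())
            if options and rng.random() < p:
                out.append(rng.choice(options))
            else:
                out.append(word)
        return " ".join(out)

    new_before = swap(before)
    new_context = new_before + answer + swap(after)
    return new_context, len(new_before)


if __name__ == "__main__":
    ctx = "The big movie was directed by Jane Doe in a quick shoot"
    new_ctx, answer_start = augment_context(
        ctx, "Jane Doe", SYNONYMS, p=0.5, rng=random.Random(0)
    )
    print(new_ctx)
    print(answer_start)
```

Because each eligible word is swapped independently with probability `p`, repeated calls with different seeds yield different variants of the same example, which is one way a small out-of-domain set could be expanded. Recomputing the answer offset matters because synonyms of different lengths shift the character positions that SQuAD-style labels rely on.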

Starter code for robustqa track

Download datasets from here
Set up the environment with conda env create -f environment.yml
Train a baseline MTL system with python train.py --do-train --eval-every 2000 --run-name baseline
Evaluate the system on the test set with python train.py --do-eval --sub-file mtl_submission.csv --save-dir save/baseline-01
Upload the CSV file in save/baseline-01 to the test leaderboard. For the validation leaderboard, run python train.py --do-eval --sub-file mtl_submission_val.csv --save-dir save/baseline-01 --eval-dir datasets/oodomain_val
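The two evaluation commands above differ only in the submission file name and the optional --eval-dir flag. The sketch below builds both from one function; `build_eval_command` is a hypothetical helper and not part of this repository, and it uses only the flags that appear in the steps above.

```python
import shlex


def build_eval_command(save_dir, sub_file, eval_dir=None):
    """Return the train.py evaluation command as an argument list.

    Hypothetical convenience wrapper: it only assembles flags already
    shown in the steps above and does not run anything itself.
    """
    cmd = [
        "python", "train.py", "--do-eval",
        "--sub-file", sub_file,
        "--save-dir", save_dir,
    ]
    if eval_dir is not None:
        # Only the validation leaderboard run passes an explicit eval dir.
        cmd += ["--eval-dir", eval_dir]
    return cmd


test_cmd = build_eval_command("save/baseline-01", "mtl_submission.csv")
val_cmd = build_eval_command(
    "save/baseline-01",
    "mtl_submission_val.csv",
    eval_dir="datasets/oodomain_val",
)

if __name__ == "__main__":
    print(shlex.join(test_cmd))
    print(shlex.join(val_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True)` from the repository root once a trained checkpoint exists in the save directory.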


salman-moh/CS224N-RobustQA
