https://biendata.com/competition/wsdm2020/
ID: @nlp-rabbit
- Python >= 3.6
- `pip3 install -r requirements.txt`
- `python3 -m spacy download en`
- Set up an Elasticsearch service (refer to the link).
- Set `ES_BASE_URL` in `constants.py` to your configured Elasticsearch endpoint (a connectivity-check sketch follows the setup steps below).
- Unzip the data file and put all files under the `data/` folder; rename `test.csv` to `test_release.csv`.
- Execute `bash scripts/prepare_data.sh` in the project root folder to build the data for the next step.
- Put the rerank model at `data/models/rerank_model.model`.
- Execute `bash scripts/run_end2end.sh` in the project root folder.

The above script includes three main parts:
- query Elasticsearch to retrieve candidate papers
- prepare the rerank data from the Elasticsearch results (the baseline result)
- rerank the candidates with BERT
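For the `ES_BASE_URL` step above, a minimal, hypothetical connectivity check (not part of this repo) can confirm that the configured endpoint is reachable before running the scripts; the `localhost` value shown in the comment is only an example:

```python
# Hypothetical sanity check (not part of this repo): confirm the Elasticsearch
# endpoint configured in constants.py is reachable before running the scripts.
import requests

from constants import ES_BASE_URL  # e.g. "http://localhost:9200"

resp = requests.get(ES_BASE_URL, timeout=5)
resp.raise_for_status()
print("Elasticsearch version:", resp.json()["version"]["number"])
```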
- recall phase: noun chunk extraction + TextRank keyword extraction + BM25-based search (Elasticsearch); see the first sketch below
- rerank phase: BERT-based reranking (SciBERT from AllenAI); see the second sketch below
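As a rough illustration of the recall phase: extract noun chunks with spaCy, add TextRank keywords, and issue a BM25 `match` query against Elasticsearch. The index name `papers`, the `abstract` field, and the use of the `summa` package for TextRank are assumptions, not the repository's exact configuration.

```python
# Rough sketch of the recall phase, not the repository's exact code.
# Assumptions: an Elasticsearch index named "papers" with an "abstract" field,
# and the `summa` package for TextRank keyword extraction.
import requests
import spacy
from summa import keywords as textrank

from constants import ES_BASE_URL

nlp = spacy.load("en")  # installed via `python3 -m spacy download en`

def build_query(description: str) -> str:
    """Combine noun chunks and TextRank keywords into one query string."""
    doc = nlp(description)
    noun_chunks = [chunk.text for chunk in doc.noun_chunks]
    tr_keywords = textrank.keywords(description).split("\n")
    return " ".join(noun_chunks + tr_keywords)

def bm25_search(query: str, size: int = 50) -> list:
    """BM25 retrieval: Elasticsearch scores `match` queries with BM25 by default."""
    body = {"size": size, "query": {"match": {"abstract": query}}}
    resp = requests.post(f"{ES_BASE_URL}/papers/_search", json=body, timeout=30)
    resp.raise_for_status()
    return [hit["_id"] for hit in resp.json()["hits"]["hits"]]

candidate_ids = bm25_search(build_query(
    "The cited paper proposes a graph-based model for citation recommendation."))
print(candidate_ids[:10])
```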
The only model that needs to be trained in this project is the BERT-based reranking model.
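For the rerank phase, a similarly hedged sketch using Hugging Face `transformers` (the repository may use a different BERT library; the checkpoint name and the sequence-classification head below are assumptions, and in practice the fine-tuned weights at `data/models/rerank_model.model` would be loaded):

```python
# Rough sketch of the rerank phase, not the repository's exact code.
# Assumptions: the rerank model is SciBERT with a binary relevance head, usable
# through Hugging Face transformers; the base checkpoint below stands in for the
# fine-tuned weights saved at data/models/rerank_model.model.
from typing import List

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "allenai/scibert_scivocab_uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def rerank(description: str, abstracts: List[str]) -> List[int]:
    """Return candidate indices ordered by predicted relevance, best first."""
    inputs = tokenizer(
        [description] * len(abstracts),  # sentence A: the citing description
        abstracts,                       # sentence B: each candidate abstract
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    scores = logits[:, 1] if logits.size(-1) > 1 else logits.squeeze(-1)
    return scores.argsort(descending=True).tolist()

order = rerank(
    "The description paragraph that should cite a paper ...",
    ["Abstract of candidate paper A.", "Abstract of candidate paper B."],
)
print(order)
```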