This repository has been archived by the owner on Nov 20, 2022. It is now read-only.
Releases: neuml/cord19q
v1.3.0
Made the following changes to the process. The next step is to try to determine the level of evidence within a study.
ETL
- Process 2020-03-27 dataset
- Investigated cord_uid but found that it had duplicate articles with the same sha but different cord_uid. Will continue using current id strategy.
- Changed reference field to use url instead of doi. Now includes 3000+ more urls for documents that didn't have a doi.
- Filter COVID-19 resource centre boilerplate text out of full-text sections to prevent tagging older, non-relevant documents
- Add section name to sections table to help with determining level of evidence of a study
- Rebuild vectors
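The cord_uid finding in the list above can be illustrated with a small check. This is only a sketch, not the project's actual ETL code: it assumes metadata rows are available as dictionaries with "cord_uid" and "sha" keys, and the function name and sample values are hypothetical.

```python
from collections import defaultdict


def shas_with_multiple_ids(rows):
    """Return {sha: sorted cord_uids} for every sha linked to more than one cord_uid."""
    ids_by_sha = defaultdict(set)
    for row in rows:
        sha, cord_uid = row.get("sha"), row.get("cord_uid")
        # Skip rows that are missing either identifier
        if sha and cord_uid:
            ids_by_sha[sha].add(cord_uid)
    return {sha: sorted(ids) for sha, ids in ids_by_sha.items() if len(ids) > 1}


# Hypothetical sample rows, not real dataset values
rows = [
    {"cord_uid": "uid-a", "sha": "sha-1"},
    {"cord_uid": "uid-b", "sha": "sha-1"},
    {"cord_uid": "uid-c", "sha": "sha-2"},
    {"cord_uid": "uid-d", "sha": ""},
]
duplicates = shas_with_multiple_ids(rows)
print(duplicates)
```

A non-empty result means cord_uid alone would treat the same document as several distinct articles, which is consistent with the decision to keep the current id strategy.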
Reports
- Add parameter for number of article results in output
- Add journal column
- Modify report.py and add methods to read data from list and write markdown output to string.
- Escape | with escape sequence in report.py
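Escaping | matters because article results are rendered as markdown tables, where an unescaped | starts a new cell. The sketch below shows one way to do this while writing markdown output to a string. It is not the code in report.py: the function names are hypothetical, and the backslash escape is an assumption, since the notes do not say which escape sequence is used.

```python
def escape_cell(value):
    """Escape pipe characters so a value stays inside a single markdown table cell."""
    # Assumption: a backslash escape is used; the release notes do not name the sequence
    return str(value).replace("|", "\\|")


def to_markdown_table(headers, rows):
    """Build a markdown table from a list of rows and return it as a string."""
    lines = [
        "| " + " | ".join(escape_cell(h) for h in headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(escape_cell(cell) for cell in row) + " |")
    return "\n".join(lines)


# Hypothetical sample data
table = to_markdown_table(["Title", "Journal"], [["A | B study", "Example Journal"]])
print(table)
```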
Kaggle Notebook
- Remove task reports from main notebook and add notebook per task. Link to each task from main notebook.
- Add report query notebook to allow building a report on an adhoc query
v1.2.0
Made a couple of updates to the backing project which will propagate to the notebook.
- Modified report formatting to conform with this discussion. Article results are now shown as a table.
- Added linguistic rules to identify sentence fragments and questions. These are not used in the embeddings index.
- Modified highlighting logic to require uniqueness within each bullet. Previously, there was a lot of duplication.
- Added abstract field to word vectors and models. Previously, only full text was used.
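The uniqueness requirement for highlights can be sketched as an order-preserving de-duplication. This is an illustration only, not the project's highlighting logic: the function name is hypothetical, and comparing highlights after lowercasing and collapsing whitespace is an assumption about what counts as a duplicate.

```python
def unique_highlights(highlights):
    """Keep the first occurrence of each highlight within a bullet, preserving order."""
    seen = set()
    unique = []
    for text in highlights:
        # Assumption: highlights match if they are equal ignoring case and extra whitespace
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


# Hypothetical sample highlights
result = unique_highlights(["First finding.", "first  finding.", "Second finding."])
print(result)
```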
v1.1.0
Added notebook version of cord19q to Kaggle
v1.0.0
Initial release of cord19q project