This repository has been archived by the owner on Nov 20, 2022. It is now read-only.

Releases: neuml/cord19q

v1.3.0

27 May 21:43

Made the following changes to this process. Next, will move on to determining the level of evidence within a study.

ETL

  • Process the 2020-03-27 dataset
  • Investigated cord_uid but found duplicate articles with the same sha and different cord_uid values. Will continue using the current id strategy.
  • Changed the reference field to use url instead of doi. The field now includes 3,000+ more urls for documents that didn't have a doi.
  • Filter out full-text sections containing COVID-19 resource centre boilerplate text to prevent tagging older, non-relevant documents
  • Add section name to the sections table to help determine a study's level of evidence
  • Rebuild vectors
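Two of the ETL checks above can be sketched briefly: finding shas that appear under more than one cord_uid, and dropping full-text sections that contain the resource centre boilerplate. This is a minimal illustration, not the project's ETL code; the `(sha, cord_uid)` row shape, the `(name, text)` section shape, the helper names and the `BOILERPLATE` marker string are all hypothetical.

```python
from collections import defaultdict

# Hypothetical marker; the exact boilerplate text is not given in these notes.
BOILERPLATE = "covid-19 resource centre"

def duplicate_shas(rows):
    """Map each sha that appears under more than one cord_uid to those ids.

    rows: iterable of (sha, cord_uid) pairs (hypothetical input shape).
    """
    uids = defaultdict(set)
    for sha, uid in rows:
        if sha:  # skip rows with an empty sha
            uids[sha].add(uid)
    return {sha: sorted(ids) for sha, ids in uids.items() if len(ids) > 1}

def filter_sections(sections):
    """Drop (name, text) sections whose text contains the boilerplate marker."""
    return [(name, text) for name, text in sections
            if BOILERPLATE not in text.lower()]

rows = [("a1", "x1"), ("a1", "x2"), ("b2", "y1"), ("", "z1")]
print(duplicate_shas(rows))  # {'a1': ['x1', 'x2']}

sections = [
    ("Introduction", "We study viral shedding in adults."),
    ("Note", "This article is part of the COVID-19 Resource Centre."),
]
print(filter_sections(sections))  # [('Introduction', 'We study viral shedding in adults.')]
```

Keeping the section name alongside the text is what lets later steps reason about where a sentence came from, which is the stated motivation for the new column.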

Reports

  • Add parameter for number of article results in output
  • Add journal column
  • Modify report.py to add methods that read data from a list and write markdown output to a string.
  • Escape | with an escape sequence in report.py
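The report changes above (a result-count parameter, reading rows from a list, writing markdown to a string, and escaping `|`) can be sketched as follows. The function names and the `limit` parameter are hypothetical stand-ins; the actual methods in report.py are not shown in these notes.

```python
from io import StringIO

def escape(value):
    """Escape | so it is not read as a markdown table column separator."""
    return str(value).replace("|", "\\|")

def to_markdown(headers, rows, limit=None):
    """Render up to `limit` rows from a list as a markdown table string."""
    output = StringIO()
    output.write("| " + " | ".join(escape(h) for h in headers) + " |\n")
    output.write("|" + "---|" * len(headers) + "\n")
    for row in rows[:limit]:  # rows[:None] keeps every row
        output.write("| " + " | ".join(escape(v) for v in row) + " |\n")
    return output.getvalue()

table = to_markdown(["Title", "Journal"], [["A|B study", "Lancet"]])
print(table)
```

Writing to a `StringIO` rather than directly to a file means the same output can be saved to disk or displayed inline in a notebook.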

Kaggle Notebook

  • Remove task reports from the main notebook and add a notebook per task, each linked from the main notebook.
  • Add a report query notebook to allow building a report on an ad hoc query

v1.2.0

27 May 21:41

Made a couple of updates to the backing project which will propagate to the notebook.

  • Modified report formatting to conform with this discussion. Article results are now shown as a table.
  • Added linguistic rules to identify sentence fragments and questions. These are not used in the embeddings index.
  • Modified highlighting logic to require uniqueness within each bullet. Previously, there was a lot of duplication.
  • Added abstract field to word vectors and models. Previously, only full text was used.
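The rule and highlighting changes above can be illustrated with a small sketch. The question and fragment heuristics here are simple hypothetical stand-ins, not the project's linguistic rules. Using them as a highlight filter is also only for illustration; the notes say only that these rules are not used in the embeddings index. The deduplication shows one way to enforce uniqueness within a bullet, by comparing whitespace- and case-normalized text.

```python
def is_question(sentence):
    """Hypothetical rule: a sentence ending in '?' is a question."""
    return sentence.rstrip().endswith("?")

def is_fragment(sentence, min_tokens=5):
    """Hypothetical rule: too few tokens or no leading capital letter."""
    text = sentence.strip()
    return len(text.split()) < min_tokens or not text[:1].isupper()

def unique_highlights(sentences, limit=3):
    """Keep up to `limit` ranked sentences, skipping duplicates, questions and fragments."""
    seen, results = set(), []
    for sentence in sentences:
        key = " ".join(sentence.lower().split())  # normalize case and whitespace
        if key in seen or is_question(sentence) or is_fragment(sentence):
            continue
        seen.add(key)
        results.append(sentence)
        if len(results) == limit:
            break
    return results

candidates = [
    "Masks reduce transmission in household settings.",
    "masks reduce  transmission in household settings.",
    "Do masks reduce transmission?",
    "in household settings",
    "Hand washing lowers infection risk among health workers.",
]
print(unique_highlights(candidates))
```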

v1.1.0

27 May 21:41

Added notebook version of cord19q to Kaggle

v1.0.0

27 May 21:39

Initial release of cord19q project