miniproject: viral epidemics and disease

What diseases co-occur with viral epidemics?

owner:

Priya

collaborators:

Dheeraj Kumar

Aishwarya

miniproject summary

If anything in this section is unclear, please read the INITIAL SUMMARY section first.

proposed activities

Using the communal corpus epidemic50noCov, which consists of 50 articles. CREATED
Scrutinizing the 50 articles to identify true positives and false positives, i.e. whether each article is about a viral epidemic or not. FINISHED
Using ami search to find whether the articles mention any comorbidity in a viral epidemic, annotating them with dictionaries to create ami DataTables. FINISHED
Sectioning the articles using ami:section to extract the relevant information on comorbidity. FINISHED
Refining and rerunning the query to get a corpus of 950 articles. CREATED
Scrutinizing the 950 articles for true positives and false positives and creating a spreadsheet. PROGRESSING
Using ami search to create DataTables, and ami section to section the 950 articles. FINISHED
Using a suitable ML technique to classify whether the articles are about a viral epidemic and which diseases/disorders co-occur. PROGRESSING
Creating a dashboard of knowledge, especially with an annotated map. NOT STARTED

outcomes

A spreadsheet of the comorbidities found during viral epidemics, with their counts, will be developed:

for 50 articles in epidemic50noCov. FINISHED
for 950 articles in disease corpus. PROGRESSING

Development of an ML model for data classification, evaluated on accuracy. PROGRESSING
Annotated map with the obtained data. NOT STARTED
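As a minimal sketch of how the comorbidity counts for the spreadsheet could be tallied, the Python below counts how many articles mention each disease and writes the result as CSV. It assumes the per-article disease hits have already been extracted (for example from the ami DataTables); the input layout, function names and column names are hypothetical, and the sample data is toy input, not project results.

```python
import csv
import io
from collections import Counter


def count_comorbidities(article_hits):
    """Count how many articles mention each disease at least once.

    article_hits maps an article ID (hypothetical, e.g. a CTree name) to the
    list of disease terms found in that article.
    """
    counts = Counter()
    for terms in article_hits.values():
        # set() so a disease repeated within one article is counted only once
        counts.update(set(terms))
    return counts


def write_spreadsheet(counts, out):
    """Write (disease, article_count) rows as CSV, most frequent first."""
    writer = csv.writer(out)
    writer.writerow(["disease", "article_count"])
    for disease, n in counts.most_common():
        writer.writerow([disease, n])


if __name__ == "__main__":
    # Toy input for illustration only, not results from the corpus
    hits = {
        "PMC_A": ["pneumonia", "diabetes", "pneumonia"],
        "PMC_B": ["diabetes"],
        "PMC_C": ["hypertension", "diabetes"],
    }
    buf = io.StringIO()
    write_spreadsheet(count_comorbidities(hits), buf)
    print(buf.getvalue())
```

Counting articles rather than raw mentions stops one article that repeats a term many times from dominating the table; replace `set(terms)` with `terms` if mention counts are wanted instead.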

corpora `CREATED`

English

Initially, the communal corpus epidemic50noCov will be used as a small test corpus before moving to the large disease corpus.
Later, the disease corpus of 950 articles will be used. It was created with:

getpapers -q "viral epidemics AND human NOT COVID NOT corona virus NOT SARS-Cov-2" -o disease -f disease/log.txt -k 950 -x -p
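The same getpapers call can also be launched from Python, which is convenient inside a Jupyter Notebook. This sketch only rebuilds the command above as an argument list; the flag comments are our reading of getpapers' options (check `getpapers --help`), and the helper name is hypothetical. Passing a list to `subprocess.run` sends the whole query as one argument, so no shell quoting is needed.

```python
import shlex
import shutil
import subprocess

QUERY = "viral epidemics AND human NOT COVID NOT corona virus NOT SARS-Cov-2"


def getpapers_command(query, outdir, limit):
    """Rebuild the getpapers call used for the disease corpus (hypothetical helper)."""
    return [
        "getpapers",
        "-q", query,                # EPMC search query
        "-o", outdir,               # output directory, i.e. the CProject
        "-f", f"{outdir}/log.txt",  # log file
        "-k", str(limit),           # maximum number of hits to download
        "-x",                       # full-text XML
        "-p",                       # PDFs
    ]


if __name__ == "__main__":
    cmd = getpapers_command(QUERY, "disease", 950)
    if shutil.which("getpapers"):
        subprocess.run(cmd, check=True)
    else:
        print("getpapers not found on PATH; the command would be:")
        print(shlex.join(cmd))
```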

Spanish

To test the Spanish disease dictionary (as a step towards creating dictionaries in further languages), a Spanish corpus was created from Redalyc.

dictionaries

Disease [Details]
Valid Disease Dictionary [Details]

software

getpapers to create the corpus of 950 articles by downloading from EPMC.
AMI for creating DataTables, creating and using dictionaries, sectioning.
SPARQL for creating dictionaries.
Jupyter Notebook [Python] for binary classification & display.

constraints

Respective pages

for the 50-article corpus epidemic50noCov - https://github.com/petermr/openVirus/tree/master/miniproject/epidemic50noCov
for the 950-article corpus disease - https://github.com/petermr/openVirus/tree/master/miniproject/disease
for getpapers - https://github.com/petermr/openVirus/wiki/getpapers#tester-2
for installing ami - https://github.com/petermr/ami3/wiki/ami-installation
for updating ami - https://github.com/petermr/openVirus/wiki/Tools:-ami3#updating-ami3
for amidict/dictionary validation - https://github.com/petermr/openVirus/wiki/Tools:-ami3#amidict-validation
for ami search - https://github.com/petermr/openVirus/wiki/ami-search
for ami section - https://github.com/petermr/openVirus/wiki/ami:section
for SPARQL - https://github.com/petermr/openVirus/wiki/Tools-:-SPARQL
for the ML technique, Jupyter Notebook is used - https://github.com/petermr/openVirus/wiki/Jupyter-Notebooks#data-preparation-for-ml
for the Spanish corpus - https://github.com/petermr/openVirus/tree/master/miniproject/disease/spanish

Initial Summary

(by collaborator Dheeraj)

The aim of the mini-project

Our underlying aim is that once diseases are recognized, medicines can be given for them. In this mini-project, we find diseases mentioned in open-access articles about viral epidemics, using the disease dictionary and ContentMine software (getpapers and ami).

Resources

Dictionary

The disease dictionary stores disease names and is used to search articles for mentions of particular diseases, much as an ordinary dictionary is a store of words.
Its sources are ICD-10 (by WHO) and Wikidata, and it was created using ami.
It is a multilingual dictionary (English, Hindi, Tamil, Kannada, Spanish, Portuguese).
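To illustrate what dictionary-based searching does, here is a minimal Python sketch that loads terms from a dictionary XML file and reports which of them occur in a piece of text. It assumes an ami-style layout (a `<dictionary>` root with `<entry term="...">` children); check the real disease.xml before relying on that. The whole-word, case-insensitive regex match is a simplification chosen for illustration, not ami's own matching algorithm, and the sample entries are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed dictionary layout: <dictionary> root holding <entry term="..."> elements
SAMPLE_DICT = """<dictionary title="disease">
  <entry term="pneumonia" name="pneumonia"/>
  <entry term="diabetes mellitus" name="diabetes mellitus"/>
  <entry term="hypertension" name="hypertension"/>
</dictionary>"""


def load_terms(xml_text):
    """Return the 'term' attribute of every <entry> in the dictionary."""
    root = ET.fromstring(xml_text)
    return [e.get("term") for e in root.iter("entry") if e.get("term")]


def find_terms(text, terms):
    """Return the dictionary terms that occur in text as whole words, ignoring case."""
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


if __name__ == "__main__":
    terms = load_terms(SAMPLE_DICT)
    text = "Patients with Diabetes Mellitus had more severe pneumonia."
    print(find_terms(text, terms))  # ['pneumonia', 'diabetes mellitus']
```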

Corpus 950 (disease)

This is a collection of articles about viral epidemics and diseases; the disease information they contain is what we aim to extract and simplify.
It consists of 950 articles downloaded from EPMC via getpapers.

EPMC

EPMC (Europe PMC) is a database of life-sciences research articles, including those in PubMed Central. For our mini-project we are analyzing some of its open-access articles, downloaded using getpapers.

Tools

getpapers

getpapers is a ContentMine tool that can download large numbers of articles from EPMC.
See https://github.com/petermr/openVirus/wiki/getpapers#use-of-getpapers for usage.

ami

ami is also a ContentMine tool. It is used to create dictionaries, to search articles for the disease terms listed in a dictionary, and to section downloaded articles and extract information from them.
For example, we used it to create the disease dictionary.

Wikidata `SPARQL`

The Wikidata Query Service lets you run SPARQL queries against Wikidata, the Wikimedia Foundation's structured knowledge base.
In this mini-project we needed the ICD-10 codes for diseases and wanted the results in different languages.
Our initial query returned results in four languages.
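A query of this kind can be sketched in Python as below: it builds a SPARQL query for Wikidata items that carry an ICD-10 code (property P494) together with their labels in chosen languages, and the URL for the public Wikidata Query Service endpoint. This is an assumed reconstruction, not the query the project actually used; the function names and the User-Agent string are hypothetical, and large result sets may be slow, hence the LIMIT.

```python
import json
import urllib.request
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def icd10_label_query(languages, limit=100):
    """SPARQL for items with an ICD-10 code (P494) and their labels in `languages`."""
    langs = ", ".join(f'"{code}"' for code in languages)
    return f"""SELECT ?item ?icd10 ?label WHERE {{
  ?item wdt:P494 ?icd10 .
  ?item rdfs:label ?label .
  FILTER(LANG(?label) IN ({langs}))
}}
LIMIT {limit}"""


def query_url(query):
    # format=json asks the endpoint for SPARQL JSON results
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def run_query(query):
    """Send the query over the network and return the result rows."""
    request = urllib.request.Request(
        query_url(query),
        headers={"User-Agent": "openVirus-disease-sketch/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # English, Hindi, Tamil, Kannada, Spanish, Portuguese
    print(icd10_label_query(["en", "hi", "ta", "kn", "es", "pt"]))
```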

Work done

I have read about getpapers and EPMC, including EPMC's advanced search, and have read articles from EPMC.
I studied Wikidata and learned how to update the dictionary.
I also updated the dictionary with ICD-10 codes using the Wikidata Query Service.
So far, I have manually classified some articles as true or false positives.
I created a SPARQL query for the multilingual (six-language) disease dictionary.

My goal

As stated above, once diseases are known, medicines can be given accordingly. Our main goal is therefore to find the names of diseases that co-occur during viral epidemics and to act on those findings.
Next, all the articles have to be manually classified as true positives or false positives.
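Once articles are labelled, the share of true positives among everything the query retrieved is the query's precision, TP / (TP + FP). A minimal Python sketch, assuming the manual labels are kept in a CSV with a hypothetical `relevant` column holding yes/no (blank for articles not yet reviewed); the sample rows are toy data.

```python
import csv
import io

POSITIVE = {"yes", "true", "tp"}
NEGATIVE = {"no", "false", "fp"}


def precision_from_csv(f, column="relevant"):
    """Precision = TP / (TP + FP) over the rows that have been labelled."""
    tp = fp = 0
    for row in csv.DictReader(f):
        value = (row.get(column) or "").strip().lower()
        if value in POSITIVE:
            tp += 1
        elif value in NEGATIVE:
            fp += 1
        # any other value (e.g. blank = not yet reviewed) is skipped
    if tp + fp == 0:
        raise ValueError("no labelled rows")
    return tp / (tp + fp)


if __name__ == "__main__":
    # Toy labels for illustration only
    sheet = io.StringIO("article,relevant\nPMC_A,yes\nPMC_B,no\nPMC_C,yes\nPMC_D,\n")
    print(precision_from_csv(sheet))
```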

Challenges

Learning to write Python code in Jupyter Notebook for binary classification.
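Binary classification here means labelling each article as about a viral epidemic (True) or not (False). The project's notebook is not reproduced on this page; as a self-contained illustration of one common technique, the sketch below implements multinomial Naive Bayes with add-one smoothing in plain Python. The training texts are toy examples, not corpus data; in practice the manually classified true/false positives would serve as training labels, and a library such as scikit-learn would normally be used instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        """Return the class with the highest log-posterior score."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[c][word]
                score += math.log((count + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


if __name__ == "__main__":
    # Toy training data: True = about a viral epidemic, False = not
    texts = [
        "influenza epidemic outbreak in humans",
        "dengue virus epidemic and comorbidity",
        "crop yield and soil nitrogen",
        "bacterial contamination of food packaging",
    ]
    labels = [True, True, False, False]
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("an influenza virus outbreak"))
```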

Issue Rectification

Splitting 950 corpus for `ami search`

The 950-article corpus was too large for a single run: ami search raised an OutOfMemoryError.
The disease corpus (Cproject) was therefore split into 4 parts of 200-250 Ctrees each.
ami search then ran successfully on each part and created the DataTables.
Test details are at https://github.com/petermr/openVirus/wiki/ami-search#running-ami-search-in-disease-dictionary
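The split can be scripted. The Python sketch below copies the CTrees of one CProject into a chosen number of new CProjects of near-equal size (950 CTrees into 4 parts gives 238, 238, 238 and 236). It assumes every subdirectory is a CTree except folders whose names start with "_" (such as ami's own output) or "."; the `_partN` naming is hypothetical. Copying rather than moving leaves the original corpus untouched.

```python
import math
import shutil
import tempfile
from pathlib import Path


def split_cproject(cproject, n_parts):
    """Copy the CTrees of `cproject` into up to `n_parts` new sibling CProjects."""
    cproject = Path(cproject)
    # Assumption: every subdirectory is a CTree, except ami output folders
    # (names starting with "_") and hidden folders (names starting with ".")
    ctrees = sorted(
        p for p in cproject.iterdir()
        if p.is_dir() and not p.name.startswith(("_", "."))
    )
    size = math.ceil(len(ctrees) / n_parts)
    parts = []
    for i in range(n_parts):
        chunk = ctrees[i * size:(i + 1) * size]
        if not chunk:
            break
        part = cproject.parent / f"{cproject.name}_part{i + 1}"  # hypothetical naming
        part.mkdir()  # fails if the part already exists, to avoid mixing runs
        for ctree in chunk:
            shutil.copytree(ctree, part / ctree.name)
        parts.append(part)
    return parts


if __name__ == "__main__":
    # Demo on a throwaway directory with 10 fake CTrees
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "disease"
        for k in range(10):
            (project / f"PMC{k}").mkdir(parents=True)
            (project / f"PMC{k}" / "fulltext.xml").write_text("<article/>")
        for part in split_cproject(project, 4):
            print(part.name, len(list(part.iterdir())))
```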

`_cooccurence` folder

Initially, on Windows, ami search created an empty _cooccurence folder.
After debugging, ami was updated, and the _cooccurence folder then contained the expected results.
This fixed the error.

Update

Uploading corpus to GitHub

(Based on Ambreen's update)

Download VS Code and clone the openVirus repository to your system.
Open the openVirus folder in VS Code (keep it open).
Now open your local openVirus folder and make your changes in it.
Switch back to VS Code and commit the changes by selecting the commit symbol. This may take a while, depending on the size of the files being uploaded.
After adding the remote repository, push the changes to GitHub. See this video for further clarification.

NOTE: If you had already cloned the repository earlier, first pull the latest changes and then push yours.

Using Valid dictionary for `ami search`

The ami search runs above used the built-in disease dictionary.
To use the Valid Disease Dictionary instead, give the full path to it in the command, as follows:

ami -p <Cproject> search --dictionary openVirus/cambiohack2020/dictionaries/disease.xml

NOTE: Replace <Cproject> with the name of your Cproject, i.e. the directory that contains the Ctrees.
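Because the corpus had to be split into parts, the Valid Disease Dictionary search is run once per part. A small Python sketch that builds the command above for each part and runs it unless `dry_run` is set or `ami` is not on the PATH; the part names are hypothetical, and the dictionary path is the one given above, relative to the directory that holds your openVirus clone.

```python
import shlex
import shutil
import subprocess

DICTIONARY = "openVirus/cambiohack2020/dictionaries/disease.xml"


def ami_search_command(cproject, dictionary=DICTIONARY):
    """ami -p <Cproject> search --dictionary <path>, as a list of arguments."""
    return ["ami", "-p", str(cproject), "search", "--dictionary", dictionary]


def search_parts(parts, dictionary=DICTIONARY, dry_run=True):
    """Run (or just print) one ami search per CProject part; return the commands."""
    commands = [ami_search_command(part, dictionary) for part in parts]
    for cmd in commands:
        if dry_run or shutil.which("ami") is None:
            print(shlex.join(cmd))
        else:
            # Each part gets its own ami process, so memory use stays per-part
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical part names, e.g. produced by splitting the disease CProject
    search_parts(["disease_part1", "disease_part2", "disease_part3", "disease_part4"])
```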

Spanish corpus RESULTS

Running ami search with the newly created Spanish dictionary gave the results [here].
