miniproject: viral epidemics and organization

What organizations fund research on viral epidemics?

Owner:

Vaishali Arora, Shweata N. Hegde

Collaborators:

Simranleen Singh

Mini project summary:

The scientific objective is to find out which organizations are most active in funding viral epidemic research, and to retrieve useful information about them from the scientific literature.

Methodology: 📌

Use the communal corpus Epidemic50noCov of 50 articles. 🟩DONE
Subject the articles to binary classification based on several parameters: related to viral epidemics or not, funders named or not, and so on. 🟩DONE
Rerun the query to get a corpus of 950 articles. 🟩DONE
Section the papers to isolate the Acknowledgements or Funding section, as it is the part of a scientific paper where funders are most likely to be named. 🟩DONE
Create the dictionary Funder using ami and the SPARQL/Wikidata Query Service. 🟩DONE
Use machine learning tools for entity extraction, so that we can look for specific phrases, words and regular expressions in the papers. 🟪NOT STARTED
Analyse the resulting spreadsheets to find which funders are the most active. 🟪NOT STARTED
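
The last step above (finding the most active funders) has not been started. As a rough sketch of what it could look like, the snippet below ranks funders by the number of papers that name them. The input structure, the paper IDs and the function name `rank_funders` are assumptions made for illustration; they are not output of ami.

```python
from collections import Counter


def rank_funders(funders_per_paper):
    """Count in how many papers each funder is named and rank them."""
    counts = Counter()
    for funders in funders_per_paper.values():
        # Use a set so a funder named twice in one paper counts once.
        counts.update(set(funders))
    # Sort by count (descending), then by name, for a stable ranking.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical input: paper ID -> funder names found in its funding section.
sample = {
    "PMC0000001": ["Wellcome Trust", "National Institutes of Health"],
    "PMC0000002": ["National Institutes of Health"],
    "PMC0000003": ["National Institutes of Health", "Wellcome Trust", "Wellcome Trust"],
}
ranking = rank_funders(sample)
print(ranking)
```

Counting papers rather than raw mentions keeps one long acknowledgement section from dominating the ranking.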

Corpora: 📂

Initial Communal Corpus named Epidemic50noCov of 50 articles (https://github.com/petermr/openVirus/tree/master/miniproject/epidemic50noCov)
Next, a new corpus of 950 articles using the Dictionary funders.
The corpus of 950 articles was downloaded using getpapers with the command:

getpapers -q "Funders in Viral epidemics" -x -k 950 -o mycorpus
Corpus 950 is now available at : https://github.com/petermr/openVirus/tree/master/miniproject/funder

❓ How I committed corpus 950:

See the section "Releasing the corpus 950 using Github desktop" below.

Dictionaries:

Dictionary funder : https://github.com/petermr/openVirus/blob/master/dictionaries/test/funder.xml
How I created the dictionary, which source I used, and an overview of the dictionary:

https://github.com/petermr/openVirus/blob/master/dictionaries/funders/funders_dictionary.md

Dictionary update: 🆕

Updated on: September 18, 2020
Source: Crossref
Number of entries: ~17k
Method: SPARQL/Wikidata Query Service
Attributes: term, name, description, wikidataID, wikidataURL, wikipediaURL, crossrefID, country, synonyms
SPARQL query used:

#Funders
 SELECT DISTINCT ?Funder ?FunderLabel ?FunderDescription ?FunderAltLabel ?Country ?CountryLabel ?instanceofLabel ?crossrefid ?wikipedia WHERE {
   ?Funder wdt:P3153 ?crossrefid;
     wdt:P31 ?instanceof;
     wdt:P17 ?Country.
   OPTIONAL { ?wikipedia schema:about ?Funder; schema:isPartOf <https://en.wikipedia.org/> }
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
 }
 LIMIT 20000000
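
A query like the one above can also be sent to the Wikidata endpoint from a script. The following is a minimal sketch using only the Python standard library. It asks for JSON results (the files in this project use the XML result format instead), and the function names, the User-Agent string and the sample entity `Q0` are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query_url(query):
    """Return a GET URL asking the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return WIKIDATA_ENDPOINT + "?" + params


def parse_bindings(result):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    rows = []
    for binding in result["results"]["bindings"]:
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


def run_query(query):
    """Send the query (needs network access) and return the parsed rows."""
    request = urllib.request.Request(
        build_query_url(query),
        headers={"User-Agent": "funder-miniproject-sketch/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))


# Offline example with a hand-written response (Q0 is a made-up entity).
sample = {
    "results": {
        "bindings": [
            {
                "Funder": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0"},
                "FunderLabel": {"type": "literal", "value": "Example Funder"},
            }
        ]
    }
}
rows = parse_bindings(sample)
print(rows)
```

Calling `run_query` with the full funder query would return one dict per result row; with roughly 17k funders it may take a while or hit the service's time limit.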

SPARQL output: https://github.com/petermr/openVirus/blob/master/dictionaries/funders/sparql2ami/funder.sparql.xml
SPARQL output refined using ami SPARQL mapping:

Syntax used:

amidict -vv --dictionary funders --directory mydictionaries --input funder.sparql.xml create --informat wikisparqlxml --sparqlmap wikidataURL=Funder,term=FunderLabel,name=FunderLabel,country=CountryLabel,crossrefid=crossrefid,description=FunderDescription,wikipediaURL=wikipedia,wikidataURL=Funder --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*)) --synonyms=wikidataAltLabel

Final dictionary: https://github.com/petermr/openVirus/blob/master/dictionaries/funders/sparql2ami/funder.xml
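
The `--transformName wikidataID=EXTRACT(wikidataURL,.*/(.*))` option in the command above derives the Wikidata ID from the entity URL. The sketch below shows what that regular expression captures, assuming the pattern is applied to the whole URL and the first capture group is kept; this is an assumption about ami's behaviour, not a description of its implementation.

```python
import re

# Same pattern as in the --transformName option: a greedy ".*/" followed by
# a capture group holding whatever comes after the last slash.
WIKIDATA_ID_PATTERN = re.compile(r".*/(.*)")


def extract_wikidata_id(wikidata_url):
    """Return the part of the URL after the last slash, or None if there is no slash."""
    match = WIKIDATA_ID_PATTERN.fullmatch(wikidata_url)
    return match.group(1) if match else None


print(extract_wikidata_id("http://www.wikidata.org/entity/Q42"))
```

Because `.*/` is greedy, it consumes everything up to the last slash, so only the trailing `Q…` identifier is left for the capture group.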

Refined sparql query

Click here to open the refined query in the Wikidata SPARQL Query Service.

Updated syntax to create the dictionary

amidict -vv --dictionary organization --directory _sparlendpoint  --input sparql_organization.xml create --informat wikisparqlxml --sparqlmap wikidataURL=Organization,term=OrganizationLabel,name=OrganizationLabel,country=CountryLabel,crossrefIDs=crossrefIds,description=description,wikidataURL=Organization --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*))

Dictionary validation : ✅

amidict --dictionary C:\Users\myPC\mydictionaries\funders(1).xml -v display --fields --validate

Generic values (DictionaryDisplayTool)
================================
--testString        : d      null
--wikilinks         : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@1ae7dc0
--fields            : m        []
--files             : d        []
--maxEntries        : d         3
--remote            : d [https://github.com/petermr/dictionary]
--validate          : m      true
--help              : d     false
--version           : d     false
--dictionary        : d [C:\Users\myPC\mydictionaries\funders(1).xml]
--directory         : d      null

Specific values (DictionaryDisplayTool)
================================
list all fields
dictionaries from C:\Users\myPC\ContentMine\dictionaries

❓ Result: I checked the dictionaries folder given in the path above and it was empty. Should I do something else, or is this how the software is meant to behave?
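
Independently of amidict, a few basic checks on a dictionary file can be scripted. The sketch below assumes a layout with a `<dictionary title="...">` root and `<entry term="...">` children; it is not the ami schema, and passing these checks does not mean the file is valid for ami. The function name and the sample entries are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_dictionary(xml_text):
    """Return a list of problems found; an empty list means none were detected."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return [f"not well-formed XML: {error}"]
    problems = []
    if root.tag != "dictionary":
        problems.append(f"root element is <{root.tag}>, expected <dictionary>")
    if not root.get("title"):
        problems.append("root element has no title attribute")
    for index, entry in enumerate(root.iter("entry"), start=1):
        if not entry.get("term"):
            problems.append(f"entry {index} has no term attribute")
    return problems


# Hypothetical dictionary with one good entry and one entry missing its term.
sample = """<dictionary title="funders">
  <entry term="Example Funder" name="Example Funder" wikidataID="Q0"/>
  <entry name="Entry without a term"/>
</dictionary>"""
problems = check_dictionary(sample)
print(problems)
```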

Tools & Softwares: 🛠

1. ami for the creation of dictionary, and sectioning : 🟩DONE

To download my corpus of 950 articles in XML format into the directory miniproject:

Open the Command Prompt and run:

    getpapers -q "Funders in viral epidemic research" -o miniproject -f mycorpus/log.txt -k 950

To divide the CProject into sections, open the Command Prompt again and run:
```
ami -p miniproject section
```
This creates a sections subfolder in the folder of each paper in the directory.
Open the sections folder to find subfolders such as Front, Body, Back, etc.
This completes the sectioning of my CProject.
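
Once the sections exist, the acknowledgement and funding files can be collected with a short script. This is a sketch under assumptions: it supposes each paper folder has a `sections` subfolder containing XML files, and that the relevant files have "ack" or "fund" in their names. The folder and file names in the demo are made up, so check them against a real sectioned paper first.

```python
import tempfile
from pathlib import Path

KEYWORDS = ("ack", "fund")  # assumed fragments of section file names


def find_funding_sections(cproject):
    """Map each paper folder name to its section files whose names match KEYWORDS."""
    hits = {}
    for paper_dir in sorted(Path(cproject).iterdir()):
        sections_dir = paper_dir / "sections"
        if not sections_dir.is_dir():
            continue  # skips plain files and papers that were not sectioned
        matches = [
            path.relative_to(paper_dir).as_posix()
            for path in sorted(sections_dir.rglob("*.xml"))
            if any(keyword in path.name.lower() for keyword in KEYWORDS)
        ]
        if matches:
            hits[paper_dir.name] = matches
    return hits


# Demo on a throwaway folder that imitates one sectioned paper.
with tempfile.TemporaryDirectory() as tmp:
    back = Path(tmp) / "PMC0000001" / "sections" / "2_back"
    back.mkdir(parents=True)
    (back / "0_ack.xml").write_text("<ack/>")
    (back / "1_ref-list.xml").write_text("<ref-list/>")
    demo_hits = find_funding_sections(tmp)
print(demo_hits)
```

On the real corpus the call would be `find_funding_sections("miniproject")`.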

2. ami searching, full.data.tables (https://github.com/petermr/openVirus/blob/master/miniproject/funder/full.dataTables.html) and _cooccurrence created for the dictionary funder (https://github.com/petermr/openVirus/tree/master/miniproject/funder/_cooccurrence) 🟩DONE

3. Jupyter Notebook for machine learning and data mining. 🟨STARTED

4. Later, R for analysis and to display the results graphically. 🟪NOT STARTED

Releasing the corpus 950 using Github desktop : 🟩`DONE`

Installed Github desktop from : https://desktop.github.com
Cloned the repository openVirus into my system using Gitbash command line : git clone https://github.com/petermr/openVirus.git
Open the folder where you want to upload your CProject.
Paste your project into the folder of the openVirus repository (the local clone of our remote repository) where you want to commit the files.
Open the Github desktop.
Go to 'File', then 'Add Local Repository'.
Now, choose the openVirus repository from your system.
Add a commit message and go to 'Commit to master'.
After committing, go to 'Push to origin'.
After completion of pushing the repository, your uploaded files can be viewed on the Github repository.

💡 Tip: Committing the corpus in parts of five will make the uploading easy.

Updating ami : 🟩`DONE`

Open command prompt and type :

 `cd ami3`
 `git pull`
 `mvn clean install -Dmaven.test.skip=true `

Wait for the command to finish.
A BUILD SUCCESS message appears in the command prompt when the build has succeeded.

💡 Tip: If you get BUILD FAILURE, close any other Command Prompt window that is open on your system.

Blockers: 🚫

The latest dictionary created at https://github.com/petermr/openVirus/blob/master/dictionaries/funders/sparql2ami/funder.xml is not valid as per the Schema.
Issue of ami search documented here: https://github.com/petermr/openVirus/issues/85

Software usage: 🔗

Core software:

Node
getpapers
Java JDK
Maven
ami

Optional software:

KNIME
R graphics
Jupyter Notebook
Github desktop

NOT STARTED:🟪

Binary classification of Corpus 950 into True and False positives using different libraries in Python.

STARTED:🟨

Working on the usage of Jupyter Notebook by looking into tutorials on the internet
Maintaining the dictionary FUNDERS so that merging of the dictionaries could become easier

BLOCKED:🟥

The latest dictionary created at https://github.com/petermr/openVirus/blob/master/dictionaries/funders/sparql2ami/funder.xml is not valid as per the Schema.
Issue of ami search documented here: https://github.com/petermr/openVirus/issues/85

FINISHED:🟩

Creating the corpus 950
Ami search on the corpus 950
Sectioning the corpus 950
Creating the ami dictionary funder
Creating the SPARQL dictionary funder
Manual binary classification of corpus 50 "Epidemic50noCov"
Corpus 950 released
Dictionary funder released
Dictionary validation using ami
Classifying first 50 papers from corpus 950 into True and False positives
Smoke test on Jupyter Notebook
Jupyter Notebook to create dictionary from a text file of funders
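
The last item above (creating a dictionary from a text file of funders) can be sketched roughly as follows. This is not the notebook itself. It assumes one funder name per line and writes a simple `<dictionary>`/`<entry>` layout with only `term` and `name` attributes; the function name and the sample names are chosen for illustration.

```python
import xml.etree.ElementTree as ET


def funders_to_dictionary(lines, title="funders"):
    """Build dictionary XML with one <entry> per distinct funder name."""
    root = ET.Element("dictionary", {"title": title})
    seen = set()
    for line in lines:
        name = line.strip()
        # Skip blank lines and case-insensitive duplicates.
        if not name or name.lower() in seen:
            continue
        seen.add(name.lower())
        ET.SubElement(root, "entry", {"term": name, "name": name})
    return ET.tostring(root, encoding="unicode")


# Lines as they might be read from a text file of funders.
xml_text = funders_to_dictionary(
    ["Wellcome Trust\n", "\n", "wellcome trust\n", "National Institutes of Health\n"]
)
print(xml_text)
```

With a real file, the lines could be passed in directly, for example `funders_to_dictionary(open("funders.txt", encoding="utf-8"))`, where `funders.txt` is a hypothetical file name.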

Summary:----

Submitted by-

Simranleen Singh

Introduction:

In this project we collect useful data from reliable, globally accessible websites and tabulate it so that it is clear to everyone who visits the page.
My miniproject is on viral epidemics and funders, so it deals with funders from all over the world that fund viral epidemic research.

Preliminary work:

My first task is to download the software that gives me easy access to the data I am looking for, so that I can download it and sort it according to whether it is useful to me or not.
First I had to install Node, which is required for installing the other software.
One of these is getpapers, installed using the link and information provided by my mentor. Reference: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md

Installation of getpapers:

getpapers is necessary for this project because we have to download many papers on our subject in one go, and getpapers does that for us.

Blockers:

Installing getpapers.

Current work

Currently I am maintaining the dictionary of funders manually until my issue is solved.
