miniproject: viral epidemics and organization
Vaishali Arora, Shweata N. Hegde
Simranleen Singh
- The scientific objective is to find out which organizations are most active in viral epidemic research.
- To retrieve valuable information about them from the scientific literature.
- Using the communal corpus `Epidemic50noCov` of 50 articles. 🟩DONE
- Subjecting them to binary classification on various parameters: related to viral epidemics or not, funders named or not, and so on. 🟩DONE
- Rerunning the query to get a corpus of 950 articles. 🟩DONE
- Working on sectioning to isolate the Acknowledgements or Funding section of each paper, as it is the part of a scientific paper where funders are most likely to be named. 🟩DONE
- Creating the dictionary Funder using ami and the SPARQL/Wikidata Query Service. 🟩DONE
- Using machine learning tools for entity extraction, so that we can look for specific phrases, words and regular expressions in those scientific papers. 🟪NOT STARTED
- Subjecting the spreadsheets to analysis to find which funders are the most active. 🟪NOT STARTED
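The last two steps above are not started yet. As a rough illustration of what they might involve, the sketch below matches a short list of funder names against acknowledgement text with a regular expression and counts how many texts mention each funder. The sample sentences and the function name `count_funder_mentions` are made up for illustration; nothing here comes from the corpus or from ami.

```python
import re
from collections import Counter


def count_funder_mentions(texts, funder_terms):
    """Count, for each funder term, the number of texts that mention it."""
    if not funder_terms:
        return Counter()
    # Longest terms first, so a longer name wins over a shorter one it contains.
    ordered = sorted(funder_terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    # Map the lower-cased match back to the spelling used in the term list.
    canonical = {term.lower(): term for term in funder_terms}
    counts = Counter()
    for text in texts:
        # A set, so each funder is counted at most once per text.
        found = {canonical[m.group(1).lower()] for m in pattern.finditer(text)}
        counts.update(found)
    return counts


# Made-up acknowledgement sentences, for illustration only.
sample_texts = [
    "This work was supported by the Wellcome Trust and the National Institutes of Health.",
    "Funding: wellcome trust.",
    "The authors received no specific funding.",
]
terms = ["Wellcome Trust", "National Institutes of Health"]
counts = count_funder_mentions(sample_texts, terms)
for funder, n in counts.most_common():
    print(funder, n)
```

Counting each funder once per text, rather than once per occurrence, keeps a paper that repeats a funder's name from inflating that funder's total.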
- Initial communal corpus named `Epidemic50noCov`, of 50 articles (https://github.com/petermr/openVirus/tree/master/miniproject/epidemic50noCov)
- Next, a new corpus of 950 articles using the dictionary funders.
- Downloaded the corpus of 950 articles using getpapers with the syntax:
`getpapers -q "Funders in Viral epidemics" -x -k 950 -o mycorpus`
- Corpus 950 is now available at: https://github.com/petermr/openVirus/tree/master/miniproject/funder
❓ How I committed my corpus 950: scroll down to the section on committing corpus 950 to GitHub.
- Dictionary funder: https://github.com/petermr/openVirus/blob/master/dictionaries/test/funder.xml
- How did I create the dictionary? What source did I use? For an overview of my dictionary, see:
https://github.com/petermr/openVirus/blob/master/dictionaries/funders/funders_dictionary.md
- Updated on: September 18, 2020
- Source: Crossref
- Number of entries: ~17k
- Method: SPARQL/Wikidata Query Service
- Attributes: term, name, description, wikidataID, wikidataURL, wikipediaURL, crossrefID, country, synonyms
- SPARQL query used:
#Funders
SELECT DISTINCT ?Funder ?FunderLabel ?FunderDescription ?FunderAltLabel ?Country ?CountryLabel ?instanceofLabel ?crossrefid ?wikipedia WHERE {
?Funder wdt:P3153 ?crossrefid;
wdt:P31 ?instanceof;
wdt:P17 ?Country.
OPTIONAL { ?wikipedia schema:about ?Funder; schema:isPartOf <https://en.wikipedia.org/> }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20000000
- SPARQL output: https://github.com/petermr/openVirus/blob/master/dictionaries/funders/sparql2ami/funder.sparql.xml
- SPARQL output refined using ami SPARQL mapping. Syntax used:
`amidict -vv --dictionary funders --directory mydictionaries --input funder.sparql.xml create --informat wikisparqlxml --sparqlmap wikidataURL=Funder,term=FunderLabel,name=FunderLabel,country=CountryLabel,crossrefid=crossrefid,description=FunderDescription,wikipediaURL=wikipedia,wikidataURL=Funder --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*)) --synonyms=wikidataAltLabel`
Final dictionary: https://github.com/petermr/openVirus/blob/master/dictionaries/funders/sparql2ami/funder.xml
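To make the mapping step more concrete, here is a minimal Python sketch of the same idea: read a SPARQL XML results document, rename the query variables to dictionary attributes, and pull the Wikidata ID out of the entity URL with the `.*/(.*)` pattern used in `--transformName`. It is not how amidict works internally. The sample record (`Example Funder`, `Exampleland`, `Q000`) is a placeholder, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Placeholder record, not real Wikidata content.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="Funder"/>
    <variable name="FunderLabel"/>
    <variable name="CountryLabel"/>
  </head>
  <results>
    <result>
      <binding name="Funder"><uri>http://www.wikidata.org/entity/Q000</uri></binding>
      <binding name="FunderLabel"><literal xml:lang="en">Example Funder</literal></binding>
      <binding name="CountryLabel"><literal xml:lang="en">Exampleland</literal></binding>
    </result>
  </results>
</sparql>
"""


def parse_sparql_results(xml_text):
    """Return one dict per <result>, keyed by SPARQL variable name."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the single <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text or ""
        rows.append(row)
    return rows


def wikidata_id(url):
    """Keep whatever follows the last '/' in the entity URL."""
    match = re.fullmatch(r".*/(.*)", url)
    return match.group(1) if match else None


rows = parse_sparql_results(SAMPLE)
entries = [
    {
        "term": row["FunderLabel"],
        "country": row["CountryLabel"],
        "wikidataID": wikidata_id(row["Funder"]),
    }
    for row in rows
]
print(entries)
```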
Click here to go to the Wikidata SPARQL Query Service.
amidict -vv --dictionary organization --directory _sparlendpoint --input sparql_organization.xml create --informat wikisparqlxml --sparqlmap wikidataURL=Organization,term=OrganizationLabel,name=OrganizationLabel,country=CountryLabel,crossrefIDs=crossrefIds,description=description,wikidataURL=Organization --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*))
amidict --dictionary C:\Users\myPC\mydictionaries\funders(1).xml -v display --fields --validate
Generic values (DictionaryDisplayTool)
================================
--testString : d null
--wikilinks : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@1ae7dc0
--fields : m []
--files : d []
--maxEntries : d 3
--remote : d [https://github.com/petermr/dictionary]
--validate : m true
--help : d false
--version : d false
--dictionary : d [C:\Users\myPC\mydictionaries\funders(1).xml]
--directory : d null
Specific values (DictionaryDisplayTool)
================================
list all fields
dictionaries from C:\Users\myPC\ContentMine\dictionaries
❓ Result: I checked the dictionaries folder given in the path above. The folder was empty. Should I do something else, or is the software built that way?
1. ami for the creation of the dictionary and for sectioning: 🟩DONE
- To download my corpus of 950 articles in XML format into the directory `miniproject`:
- Open the Command Prompt and run:
`getpapers -q "Funders in viral epidemic research" -o miniproject -x -f mycorpus/log.txt -k 950`
- To divide the CProject into sections, run the following in the Command Prompt:
`ami -p miniproject section`
- This creates a `sections` subfolder in the folder of each scientific paper in your directory.
- Open the `sections` folder; it contains subfolders such as Front, Body, Back, etc.
- This completes the sectioning of my CProject.
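Once the CProject is sectioned, the acknowledgement and funding sections still have to be located in each paper. The sketch below walks a sectioned CProject and lists files whose names suggest such a section. It assumes that each paper folder has a `sections` subfolder, that section files end in `.xml`, and that the relevant file names contain `ack` or `fund`; check these assumptions against your own output. The paper ID `PMC0000000` and the file names in the demo are invented.

```python
import tempfile
from pathlib import Path

# Assumed name fragments; adjust them to the file names ami actually produces.
KEYWORDS = ("ack", "fund")


def find_funding_sections(cproject):
    """Map each paper folder name to the section files that look funding-related."""
    hits = {}
    for sections_dir in sorted(Path(cproject).glob("*/sections")):
        matches = [
            path
            for path in sorted(sections_dir.rglob("*.xml"))
            # Test the file name only: a folder named "back" also contains "ack".
            if any(key in path.name.lower() for key in KEYWORDS)
        ]
        if matches:
            hits[sections_dir.parent.name] = matches
    return hits


# Demo on a throwaway directory tree with invented names.
with tempfile.TemporaryDirectory() as tmp:
    back = Path(tmp, "PMC0000000", "sections", "2_back")
    back.mkdir(parents=True)
    (back / "0_ack.xml").write_text("<ack/>")
    (back / "1_ref-list.xml").write_text("<ref-list/>")
    found = find_funding_sections(tmp)
    demo = {paper: [p.name for p in paths] for paper, paths in found.items()}

print(demo)
```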
2. ami searching: full.dataTables (https://github.com/petermr/openVirus/blob/master/miniproject/funder/full.dataTables.html) and `_cooccurrence` (https://github.com/petermr/openVirus/tree/master/miniproject/funder/_cooccurrence) created for the dictionary funder. 🟩DONE
3. Jupyter Notebook for machine learning and data mining. 🟨STARTED
4. Later, R for analysis and to display the results graphically. 🟪NOT STARTED
- Installed GitHub Desktop from: https://desktop.github.com
- Cloned the openVirus repository to my system using the Git Bash command line: `git clone https://github.com/petermr/openVirus.git`
- Open the folder where you want to upload your CProject.
- Paste your project into the folder of the cloned openVirus repository (the local copy of our remote repository) where you want to commit the files.
- Open GitHub Desktop.
- Go to 'File', then 'Add Local Repository'.
- Choose the openVirus repository on your system.
- Add a commit message and click 'Commit to master'.
- After committing, click 'Push to origin'.
- Once the push completes, your uploaded files can be viewed in the GitHub repository.
💡 Tip: Committing the corpus in parts of five will make the uploading easy.
- Open the Command Prompt and run, one after another:
`cd ami3` `git pull` `mvn clean install -Dmaven.test.skip=true`
- Wait for the commands to finish.
- A BUILD SUCCESS message appears in the Command Prompt.
💡 Tip: If you get BUILD FAILURE, close any other Command Prompt window that is open on your system.
- The latest dictionary created at https://github.com/petermr/openVirus/blob/master/dictionaries/funders/sparql2ami/funder.xml is not valid as per the Schema.
- Issue of ami search documented here: https://github.com/petermr/openVirus/issues/85
- Core software:
  - Node
  - getpapers
  - Java JDK
  - Maven
  - ami
- Optional software:
  - KNIME
  - R graphics
  - Jupyter Notebook
  - GitHub Desktop
- Binary classification of corpus 950 into true and false positives using different libraries in Python.
- Working on the usage of Jupyter Notebook by looking into tutorials on the internet.
- Maintaining the dictionary FUNDERS so that merging of the dictionaries becomes easier.
- Creating the corpus 950
- Ami search on the corpus 950
- Sectioning the corpus 950
- Creating the ami dictionary funder
- Creating the SPARQL dictionary funder
- Manual binary classification of corpus 50 `Epidemic50noCov`
- Corpus 950 released
- Dictionary funder released
- Dictionary validation using ami
- Classifying first 50 papers from corpus 950 into True and False positives
- Smoke test on Jupyter Notebook
- Jupyter Notebook to create dictionary from a text file of funders
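As an illustration of the last item, a notebook cell that turns a plain list of funder names into dictionary XML could look roughly like the sketch below. The `dictionary`/`entry` element names and the `title`, `term` and `name` attributes are an assumption modelled on the attribute list earlier on this page; validate the result with ami before relying on it. The funder names are placeholders and `build_dictionary` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_dictionary(title, names):
    """Build a <dictionary> element with one <entry> per distinct funder name."""
    root = ET.Element("dictionary", {"title": title})
    seen = set()
    for raw in names:
        name = raw.strip()
        # Skip blank lines and case-insensitive repeats.
        if not name or name.lower() in seen:
            continue
        seen.add(name.lower())
        # term and name both carry the label, as in the --sparqlmap shown earlier.
        ET.SubElement(root, "entry", {"term": name, "name": name})
    return root


# Stand-in for the lines of a text file with one funder name per line.
lines = ["Example Funder A\n", "\n", "Example Funder B\n", "example funder a\n"]
dictionary = build_dictionary("funders", lines)
xml_text = ET.tostring(dictionary, encoding="unicode")
print(xml_text)
```

With a real file, `lines` could come from `open("funders.txt", encoding="utf-8")`, where `funders.txt` is a hypothetical file name.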
Simranleen Singh
- In this project we collect useful data from authentic, easily accessible websites from around the world and tabulate it so that it is clear to everyone who visits.
- My miniproject is on viral epidemics and funders. It deals with funders from all over the world that fund viral epidemic research.
- My first task is to download the software that gives me easy access to the data I am looking for, so that I can download it and sort it by whether it is useful to me or not.
- Initially I had to install Node, which is the framework needed for installing the other software.
- One of these is getpapers, installed using the link and information provided by my mentor. Reference: https://github.com/petermr/tigr2ess/blob/master/installation/INSTALLATION.md
Getpapers is necessary for this project because we have to download several papers on our subject in one go, and getpapers does that for us.
Installing getpapers.
Currently I am maintaining the dictionary of funders manually until my issue is solved.