Documentation: Drug
A viral epidemic poses a threat to human life, and the last resort for tackling it, on which the entire human race relies, is the availability of drugs. Each drug has its own synonyms and is called by different names in different parts of the world. With the variety of drugs available under different names, for example antiviral drugs, palliative drugs, antibacterial drugs and so on, it has become difficult to track the role of each drug. While some drugs are proving to be effective antivirals, others only relieve the symptoms. Hence, this mini-project provides the tools needed to bring complete drug information together in one place for the public.
The objectives of this drug mini-project are to create a dictionary of drugs for diseases, together with their local names in different countries, and to differentiate drugs that act directly against the pathogen from drugs that only act on the symptoms.
getpapers is a freely available tool which runs in the command prompt. It downloads all freely available research papers in full-text and XML format to your local machine. The getpapers command initiates the process, and -q specifies the query to be searched; the query is entered in inverted commas, as in "antiviral drugs". The next option, -o, specifies the output directory, and the argument that follows it is the name of the directory (drug_corpus, for example). Then -x and -p request the XML and PDF files respectively, and -k limits the search, so -k 1000 restricts the download to 1,000 papers only.
getpapers was used to create the corpus of viral-epidemic drug papers.
General code: getpapers -q "<project title>" -o <output directory> -x -p -k <number of papers required>
project code:
getpapers -q "viral epidemics and antiviral drugs" -o drug -x -p -k 800
The project code builds a corpus of 750+ research articles, each with its full text and XML file.
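For orientation, the corpus produced by getpapers is a directory of per-paper folders. A sketch of the typical layout (the PMC identifiers are placeholders, and the exact file set depends on what Europe PMC provides for each paper):

drug/
├── eupmc_results.json
├── PMC1234567/
│   ├── eupmc_result.json
│   ├── fulltext.pdf
│   └── fulltext.xml
└── PMC7654321/
    └── ...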
ami is freely available software which is used to scrape and annotate research papers.
ami section divides the research papers into front, body, back, floats and groups. Sectioning the downloaded files creates a tree structure which helps in exploring the content of each file. Sectioning is done using the section function of ami, which runs in the command prompt.
General code: ami -p <cproject> section
Project code:
ami -p drug section
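Sectioning adds a sections/ folder inside each paper's directory. An illustrative sketch of the resulting tree (the folder names follow the front/body/back/floats division described above; exact names may vary with the ami version):

drug/PMC1234567/sections/
├── 0_front/            (title, abstract, journal metadata)
├── 1_body/             (introduction, methods, results, discussion)├── 2_back/             (references, acknowledgements)
└── 3_floats-group/     (figures and tables)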
ami search searches for and analyses the terms in your project repository, giving the frequency of each term and a histogram of your corpus.
General code: ami -p <cproject> search --dictionary <path>
Project code:
ami -p drug search --dictionary dict/country dict/disease dict/drug dict/virus dict/funder dict/testTrace dict/npi dict/zoonosis
(This code finds the countries, diseases, drugs, viruses, funders, test-and-trace measures, non-pharmaceutical interventions and zoonoses mentioned in the research papers.)
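The search results land both at project level and inside each paper's folder. A sketch of where to look (these paths are an assumption based on a typical ami search run, with one results.xml per dictionary per paper; names may differ between ami versions):

drug/
├── full.dataTables.html            (project-level summary table)
└── PMC1234567/
    └── results/
        └── search/
            ├── drug/results.xml
            ├── country/results.xml
            └── ...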
SPARQL ("SPARQL Protocol and RDF Query Language") enables users to query information from databases or any data source that can be mapped to RDF. The Wikidata SPARQL query service is used here to collect data from Wikidata in order to build the dictionary. The query below returns the drug name, alternative names, formula, picture and referral URL in English, Hindi, Tamil, Sanskrit, Urdu, Spanish, Portuguese, Hausa and German.
SELECT ?wikidata ?wikidataLabel ?wikipedia ?wikidataAltLabel ?wikidataDescription ?wikidataformule ?wikidatapicture
       ?hindi ?hindiLabel ?hindiAltLabel ?hindiwikipedia
       ?tamil ?tamilLabel ?tamilAltLabel ?tamilwikipedia
       ?sanskrit ?sanskritLabel ?sanskritAltLabel ?sanskritwikipedia
       ?spanish ?spanishLabel ?spanishAltLabel ?spanishwikipedia
       ?urdu ?urduLabel ?urduAltLabel ?urduwikipedia
       ?portuguese ?portugueseLabel ?portugueseAltLabel ?portuguesewikipedia
       ?hausa ?hausaLabel ?hausaAltLabel ?hausawikipedia
       ?german ?germanLabel ?germanAltLabel ?germanwikipedia
WHERE {
?wikidata wdt:P31 wd:Q12140;
wdt:P274 ?wikidataformule;
wdt:P117 ?wikidatapicture.
OPTIONAL { ?wikipedia schema:about ?wikidata; schema:isPartOf <https://en.wikipedia.org/> }
OPTIONAL { ?hindiwikipedia schema:about ?wikidata; schema:isPartOf <https://hi.wikipedia.org/> }
OPTIONAL { ?tamilwikipedia schema:about ?wikidata; schema:isPartOf <https://ta.wikipedia.org/> }
OPTIONAL { ?sanskritwikipedia schema:about ?wikidata; schema:isPartOf <https://sa.wikipedia.org/> }
OPTIONAL { ?spanishwikipedia schema:about ?wikidata; schema:isPartOf <https://es.wikipedia.org/> }
OPTIONAL { ?urduwikipedia schema:about ?wikidata; schema:isPartOf <https://ur.wikipedia.org/> }
OPTIONAL { ?portuguesewikipedia schema:about ?wikidata; schema:isPartOf <https://pt.wikipedia.org/> }
OPTIONAL { ?hausawikipedia schema:about ?wikidata; schema:isPartOf <https://ha.wikipedia.org/> }
OPTIONAL { ?germanwikipedia schema:about ?wikidata; schema:isPartOf <https://de.wikipedia.org/> }
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en".
## Selecting the preferred label
?wikidata skos:altLabel ?wikidataAltLabel ; rdfs:label ?wikidataLabel; schema:description ?wikidataDescription
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "hi".
## Selecting the preferred label
?wikidata skos:altLabel ?hindiAltLabel .
?wikidata rdfs:label ?hindiLabel .
?wikidata schema:description ?hindi .
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "ta".
## Selecting the preferred label
?wikidata skos:altLabel ?tamilAltLabel .
?wikidata rdfs:label ?tamilLabel .
?wikidata schema:description ?tamil .
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "sa".
## Selecting the preferred label
?wikidata skos:altLabel ?sanskritAltLabel .
?wikidata rdfs:label ?sanskritLabel .
?wikidata schema:description ?sanskrit .
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "es".
## Selecting the preferred label
?wikidata skos:altLabel ?spanishAltLabel .
?wikidata rdfs:label ?spanishLabel .
?wikidata schema:description ?spanish .
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "ur".
## Selecting the preferred label
?wikidata skos:altLabel ?urduAltLabel .
?wikidata rdfs:label ?urduLabel .
?wikidata schema:description ?urdu .
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "pt".
## Selecting the preferred label
?wikidata skos:altLabel ?portugueseAltLabel .
?wikidata rdfs:label ?portugueseLabel .
?wikidata schema:description ?portuguese .
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "ha".
## Selecting the preferred label
?wikidata skos:altLabel ?hausaAltLabel .
?wikidata rdfs:label ?hausaLabel .
?wikidata schema:description ?hausa .
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "de".
## Selecting the preferred label
?wikidata skos:altLabel ?germanAltLabel .
?wikidata rdfs:label ?germanLabel .
?wikidata schema:description ?german .
}
}
Once the results are obtained, download them from the SPARQL endpoint (via the link provided by the query service) and save the file with a .xml extension.
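One way to do this from the command prompt is with curl (a sketch: drug.rq is an assumed local file containing the query above, and the output name matches the input file used in the next step):

curl -G "https://query.wikidata.org/sparql" -H "Accept: application/sparql-results+xml" --data-urlencode query@drug.rq -o sparql_drug9.xml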
The SPARQL output is then mapped and validated into ami format using amidict; the --sparqlmap option pairs each dictionary field with the SPARQL variable that supplies it.
Code:
amidict -vv --dictionary drug --directory dic --input sparql_drug9.xml create --informat wikisparqlxml --sparqlmap wikidataURL=wikidata,wikipediaURL=wikipedia,altNames=wikidataAltLabel,name=wikidataLabel,term=wikidataLabel,description=wikidataDescription,formulae=wikidataformule,picture=wikidatapicture,Hindi=hindiLabel,Hindi_description=hindi,Hindi_altNames=hindiAltLabel,Tamil=tamilLabel,Tamil_description=tamil,Tamil_altNames=tamilAltLabel,Urdu=urduLabel,Urdu_description=urdu,Urdu_altNames=urduAltLabel,Sanskrit=sanskritLabel,Sanskrit_description=sanskrit,Sanskrit_altNames=sanskritAltLabel,Spanish=spanishLabel,Spanish_description=spanish,Spanish_altNames=spanishAltLabel,Portuguese=portugueseLabel,Portuguese_description=portuguese,Portuguese_altNames=portugueseAltLabel,Hausa=hausaLabel,Hausa_description=hausa,Hausa_altNames=hausaAltLabel,German=germanLabel,German_description=german,German_altNames=germanAltLabel --transformName wikidataID=EXTRACT(wikidataURL,.*/(.*)) --synonyms=wikidataAltLabel
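The output is an XML dictionary. A hand-written sketch of what one entry might look like (the drug, the Q-number and the attribute values are placeholders, not actual output; the attribute names follow the --sparqlmap keys above):

<dictionary title="drug">
  <entry term="oseltamivir" name="oseltamivir" wikidataID="Q..." description="antiviral medication" Hindi="..." Tamil="...">
    <synonym>Tamiflu</synonym>
  </entry>
</dictionary>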
Freely available papers were collected from Europe PMC once the getpapers command was executed. Time taken: ~2:00 mins.
FIGURE 1: OUTPUT OF GETPAPERS
FULL RESULTS
Results of ami section, which sections the papers in the directory. Time taken: ~1:30 mins.
FIGURE 2: OUTPUT OF AMI SECTION
Results are produced as a table, a histogram, and per-folder result files. Time taken: ~1:00 min.
FIGURE 3: OUTPUT OF AMI SEARCH IN FOLDER
FIGURE 4: OUTPUT OF AMI SEARCH IN TABLE WITH FREQUENCY
FULL RESULTS
FIGURE 5: ALL PLOTS (.SVG FILES)
FULL RESULTS
The results will be displayed. They contain the Wikidata ID, molecule name, molecular formula, compound picture, alternative labels, description, and the Wikipedia links (English, Tamil, Hindi, Urdu, Sanskrit, Spanish, Portuguese, German, Hausa). Time taken: less than a minute.
FIGURE 6: RESULTS OF SPARQL
After downloading the results from the SPARQL endpoint and saving the file with a .xml extension on the local machine, the file looks as follows. Time taken: ~1:00 min.
FIGURE 7: SPARQL.XML OUTPUT
FULL RESULTS
amidict refines the SPARQL output into a dictionary. The output looks as follows. Time taken: ~1:00 min.
FIGURE 8: amidict OUTPUT
FULL RESULTS
# DISCLAIMER: TIME TAKEN DEPENDS ON THE NETWORK
PAGE UNDER CONSTRUCTION