GOFindBias is developed to provide the user with some insightful statistics about the GAF file to determine if the conclusions on the gene ontology studies can be biased because of abstract terms or the high throughput experiments(1)..
Statistics Provided by the tool are as follows:
- Shannon's equitability.
- Top 'n' PubMed and GO terms.
- KS test to compare two different GAF files.
- Mutual terms between two GAF files which are among 'n' most frequent terms and their respective frequencies.
Modules are available in most GNU/Linux distributions, or from their respective websites.
Installing from source
git clone https://github.com/Pranavkhade/GOFindBias
cd GOFindBias
python setup.py install
Installing with pip
pip install GOFindBias
OR
pip install git+git://github.com/Pranavkhade/GOFindBias
- Collect the GAF file you wish to analyse. For reference GAF files you can visit ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
- You can also use the GAF files obtained as an output from the debias program.
- Instructions and help for the parameter is as follows:
usage: GOFindBias.py [-h]
(-i GAF_FILE [GAF_FILE ...] | -cmpr FILENAME FILENAME)
[-ls 1 OR O] [-ts TOP]
[-e EVIDENCE_CODE [EVIDENCE_CODE ...]]
GOFindBias is an analytical tool built to analyse the .gaf files. Please
visit: https://github.com/Pranavkhade/GOFindBias for more details.
optional arguments:
-h, --help show this help message and exit
-i GAF_FILE [GAF_FILE ...], --input GAF_FILE [GAF_FILE ...]
Names of the input GAF file(s).
-cmpr FILENAME FILENAME, --compare FILENAME FILENAME
Names of the two GAF files
-ls 1 OR O, --logscale 1 OR O
For graphs, 0: Counts in normal scale 1: Counts in log
scale [default=0]
-ts TOP, --topstat TOP
Top n statistics sorted from highest to lowest
[default=10]
-e EVIDENCE_CODE [EVIDENCE_CODE ...], --evidence EVIDENCE_CODE [EVIDENCE_CODE ...]
Accepts Standard Evidence Codes outlined in
(http://geneontology.org/page/guide-go-evidence-
codes). All 3 letter code for each standard evidence
is acceptable. In addition to that EXPEC is accepted
which will pull out all annotations which are made
experimentally. COMPEC will extract all annotations
which have been done computationally. Similarly,
AUTHEC and CUREC are also accepted.
-
GOFindBias -i test/2014.gaf -ls 1 -ts 10
This command will parse 2014.gaf file for the analysis and all the GO Term counts will be represented on the Natural Log scale for better comparitive visualisation of the data. The last argument is the number of top 'n' entries with highest count in the GAF file. The output of graphs will be posted in the/graph_output
folder with names corrosponding to GO/PubMed ID count and the ontology level (F/C/P). File namedShannon's_Statistics.txt
will have the information about the diversity of a given .gaf file. -
GOFindBias -cmpr test/2014.gaf test/2015.gaf -ts 50
This command will compare two files and will give KS(non-parametric) Test p-value. Along with it, it will createCOMPARE.txt
having common GO terms and PMID between top 50(n) most frequent terms from each GAF file. -
GOFindBias -i test/2014.gaf test/2015.gaf test/2016.gaf -e EXPEC
This command will parse all the mentioned files and will fetch statistics for only Experimental Evidence Codes.
- If you are using Anaconda environment, make sure that the Python reads libraries from "~anaconda2/lib/python2.7/site-packages/lib". You can also simply copy the files in that location to an appropriate path where other python libraries are readable (importable) by Python.
- You can find few test .gaf files in the /lib/test/ folders.