Kraken2-output-manipulation

Kraken2 output generates a report for each datasets, this script takes these individual output reports and combines them to one file in the formal

Taxa ID	sample1	sample2	sample3	sample4	.....
1234	1909	10	100	0	.....

or output

Taxa	sample1	sample2	sample3	sample4	.....
Pseudomonas	1909	10	100	0	.....

The Taxa ID number is the same as the column5 in the kraken2 output report, " NCBI taxonomic ID number". The numbers under sample1, sample2, ... etc can be determined by the user to select which value they would like reported

col1: Percentage of fragments covered by the clade rooted at this taxon
col2: Number of fragments covered by the clade rooted at this taxon
col3: Number of fragments assigned directly to this taxon
col4: A rank code
col5: NCBI taxonomic ID number (this doesnt make sense to report for every column but its possible)
col6: Indented scientific name

For documentation on Krkaen2 installation and maunal, here is a link to kraken2 documentation

Pre-requisites

Run Kraken2 command and generate a report
kraken2 --db $KRAKEN_DB --paired read_1.fastq read_2.fastq --threads 1 --use-names --report kraken_report --report-zero-counts --output kraken.out

Make sure to add the parameter --report-zero-counts in the kraken2 command, if this paramter is not added, then the result from this script will be hard to parse through.

Dependecies

Python 3
Python packages

numpy
scipy
argparse
pandas
collection

Usage

python kraken-multiple.py --help
usage: kraken-multiple.py [-h] [-d DIRECTORY] [-r {U,R,D,K,P,C,O,F,G,S}] [-c {1,2,3,4,5,6}] [-o OUTPUT]

Take multiple kraken output files and consolidate them to one output

optional arguments:
-h, --help show this help message and exit
-d DIRECTORY Enter a directory with kraken summary reports
-r {U,R,D,K,P,C,O,F,G,S} Enter a rank code
-c {1,2,3,4,5,6} Enter the column number in the report you would like to include in the output
-o OUTPUT Enter the output file name

For getting taxa information instead of taxa ID

python kraken-multiple-taxa.py --help
usage: kraken-multiple-taxa.py [-h] [-d DIRECTORY] [-r {U,R,D,K,P,C,O,F,G,S}] [-c {1,2,3,4,5,6}] [-o OUTPUT]

Take multiple kraken output files and consolidate them to one output

optional arguments:
-h, --help show this help message and exit
-d DIRECTORY Enter a directory with kraken summary reports
-r {U,R,D,K,P,C,O,F,G,S} Enter a rank code
-c {1,2,3,4,5,6} Enter the column number in the report you would like to include in the output
-o OUTPUT Enter the output file name

Detailed usage description

The input for this script is

directory with kraken reports only. Use the -d flag to point to this directory. The format of the output should be 39.87 290930 290930 U 0 unclassified 60.13 438756 117 R 1 root 59.67 435435 723 R1 131567 cellular organisms 58.38 425979 4039 D 2 Bacteria 33.55 244810 2293 P 1224 Proteobacteria 16.06 117202 1091 C 28211 Alphaproteobacteria
rank, since the kraken report includes results for each rank ranging from "(U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies." The user can define the level of rank they would like to look at. The flag -r helps define this paramater
The -c flag sets the which column the user would like reported in the final report.
- col1: Percentage of fragments covered by the clade rooted at this taxon
- col2: Number of fragments covered by the clade rooted at this taxon
- col3: Number of fragments assigned directly to this taxon
- col4: A rank code
- col5: NCBI taxonomic ID number (this doesnt make sense to report for every column but its possible)
- col6: Indented scientific name
output, The -o flag sets the name of the output file to write the final report.

Example command

python kraken-multiple.py -d kraken_report/ -r F -c 2 -o kraken-report-final

Output

Taking a look at the kraken-report-final
TaxaID ['sample1','sample2','sample3','sample4','sample5','sample6']
135621 ['210', '859', '2843', '595', '281', '1064']
468 ['80', '359', '1054', '361', '164', '299']
72275 ['66', '1838', '4664', '462', '75', '2074']
267888 ['45', '1407', '59440', '930', '120', '79']

If you ran the kraken-multiple-taxa.py, then the output will be
Taxa ['sample1','sample2','sample3','sample4','sample5','sample6']
Actinomycetaceae ['210', '859', '2843', '595', '281', '1064']
Budviciaceae ['80', '359', '1054', '361', '164', '299']
Mycoplasmataceae ['66', '1838', '4664', '462', '75', '2074']
Vibrionaceae ['45', '1407', '59440', '930', '120', '79']

Downstream

Run the bash command to generate a csv file that can be easily imported to R

sed -e "s/\[//g;s/\]//g;s/'//g;s|\t|,|g" kraken_report_all >kraken_report_all_table.csv

TaxaID	sample1	sample2	sample3	sample4	sample5	sample6
135621	210	859	2843	595	281	1064
468	'80	359	1054	361	164	299
72275	66	1838	4664	462	75	2074
267888	45	1407	59440	930	120	79

Visualization

In the directory Rscripts, there are two scripts

Rarefaction-curves.R : that takes the output from this program to plot the rarefaction curve for the samples
Ordination_plots.R : that takes the output from this program to plot ordination plots for the samples

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
Rscripts		Rscripts
CHANGES.md		CHANGES.md
CITATAION		CITATAION
INSTALLATION.md		INSTALLATION.md
LICENSE		LICENSE
README.md		README.md
VERSION		VERSION
kraken-multiple-taxa.py		kraken-multiple-taxa.py
kraken-multiple.py		kraken-multiple.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Kraken2-output-manipulation

Pre-requisites

Dependecies

Usage

For getting taxa information instead of taxa ID

Detailed usage description

Example command

Output

Downstream

Visualization

About

Releases 1

Packages

Languages

License

npbhavya/Kraken2-output-manipulation

Folders and files

Latest commit

History

Repository files navigation

Kraken2-output-manipulation

Pre-requisites

Dependecies

Usage

For getting taxa information instead of taxa ID

Detailed usage description

Example command

Output

Downstream

Visualization

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages