added scripts of latest pathogen analyses, using the human genome as input for kneaddata
vrmarcelino committed Aug 19, 2019
1 parent ce91a66 commit c9fb475
Showing 5 changed files with 14 additions and 8 deletions.
6 changes: 4 additions & 2 deletions 02_pathogen/s1_QC_Trimm.sh
@@ -2,7 +2,8 @@
#PBS -P FGEN
#PBS -l select=1:ncpus=6:mem=100GB
#PBS -l walltime=12:00:00

+#PBS -M [email protected]
+#PBS -m ae

cd $PBS_O_WORKDIR

@@ -18,7 +19,7 @@ process=4
r1=00_reads/patmg_CAMI2_short_read_R1.fastq.gz
r2=00_reads/patmg_CAMI2_short_read_R2.fastq.gz

-db=/home/vros8020/FGEN_project/databases/kneadData/knead_human/
+db=/home/vros8020/FGEN_project/databases/kneadData/knead_human_genome/

output_dir=01_QualityControl
mkdir $output_dir
@@ -28,3 +29,4 @@ kneaddata -i $r1 -i $r2 -o $output_dir/patmg_CAMI2_QCd -db $db -t $th -p $proces

mv 01_QualityControl/patmg_CAMI2_QCd/patmg_CAMI2_short_read_R1_kneaddata_paired_1.fastq 01_QualityControl/patmg_CAMI2_QCd_R1.fq
mv 01_QualityControl/patmg_CAMI2_QCd/patmg_CAMI2_short_read_R1_kneaddata_paired_2.fastq 01_QualityControl/patmg_CAMI2_QCd_R2.fq
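
When this job is resubmitted, `mkdir` reports an error because `01_QualityControl` already exists, and a mistyped `db` path may only surface once kneaddata starts. The following is a minimal pre-flight sketch, not part of the commit. It uses temporary directories as stand-ins for the real paths, and the index file name is hypothetical (Bowtie2 index files normally end in `.bt2`).

```shell
# Sketch only: pre-flight checks that could run before the kneaddata call.
# Temporary directories stand in for the real database and output paths.
db=$(mktemp -d)             # stand-in for .../kneadData/knead_human_genome/
touch "$db/example.1.bt2"   # hypothetical Bowtie2 index file name

output_dir=$(mktemp -d)/01_QualityControl
mkdir -p "$output_dir"      # -p: no error if the directory already exists
mkdir -p "$output_dir"      # repeating the call is harmless on a rerun

# Record whether the database directory holds any Bowtie2 index files
if ls "$db"/*.bt2 >/dev/null 2>&1; then
    db_ok=yes
else
    db_ok=no
fi
echo "database ready: $db_ok"
```

In a real job script, a `db_ok=no` result would be a natural place to `exit 1` before any compute time is spent.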

2 changes: 1 addition & 1 deletion 02_pathogen/s2_CCM.sh
@@ -1,5 +1,5 @@
#!/bin/bash
-#PBS -P MYCOB
+#PBS -P FGEN
#PBS -l select=1:ncpus=4:mem=500GB
#PBS -l walltime=1:00:00

Binary file added 03_marine/.DS_Store
Binary file not shown.
8 changes: 5 additions & 3 deletions 03_marine/s4_Convert_2_CAMI.sh
@@ -1,13 +1,13 @@
#!/bin/bash
-#PBS -P MYCOB
+#PBS -P FGEN
#PBS -l select=1:ncpus=1:mem=100GB
#PBS -l walltime=10:00:00


cd $PBS_O_WORKDIR

module load python/3.6.5
-PATH=$PATH:/home/vros8020/scratches/11_CAMI2/marine/convertion_scrips
+PATH=$PATH:/home/vros8020/scratches/11_CAMI2/marine/Illumina_reads/convertion_scrips

input_dir=03_CCMetagen
output_dir=04_Results2Submit
@@ -27,8 +27,10 @@ done
for r12 in $input_dir/*.csv; do
o_part1=$output_dir/${r12/$input_dir\//''}
o=${o_part1/.csv/.profile}
-sample_name=${r12/$input_dir\//''}
+sample_name_part1=${r12/$input_dir\/marmgCAMI2_short_read_sample_/''}
+sample_name=${sample_name_part1/.csv/''}
echo "$o"
ccm2cami.py -i $r12 -n $sample_name -o $o
done
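
The two substitutions added in this hunk reduce each input path to a bare sample identifier by deleting the directory plus the `marmgCAMI2_short_read_sample_` prefix, then the `.csv` suffix. An equivalent formulation using prefix and suffix removal is sketched below. The file name is an assumed example, not taken from the commit.

```shell
# Sketch only: derive the sample identifier with prefix/suffix removal.
input_dir=03_CCMetagen
# Assumed example path; real file names may differ
r12="$input_dir/marmgCAMI2_short_read_sample_0.csv"

file_name=${r12##*/}                                     # drop the directory part
sample_name=${file_name#marmgCAMI2_short_read_sample_}   # drop the fixed prefix
sample_name=${sample_name%.csv}                          # drop the extension
echo "$sample_name"   # prints 0
```

Unlike `${var/pattern/}`, the `#` and `%` forms only match at the start or end of the string, so a stray `.csv` elsewhere in a name would be left untouched.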


6 changes: 4 additions & 2 deletions README.md
@@ -38,7 +38,7 @@ These versions can also be found in the folder [00_software](https://github.com/
For the pathogen challenge, I used [KneadData](http://huttenhower.sph.harvard.edu/kneaddata) v0.6.1 to filter out human and low quality sequences.
kneaddata used Bowtie2 v.2.2.5 and Trimmomatic v.0.38.

-I used the human transcriptome (hg38) reference database to filter out human reads, which can be downloaded as: `kneaddata_database --download human_transcriptome bowtie2 $DIR`.
+I used the human reference database (hg37_and_human_contamination) to filter out human reads, which can be downloaded as: `kneaddata_database --download human_genome bowtie2 $DIR`.

For the marine challenge, I used Trimmomatic v.0.38.

@@ -157,7 +157,6 @@ This allows us to flag obvious errors (possible assembly errors in the nt databa
CCMetagen_merge.py -i 03_CCMetagen -kr r -tlist Mammalia,Insecta,Oomycetes -l Class -o all_samples_marine_only # final results
```


**Step 5** Convert to Cami:
Finally, we need to convert the CCMetagen results to the CAMI2 format.
As they require one file per sample, I removed the taxa filtered out with CCMetagen_merge (which produces one table for all samples) from the original CCMetagen .csv files (one per sample) using sed:
@@ -174,6 +173,9 @@ ccm2cami.py -i $r12 -n $sample_name -o $o
```
The ccm2cami.py script and dependencies can be found [here](https://github.com/vrmarcelino/CriticalAss2/tree/master/03_marine/convertion_scrips)

The profiles of all samples were concatenated into a single file: `cat *.profile > all_short_read_samples.profile` and the fingerprint was generated on the concatenated file.

Done!
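
One detail of the concatenation step is worth noting: if the combined file is written into the same directory as the per-sample profiles, it also matches `*.profile` the next time the command is run. A minimal sketch of one way around this, writing the result to a separate directory, is shown below. The directory layout and file contents are placeholders, not the actual CAMI profiles.

```shell
# Sketch only: concatenate per-sample profiles into a separate directory
# so the combined file never matches the *.profile glob on a rerun.
workdir=$(mktemp -d)
printf 'placeholder line for sample 0\n' > "$workdir/sample_0.profile"
printf 'placeholder line for sample 1\n' > "$workdir/sample_1.profile"

mkdir -p "$workdir/combined"   # hypothetical output directory
cat "$workdir"/*.profile > "$workdir/combined/all_short_read_samples.profile"

# Count the lines in the combined file (one per placeholder profile here)
n_lines=$(grep -c '' "$workdir/combined/all_short_read_samples.profile")
echo "$n_lines"   # prints 2
```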


