added scripts of latest pathogen analyses, using the human genome as input for kneaddata
vrmarcelino · Aug 19, 2019 · c9fb475
1 parent ce91a66
Showing 5 changed files with 14 additions and 8 deletions.
diff --git a/02_pathogen/s1_QC_Trimm.sh b/02_pathogen/s1_QC_Trimm.sh
@@ -2,7 +2,8 @@
 #PBS -P FGEN
 #PBS -l select=1:ncpus=6:mem=100GB
 #PBS -l walltime=12:00:00
-
+#PBS -M [email protected]
+#PBS -m ae
 
 cd $PBS_O_WORKDIR
 
@@ -18,7 +19,7 @@ process=4
 r1=00_reads/patmg_CAMI2_short_read_R1.fastq.gz
 r2=00_reads/patmg_CAMI2_short_read_R2.fastq.gz
 
-db=/home/vros8020/FGEN_project/databases/kneadData/knead_human/
+db=/home/vros8020/FGEN_project/databases/kneadData/knead_human_genome/
 
 output_dir=01_QualityControl
 mkdir $output_dir
@@ -28,3 +29,4 @@ kneaddata -i $r1 -i $r2 -o $output_dir/patmg_CAMI2_QCd -db $db -t $th -p $proces
 
 mv 01_QualityControl/patmg_CAMI2_QCd/patmg_CAMI2_short_read_R1_kneaddata_paired_1.fastq 01_QualityControl/patmg_CAMI2_QCd_R1.fq
 mv 01_QualityControl/patmg_CAMI2_QCd/patmg_CAMI2_short_read_R1_kneaddata_paired_2.fastq 01_QualityControl/patmg_CAMI2_QCd_R2.fq
+
diff --git a/02_pathogen/s2_CCM.sh b/02_pathogen/s2_CCM.sh
@@ -1,5 +1,5 @@
 #!/bin/bash
-#PBS -P MYCOB
+#PBS -P FGEN
 #PBS -l select=1:ncpus=4:mem=500GB
 #PBS -l walltime=1:00:00
 

diff --git a/03_marine/.DS_Store b/03_marine/.DS_Store
diff --git a/03_marine/s4_Convert_2_CAMI.sh b/03_marine/s4_Convert_2_CAMI.sh
@@ -1,13 +1,13 @@
 #!/bin/bash
-#PBS -P MYCOB
+#PBS -P FGEN
 #PBS -l select=1:ncpus=1:mem=100GB
 #PBS -l walltime=10:00:00
 
 
 cd $PBS_O_WORKDIR
 
 module load python/3.6.5
-PATH=$PATH:/home/vros8020/scratches/11_CAMI2/marine/convertion_scrips
+PATH=$PATH:/home/vros8020/scratches/11_CAMI2/marine/Illumina_reads/convertion_scrips
 
 input_dir=03_CCMetagen
 output_dir=04_Results2Submit
@@ -27,8 +27,10 @@ done
 for r12 in $input_dir/*.csv; do
 	o_part1=$output_dir/${r12/$input_dir\//''}
 	o=${o_part1/.csv/.profile}
-	sample_name=${r12/$input_dir\//''}
+	sample_name_part1=${r12/$input_dir\/marmgCAMI2_short_read_sample_/''}
+	sample_name=${sample_name_part1/.csv/''}
 	echo "$o"
 	ccm2cami.py -i $r12 -n $sample_name -o $o
 done
 
+
diff --git a/README.md b/README.md
@@ -38,7 +38,7 @@ These versions can also be found in the folder [00_software](https://github.com/
 For the pathogen challenge, I used [KneadData](http://huttenhower.sph.harvard.edu/kneaddata) v0.6.1 to filter out human and low quality sequences.
 kneaddata used Bowtie2 v.2.2.5 and Trimmomatic v.0.38.
 
-I used the human transcriptome (hg38) reference database to filter out human reads, which can be downloaded as: `kneaddata_database --download human_transcriptome bowtie2 $DIR`. 
+I used the human reference database (hg37_and_human_contamination) to filter out human reads, which can be downloaded as: `kneaddata_database --download human_genome bowtie2 $DIR`. 
 
 For the marine challenge, I used Trimmomatic v.0.38.
 
@@ -157,7 +157,6 @@ This allows us to flag obvious errors (possible assembly errors in the nt databa
 CCMetagen_merge.py -i 03_CCMetagen -kr r -tlist Mammalia,Insecta,Oomycetes -l Class -o all_samples_marine_only # final results
 ```
 
-
 **Step 5** Convert to Cami:
 Finally, we need to convert the CCMetagen results to the CAMI2 format. 
 As they require one file per sample, I removed the taxa filtered out with CCMetagen_merge (which produces one table for all samples) from the original CCMetagen .csv files (one per sample) using sed:
@@ -174,6 +173,9 @@ ccm2cami.py -i $r12 -n $sample_name -o $o
 ```
 The ccm2cami.py script and dependencies can be found [here](https://github.com/vrmarcelino/CriticalAss2/tree/master/03_marine/convertion_scrips)
 
+The profiles of all samples were concatenated into a single file: `cat *.profile > all_short_read_samples.profile` and the fingerprint was generated on the concatenated file.
+
 Done!
 
 
+
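
The change to `sample_name` in `03_marine/s4_Convert_2_CAMI.sh` strips both the file-name prefix and the `.csv` extension, so that only the sample identifier is passed to `ccm2cami.py -n`. The sketch below illustrates the same idea in isolation. It is not the code from the commit: the function name `sample_name_from_path` and the example file name are hypothetical, and it uses anchored prefix/suffix removal (`##`, `#`, `%`) rather than the unanchored `${var/pattern/}` substitution used in the script.

```shell
#!/bin/bash
# Hypothetical helper (not part of the commit): reduce a CCMetagen result
# path to the bare sample identifier.
sample_name_from_path() {
    local name=${1##*/}                          # drop the directory part
    name=${name#marmgCAMI2_short_read_sample_}   # drop the fixed file-name prefix
    name=${name%.csv}                            # drop the .csv extension
    printf '%s\n' "$name"
}

# Assumed example path; the real file names in 03_CCMetagen may differ.
example=03_CCMetagen/marmgCAMI2_short_read_sample_0.csv
sample_name=$(sample_name_from_path "$example")
echo "$sample_name"
```

Because the `#` and `%` forms only match at the start and end of the string, a prefix or `.csv` occurring elsewhere in a file name would be left untouched, which the `/` form does not guarantee.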