bam to consensus sequence

codemeleon · Aug 14, 2023 · ecdf053
1 parent 9a6bc2b
Showing 1 changed file with 10 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -18,7 +18,7 @@ SeqPanther consists of the following set of commands:
 
 ## Operating system compatibility
 
-Unix and OS X Commandline Application. 
+Unix and OS X command-line application.
 
 ## Dependencies
 
@@ -63,7 +63,7 @@ The tool relies on multiple external open source programs and python modules as
 
 ## Option 2: Install directly from Git
 
-To install directly from the Github repo, run the command: 
+To install directly from the GitHub repo, run the command:
 
 `pip install git+https://github.com/codemeleon/seqPanther.git`
 
@@ -81,6 +81,12 @@ This command help is accessible using `seqpanther cc2ns` or `seqpanther cc2ns --
 
 ## nucsubs
 
+You might need to convert BAM files to consensus sequences before running `seqpanther nucsubs`. Consensus sequences can be generated using the following commands:
+
+- `samtools index <sorted_bamfile>`
+- `bcftools mpileup -f <reference_fasta> <sorted_bamfile> | bcftools call -c --ploidy 1 | vcfutils.pl vcf2fq > <sorted_bamfile>.fq`
+- `seqtk seq -aQ64 <sorted_bamfile>.fq > <sorted_bamfile>.fasta`
+
 This command help is accessible using `seqpanther nucsubs` or `seqpanther nucsubs --help`.
 
 ## SeqPatcher
@@ -101,11 +107,11 @@ This command help is accessible at `seqpanther seqpatcher` or `seqpanther seqpat
 
 The command will generate four outputs in the current folder including: `sub_output.csv` containing details of the nucleotide substitutions, `indel_output.csv` containing details of the indel events, `codon_output.csv` containing details of the codon changes and `output.pdf` which is a plot of genome depth and breadth of coverage annotated with the positions with mutations and indels.
 
-6. If you only want to generate the results for a single BAM file, run the command as `seqpanther codoncounter -bam ./bam/K032282-consensus_alignment_sorted.bam -rid NC_045512.2 -ref GCF_009858895.2_ASM985889v3_genomic.fna -gff GCF_009858895.2_ASM985889v3_genomic.gff -coor_range 21563-25384` replacing the BAM file name with your specific bam file name in the command. 
+6. If you only want to generate the results for a single BAM file, run the command as `seqpanther codoncounter -bam ./bam/K032282-consensus_alignment_sorted.bam -rid NC_045512.2 -ref GCF_009858895.2_ASM985889v3_genomic.fna -gff GCF_009858895.2_ASM985889v3_genomic.gff -coor_range 21563-25384`, replacing the example BAM file name with your own.
 
 7. Outputs can be explored using a text file reader (for the text files) and pdf reader (e.g Adobe Reader) for the PDFs. An example command to view the text files would be: `cat sub_output.csv | sed 's/,/ ,/g' | column -t -s, | less -S`. The user needs to explore those files and remove the changes they would like not to be integrated. A text editor of your choice e.g. bbedit or notepad++ can be used to edit the files.
 
-8. In case you decide that there are certain mutations that you need to change, you will have to convert the outputs from `codoncounter` to the format required by the `nucsubs` command and run the command `seqpanther cc2ns -s sub_output.csv -i sub_output.csv -o changes`. It generates a CSV file for each sample in the `./change` folder. 
+8. In case you decide that there are certain mutations that you need to change, you will have to convert the outputs from `codoncounter` to the format required by the `nucsubs` command and run the command `seqpanther cc2ns -s sub_output.csv -i sub_output.csv -o changes`. It generates a CSV file for each sample in the `./changes` folder.
 
 9. Then execute seqpanther as follows: `seqpanther nucsubs -i NC_045512.2 -r NC_045512.2.fasta -c consensus -t changes -o results` to integrate relevant changes to the consensus sequences. The output will be generated in a folder named `results`.
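The three consensus commands added in this commit run per BAM file, so with many samples it may help to loop them over a folder. Below is a minimal bash sketch of that loop, not part of seqPanther: it assumes coordinate-sorted BAMs ending in `.bam`, and that `samtools`, `bcftools`, `vcfutils.pl` and `seqtk` are on `PATH`. The script name `bam2consensus.sh`, the `DRY_RUN` switch and the default `bam` folder are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# bam2consensus.sh (hypothetical name): apply the README's samtools/bcftools/seqtk
# consensus recipe to every sorted BAM in a folder.
# Assumptions: BAMs are coordinate-sorted and end in .bam; samtools, bcftools,
# vcfutils.pl and seqtk are on PATH. Set DRY_RUN=1 to print commands instead.
set -euo pipefail

# Execute one pipeline string, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$1"
  else
    bash -o pipefail -c "$1"
  fi
}

# Fail early if any external tool used by the recipe is missing.
require_tools() {
  local tool
  for tool in samtools bcftools vcfutils.pl seqtk; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      return 1
    fi
  done
}

# The three README steps for one BAM; outputs are <bam>.fq and <bam>.fasta.
bam_to_consensus() {
  local ref bam
  ref=$(printf '%q' "$1")   # shell-escape paths before embedding them in a command string
  bam=$(printf '%q' "$2")
  run "samtools index $bam"
  run "bcftools mpileup -f $ref $bam | bcftools call -c --ploidy 1 | vcfutils.pl vcf2fq > $bam.fq"
  run "seqtk seq -aQ64 $bam.fq > $bam.fasta"
}

# Loop over every BAM in the folder (default: ./bam).
main() {
  local ref="$1" bam_dir="${2:-bam}" bam
  [[ "${DRY_RUN:-0}" == "1" ]] || require_tools
  for bam in "$bam_dir"/*.bam; do
    [[ -e "$bam" ]] || continue   # the glob stays literal when the folder has no BAMs
    bam_to_consensus "$ref" "$bam"
  done
}

if [[ $# -ge 1 ]]; then
  main "$@"
else
  echo "usage: $0 <reference_fasta> [bam_dir]" >&2
fi
```

For example, `bash bam2consensus.sh GCF_009858895.2_ASM985889v3_genomic.fna ./bam` would write `<name>.bam.fq` and `<name>.bam.fasta` next to each BAM, following the README's `<sorted_bamfile>.fq` naming. Prefixing the call with `DRY_RUN=1` prints the exact commands without running them. If `seqpanther nucsubs -c` takes a folder of consensus FASTA files, as `-c consensus` in step 9 suggests, the `.fasta` files would then be copied into that folder.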