Commit
1 parent
a47ddc4
commit e0ab682
Showing
4 changed files
with
116 additions
and
44 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
.. _Sequencing_technologies-page: | ||
|
||
*********************************** | ||
4 Quality of the Mapping | ||
*********************************** | ||
|
||
Introduction to Mapping and tools | ||
================================== | ||
|
||
Once the reads are clean and of good quality, most analyses require aligning them to a reference genome. | ||
Depending on the origin of the sequencing data (WGS, WES, RNA-seq, ChIP-seq, ...) and the downstream analysis, several aligners are available to suit the needs of the analysis: | ||
|
||
- BWA-MEM | ||
- bowtie | ||
- STAR | ||
- | ||
|
||
Before aligning the reads, a reference genome in FASTA format is needed. Typical sources are UCSC, Ensembl or GENCODE. The reference genome is then indexed to create a dictionary-like database of its redundant sequences, which speeds up querying the reads against these regions and minimizes the memory footprint. | ||
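To make the idea of an index concrete, here is a minimal Python sketch of a k-mer lookup table over a toy reference. It is only an illustration of why a precomputed index makes read lookup fast: real aligners such as BWA and Bowtie use a compressed FM-index built from the Burrows-Wheeler transform rather than a plain dictionary. The function names `build_kmer_index` and `find_seed_hits` and the toy sequence are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the 0-based positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def find_seed_hits(index, read, k):
    """Look up the first k bases of the read (the seed) in the index."""
    return index.get(read[:k], [])


# Toy reference, assumed for illustration only.
reference = "ACGTACGTTAGC"
index = build_kmer_index(reference, k=4)

# The seed "ACGT" occurs twice in the toy reference, so two candidate
# positions are returned without scanning the whole sequence again.
hits = find_seed_hits(index, "ACGTT", k=4)
print(hits)
```

The index is built once per reference and reused for every read, which is why the indexing step is run before alignment.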
|
||
SAM format | ||
---------- | ||
|
||
|
||
|
||
BAM QC | ||
=========================== | ||
|
||
Even if the quality control of the reads was satisfactory, some problems only become visible after mapping (low coverage, homopolymer biases, experimental artifacts, etc.). | ||
Most tools that assess mapping quality rely on the MAPQ value, which is the Phred-scaled probability that the alignment is wrong. | ||
|
||
|
||
.. math:: | ||
|
||
   MAPQ = -10 \log_{10}(P) | ||
|
||
where :math:`P` is the probability that the alignment is wrong. | ||
For example, a MAPQ value of 20 means the probability that the alignment is wrong is 1 in 100 (0.01): | ||
|
||
.. math:: | ||
|
||
   MAPQ = -10 \log_{10}(0.01) = 20 | ||
|
||
The higher the MAPQ value, the higher the confidence in the alignment. | ||
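The conversion in both directions can be sketched in a few lines of Python. The function names `error_prob_to_mapq` and `mapq_to_error_prob` are hypothetical helpers written for this example, not part of any aligner. Aligners store MAPQ as an integer, so the values they report are rounded.

```python
import math


def error_prob_to_mapq(p):
    """Phred-scale an error probability: MAPQ = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


def mapq_to_error_prob(mapq):
    """Invert the Phred scale: P = 10 ** (-MAPQ / 10)."""
    return 10 ** (-mapq / 10)


# Print the error probability implied by a few MAPQ values.
for mapq in (10, 20, 30):
    print(mapq, mapq_to_error_prob(mapq))
```

Each increase of 10 in MAPQ corresponds to a tenfold drop in the probability that the alignment is wrong.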
|
||
The main tools to assess the quality of the mapping are: | ||
|
||
- **SAMStat**: a CLI tool that reports statistics for SAM/BAM files on unmapped, poorly mapped and accurately mapped reads. | ||
.. seealso:: | ||
.. _SAMStat: https://github.com/TimoLassmann/samstat | ||
|
||
|
||
BAM format. Note that the BAM file has to be sorted by chromosomal coordinates. Sorting can be performed with ``samtools sort``.
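As a rough sketch, the sorting step could be driven from Python as shown below. The file names `sample.bam` and `sample.sorted.bam` and the helper `samtools_sort_command` are assumptions made for this example. The command itself relies on the default behaviour of ``samtools sort``, which orders alignments by coordinate, and on its ``-o`` option for the output file. The command is only executed when ``samtools`` is on the ``PATH`` and the input file exists.

```python
import os
import shutil
import subprocess


def samtools_sort_command(input_bam, output_bam):
    """Build the argument list for a coordinate sort with samtools."""
    return ["samtools", "sort", "-o", output_bam, input_bam]


# Hypothetical file names, used for illustration only.
cmd = samtools_sort_command("sample.bam", "sample.sorted.bam")
print(" ".join(cmd))

# Run the command only if samtools is installed and the input is present.
if shutil.which("samtools") and os.path.exists("sample.bam"):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.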