update 1 2 3
FabianAndradeLozano committed Sep 9, 2024
1 parent 1d8268e commit df90eb8
Showing 4 changed files with 47 additions and 47 deletions.
60 changes: 30 additions & 30 deletions docs/1- Library preparation.rst
@@ -141,68 +141,68 @@ On this section are presented the main source of bias in RNA-seq, and the soluti

1. Degradation of RNA:

   .. tip::
      Minimizing sample processing and freeze-thaw cycles helps preserve the RNA as well as possible.

2. RNA extraction:

   .. tip::
      If possible, use high-concentration RNA samples or avoid TRIzol extraction altogether.

**Library Construction**
-------------------------

1. **Low-quality and/or low-quantity RNA samples**:

   .. tip::
      The RNase H method has been reported as the best approach for low-quality RNA and could even effectively replace the standard oligo(dT)-based RNA-seq method.
      For low-quantity RNA, the SMART and NuGEN approaches had lower duplication rates and significantly decreased the amount of starting material required compared to other methods.

2. **mRNA enrichment bias**: In eukaryotes, enriching for polyadenylated transcripts with oligo(dT) primers removes all non-poly(A) RNAs, such as replication-dependent histone mRNAs, lncRNAs lacking a poly(A) tail, and incomplete mRNAs.

   .. tip::
      Depleting rRNA instead does not restrict the library to mRNA molecules and may capture more immature transcripts, which increases the complexity of the sequencing data (it is also more expensive).
      Subtractive hybridization using rRNA-specific probes has been reported as the method that introduces the least bias in relative transcript abundance.

3. **RNA fragmentation bias**: There are two major approaches to RNA fragmentation: chemical (using metal ions) and enzymatic (using RNase III). This step can introduce length biases or errors, which are then propagated to later cycles.

   .. tip::
      Studies have shown that methods based on non-specific restriction endonucleases show less sequence bias and perform similarly to physical methods. Enzymatic methods are also easy to automate.

4. **Primer bias**: Priming reverse transcription into cDNA with random hexamers can skew the nucleotide content of RNA sequencing reads, resulting in low complexity of the RNA sequencing data.

   .. tip::
      This bias can be avoided by using the Illumina Genome Analyzer, which performs the reverse transcription directly on the flowcell, avoiding PCR.
      A bioinformatics reweighting scheme has also been proposed to adjust for the bias and make the distribution of the reads more uniform.

5. **Adapter ligation bias**: Adapter ligation introduces a significant but widely overlooked bias in the results of NGS small RNA sequencing.

   .. tip::
      As a solution, several groups propose randomizing the 3' end of the 5' adapter and the 5' end of the 3' adapter.
      The strategy is based on the hypothesis that a population of degenerate adapters would average out the sequencing bias, because the slightly different adapter molecules would form stable secondary structures with a more diverse population of RNA sequences.

6. **Reverse transcription bias**: A known feature of reverse transcriptases is that they tend to produce false second-strand cDNA through their DNA-dependent DNA polymerase activity. This can make it impossible to distinguish sense from antisense transcripts and complicates the data analysis.

   .. tip::
      - In the deoxyuridine triphosphate (dUTP) method, one of the leading cDNA-based strategies, the dUTP-marked second strand can be specifically removed by enzymatic digestion.
      - Another method is to synthesize the first strand of cDNA using a labeled random hexamer primer and the second strand (SSS) using a DNA-RNA template-switching primer.
      - Actinomycin D, a compound that specifically inhibits DNA-dependent DNA synthesis, has been proposed as an agent to eliminate antisense artifacts.

7. **PCR amplification bias**: PCR is the main source of artifacts and base-composition bias during library construction:

7.1. Extremely AT/GC-rich fragments: GC-neutral fragments can be amplified more efficiently than GC-rich or AT-rich fragments.

.. tip::
   - Through the use of custom adapters, samples can be hybridized directly to the oligonucleotides on the flowcell surface without amplification and ligation, thus avoiding PCR biases and duplicates.
   - However, the amplification-free method requires a high sample input, which limits its wide use. Among the PCR-enhancing additives currently in use, betaine is the most effective.
     It is an amino acid mimic that balances the difference in melting temperature (Tm) between AT and GC base pairs and has been used effectively to improve the coverage of GC-rich templates.
   - The presence of tetramethylammonium chloride (TMAC) has been shown to remarkably increase the amplification of AT-rich regions with Kapa HiFi. Additionally,
     a number of additives have been reported to play an important role in reducing PCR amplification bias, including small amides such as formamide, small sulfoxides such as dimethyl sulfoxide (DMSO),
     and reducing compounds such as β-mercaptoethanol or dithiothreitol (DTT).

7.2. PCR cycles: PCR amplifies DNA/cDNA templates exponentially, so amplification bias increases significantly with the number of PCR cycles.

.. tip::
   It is recommended to perform PCR with as few cycles as possible to mitigate bias.
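
To make the effect of the cycle number concrete, the following is a minimal sketch of a toy model. It is an assumption made for illustration and is not taken from the cited publications: each fragment type is given a fixed per-cycle efficiency ``e``, so it grows by a factor of ``(1 + e)`` per cycle. The function name ``relative_bias`` and the efficiency values are hypothetical.

```python
def relative_bias(eff_a: float, eff_b: float, cycles: int) -> float:
    """Return the fold over-representation of fragment A relative to B.

    Toy model (assumption): a fragment with per-cycle efficiency e
    (0 <= e <= 1) grows by a factor of (1 + e) in every PCR cycle.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return ((1.0 + eff_a) / (1.0 + eff_b)) ** cycles


if __name__ == "__main__":
    # Hypothetical efficiencies: GC-neutral fragment 0.95, GC-rich fragment 0.80.
    for n in (5, 10, 15, 20):
        print(f"{n:2d} cycles: {relative_bias(0.95, 0.80, n):.2f}-fold")
```

Under this simplified model, a small per-cycle difference in efficiency compounds with every cycle, which is consistent with the advice to keep the cycle number low.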

.. seealso::
   For more information, see the publications `Library preparation methods for next generation sequencing: Tone down the bias <http://dx.doi.org/10.1016/j.yexcr.2014.01.008>`_ and `Bias in RNA-seq Library Preparation: Current Challenges and Solutions <https://doi.org/10.1155/2021/6647597>`_.
14 changes: 7 additions & 7 deletions docs/2- Sequencing technologies.rst
@@ -104,15 +104,16 @@ For Illumina Paired end sequencing, two FASTQ files are generated, one for each

For each read, the information is divided into four lines:

1. Sequence identifier: starts with '@' and contains information about the read, such as the instrument, run ID, flow cell ID, lane, tile, x and y coordinates, and read number.

   .. note::
      The '@' symbol cannot be used to count the number of reads, because it can also appear as a quality score symbol.

2. Sequence: the nucleotide sequence of the read.

3. Quality identifier: starts with '+' and may repeat the information in the sequence identifier. It may also be empty, and in some cases it is used for metadata.

4. Quality scores: the Phred quality score for each base in the read. The Phred quality score is a measure of the quality of the base call,

.. math::

   Q = -10 \log_{10}(P)
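
The four-line layout and the Phred formula can be sketched in a few lines of Python. This is a minimal illustration, not a full FASTQ parser: it assumes exactly four lines per read (no wrapped sequences) and the Phred+33 ASCII encoding, which is not stated above. The names ``FastqRecord``, ``read_fastq``, ``phred_scores`` and the ``EXAMPLE`` data are hypothetical.

```python
import io
import math
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # line 1 without the leading '@'
    sequence: str    # line 2
    comment: str     # line 3 without the leading '+'
    quality: str     # line 4, one ASCII character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per read."""
    while True:
        header = handle.readline()
        if header == "":
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header.rstrip("\n")[1:], seq, plus[1:], qual)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a quality string into Phred scores (Phred+33 is an assumption)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: float) -> float:
    """Invert Q = -10 * log10(P) to obtain P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def phred_from_probability(p: float) -> float:
    """Apply Q = -10 * log10(P)."""
    return -10.0 * math.log10(p)


# Made-up data: the quality line of the first read starts with '@',
# so counting lines that start with '@' would overestimate the reads.
EXAMPLE = "@read1\nACGT\n+\n@II5\n@read2\nGGCA\n+\nIIII\n"

if __name__ == "__main__":
    records = list(read_fastq(io.StringIO(EXAMPLE)))
    print("reads:", len(records))
    print("lines starting with '@':",
          sum(line.startswith("@") for line in EXAMPLE.splitlines()))
```

Counting records (or dividing the number of lines by four) is the safer way to count reads, since the quality line of a read may itself begin with '@'.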
@@ -127,4 +128,3 @@ For each read, the information it's divided in four lines:




20 changes: 10 additions & 10 deletions docs/3- Quality Control and Preprocessing.rst
@@ -29,7 +29,7 @@ the extensions supported are:

It can be used through the CLI or the GUI. It generates an HTML report for each file, divided into the following quality metric modules:

1. **Basic Statistics**: Displays information about the file, the number and length of the sequences, and the overall %GC.

2. **Per base sequence quality**: Shows how the quality score (y axis) varies along the sequence reads (x axis).
   For each position a box-and-whisker plot is displayed; the red line represents the median and the blue line the mean.
@@ -70,9 +70,9 @@ It is available for usage by CLI and GUI. It generates a html report for each fi
:align: center
:alt: *Per Sequence GC Content FASTQC module*

   .. danger::
      If the GC content distribution is not close to a normal distribution, or shows more than one peak, this could indicate contamination or a problem in the library preparation.
      GC content also varies between organisms, so if possible find out the GC content of the organism of interest beforehand and avoid comparing it with the human-modelled distribution.
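
As an illustration of the quantity this module plots, the sketch below computes the %GC of each read and counts how many reads fall on each rounded value. It is a simplified stand-in written for this guide, not FastQC's own implementation; the function names are hypothetical, and ignoring ambiguous bases such as N is an assumption.

```python
from collections import Counter


def gc_percent(sequence: str) -> float:
    """Return the %GC of one read, ignoring ambiguous bases such as N."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    return 100.0 * gc / called if called else 0.0


def gc_histogram(sequences: list[str]) -> dict[int, int]:
    """Count reads per rounded %GC value (0 to 100)."""
    counts = Counter(round(gc_percent(s)) for s in sequences)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up reads, for illustration only.
    reads = ["ACGT", "GGCC", "ATAT", "AGCT"]
    for gc, n in gc_histogram(reads).items():
        print(f"{gc:3d}% GC: {n} read(s)")
```

A histogram with a single peak near the expected GC content of the organism is the pattern one hopes to see; a second peak is the kind of signal the warning above refers to.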

7. **Per Base N content**: If the sequencer is unable to determine the base in a position, it will be represented as an 'N'. This section shows the distribution of Ns in the reads.
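
The per-position distribution of Ns can be sketched as follows. This is a minimal illustration under the assumption that reads may have different lengths, so each position is normalized by the number of reads that actually reach it; it is not FastQC's own code, and the function name is hypothetical.

```python
def per_base_n_content(sequences: list[str]) -> list[float]:
    """Return the percentage of 'N' calls at each read position."""
    if not sequences:
        return []
    max_len = max(len(s) for s in sequences)
    n_counts = [0] * max_len  # number of Ns seen at each position
    totals = [0] * max_len    # number of reads covering each position
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            totals[i] += 1
            if base == "N":
                n_counts[i] += 1
    return [100.0 * n / t for n, t in zip(n_counts, totals)]


if __name__ == "__main__":
    # Made-up reads, for illustration only.
    for pos, pct in enumerate(per_base_n_content(["ACGN", "NCGT", "AC"]), start=1):
        print(f"position {pos}: {pct:.1f}% N")
```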

@@ -125,8 +125,8 @@ Also, other sources of contaminats could be checked:

- PhiX: a control used by Illumina to check the quality of the sequencing run (whether the library is under- or overloaded).
- rRNA: in RNA-seq, a good control for rRNA depletion during library preparation.
- Lambda: cloning vector.
- Vectors: other vectors used during library preparation, to check that they have not been amplified.
- Adapters

Example of a FASTQ-Screen report:
@@ -155,10 +155,10 @@ the reads need to be pre-processed in order to get rid of them and improve quali

Typical tools used for pre-processing are:

- Trimmomatic `<http://www.usadellab.org/cms/index.php?page=trimmomatic>`_.
- Cutadapt: primarily removes adapters (often used in combination with Sickle for quality trimming) and requires the adapter sequence to be known `<https://cutadapt.readthedocs.io/en/stable/>`_.
- Sickle: removes low-quality bases from read tails `<https://github.com/najoshi/sickle>`_.
- fastp `<https://github.com/OpenGene/fastp>`_.
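
To show what removing low-quality tail bases means in practice, here is a deliberately naive sketch that cuts bases from the 3' end of a read until it meets one at or above a threshold. It is not the algorithm implemented by any of the tools listed above; the function name, the default threshold of 20 and the Phred+33 offset are assumptions chosen for illustration.

```python
def trim_low_quality_tail(sequence: str, quality: str,
                          min_q: int = 20, offset: int = 33) -> tuple[str, str]:
    """Drop 3' bases until the last remaining base has a score >= min_q."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    end = len(sequence)
    # Walk backwards from the 3' end while the score is below the threshold.
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return sequence[:end], quality[:end]


if __name__ == "__main__":
    # Made-up read: the last two bases have very low scores ('#' and '!').
    seq, qual = trim_low_quality_tail("ACGTAC", "IIII#!")
    print(seq, qual)
```

Only the tail is inspected here, so a low-quality base in the middle of a read is left untouched.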


fastp performs all of the following corrections in a single tool:
Binary file modified docs/images/fastq_format.png
