RNA editing is a co-/post-transcriptional mechanism involving the insertion/deletion or substitution of specific bases in precise RNA localizations. Substitutional RNA editing, mostly by the adenosine to inosine (A-to-I) deamination, is prominent in mammals. It has profound functional consequences and its deregulation has been linked to a variety of human diseases including neurological and neurodegenerative disorders or cancer. RNA editing can be profiled in deep transcriptome data but its detection is not trivial. Indeed the number of A-to-I candidates as well as the accuracy of predictions depends on the adopted computational strategy. Individual steps or specific software used can dramatically alter the quality of results. Quantifying RNA editing is relevant to compare independent samples and study its potential role in different experimental conditions. Here we provide simple scripts to calculate different metrics for RNA editing quantification. Although a variety of programs to identify RNA editing candidates in RNAseq data has been released, we will profile inosinomes using REDItools and REDIportal.
The bioinformatics pipeline used to investigate RNA editing in RNAseq datasets is depicted in the following Figure. Below each step, we report main bioinformatics tools used, whose commad lines are included in this repository.
RNA editing candidates can be detected using REDItools. There are two current versions: 1) REDItools 1.3 or 2) REDItools 2.0. REDItools2 is a faster re-implementation of REDItools1 for HPC clusters. Its serial version is about ten times faster than REDItools1. The complete workflow for detecting de novo RNA editing events with REDItools is described here.
While the de novo approach provides a list of most likely editing candidates, the "known" approach focuses on a limited pool of known events in order to better investigate RNA editing dynamics in different experimental contexts. The "known" approach can be carried out using the REDItools package and a list of events from own data or from public databases such as DARNED, RADAR and REDIportal.
Hyper editing can be detected through a specific computational protocol in which not aligned sequences are rescued and mapped again onto a transformed genome replacing As with Gs, described in detail in Porath et al. (2014). The computational pipeline is freely available here.
The quantification of RNA editing is important to compare values across samples and study its potential role in different experimental conditions or in human disorders.
The overall editing is defined as the total number of reads with G at all known editing positions over the number of all reads covering the positions without imposing specific sequencing coverage criteria. It can be calculated using REDItools tables obtained imposing loosing parameters.
Download REDIportal annotations
wget http://srv00.recas.ba.infn.it/webshare/rediportalDownload/table1_full.txt.gz
gunzip table1_full.txt.gz
Index your REDItools output table by tabix
bgzip outTable.txt
tabix -s 1 -b 2 -e 2 -c R outTable.txt.gz
Run the overall script on a REDItools table
python getOverallEditing.py outTable.txt.gz table1_full.txt
The Alu editing index (AEI) is a metric to quantify the global RNA editing activity of sample and is defined as the weighted average of editing events occurring in all Alu elements. The pipeline to calculate AEI is described in Roth et al. (2019) and available here.
Install the RNAEditingIndex program following the instructions provided here and create a directory containing all BAM files.
RNAEditingIndex -d ALLBAMS -o ALLBAMS/indexer -os ALLBAMS --genome hg38 --paired_end -f .bam
The overall editing calculated at recoding positions residing in coding protein genes is named recoding index (REI). It has been initially described in Silvestris et al. (2019). This metric, used to investigate the activity of ADAR2, can be calculated using REDItools tables obtained imposing loosing parameters and a list of recoding sites from REDIportal.
Download REDIportal annotations
wget http://srv00.recas.ba.infn.it/webshare/rediportalDownload/table1_full.txt.gz
gunzip table1_full.txt.gz
Select recoding events
grep "nonsynonymous" table1_full.txt > recoding.txt
Index your REDItools output table by tabix
bgzip outTable.txt
tabix -s 1 -b 2 -e 2 -c R outTable.txt.gz
Run the REI script on a REDItools table
python getREI.py -i outTable.txt.gz -r recoding.txt
The identification of differential RNA editing is still an open question. Nonetheless, dysregulated RNA editing at recoding events can be calculated employing the Mann-Whitney U-test described in Silvestris et al. (2019) or the statistical pipeline proposed by Tran et al. (2019) Both pipelines are embedded with the get_DE_events.py script.
Prepare a comma separated sample informations file (e.g ArteryTibial_vs_BrainCerebellum.sif) required as input by the get_DE_events.py script. Run sample_status_file_creator.py providing:
- A csv sample file containing the main informations about each sample to be used in the experiment.
An example file is provided here. - A name for Samples group1 (e.g. ArteryTibial)
- A name for Samples group2 (e.g BrainCerebellum)
sample_status_file_creator.py csv_input_file, sample_group1_name, sample_group2_name
Create a confortable workdir (e.g. ArteryTibial_vs_BrainCerebellum) and enter it
mkdir ArteryTibial_vs_BrainCerebellum && cd ArteryTibial_vs_BrainCerebellum
Run sample_path_folder_creator.py that will copy the Reditools tables in different directories following the sample/Group subdivisions reported in the sample informations file (.sif).
python sample_path_folder_creator.py csv_sample_file.sif
Note. The script assumes that REDItools outputs (e.g. SRR1071289, SRR1101591) are contained in a "tables" folder in your main working directory, otherwise modify the last line of the script accordingly.
Run the get_DE_events.py script (Mann-Whitney U-test) on multiple REDItools tables following the sample/Group subdivions reported in the sample informations file (.sif). The option -sig yes in combination with -cpval 2 (BH correction), returns only significantly edited positions. MtsA and mtsB, represents the minimum threshold of samples per group on which the statistical tests are applied.
python get_DE_events.py -input_file ArteryTibial_vs_BrainCerebellum.sif -cpval 2 -mtsA 25.0 -mtsB 21.0 -sig yes
Alternatively, run the get_DE_events.py script on the same samples applying the the statistical pipeline proposed by Tran et al. (2019)
python get_DE_events.py -linear -input_file ../ArteryTibial_vs_BrainCerebellum.sif
A detailed explanation, about the get_DE_events.py script options can be found here.