An R Shiny application for detecting and annotating hidden sources of variation in single cell RNA-seq data
Authors: Nathan Lawlor, Eladio J. Marquez, Donghyung Lee, and Duygu Ucar
App Maintainer: Nathan Lawlor
App Location: https://vsva.jax.org
Manuscript now published in Bioinformatics! For more information on our Shiny app, please see: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa128/5771333
Single cell RNA-sequencing (scRNA-seq) technology enables studying gene expression programs from individual cells. However, these data are subject to diverse sources of variation, including “unwanted” variation that needs to be removed in downstream analyses (e.g., batch effects) and “wanted” or biological sources of variation (e.g., variation associated with a cell type) that needs to be precisely described. Surrogate variable analysis (SVA) based algorithms, are commonly used for batch correction and more recently for studying “wanted” variation in scRNA-seq data. We recently developed Iteratively Adjusted-Surrogate Variable Analysis (IA-SVA), which can effectively estimate surrogate variables associated with hidden variation (wanted or unwanted) in scRNA-seq data. However, interpreting whether these variables are biologically meaningful or stemming from technical reasons remains a challenge.
To facilitate the interpretation of surrogate variables detected by algorithms including IA-SVA, SVA, or ZINB-WaVE, we developed an R Shiny application (Visual Surrogate Variable Analysis (V-SVA)) that provides a web-browser interface for the identification and annotation of hidden sources of variation in scRNA-seq data. This interactive framework includes tools for discovery of genes associated with detected sources of variation, gene annotation using publicly available databases and gene sets, and data visualization using dimension reduction methods.
For an in depth guide of how to use this shiny app, please see the Word Document at: https://github.com/nlawlor/V-SVA/blob/master/Data/VSVA_Shiny_App_Manual.docx
- The app requires first a matrix of gene expression data (rows as genes, columns as samples). This file may be formatted as a tab-delimited text file, .CSV file, or .Rds object.
An example 10X Genomics single cell RNA-seq expression matrix obtained from Kang et al. 2017 which contains a subset of 200 human peripheral blood mononuclear cells (PBMCs) is provided at: https://github.com/nlawlor/V-SVA/blob/master/Data/test.exp.txt and can be used to test the app.
This matrix, when loaded into R:
df <- read.delim("Data/test.exp.txt", header = T, check.names = F, stringsAsFactors = F, row.names = 1)
df[1:4, 1:4]
- (Optional) Users may also provide a matrix of sample metadata (rows as sample names, columns as traits). Like the expression matrix, this file may be formatted as a tab-delimited text file, .CSV file, or .Rds object.
An example metadata file for the above expression matrix can be found at: https://github.com/nlawlor/V-SVA/blob/master/Data/test.metadata.txt.
This matrix, when loaded into R:
meta <- read.delim("Data/test.metadata.txt", header = T, check.names = F, stringsAsFactors = F, row.names = 1)
meta[1:4, ]
Please Note: The sample identifiers contained in the rows of the metadata matrix should be the same as the identifiers provided in the columns of the expression matrix
- Identify hidden sources of variability in transcriptomic data:
- Discover marker genes associated with a variable of interest:
- Use molecular pathways/genesets to annotated sources of variation:
- Dimension reduction and interactive visualization of data:
- V-SVA is currently hosted on dedicated The Jackson Laboratory Shiny servers (two 24 Core and 192GB RAM servers) and is freely available here.
Our method has been published in Bioinformatics as an open access application note! To read more, please find the article which is freely available here: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa128/5771333