Skip to content

Latest commit

 

History

History
157 lines (110 loc) · 6.31 KB

6.Tcell_Trajectory.md

File metadata and controls

157 lines (110 loc) · 6.31 KB

Basic Pipeline for scRNAseq Data Analysis: Trajectory Analysis

Instructors : Somi Kim, Eunseo Park, Donggon Cha 2021/07/06

Preprocessing

Extracting Specific Cell Type

After specifying cell types, it is able to extract specific cells for further analysis. For efficient downstream analysis, genes without detected UMIs which has values of zeros is removed.

#extract T cell population
seurat_t <- seurat[, seurat$celltype == "T.cell"]
seurat_t <- seurat_t[rowSums(seurat_t@assays$originalexp@counts) != 0, ]

Since the population of interest is changed specifically into T cells, expression data is normalized again in cell-level.

sce_t <- as.SingleCellExperiment(seurat_t)

clusters <- quickCluster(sce_t)
sce_t <- computeSumFactors(sce_t, clusters = clusters)
sce_t <- logNormCounts(sce_t)

Highly variable genes are also changed specifically in T cell population.

#HVG selection for T cell
dec <- modelGeneVar(sce_t)
plot(dec$mean, dec$total, xlab="Mean log-expression", ylab="Variance")
curve(metadata(dec)$trend(x), col="blue", add=TRUE)

hvg.t <- getTopHVGs(dec, fdr.threshold = 0.05)
length(hvg.t) # 200 genes
## [1] 200

Trajectory Analysis

Here, we will infer the trajectory of extracted T cell population, using monocle3 package.

Generate CDS Object

It uses differently structured object named cell_data_set (cds), so normalized expressions, metadata for cells, and metadata for genes shoud be recombined for creating cds.

library(monocle3)
cell_metadata = colData(sce_t)
gene_metadata = data.frame(gene_short_name = rownames(sce_t), row.names = rownames(sce_t))

cds <- new_cell_data_set(sce_t@assays@data$logcounts,
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_metadata) # generate cell_data_set object

Dimension reduction for CDS Object

monocle3 also allows dimension reduction using hvgs. As we import normalized count in cds object, we preprocess the object without additional normalization.

cds <- preprocess_cds(cds, "PCA", num_dim = 50, norm_method = "none", use_genes = hvg.t)
monocle3::plot_pc_variance_explained(cds)

cds <- reduce_dimension(cds, reduction_method = "UMAP")
plot_cells(cds, color_cells_by = "ID", cell_size = 1, group_label_size = 5)

Correcting Batch Effects

As we observe batch effects above, they should be corrected before further trajectory analysis. Here we perform Mutual Nearest Neighbor (MNN) batch effect correction implemented batchelor, which is included in monocle3 package.

cds_aligned <- preprocess_cds(cds, "PCA", num_dim = 30, norm_method = "none", use_genes = hvg.t)
cds_aligned <- align_cds(cds_aligned, alignment_group = "ID") #batch correction
cds_aligned <- reduce_dimension(cds_aligned, preprocess_method = "Aligned")
plot_cells(cds_aligned,
           color_cells_by = "ID",
           cell_size = 1,
           label_cell_groups = FALSE)

Clustering and Trajectory Inference

Known markers genes allows us to check the brief estimated trajectory, and to compare a trajectory after further analysis. We bring naive (CCR7), CD4, and CD8 T cell marker expressions.

plot_cells(cds_aligned,
           genes = c("CCR7", "CD4", "CD8A"),
           cell_size = 1,
           norm_method = "size_only")

Further clustering and trajectory inference can be done by functions below. Clustering by cluster_cells() is resolution-sensitive, and the clustering result affects further trajectory lines calculated by learn_graph(). By specifying the starting node of the learned graph from the interactive user interfaces after running order_cells(), pseudotime is calculated.

cds_aligned <- cluster_cells(cds_aligned, resolution = 0.001)
cds_aligned <- learn_graph(cds_aligned)
cds_aligned <- order_cells(cds_aligned)

Estimated clusters, trajectory, and pseudotimesis plotted as below.

plot_cells(cds_aligned,
           color_cells_by = "cluster",
           cell_size = 1,
           group_label_size = 5,
           label_leaves = FALSE,
           label_branch_points = FALSE)

plot_cells(cds_aligned,
           color_cells_by = "pseudotime",
           cell_size = 1,
           label_groups_by_cluster = FALSE,
           label_leaves = FALSE,
           label_branch_points = FALSE)

References

L. Ma, M.O. Hernandez, Y. Zhao, M. Mehta, B. Tran, M. Kelly, Z. Rae, J.M. Hernandez, J.L. Davis, S.P. Martin, D.E. Kleiner, S.M. Hewitt, K. Ylaya, B.J. Wood, T.F. Greten, X.W. Wang. Tumor cell biodiversity drives microenvironmental reprogramming in liver cancer. Canc. Cell, 36 (4): 418-430 (2019)

Lun, A. T. L. et al. EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data. Genome Biol. 20, 63 (2019)

McCarthy, D. J., Campbell, K. R., Lun, A. T. & Wills, Q. F. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017)

Lun, A. T., McCarthy, D. J. & Marioni, J. C. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res 5, 2122 (2016).

Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).

Haghverdi L, Lun ATL, Morgan MD, Marioni JC (2018). 'Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.' Nat. Biotechnol., 36(5), 421-427. doi: 10.1038/nbt.4091