Basic Pipeline for scRNAseq Data Analysis: SingleR automatic cell annotation

Instructors: Somi Kim, Eunseo Park, Donggon Cha (2021/07/06)

Load data

A Seurat object must be loaded before running automatic cell annotation.

To annotate cells automatically, we use the SingleR R package. Before running SingleR, we build a SingleCellExperiment (SCE) object from the raw counts and the normalized counts stored in the Seurat object. As the reference, we load the Human Primary Cell Atlas from the celldex package; it is represented as a SummarizedExperiment object containing a matrix of log-expression values with sample-level labels.

library(SingleR)
library(celldex)
library(SingleCellExperiment)
library(dplyr)
library(Seurat)

sce <- SingleCellExperiment(list(counts = seurat@assays$originalexp@counts,
                                 logcounts = seurat@assays$originalexp@data))

hpca.se <- HumanPrimaryCellAtlasData()
hpca.se
## class: SummarizedExperiment 
## dim: 19363 713 
## metadata(0):
## assays(1): logcounts
## rownames(19363): A1BG A1BG-AS1 ... ZZEF1 ZZZ3
## rowData names(0):
## colnames(713): GSM112490 GSM112491 ... GSM92233 GSM92234
## colData names(3): label.main label.fine label.ont

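We use the hpca.se reference to annotate each cell in the SCE object with the SingleR() function. SingleR identifies marker genes from the reference, computes an assignment score for every cell and every label from the Spearman correlation across those markers, and assigns each cell the label with the highest score. In the output, first.labels is the label before fine-tuning, labels is the label after fine-tuning, and pruned.labels is the final label with low-confidence assignments replaced by NA.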

pred.hesc <- SingleR(test = sce, ref = hpca.se, assay.type.test=1,
                     labels = hpca.se$label.main)
head(pred.hesc)
## DataFrame with 6 rows and 5 columns
##                                            scores      first.labels
##                                          <matrix>       <character>
## AAACCTGAGGCGTACA-1 0.183971:0.116038:0.104517:... Tissue_stem_cells
## AAACGGGAGATCGATA-1 0.222493:0.142291:0.137941:... Tissue_stem_cells
## AAAGCAAAGATCGGGT-1 0.275393:0.155992:0.149990:...       Fibroblasts
## AAATGCCGTCTCAACA-1 0.220968:0.133970:0.122140:... Tissue_stem_cells
## AACACGTCACGGCTAC-1 0.156163:0.138000:0.115655:... Endothelial_cells
## AACCGCGAGACGCTTT-1 0.289759:0.176489:0.156293:... Tissue_stem_cells
##                          tuning.scores              labels       pruned.labels
##                            <DataFrame>         <character>         <character>
## AAACCTGAGGCGTACA-1 0.1433232:0.0623682   Tissue_stem_cells   Tissue_stem_cells
## AAACGGGAGATCGATA-1 0.1010954:0.0346291   Tissue_stem_cells   Tissue_stem_cells
## AAAGCAAAGATCGGGT-1 0.0969523:0.0856622 Smooth_muscle_cells Smooth_muscle_cells
## AAATGCCGTCTCAACA-1 0.2313040:0.1259849   Tissue_stem_cells   Tissue_stem_cells
## AACACGTCACGGCTAC-1 0.2704928:0.2038585   Endothelial_cells   Endothelial_cells
## AACCGCGAGACGCTTT-1 0.3468541:0.2477355   Tissue_stem_cells   Tissue_stem_cells
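
The scoring idea described above can be sketched on toy data with base R. This is only an illustration, not the SingleR implementation: the gene names, reference profiles, and test cells below are made up, and the sketch skips marker detection, the aggregation over many reference samples per label, and the fine-tuning step that SingleR performs.

```r
# Toy illustration of correlation-based label assignment.
# All values are hypothetical; this is NOT how SingleR is implemented internally.
genes <- paste0("gene", 1:6)

# Hypothetical reference: one log-expression profile per cell type.
ref <- cbind(
  T_cells     = c(5, 4, 0, 0, 1, 0),
  B_cell      = c(0, 1, 5, 4, 0, 1),
  Hepatocytes = c(0, 0, 1, 0, 5, 4)
)
rownames(ref) <- genes

# Hypothetical test cells, each loosely resembling one reference profile.
test <- cbind(
  cell1 = c(6, 3, 0, 1, 1, 0),
  cell2 = c(1, 0, 4, 6, 0, 2),
  cell3 = c(0, 1, 0, 1, 7, 3)
)
rownames(test) <- genes

# Spearman correlation between every test cell (rows) and every label (columns).
scores <- cor(test, ref, method = "spearman")

# Each cell receives the label with the highest score.
toy_labels <- colnames(scores)[max.col(scores, ties.method = "first")]
toy_labels
```

Because Spearman correlation depends only on ranks, any per-cell monotonic transformation of the test values (such as log-normalization) leaves these scores unchanged.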

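The number of cells assigned to each label is shown below, followed by a check that the predictions are in the same cell order as the Seurat object.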

table(pred.hesc$labels)
## 
##               B_cell         Chondrocytes                  CMP 
##                  649                   10                    1 
##                   DC    Endothelial_cells     Epithelial_cells 
##                   48                  885                  138 
##         Erythroblast          Fibroblasts                  GMP 
##                    4                   72                    5 
##          Hepatocytes            iPS_cells           Macrophage 
##                  480                    1                   96 
##             Monocyte                  MSC Neuroepithelial_cell 
##                  219                    3                    1 
##              NK_cell          Osteoblasts     Pre-B_cell_CD34- 
##                  133                    3                   20 
##     Pro-B_cell_CD34+  Smooth_muscle_cells              T_cells 
##                    3                  135                 1330 
##    Tissue_stem_cells 
##                  513
table(pred.hesc$labels) %>% sum()
## [1] 4749
identical(colnames(seurat), rownames(pred.hesc))
## [1] TRUE
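
The identical() check matters because the labels are later copied into the Seurat object by position. If that check ever returned FALSE, one option would be to reorder the predictions by cell name first. A minimal sketch with made-up barcodes and labels (the names cells, pred, and aligned are hypothetical and not part of the tutorial's objects):

```r
# Hypothetical cell barcodes, in the order used by the Seurat object.
cells <- c("AAAC-1", "AAAG-1", "AACA-1")

# Hypothetical predictions stored in a different order.
pred <- data.frame(
  labels = c("B_cell", "T_cells", "Monocyte"),
  row.names = c("AACA-1", "AAAC-1", "AAAG-1"),
  stringsAsFactors = FALSE
)

identical(cells, rownames(pred))  # FALSE: positional assignment would mislabel cells

# Reorder the predictions so that row i corresponds to cells[i].
aligned <- pred[match(cells, rownames(pred)), , drop = FALSE]
identical(cells, rownames(aligned))  # TRUE after reordering
aligned$labels
```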

To compare the automatic annotation with our manual annotation, we store the SingleR labels in the Seurat object's metadata and plot both annotations on the UMAP. The first plot is colored by the SingleR labels and the second by the manual cell types.

seurat$singler_labels <- pred.hesc$labels

DimPlot(seurat, 
        reduction = "umap",
        group.by = "singler_labels", 
        label=TRUE)

DimPlot(seurat, 
        reduction = "umap",
        group.by = "celltype", 
        label=TRUE)
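
Besides comparing the two UMAP plots by eye, the agreement between the annotations can be summarized as a contingency table. The sketch below uses made-up label vectors in place of seurat$singler_labels and seurat$celltype; the manual cell type names are hypothetical, since they are not listed in this tutorial.

```r
# Hypothetical per-cell labels standing in for the two metadata columns.
singler_labels <- c("T_cells", "T_cells", "B_cell", "Hepatocytes", "NK_cell", "T_cells")
celltype       <- c("T cell",  "T cell",  "B cell", "Hepatocyte",  "T cell",  "T cell")

# Cross-tabulate manual annotation (rows) against SingleR labels (columns).
tab <- table(manual = celltype, singler = singler_labels)
tab

# Fraction of each manual cell type assigned to each SingleR label.
round(prop.table(tab, margin = 1), 2)

# Most frequent SingleR label within each manual cell type.
top <- apply(tab, 1, function(x) names(which.max(x)))
top
```

Rows dominated by a single column indicate cell types on which the two annotations agree; rows spread over several columns point to populations worth inspecting more closely.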

References

Ma, L., Hernandez, M.O., Zhao, Y., Mehta, M., Tran, B., Kelly, M., Rae, Z., Hernandez, J.M., Davis, J.L., Martin, S.P., Kleiner, D.E., Hewitt, S.M., Ylaya, K., Wood, B.J., Greten, T.F., Wang, X.W. Tumor cell biodiversity drives microenvironmental reprogramming in liver cancer. Cancer Cell 36(4), 418-430 (2019).

Aran, D., Looney, A.P., Liu, L., Wu, E., Fong, V., Hsu, A., ... Bhattacharya, M. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nature Immunology 20(2), 163-172 (2019).

Mabbott, N.A., Baillie, J.K., Brown, H., et al. An expression atlas of human primary cells: inference of gene function from coexpression networks. BMC Genomics 14, 632 (2013).