Spatial transcriptomics extends conventional transcriptomics by adding a spatial dimension to gene expression analysis. Conventional methods can lose the spatial context of each transcript, whereas spatial transcriptomics combines high-throughput RNA sequencing with histological imaging to map transcripts back to their original locations in a tissue sample. This is particularly useful in complex studies such as cancer research, where the spatial organization of cells is crucial. Platforms such as 10x Genomics Visium are commonly used for this and provide genome-wide, spatially resolved expression profiles. With this technique, researchers can study tissue microenvironments, cell-cell interactions, and other questions that require a spatially resolved view of gene expression.
The analysis is built around the data format defined by the SpatialExperiment class. The structure of this data format, shown below, is used throughout the different analyses. With this class, data from spatial sequencing platforms (e.g. 10x Genomics Visium) can be stored in a single object at the point of analysis.
The full pipeline and code will be provided in a separate R Markdown file. Its release is delayed until after the method's publication, in compliance with data privacy considerations, because the work involves novel approaches to cell type annotation for spatial transcriptomics data. These approaches are expected to introduce new perspectives on manual annotation and its combined use with machine learning algorithms, and their details will be shared after publication to maintain data confidentiality.
Spatial transcriptomics data can be analyzed with several software packages, including Seurat, Scanpy, and Giotto. This workflow follows the Seurat package in R. The first step is to specify the directory where the data resides so that Seurat can load it:
```R
library(Seurat)

slice <- "PDAC-9137-A"                  # sample identifier
root_dir <- "~/Documents/Visium/outs/"  # Space Ranger output directory
setwd(root_dir)

# Load the filtered count matrix together with the tissue image
obj <- Load10X_Spatial(
  data.dir = root_dir,
  filename = "filtered_feature_bc_matrix.h5",
  assay = "Spatial",
  slice = "slice1",
  filter.matrix = TRUE,
  to.upper = FALSE,
  image = NULL
)
```
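Before loading, it can help to confirm that the directory has the layout `Load10X_Spatial` reads from: the HDF5 matrix plus a `spatial/` subfolder with the image, scale factors, and spot positions. The following is a minimal base R sketch under that assumption. The helper name `check_visium_dir` is hypothetical, and the file list reflects typical Space Ranger output, where the position file is named `tissue_positions_list.csv` in older versions and `tissue_positions.csv` in newer ones.

```R
# Hypothetical helper: report which expected Space Ranger output files are missing.
check_visium_dir <- function(data_dir, h5_name = "filtered_feature_bc_matrix.h5") {
  required <- c(
    h5_name,
    file.path("spatial", "tissue_lowres_image.png"),
    file.path("spatial", "scalefactors_json.json")
  )
  # The spot-position file name differs between Space Ranger versions.
  positions <- file.path("spatial", c("tissue_positions_list.csv", "tissue_positions.csv"))

  missing <- required[!file.exists(file.path(data_dir, required))]
  if (!any(file.exists(file.path(data_dir, positions)))) {
    missing <- c(missing, positions[1])
  }
  missing  # character(0) means the layout looks complete
}

# Toy demonstration on a temporary directory (not real Visium data).
demo_dir <- file.path(tempdir(), "visium_demo")
dir.create(file.path(demo_dir, "spatial"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "filtered_feature_bc_matrix.h5")))
invisible(file.create(file.path(demo_dir, "spatial", "tissue_lowres_image.png")))
missing_files <- check_visium_dir(demo_dir)
print(missing_files)
```

In the toy directory only two of the expected files exist, so the helper lists the scale-factor and position files as missing.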
Next, the features and counts within the sample can be visualized to better understand the data, and the percentage of mitochondrial transcripts per spot can be computed so that mitochondrial signal can be filtered out.
```R
library(ggplot2)    # theme()
library(patchwork)  # wrap_plots()

# Counts per spot, as a violin plot and overlaid on the tissue image
plot1 <- VlnPlot(obj, features = "nCount_Spatial", pt.size = 0.1) + NoLegend()
plot2 <- SpatialFeaturePlot(obj, features = "nCount_Spatial") + theme(legend.position = "right")
wrap_plots(plot1, plot2)

# Percentage of counts per spot that come from mitochondrial genes.
# The default assumes a human sample ("MT-" prefix). For a rat sample,
# uncomment the second line and use it instead.
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
#obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^Mt-") # for rat
```
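To make the mitochondrial percentage concrete, here is a small base R sketch of the calculation that `PercentageFeatureSet` performs in essence: the counts of the matching genes divided by the total counts of each spot. The count matrix and the 20% cutoff are made up for illustration and are not taken from the workflow.

```R
# Toy count matrix: rows are genes, columns are spots (values are made up).
counts <- matrix(
  c(10,  0, 30,
     5,  5, 10,
    80, 90, 40,
     5,  5, 20),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "KRT19"),
                  c("spot1", "spot2", "spot3"))
)

# Genes whose symbol starts with "MT-" (human mitochondrial naming).
mt_genes <- grepl("^MT-", rownames(counts))

# Percentage of each spot's counts that come from mitochondrial genes.
percent_mt <- 100 * colSums(counts[mt_genes, , drop = FALSE]) / colSums(counts)
print(percent_mt)

# The 20% cutoff is an illustrative assumption, not a recommendation.
keep_spots <- names(percent_mt)[percent_mt < 20]
print(keep_spots)
```

Here the three spots have 15%, 5%, and 40% mitochondrial counts, so only the first two pass the illustrative cutoff.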
In this analysis, SCTransform is used as the normalization method within Seurat. Compared with the NormalizeData function, it is designed to handle technical noise better while retaining biological variability. It fits a regularized negative binomial regression, which stabilizes technical variance across features. This corrects for sequencing depth and reduces the influence of other unwanted technical factors while preserving biological heterogeneity. The result supports more accurate feature selection and better detection of subtle biological signals, providing a robust foundation for subsequent analyses such as clustering and differential expression.
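```R
# samples_list is assumed to be a named list of Seurat objects, one per sample,
# each loaded as shown above.
for (i in seq_along(samples_list)) {
  # Normalize each sample independently with SCTransform
  samples_list[[i]] <- SCTransform(samples_list[[i]], assay = "Spatial", verbose = FALSE)
  # Record the sample name as batch metadata
  samples_list[[i]][["batch"]] <- names(samples_list)[i]
}
```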
The code above normalizes each sample with SCTransform and, in the last line of the loop, adds batch information as metadata.
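The idea of depth-corrected, variance-stabilized residuals can be illustrated without Seurat. The sketch below is a simplified analytic stand-in, not SCTransform's actual fitting procedure: it assumes the expected count of a gene in a spot depends only on the gene's total and the spot's total, and it uses one fixed dispersion `theta` for all genes, whereas SCTransform estimates and regularizes model parameters per gene. All data are simulated.

```R
set.seed(1)
# Toy data: 50 genes x 30 spots, with spots differing in sequencing depth.
depth <- runif(30, min = 0.5, max = 2)
gene_mean <- rexp(50, rate = 0.2)
counts <- matrix(rpois(50 * 30, lambda = outer(gene_mean, depth)), nrow = 50)
counts <- counts[rowSums(counts) > 0, , drop = FALSE]  # drop undetected genes

# Expected count for each gene/spot if only sequencing depth mattered.
mu <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Negative binomial variance is mu + mu^2 / theta; theta is fixed here (assumption).
theta <- 100
pearson_resid <- (counts - mu) / sqrt(mu + mu^2 / theta)

# Per-gene variance of the residuals, for comparison across genes.
print(summary(apply(pearson_resid, 1, var)))
```

Each residual measures how far a count deviates from its depth-based expectation, in units of the standard deviation the model predicts for that expectation.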
Feature selection is a critical step in spatial transcriptomics analysis, especially when preparing to integrate data from multiple samples. This process involves identifying a set of features (genes) that are the most informative across the dataset, which can help in improving the accuracy of downstream analyses like clustering and dimensional reduction. In this workflow, we utilize `SelectIntegrationFeatures` from Seurat to choose a defined number of features that contribute most to the variability across the samples.
```R
features <- SelectIntegrationFeatures(object.list = samples_list, nfeatures = 2000)
```
The parameter `nfeatures = 2000` specifies that the 2000 top-ranked variable features are selected, a common choice that balances capturing enough biological variability against computational cost.
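As a rough illustration of selecting features across several samples, the base R sketch below picks the most variable genes in each sample and then ranks genes by how many samples list them. This is a simplification: Seurat's `SelectIntegrationFeatures` also breaks ties by the median rank across datasets, which is omitted here. The data, the gene names, and the choice of three features are all made up.

```R
set.seed(42)
genes <- paste0("gene", 1:20)

# Toy log-expression matrices (genes x spots); a few genes get extra variance.
make_sample <- function(variable_genes) {
  m <- matrix(rnorm(20 * 40, sd = 0.2), nrow = 20, dimnames = list(genes, NULL))
  m[variable_genes, ] <- m[variable_genes, ] + rnorm(length(variable_genes) * 40, sd = 2)
  m
}
samples <- list(
  A = make_sample(c("gene1", "gene2", "gene3")),
  B = make_sample(c("gene1", "gene2", "gene4")),
  C = make_sample(c("gene1", "gene3", "gene5"))
)

# Per sample: the top-n genes by variance.
top_variable <- function(m, n = 3) {
  names(sort(apply(m, 1, var), decreasing = TRUE))[1:n]
}
per_sample <- lapply(samples, top_variable)

# Rank genes by the number of samples in which they are variable, keep the top 3.
votes <- sort(table(unlist(per_sample)), decreasing = TRUE)
selected <- names(votes)[1:3]
print(votes)
print(selected)
```

Genes that are variable in only one sample rank below genes that are variable in several, which is the property that makes the selected set useful for integration.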
Integrating multiple spatial transcriptomics samples is essential to correct for batch effects and align different datasets to a common space, facilitating comparative and joint analyses. This is achieved using a series of functions from Seurat to prepare the samples, find integration anchors, and finally integrate the data based on these anchors. The integration process uses the features chosen in the feature selection step to ensure that only the most informative features are used to harmonize the datasets.
```R
# Prepare the SCT-normalized samples, find anchors between them, then integrate
samples_list <- PrepSCTIntegration(object.list = samples_list, anchor.features = features)
anchors <- FindIntegrationAnchors(object.list = samples_list, normalization.method = "SCT", anchor.features = features)
samples_integrated <- IntegrateData(anchorset = anchors, normalization.method = "SCT")
```
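Integration anchors are, at their core, pairs of observations from two datasets that are each other's nearest neighbor. The base R sketch below shows only that mutual-nearest-neighbor idea on simulated 2-D points. Seurat searches for such pairs after projecting the samples into a shared low-dimensional space and then scores and filters them, none of which is reproduced here.

```R
set.seed(7)
# Two toy "samples" in a shared 2-D space (rows are spots). The second is a
# shuffled copy of the first with a constant shift standing in for a batch effect.
sample_a <- matrix(rnorm(20), ncol = 2)
sample_b <- sample_a[sample(10), ] + 0.5 + matrix(rnorm(20, sd = 0.05), ncol = 2)

# Euclidean distance from every spot in A (rows) to every spot in B (columns).
cross_dist <- as.matrix(dist(rbind(sample_a, sample_b)))[1:10, 11:20]

# Nearest neighbour in the other sample, in both directions.
nn_a_to_b <- apply(cross_dist, 1, which.min)
nn_b_to_a <- apply(cross_dist, 2, which.min)

# Keep a pair (i, j) only if each spot is the other's nearest neighbour.
is_mutual <- nn_b_to_a[nn_a_to_b] == seq_along(nn_a_to_b)
anchors_demo <- data.frame(spot_a = which(is_mutual), spot_b = nn_a_to_b[is_mutual])
print(anchors_demo)
```

Requiring the match in both directions discards one-sided matches, which is why anchors are a conservative basis for estimating the correction between batches.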
Dimensionality reduction simplifies the high-dimensional dataset into a more interpretable form while preserving essential information. This analysis first reduces the data with PCA, using an elbow plot of the principal components to judge how many to retain, and then applies UMAP and t-SNE to visualize the dataset in two dimensions, which makes patterns and groupings easier to identify. The step is crucial for uncovering inherent structure and for driving further analyses such as clustering.
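```R
library(ggplot2)  # ggtitle()

# PCA on the integration features; the elbow plot helps judge how many PCs to keep
samples_integrated <- RunPCA(samples_integrated, features = features)
ElbowPlot(samples_integrated, ndims = 50)

# Two-dimensional embeddings based on the first 20 PCs
samples_integrated <- RunUMAP(samples_integrated, dims = 1:20)
samples_integrated <- RunTSNE(samples_integrated, dims = 1:20)

# "cell.type.annot" is assumed to be an existing metadata column holding cell type annotations
umap_plot <- DimPlot(samples_integrated, reduction = "umap", group.by = "cell.type.annot")
print(umap_plot + ggtitle("UMAP Plot"))
tsne_plot <- DimPlot(samples_integrated, reduction = "tsne", group.by = "cell.type.annot")
print(tsne_plot + ggtitle("t-SNE Plot"))
```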
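The quantity an elbow plot displays, the standard deviation of each principal component, can be reproduced with base R's `prcomp`. The sketch below uses simulated data with two latent factors, so the matrix, its dimensions, and the choice of two retained components are illustrative assumptions rather than part of the workflow.

```R
set.seed(3)
# Toy matrix: 100 spots x 30 genes, with two latent factors driving the first 10 genes.
latent <- matrix(rnorm(100 * 2), ncol = 2)
loadings <- matrix(0, nrow = 2, ncol = 30)
loadings[1, 1:5] <- 3
loadings[2, 6:10] <- 3
expr <- latent %*% loadings + matrix(rnorm(100 * 30), ncol = 30)

# PCA on centred, scaled genes (prcomp expects observations in rows).
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Standard deviation per component, and the share of variance each one explains.
pc_sd <- pca$sdev
var_explained <- pc_sd^2 / sum(pc_sd^2)
print(round(var_explained[1:5], 3))

# Keep the leading components for downstream steps (2 here; 1:20 in the workflow).
embedding <- pca$x[, 1:2]
```

Plotting `pc_sd` against the component index gives the elbow curve; the point where it flattens suggests how many components carry structure beyond noise.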
Clustering groups cells based on their gene expression patterns, revealing biological distinctions across the dataset. This process is crucial for identifying different cell populations or states within the spatial transcriptomics data. We use the Louvain algorithm implemented in Seurat's `FindClusters` function, which operates on a shared nearest-neighbor graph built from the previously computed PCA to determine cell similarity.
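```R
# Build the shared nearest-neighbor graph that FindClusters requires (same PCs as for UMAP)
samples_integrated <- FindNeighbors(samples_integrated, dims = 1:20)
# Louvain clustering; a low resolution yields fewer, broader clusters
samples_integrated <- FindClusters(samples_integrated, resolution = 0.1)
cluster_plot <- DimPlot(samples_integrated, reduction = "umap", group.by = "seurat_clusters")
print(cluster_plot + ggtitle("UMAP with Clusters"))
```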
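Graph-based clustering of this kind starts from a shared nearest-neighbor (SNN) graph, in which two observations are connected more strongly the more neighbors they have in common. The base R sketch below builds such weights as the Jaccard overlap of k-nearest-neighbor sets on simulated points. It is a conceptual illustration only: the pruning of weak edges and the Louvain modularity optimization that Seurat applies afterwards are not shown, and `k = 5` is an arbitrary choice.

```R
set.seed(11)
# Toy embedding: two well-separated groups of 10 spots in 2 dimensions.
embedding <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
                   matrix(rnorm(20, mean = 20), ncol = 2))
k <- 5

# k nearest neighbours of each spot (each spot counts itself, at distance 0).
d <- as.matrix(dist(embedding))
knn <- t(apply(d, 1, function(row) order(row)[1:k]))

# SNN weight: Jaccard overlap of the two spots' neighbour sets.
n <- nrow(embedding)
snn <- matrix(0, n, n)
for (i in 1:n) {
  for (j in 1:n) {
    shared <- length(intersect(knn[i, ], knn[j, ]))
    snn[i, j] <- shared / (2 * k - shared)
  }
}

# Compare average edge weight inside the first group with weight between groups.
within <- mean(snn[1:10, 1:10])
between <- mean(snn[1:10, 11:20])
print(c(within = within, between = between))
```

Because the two groups share no neighbors, all weights between them are zero, and a community detection algorithm run on this graph would separate them.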
Differential expression analysis is conducted to identify genes that show statistically significant differences in expression between the clusters identified in the previous step. This analysis helps in characterizing the biological differences between the cell states or types and can guide further biological interpretation and validation.
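```R
# Find genes upregulated in cluster 1 relative to cluster 2 (cluster labels start at 0).
# DE is run on the SCT assay; PrepSCTFindMarkers() re-corrects counts across the per-sample SCT models.
samples_integrated <- PrepSCTFindMarkers(samples_integrated, assay = "SCT")
de_results <- FindMarkers(samples_integrated, assay = "SCT", ident.1 = 1, ident.2 = 2,
                          min.pct = 0.25, only.pos = TRUE)
head(de_results)
```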
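By default `FindMarkers` compares two groups with a Wilcoxon rank-sum test per gene and reports Bonferroni-adjusted p-values. The base R sketch below shows that style of test on simulated data. It is an illustration under simplifying assumptions: the expression values are made up, no `min.pct` pre-filter is applied, and a plain difference in means stands in for the log fold change that Seurat reports.

```R
set.seed(5)
# Toy normalized expression: 6 genes x 40 spots, 20 spots per cluster (simulated).
genes <- paste0("gene", 1:6)
cluster <- rep(c("1", "2"), each = 20)
expr <- matrix(rnorm(6 * 40, mean = 1), nrow = 6, dimnames = list(genes, NULL))
expr["gene1", cluster == "1"] <- expr["gene1", cluster == "1"] + 3  # up in cluster 1

# One Wilcoxon rank-sum test per gene, cluster 1 versus cluster 2.
p_val <- apply(expr, 1, function(x) {
  wilcox.test(x[cluster == "1"], x[cluster == "2"])$p.value
})

# Difference in mean expression (cluster 1 minus cluster 2) as a simple effect size.
mean_diff <- rowMeans(expr[, cluster == "1"]) - rowMeans(expr[, cluster == "2"])

de_table <- data.frame(
  p_val = p_val,
  mean_diff = mean_diff,
  p_val_adj = p.adjust(p_val, method = "bonferroni")
)
de_table <- de_table[order(de_table$p_val), ]
print(head(de_table))
```

The gene that was shifted upward in cluster 1 sorts to the top of the table, mirroring how marker genes surface in the real output.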