SageNet is a robust and generalizable graph neural network approach that probabilistically maps dissociated single cells from an scRNA-seq dataset to their hypothetical tissue of origin, using one or more reference datasets acquired by spatially resolved transcriptomics techniques. It is compatible with both high-plex imaging (e.g., seqFISH, MERFISH) and spatial barcoding (e.g., 10x Visium, Slide-seq) datasets as the spatial reference.
SageNet is implemented with PyTorch and PyTorch Geometric to be modular, fast, and scalable. It also uses anndata to be compatible with scanpy and squidpy for pre- and post-processing steps.
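As a rough picture of what a graph neural network does with this kind of data, the NumPy sketch below applies a single graph-convolution step, treating one cell's expression values as features on the nodes of a gene-gene graph. It uses the common GCN normalisation Â = D^(-1/2)(A + I)D^(-1/2); the toy graph, the random weights, and the helper name `gcn_layer` are hypothetical, and this is not SageNet's actual architecture.

```python
import numpy as np


def gcn_layer(x: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(A_hat @ x @ weight).

    x      : (n_genes, in_features) node features, one row per gene
    adj    : (n_genes, n_genes) symmetric 0/1 gene-gene adjacency
    weight : (in_features, out_features) weight matrix (hypothetical, random here)
    """
    a = adj + np.eye(adj.shape[0])            # add self-loops
    deg = a.sum(axis=1)                        # degrees are >= 1 thanks to self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    a_hat = d_inv_sqrt @ a @ d_inv_sqrt        # symmetric normalisation
    return np.maximum(a_hat @ x @ weight, 0.0)  # aggregate neighbours, transform, ReLU


# Toy example: 4 genes on a path graph 0-1-2-3; one cell where only gene 0 is expressed.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
expr = np.array([[1.0], [0.0], [0.0], [0.0]])
rng = np.random.default_rng(0)
w = rng.normal(size=(1, 8))
h = gcn_layer(expr, adj, w)
print(h.shape)  # (4, 8)
```

Genes 2 and 3 come out as all zeros because one layer only moves information one hop along the graph; stacking layers widens that reach.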
Note (v1.0): The dependency torch-geometric should be installed separately, according to your system's specifications; see the official torch-geometric installation instructions. We recommend using Miniconda.
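Before installing, it can help to confirm which of the Python dependencies mentioned above are already importable. The sketch below uses only the standard library (`importlib.util.find_spec`) and does not actually import the packages; the helper name `check_dependencies` and its default module list are illustrative assumptions, not part of SageNet.

```python
import importlib.util


def check_dependencies(modules=("torch", "torch_geometric", "anndata", "scanpy", "squidpy")):
    """Return {module_name: True/False} depending on whether it can be found.

    find_spec only locates the top-level package; it does not import it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:16s} {'found' if ok else 'MISSING'}")
```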
First, clone the repository using git:
git clone https://github.com/MarioniLab/sagenet
Then, cd to the sagenet folder and run the install command:
cd sagenet
python setup.py install  # or: pip install .
Alternatively, the easiest way to get SageNet is through pip, using the following command:
pip install sagenet
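import sagenet as sg
import scanpy as sc
import squidpy as sq
import anndata as ad
import numpy as np                 # used for coordinates and gene selection below
from matplotlib import rc_context  # used for figure sizing below
import random
random.seed(10)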
Input:
- Expression matrix associated with the (spatial) reference dataset (an anndata object)
adata_r = sg.MGA_data.seqFISH1()
- A gene-gene interaction network
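# glasso is SageNet's graphical-lasso helper; if it is not already in scope,
# it is assumed to be importable with: from sagenet.utils import glasso
glasso(adata_r, [0.5, 0.75, 1])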
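The `glasso` call estimates the gene-gene network with a graphical lasso; the list `[0.5, 0.75, 1]` presumably supplies candidate regularisation strengths. The NumPy sketch below shows the underlying idea with a deliberately simpler, swapped-in technique: it inverts the empirical covariance to obtain partial correlations and keeps gene pairs whose absolute partial correlation exceeds a threshold. There is no sparsity penalty, so it only behaves well when cells greatly outnumber genes; the function name and the `0.2` threshold are illustrative assumptions.

```python
import numpy as np


def partial_correlation_network(x: np.ndarray, threshold: float = 0.2):
    """Build an undirected gene-gene network from a cells x genes matrix.

    Edges are gene pairs whose absolute partial correlation (estimated by
    inverting the empirical covariance, without any lasso penalty) exceeds
    `threshold`.
    """
    cov = np.cov(x, rowvar=False)          # genes x genes covariance
    prec = np.linalg.inv(cov)              # precision matrix
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)          # partial correlations
    np.fill_diagonal(pcor, 0.0)            # no self-edges
    adj = (np.abs(pcor) > threshold).astype(int)
    return adj, pcor


# Toy data: gene0 -> gene1 -> gene2 form a chain; gene3 is independent noise.
rng = np.random.default_rng(0)
n = 5000
g0 = rng.normal(size=n)
g1 = g0 + rng.normal(size=n)
g2 = g1 + rng.normal(size=n)
g3 = rng.normal(size=n)
adj, pcor = partial_correlation_network(np.column_stack([g0, g1, g2, g3]))
edges = [(i, j) for i in range(4) for j in range(i + 1, 4) if adj[i, j]]
print(edges)  # [(0, 1), (1, 2)]
```

Gene 0 and gene 2 are strongly correlated marginally, but their partial correlation given gene 1 is close to zero, so no direct edge is drawn between them.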
- One or more partitionings of the spatial reference into distinct connected neighborhoods of cells or spots
# Store the x/y coordinates where squidpy expects them, then build a spatial neighbor graph
adata_r.obsm['spatial'] = np.array(adata_r.obs[['x', 'y']])
sq.gr.spatial_neighbors(adata_r, coord_type="generic")
# Leiden clustering of the spatial graph at several resolutions (coarse to fine);
# the keys are leiden_0.01, leiden_0.05, leiden_0.1, leiden_0.5 and leiden_1
for res in [0.01, 0.05, 0.1, 0.5, 1]:
    sc.tl.leiden(adata_r, resolution=res, random_state=0, key_added=f'leiden_{res}',
                 adjacency=adata_r.obsp["spatial_connectivities"])
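The Leiden calls above cut the spatial neighbourhood graph into communities at five resolutions, from coarse to fine. As a dependency-free illustration of that coarse-to-fine idea only, the sketch below swaps in plain square-grid binning of the x/y coordinates. It is not Leiden: the bins ignore the neighbourhood graph entirely, and the function name `grid_partitions` and the bin counts are hypothetical.

```python
import numpy as np


def grid_partitions(coords: np.ndarray, bins_per_axis=(2, 4, 8)) -> dict:
    """Label each cell by the grid square it falls in, at several resolutions.

    coords        : (n_cells, 2) array of x/y positions
    bins_per_axis : number of bins along each axis, one entry per partitioning
    Returns {f"grid_{k}": integer labels of shape (n_cells,)}.
    """
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0                                  # avoid division by zero on a flat axis
    scaled = (coords - lo) / span                          # rescale both axes to [0, 1]
    out = {}
    for k in bins_per_axis:
        idx = np.minimum((scaled * k).astype(int), k - 1)  # bin index per axis, clamping the max edge
        out[f"grid_{k}"] = idx[:, 0] * k + idx[:, 1]       # flatten (ix, iy) into one label
    return out


rng = np.random.default_rng(0)
xy = rng.uniform(size=(2000, 2))
parts = grid_partitions(xy)
for key, labels in parts.items():
    print(key, len(np.unique(labels)))
```

Each finer grid here nests inside the coarser ones, which is one convenient property of binning; graph-based community detection makes no such promise.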
Training:
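import torch
# Train on the GPU when available, otherwise on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
sg_obj = sg.sage.sage(device=device)
# One model is trained per partitioning listed in comm_columns
sg_obj.add_ref(adata_r, comm_columns=['leiden_0.01', 'leiden_0.05', 'leiden_0.1', 'leiden_0.5', 'leiden_1'],
               tag='seqFISH_ref', epochs=20, verbose=False)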
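`add_ref` trains one model per entry in `comm_columns`, each learning to predict a cell's neighbourhood label from its expression. The sketch below mimics only that one-model-per-partitioning bookkeeping, with a swapped-in nearest-centroid classifier in place of SageNet's graph neural network; `fit_per_partitioning`, `predict_proba`, and the softmax `temperature` are hypothetical names chosen for illustration.

```python
import numpy as np


def fit_per_partitioning(x: np.ndarray, partitions: dict) -> dict:
    """Fit one nearest-centroid model per partitioning.

    x          : (n_cells, n_genes) expression matrix of the reference
    partitions : {column_name: (n_cells,) integer labels}
    Returns {column_name: (classes, centroids)}.
    """
    models = {}
    for name, labels in partitions.items():
        classes = np.unique(labels)
        centroids = np.stack([x[labels == c].mean(axis=0) for c in classes])
        models[name] = (classes, centroids)
    return models


def predict_proba(models: dict, x: np.ndarray, temperature: float = 1.0) -> dict:
    """Softmax over negative squared distances to each class centroid."""
    out = {}
    for name, (classes, centroids) in models.items():
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        logits = -d2 / temperature
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        out[name] = p / p.sum(axis=1, keepdims=True)  # each row sums to 1
    return out


rng = np.random.default_rng(1)
x_ref = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])
parts = {"coarse": np.repeat([0, 1], 50), "fine": np.repeat([0, 1, 2, 3], 25)}
models = fit_per_partitioning(x_ref, parts)
probs = predict_proba(models, x_ref)
print({k: v.shape for k, v in probs.items()})  # {'coarse': (100, 2), 'fine': (100, 4)}
```

Each partitioning yields its own cells-by-neighbourhoods probability matrix.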
Output:
- A set of pre-trained models (one for each partitioning)
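import os
# Create the output folder (plain-Python equivalent of the notebook's !mkdir commands)
os.makedirs('models/seqFISH_ref', exist_ok=True)
sg_obj.save_model_as_folder('models/seqFISH_ref')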
- A set of Spatially Informative Genes
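# Keep genes with ST_all_importance <= 5 as the spatially informative genes (SIGs)
ind = np.where(adata_r.var['ST_all_importance'] <= 5)[0]
SIGs = list(adata_r.var_names[ind])
# Plot the expression of each SIG over the reference tissue
with rc_context({'figure.figsize': (4, 4)}):
    sc.pl.spatial(adata_r, color=SIGs, ncols=4, spot_size=0.03, legend_loc=None)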
Input:
- Expression matrix associated with the (dissociated) query dataset (an anndata object)
adata_q = sg.MGA_data.scRNAseq()
Mapping:
sg_obj.map_query(adata_q)
Output:
- The reconstructed cell-cell spatial distance matrix
adata_q.obsm['dist_map']
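One way to turn per-partitioning class probabilities into a cell-cell distance is to compare cells' predicted distributions, so that cells predicted to sit in the same neighbourhoods at every resolution end up close. The sketch below averages pairwise Jensen-Shannon divergences over partitionings; it is an illustrative assumption about how such a matrix can be built, not a transcription of how SageNet computes `dist_map`, and both function names are hypothetical.

```python
import numpy as np


def js_divergence_matrix(p: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Pairwise Jensen-Shannon divergence (natural log) between the rows of p.

    p : (n_cells, n_classes), each row a probability distribution.
    """
    p = np.clip(p, eps, 1.0)
    p = p / p.sum(axis=1, keepdims=True)
    m = 0.5 * (p[:, None, :] + p[None, :, :])                     # pairwise mixtures
    kl_pm = (p[:, None, :] * np.log(p[:, None, :] / m)).sum(-1)   # KL(p_i || m_ij)
    kl_qm = (p[None, :, :] * np.log(p[None, :, :] / m)).sum(-1)   # KL(p_j || m_ij)
    return 0.5 * (kl_pm + kl_qm)


def js_distance_matrix(prob_list) -> np.ndarray:
    """Average the JS divergence across partitionings (one probability matrix each)."""
    return np.mean([js_divergence_matrix(p) for p in prob_list], axis=0)


coarse = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]])
fine = np.array([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
d = js_distance_matrix([coarse, fine])
print(np.round(d, 3))
```

The pairwise step holds an n × n × k array in memory, so a large query set would need to be processed in blocks of rows.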
- A consensus score of mappability (uncertainty of mapping) of each cell to the references
adata_q.obs
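The consensus mappability score in `adata_q.obs` summarises how confidently each cell is placed. A natural stand-in, sketched below as an assumption rather than SageNet's exact formula, is the normalised Shannon entropy of each cell's predicted neighbourhood distribution, averaged over partitionings: 0 means every model is certain, 1 means every model is maximally unsure. The function names are hypothetical.

```python
import numpy as np


def normalised_entropy(p: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy of each row of p divided by log(n_classes), giving values in [0, 1].

    Requires at least 2 classes (log(1) = 0 would divide by zero).
    """
    p = np.clip(p, eps, 1.0)
    p = p / p.sum(axis=1, keepdims=True)
    h = -(p * np.log(p)).sum(axis=1)
    return h / np.log(p.shape[1])


def consensus_uncertainty(prob_list) -> np.ndarray:
    """Mean normalised entropy per cell across partitionings."""
    return np.mean([normalised_entropy(p) for p in prob_list], axis=0)


# Two cells: the first is confidently placed, the second is maximally ambiguous.
probs_a = np.array([[1.0, 0.0], [0.5, 0.5]])
probs_b = np.array([[0.97, 0.01, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25]])
u = consensus_uncertainty([probs_a, probs_b])
print(np.round(u, 3))
```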
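import anndata
# Wrap the reconstructed distance matrix in an AnnData object and embed it with UMAP
dist_adata = anndata.AnnData(adata_q.obsm['dist_map'], obs=adata_q.obs)
# compute_neighbors_umap and _compute_connectivities_umap are scanpy internals;
# they may be missing or have different signatures in other scanpy releases
knn_indices, knn_dists, forest = sc.neighbors.compute_neighbors_umap(
    dist_adata.X, n_neighbors=50, metric='precomputed')
dist_adata.obsp['distances'], dist_adata.obsp['connectivities'] = sc.neighbors._compute_connectivities_umap(
    knn_indices, knn_dists, dist_adata.shape[0], 50,  # change to the number of neighbors you plan to use
)
sc.pp.neighbors(dist_adata, metric='precomputed', use_rep='X')
sc.tl.umap(dist_adata)
# celltype_colours is a user-defined palette (e.g. a dict mapping cell types to colours), not defined here
sc.pl.umap(dist_adata, color='cell_type', palette=celltype_colours)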
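The visualisation code above starts by extracting each cell's nearest neighbours from the precomputed `dist_map`. Because the scanpy helpers it calls are internal, the NumPy sketch below shows that first step in isolation: given a square distance matrix, return the indices and distances of the `k` closest cells, counting each cell as its own nearest neighbour at distance 0. `knn_from_distances` is a hypothetical helper, not a scanpy function, and it does not compute the UMAP connectivities.

```python
import numpy as np


def knn_from_distances(dist: np.ndarray, k: int):
    """k nearest neighbours per row of a square, precomputed distance matrix.

    Each cell is included as its own neighbour (distance 0 on the diagonal).
    Returns (indices, distances), both of shape (n_cells, k), sorted by distance.
    """
    n = dist.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of cells")
    # argpartition moves the k smallest entries of each row to the front without a full sort
    idx = np.argpartition(dist, k - 1, axis=1)[:, :k]
    d = np.take_along_axis(dist, idx, axis=1)
    order = np.argsort(d, axis=1)  # sort only those k candidates by distance
    return np.take_along_axis(idx, order, axis=1), np.take_along_axis(d, order, axis=1)


# Four cells on a line at positions 0, 1, 3 and 10.
pos = np.array([0.0, 1.0, 3.0, 10.0])
dist = np.abs(pos[:, None] - pos[None, :])
idx, dists = knn_from_distances(dist, k=2)
print(idx.tolist())    # [[0, 1], [1, 0], [2, 1], [3, 2]]
print(dists.tolist())  # [[0.0, 1.0], [0.0, 1.0], [0.0, 2.0], [0.0, 7.0]]
```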
To see some examples of our pipeline's capabilities, look at the notebooks directory. The notebooks are also available on Google Colab.
If you have a question, or a new architecture or model that could be integrated into our pipeline, you can post an issue or reach us by email.
This work is led by Elyas Heidari and Shila Ghazanfar as a joint effort between MarioniLab@CRUK@EMBL-EBI and RobinsonLab@UZH.