Analysis
- A) Raw image processing
- (A.1) Image extraction
- (A.2) Image normalisation
- (A.3) Image thresholding and masking
- B) Pixel-based analysis
- C) Cell-based analysis
- (C.1A) Cell segmentation with CellProfiler
- (C.1B) Cell segmentation with StarDist
- (C.2) Cell masking
- (C.3) Cell masking visualisation
- (C.4A.1) Unsupervised clustering
- (C.4A.2) Unsupervised clustering visualisation
- (C.4B.1) Expression thresholding
- (C.4B.2) Expression thresholding visualisation
- (C.5A.1) Homotypic spatial analysis
- (C.5A.2) Homotypic spatial analysis visualisation
- (C.5B.1) Heterotypic spatial analysis
- (C.5B.2) Heterotypic spatial analysis visualisation
- (C.5B.3) Heterotypic analysis permutation test
- (C.5B.4) Heterotypic analysis permutation test visualisation
The first step in the SIMPLI analysis workflow is the preprocessing of raw images, which consists of 3 processes:
- (A.1) Image extraction
- (A.2) Image normalisation
- (A.3) Image thresholding and masking
In this process tiff files are extracted from the raw acquisition data from imaging mass cytometry (IMC) experiments. This process should be skipped if the input data does not consist of raw IMC data. See the input page for more details.
Inputs and parameters:
- raw_metadata_file with the ROI metadata.
- channel_metadata_file with the [metal and channel metadata](https://github.com/ciccalab/SIMPLI/wiki/Input#channel-metadata-file).
- tiff_type type of tiff output ("single" or "ome").
Outputs:
- Images: Images (uncompressed 16 bit tiff) can be output in two different formats:
  - single channel tiff files (one for each of the selected channels): $output_folder/Images/Raw/sample_name/sample_name-label-raw.tiff
  - .ome.tiff files (one per sample, the order of channels is the same as in the channel_metadata file): $output_folder/Images/Raw/sample_name/sample_name-all_raw.ome.tiff
- Metadata:
  - Metadata for all images from all samples: $output_folder/Images/Raw/raw_tiff_metadata.csv
  - By-sample metadata for the raw images is also output at: $output_folder/Images/Raw/sample_name/sample_name-raw_tiff_metadata.csv
The output of this process is located at: $output_folder/Images/Raw/
This process can be skipped by setting the skip_conversion parameter to true.
This process performs 99th percentile normalisation of the raw tiff images generated in the Image extraction process, or specified by the user if the image extraction process is skipped. Images are normalised individually by marker and by sample, which enables the use of a single threshold for the same marker across multiple samples. This might not always be desirable if the staining is not uniform within samples (images). Additionally, if the images for some markers have particularly low signal-to-noise ratios, the 99th percentile cutoff for normalisation could be too stringent. In these cases the normalisation can be skipped and sample-specific thresholds can be used in the image thresholding and masking step.
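As a rough illustration of what a 99th percentile normalisation does to a single-channel image, here is a minimal Python sketch. It is not taken from the SIMPLI code: the function name, the clipping of values above the cutoff and the rescaling to the full 16 bit range are all assumptions made for the example.

```python
import numpy as np

def normalise_99th_percentile(image, out_max=65535):
    """Rescale one single-channel image so that its 99th percentile maps to out_max.

    Hypothetical helper: clipping above the cutoff and rescaling to the
    uint16 range are assumptions, not confirmed SIMPLI behaviour.
    """
    image = np.asarray(image, dtype=np.float64)
    cutoff = np.percentile(image, 99)
    if cutoff <= 0:
        # Empty or all-zero channel: nothing to rescale.
        return np.zeros(image.shape, dtype=np.uint16)
    scaled = np.clip(image / cutoff, 0.0, 1.0) * out_max
    return np.round(scaled).astype(np.uint16)
```

Because each image is rescaled on its own, the same numeric threshold refers to the same relative intensity in every sample, which is the property described above.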
Inputs and parameters:
- raw_metadata_file with the tiff image metadata.
- tiff_type type of tiff output ("single" or "ome").
Outputs:
- Normalised Images: Images (uncompressed 16 bit tiff) can be output in two different formats:
  - single channel tiff files (one for each of the selected channels): $output_folder/Images/Normalized/sample_name/sample_name-label-normalized.tiff
  - .ome.tiff files (one per sample, the order of channels is the same as in the channel_metadata file): $output_folder/Images/Normalized/sample_name/sample_name-ALL-normalized.ome.tiff
- Metadata:
  - Metadata for all images from all samples: $output_folder/Images/Normalized/normalized_tiff_metadata.csv
  - By-sample metadata for the normalised images is also output at:
    - $output_folder/Images/Normalized/sample_name/sample_name-normalized_tiff_metadata.csv in long format.
    - $output_folder/Images/Normalized/sample_name/sample_name-normalized_tiff_metadata.csv in CellProfiler4 compatible wide format.
The output of this process is located at: $output_folder/Images/Normalized/
This process can be skipped by setting the skip_normalization parameter to true.
This process is used to perform the image preprocessing that will generate the final images, which can then be used as input for the pixel-based or the cell-based analysis. The input images for this process can be derived from:
- images generated in the Image normalisation process.
- images generated in the Image extraction process if the Image normalisation process is skipped.
- images specified by the user with the normalized_metadata_file file if the image extraction and the image normalisation processes are skipped.
Inputs and parameters:
- cp4_preprocessing_cppipe Path to the CellProfiler4 pipeline file used for image preprocessing. See the CellProfiler4 pipeline page for its requirements.
- normalized_metadata_file with the normalised tiff image metadata.
Outputs:
- Preprocessed Images (uncompressed 16 bit single-channel tiff): $output_folder/Images/Preprocessed/sample_name/sample_name-label-Preprocessed.tiff
- Metadata:
  - Metadata for all images from all samples: $output_folder/Images/Preprocessed/preprocessed_tiff_metadata.csv
  - By-sample metadata for the preprocessed images is also output at:
    - $output_folder/Images/Preprocessed/sample_name/sample_name-preprocessed_metadata.csv in long format.
    - $output_folder/Images/Preprocessed/sample_name-cp4-preprocessed_metadata.csv in CellProfiler4 compatible wide format.
The output of this process is located at: $output_folder/Images/Preprocessed/
This process can be skipped by setting the skip_preprocessing parameter to true.
The pixel-based approach implemented in SIMPLI enables the quantification of pixels which are positive for a specific marker or combination of markers. These marker-positive areas can be normalised over the area of the whole image, or over the area of an image mask defined by combining any of the input images with logical operators.
This process measures the areas of interest and normalises them on the selected image masks according to the input metadata. The input images for this process can be derived from:
- images generated in the image thresholding and masking process.
- images specified by the user with the preprocessed_metadata_file file if the image thresholding and masking process is skipped.
Inputs and parameters:
- preprocessed_metadata_file with the tiff image metadata.
- area_measurements_metadata Path to the area_measurements_metadata file, which has two columns:
  - marker = Marker or combination of markers whose area should be measured.
  - main_marker = Marker or combination of markers whose area should be used to normalise the area of marker. If main_marker is the same as marker then the whole area of the image is used for normalisation.

The marker and main_marker values should each be either a value from the label column of the preprocessed_metadata_file or a combination of those values with logical operators (AND = &, OR = |, NOT = !, () = round brackets).
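To make the operator syntax concrete, the sketch below evaluates such an expression on boolean masks with NumPy. It is an illustration only, not SIMPLI's own parser: the function combine_masks and the dictionary of masks keyed by label are hypothetical, and the use of eval assumes trusted input.

```python
import re
import numpy as np

def combine_masks(expression, masks):
    """Evaluate an expression such as "CD3 & !(CD8 | CD4)" on boolean masks.

    masks maps each label to a boolean array of the same shape.
    Hypothetical helper written for this example.
    """
    tokens = re.findall(r"[&|!()]|[^&|!()\s]+", expression)
    translated = []
    for token in tokens:
        if token in {"&", "|", "(", ")"}:
            translated.append(token)
        elif token == "!":
            translated.append("~")  # NumPy logical NOT for boolean arrays
        else:
            if token not in masks:
                raise KeyError(f"unknown label: {token}")
            translated.append(f"masks[{token!r}]")
    # The expression now contains only mask lookups and operators.
    return eval(" ".join(translated), {"__builtins__": {}}, {"masks": masks})

masks = {
    "CD3": np.array([True, True, False, False]),
    "CD8": np.array([True, False, True, False]),
}
cd3_not_cd8 = combine_masks("CD3 & !CD8", masks)
```

In this translation NOT binds tighter than AND, which binds tighter than OR; round brackets override that order.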
Outputs:
The area measurements are saved in $output_folder/area_measurements.csv. The file has the following columns:
- sample_name = Sample name.
- main_marker = Combination of markers used to normalise the marker area.
- marker = Main combination of markers measured.
- area = Area positive for the marker combination of markers.
- main_marker_area = Area positive for the main_marker combination of markers.
- total_ROI_area = Total image area for this sample.
- percentage = Area of the marker (area) / area of the main marker (main_marker_area) * 100.

All areas are in pixel².
This process can be skipped by setting the skip_area parameter to true.
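The output columns follow directly from pixel counts. The sketch below applies the percentage formula literally to two boolean masks; the function name and the returned dictionary are hypothetical, and passing no main marker mask stands in for the case where main_marker equals marker.

```python
import numpy as np

def area_percentage(marker_mask, main_marker_mask=None):
    """Return area, main_marker_area, total_ROI_area and percentage in pixels.

    Hypothetical helper: main_marker_mask=None represents the case where
    main_marker is the same as marker, so the whole image is used.
    """
    marker_mask = np.asarray(marker_mask, dtype=bool)
    total_roi_area = int(marker_mask.size)
    area = int(marker_mask.sum())
    if main_marker_mask is None:
        main_marker_area = total_roi_area
    else:
        main_marker_area = int(np.asarray(main_marker_mask, dtype=bool).sum())
    percentage = area / main_marker_area * 100 if main_marker_area else float("nan")
    return {
        "area": area,
        "main_marker_area": main_marker_area,
        "total_ROI_area": total_roi_area,
        "percentage": percentage,
    }
```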
Generate boxplots showing the comparisons of the distributions of normalised marker-positive areas between 2 categories of samples. The input data for this process can be derived from:
- areas measured in the measurement of positive marker areas process.
- areas specified by the user with the area_measurements_file file if the measurement of positive marker areas process is skipped.
Inputs and parameters:
- sample_metadata_file with the metadata of all samples used in the analysis.
- area_measurements_file Path to the area_measurements_file, which should have the following columns:
  - sample_name = Sample name, should match a value in the sample_metadata_file metadata file.
  - main_marker = Marker or combination of markers used for normalisation.
  - marker = Marker or combination of markers used to calculate the area.
  - percentage = Area of the marker / area of the main marker * 100.

FDR is calculated using the number of different marker values for each value of main_marker.
Outputs:
The boxplots are saved in $output_folder/Plots/Area_Plots/Boxplots/, where a separate folder is created for each main_marker.
For each main_marker a pdf file ($output_folder/Plots/Area_Plots/Boxplots/main_marker/main_marker_area_boxplots.pdf) is produced, containing a boxplot for each value of marker associated with that main_marker.
The output of this process is located at: $output_folder/Plots/Area_Plots/Boxplots/
This process can be skipped by setting the skip_area_visualization parameter to true.
The cell-based analysis aims to investigate the qualitative and quantitative cell representation within the imaged tissue through (1) cell segmentation, (2) cell phenotyping by unsupervised clustering or expression thresholding and (3) spatial analysis of cell densities (homotypic spatial analysis) and distances (heterotypic spatial analysis). The steps of the cell-based analysis are:
- Single-cell data extraction:
- (C.1A) Cell segmentation with CellProfiler4
- (C.1B) Cell segmentation with StarDist
- (C.2) Cell masking
- (C.3) Cell masking visualisation
- Cell phenotyping:
- (C.4A.1) Unsupervised clustering
- (C.4A.2) Unsupervised clustering visualisation
- (C.4B.1) Expression thresholding
- (C.4B.2) Expression thresholding visualisation
- Spatial analysis:
- (C.5A.1) Homotypic spatial analysis
- (C.5A.2) Homotypic spatial analysis visualisation
- (C.5B.1) Heterotypic spatial analysis
- (C.5B.2) Heterotypic spatial analysis visualisation
- (C.5B.3) Heterotypic analysis permutation test
- (C.5B.4) Heterotypic analysis permutation test visualisation
Generate single-cell data in .csv format and the cell masks in tiff format. The input data for this process can be derived from:
- images generated in the image thresholding and masking process.
- images specified by the user with the preprocessed_metadata_file file if the image thresholding and masking process is skipped.
Inputs and parameters:
- preprocessed_metadata_file with the tiff image metadata.
- cp4_segmentation_cppipe: path to a CellProfiler4 pipeline to be used for segmentation.
Outputs:
The output of this process is located at: $output_folder/CellProfiler4_Segmentation/
- Single cell data:
  - Single cell data for all samples: $output_folder/CellProfiler4_Segmentation/CellProfiler4-unannotated_cells.csv
  - Single cell data for each sample separately: $output_folder/CellProfiler4_Segmentation/sample_name/sample_name-CellProfiler4-Cells.csv

  The single-cell data is a .csv table with a row for each cell and the following annotations:
  - ImageNumber: CellProfiler4 specific image identifier.
  - ObjectNumber: Unique identity number from 1 to 2^16-1, matches the corresponding pixels in the cell masks.
  - Metadata_sample_name: Matching the sample_name values in the preprocessed_metadata_file.
  - Location_Center_X and Location_Center_Y: Location of the cell centroid in the image in pixels, used for both the homotypic and heterotypic spatial analyses.
  - CellProfiler4 marker intensity measurements: Used for cell phenotyping by Unsupervised clustering or by Expression thresholding.

  The exact set of fields and their order depends on the CellProfiler4 pipeline used in the analysis.
- Cell masks:
  Cell masks in uint16 tiff format: $output_folder/CellProfiler4_Segmentation/sample_name/sample_name-CellProfiler4-Cell_Mask.tiff
  Each cell is associated with a unique identity number from 1 to 2^16-1. All the pixels belonging to a given cell have their value set to its identity number. Pixels not belonging to any cell are set to 0.
  These images are compatible with several other tools for downstream analysis including:
  - CellProfiler4: The cells can be imported as objects from the image.
  - Histocat
  - cytomapper
To use the cells identified with this process in the downstream steps:
- cell_source = "CellProfiler" (not required if only one of the two segmentation tools is used).

This process can be skipped by setting the skip_cp_segmentation parameter to true.
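Since every pixel of a cell carries that cell's ObjectNumber and background pixels are 0, per-cell quantities can be read straight from the mask. The following Python sketch shows this for areas and centroids; both helper names are hypothetical and loading the tiff file into an array is left out.

```python
import numpy as np

def cell_areas(cell_mask):
    """Map each ObjectNumber found in the mask to its area in pixels."""
    counts = np.bincount(np.asarray(cell_mask).ravel())
    # Index 0 is the background and is skipped.
    return {int(i): int(n) for i, n in enumerate(counts) if i != 0 and n > 0}

def cell_centroid(cell_mask, object_number):
    """Return the (x, y) centroid of one cell, x being the column index."""
    rows, cols = np.nonzero(np.asarray(cell_mask) == object_number)
    return float(cols.mean()), float(rows.mean())

example_mask = np.array([[1, 1, 0],
                         [1, 1, 2],
                         [0, 0, 2]], dtype=np.uint16)
areas = cell_areas(example_mask)
```

The x and y order mirrors the Location_Center_X and Location_Center_Y columns; whether CellProfiler4 uses exactly the same pixel-centre convention is not checked here.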
Generate single-cell data in .csv format and the cell masks in tiff format. The input data for this process can be derived from:
- images generated in the image thresholding and masking process.
- images specified by the user with the preprocessed_metadata_file file if the image thresholding and masking process is skipped.

For more details on segmentation with StarDist please refer to the following pages:
- Uwe Schmidt, Martin Weigert, Coleman Broaddus, and Gene Myers. Cell Detection with Star-convex Polygons. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Granada, Spain, September 2018.
- StarDist repository
- StarDist FAQ

For more details on StarDist Segmentation in SIMPLI please refer to this page.
Inputs and parameters:
- preprocessed_metadata_file with the tiff image metadata.
- sd_labels_to_segment = comma separated list of markers to include in the image on which the segmentation is performed; it must match the number of dimensions in the model.
- sd_model_name = model to use for the segmentation (name of a default model or of a pretrained one).
- sd_model_path = path to the model or "default" for default models.
- sd_prob_thresh = probability threshold used for calling cells: 0 < value < 1 or "default" to use the default value saved in the model.
- sd_nms_thresh = overlap threshold above which Non-Maximum Suppression is performed: 0 < value < 1 or "default" to use the default value saved in the model.
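The two threshold parameters accept either a number strictly between 0 and 1 or the string "default". A small Python sketch of how such a value could be validated before it is passed on is given below; parse_threshold is a hypothetical helper, and returning None to mean "use the value stored in the model" is an assumption of this example.

```python
def parse_threshold(value):
    """Validate an sd_prob_thresh or sd_nms_thresh style value.

    Returns None for "default" (assumed here to mean: use the value saved
    in the model), otherwise a float strictly between 0 and 1.
    """
    if isinstance(value, str) and value.strip().lower() == "default":
        return None
    number = float(value)
    if not 0.0 < number < 1.0:
        raise ValueError(f"threshold must satisfy 0 < value < 1, got {number}")
    return number
```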
Outputs:
The output of this process is located at: $output_folder/StarDist_Segmentation/
- Single cell data:
  - Single cell data for all samples: $output_folder/StarDist_Segmentation/StarDist-unannotated_cells.csv
  - Single cell data for each sample separately: $output_folder/StarDist_Segmentation/sample_name/sample_name-StarDist-Cells.csv

  The single-cell data is a .csv table with a row for each cell and the following annotations:
  - ObjectNumber: Unique identity number from 1 to 2^16-1, matches the corresponding pixels in the cell masks.
  - Metadata_sample_name: Matching the sample_name values in the preprocessed_metadata_file.
  - Location_Center_X and Location_Center_Y: Location of the cell centroid in the image in pixels, used for both the homotypic and heterotypic spatial analyses.
  - marker intensity measurements: minimum, maximum and mean.
  - spatial features computed with skimage.measure.regionprops from the skimage library.
- Cell masks:
  Cell masks in uint16 tiff format: $output_folder/StarDist_Segmentation/sample_name/sample_name-StarDist-Cell_Mask.tiff
  Each cell is associated with a unique identity number from 1 to 2^16-1. All the pixels belonging to a given cell have their value set to its identity number. Pixels not belonging to any cell are set to 0.
  These images are compatible with several other tools for downstream analysis including:
  - CellProfiler4: The cells can be imported as objects from the image.
  - Histocat
  - cytomapper
To use the cells identified with this process in the downstream steps:
- cell_source = "StarDist" (not required if only one of the two segmentation tools is used).

This process can be skipped by setting the skip_sd_segmentation parameter to true.
This process identifies cells belonging to different populations or tissue compartments according to the overlap of their areas with those of specific masks.
The input images for this process can be derived from:
- images generated in the image thresholding and masking process.
- images specified by the user with the preprocessed_metadata_file file if the image thresholding and masking process is skipped.

The input cell masks for this process can be derived from:
- cell masks generated in the cell segmentation process.
- cell masks specified by the user with the single_cell_masks_metadata file if the cell segmentation process is skipped.

The input cell data for this process can be derived from:
- cell data generated in the cell segmentation process.
- cell data specified by the user with the preprocessed_metadata_file file if the cell segmentation process is skipped.
Inputs and parameters:
- preprocessed_metadata_file with the tiff image metadata.
- single_cell_masks_metadata with the following columns:
  - sample_name = Sample name matching a value in the preprocessed_metadata_file file.
  - label = "Cell_Mask"
  - file_name = path to a cell mask in uint16 tiff format.
- single_cell_data_file = A .csv file with the following columns:
  - Metadata_sample_name = Sample name matching a value in the preprocessed_metadata_file file.
  - ObjectNumber = Unique number identifying the pixels belonging to the cell in the cell mask.
- cell_masking_metadata = A .csv file indicating which masks to use and which thresholds of overlap to apply. It should have the following columns:
  - cell_type = name of the cell type being identified.
  - threshold_marker = marker to use as mask. It should match a value in the label column of the preprocessed_metadata_file. It can be a combination of markers specified with logical operators (AND = &, OR = |, NOT = !, () = round brackets).
  - threshold_value = 1 - fraction of area overlap between the cell and the mask. Cells whose area overlaps the mask by a fraction higher than threshold_value are considered positive.

If a cell is positive for more than one cell type, then it is assigned to the cell type defined first (by row order) in the cell_masking_metadata file. Cells negative for all cell_types are marked as UNASSIGNED.
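The assignment rule can be sketched in Python as follows. This is only an illustration of the logic described above, not the SIMPLI implementation: assign_cell_types and its rule tuples are hypothetical, and threshold is taken to be the minimum fraction of a cell's area that must overlap the mask.

```python
import numpy as np

def assign_cell_types(cell_mask, rules):
    """Assign each cell to the first cell type whose mask it overlaps enough.

    cell_mask: integer label image (0 = background).
    rules: ordered list of (cell_type, boolean_mask, threshold) tuples, in
    the same row order as cell_masking_metadata (hypothetical structure).
    """
    cell_mask = np.asarray(cell_mask)
    n_labels = int(cell_mask.max()) + 1
    areas = np.bincount(cell_mask.ravel(), minlength=n_labels)
    assigned = {int(c): "UNASSIGNED" for c in np.unique(cell_mask) if c != 0}
    for cell_type, marker_mask, threshold in rules:
        # Pixels of each cell that fall inside the marker mask.
        overlap = np.bincount(cell_mask[np.asarray(marker_mask, dtype=bool)],
                              minlength=n_labels)
        for cell_id, current in assigned.items():
            if current == "UNASSIGNED" and overlap[cell_id] / areas[cell_id] > threshold:
                assigned[cell_id] = cell_type  # earlier rows take precedence
    return assigned
```

Cells that never exceed a threshold keep the UNASSIGNED label, and a cell positive for several rows keeps the label of the first one.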
Outputs:
The annotated cell table is a .csv table with the same columns as the input table plus the following annotations:
- cell_type: Name used to identify the cell type during the analysis.
- CellName: Unique Cell identity string in the form: Metadata_sample_name_ObjectNumber

The cell type level table is saved at: $output_folder/annotated_cells.csv
This process can be skipped by setting the skip_cell_type_identification parameter to true.
This process plots the results of the cell masking process. The input cell masks for this process can be derived from:
- cell masks generated in the cell segmentation process.
- cell masks specified by the user with the single_cell_masks_metadata file if the cell segmentation process is skipped.

The input cell data for this process can be derived from:
- cell data generated in the cell segmentation process.
- cell data specified by the user with the preprocessed_metadata_file file if the cell segmentation process is skipped.
Inputs and parameters:
- sample_metadata_file with the metadata of all samples used in the analysis.
- single_cell_masks_metadata with the following columns:
  - sample_name = Sample name matching a value in the sample_metadata_file file.
  - label = "Cell_Mask"
  - file_name = path to a cell mask in uint16 tiff format.
- annotated_cell_data_file = A .csv file with the following columns:
  - Metadata_sample_name = Sample name matching a value in the sample_metadata_file file.
  - cell_type = name of the cell type being identified.
- cell_masking_metadata = A .csv file indicating which masks to use and which thresholds of overlap to apply. It should have the following columns:
  - cell_type = name of the cell type being identified.
  - threshold_marker = marker to use as mask. It should match a value in the label column of the preprocessed_metadata_file. It can be a combination of markers specified with logical operators (AND = &, OR = |, NOT = !, () = round brackets).
  - threshold_value = 1 - fraction of area overlap between the cell and the mask. Cells whose area overlaps the mask by a fraction higher than threshold_value are considered positive.
  - color = Color used to represent this cell type. Accepted values are color names or hexadecimal #RGB or #RGBA format ("#RRGGBB" or "#RRGGBBAA"). Cells of cell_type = "UNASSIGNED" are automatically assigned the color "#888888".
Outputs:
The cell type level plots are saved in $output_folder/Plots/Cell_Type_Plots/ and they are divided into:
- Barplots: $output_folder/Plots/Cell_Type_Plots/Barplots
  .pdf files with barplots of the proportions of all cell types + unassigned cells in:
  - Each sample: one bar per sample.
  - Category (optional): one bar per category, if the comparison column in the sample_metadata_file file contains 2 categories.

  The barplots are divided into the following .pdf files:
  - dodged_barplots.pdf = dodged barplots including "UNASSIGNED" cells.
  - dodged_assigned_ony_barplots.pdf = dodged barplots excluding "UNASSIGNED" cells.
  - stacked_barplots.pdf = stacked barplots including "UNASSIGNED" cells.
  - stacked_assigned_only_barplots.pdf = stacked barplots excluding "UNASSIGNED" cells.
- Overlays: $output_folder/Plots/Cell_Type_Plots/Overlays/
  - One overlay-sample_name.tiff image per sample. Each cell is coloured by cell type according to the color specified in the cell types metadata file.
  - overlay_legend.pdf: legend mapping each cell type to its color.
- Boxplots (Optional): $output_folder/Plots/Cell_Type_Plots/Boxplots/
  If the comparison column in the sample_metadata_file file contains 2 categories, two pdf files are produced, each with a boxplot for each cell type:
  - boxplots.pdf = boxplots including "UNASSIGNED" cells.
  - assigned_ony_boxplots.pdf = boxplots excluding "UNASSIGNED" cells.

  The FDR is calculated with the Benjamini-Hochberg procedure.

This process can be skipped by setting the skip_type_visualization parameter to true.
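For readers unfamiliar with the Benjamini-Hochberg procedure mentioned for the boxplots, a minimal NumPy sketch of the adjustment is shown below. It illustrates the standard procedure only; the function name is hypothetical and the statistical test that produces the p-values in SIMPLI is not covered.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p-value * number of tests / rank, with ranks starting at 1.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted
```

Here the number of tests would be the number of boxplots compared together, for example one per cell type.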
This process performs unsupervised clustering on cells from one or more sets of cells. The input cell data for this process can be derived from:
- cell data annotated in the cell masking process.
- cell data specified by the user with the annotated_cell_data_file file if the cell masking process is skipped.
Inputs and parameters:
- sample_metadata_file with the metadata of all samples used in the analysis. If the value of the comparison column for the sample is "NA" all cells from the sample are excluded from the clustering.
- annotated_cell_data_file = A .csv file with the following columns:
  - Metadata_sample_name = Sample name matching a value in the sample_metadata_file file.
  - cell_type = name of the cell type being identified.
  - ObjectNumber = number identifying a cell. Needs to be unique within each sample.
  - Columns with the expression values of the markers used for clustering. The names should match the values in the clustering_markers column in the cell_clustering_metadata file.
- cell_clustering_metadata metadata file with the parameters for the cell phenotyping by unsupervised clustering. It contains the following columns:
  - cell_type = name of the cell type to use for phenotyping. Set to "NA" to use all cells in the sample.
  - clustering_markers = @ separated list of markers to use for clustering. The markers must match a column name from the annotated_cell_data_file.
  - clustering_resolutions = @ separated list of resolutions used to extract the clusters from the graph. Use a value above (below) 1.0 if you want to obtain a larger (smaller) number of clusters.

See the original Seurat function for details.
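As an illustration of the @ separated format, the Python sketch below splits one row of the cell_clustering_metadata file and checks the markers against the columns of the cell table. The function parse_clustering_row and the dictionary-based row are hypothetical; SIMPLI's own parsing may differ.

```python
def parse_clustering_row(row, available_columns):
    """Split one cell_clustering_metadata row into usable values.

    row: dict with the keys cell_type, clustering_markers and
    clustering_resolutions; available_columns: column names of the
    annotated cell table. Hypothetical helper for this example.
    """
    markers = [m.strip() for m in row["clustering_markers"].split("@") if m.strip()]
    resolutions = [float(r) for r in row["clustering_resolutions"].split("@") if r.strip()]
    missing = [m for m in markers if m not in available_columns]
    if missing:
        raise ValueError(f"markers missing from the cell table: {missing}")
    # "NA" means: cluster all cells in the sample.
    cell_type = None if row["cell_type"] == "NA" else row["cell_type"]
    return cell_type, markers, resolutions

example_row = {
    "cell_type": "NA",
    "clustering_markers": "CD3@CD4@CD8",
    "clustering_resolutions": "0.5@1.0",
}
parsed = parse_clustering_row(example_row, {"CD3", "CD4", "CD8", "ObjectNumber"})
```

The marker names in the example row are placeholders.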
Outputs:
The output files divided by cell type are saved in separate subfolders named after the cell type at: $output_folder/Cell_Clusters/CELLTYPE. For each clustered cell type this step outputs:
- Cell cluster table: CELLTYPE-clusters.csv with the following columns:
  - CellName: Cell identity string in the form: Metadata_sample_name_ObjectNumber
  - Metadata_sample_name: sample name as in the sample_metadata_file file.
  - Clustering resolution columns: res-RESOLUTION-ids for each clustered cell type. Clusters are numbered from 0; the same numbering is used in the plots.
  - ObjectNumber: Unique identity number from 1 to 2^16-1, matches the corresponding pixels in the cell masks.
  - Marker intensity measurements.
  - cell_type: Name used to identify the clustered cell type during the analysis.
- Cell cluster RData: CELLTYPE-clusters.RData
  The Seurat 2.3.0 object. See the original [Seurat page](https://github.com/satijalab/seurat/blob/v2.3.0/R/seurat.R) for details. This can be converted to a Seurat object compatible with the latest Seurat version with the UpdateAssay function.

A collected clustered cells table is saved at: $output_folder/clustered_cells.csv. This file is a .csv table with a row for each cell in the cell types that underwent clustering and the following annotations:
- comparison: Name of the cell cluster table divided by cell type containing the cell.
- CellName: Cell identity string in the form: Metadata_sample_name_ObjectNumber
- Metadata_sample_name: sample name as in the sample_metadata_file file.
- Clustering resolution columns: res-RESOLUTION-ids for each clustered cell type. Clusters are numbered from 0; the same numbering is used in the plots.
- ObjectNumber: Unique identity number from 1 to 2^16-1, matches the corresponding pixels in the cell masks.
- Marker intensity measurements.
- cell_type: Name used to identify the cell type during the analysis.

This process can be skipped by setting the skip_cell_clustering parameter to true.
This process plots the results of the unsupervised clustering process. The input annotated cell data for this process can be derived from:
- Annotated cell data generated in the unsupervised clustering process.
- Annotated cell data specified by the user with the clustered_cell_data_file file if the unsupervised clustering process is skipped.
Inputs and parameters:
- sample_metadata_file with the metadata of all samples used in the analysis. If the value of the comparison column for the sample is "NA" all cells from the sample are excluded from the clustering.
- clustered_cell_data_file = A .csv file with the following columns:
  - Metadata_sample_name = Sample name matching a value in the sample_metadata_file file.
  - cell_type = name of the cell type being identified.
  - ObjectNumber = number identifying a cell. Needs to be unique within each sample.
  - Columns with the expression values of the markers used for clustering. The names should match the values in the clustering_markers column in the cell_clustering_metadata file.
  - Columns with the cluster annotation for each cell. The column names should match this format: res_RESOLUTION_ids where RESOLUTION matches one of the values of the clustering_resolutions column in the cell_clustering_metadata file.
- cell_clustering_metadata metadata file with the parameters for the cell phenotyping by unsupervised clustering. It contains the following columns:
  - cell_type = name of the cell type to use for phenotyping. Set to "NA" to use all cells in the sample.
  - clustering_markers = @ separated list of markers to use for clustering. The markers must match a column name from the annotated_cell_data_file.
  - clustering_resolutions = @ separated list of resolutions used to extract the clusters from the graph. Use a value above (below) 1.0 if you want to obtain a larger (smaller) number of clusters.
- high_color = Color for the max expression value in the heatmap or UMAP, defaults to "'#FF0000'"
- mid_color = Color for the midpoint of the expression value in the heatmap or UMAP, defaults to "'#FFFFFF'"
- low_color = Color for the minimum expression value in the heatmap or UMAP, defaults to "'#0000FF'"

Accepted values are color names or hexadecimal #RGB or #RGBA format ("#RRGGBB" or "#RRGGBBAA").
Outputs:
The plots illustrating the results of the unsupervised clustering are saved in $output_folder/Plots/Cell_Cluster_Plots/ and they are divided into:
- UMAPs: $output_folder/Plots/Cell_Cluster_Plots/CELL_TYPE/UMAPs/
  For each clustering resolution a .pdf file with UMAP plots colored by:
  - Sample
  - Cluster: clusters at this level of resolution.
  - Marker: markers used for the clustering.
- Boxplots (Optional): $output_folder/Plots/Cell_Cluster_Plots/Cluster_Comparisons/
  If the comparison metadata column of the sample_metadata_file has exactly 2 (non "NA") categories, a .pdf file is produced for each level of resolution. The file contains:
  - Heatmap: showing for each cluster the expression of the markers used for the clustering.
  - Boxplots: one for each cluster, with the percentage of cells belonging to that cluster out of the total cells in the clustered cell type. The FDR is calculated using the Benjamini-Hochberg procedure for all clusters.
- Heatmaps (Optional): If the comparison metadata column does not have exactly 2 (non "NA") categories, a .pdf file is produced for each level of resolution, containing a heatmap showing for each cluster the expression of the markers used for the clustering.

This process can be skipped by setting the skip_cluster_visualization parameter to true.
This process phenotypes cells from one or more sets of cells by expression thresholding. The input cell data for this process can be derived from:
- cell data annotated in the cell masking process.
- cell data specified by the user with the annotated_cell_data_file file if the cell masking process is skipped.
Inputs and parameters:
- cell_thresholding_metadata Metadata file defining thresholds and phenotypes:
  - cell_type = name of the cell type to use for phenotyping. Set to "NA" to use all cells in the sample.
  - phenotype_name = name of the phenotype to use for cells passing the expression thresholding.
  - threshold_expression = Definition of the threshold or thresholds to apply. It should be written as marker_expression_column comparison operator (accepted operators are >, <, >=, <=, ==, !=) value. Marker intensities can be combined with arithmetic operators (+, -, *, /, ...). Expressions for different thresholds can be combined with logical operators (&, |, !). If a cell passes more than one threshold_expression for the same cell_type then it is annotated with the value of phenotype_name corresponding to the last passed threshold_expression in the cell_thresholding_metadata file.
- annotated_cell_data_file = A .csv file with the following columns:
  - Metadata_sample_name = Sample name matching a value in the sample_metadata_file file.
  - cell_type = name of the cell type being identified.
  - ObjectNumber = number identifying a cell. Needs to be unique within each sample.
  - Columns with the expression values of the markers used for thresholding. The names should match the values in the threshold_expression column in the cell_thresholding_metadata file.
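The "last passed expression wins" rule can be illustrated with a short Python sketch. It is not SIMPLI's implementation: apply_thresholds, the column dictionary and the marker names are hypothetical, column names are assumed to be valid Python identifiers, and eval is used only because the input is trusted in this example. Each comparison is parenthesised because in Python & binds tighter than the comparison operators.

```python
import re
import numpy as np

def apply_thresholds(columns, rules):
    """Phenotype cells of one cell_type with ordered threshold expressions.

    columns: dict mapping marker column names to 1-D arrays (one value per cell).
    rules: ordered list of (phenotype_name, threshold_expression) tuples.
    """
    n_cells = len(next(iter(columns.values())))
    phenotype = np.full(n_cells, "UNASSIGNED", dtype=object)
    for phenotype_name, expression in rules:
        # Translate the NOT operator "!" (but not "!=") to NumPy's "~".
        python_expr = re.sub(r"!(?!=)", "~", expression)
        passed = eval(python_expr, {"__builtins__": {}}, dict(columns))
        # Later rules overwrite earlier ones: the last passed expression wins.
        phenotype[np.asarray(passed, dtype=bool)] = phenotype_name
    return phenotype

columns = {
    "CD3": np.array([0.9, 0.9, 0.1]),
    "CD4": np.array([0.8, 0.8, 0.9]),
    "FOXP3": np.array([0.1, 0.6, 0.9]),
}
rules = [
    ("CD4_T", "(CD3 > 0.5) & (CD4 > 0.5)"),
    ("Treg", "(CD3 > 0.5) & (CD4 > 0.5) & (FOXP3 > 0.3)"),
]
phenotypes = apply_thresholds(columns, rules)
```

The second cell passes both expressions and therefore receives the phenotype of the later row, while the third cell passes none and stays UNASSIGNED.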
Outputs:
The thresholded cell table is a .csv table with the same columns as the input table plus the following annotations:
-
CellType_Thresholded
: columns, one for each value of thecell_type
column in thecell_thresholding_metadata
file. Its value can be one of:-
NA
= This cell was not annotated because it has a differentcell_type
from the one being phenotyped. -
cell_phenotype
= One of the values of thephenotype_name
column. If a cell passes more than onethreshold_expression
for the samecell_type
then it is with the value ofphenotype_name
corresponding to the last passedthreshold_expression
in thecell_thresholding_metadata
file. -
UNASSIGNED
= If the cell does not pass anythreshold_expression
for thatcell_type
.
-
The thresholded cell table is saved at: $output_folder/thresholded_cells.csv
This process can be skipped by setting the skip_cell_thresholding
parameter to true.
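The "last passed threshold wins" rule described above can be illustrated with a minimal sketch. This is not SIMPLI's implementation: the marker names (`CD4`, `FOXP3`), the phenotype names and the cut-off values are hypothetical, and Python functions stand in for the `threshold_expression` strings of the `cell_thresholding_metadata` file.

```python
# Hypothetical thresholds for one cell_type, listed in file order:
# (phenotype_name, predicate standing in for a threshold_expression).
thresholds = [
    ("CD4_T", lambda cell: cell["CD4"] > 0.2),
    ("Treg", lambda cell: cell["CD4"] > 0.2 and cell["FOXP3"] >= 0.1),
]


def phenotype(cell, thresholds):
    """Return the phenotype_name of the last threshold passed by the cell."""
    label = "UNASSIGNED"  # default when no threshold is passed
    for name, passes in thresholds:
        if passes(cell):
            label = name  # a later row overwrites an earlier match
    return label


print(phenotype({"CD4": 0.5, "FOXP3": 0.3}, thresholds))  # Treg
print(phenotype({"CD4": 0.5, "FOXP3": 0.0}, thresholds))  # CD4_T
print(phenotype({"CD4": 0.1, "FOXP3": 0.3}, thresholds))  # UNASSIGNED
```

The first cell passes both thresholds, so it receives the phenotype of the later row; this is why the order of rows in the metadata file matters.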
This process plots the results of the cell phenotyping performed by the expression thresholding process. The input annotated cell data for this process can be derived from:
- Annotated cell data generated in the expression thresholding process.
- Annotated cell data specified by the user with the
thresholded_cell_data_file
file if the expression thresholding process is skipped.
The input cell masks for this process can be derived from:
- cell masks generated in the cell segmentation process.
- cell masks specified by the user with the
single_cell_masks_metadata
file if the cell segmentation process is skipped.
Inputs and parameters:
-
sample_metadata_file
with the metadata of all samples used in the analysis. If the value of thecomparison
column for the sample is"NA"
, then the sample is excluded from the plotting. -
cell_thresholding_metadata
Metadata file defining thresholds and phenotypes:-
cell_type
= name of the cell type to use for phenotyping. Set to"NA"
to use all cells in the sample. -
phenotype_name
= name of the phenotype to use for cells passing the expression thresholding. -
threshold_expression
= Definition of the threshold or thresholds to apply. It should be written asmarker_expression_column
comparison operator
(accepted operators are>
,<
,>=
,<=
,==
,!=
)value
. Marker intensities can be combined with arithmetic operators (+
,-
,*
,/
, ...). Expressions for different thresholds can be combined with logical operators (&
,|
,!
). If a cell passes more than onethreshold_expression
for the samecell_type
then it is annotated with the value of phenotype_name
corresponding to the last passedthreshold_expression
in thecell_thresholding_metadata
file. -
color
= Color used to represent the cell phenotype in barplots, and density plots. -
plotting_markers
=@
separated list of markers to include in the heatmap for thiscell_type
they must match the names of themarker_expression_column
in thethresholded_cell_data_file
-
-
thresholded_cell_data_file
= A .csv file with the following columns:-
Metadata_sample_name
= Sample name matching a value in thesample_metadata_file
file. -
cell_type
= name of the cell type being identified. -
ObjectNumber
= number identifying a cell. Needs to be unique within each sample. -
CellType_Thresholded
columns: one for each value of thecell_type
column in thecell_thresholding_metadata
file. -
marker_expression_column
columns: columns with the expression values of the markers used for clustering, the names should match the values in thethreshold_expression
column in thecell_thresholding_metadata
file.
-
-
single_cell_masks_metadata
with the following columns:-
sample_name
= Sample name matching a value in thesample_metadata_file
file -
label
="Cell_Mask"
-
file_name
= path to a cell mask in uint16 tiff format
-
-
high_color
= Color for the max expression value in the heatmap or UMAP defaults to"'#FF0000'"
-
mid_color
= Color for the midpoint of the expression value in the heatmap or UMAP defaults to"'#FFFFFF'"
-
low_color
= Color for the minimum expression value in the heatmap or UMAP defaults to"'#0000FF'"
Accepted values are color names or hexadecimal #RGB or #RGBA format ("#RRGGBB" or "#RRGGBBAA").
Outputs:
The output plots are saved at: $output_folder/Plots/Cell_Threshold_Plots/
:
-
Barplots:
$output_folder/Plots/Cell_Type_Plots/Barplots
.pdf files (one for eachcell_type
) with barplots with the proportions of allphenotype_name
+ unassigned cells in:- Each sample: one bar per sample.
- Category (optional): one bar per category, if the comparison column in the
sample_metadata_file
file contains 2 categories.
-
Overlays:
$output_folder/Plots/Cell_Type_Plots/Overlays/
- One
cell_type-overlay-sample_name.tiff
image for eachcell_type
for each sample. Each cell is coloured byphenotype_name
according to the color specified in thecolor
of thecell_thresholding_metadata
file. - overlay_legend.pdf: legend mapping each cell type to its color.
- One
-
Boxplots:
$output_folder/Plots/Cell_Threshold_Plots/Boxplots/
If the comparison metadata column of thesample_metadata_file
has exactly 2 (non"NA"
) categories: For eachcell_type
in thecell_thresholding_metadata
file a .pdf file is produced, the file contains one boxplot for eachphenotype_name
, with the percentage of cells belonging to thatphenotype_name
out of the total cells in the cell phenotype. The FDR is calculated using the Benjamini-Hochberg procedure for all cell phenotypes. -
Density Plots:
Plots/Cell_Threshold_Plots/Density_Plots/
Density plots showing the distribution of cells of thecell_type
according to the expression of all the markers in thethreshold_expression
of eachphenotype_name
. The number of cells and the expression values are represented in Log scale. -
Heatmaps:
$output_folder/Plots/Cell_Threshold_Plots/Heatmaps
For eachcell_type
a .pdf file is produced containing a heatmap showing, for each phenotype_name
the expression of the markers specified in theplotting_markers
column of thecell_thresholding_metadata
file.
This process can be skipped by setting the skip_thresholding_visualization
parameter to true
.
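The Benjamini-Hochberg procedure mentioned for the boxplots can be sketched as follows. This is a generic illustration of the textbook procedure, not the code SIMPLI runs, and the p-values in the example are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per phenotype comparison.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw p-values increase.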
This process identifies high-density aggregations of cells of a given cell type or phenotype using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, as implemented in the fpc R package. The input annotated cell data for this process can be derived from:
- Annotated cells from the cell masking process or supplied through the
annotated_cell_data_file
file parameter. - Cells phenotyped from the unsupervised clustering process or supplied through the
clustered_cell_data_file
file parameter. - Cells phenotyped from the expression thresholding process or supplied through
thresholded_cell_data_file
file parameter. The analyses can be performed on cell types and phenotypes from any combination of these three sources.
Inputs and parameters:
-
homotypic_interactions_metadata
Metadata file with these columns:-
cell_file
: File to read the cell annotations from; it must be one of:-
identification
: annotated cells from the cell masking process or supplied through theannotated_cell_data_file
file parameter. -
thresholding
: cells phenotyped from the expression thresholding process or supplied through the thresholded_cell_data_file
file parameter. -
clustering
: cells phenotyped from the unsupervised clustering process or supplied through the clustered_cell_data_file
file parameter.
-
-
cell_type_column
: Name of the column in thecell_file
containing the annotation of the cell type or phenotype to cluster. -
cell_type_to_cluster
: Name of the cell type or phenotype to cluster, must match one of the values of thecell_type_column
in thecell_file
. -
reachability_distance
:eps
argument of thedbscan
function of the fpc R package. Reachability distance, see Ester et al. (1996). -
min_cells
: MinPts argument of thedbscan
function of the fpc R package. Reachability minimum number of points, see Ester et al. (1996).
-
- File/s with the cell data:
One for each of the values of thecell_file
column of thehomotypic_interactions_metadata
file. It must have these columns:-
Metadata_sample_name
: Sample name matching a value in thesample_metadata_file
file. -
CellName
: Unique identifier for each cell. -
Location_Center_X
: X coordinate of the cell centroid in the image. -
Location_Center_Y
: Y coordinate of the cell centroid in the image.
-
Location_Center_X
and Location_Center_Y
can be obtained in several ways, for instance from the IdentifyPrimaryObjects module in CellProfiler4, as SIMPLI does, or with the computeFeatures function from the EBImage R Package.
Outputs:
The output of this process is stored at: $output_folder/Homotypic_interactions
- Files for individual
cell_types
are stored at:$output_folder/Homotypic_interactions/cell_type/cell_type-homotypic_clusters.csv
- A total file collecting the annotations for all
cell_types
is stored at: $output_folder/Homotypic_interactions/homotypic_interactions.csv
These files contain the following columns:
CellName
= Unique identifier for each cell.
Metadata_sample_name
= Sample name matching a value in the sample_metadata_file
file.
Location_Center_X
= X coordinate used for the DBSCAN clustering.
Location_Center_Y
= Y coordinate used for the DBSCAN clustering.
spatial_analysis_cell_type
= Contains the cell types or phenotypes that were annotated in the cell_type_column
in the cell_file.
cluster
= column indicating cluster membership, with noise observations (singletons) coded as 0.
isseed
= column indicating whether a point is a seed (not border, not noise).
See the fpc::dbscan documentation for details.
This process can be skipped by setting the skip_homotypic_interactions
parameter to true
.
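To make the roles of `reachability_distance` (eps), `min_cells` (MinPts) and the `cluster` and `isseed` output columns concrete, here is a minimal sketch of the generic DBSCAN algorithm on cell centroids. It is not the fpc implementation that SIMPLI calls; in particular, counting a point as its own neighbour is an assumption of this sketch, and the coordinates are made up.

```python
from math import hypot


def dbscan(points, eps, min_pts):
    """Return (cluster, isseed) lists; cluster 0 marks noise points."""
    n = len(points)
    # Neighbourhood of each point within eps (the point itself is included).
    neighbours = [
        [j for j in range(n)
         if hypot(points[i][0] - points[j][0],
                  points[i][1] - points[j][1]) <= eps]
        for i in range(n)
    ]
    # A seed (core point) has at least min_pts points in its neighbourhood.
    isseed = [len(nb) >= min_pts for nb in neighbours]
    cluster = [0] * n
    current = 0
    for i in range(n):
        if cluster[i] != 0 or not isseed[i]:
            continue
        current += 1  # start a new cluster from an unassigned seed
        cluster[i] = current
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbours[p]:
                if cluster[q] == 0:
                    cluster[q] = current
                    if isseed[q]:  # only seeds keep expanding the cluster
                        stack.append(q)
    return cluster, isseed


# Hypothetical centroids: four cells close together and one isolated cell.
centroids = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
cluster, isseed = dbscan(centroids, eps=1.5, min_pts=3)
print(cluster)  # [1, 1, 1, 1, 0]
print(isseed)   # [True, True, True, True, False]
```

The isolated cell has no other cell within the reachability distance, so it is reported as noise (`cluster` 0) and is not a seed.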
This process plots the results of the homotypic spatial analysis process. The input annotated cell data for this process can be derived from:
- Cells data with DBSCAN cluster annotations from the homotypic spatial analysis process.
- Cells data with DBSCAN cluster annotations specified by the user with the
homotypic_interactions_file
file if the homotypic spatial analysis process is skipped.
The input cell masks for this process can be derived from:
- cell masks generated in the cell segmentation process.
- cell masks specified by the user with the
single_cell_masks_metadata
file if the cell segmentation process is skipped.
Inputs and parameters:
-
homotypic_interactions_metadata
Metadata file with these columns:-
cell_file
: File to read the cell annotations from; it must be one of:-
identification
: annotated cells from the cell masking process or supplied through theannotated_cell_data_file
file parameter. -
thresholding
: cells phenotyped from the expression thresholding process or supplied through the thresholded_cell_data_file
file parameter. -
clustering
: cells phenotyped from the unsupervised clustering process or supplied through the clustered_cell_data_file
file parameter.
-
-
cell_type_column
: Name of the column in thecell_file
containing the annotation of the cell type or phenotype to cluster. -
cell_type_to_cluster
: Name of the cell type or phenotype to cluster, must match one of the values of thecell_type_column
in thecell_file
. -
reachability_distance
:eps
argument of thedbscan
function of the fpc R package. Reachability distance, see Ester et al. (1996). -
min_cells
: MinPts argument of thedbscan
function of the fpc R package. Reachability minimum number of points, see Ester et al. (1996). -
color
: color to use to represent the cell type / phenotype in the plots.
-
-
single_cell_masks_metadata
with the following columns:-
sample_name
= Sample name matching a value in thesample_metadata_file
file -
label
="Cell_Mask"
-
file_name
= path to a cell mask in uint16 tiff format
-
Outputs:
The output of this process is stored at: $output_folder/Plots/Homotypic_interactions_Plots
:
Position maps: Map of the image showing dots representing the position of the centroid of each cell in the image. Cells are colored in:
-
black
: cells not belonging to a DBSCAN cluster. -
color
from thehomotypic_interactions_metadata
: cells belonging to a DBSCAN cluster. One file for each cell type / phenotype for each sample named:$output_folder/Plots/Homotypic_Interaction_Plots/cell_type/cell_type-sample_name-homotypic.pdf
.
This process can be skipped by setting the skip_homotypic_visualization
parameter to true
.
This process measures the distribution of the minimum distances between cells of two user-defined cell types or phenotypes. It computes the distances between all cells of the first cell type or phenotype and all cells of the second cell type or phenotype and, for each cell of the first cell type or phenotype, returns the minimum distance to a cell of the second cell type or phenotype. The input annotated cell data for this process can be derived from:
- Annotated cells from the cell masking process or supplied through the
annotated_cell_data_file
file parameter. - Cells phenotyped from the unsupervised clustering process or supplied through the
clustered_cell_data_file
file parameter. - Cells phenotyped from the expression thresholding process or supplied through
thresholded_cell_data_file
file parameter. The analyses can be performed on cell types and phenotypes from any combination of these three sources.
Inputs and parameters:
-
heterotypic_interactions_metadata
metadata file with the following columns:-
cell_file1
: File to read the cell annotations for the first cell type or phenotype from; it must be one of:-
identification
: annotated cells from the cell masking process or supplied through theannotated_cell_data_file
file parameter. -
thresholding
: cells phenotyped from the expression thresholding process or supplied through the thresholded_cell_data_file
file parameter. -
clustering
: cells phenotyped from the unsupervised clustering process or supplied through the clustered_cell_data_file
file parameter.
-
-
cell_type_column1
: Name of the column in thecell_file
containing the annotation of the first cell type or phenotype. -
cell_type1
: Name of the cell type or phenotype to cluster, must match one of the values of the firstcell_type_column
in thecell_file
. -
cell_file2
: File to read the cell annotations for the second cell type or phenotype from; it must be one of:-
identification
: annotated cells from the cell masking process or supplied through theannotated_cell_data_file
file parameter. -
thresholding
: cells phenotyped from the expression thresholding process or supplied through the thresholded_cell_data_file
file parameter. -
clustering
: cells phenotyped from the unsupervised clustering process or supplied through the clustered_cell_data_file
file parameter.
-
-
cell_type_column2
: Name of the column in thecell_file
containing the annotation of the second cell type or phenotype. -
cell_type2
: Name of the cell type or phenotype to cluster, must match one of the values of the secondcell_type_column
in thecell_file
.
-
-
cell_file
s:
One for each of the values of thecell_file
column of theheterotypic_interactions_metadata
file. It must have these columns:-
cell_type_column
: Name of the column in thecell_file
containing the annotation of the cell type or phenotype to cluster. -
Metadata_sample_name
: Sample name matching a value in thesample_metadata_file
file. -
CellName
: unique identifier for each cell. -
Location_Center_X
: X coordinate of the cell centroid in the image. -
Location_Center_Y
: Y coordinate of the cell centroid in the image.
-
Outputs:
The output of this process is saved at: $output_folder/Heterotypic_interactions/
- Files for individual combinations of cell types or phenotypes are stored at:
$output_folder/Heterotypic_interactions/cell_type1-cell_type2/cell_type1-cell_type2-distances.csv
- A total file collecting the annotations for all
cell_types
is stored at: $output_folder/Heterotypic_interactions/heterotypic_interactions.csv
These files contain the following columns:
-
Metadata_sample_name
: Sample name matching a value in thesample_metadata_file
file. -
CellName1
: Unique identifier for each cell. -
Location_Center_X1
: X coordinate of the cell centroid in the image. -
Location_Center_Y1
: Y coordinate of the cell centroid in the image. -
spatial_analysis_cell_type1
: Cell type or phenotype ofCellName1
. -
CellName2
: Unique identifier for each cell. -
Location_Center_X2
: X coordinate of the cell centroid in the image. -
Location_Center_Y2
: Y coordinate of the cell centroid in the image. -
spatial_analysis_cell_type2
: Cell type or phenotype ofCellName2
. -
distance
= Euclidean distance between: CellName1
(Location_Center_X1
,Location_Center_Y1
) and CellName2
(Location_Center_X2
,Location_Center_Y2
). The distance is measured in pixels.
This process can be skipped by setting the skip_heterotypic_interactions
parameter to true
.
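The minimum-distance measurement can be sketched in a few lines. This is only an illustration of the idea, assuming the cells of a single sample are given as hypothetical `(CellName, Location_Center_X, Location_Center_Y)` tuples; it is not the code used by SIMPLI.

```python
from math import hypot


def min_distances(cells1, cells2):
    """For each cell of the first type, find the nearest cell of the second type.

    Returns rows of (CellName1, CellName2, distance), with distance in pixels.
    """
    rows = []
    for name1, x1, y1 in cells1:
        # Nearest cell of the second type by Euclidean distance.
        name2, x2, y2 = min(cells2, key=lambda c: hypot(x1 - c[1], y1 - c[2]))
        rows.append((name1, name2, hypot(x1 - x2, y1 - y2)))
    return rows


# Hypothetical centroids for two cell types in one sample.
type1 = [("a", 0, 0), ("b", 10, 0)]
type2 = [("c", 3, 4), ("d", 10, 1)]
for row in min_distances(type1, type2):
    print(row)
# ('a', 'c', 5.0)
# ('b', 'd', 1.0)
```

The output has one row per cell of the first type, which matches the layout of the distance table described above. The measurement is not symmetric: swapping the two cell types generally gives a different table.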
This process plots the results of the heterotypic distance analysis performed by the heterotypic spatial analysis process. The input annotated cell-cell distance data for this process can be derived from:
- Distance data generated in the heterotypic spatial analysis process.
- Distance data supplied with the
heterotypic_interactions_file
parameter if the heterotypic spatial analysis process is skipped.
Inputs and parameters:
-
sample_metadata_file
with the metadata of all samples used in the analysis. If the value of thecomparison
column for the sample is"NA"
, then no plotting is performed for this sample. -
heterotypic_interactions_metadata
metadata file with the following columns:-
cell_file1
: File to read the cell annotations for the first cell type or phenotype from; it must be one of:-
identification
: annotated cells from the cell masking process or supplied through theannotated_cell_data_file
file parameter. -
thresholding
: cells phenotyped from the expression thresholding process or supplied through the thresholded_cell_data_file
file parameter. -
clustering
: cells phenotyped from the unsupervised clustering process or supplied through the clustered_cell_data_file
file parameter.
-
-
cell_type_column1
: Name of the column in thecell_file
containing the annotation of the first cell type or phenotype. -
cell_type1
: Name of the cell type or phenotype to cluster, must match one of the values of the firstcell_type_column
in thecell_file
. -
cell_file2
: File to read the cell annotations for the second cell type or phenotype from; it must be one of:-
identification
: annotated cells from the cell masking process or supplied through theannotated_cell_data_file
file parameter. -
thresholding
: cells phenotyped from the expression thresholding process or supplied through the thresholded_cell_data_file
file parameter. -
clustering
: cells phenotyped from the unsupervised clustering process or supplied through the clustered_cell_data_file
file parameter.
-
-
cell_type_column2
: Name of the column in thecell_file
containing the annotation of the second cell type or phenotype. -
cell_type2
: Name of the cell type or phenotype to cluster, must match one of the values of the secondcell_type_column
in thecell_file
.
-
-
heterotypic_interactions_file
file with the following columns:-
Metadata_sample_name
: Sample name matching a value in thesample_metadata_file
file. -
CellName1
: Unique identifier for each cell. -
Location_Center_X1
: X coordinate of the cell centroid in the image. -
Location_Center_Y1
: Y coordinate of the cell centroid in the image. -
spatial_analysis_cell_type1
: Cell type or phenotype ofCellName1
. -
CellName2
: Unique identifier for each cell. -
Location_Center_X2
: X coordinate of the cell centroid in the image. -
Location_Center_Y2
: Y coordinate of the cell centroid in the image. -
spatial_analysis_cell_type2
: Cell type or phenotype ofCellName2
. -
distance
= Euclidean distance between: CellName1
(Location_Center_X1
,Location_Center_Y1
) and CellName2
(Location_Center_X2
,Location_Center_Y2
). The distance is measured in pixels.
-
Outputs:
The outputs of this process are stored at: $output_folder/Plots/Heterotypic_Interaction_Plots/Distance
.
For each spatial_analysis_cell_type1
-spatial_analysis_cell_type2
pair there is a folder $output_folder/Plots/Heterotypic_Interaction_Plots/Distance/spatial_analysis_cell_type1-spatial_analysis_cell_type2/
with the following plots:
-
spatial_analysis_cell_type1-spatial_analysis_cell_type2-all-heterotypic.pdf
: density plot with all the cells. -
spatial_analysis_cell_type1-spatial_analysis_cell_type2-by_category-heterotypic.pdf
(optional): density plot with the cells divided by sample category. Produced only if the comparison metadata column of the sample_metadata_file
has at least 2 (non"NA"
) categories.
This process can be skipped by setting the skip_heterotypic_visualization
parameter to true
.
This process generates a random distribution of the minimum distances between cells of the populations or phenotypes selected by the user. The distribution is generated by randomly reshuffling the labels of each cell. The input annotated cell data for this process can be derived from:
- Distance data generated in the heterotypic spatial analysis process.
- Distance data supplied with the
heterotypic_interactions_file
parameter if the heterotypic spatial analysis process is skipped.
Inputs and parameters:
-
heterotypic_interactions_metadata
metadata file with the following columns:-
cell_file1
: File to read the cell annotations for the first cell type or phenotype from; it must be one of:-
identification
: annotated cells from the cell masking process or supplied through theannotated_cell_data_file
file parameter. -
thresholding
: cells phenotyped from the expression thresholding process or supplied through the thresholded_cell_data_file
file parameter. -
clustering
: cells phenotyped from the unsupervised clustering process or supplied through the clustered_cell_data_file
file parameter.
-
-
cell_type_column1
: Name of the column in thecell_file
containing the annotation of the first cell type or phenotype. -
cell_type1
: Name of the cell type or phenotype to cluster, must match one of the values of the firstcell_type_column
in thecell_file
. -
cell_file2
: File to read the cell annotations for the second cell type or phenotype from; it must be one of:-
identification
: annotated cells from the cell masking process or supplied through theannotated_cell_data_file
file parameter. -
thresholding
: cells phenotyped from the expression thresholding process or supplied through the thresholded_cell_data_file
file parameter. -
clustering
: cells phenotyped from the unsupervised clustering process or supplied through the clustered_cell_data_file
file parameter.
-
-
cell_type_column2
: Name of the column in thecell_file
containing the annotation of the second cell type or phenotype. -
cell_type2
: Name of the cell type or phenotype to cluster, must match one of the values of the secondcell_type_column
in thecell_file
.
-
-
heterotypic_interactions_file
-
Metadata_sample_name
: Sample name matching a value in thesample_metadata_file
file. -
CellName1
: Unique identifier for each cell. -
Location_Center_X1
: X coordinate of the cell centroid in the image. -
Location_Center_Y1
: Y coordinate of the cell centroid in the image. -
spatial_analysis_cell_type1
: Cell type or phenotype ofCellName1
. -
CellName2
: Unique identifier for each cell. -
Location_Center_X2
: X coordinate of the cell centroid in the image. -
Location_Center_Y2
: Y coordinate of the cell centroid in the image. -
spatial_analysis_cell_type2
: Cell type or phenotype ofCellName2
. -
distance
= Euclidean distance between: CellName1
(Location_Center_X1
,Location_Center_Y1
) and CellName2
(Location_Center_X2
,Location_Center_Y2
). The distance is measured in pixels.
-
-
permutations
= Number of permutations to perform (values > 10000 are recommended).
Outputs:
The output of this process is saved at: $output_folder/Heterotypic_interactions/
A total file collecting the annotations for all cell_types
is stored at: $output_folder/Heterotypic_interactions/permuted_interactions.csv
This file contains the following columns:
- permutation = Current round of permutation
-
Metadata_sample_name
: Sample name matching a value in thesample_metadata_file
file. -
CellName1
: Unique identifier for each cell. -
Location_Center_X1
: X coordinate of the cell centroid in the image. -
Location_Center_Y1
: Y coordinate of the cell centroid in the image. -
spatial_analysis_cell_type1
: Cell type or phenotype ofCellName1
. -
CellName2
: Unique identifier for each cell. -
Location_Center_X2
: X coordinate of the cell centroid in the image. -
Location_Center_Y2
: Y coordinate of the cell centroid in the image. -
spatial_analysis_cell_type2
: Cell type or phenotype ofCellName2
. -
distance
= Euclidean distance between: CellName1
(Location_Center_X1
,Location_Center_Y1
) and CellName2
(Location_Center_X2
,Location_Center_Y2
). The distance is measured in pixels.
This process can be skipped by setting the skip_permuted_interactions
parameter to true
.
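The label-reshuffling idea can be sketched as below. This is an illustration under stated assumptions, not SIMPLI's implementation: cells are hypothetical `(CellName, x, y, label)` tuples from one sample, labels are shuffled across all cells of that sample, and the random seed is fixed only to make the example repeatable.

```python
import random
from math import hypot


def permuted_min_distances(cells, type1, type2, permutations, seed=0):
    """Return rows of (permutation, CellName1, distance) under shuffled labels."""
    rng = random.Random(seed)
    labels = [cell[3] for cell in cells]
    rows = []
    for round_number in range(1, permutations + 1):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # positions stay fixed, labels are reassigned
        first = [c for c, lab in zip(cells, shuffled) if lab == type1]
        second = [c for c, lab in zip(cells, shuffled) if lab == type2]
        for name1, x1, y1, _ in first:
            # Minimum distance to any cell carrying the second label.
            distance = min(hypot(x1 - x2, y1 - y2) for _, x2, y2, _ in second)
            rows.append((round_number, name1, distance))
    return rows


# Hypothetical sample with three "T" cells and two "B" cells.
cells = [
    ("c1", 0, 0, "T"), ("c2", 1, 0, "T"), ("c3", 0, 2, "T"),
    ("c4", 5, 5, "B"), ("c5", 6, 1, "B"),
]
rows = permuted_min_distances(cells, "T", "B", permutations=4, seed=1)
print(len(rows))  # 12 rows: 4 permutations x 3 cells labelled "T"
```

Because shuffling preserves the number of cells per label, every permutation contributes the same number of rows, and the pooled distances form the expected distribution that the observed minimum distances are compared against.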
This process plots the results of the permutation test performed by the heterotypic analysis permutation test process. The input annotated cell-cell distance data for this process can be derived from:
- Distance data generated in the heterotypic spatial analysis process.
- Distance data supplied with the
heterotypic_interactions_file
parameter if the heterotypic spatial analysis process is skipped.
The input annotated cell data for this process can be derived from:
- Permuted distance data generated in the heterotypic analysis permutation test process.
- Permuted distance data supplied with the
shuffled_interactions_file
parameter if the heterotypic analysis permutation test process is skipped.
Inputs and parameters:
-
sample_metadata_file
with the metadata of all samples used in the analysis. If the value of thecomparison
column for the sample is"NA"
, then no plotting is performed for this sample. -
heterotypic_interactions_metadata
metadata file with the following columns:-
cell_file1
: File to read the cell annotations for the first cell type or phenotype from; it must be one of:-
identification
: annotated cells from the cell masking process or supplied through theannotated_cell_data_file
file parameter. -
thresholding
: cells phenotyped from the expression thresholding process or supplied through the thresholded_cell_data_file
file parameter. -
clustering
: cells phenotyped from the unsupervised clustering process or supplied through the clustered_cell_data_file
file parameter.
-
-
cell_type_column1
: Name of the column in thecell_file
containing the annotation of the first cell type or phenotype. -
cell_type1
: Name of the cell type or phenotype to cluster, must match one of the values of the firstcell_type_column
in thecell_file
. -
cell_file2
: File to read the cell annotations for the second cell type or phenotype from; it must be one of:-
identification
: annotated cells from the cell masking process or supplied through theannotated_cell_data_file
file parameter. -
thresholding
: cells phenotyped from the expression thresholding process or supplied through the thresholded_cell_data_file
file parameter. -
clustering
: cells phenotyped from the unsupervised clustering process or supplied through the clustered_cell_data_file
file parameter.
-
-
cell_type_column2
: Name of the column in thecell_file
containing the annotation of the second cell type or phenotype. -
cell_type2
: Name of the cell type or phenotype to cluster, must match one of the values of the secondcell_type_column
in thecell_file
.
-
-
heterotypic_interactions_file
file with the following columns:-
Metadata_sample_name
: Sample name matching a value in thesample_metadata_file
file. -
CellName1
: Unique identifier for each cell. -
Location_Center_X1
: X coordinate of the cell centroid in the image. -
Location_Center_Y1
: Y coordinate of the cell centroid in the image. -
spatial_analysis_cell_type1
: Cell type or phenotype ofCellName1
. -
CellName2
: Unique identifier for each cell. -
Location_Center_X2
: X coordinate of the cell centroid in the image. -
Location_Center_Y2
: Y coordinate of the cell centroid in the image. -
spatial_analysis_cell_type2
: Cell type or phenotype ofCellName2
. -
distance
= Euclidean distance between: CellName1
(Location_Center_X1
,Location_Center_Y1
) and CellName2
(Location_Center_X2
,Location_Center_Y2
). The distance is measured in pixels.
-
-
shuffled_interactions_file
file with the following columns:- permutation = Current round of permutation
-
Metadata_sample_name
: Sample name matching a value in thesample_metadata_file
file. -
CellName1
: Unique identifier for each cell. -
Location_Center_X1
: X coordinate of the cell centroid in the image. -
Location_Center_Y1
: Y coordinate of the cell centroid in the image. -
spatial_analysis_cell_type1
: Cell type or phenotype ofCellName1
. -
CellName2
: Unique identifier for each cell. -
Location_Center_X2
: X coordinate of the cell centroid in the image. -
Location_Center_Y2
: Y coordinate of the cell centroid in the image. -
spatial_analysis_cell_type2
: Cell type or phenotype ofCellName2
. -
distance
= Euclidean distance between: CellName1
(Location_Center_X1
,Location_Center_Y1
) and CellName2
(Location_Center_X2
,Location_Center_Y2
). The distance is measured in pixels.
Outputs:
The outputs of this process are stored at: $output_folder/Plots/Heterotypic_Interaction_Plots/Permutations
.
For each spatial_analysis_cell_type1
-spatial_analysis_cell_type2
pair there is a folder $output_folder/Plots/Heterotypic_Interaction_Plots/Permutations/spatial_analysis_cell_type1-spatial_analysis_cell_type2/
with the following plots:
-
spatial_analysis_cell_type1-spatial_analysis_cell_type2-all-heterotypic_permutations.pdf
: density plot with the expected distribution of the minimum distances between the two cell types in all samples (nonNA
in thesample_metadata_file
). -
spatial_analysis_cell_type1-spatial_analysis_cell_type2-category-heterotypic_permutations.pdf
(optional): density plot with the expected distribution of the minimum distances between the two cell types in all samples (nonNA
in thesample_metadata_file
) divided by sample category. If the comparison metadata column of thesample_metadata_file
has at least 2 (non"NA"
) categories. -
spatial_analysis_cell_type1-spatial_analysis_cell_type2-category-heterotypic_permutations.pdf
(optional): density plot with the expected distribution of the minimum distances between the two cell types in all samples (non NA
in thesample_metadata_file
) in the first category minus the second category. If the comparison metadata column of thesample_metadata_file
has exactly 2 (non "NA"
) categories.
The FDR is calculated with the correction across all spatial_analysis_cell_type1
spatial_analysis_cell_type2
combinations for each set of plots.
This process can be skipped by setting the skip_permuted_visualization
parameter to true
.