4. Interpret Model

In this module, we interpret the ML models.

After training the final and baseline models in 2.train_model, we load the coefficients of these models from models/. We interpret these coefficients with the following plots:

  • We use seaborn.heatmap to display the coefficient values for each phenotypic class/feature.
  • We use seaborn.clustermap to display a hierarchically-clustered heatmap of coefficient values for each phenotypic class/feature.
  • We use seaborn.kdeplot to display a density plot of coefficient values for each phenotypic class.
  • We use seaborn.barplot to display a bar plot of average coefficient values per phenotypic class.

In model_coefficient_correlations.ipynb, we compare the coefficients from the multi-class and single-class models. The coefficient matrix of a multi-class model has shape (# phenotypic classes, # features), while the coefficient matrix of a single-class model has shape (1, # features). Thus, for each phenotypic class, we can compare the corresponding row of the multi-class matrix with the coefficient vector of that class's single-class model.

We graph these coefficient vectors in a scatterplot where each coordinate pair represents (multi-class model coefficient value, single-class model coefficient value) for a particular feature. For each pair of multi-class and single-class coefficient vectors, we compute the Pearson correlation coefficient with numpy.corrcoef to measure how linearly correlated the vectors are. We also compute the Clustermatch Correlation Coefficient (CCC) introduced in Pividori et al., 2022. This not-only-linear coefficient is based on machine learning models and indicates how strongly the feature coefficients are related (0 means no relationship and 1 means a perfect relationship). The correlations for each pair of coefficient vectors are displayed above their scatterplots.
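The per-class comparison can be sketched as follows. This is only an illustration on random data: the array names, the number of classes and features, and the pearson helper are assumptions made for the example, and the CCC computation is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_features = 3, 50

# Hypothetical multi-class coefficients: shape (# phenotypic classes, # features).
multi_class_coefs = rng.normal(size=(n_classes, n_features))

# Hypothetical single-class coefficients: one (1, # features) array per class,
# built here as noisy copies of the multi-class rows so they are correlated.
single_class_coefs = {
    i: multi_class_coefs[i : i + 1] + rng.normal(scale=0.5, size=(1, n_features))
    for i in range(n_classes)
}


def pearson(x, y):
    """Pearson correlation between two coefficient vectors."""
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation.
    return float(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])


# One correlation per phenotypic class: multi-class row vs. single-class vector.
correlations = {
    i: pearson(multi_class_coefs[i], single_class_coefs[i]) for i in range(n_classes)
}

for class_index, r in correlations.items():
    print(f"class {class_index}: Pearson r = {r:.3f}")
```

Flattening with np.ravel lets the helper accept both the (# features,) rows and the (1, # features) single-class arrays without reshaping at the call site.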

Results

Each model's interpretations are located in interpret_model_coefficients.ipynb.

Notes:

  1. Intermediate .tsv data are stored in tidy format, a standardized data structure (see Tidy Data by Hadley Wickham for more details).
  2. SCM stands for "single-class model(s)" and is used as an abbreviation for the binary, single-class models throughout this module.
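To make the tidy format in note 1 concrete, here is a small sketch that reshapes a wide coefficient table into one row per (class, feature) observation, writes it as tab-separated text, and pivots it back. The values and the column names (phenotypic_class, feature, coefficient) are hypothetical and may differ from the module's actual .tsv files.

```python
import io

import pandas as pd

# Wide layout: one row per class, one column per feature (made-up values).
wide = pd.DataFrame(
    {"feature_0": [0.2, -0.1], "feature_1": [0.0, 0.4]},
    index=pd.Index(["class_a", "class_b"], name="phenotypic_class"),
)

# Tidy layout: every row is a single observation, every column a single variable.
tidy = wide.reset_index().melt(
    id_vars="phenotypic_class", var_name="feature", value_name="coefficient"
)

# Round-trip through tab-separated text (an in-memory buffer stands in for a .tsv file).
buffer = io.StringIO()
tidy.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer, sep="\t")

# Pivot back to the wide layout when a matrix view is needed.
restored = reloaded.pivot(
    index="phenotypic_class", columns="feature", values="coefficient"
)
print(tidy)
```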

Step 1: Interpret Model

Use the commands below to interpret the ML models:

# Make sure you are located in 4.interpret_model
cd 4.interpret_model

# Activate phenotypic_profiling conda environment
conda activate phenotypic_profiling

# Interpret model
bash interpret_model.sh