This repository contains the code and data for the following work: Yifeng Tao, Haoyun Lei, Adrian V. Lee, Jian Ma, and Russell Schwartz. Phylogenies Derived from Matched Transcriptome Reveal the Evolution of Cell Populations and Temporal Order of Perturbed Pathways in Breast Cancer Brain Metastases. Proceedings of the International Symposium on Mathematical and Computational Oncology (ISMCO). 2019.
The code runs on Python 2.7.
- The following Python modules are required: os, random, numpy, pandas, pickle, scipy, sklearn, matplotlib, seaborn, cStringIO, collections (os, random, pickle, cStringIO and collections ship with the Python 2.7 standard library; the others must be installed).
- These additional Python packages are required in some experiments: statsmodels, networkx, skbio, Bio, PyTorch.
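The sketch below is one way to check which of these modules are importable in your environment. It assumes the usual import names, which differ from some distribution names: sklearn is installed as scikit-learn, skbio as scikit-bio, Bio as biopython, and PyTorch is imported as torch. The helper check_modules is hypothetical and not part of the repository.

```python
import importlib

# Import names of the modules listed above (assumption: PyTorch is imported as "torch").
REQUIRED = ["os", "random", "numpy", "pandas", "pickle", "scipy", "sklearn",
            "matplotlib", "seaborn", "cStringIO", "collections"]
OPTIONAL = ["statsmodels", "networkx", "skbio", "Bio", "torch"]


def check_modules(names):
    """Hypothetical helper: map each module name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for group, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, ok in sorted(check_modules(names).items()):
            print("%-9s %-12s %s" % (group, name, "ok" if ok else "MISSING"))
```

cStringIO exists only in Python 2, so it is reported as missing under Python 3.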
We will introduce the three-step pipeline below.
You can load the preprocessed and mapped data (df_modu) in a Python session with the following script:
from DataProcessor import DataProcessor
data_proc = DataProcessor()
df_modu, len_kegg = data_proc.load_modu_data()
As you can see, we use the DataProcessor class to preprocess and map the data. The returned df_modu is a pandas.DataFrame in which each row is a gene module/cancer pathway and each column is a sample.
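To make the orientation of df_modu concrete, the sketch below builds a small stand-in DataFrame with the same layout (rows are modules/pathways, columns are samples). The pathway names, sample names and values are made up for illustration; with the real data you would inspect df_modu directly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_modu: rows = gene modules / pathways, columns = samples.
toy_modu = pd.DataFrame(
    np.array([[0.8, 1.2, 0.5],
              [2.0, 1.5, 1.1]]),
    index=["pathway_A", "pathway_B"],
    columns=["sample_1", "sample_2", "sample_3"],
)

n_modules, n_samples = toy_modu.shape     # (2, 3)
one_sample = toy_modu["sample_2"]         # a column: one sample across all modules
one_pathway = toy_modu.loc["pathway_B"]   # a row: one pathway across all samples
bulk = toy_modu.values                    # modules x samples NumPy array

print(n_modules, n_samples)
print(one_pathway.to_dict())
```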
We use cross-validation to determine the appropriate number of cell communities/components for deconvolution, and then unmix the bulk data with the optimal number of components:
python run_nnd.py
where models.NND is called to perform neural network deconvolution (NND).
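As a rough, self-contained illustration of choosing the number of components by cross-validation, the sketch below swaps in plain nonnegative matrix factorization (weighted Lee-Seung multiplicative updates) for NND and scores each candidate number of components by its error on held-out matrix entries. It assumes the bulk matrix factors as B ≈ C F, with B (modules x samples), C (modules x components) and F (components x samples). The helpers masked_nmf and cv_error and all data are hypothetical; this is not the procedure implemented in run_nnd.py.

```python
import numpy as np


def masked_nmf(B, W, k, n_iter=500, seed=0, eps=1e-9):
    """Fit B ~= C @ F using only entries where the mask W is 1.

    Swapped-in technique: weighted NMF with multiplicative updates
    (Frobenius loss), not the repository's neural network deconvolution.
    """
    rng = np.random.RandomState(seed)
    m, n = B.shape
    C = rng.rand(m, k) + eps
    F = rng.rand(k, n) + eps
    WB = W * B
    for _ in range(n_iter):
        F *= C.T.dot(WB) / (C.T.dot(W * C.dot(F)) + eps)
        C *= WB.dot(F.T) / ((W * C.dot(F)).dot(F.T) + eps)
    return C, F


def cv_error(B, k, n_folds=5, seed=0):
    """Mean squared error on held-out entries, averaged over n_folds entry folds."""
    rng = np.random.RandomState(seed)
    fold_of_entry = rng.randint(n_folds, size=B.shape)
    errors = []
    for fold in range(n_folds):
        W = (fold_of_entry != fold).astype(float)   # 1 = observed, 0 = held out
        C, F = masked_nmf(B, W, k, seed=seed)
        held_out = W == 0
        errors.append(np.mean((B - C.dot(F))[held_out] ** 2))
    return float(np.mean(errors))


# Synthetic "bulk" matrix: 20 modules x 30 samples mixed from 3 components (made-up data).
rng = np.random.RandomState(1)
C_true = rng.rand(20, 3)
F_true = rng.dirichlet(np.ones(3), size=30).T       # 3 x 30, each column sums to 1
B = C_true.dot(F_true) + 0.01 * rng.rand(20, 30)

scores = {k: cv_error(B, k) for k in range(1, 6)}
best_k = min(scores, key=scores.get)

# Refit on all entries with the selected k. Rescaling F so each sample's column sums
# to 1 gives interpretable proportions, but C_hat @ F_frac no longer reproduces B.
C_hat, F_hat = masked_nmf(B, np.ones_like(B), best_k)
F_frac = F_hat / F_hat.sum(axis=0, keepdims=True)
print(best_k, scores)
```

Holding out individual entries (rather than whole samples) keeps the evaluation honest: if whole columns were held out and their fractions refit on the same columns, larger k would almost always look better.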
The result of cross-validation is available at data/ica/results_cv.pkl. The unmixed matrices are available at data/ica/BCF.pkl.
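After run_nnd.py finishes, a quick sanity check is to reload BCF.pkl and measure how well C F reproduces B. The sketch assumes (this is not stated above) that B is the modules x samples bulk matrix, C is modules x components and F is components x samples. The helpers load_bcf and check_bcf are hypothetical; when the file is missing, the script falls back to made-up data.

```python
import os
import pickle

import numpy as np


def load_bcf(path="data/ica/BCF.pkl"):
    """Hypothetical helper: load the dict with keys "B", "C", "F" written by run_nnd.py."""
    with open(path, "rb") as f:
        return pickle.load(f)


def check_bcf(bcf):
    """Report how well C @ F reproduces B (assumed shapes: B modules x samples,
    C modules x components, F components x samples)."""
    B, C, F = (np.asarray(bcf[key], dtype=float) for key in ("B", "C", "F"))
    rel_err = np.linalg.norm(B - C.dot(F)) / np.linalg.norm(B)
    return {"n_components": C.shape[1],
            "relative_error": float(rel_err),
            "fraction_column_sums": F.sum(axis=0)}


if os.path.exists("data/ica/BCF.pkl"):
    report = check_bcf(load_bcf())
else:
    # Synthetic stand-in with made-up values, used when the repository data is absent.
    rng = np.random.RandomState(0)
    C_demo = rng.rand(5, 2)
    F_demo = rng.dirichlet(np.ones(2), size=4).T
    report = check_bcf({"B": C_demo.dot(F_demo), "C": C_demo, "F": F_demo})
print(report)
```

Under Python 3, pickles written by Python 2 that contain NumPy arrays usually have to be loaded with pickle.load(f, encoding="latin1").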
Some cell components are missing in some patients. We can group patients by the pattern of components present in them:
import pickle
from DataProcessor import DataProcessor
from utils_analysis import component_portion, classify_patients, plot_phylo
# Load preprocessed data
data_proc = DataProcessor()
df_modu, len_kegg = data_proc.load_modu_data()
# Load deconvolved components and fraction matrix
with open("data/ica/BCF.pkl", "rb") as f:
    BCF = pickle.load(f)
B, C, F = BCF["B"], BCF["C"], BCF["F"]
# Index of the primary component/community
comp_p = component_portion(F, plot_mode=True)
# Aggregate different patterns of components in patients
list_patterns = classify_patients(F, threshold_0=2.5e-2)
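One plausible way to turn F into presence/absence patterns is to call a component present in a sample when its fraction reaches a threshold, and then group samples that share the same set of present components. This only illustrates the idea behind classify_patients(F, threshold_0=2.5e-2): the function group_by_presence is hypothetical, treats each column of F as one sample, and may differ from how the repository pools a patient's matched samples.

```python
from collections import defaultdict

import numpy as np


def group_by_presence(F, threshold=2.5e-2):
    """Hypothetical helper: group samples (columns of F) by which components
    reach `threshold`. Returns {tuple of present components: [sample indices]}."""
    F = np.asarray(F, dtype=float)
    F = F / F.sum(axis=0, keepdims=True)          # make each column a set of fractions
    groups = defaultdict(list)
    for j in range(F.shape[1]):
        present = tuple(int(i) for i in np.flatnonzero(F[:, j] >= threshold))
        groups[present].append(j)
    return dict(groups)


# Made-up fractions: 3 components x 4 samples.
F_demo = np.array([[0.70, 0.50, 0.99, 0.60],
                   [0.30, 0.49, 0.00, 0.40],
                   [0.00, 0.01, 0.01, 0.00]])
patterns = group_by_presence(F_demo)
for present, samples in sorted(patterns.items()):
    print(present, samples)
```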
Here, list_patterns contains four different patterns of phylogenies. To visualize a specific pattern, e.g., the first one, and print out the differentially perturbed pathways along the edges of its phylogeny, run:
pattern = list_patterns[0]
plot_phylo(C, F, list(df_modu.index), len_kegg, comp_p, pattern, threshold=0.05)
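To illustrate what differentially perturbed pathways along an edge can mean, the sketch below compares the pathway scores of a parent and a child component (two columns of C) and lists the pathways whose change exceeds a cutoff. The helper edge_changes, the plain-difference criterion and the data are assumptions for illustration; plot_phylo may use a different statistic, and its threshold=0.05 is not necessarily the same kind of cutoff.

```python
import numpy as np


def edge_changes(C, names, parent, child, threshold):
    """Hypothetical helper: pathways whose score changes by more than `threshold`
    from the parent component to the child component (columns of C),
    largest absolute changes first."""
    C = np.asarray(C, dtype=float)
    delta = C[:, child] - C[:, parent]
    hits = [(names[i], float(delta[i])) for i in np.flatnonzero(np.abs(delta) > threshold)]
    return sorted(hits, key=lambda item: -abs(item[1]))


# Made-up component matrix: 3 pathways x 2 components (0 = parent, 1 = child).
C_demo = np.array([[1.00, 1.02],
                   [0.50, 0.90],
                   [0.80, 0.20]])
names_demo = ["pathway_A", "pathway_B", "pathway_C"]
for name, change in edge_changes(C_demo, names_demo, parent=0, child=1, threshold=0.1):
    print("%s %+.2f" % (name, change))
```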
To reproduce the figures and tables of the paper, run:
python run_nnd.py
It plots Fig. 2b, Fig. A1 and Fig. A2 of the paper.
python analysis.py
It produces the following figures and tables of the paper, in this order: Fig. 3b, Table 1, Table A1, Fig. 3a, Fig. 3c, Fig. A3, Fig. 3d, Fig. 3e, Tables A2-A5.
The repository is released under the MIT license, so feel free to share or adapt the materials. If you find this work useful, please cite:
@inproceedings{tao2019brm,
title = {Phylogenies Derived from Matched Transcriptome Reveal the Evolution of Cell Populations and Temporal Order of Perturbed Pathways in Breast Cancer Brain Metastases},
author = {Tao, Yifeng and
Lei, Haoyun and
Lee, Adrian V. and
Ma, Jian and
Schwartz, Russell},
booktitle = {Proceedings of the International Symposium on Mathematical and Computational Oncology},
month = {Oct},
year = {2019},
}
You are welcome to reach out to us with any questions.
Contact: Yifeng Tao ([email protected]), Russell Schwartz ([email protected])