🔥 INFRNO: Interpretable framework for uncovering interaction opportunities in macromolecules

Samantha Stuart, Jeffrey Watchorn, Frank Gu

Institute of Biomedical Engineering, University of Toronto, Toronto, Ontario, Canada

Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada


This formal analysis repository accompanies the work: An Interpretable Machine Learning Framework for Modelling Macromolecular Interaction Mechanisms with Nuclear Magnetic Resonance.

In this work, we developed a framework for modelling the interactions that arise between large-molecule systems to inform biomaterial design. In addition to modelling structure-activity relationships, the framework identifies "undervalued" ligand sites as engineering design opportunities to unlock receptor interaction. The input data and feature descriptors are obtained from experimental screening with DISCO-NMR. Any receptor-ligand interaction dataset generated from DISCO-NMR screening 🕺 can be analyzed equivalently with INFRNO 🔥.


Using INFRNO, we can:

  • Model Atomic-Level Macromolecular Interaction Trends: We apply linear principal component analysis to DISCO NMR data descriptors and labels, and train a binary decision tree classifier to construct proton structure-interaction trends across ligand chemical species (a minimal sketch of this step follows the list).

  • Identify Opportunities for Designed Interaction: Inert-labeled protons bordering cross-species decision regions indicate opportunities for physical property tuning towards interaction without additional chemical functionalization.

  • Create a Runway to Interaction Prediction: The decision tree for a given receptor can be re-trained to "grow" as increasingly diverse ligands are screened, while informing ligand design with data-driven insights along the way.
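
The modelling step can be sketched with scikit-learn as below. This is an illustrative sketch only, not the repository's exact pipeline: the random data stands in for the proton-level descriptors and interaction labels produced by DISCO NMR feature generation, and the component count and tree depth are placeholder values.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier, export_text

# Placeholder data standing in for DISCO NMR proton descriptors and binary
# interaction labels (in the repository these come from
# notebooks/utils/feature_generation.py and data/raw).
rng = np.random.default_rng(148)
X = rng.normal(size=(200, 8))
y = rng.integers(0, 2, size=200)

# Project standardized descriptors onto principal components, then fit a
# shallow, interpretable binary decision tree on the component scores.
scores = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X))
tree = DecisionTreeClassifier(max_depth=3, random_state=148).fit(scores, y)
print(export_text(tree, feature_names=["PC1", "PC2", "PC3"]))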

Quick Start on Google Colab:

To get quick intuition for the framework, we provide a tutorial in Google Colab that can be run without any local environment setup.

The input dataset to upload to the Colab notebook can be downloaded from this repository at data/raw/proton_binding_dataset.xlsx
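
To take a quick look at the dataset locally before uploading it, a minimal loading sketch (assuming pandas and openpyxl are installed; the columns printed are simply whatever the spreadsheet contains):

import pandas as pd

# Load the raw DISCO NMR proton binding dataset shipped with the repository
# and preview its shape and first rows.
df = pd.read_excel("data/raw/proton_binding_dataset.xlsx")
print(df.shape)
print(df.head())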


Project Organization

├── LICENSE
├── README.md          <- The top-level README for this project.
├── data
│   ├── processed      <- The benchmarking result files output from scripts
│   └── raw            <- The training dataset
│
├── notebooks          <- Notebooks and scripts for formal analysis
│   ├── benchmark_CDEpipe.py         <- Benchmarking script for cumulative  
│   │                                   disco effect pipeline
│   ├── benchmark_maxsteadyslope.py  <- Benchmarking script for curve attribute 
│   │                                   pipeline
│   ├── benchmark_meandiscoeff.py    <- Benchmarking script for mean disco effect 
│   │                                   pipeline
│   ├── benchmark_chemonly.py        <- Benchmarking script for pipeline without 
│   │                                   disco effect
│   ├── benchmarking_analysis.ipynb  <- Global pipeline benchmarking analysis (SI)
│   ├── final_model_paper_CDE_rs148.ipynb  <- Formal analysis and figure generation
│   └── utils                        <- Utility functions
│       └── feature_generation.py    <- DISCO NMR feature generation script
│
├── figures           
│   ├── main           <- Main formal analysis figures
│   ├── misc           <- Misc. figure files
│   └── supplementary  <- SI figures
│
└── requirements.txt   <- The requirements for the analysis environment

Setup to run the code locally:

1. Clone or download this GitHub repository:

Do one of the following:

  • Clone this repository to a directory of your choice on your computer using the command line or GitHub Desktop.

  • Download the ZIP archive of the repository, then move and extract it in a directory of your choice on your computer.

2. Install dependencies using Anaconda or Pip

Instructions for installing dependencies via Anaconda:

  1. Download and install Anaconda

  2. Navigate to the project directory

  3. Open Anaconda prompt in this directory (or Terminal)

  4. Run the following command from the Anaconda prompt (or Terminal) to automatically create an environment from the requirements.txt file: $ conda create --name infrno --file requirements.txt

  5. Run the following command to activate the environment: $ conda activate infrno

  6. You are now ready to open and run files in the repository in a code editor of your choice that uses your virtual environment (e.g., VS Code)

For detailed information about creating, managing, and working with Conda environments, please see the corresponding help page.

Instructions for installing dependencies with pip

If you prefer to manage your packages using pip, navigate in Terminal to the project directory and run the command below to install the prerequisite packages into your virtual environment:

$ pip install -r requirements.txt

With either install option, you may need to create an additional Jupyter Notebook kernel containing your virtual environment, if it does not automatically appear. See this guide for more information.
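
For example, assuming ipykernel is installed in the environment, a kernel can typically be registered with:

$ python -m ipykernel install --user --name infrno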


3. Run the model

  1. Navigate to the notebook notebooks/final_model_paper_CDE_rs148.ipynb (an example launch command is shown after this list)

  2. Execute all cells sequentially
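
For example, assuming Jupyter is installed in your environment, the notebook can be launched from the project root with:

$ jupyter notebook notebooks/final_model_paper_CDE_rs148.ipynb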

4. Run the benchmarking

  1. Execute each benchmarking script in the notebooks directory (example commands are shown after this list):

    • benchmark_CDEpipe.py
    • benchmark_chemonly.py
    • benchmark_maxsteadyslope.py
    • benchmark_meandiscoeff.py
  2. Open notebooks/benchmarking_analysis.ipynb

  3. Execute all cells sequentially to compare pipelines
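
For example, assuming the environment is active and the scripts are run from the notebooks directory:

$ python benchmark_CDEpipe.py
$ python benchmark_chemonly.py
$ python benchmark_maxsteadyslope.py
$ python benchmark_meandiscoeff.py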


Re-using the repository with a new dataset

  1. Replace the training dataset in data/raw with any new DISCO NMR screening results named proton_binding_dataset.xlsx. The name proton_binding_dataset must be preserved to maintain compatibility with all file read operations in this repository.

  2. Open notebooks/final_model_paper_CDE_rs148.ipynb, re-run all cells for file reading, feature generation, and model generation until updated tree figures are generated and displayed in the console

    • Adjustment of the hyperparameter grid and random seed may be required for new datasets to yield the best tree (a minimal tuning and baseline sketch is shown after this list)
  3. To interpret the resulting tree decisions, customize the provided example figure generation cells and proton average properties according to the updated high-importance principal components and decision rules

  4. Where cross-polymer decision rules result, examine the identities of inert protons near the interactive border as "hypotheses" for physical property tuning towards achieving interaction

  5. If desired, execute the benchmark_CDEpipe.py script to evaluate the out-of-sample error of the updated model on the updated dataset

    • Note that if the hyperparameter grid and random seeds have been altered, the benchmarking script should be adjusted to reflect those updates
    • The majority classifier baseline F1 score should also be recomputed for the new dataset (see the sketch after this list)
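
The re-tuning and baseline updates in the two notes above can be sketched as follows. This is an illustrative sketch on synthetic data with a hypothetical grid; the repository's actual grid, random seeds, and scoring live in the benchmarking scripts and should be edited there.

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder principal component scores and interaction labels standing in
# for the features generated from a new DISCO NMR dataset.
rng = np.random.default_rng(148)
X = rng.normal(size=(200, 3))
y = rng.integers(0, 2, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=148)

# Hypothetical hyperparameter grid: widen or shift it until the tuned tree
# is both accurate and shallow enough to interpret.
grid = {"max_depth": [2, 3, 4, 5], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=148), grid, scoring="f1", cv=5)
search.fit(X_train, y_train)
print("tuned tree F1:", f1_score(y_test, search.predict(X_test)))

# Majority-class baseline F1 to compare the tuned model against.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("majority baseline F1:", f1_score(y_test, baseline.predict(X_test)))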

How to cite

@article{TBD,
  title={An Interpretable Machine Learning Framework for Modelling Macromolecular Interaction Mechanisms with Nuclear Magnetic Resonance},
  author={Stuart, Samantha and Watchorn, Jeffrey and Gu, Frank},
  journal={TBD},
  year={2022},
  publisher={TBD}
}

License

MIT License
