This repository contains the code for classifying MGUS and MM and for identifying pivotal biomarkers that help distinguish MGUS from MM using an AI-based workflow.
|----- BDL_SP_Model_results
|      |----- Supplementory_File_1_Significant_Genes.xlsx
|      |----- Supplementory_File_2_significant_pathways_MM_MGUS.xlsx
|      |----- Supplementory_File_3_SHAP_Analysis_Beeswarm_plot.xlsx
|      |----- Supplementory_File_4_combinedSHAPRanking.xlsx
|      |----- Supplementory_File_5_Sanky_Diagrams.docx
|      |----- Supplementory_File_6_Graph_Convolutional_Network.docx
|      |----- Supplementory_File_7_Pseudo_codes_best_shap_score_estimation.docx
|----- LICENSE
|----- figures
|      |----- bdl-sp-architecture_v6.jpg
|----- src
|      |----- bdl-sp-top-feature-extraction.py
|----- Notebooks
|      |----- BDL_SP_SHAP_Analysis.ipynb
|      |----- samplewise_shap_analysis.ipynb
|      |----- shap_individual_feature_plot.ipynb
|----- README.md
|----- requirements.txt
The ML code has currently been tested only on Linux.
System Requirements:
• 64-bit system, 8 GB RAM
• OS version used for this pipeline: Ubuntu 18.04.
All prerequisites are listed in requirements.txt.
- Started with BAM files from WES data.
- Generated VCF files using four variant callers: MuSE, Mutect2, Somatic-Sniper, and VarScan2.
- Annotated the above VCF files using ANNOVAR.
- Identified significantly mutated genes (SNVs) from the above VCF files using dndscv (.csv).
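The last step above yields a .csv of per-gene results from dndscv. As a rough illustration of how such a table could be reduced to a list of significant genes, here is a minimal sketch using only the Python standard library. The column names `gene_name` and `qglobal_cv` follow dndscv's usual `sel_cv` output, but whether the .csv used in this pipeline keeps those names is an assumption. The 0.1 q-value cutoff and the `significant_genes` helper are also hypothetical and are not taken from this repository.

```python
import csv
import io


def significant_genes(csv_text, q_column="qglobal_cv",
                      gene_column="gene_name", q_threshold=0.1):
    """Return gene names with q-value below the threshold, sorted by q-value.

    Column names and the threshold are assumptions; adjust them to the
    actual dndscv export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row in reader:
        try:
            q = float(row[q_column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric q-value
        if q < q_threshold:
            hits.append((q, row[gene_column]))
    return [gene for _, gene in sorted(hits)]


# Tiny made-up table for illustration only (not real results).
example = """gene_name,qglobal_cv
GENE_A,0.001
GENE_B,0.5
GENE_C,0.04
GENE_D,NA
"""

print(significant_genes(example))  # ['GENE_A', 'GENE_C']
```

For a real file, the text can be read with `open(path, newline="").read()` and passed to the same function.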
To train the model, follow these steps in order:
• Obtain the annotated VCF files and the significantly mutated genes for MGUS and MM.
• Run bdl-sp-top-feature-extraction.py to build and train the cost-sensitive BDL-SP model using 5-fold cross-validation.
• Once you have the trained BDL-SP model, open BDL_SP_SHAP_Analysis.ipynb, samplewise_shap_analysis.ipynb, and shap_individual_feature_plot.ipynb for group-level and sample-level post-hoc model explainability using the SHAP algorithm.
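The training step above combines cost-sensitive learning with 5-fold cross-validation. The sketch below shows one common way these two ideas fit together: stratified folds that preserve the MGUS/MM ratio, plus inverse-frequency class weights computed on each training split. It is a standard-library illustration under stated assumptions, not the code in bdl-sp-top-feature-extraction.py. The helper names, the "balanced" weighting formula, and the toy label counts are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class ratios per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    for i in range(n_splits):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Toy, made-up labels: the minority class gets the larger weight.
labels = ["MM"] * 20 + ["MGUS"] * 10
print(balanced_class_weights(labels))  # {'MM': 0.75, 'MGUS': 1.5}

for fold, (train_idx, test_idx) in enumerate(stratified_kfold(labels), start=1):
    weights = balanced_class_weights([labels[i] for i in train_idx])
    print(fold, len(train_idx), len(test_idx), weights)
```

Computing the weights on the training split only, rather than on the full dataset, keeps the held-out fold from influencing the loss weighting.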
- If you use BDL-SP for your research, please cite the following paper:
Ruhela, V., Jena, L., Kaur, G., Gupta, R. and Gupta, A., 2023. BDL-SP: A Bio-inspired DL model for the identification of altered Signaling Pathways in Multiple Myeloma using WES data. American Journal of Cancer Research, 13(4), p.1155.
See the LICENSE file for license rights and limitations (Apache 2.0).
- The authors gratefully acknowledge the grants from the Department of Biotechnology, Govt. of India [Grant: BT/MED/30/SP11006/2015] and the Department of Science and Technology, Govt. of India [Grant: DST/ICPS/CPS-Individual/2018/279(G)].
- The authors gratefully acknowledge the support of SBILab, Dept. of ECE & Centre of Excellence in Healthcare, Indraprastha Institute of Information Technology-Delhi (IIIT-D), India, for providing guidance on tool methodology and development.
- The authors gratefully acknowledge the support of the Computational Biology Dept., Indraprastha Institute of Information Technology-Delhi (IIIT-D), India, for providing resources for tool development.