dtborders/JHU_ProjectX_2020

Data-Driven Bacteriophage-Host Association

Abstract

Climate change is contributing to a sizable increase in antimicrobial-resistant bacteria, fungi, viruses, and parasites. Bacteriophages are an alternative treatment for resistant bacteria, and they have been used to treat infections since the early 1900s. However, there are too many bacteriophages to determine each one's host experimentally. Sequencing phages, on the other hand, is cheap and fast. Here, we present a new dataset for developing computational algorithms that match phages to hosts. The dataset contains 4,827 phage-host interaction pairs with complete genomes, gene annotations, and protein sequences for both phages and hosts. We review historical algorithms that have shown success and outline strategies for developing new ones. Using features we extract from the data, a random forest classifier achieves 94% test accuracy in predicting whether a phage infects a given host.
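The abstract does not say which features are extracted. As a purely illustrative sketch, the snippet below builds one kind of feature often used for sequence data: normalized k-mer frequencies for a phage and a host, combined into a per-pair difference vector that a classifier such as a random forest could consume. The function names `kmer_frequencies` and `pair_features`, the choice of k, and the toy sequences are all assumptions, not the project's actual pipeline.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=2, alphabet="ACGT"):
    """Return normalized counts of every k-mer over `alphabet`, in a fixed order."""
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all possible k-mers so the vector length is always len(alphabet) ** k.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Windows containing characters outside the alphabet (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def pair_features(phage_seq, host_seq, k=2):
    """Hypothetical pair feature: absolute difference of the two k-mer profiles."""
    phage_profile = kmer_frequencies(phage_seq, k)
    host_profile = kmer_frequencies(host_seq, k)
    return [abs(a - b) for a, b in zip(phage_profile, host_profile)]


if __name__ == "__main__":
    # Toy sequences for illustration only; real genomes come from the dataset.
    features = pair_features("ACGTACGTTAGC", "GGCATTACGGAT", k=2)
    print(len(features))
```

Each phage-host pair then maps to a fixed-length vector (16 values for k=2), which is the shape of input a random forest classifier expects.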

Running the code

Run the notebook jupyter_notebooks/Processing.ipynb with the data in the "newprotein" directory, available at this link: https://drive.google.com/drive/folders/16ofXFoms7HcS5vhn4yjexLRz_zRzJ2US?usp=sharing. Unzip the 4 zip files in that directory, then set the corresponding directory paths in the notebook cells to the locations where you extracted them.
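The unzip step can be scripted. The sketch below assumes the zip files have already been downloaded into a local folder named `newprotein`; it extracts each archive into its own subfolder and prints the resulting paths so they can be copied into the notebook cells. The helper name `extract_archives` and the output folder `newprotein_extracted` are hypothetical choices, not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_archives(download_dir, output_dir):
    """Extract every .zip in download_dir into its own subfolder of output_dir."""
    download_dir = Path(download_dir)
    output_dir = Path(output_dir)
    extracted = {}
    for archive in sorted(download_dir.glob("*.zip")):
        # One subfolder per archive, named after the zip file without its extension.
        target = output_dir / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted[archive.name] = target
    return extracted


if __name__ == "__main__":
    # Assumed local folder names; adjust to wherever the download was saved.
    paths = extract_archives("newprotein", "newprotein_extracted")
    for name, target in paths.items():
        print(f"{name} -> {target.resolve()}")
```

The printed absolute paths are the values to paste into the notebook's directory variables.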

About

JHU 2020 project x team
