This repository contains code related to the pub "The known protein universe is phylogenetically biased".
This repository uses conda to manage software environments and installations. You can find operating system-specific instructions for installing miniconda here. After installing conda and mamba, run the following commands to create and activate the pipeline run environment.
mamba env create -n protein_universe --file envs/dev.yml
conda activate protein_universe
The repository is organized into the following top-level directories.
- code: R scripts used for downloading and cleaning data, performing analysis, and generating figures presented in the pub.
- data: .RDS files used in analyses.
- envs: YAML file including the packages and dependencies used for creating the conda environment.
─ code
├── README.md
├── protein-universe-analysis.R
├── protein-universe-data.R
└── protein-universe-utils.R
─ data
├── README.md
├── afdb_cluster_stats.RDS
├── afdb_cluster_taxonomy.RDS
├── afdb_genome_size_stats.RDS
├── pdb_metadata.RDS
├── pdb_taxonomy.RDS
├── timetree_phylogeny_cleaned.RDS
└── timetree_taxonomy.RDS
─ envs
├── dev.yml
└── install_r_packages.R
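Before running anything, it can help to confirm that a checkout matches the layout above. The following is a minimal sketch rather than part of the repository: `check_layout` is a hypothetical helper name, and the file list is copied from the tree shown here.

```shell
#!/bin/sh
# Sketch: list any file from the tree above that is absent under a given root.
# `check_layout` is a hypothetical helper name, not part of the repository.
check_layout() (
  root="${1:-.}"   # directory to check; defaults to the current directory
  missing=0
  for path in \
    code/README.md \
    code/protein-universe-analysis.R \
    code/protein-universe-data.R \
    code/protein-universe-utils.R \
    data/README.md \
    data/afdb_cluster_stats.RDS \
    data/afdb_cluster_taxonomy.RDS \
    data/afdb_genome_size_stats.RDS \
    data/pdb_metadata.RDS \
    data/pdb_taxonomy.RDS \
    data/timetree_phylogeny_cleaned.RDS \
    data/timetree_taxonomy.RDS \
    envs/dev.yml \
    envs/install_r_packages.R
  do
    if [ ! -f "$root/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  # Exit status of the function: 0 if every file was found, 1 otherwise.
  [ "$missing" -eq 0 ]
)
```

Calling `check_layout .` from the repository root prints one line per missing file and returns a non-zero status if any are absent.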
- Download, clean, and organize data using protein-universe-data.R.
- Load supporting functions using protein-universe-utils.R.
- Run analyses using protein-universe-analysis.R.
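The steps above could be chained from the command line roughly as follows. This is a sketch under assumptions this README does not confirm: that the scripts are run from the repository root, that they can be sourced non-interactively, and that sourcing all three in a single R session in the listed order is sufficient. `build_r_expr`, `run_workflow`, and the `DRY_RUN` variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: source the three scripts in one R session, in the order listed above.
# Assumptions (not confirmed by the README): the scripts are run from the
# repository root and can be sourced non-interactively.

# Build a single R expression that sources each script in order.
# printf reuses the format string once per argument.
build_r_expr() {
  printf 'source("code/%s"); ' \
    protein-universe-data.R \
    protein-universe-utils.R \
    protein-universe-analysis.R
}

# Run the workflow with Rscript.
# Set DRY_RUN=1 to print the command instead of executing it.
run_workflow() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "Rscript -e '$(build_r_expr)'"
  else
    Rscript -e "$(build_r_expr)"
  fi
}
```

With the `protein_universe` environment active, `DRY_RUN=1 run_workflow` shows the command that would be executed, and `run_workflow` executes it. Sourcing everything in one session keeps the functions loaded from protein-universe-utils.R available to the analysis step.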
All analyses were done on an Apple MacBook Pro running macOS Monterey with 32 GB RAM, 10 cores, and 1 TB of storage.
See how we recognize feedback and contributions to our code.