2024-protein-universe

Purpose

This repository contains code related to the pub "The known protein universe is phylogenetically biased".

Installation and Setup

This repository uses conda to manage software environments and installations. Operating system-specific instructions for installing miniconda are available in the miniconda documentation. After installing conda and mamba, run the following commands to create and activate the pipeline run environment.

mamba env create -n protein_universe --file envs/dev.yml
conda activate protein_universe

Data

Overview

Description of the folder structure

The repository is organized into the following top-level directories.

code: R scripts used for downloading and cleaning data, performing analysis, and generating figures presented in the pub.
data: .RDS files used in analyses.
envs: YAML file including the packages and dependencies used for creating the conda environment.

─ code
  ├── README.md
  ├── protein-universe-analysis.R
  ├── protein-universe-data.R
  └── protein-universe-utils.R
─ data
  ├── README.md
  ├── afdb_cluster_stats.RDS
  ├── afdb_cluster_taxonomy.RDS
  ├── afdb_genome_size_stats.RDS
  ├── pdb_metadata.RDS
  ├── pdb_taxonomy.RDS
  ├── timetree_phylogeny_cleaned.RDS
  └── timetree_taxonomy.RDS
─ envs
  ├── dev.yml
  └── install_r_packages.R
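
As a quick sanity check after cloning, the layout above can be verified with a small shell function. This is a minimal sketch, not part of the repository: the function name `check_layout` is hypothetical, and the file list is copied from the tree shown above.

```shell
# Hypothetical helper: report any file from the tree above that is missing
# from a checkout rooted at $1 (default: current directory).
# Returns 0 if everything is present, 1 otherwise.
check_layout() {
  root="${1:-.}"
  missing=0
  for rel in \
    code/README.md \
    code/protein-universe-analysis.R \
    code/protein-universe-data.R \
    code/protein-universe-utils.R \
    data/README.md \
    data/afdb_cluster_stats.RDS \
    data/afdb_cluster_taxonomy.RDS \
    data/afdb_genome_size_stats.RDS \
    data/pdb_metadata.RDS \
    data/pdb_taxonomy.RDS \
    data/timetree_phylogeny_cleaned.RDS \
    data/timetree_taxonomy.RDS \
    envs/dev.yml \
    envs/install_r_packages.R
  do
    if [ ! -f "$root/$rel" ]; then
      echo "missing: $rel"
      missing=1
    fi
  done
  return "$missing"
}
```

From the repository root, `check_layout . && echo "layout OK"` prints `layout OK` only when every listed file is present.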

Methods

Download, clean, and organize data using protein-universe-data.R.

Load supporting functions using protein-universe-utils.R.

Run analyses using protein-universe-analysis.R.
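
The three steps above could be chained in a single R session from the command line. This is a sketch under assumptions: the helper name `source_expr` is hypothetical, and it assumes the scripts can be sourced non-interactively from the repository root in the order listed. The README does not confirm this, and the data script may download large files.

```shell
# Hypothetical helper: print an R expression that sources the three scripts
# in the order the steps are listed above.
# $1 is the directory holding the scripts (default: code).
source_expr() {
  code_dir="${1:-code}"
  expr=""
  for script in \
    protein-universe-data.R \
    protein-universe-utils.R \
    protein-universe-analysis.R
  do
    expr="${expr}source('${code_dir}/${script}'); "
  done
  # Drop the trailing space before printing.
  printf '%s\n' "${expr% }"
}

# With the protein_universe environment active, run all steps in one R session:
# Rscript -e "$(source_expr code)"
```

Sourcing all three files in one session keeps the functions defined by the utils script available to the analysis script, which separate `Rscript` calls would not.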

Compute Specifications

All analyses were done on an Apple MacBook Pro running macOS Monterey with 32 GB RAM, 10 cores, and 1 TB of storage.

Contributing

See how we recognize feedback and contributions to our code.
