Evaluating peptide predictions made by the peptigate pipeline using orthogonal data

Arcadia-Science/2024-peptigate-evaluation

Evaluating the results of the peptigate peptide prediction pipeline

Purpose

This repository assesses the accuracy of the peptigate pipeline by comparing its peptide predictions from the human transcriptome against orthogonal data sets (ribosome profiling, peptide databases, and peptidomics mass spectrometry).

For more information, see the pub, "Predicting bioactive peptides from transcriptome assemblies with the peptigate workflow."

Installation and Setup

This repository uses conda to manage software environments and installations. You can find operating system-specific instructions for installing miniconda here. After installing conda and mamba, run the following commands to create and activate the pipeline run environment.

mamba env create -n pepeval --file envs/dev.yml
conda activate pepeval

The arcadiathemeR R package isn't available via conda. After activating the conda environment, run the following R script to install it.

Rscript scripts/install_arcadiathemer.R

The notebooks can also be run using the same environment.
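As a quick sanity check after setup, a sketch like the one below reports whether the expected environment is active and whether Rscript is reachable. It assumes conda's standard `CONDA_DEFAULT_ENV` variable, which `conda activate` sets to the active environment name. The function name `check_pepeval_env` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repo): report whether the named
# conda environment is the active one and whether Rscript is on PATH.
check_pepeval_env() {
  expected_env="${1:-pepeval}"
  # `conda activate` exports CONDA_DEFAULT_ENV with the active environment name.
  if [ "${CONDA_DEFAULT_ENV:-}" = "$expected_env" ]; then
    echo "environment: active"
  else
    echo "environment: not active"
  fi
  # `command -v` exits non-zero when the program cannot be found.
  if command -v Rscript >/dev/null 2>&1; then
    echo "Rscript: found"
  else
    echo "Rscript: missing"
  fi
}

check_pepeval_env pepeval
```

If the first line reports that the environment is not active, re-run `conda activate pepeval` before running the R script or the notebooks.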

Overview

This repository assesses whether the peptigate pipeline predicts real peptides from the human transcriptome assembly. It does this by comparing the peptigate peptide predictions against four orthogonal data sources: ribosome profiling, peptide databases, bona fide long non-coding RNAs, and the strength of translation initiation sequences (Kozak sequences). See the README and notebook in each sub-folder for a description of the analysis and the results of each comparison. Note that each notebook name is prefixed with its creation date.

Description of the folder structure

  • LICENSE: specifies terms for re-use of the code in this repo.
  • README.md: describes the contents of this repo and how to interact with it.
  • envs/: documents conda software environments used for analyses in this repo.
  • evaluation/: contains code, notebooks, documentation, and results for comparing the peptigate results against orthogonal data sets.
    • kozak_scores/: compares the strength of Kozak sequences (translation initiation sequences) in peptigate-predicted peptides against TransDecoder-predicted open reading frames in the human transcriptome.
    • noncoding_rnas/: tests whether peptigate predicted peptides from any bona fide long non-coding RNAs.
    • peptipedia/: compares the peptigate peptide predictions against Peptipedia, a large database of bioactive peptide sequences.
    • riborf/: compares the human transcriptome sORF-encoded peptides predicted by peptigate against open reading frames predicted by the tool ribORF from over 600 human ribosome profiling data sets.
  • peptigate/: contains documentation of how we ran peptigate on the human RefSeq transcriptome as well as results files output by peptigate.
  • .github/, .vscode/, .gitignore, .pre-commit-config.yml, Makefile, pyproject.toml: control developer tooling and behavior for the repository.
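
To see at a glance which analysis folders are present in a local clone, a small sketch like the following can help. The four sub-folder names come from the list above. The `README.md` file name inside each sub-folder and the function name `list_evaluation_docs` are assumptions made for illustration.

```shell
# Hypothetical helper: for each evaluation sub-folder listed above, report
# whether a README.md is present under the given repository root.
list_evaluation_docs() {
  repo_root="${1:-.}"
  for analysis in kozak_scores noncoding_rnas peptipedia riborf; do
    readme="$repo_root/evaluation/$analysis/README.md"
    if [ -f "$readme" ]; then
      echo "$analysis: README found"
    else
      echo "$analysis: README missing"
    fi
  done
}

# Run from the repository root.
list_evaluation_docs .
```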

Data

The analyses in this repository use peptides predicted from the human RefSeq transcriptome. All peptide predictions (the results of running peptigate) are in the peptigate results folder. Download instructions for the auxiliary files required to reproduce the results are in the analysis-specific READMEs.

Compute Specifications

  • Platform: x86_64-apple-darwin13.4.0 (64-bit)
  • Running under: macOS Big Sur ... 10.16
  • RAM: 64 GB

Contributing

See how we recognize feedback and contributions to our code.
