Add rules to characterize peptide sequences #7

Merged
merged 11 commits into from
Feb 22, 2024

taylorreiter
Member

PR checklist

  • Describe the changes you've made.
  • Describe any tests you have conducted to confirm that your changes behave as expected.
  • If you've added new software dependencies, make sure that those dependencies are included in the appropriate conda environments.
  • If you've added new functionality, make sure that the documentation is updated accordingly.
  • If you encountered bugs or features that you won't address, but should be addressed eventually, create new issues for them.

PR Description

This PR adds two rules (and one Python script) to annotate features of the predicted peptides. DeepSig is the tool we used in Arcadia-Science/protein-data-curation to predict the presence of a signal peptide. Its output looks like this:

Transcript_0.p1_CLASS_I_LANTIPEPTIDE_134_180_nlpprecursor       DeepSig Chain   1       46      .       .       .       evidence=ECO:0000256
Transcript_1.p1_NONRIPP_84_120_nlpprecursor     DeepSig Chain   1       36      .       .       .       evidence=ECO:0000256
Transcript_1.p6_NONRIPP_62_91_nlpprecursor      DeepSig Chain   1       29      .       .       .       evidence=ECO:0000256
Transcript_10001.p1_NONRIPP_39_102_nlpprecursor DeepSig Chain   1       63      .       .       .       evidence=ECO:0000256
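
For anyone consuming this file downstream, here is a minimal sketch of reading it with the standard library. It assumes the file is tab-separated with the nine standard GFF3 columns, which is what the rows above appear to be. The function names and the "Signal peptide" feature label are illustrative assumptions, not part of this PR.

```python
import csv
import io
from collections import defaultdict

# Standard GFF3 column order (assumed to match the DeepSig output above).
GFF3_COLUMNS = [
    "seqid", "source", "feature", "start", "end",
    "score", "strand", "phase", "attributes",
]


def parse_deepsig_gff3(handle):
    """Yield one dict per feature line, skipping blank lines and '#' comments."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if len(row) != len(GFF3_COLUMNS):
            raise ValueError(f"expected 9 tab-separated columns, got {len(row)}")
        record = dict(zip(GFF3_COLUMNS, row))
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        yield record


def features_by_peptide(records):
    """Group (feature, start, end) tuples by peptide id."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["seqid"]].append((rec["feature"], rec["start"], rec["end"]))
    return dict(grouped)


def has_signal_peptide(features, label="Signal peptide"):
    """True if any feature carries the given label (the label is an assumption)."""
    return any(name == label for name, _start, _end in features)


# Two rows copied from the output above, with tabs restored.
example = (
    "Transcript_0.p1_CLASS_I_LANTIPEPTIDE_134_180_nlpprecursor\tDeepSig\tChain"
    "\t1\t46\t.\t.\t.\tevidence=ECO:0000256\n"
    "Transcript_1.p1_NONRIPP_84_120_nlpprecursor\tDeepSig\tChain"
    "\t1\t36\t.\t.\t.\tevidence=ECO:0000256\n"
)
features = features_by_peptide(parse_deepsig_gff3(io.StringIO(example)))
```

Splitting on tabs rather than on arbitrary whitespace matters if any feature label contains a space.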

The peptides library has functions to calculate characteristics of peptides. I included all of the ones whose meaning is reasonably apparent from the name (many of those I left out are dimensionality-reduction summaries of >100 peptide properties, or per-residue scores). The output looks like this:

id      aliphatic_index boman_index     charge  hydrophobicity  instability_index       isoelectric_point       molecular_weight        pd1_residue_volume      pd2_hydrophilicity      z1_lipophilicity        z2_steric_properties    z3_electronic_properties        z4_electronegativity_etc        z5_electronegativity_etc
Transcript_0.p1_CLASS_I_LANTIPEPTIDE_134_180_nlpprecursor       16.956521739130434      3.0858695652173918      -0.8241857322925498     -1.3239130434782609     55.38913043478261       7.08135670889169        5349.88624      -0.39369565217391306    0.4402173913043477      1.203695652173913       -0.006086956521739143   0.427391304347826       0.05043478260869566     -0.15586956521739131
Transcript_1.p1_NONRIPP_84_120_nlpprecursor     59.72222222222222       2.5302777777777776      3.5103192807650974      -0.9888888888888889     72.49722222222222       9.372886321507394       4318.929939999999       0.017499999999999974    0.19916666666666663     0.5744444444444444      0.23027777777777775     -0.09916666666666668    0.6338888888888888      0.12305555555555557
Transcript_1.p6_NONRIPP_62_91_nlpprecursor      80.6896551724138        0.9093103448275861      -1.9721173743045763     0.2482758620689655      51.213793103448275      4.366027359385043       3194.58084      -0.5679310344827586     -0.19862068965517238    -0.09034482758620677    -0.6006896551724137     0.17862068965517242     -0.5027586206896553     0.1024137931034483
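
To give a sense of what one of these columns measures, here is a from-scratch sketch of the aliphatic index using Ikai's formula (mole percent of Ala, plus 2.9 times Val, plus 3.9 times Ile and Leu combined). This is an illustration only: it is not the peptides library's implementation and not the code in scripts/characterize_peptides.py, and the function name is hypothetical.

```python
from collections import Counter


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index per Ikai (1980): A% + 2.9 * V% + 3.9 * (I% + L%)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    length = len(seq)

    def mole_percent(residue: str) -> float:
        # Percentage of positions in the sequence occupied by this residue.
        return 100.0 * counts[residue] / length

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )
```

Higher values mean a larger share of the sequence is taken up by aliphatic side chains, which is why the three example peptides above range from about 17 to about 81.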

Tests

This PR runs successfully on the demo data. I confirmed that DeepSig sees the GPU in the conda env I created.

Documentation

Punted for now.

@taylorreiter taylorreiter marked this pull request as draft February 20, 2024 19:40
@taylorreiter taylorreiter marked this pull request as ready for review February 21, 2024 17:46
Member

@keithchev keithchev left a comment


Looks good! A few minor comments.

Resolved review threads:

  • envs/deepsig.yml
  • Snakefile
  • scripts/characterize_peptides.py
  • scripts/characterize_peptides.py (outdated)
@taylorreiter taylorreiter merged commit e1502d2 into main Feb 22, 2024
2 checks passed
@taylorreiter taylorreiter deleted the ter/peptide-annotate branch February 22, 2024 13:56