ProteinCartography v0.5.0
Overview
This release includes several minor improvements and introduces a new organization for the output directories generated by the pipeline. Because Snakemake is a file-based workflow engine, this change means that this version of the pipeline is not compatible with output from previous versions: the new version cannot be re-run on output directories that were initially generated by prior versions. Instead, the pipeline must be re-run from scratch.
New features and improvements
- Reorganize the output directory to more clearly distinguish the final outputs of the pipeline from intermediate outputs. (This is a breaking change; see above.)
- Merge Snakefile_ff (the "cluster" mode of the pipeline) into the main Snakefile and add a config parameter to specify whether to run the pipeline in "search" or "cluster" mode.
- Update and clarify some sections of the main README.
- Add developer docs.
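As an illustration of the new mode parameter, the sketch below validates a mode value read from a parsed config mapping. The key name "mode" and the helper resolve_mode are hypothetical, chosen only for this example; consult the README for the actual parameter name and its accepted values.

```python
# Hypothetical sketch: the key name "mode" and this helper are assumptions
# made for illustration, not taken from the pipeline's source.
VALID_MODES = ("search", "cluster")


def resolve_mode(config: dict) -> str:
    """Return the validated run mode from a parsed config mapping."""
    mode = config.get("mode")
    if mode not in VALID_MODES:
        raise ValueError(
            f"config 'mode' must be one of {VALID_MODES}, got {mode!r}"
        )
    return mode


if __name__ == "__main__":
    # A config dict as it might look after loading a YAML config file.
    print(resolve_mode({"mode": "cluster"}))
```

Rejecting a missing or unknown value up front, rather than silently falling back to one mode, keeps a single merged workflow file from running the wrong branch.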
Fixes
- Generate TM scores for each of the input proteins versus all of the query proteins (previously, some input-query protein pairs did not have a TM score due to Foldseek's filtering).
- Fix a bug that may have prevented the pipeline from running when only input FASTA files (rather than PDBs) are provided.
- Use unverified requests to query the ESMFold API as a workaround for ESMFold's expired SSL certificates (from external contributor @naailkhan28).
- Add integration tests for the "cluster" mode of the pipeline.
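The ESMFold workaround listed above amounts to sending HTTPS requests with certificate verification turned off. The sketch below shows one way to do that using only the Python standard library; it is not the pipeline's implementation, which may rely on a different HTTP client. The endpoint URL and the function names are assumptions made for this example.

```python
import ssl
import urllib.request

# Assumed endpoint for illustration; the release notes do not name the URL.
ESMFOLD_URL = "https://api.esmatlas.com/foldSequence/v1/pdb/"


def unverified_context() -> ssl.SSLContext:
    """Build an SSL context that skips certificate and hostname checks."""
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be relaxed.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def build_fold_request(sequence: str, url: str = ESMFOLD_URL) -> urllib.request.Request:
    """Create a POST request whose body is the raw amino acid sequence."""
    return urllib.request.Request(url, data=sequence.encode("ascii"), method="POST")


def fold_sequence(sequence: str, url: str = ESMFOLD_URL, timeout: float = 60.0) -> str:
    """Send the sequence without verifying the server certificate; return the response text."""
    request = build_fold_request(sequence, url)
    with urllib.request.urlopen(
        request, timeout=timeout, context=unverified_context()
    ) as response:
        return response.read().decode("utf-8")
```

Skipping verification removes protection against man-in-the-middle attacks, so a workaround like this is best limited to the affected API and removed once the certificates are renewed.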