Data, documentation, analysis and Nextflow pipeline for the manuscript "Highly significant improvement of protein sequence alignments with AlphaFold2".
This work was carried out in the Notredame Lab at the Centre for Genomic Regulation (CRG).
The authors who contributed to the analysis and manuscript are:
- Athanasios Baltzis
- Leila Mansouri
- Suzanne Jin
- Bjorn Langer
- Ionas Erb
- Cedric Notredame
This repository contains a series of Jupyter Notebooks with the steps for replicating, in R, the analysis, tables and figures in the manuscript.
The pipeline for predicting the AF2 models and producing the MSAs is built using Nextflow. It comes with a Singularity container (the recipe is available here) for running AF2 and a Docker container (available on Docker Hub here).
- Download the genetic databases required for AlphaFold2 using the provided script.
- Download and format the database used for PSI-Coffee blast search (by default Uniref50).
- Make sure Singularity is installed on your system.
- Install the Nextflow runtime by running the following command:
curl -fsSL get.nextflow.io | bash
- Launch the pipeline by entering the command shown below:
nextflow run athbaltzis/msa-af2-nf
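Before launching, it can help to confirm that the command-line tools the steps above rely on are on your PATH. The following is a minimal sketch, not part of the pipeline: the helper name `check_tool` is hypothetical, and the list of tools is an assumption (Singularity and curl come from the steps above; Java is included because the Nextflow runtime needs it).

```shell
# check_tool NAME: report whether NAME is available on PATH.
# Returns 0 if found, 1 if missing. (Hypothetical helper, not part of the pipeline.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        printf 'found: %s\n' "$1"
        return 0
    else
        printf 'missing: %s\n' "$1"
        return 1
    fi
}

# Assumed prerequisite list: adjust it to match your setup.
missing=0
for tool in singularity java curl; do
    check_tool "$tool" || missing=$((missing + 1))
done

echo "$missing prerequisite(s) missing"
```

This only checks that the tools can be found. It does not verify their versions or that the genetic databases have been downloaded.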
By default, the pipeline is executed against the provided example dataset. You can modify the input data as well as the other available parameters listed below:
Input sequences (FASTA)
Input lists of sequences
Input template lists
Input experimentally determined PDB structures
Input path to Database for PSI-Coffee
Predict structures with AF2 [true or false (default)]
Path to AF2 predicted models (if --predict false)
Input PDB structures for secondary structure assignment
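As an illustration of overriding a parameter, the sketch below assembles a launch command that switches on AF2 structure prediction. Only `--predict` is taken from the list above. The variable names `PREDICT` and `CMD` are illustrative, and `-resume` is a standard Nextflow option added here as an assumption about how you may want to rerun. In Nextflow, pipeline parameters take a double dash, while options of the Nextflow runtime itself take a single dash.

```shell
# Toggle AF2 structure prediction; "false" is the documented default.
PREDICT="true"

# Double dash: pipeline parameter (--predict).
# Single dash: Nextflow runtime option (-resume reuses cached results of a previous run).
CMD="nextflow run athbaltzis/msa-af2-nf --predict ${PREDICT} -resume"

# Print the command for review before running it yourself.
echo "$CMD"
```

The script prints the command rather than executing it, so it can be checked first. Running with prediction enabled assumes the AlphaFold2 genetic databases have already been downloaded as described in the steps above.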