An easy-to-use template for mmseqs search and clustering. Use it to reduce the sequence identity between protein sequences.
Install mmseqs by following the instructions on the official website. Note the path to your mmseqs binary; the scripts need it later.
Check the scripts in the usage folder and replace the template paths with your own:
- mmseqs binary path (e.g. /usr/local/mmseqs/bin/mmseqs)
- input fasta path (e.g. /root/project/MMseqsTemplate/data/protein.fasta)
- output fasta path (e.g. /root/project/MMseqsTemplate/data/protein_rep50.fasta)
Then run the scripts directly.
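As a rough illustration of the path setup described above, the editable part of a script might look like the sketch below. The variable names and the `check_paths` helper are hypothetical and not taken from the scripts in usage; they only show one way to catch a wrong path before mmseqs is called.

```shell
#!/bin/sh
# Hypothetical template variables; the real scripts may name them differently.
MMSEQS_BIN="/usr/local/mmseqs/bin/mmseqs"
INPUT_FASTA="/root/project/MMseqsTemplate/data/protein.fasta"
OUTPUT_FASTA="/root/project/MMseqsTemplate/data/protein_rep50.fasta"

# check_paths BIN INPUT OUTPUT
# Returns non-zero (with a message on stderr) if any path looks unusable.
check_paths() {
    [ -x "$1" ] || { echo "mmseqs binary not found or not executable: $1" >&2; return 1; }
    [ -r "$2" ] || { echo "input fasta not readable: $2" >&2; return 1; }
    [ -d "$(dirname "$3")" ] || { echo "output directory does not exist: $3" >&2; return 1; }
    return 0
}
```

A script could then start with `check_paths "$MMSEQS_BIN" "$INPUT_FASTA" "$OUTPUT_FASTA" || exit 1`.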
run usage/mmseq_cluster.sh
- input: fasta file
- output: fasta file (smaller)
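For orientation, a clustering step of this kind can be sketched with the `mmseqs easy-cluster` workflow, which writes a representative-sequence FASTA alongside its other outputs. This is an assumption about what the script does, not a copy of it: the `cluster_rep` function name and the 0.5 identity threshold (guessed from the "rep50" in the example output name) are illustrative.

```shell
#!/bin/sh
# Sketch only: the real usage/mmseq_cluster.sh may differ.
# cluster_rep MMSEQS_BIN INPUT_FASTA OUTPUT_FASTA MIN_SEQ_ID
cluster_rep() (
    mmseqs_bin=$1
    input_fasta=$2
    output_fasta=$3
    min_seq_id=$4

    # Scratch directory for the result prefix and mmseqs temporary files.
    work_dir=$(mktemp -d) || exit 1

    # easy-cluster writes <prefix>_rep_seq.fasta, <prefix>_all_seqs.fasta
    # and <prefix>_cluster.tsv for the given result prefix.
    "$mmseqs_bin" easy-cluster "$input_fasta" "$work_dir/clu" "$work_dir/tmp" \
        --min-seq-id "$min_seq_id" || exit 1

    # Keep only the representative sequences as the (smaller) output.
    cp "$work_dir/clu_rep_seq.fasta" "$output_fasta" || exit 1
    rm -rf "$work_dir"
)
```

Example call, using the paths from the list above: `cluster_rep /usr/local/mmseqs/bin/mmseqs /root/project/MMseqsTemplate/data/protein.fasta /root/project/MMseqsTemplate/data/protein_rep50.fasta 0.5`.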
run usage/mmseq_group.sh
- input: fasta file
- output: fasta-like file (larger)
For example, a fasta-like file may look like this:
>sequence96
>sequence96
AVPVAVWLVSALAMGAGVAGG
>sequence192
AVPVAVWLVSALAMGAGMAGG
>sequence165
>sequence1
>sequence1
FLGFLLGVGSAIASGVAVSKV
>sequence0
FLGFLLGVGSAIASGTAVSKV
>sequence72
FLGFLLGIGSAIASGVAVSKV
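A fasta-like file of this kind can be turned into a plain representative/member table with a few lines of awk. The sketch below assumes that a header line directly followed by another header line opens a new cluster and names its representative, and that every member has its sequence on the lines after its header. The function name `list_cluster_members` is hypothetical.

```shell
#!/bin/sh
# list_cluster_members FASTA_LIKE_FILE
# Prints one "representative<TAB>member" line per sequence.
# Assumption: a header with no sequence under it is a cluster title.
list_cluster_members() {
    awk '
        /^>/ {
            # A header was already pending, so it had no sequence:
            # treat it as the title of a new cluster.
            if (pending != "") rep = pending
            pending = substr($1, 2)      # name without the leading ">"
            next
        }
        {
            # First sequence line after a header: the header is a member.
            if (pending != "") {
                print rep "\t" pending
                pending = ""
            }
        }
    ' "$1"
}
```

Calling `list_cluster_members grouped.fasta | cut -f1 | sort | uniq -c` would then give the size of each cluster (the file name here is a placeholder).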
run usage/mmseq_search.sh
- input: 2 fasta files
- output: list of the filtered sequence names
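One way such a search step could work is sketched below with the `mmseqs easy-search` workflow, which by default writes a tab-separated hit table with the query name in the first column. Whether the real script lists the query names that have hits, the ones without hits, or something else is not stated here, so the choice to list queries with at least one hit, the `search_hits` function name, and the identity threshold argument are all assumptions.

```shell
#!/bin/sh
# Sketch only: the real usage/mmseq_search.sh may differ.
# search_hits MMSEQS_BIN QUERY_FASTA TARGET_FASTA OUTPUT_LIST MIN_SEQ_ID
search_hits() (
    mmseqs_bin=$1
    query_fasta=$2
    target_fasta=$3
    output_list=$4
    min_seq_id=$5

    # Scratch directory for the hit table and mmseqs temporary files.
    work_dir=$(mktemp -d) || exit 1

    # easy-search takes: query, target, result file, tmp directory.
    "$mmseqs_bin" easy-search "$query_fasta" "$target_fasta" \
        "$work_dir/hits.m8" "$work_dir/tmp" --min-seq-id "$min_seq_id" || exit 1

    # Column 1 of the hit table is the query name; keep each name once.
    cut -f1 "$work_dir/hits.m8" | sort -u > "$output_list" || exit 1
    rm -rf "$work_dir"
)
```

The resulting list holds one sequence name per line, which makes it easy to feed into a later filtering step.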