PMPrimer: a Python-based tool for automated design and evaluation of multiplex PCR primer pairs for diverse templates
Unlike primer design for a single sequence, primer design for multiple sequences requires data quality filtering, multiple sequence alignment, conserved region identification, primer design for multiple templates, and evaluation of coverage and taxon specificity across those templates.
PMPrimer is a Python-based tool for automated design and evaluation of multiplex PCR primer pairs for diverse templates. Its main strengths are identifying conserved regions using Shannon's entropy method, tolerating gaps using a haplotype-based method, and evaluating multiplex PCR primer pairs according to template coverage and taxon specificity. PMPrimer was tested on datasets that differ in conservation and size, including tuf genes of Staphylococci, hsp65 genes of Mycobacteriaceae, and 16S ribosomal RNA genes of Archaea. Compared with previously designed primers and existing tools, PMPrimer offers automation, support for large datasets, higher accuracy, and reasonable running time.
PMPrimer is a command-line tool. To handle the varied characteristics of target sequences, it provides multiple built-in parameters for adjusting data processing. The only required input is a FASTA file containing multiple sequences of the target gene. Results are written as JSON and CSV files, which can be easily loaded into other programming languages for further analysis.
The software requires the following third-party packages: primer3-py, biopython, matplotlib, pandas, seaborn, and blast (2.13.0).
All Python-based third-party packages are installed automatically by pip. Users need to install BLAST themselves, following the documentation on the official BLAST website.
PMPrimer has been tested with Python 3.8+ on Linux and Windows.
-
Install by pip
We recommend installing PMPrimer through pip, using the command:
pip install PMPrimer
or
pip3 install PMPrimer
pip automatically installs all Python-based dependencies and adds the pmprimer command to your system.
-
Install by code
Download the PMPrimer source code from this repository, decompress it, and install all dependency packages yourself. You can then run the software with:
python3 PMPrimer/pmprimer.py
Run the help command
pmprimer --help
to show the usage message. Detailed information on each option is given below.
-
Input file
--file -f
The only required input for PMPrimer is a FASTA file containing multiple sequences of the target gene. The sequences can be aligned or unaligned. If you supply an unaligned file, you must pass -a muscle in the primer design module to perform multiple sequence alignment. For example:
pmprimer -f seqs.fasta
or
pmprimer --file seqs.fasta
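Whether a FASTA file is already aligned decides whether muscle is needed. The following is a minimal sketch of such a check, assuming that an alignment can be recognised by all sequences having the same length. The helper names read_fasta and looks_aligned are hypothetical and are not part of PMPrimer.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def looks_aligned(records: Dict[str, str]) -> bool:
    """Heuristic (an assumption): aligned files have one common sequence length."""
    lengths = {len(seq) for seq in records.values()}
    return len(lengths) == 1


# Toy input; with a real file, pass an open file handle instead of a list.
example = [">s1", "ACGT-A", ">s2", "ACGTTA", ">s3", "ACG", "TTA"]
records = read_fasta(example)

# Choose the -a parameters documented in this README accordingly.
design_args = ["primer2"] if looks_aligned(records) else ["muscle", "primer2"]
print(design_args)
```

Equal length is only a necessary condition for an alignment, so a file that passes this check should still be inspected if its origin is unknown.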
-
Basic process
--progress -p
In the basic process, PMPrimer assesses data quality, calculates the length distribution of the input data, filters low-quality templates (such as templates that are too short or too long) according to that distribution, and removes redundant templates within terminal taxa. All of these steps run by default. If the input file has already been checked manually, use
-p notlen
to disable the length filter. Similarly, use
-p notsameseq
to disable the redundancy filter, which is based on sequence comparison within terminal taxa. To draw a heatmap of the distances between templates, use
-p matrix
All parameters are as follows:
- notlen Do not filter sequences according to the length distribution
- notsameseq Do not remove redundant sequences in terminal taxa
- matrix Draw a heatmap according to distances between templates
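This README does not spell out the exact filtering criteria, so the following is only a sketch of the two default filters under stated assumptions: the length filter is approximated with Tukey's interquartile-range fences, and the redundancy filter drops exact duplicate sequences within the same taxon. The function names, the multiplier 1.5, and the (taxon, sequence) input shape are hypothetical and are not taken from PMPrimer.

```python
import statistics
from typing import List, Tuple

Record = Tuple[str, str]  # (taxon, sequence); hypothetical input shape


def filter_by_length(records: List[Record], k: float = 1.5) -> List[Record]:
    """Keep records whose length lies within Tukey's IQR fences (assumed rule)."""
    lengths = [len(seq) for _, seq in records]
    q1, _, q3 = statistics.quantiles(lengths, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [(taxon, seq) for taxon, seq in records if low <= len(seq) <= high]


def drop_duplicates_within_taxa(records: List[Record]) -> List[Record]:
    """Keep the first copy of each sequence per taxon (assumed rule)."""
    seen = set()
    kept = []
    for taxon, seq in records:
        key = (taxon, seq.upper())
        if key not in seen:
            seen.add(key)
            kept.append((taxon, seq))
    return kept


# Toy records: one very short and one very long template, plus one duplicate.
records = [
    ("sp1", "A" * 100), ("sp1", "A" * 100), ("sp2", "A" * 100),
    ("sp2", "A" * 40), ("sp3", "A" * 250), ("sp3", "A" * 98),
    ("sp1", "A" * 101), ("sp2", "A" * 99), ("sp3", "A" * 102),
    ("sp3", "A" * 100),
]
filtered = filter_by_length(records)
unique = drop_duplicates_within_taxa(filtered)
print(len(records), len(filtered), len(unique))
```

Identical sequences from different taxa are kept on purpose, because the filter described above only removes redundancy within terminal taxa.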
Add parameters after -p, for example:
-p notlen notsameseq matrix
The order does not matter.
-
Primer design
--alldesign -a
This module has three steps: multiple sequence alignment (the muscle parameter), conserved region identification (the threshold, minlen, gaps, and merge parameters), and primer design (the tm and primer2 parameters). If you supply an unaligned file, you must pass muscle for the alignment step. You can adjust the parameters for conserved region identification until you obtain optimal conserved regions, and then pass primer2 to trigger primer design.
All parameters are as follows:
- muscle Use MUSCLE to align multiple sequences; the alignment is saved as *.mc.fasta
- threshold Threshold for identifying conserved regions; default is a major-allele frequency of 0.95, for example threshold:0.95
- minlen Minimum length of a conserved region; default is 15, for example minlen:15
- gaps Maximum rate of gaps; default is 0.1, for example gaps:0.1
- merge Use the merge module to merge conserved regions
- rank1 Rank non-conserved regions by diversity score and display them
- rank2 Rank conserved regions by diversity score and display them
- haplo Display the number of haplotype sequences in conserved regions
- tm Minimum melting temperature; default is 50, for example tm:50.0
- primer2 Run primer design
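To illustrate how threshold, minlen, and gaps could interact, here is a minimal sketch of conserved region identification on an alignment. It is not PMPrimer's implementation. It assumes that a column counts as conserved when its major-allele frequency (among non-gap characters) is at least threshold and its gap rate is at most gaps, and that a region is a run of at least minlen such columns. The Shannon entropy of each column is computed as well, for reference only. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def column_stats(column: str) -> Tuple[float, float, float]:
    """Return (major-allele frequency, gap rate, Shannon entropy in bits)."""
    gap_rate = column.count("-") / len(column)
    bases = Counter(c for c in column.upper() if c != "-")
    total = sum(bases.values())
    if total == 0:  # column made of gaps only
        return 0.0, gap_rate, 0.0
    major = max(bases.values()) / total
    entropy = -sum((n / total) * math.log2(n / total) for n in bases.values())
    return major, gap_rate, entropy


def conserved_regions(
    alignment: List[str],
    threshold: float = 0.95,
    minlen: int = 15,
    gaps: float = 0.1,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges of conserved runs."""
    regions = []
    start = None
    ncols = len(alignment[0])
    for i in range(ncols + 1):  # one extra step closes a run at the end
        ok = False
        if i < ncols:
            column = "".join(seq[i] for seq in alignment)
            major, gap_rate, _ = column_stats(column)
            ok = major >= threshold and gap_rate <= gaps
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= minlen:
                regions.append((start, i))
            start = None
    return regions


# Toy alignment: column 7 contains a gap, column 8 is fully variable.
alignment = ["ACGTACGTAA", "ACGTACGTCA", "ACGTACGTGA", "ACGTACG-TA"]
strict = conserved_regions(alignment, minlen=5)
relaxed = conserved_regions(alignment, minlen=5, gaps=0.5)
print(strict, relaxed)
```

In this toy case, raising gaps from 0.1 to 0.5 extends the region by the gapped column, which mirrors the use of gaps:1.0 for the highly divergent Archaea dataset later in this README.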
Add parameters after -a, for example:
-a threshold:0.96 minlen:16 merge primer2
The order does not matter.
-
Amplicon selection and evaluation
--evaluate -e
To evaluate amplicon specificity, PMPrimer uses BLAST to align amplicon primer pairs to target FASTA files. Multiple target FASTA files can be given by separating the file paths with ",".
All parameters are as follows:
- hpcnt Maximum number of primer haplotype sequences; default is 10, for example hpcnt:10
- minlen Minimum length of an amplicon; default is 150 bp, for example minlen:150
- maxlen Maximum length of an amplicon; default is 1500 bp, for example maxlen:1500
- blast Use BLAST to evaluate amplicon specificity by aligning amplicon primer pairs to target FASTA files; separate multiple target files with ',', for example blast:seqs.fasta,hosts.fasta
- rmlow Remove primers whose melting temperature is lower than the tm parameter of the primer design module
- save Save final results
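The hpcnt parameter limits how many haplotype sequences a primer may have. As an illustration of how a set of haplotype primers relates to a single degenerate primer, the sketch below collapses haplotypes into IUPAC ambiguity codes and counts the distinct haplotypes. It assumes gap-free haplotypes of equal length, which may not hold for PMPrimer's gap-tolerant haplotypes. The function name degenerate_primer is hypothetical.

```python
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_primer(haplotypes: List[str]) -> str:
    """Collapse equal-length haplotype primers into one IUPAC primer."""
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("haplotypes must be non-empty and of equal length")
    columns = zip(*(h.upper() for h in haplotypes))
    return "".join(IUPAC[frozenset(column)] for column in columns)


# Toy haplotypes observed at one primer site.
haplotypes = ["ACGTAC", "ACGTGC", "ATGTAC", "ACGTAC"]
primer = degenerate_primer(haplotypes)
haplotype_count = len(set(haplotypes))
print(primer, haplotype_count)
```

A site like this one, with three distinct haplotypes, would pass the default limit of hpcnt:10.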
Add parameters after -e, for example:
-e hpcnt:50 maxlen:1200 save
The order does not matter.
-
Debug mode
--debuglevel -d
All parameters are numbers:
- 1 Basic debug info
- 2 Module debug info
- 4 Complex debug info
If you need debug info, use for example:
-d 1
In normal use, this number is interpreted as binary flags.
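The values 1, 2, and 4 are powers of two, which suggests that the debug level works as a bit mask. The sketch below shows how such flags are combined and decoded. That PMPrimer accepts combined values such as 3 is an assumption based on the remark about binary calculation, and the names used here are hypothetical.

```python
from typing import List

# Debug levels as listed in this README.
DEBUG_FLAGS = {1: "basic", 2: "module", 4: "complex"}


def combine(*bits: int) -> int:
    """Combine individual debug levels into one number with bitwise OR."""
    value = 0
    for bit in bits:
        value |= bit
    return value


def enabled_levels(value: int) -> List[str]:
    """List the debug levels that are switched on in a combined number."""
    return [name for bit, name in DEBUG_FLAGS.items() if value & bit]


level = combine(1, 2)
print(level, enabled_levels(level))
```

Under this reading, -d 3 would request basic and module debug info together, and -d 7 would request all three levels.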
-
Length filter result
If you use the default parameter in the basic process (that is, without notlen), the software filters the input sequences according to the length distribution and saves the result as seqs.filt.fasta.
-
Redundancy filter result
If you use the default parameter in the basic process (that is, without notsameseq), the software filters the input sequences according to sequence comparison within terminal taxa and saves the result as seqs.filt.fasta.
-
Heatmap result
If you use the matrix parameter in the basic process, the software creates a heatmap of the distances between templates and saves the figure as tmp.png.
-
Multiple sequence alignment
If you use the muscle parameter in the primer design module, the software saves the multiple sequence alignment as seqs.mc.fasta.
-
Amplicon primer pairs
The final result is saved as JSON and CSV files (timestamp_recommand_region_primer.json and timestamp_recommand_region_primer.csv). The JSON file can be easily loaded into Python, R, or other programming languages for further analysis. The CSV file is a human-readable result file with ten columns titled "Amplicon, Coverage, Taxon specificity, Effective length, Forward degenerate primer, Forward haplotype primers, Forward primer info, Reverse degenerate primer, Reverse haplotype primers, Reverse primer info". The Forward primer info and Reverse primer info columns each contain five numbers: the number of occurrences of the primer in the templates, the melting temperature of the primer, the melting temperature of its hairpin structure, the melting temperature of its homodimer structure, and the coverage rate of the primer across the templates.
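As a sketch of downstream analysis, the CSV result can be read with Python's standard csv module and sorted by coverage. The column titles come from the description above; that the Coverage column holds plain decimal numbers is an assumption, and the sample rows are made up for illustration only. The JSON layout is not described in this README, so it is not shown here.

```python
import csv
import io
from typing import Dict, List


def read_primer_table(handle) -> List[Dict[str, str]]:
    """Read the CSV result into a list of dicts keyed by column title."""
    return list(csv.DictReader(handle))


def top_by_coverage(rows: List[Dict[str, str]], n: int = 1) -> List[Dict[str, str]]:
    """Sort rows by the Coverage column, highest first (assumes plain numbers)."""
    return sorted(rows, key=lambda row: float(row["Coverage"]), reverse=True)[:n]


# Made-up rows for illustration only; real files contain more columns.
sample = io.StringIO(
    "Amplicon,Coverage,Taxon specificity\n"
    "example_1,0.90,0.80\n"
    "example_2,0.97,0.75\n"
)
rows = read_primer_table(sample)
best = top_by_coverage(rows)[0]
print(best["Amplicon"], best["Coverage"])

# For a real run, open the result file instead, for example:
# with open("timestamp_recommand_region_primer.csv", newline="") as fh:
#     rows = read_primer_table(fh)
```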
-
If the input file needs data processing, use:
pmprimer -f seqs.fasta -p default
This filters sequences according to the length distribution and removes redundant sequences in terminal taxa.
-
If the input file needs to be aligned, use:
pmprimer -f seqs.fasta -a muscle primer2
This command saves the alignment as 'seqs.mc.fasta'.
-
If the input file is already aligned, use:
pmprimer -f seqs.fasta -a primer2
-
If you want to check the result of conserved region identification, use the rank1, rank2, and haplo parameters to display it:
pmprimer -f seqs.fasta -a rank1 rank2 haplo
Once you have obtained optimal conserved regions, use primer2 to trigger primer design.
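When PMPrimer is run repeatedly with different parameters, it can help to assemble the command line in a script. The following sketch builds the argument list from the options documented in this README. The helper build_pmprimer_command is hypothetical and not part of PMPrimer, and the subprocess call is left commented out because it requires pmprimer to be installed.

```python
import shlex
from typing import List, Optional, Sequence


def build_pmprimer_command(
    fasta: str,
    process: Sequence[str] = (),
    design: Sequence[str] = (),
    evaluate: Sequence[str] = (),
    debug: Optional[int] = None,
) -> List[str]:
    """Assemble a pmprimer argument list from per-module parameters."""
    cmd = ["pmprimer", "-f", fasta]
    if process:
        cmd += ["-p", *process]
    if design:
        cmd += ["-a", *design]
    if evaluate:
        cmd += ["-e", *evaluate]
    if debug is not None:
        cmd += ["-d", str(debug)]
    return cmd


cmd = build_pmprimer_command(
    "seqs.fasta",
    design=["threshold:0.96", "minlen:16", "merge", "primer2"],
    evaluate=["hpcnt:50", "maxlen:1200", "save"],
)
print(shlex.join(cmd))

# To actually run the tool:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with file paths that contain spaces.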
The datasets used in the paper can be obtained from PMPrimer Datasets.
-
16S ribosomal RNA (rRNA) genes of Archaea
The command used in the paper is:
pmprimer -f Archaea_16SrRNA.rep.mc.fasta -a threshold:0.85 gaps:1.0 merge primer2 tm:45.0 -e hpcnt:600 save
-
hsp65 (groEL2) genes of Mycobacteriaceae
The command used in the paper is:
pmprimer -f Mycobacteriaceae_groEL2.filt.mc.fasta -a primer2 -e hpcnt:30 save
-
tuf genes of Staphylococci
The command used in the paper is:
pmprimer -f Staphylococcus_tuf.filt.mc.fasta -a threshold:0.995 minlen:5 merge primer2 -e save
Please cite this paper when using PMPrimer in your publications.
A tool to automatically design multiplex PCR primer pairs for specific targets using diverse templates