SeqExtractor is a script for extracting sequences from a FASTA file based on a list of sequence IDs provided in a separate file.
- Python 3
- Biopython library (pip install biopython)
- Clone the repository:
git clone https://github.com/cavalheiromf10/SeqExtractor.git
cd SeqExtractor
- Make the script executable:
chmod +x SeqExtractor.py
./SeqExtractor.py -i input_file -s sequence_file -o output_file
- input_file: File with one sequence ID per line.
- sequence_file: FASTA file containing sequences to extract.
- output_file: Name of the output file to save the extracted sequences.
./SeqExtractor.py -i IDs_DUFs.txt -s Esalsugineum_173_v1.0.protein.fa -o output.fasta
This example extracts sequences from Esalsugineum_173_v1.0.protein.fa based on the sequence IDs listed in IDs_DUFs.txt and saves the results to output.fasta.
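The extraction step can be sketched in a few lines of Python. This is an illustration only, not the script's actual code: it uses the standard library instead of Biopython, and the function names read_ids and extract_records are hypothetical. It assumes that a record's ID is the first whitespace-delimited token after ">" in the header line, and that IDs must match exactly.

```python
def read_ids(lines):
    """Collect one sequence ID per line, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def extract_records(fasta_lines, wanted_ids):
    """Yield the lines of every FASTA record whose ID is in wanted_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the ID is the first token of the header line.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted_ids
        if keep:
            yield line


# Small in-memory demonstration with made-up IDs and sequences.
ids = read_ids(["seq2\n", "\n", "seq3\n"])
fasta = [
    ">seq1 first protein\n", "MKV\n",
    ">seq2 second protein\n", "MAA\n", "GLT\n",
    ">seq3\n", "MPP\n",
]
extracted = list(extract_records(fasta, ids))
print("".join(extracted), end="")
```

With real files, open file handles can be passed in place of the lists, since both functions only iterate over lines. Under the assumptions above, an ID in the list that differs from the FASTA header ID in case or by a version suffix would not match.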
If you encounter any issues with file permissions or missing files, check the error messages provided by the script. Ensure that Biopython is installed (pip install biopython).
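The checks mentioned above can also be run by hand before invoking the script. The following is a minimal sketch, not part of SeqExtractor itself; the helper name preflight is hypothetical, and the file names are the ones from the example command.

```python
import importlib.util
import os


def preflight(paths):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    # find_spec returns None when the top-level package is not installed.
    if importlib.util.find_spec("Bio") is None:
        problems.append("Biopython is not installed (pip install biopython)")
    for path in paths:
        if not os.path.isfile(path):
            problems.append(f"missing file: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"file is not readable: {path}")
    return problems


# Check the input files used in the example command.
problems = preflight(["IDs_DUFs.txt", "Esalsugineum_173_v1.0.protein.fa"])
for problem in problems:
    print(problem)
```

An empty result means Biopython can be imported and both input files exist and are readable by the current user.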