This repository contains a pipeline to study the evolution of the order Lactobacillales using public genome data.
Step 1: clone this repository:
```shell
git clone https://github.com/swittouck/legen.git
```
Step 2: install all dependencies listed under Software and R packages below.
Step 3: create folders for data and results:
```shell
cd legen
mkdir data results
```
Step 4: download the type strain names of all validly published species from the LPSN and put the file in data/.
Step 5: run all scripts in src/ in the order indicated by the file/folder names. Run each script directly from its parent directory, e.g.:
```shell
cd src/01_prepare_genomes
./01_download_metadata.R
```
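If you prefer not to run each script by hand, the loop in Step 5 can be scripted. The sketch below is a hypothetical helper, not part of the repository. It assumes every pipeline step is an executable file with a shebang (as `./01_download_metadata.R` implies), that no other executable files live under src/, and that the zero-padded numeric prefixes make byte-wise sorted paths match the intended run order.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the legen repository.
# Runs every executable file under a source directory (default: src) in
# byte-wise sorted path order, each from its own parent directory, and
# stops at the first script that exits with a non-zero status.
run_all_scripts() {
  local src_dir="${1:-src}"
  local script
  if [[ ! -d "$src_dir" ]]; then
    echo "no such directory: $src_dir" >&2
    return 1
  fi
  # Paths are read from file descriptor 3 so that a script which reads
  # standard input cannot swallow the remaining paths.
  while IFS= read -r -u 3 script; do
    echo "==> $script"
    # The subshell keeps the cd local to this one script.
    if ! ( cd "$(dirname "$script")" && "./$(basename "$script")" ); then
      echo "failed: $script" >&2
      return 1
    fi
  done 3< <(find "$src_dir" -type f -perm -u=x | LC_ALL=C sort)
}
```

From the repository root, `source run_all.sh && run_all_scripts src` would run the whole pipeline (the file name run_all.sh is arbitrary). Stopping at the first failure avoids later steps silently working on incomplete output from an earlier one.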
Software:
- R v4.2.3
- ProClasp v1.0
- Prodigal v2.6.3
- SCARAP v0.4.0
- trimAl v1.4.rev15
- IQ-TREE v1.6.12
R packages:
- tidyverse v2.0.0
- tidygenomes v0.1.3
- ape v5.7.1
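A minimal sketch for checking that the command-line tools above are installed before starting. It only tests whether each command is on PATH, not whether its version matches the list. The executable names used in the example are assumptions about a typical install (IQ-TREE 1.x, for instance, usually installs as `iqtree`), ProClasp is left out because its command names are not given here, and `check_tools` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are not on PATH.
# Prints one missing command per line and returns the number of missing
# commands (0 when everything is present).
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "$tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, `check_tools Rscript prodigal scarap trimal iqtree` prints any missing names and returns a non-zero status if at least one is absent. The installed R package versions can be printed with `Rscript -e 'for (p in c("tidyverse", "tidygenomes", "ape")) print(packageVersion(p))'`.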
genomes_lactobacillales_gtdb-r207.tsv
- metadata of all Lactobacillales genomes that are in release 207 of the GTDB
- downloaded by the script src/lactobacillales/01_download_metadata.R
genomes_lactobacillales_gtdb-r207
- a selection of one high-quality genome per species (for Carnobacteriaceae) or per genus (for all other families), downloaded from NCBI
- downloaded by the script src/lactobacillales/02_download_genomes.sh
lpsn_gss_2023-03-23.csv
- type strain names and other info for all validly published species, from LPSN
- downloaded from https://lpsn.dsmz.de/downloads (PNU account required)
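Once Step 4 is done and the download scripts have run, a quick check can confirm that these inputs are in place. This sketch assumes all three live directly in data/ (the folder created in Step 3) with exactly the names listed above; `check_data` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical check, not part of the repository: verify that the inputs
# described above exist in a data directory (default: data).
# Assumes both tables and the genome folder sit directly in that directory.
check_data() {
  local data_dir="${1:-data}"
  local status=0 f
  for f in genomes_lactobacillales_gtdb-r207.tsv lpsn_gss_2023-03-23.csv; do
    # -s: the file exists and is not empty
    if [[ ! -s "$data_dir/$f" ]]; then
      echo "missing or empty file: $data_dir/$f" >&2
      status=1
    fi
  done
  if [[ ! -d "$data_dir/genomes_lactobacillales_gtdb-r207" ]]; then
    echo "missing folder: $data_dir/genomes_lactobacillales_gtdb-r207" >&2
    status=1
  fi
  return "$status"
}
```

Run from the repository root as `check_data`; it lists every missing input rather than stopping at the first one.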
This analysis was based on v4 of LEGEN and is available in the pangenome-toolkit repository. A manuscript describing the results has been submitted to an open access journal.
This analysis was based on v3 of LEGEN and is available in the lacto_genus repository. The results have been published in IJSEM:
This analysis was based on v3 of LEGEN and is available in the lacto_species repository. The results have been published in mSystems: