Installation

There are two ways to install LOTUS. The easiest is with conda:

conda create -p lotus_env
conda activate ./lotus_env
conda install -c gsiekaniec -c conda-forge -c bioconda lotus

Warning: With this installation method the external files are not included and must be retrieved from the LOTUS GitHub repository: the reference genome annotation file Homo_sapiens.GRCh38.108.chr.gff3.gz, the cytoband file hg38_cytoband.tsv and the external databases file Lotus_ExternalBases_202301.xlsx.
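
Before running the modules that need these files, it can help to check that all three are in place. The following is a minimal sketch, assuming the files are stored in a single directory such as the LOTUS_external_files directory used in the command line examples of this README. The function name check_lotus_external_files is illustrative and is not part of LOTUS.

```shell
# Hypothetical helper (not part of LOTUS): count which of the three
# external files named in the warning are absent from a directory.
# The return status is the number of missing files (0 means all present).
check_lotus_external_files() {
    lotus_ext_dir="$1"
    lotus_missing=0
    for f in Homo_sapiens.GRCh38.108.chr.gff3.gz \
             hg38_cytoband.tsv \
             Lotus_ExternalBases_202301.xlsx; do
        if [ ! -f "${lotus_ext_dir}/${f}" ]; then
            echo "missing: ${lotus_ext_dir}/${f}" >&2
            lotus_missing=$((lotus_missing + 1))
        fi
    done
    return "$lotus_missing"
}
```

It could be called as check_lotus_external_files LOTUS_external_files, with any missing file reported on standard error.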

LOTUS information

LOTUS is composed of four modules (filter, summarise, compare and merge) that process VCF files produced by GATK and annotated with Funcotator.

Note: To keep the output files organised, it can be useful to create a results folder with the following tree structure:

results
  |
  +-- filter
  |
  +-- summarise
  |
  +-- compare
  |
  +-- merge
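
The tree above can be created in one step. This is a minimal sketch, assuming a POSIX shell and that the results folder should live in the current directory:

```shell
# Create the results folder with one subfolder per LOTUS module.
# Assumes the current directory is where the results should be stored.
for module in filter summarise compare merge; do
    mkdir -p "results/${module}"
done
```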

Details of the output and input files for every module can be found in the inputs_outputs_description directory.

🧬 Preliminary steps

Starting from raw FASTQ files, several steps are needed before using LOTUS to go from sequences (FASTQ) to annotated variants (VCF).
There are many ways to do this; we recommend following the GATK best practices¹. ⚠️ In particular, LOTUS can currently only handle VCF annotations produced by GATK's Funcotator.

🧬 Filter

Main purpose

Applies simple filters to the VCF file from Funcotator, using several criteria to keep only trustworthy somatic variants.

Inputs/Outputs (get more details)

Parameters

Parameter	Description	Default
--vcf, -v	VCF file produced by Funcotator.
--output, -o	Filtered VCF file. The passed VCF file is also created using this output name.	output.filtered.vcf and output.passed.vcf
--working-method, -w	"InMemory" loads the VCF file into memory as a list (faster but uses more memory); "Direct" reads and modifies the VCF file on the fly (slower but uses less memory).	InMemory
--MBQ	Minimum median base quality of the variant.	20
--DP	Minimum coverage of the variant.	10
--AF	Minimum variant allele fraction in the tumor.	0.1
--AD	Minimum variant allele depth.	5
--POPAF	Maximum population allele frequency of the variant (often from gnomAD).	0.00001
--unpaired	Flag to use if the reads are unpaired (single-end); it sets the paired variable (True by default) to False.	True

Command line examples

Basic

lotus filter -v {PATH_TO_VCF}/sample.funcotated.vcf -o {OUTPUT_PATH}/sample.vcf

Complete

lotus filter -v {PATH_TO_VCF}/sample_unpaired_reads.funcotated.vcf -o {OUTPUT_PATH}/sample.vcf -wm Direct --MBQ 20 --DP 10 --AF 0.1 --AD 5 --POPAF 0.00001 --unpaired
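
When several samples have to be filtered with the same settings, the basic command can be wrapped in a loop. The following is a minimal sketch using only the -v and -o options shown above. The function name lotus_filter_all and the assumption that every input file ends in .funcotated.vcf (as in the examples) are illustrative and not part of LOTUS.

```shell
# Hypothetical wrapper (not part of LOTUS): run the basic "lotus filter"
# command on every *.funcotated.vcf file found in a directory.
lotus_filter_all() {
    in_dir="$1"    # directory holding the Funcotator-annotated VCF files
    out_dir="$2"   # directory receiving the filter outputs
    mkdir -p "$out_dir"
    for vcf in "$in_dir"/*.funcotated.vcf; do
        # When nothing matches, the pattern is left as-is: skip it.
        [ -e "$vcf" ] || continue
        sample=$(basename "$vcf" .funcotated.vcf)
        lotus filter -v "$vcf" -o "${out_dir}/${sample}.vcf"
    done
}
```

For example, lotus_filter_all {PATH_TO_VCF} results/filter would write one pair of output files per sample into results/filter.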

🧬 Summarise

Main purpose

The summarise module provides information on the variants in the VCF files: statistics on the number and nature of the variants that pass or fail the filters, graphs of the mutational profiles and of the indel sizes, and the list of impacted genes with their tumor burden.

Inputs/Outputs (get more details)

Parameters

Parameter	Description	Default
--vcf, -v	VCF file from the filter module containing all variants, whether or not they pass the filters (*.filtered.vcf).	None
--vcf_pass, -vp	VCF file containing only the variants that pass the filters (*.passed.vcf).
--genome, -g	Genome fasta file (allowed extensions: .fasta, .fa, .fan) or pickle file (.pk, .pickle) created after a first run.
--statistics, -s	Output statistics file.	stats.txt
--genes, -genes	Output file containing the genes impacted by variants.	genes.txt
--profile, -p	SVG or PNG file showing the mutation profile of the VCF file.
--indel, -i	SVG or PNG file showing the indel sizes of the VCF file.
--enrichment	Runs the GO enrichment analysis on the gene list using ToppGene and Panther and returns the biological processes (works only if the APIs are available).	False

Command line examples

Basic

lotus summarise -vp {FILTER_OUTPUT_PATH}/sample.passed.vcf -g hg38.fasta

Complete

lotus summarise -vp {FILTER_OUTPUT_PATH}/sample.passed.vcf -v {FILTER_OUTPUT_PATH}/sample.filtered.vcf -s {SUMMARISE_OUTPUT_PATH}/sample.stats.txt -p {SUMMARISE_OUTPUT_PATH}/sample_profile.svg -i {SUMMARISE_OUTPUT_PATH}/sample_indel.svg -g ../hg38.fasta -genes {SUMMARISE_OUTPUT_PATH}/sample.tsv --enrichment

Note: hg38.fasta is the reference genome fasta file.

🧬 Compare

Main purpose

The compare module performs a longitudinal comparison of the VCF files of a sample. It identifies the variants present at one time point (TPn) that disappear at the next (TPn+1), the variants that appear at TPn+1, and the genes impacted by these variants.

Inputs/Outputs (get more details)

Parameters

Parameter	Description	Default
--config, -c	Configuration file containing the paths to the VCF files (filtered.vcf and pass.vcf files from LOTUS filter) and to the indel and SNP tsv files from LOTUS summarise. Example available here.
--gff3, -gff3	GFF3 file. This file can be found here or in the LOTUS repository.
--output, -o	Excel file containing the genes specific to the first or second biopsy.	"genes.xlsx", which gives "{vcf1}_{vcf2}_genes.tsv/.xlsx"
--profile, -p	SVG or PNG file showing the comparison between the mutation profiles of the two VCF files.
--indel, -i	SVG or PNG file showing the indel sizes of the VCF files.
--enrichment	Runs the GO enrichment analysis on the gene list using ToppGene and Panther and returns the biological processes (works only if the APIs are available).	False
--pickle_gff3	Indicates that the given GFF3 file is a pickle file from a previous run.	False
--additional_gene_information	Adds gene information using the LOTUS file containing information from tumor-specific databases (CancerHotSpot, CIViC, COSMIC, DoCM, IntOGen and TSGene 2.0).	False
--profile_proportion_off	Uses separate y-axes for the SNP profile plot. Useful when one of the two profiles is flattened by the scale of the other.	False

Command line examples

Basic

lotus compare -c config_compare_sample.txt -gff3 LOTUS_external_files/Homo_sapiens.GRCh38.108.chr.gff3.gz

Complete

lotus compare -c config_compare_sample.txt -gff3 LOTUS_external_files/Homo_sapiens.GRCh38.108.chr.gff3.pk -i {COMPARE_OUTPUT_PATH}/sample_indel.svg -o {COMPARE_OUTPUT_PATH}/compare.tsv -p {COMPARE_OUTPUT_PATH}/sample_profile.svg --additional_gene_information --enrichment --pickle_gff3 --profile_proportion_off

🧬 Merge

Main purpose

The merge module gives an overview of all the samples: it groups the results and compares all TPn against all TPn+1.

Inputs/Outputs (get more details)

Parameters

Parameter	Description	Default
--config, -c	Configuration file containing the gene lists from all patients whose results are to be merged.
--output, -o	Output file name.	union.xlsx
--cytoband, -cyto	Human cytoband file for the corresponding genome version. This file can be downloaded here or found in the LOTUS GitHub repository (for hg38). If this file is not provided, the chromosomes.svg plot is not created.	None
--chromosome-step, -step	Frame (window size) used for counting the number of genes along the chromosomes.	500000
--chromosomes_output, -co	Output file name for the chromosomes plot.	chromosomes.svg
--upset, -u	Output name for the upset plot. The upset plot is not created if no name is given. ⚠️ It can currently handle at most 15 files because of the combinatorial explosion; this limitation should be lifted in the next version.	None
--weakness_threshold, -w	Mean weakness threshold to take a gene into account.	100
--min_subset_size, -minsb	Minimum size of a subset (number of genes per subset) to be shown in the UpSetPlot. All subsets smaller than this threshold are omitted from the plot.	1
--max_subset_size, -maxsb	Maximum size of a subset (number of genes per subset) to be shown in the UpSetPlot. All subsets larger than this threshold are omitted from the plot.	0
--min_degree, -mind	Minimum degree of a subset (number of patients) to be shown in the UpSetPlot.	1
--max_degree, -maxd	Maximum degree of a subset (number of patients) to be shown in the UpSetPlot.	0
--additional_gene_information	Adds gene information using the LOTUS file containing information from tumor-specific databases (CancerHotSpot, CIViC, COSMIC, DoCM, IntOGen and TSGene 2.0).	False
--enrichment	Runs the GO enrichment analysis on the gene list using ToppGene and Panther and returns the biological processes (works only if the APIs are available).	False

Command line examples

Basic

lotus merge -c config_merge.txt

Complete

lotus merge -c config_merge.txt -o {MERGE_OUTPUT_PATH}/union.xlsx -cyto LOTUS_external_files/hg38_cytoband.tsv -w 99 -co {MERGE_OUTPUT_PATH}/chromosomes.svg -step 500000 --additional_gene_information --enrichment
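
To get a feel for the -step value, the number of frames needed to cover a chromosome can be estimated with simple arithmetic. This is only a sketch: it assumes that the step is expressed in base pairs and that a final, shorter frame is still counted, neither of which is confirmed by this README.

```shell
# Frames of size "step" needed to cover one chromosome, rounding up so
# that the last, shorter frame is counted (rounding up is an assumption).
chr1_length=248956422   # length of chromosome 1 in GRCh38/hg38, in bp
step=500000             # default value of --chromosome-step
echo $(( (chr1_length + step - 1) / step ))   # prints 498
```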

Future plot

LOTUS can currently create an UpsetPlot² showing, for each set of samples, the corresponding set of impacted genes. However, because of the high computational complexity, this plot is only available for a maximum of 15 samples. Support for a larger number is planned.
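
The complexity comes from the number of sample combinations: n samples can form up to 2^n - 1 non-empty combinations, each a potential subset in the plot. A quick calculation for the current limit, given as an illustration rather than a description of how LOTUS counts internally:

```shell
# Upper bound on the number of non-empty sample combinations
# that an upset plot may have to consider for n samples: 2^n - 1.
n_samples=15
echo $(( (1 << n_samples) - 1 ))   # prints 32767
```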
