- Replace RNU2-4 gene coordinates with RNU4-2 coordinates in wgs intersect bed.
- Sort processes in versions yaml and images in Scout yaml
- Update Genmod version allowing control of penalty
- Assign non-scored components for single GIAB
- Fix `compound_finder.pl` so that it converts floats to int (e.g. 5.0 -> 5)
- Remap all start-from-BAM channels that flip the ID <-> group
- Copy the BAM file into the work directory when starting from BAM, rather than accessing it directly at its original location
- Sort the order of VCFs passed to `gvcf_combine` for stable SNV calls
- Ensure that input VCFs are always supplied in the same alphanumeric order to `svdb_merge` when running trio analysis (see #172)
- Add a process to get contamination values from verifybamid2 software.
- Update configs/nextflow.hopper.config with a specific verifybamid2 container.
- Update configs/nextflow.hopper.config with specific SVDPrefix files for panel and wgs.
- Added `--format vcf` to `vep_sv` to fix cases where the VCF file contains no variants.
- Add workaround to enable loqusdb export for runs where SV calling is disabled
- Rename `myeloid_const` loqusdb to `loqusdb_myeloid_const`
- Disable artefact scoring in `myeloid_const` rank models
- Fix mito QC stats JSON conversion for samples started from old bams with updated sample ids.
- Update config for bed intersect
- Some fixes to the logging of the bed intersect script
- Use reduced gene_panel JSON to avoid adding dead/archived panels to new scout cases
- Add lennart-side script/worker CRON job to generate new gene panel JSON
- Extend the update_bed.pl script to handle multiple input files
- Rewrite to Python and add tests
- Reverted removal of code in gene panel matching, which caused missing gene panels for onco samples
- Fixed trio eklipse image being wrongly added to YAML
- Removed outdated regex matches for gene panels, which would remove important gene panels
- General clean-up of create_yml.pl
- Fix bug where the wrong tuple value was unpacked as group and sample ID in `bqsr` when starting a run from BAM
- Fixed faulty if-condition for AnnotSV, which resulted in an empty AnnotSV TSV every time
- Use -K flag in bwa-mem for consistent results
- Re-optimized the wgs and onco profiles, with larger memory allocations
- Added a reanalyze flag for Bjorn to hook into
- Add updated and more communicative deploy script
- Remove or rename other deploy scripts
- Update MODY-cf configs to use the same as onco
- Clean up in MODY-cf config post merge
- Give the Sentieon container path by a parameter in the config file
- Update the Sentieon container to 202308 version
- Split out the `sentieon_qc` post-processing into its own process, `sentieon_qc_postprocess`
- Update the Perl script used in `sentieon_qc_postprocess` to take input parameters as explicit arguments
- Update intersect file to latest used version of ClinVar (20231230)
- Update fastp to 0.23.4 and move to own container to fix reproducibility issue (#143)
- Update CADD to v1.7
- Increase `inher_models` processing time
- Updated VEP from 103.0 to 111.0
- Updated VEP fasta from 98.0 to 111.0
- Updated VEP cache from 103.0 to 111.0
- Moved VEP parameters from processes to config
- Disabled VEP `--everything` to disable VEP annotation with gnomAD
- Removed deprecated `--af_esp` from `--everything`
- Tentative update of scout ranking.
- cleanVCF.py now removes records missing CSQ-field.
- Add `SVTYPE` VEP 111 bug workaround in `vep_sv` process (see Ensembl/ensembl-vep#1631)
- Add VEP 105-111 annotations to all rank models in use
- Fix onco model filename version (v5 rank model was misnamed as v4 in production)
- Re-enable D4 file generation (for Chanjo2)
- Disable Chanjo2
- Add mody-cf profile
- Run D4 coverage for full file
- Further simplifications of the checklist template
- Trim down size of checklist template, and add check for entering used test samples
- Add d4 file path directly to Scout YAML
- Tag Mitochondrial variants with GQ, loqusdb enabling
- Add CRON file to load Chanjo2
- added CSV file to onComplete function to accommodate CCCP
- small name change for myeloid constitutional to match clarity
- removed custom_images header for samples without images as pydantic would crash in scout load
- Add d4 coverage calculations to the workflow
- Fix genmod caller-penalty bug for GATK GQC vals (#170)
- Remove bgzip and gunzip from versions
- Some cleanup in version documentation and code
- Use new docs as main entry point in repo
- Start removing old docs
- Update software responsible list in docs
- Added changelog reminder to github workflows
- Adding a new variant catalogue for expansionhunter/stranger/reviewer
- Add `documentation` to change type category in PR template.
- Changed melt configs, added flags: exome, removed flags: cov (was being used improperly)
- Added priors to `mei_list`, and moved `mei_list` to a new location in config
- Changes have been verified; report can be found internally
- Changed path to normal-pool-refs for gens. Uses masked hg38 references
- Add first iteration of updated documentation
- Move resource files out of `main.nf` to `nextflow.config`
- Move the selected fields for PHYLOP and PHASTCONS in vep to be specified in the process, similarly to the other plugins/custom fields
- Clean out unused files in repo root directory
- Add Github PR template/test documentation
- Update the cron log directory to use the `params.crondir` folder as base
- Add version outputs from all processes that use external software.
- Add stubs to processes to allow performing stub runs.
- Hotfix: increase MELT sensitivity by increasing the number of reads MELT is allowed to use in RAM.
- MELT calls are no longer filtered on location using regex on the names INTRONIC/null/PROMOTER; instead, an intersect against a BED file was added. This will show splice-site variants
- Add REVEL (Rare Exome Variant Ensemble Learner) scores to VEP annotations (VEP `REVEL_rankscore` and `REVEL_score`)
- Two processes for computing mitochondrial seq QC data from mt bam files and saving to JSON:
- Script `bin/merge_json_files.py` to merge one or more JSON files into one JSON. Used to generate the final `{id}.QC` from the JSON output of the processes `sentieon_qc` and `sentieon_mitochondrial_qc`.
- Script `bin/mito_tsv_to_json.py` to extract and convert mtQC data from `sentieon_mitochondrial_qc` process output to JSON
- Process `sentieon_qc` outputs to an intermediate `{id}_qc.json` file instead of the final `{id}.QC`
- added two more genes to expansionhunter variant catalogue.
- don't print Mitochondrion; the mitochondrion is handled separately in the pipeline, and printing it caused loqusdb errors
- fixed filepaths for access-dir for myeloid profile in nextflow.config
- fixed assay name for create_yml.pl so yaml-file gets correct institute owner for myeloid samples
- added a script to update wgs-bed file with current clinvar intron + intergenic regions. Also produces a log file of what's been added and removed
- added support for dry-running a VCF to test scoring
- merged cnv2bed branch, small updates to color scheme for Alamut import files for CNVs
- increase time limit of create_pedigree
- added retries to vcfanno (suspected file-system caching bug)
- removed deep caching from freebayes (onco only) due to a suspected bug
- alt affect type was lost for SNV<->SV compounds, which would get mixed up
  - added type and joined upon the value
- SNV-only compounds for alt affect duos were wrongly renamed; added a sed command
- oncov2-0 and wgs profiles now both use loqusdb dumps for SV artefact annotations
- create pedigree has completely changed; it is now a separate Perl script
  - creates one pedigree per affection status of the parents, i.e. in a trio, three PED files with mother affected/father affected/no parent affected (the default, loaded into Scout)
  - will calculate all states per genmod score, per VCF
  - optionally load these cases into Scout, located in a subcategory of the YAML output folder
- oncov2-0 now implemented, uses the old onco profile and oncov1-0 uses oncov1-0 profile (to be discontinued)
- No longer use delly SV-caller, instead use GATK + CNVkit + manta
- new version of MELT that catches much more important variation
- indicator of onco version in rank model name of YAML file, visible on the Scout case page
- Added regex to support wgs-hg38-XXXX. suffix to run the wgs profile with different flags, i.e. --noupload true, no CDM/loqusdb upload for reruns
- Fixed a serious bug in prescore_sv.pl, which would randomly choose the proband ID for duos
- Added SVs to loqusdb load. Using scored snv-vcf for correct MT->M notation
- genmod patch not taking effect in singularity, switched to a smaller genmod container with patch
- updated processes in main.nf for above container
- Changes to custom_images in yaml, case/str
- Added support for reviewer, activate when scout is updated
- Image sizes for mitochondrial plots in yaml
- resource management in processes
- gatk-ref moved to cached directory
- Added support for loading images into scout, each process generating a plot can now be added as a path to scout-yaml
- Some support for Grace (new cluster)
- CDM load-file only created for one individual of family, fixed (join function corrected)
- increased memory allocation for onco_depth
- removed shards from dedup, caused malformed output for dedup_metrics. Works as intended still
- params.assay for onco has historical name, depth_onco process used wrong value
- using shard specification for dedup caused a faulty dedup_metrics file
- Removed all distribution of sentieon except first alignment step option
- Removed bam.toRealPath() from all processes. BAI files are now given along with BAM files if alignment is to be skipped. See below
- VCF start removed temporarily
- BAM start now works better. Add bam + bai headers to the CSV with the corresponding files
- BATCH start now available for onco samples. Thorough channel joining and removal of distributed Sentieon made it possible (does not work for the wgs profile!)
- fixed grep for multi-allelic FP loci
- added hash element rstreshold (rank score threshold) to create_yml.pl that, if defined, overrides the default of -1
- added panel depth as alternative to chanjo for panel-data
- correctly assigned the theano flag for GATK coverage and ploidy, which would in rare cases cause crashes
- GENS middleman command added to `generate_gens_data`. Needed for loading data into GENS through cron and middleman
- REViewer now loops through a Perl shell script instead of bash. Errors from low-coverage loci no longer crash all other SVG image generation
- fixed a typo which named all svgs as 7156, a validation and verification sample
- recurring multi-allelic variant @MT:955 keeps vcfmultibreak in a never-ending loop
  - grep -v ^MT 955
- ignore errors of REViewer
- loqusdb faulty input caused wrong imports to loqusdb, now fixed
- GAV replaced with REViewer
- Stranger 0.8 with updated variant catalogue
- added `source activate gatk` to all GATK processes that use 4.1.9, and `set +eu` to avoid unbound variable errors
- increased memory allocation for several gatk and mito processes
- sharded merged BAM and non-sharded BAM now produce output for dedup too. Locuscollector no longer redirects the BAM. This saves up to 70% of temporary files!
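
The MT:955 workaround listed above removes one recurring record before it reaches `vcfmultibreak`. A minimal Python sketch of such a filter, assuming an uncompressed VCF read line by line; the helper name `drop_record` is hypothetical, and the pipeline itself uses a grep-based filter rather than this code:

```python
from typing import Iterable, Iterator


def drop_record(lines: Iterable[str], chrom: str = "MT", pos: int = 955) -> Iterator[str]:
    """Yield VCF lines unchanged, skipping data records located at chrom:pos.

    Hypothetical helper for illustration; header lines (starting with '#')
    always pass through.
    """
    for line in lines:
        if not line.startswith("#"):
            # Only the first two columns (CHROM, POS) are needed.
            fields = line.split("\t", 2)
            if len(fields) >= 2 and fields[0] == chrom and fields[1] == str(pos):
                continue  # drop the problematic multi-allelic record
        yield line
```

Used as a stream filter, e.g. `sys.stdout.writelines(drop_record(sys.stdin))`. Comparing the CHROM and POS columns exactly avoids accidentally dropping records such as MT:9550 that merely contain the digits 955.
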
new functions
- added mito-calling
  - mutect2
  - hmtnote
  - haplogrep
  - eklipse
- modifications to filter_indels (new VEP fields)
- modifications to `modify_vcf_scout.pl`, ignore maxentscan for M
- added SMNCopyNumberCalling
- New VEP container and version (103)
- gatk cnv calling
- adjustments to all affected scripts
- new container specifically for madeline2
- main container now includes all software except madeline2 and VEP
- new conda environments
- updates to Expansionhunter
- updates to Stranger
- updated GATK version
- updated Sentieon
- added: haplogrep, hmtnote, eklipse, melt, graphalignmentviewer, SMNcopynumbercaller, CNVkit and imagemagick
- group and sample IDs of outputs re-thought
- contig synonyms for VEP
- BAM start working better
- added path to symlinked latest weekly definition.
- added PF mismatch and error rates to qc-json
- rankmodels now separate VEP-consequence from AnnotSVrank and dbvar in Consequence and Clinical_significance respectively
- rescore.nf had wrongly named variable in output for bamfiles
- create_yml.pl now receives gene_panel content from hopper-json and no longer requires scout-vm connectivity
- clinicalwes now has its own loqusdb instead of piggybacking on onco
- timelimit increases, scratch and stage in/out for processes
- create_yml.pl added ahus analysis for wgs_hg38 assay. Stinking mess initiated, please correct
- create_yml.pl added hemato analysis for clinicalwesv1-0 assay. Correct institute for myeloid normals
- create_yml.pl now has a hash with all definable scout import-fields per assay, allowing easier additions and modifications to/of assays.
- Fix a bug that generated corrupt Gens json files...
- Generate a json with data for the Gens overview plot, to allow quicker loading in Gens.
- added rescoring function through rescore.nf
- fixed naming of expansionhunter vcfs
- new rankmodels for wgs profile. 5.1 (loqusdb cutoffs and VEP-csq scoring)
- added specific delly filtering script
  - now correctly filters breakpoints outside panel
- `filter_panel_cnv.pl` now only annotates for Scout: delly precise/imprecise annotation
- new artifact database for SV-calling for WGS and oncogenetics
- added container and git-hash to logging
- routing through freebayes caused Nextflow not to receive completion status; wgs now runs through freebayes with a touch command only
- Optional input files, fastq/bam/vcf
- hg38 alignment and annotations
- profiles, wgs/onco/exome
- several new sv-variant callers, melt, cnvkit, delly
- POD-tool for duplication events in trios
- Freebayes calling for difficult homopolymers in onco samples
- Yaml-creation for scout import overhauled
- new container with needed software
- new rank-models for onco (both snv and sv) and wgs (sv-rank not live yet)
- Various small improvements of code
- Optimization of cpu/memory/time for each process
- Numerous small improvements of several scripts
- Last hg19 version
- Minor file-paths issues resolved
- Fix incorrect filtering of UPD calls in genomeplotter
- Only plot UPDs in overview plot if > 100 informative sites
- Don't run UPD process on duos
- Use the correct scout server for create_yaml and loqusdb annotation
- Removed some hardcoded assumptions in create_yml.pl
- Always create gvcfs, and publish
- Publish chanjo coverage file
- Fix ID mixup in gatkcov process
- Added retry strategies to cdm, locuscollector and bqsr processes
- create_yml: Only add each panel once and sort panels alphabetically
- Increase allocated memory for Clinvar SnpSift process due to occasional crashes
- Further fixes to output folders
- Retry create_yaml and loqus process up to 5 times
- Allow for non-distributed BWA (new default). For urgent cases use --shardbwa
- Add diagnosis field to CDM
- Change back to adding intersected vcf for loqusdb instead of full genomic vcf
- Change name of 1000G INFO field to make it show up in Scout
- Removed some hardcoded paths to scripts and added them to the bin/ folder
- Add 1000G in a special field for positions missing gnomAD (typically non-exonic) and add it to rankmodel
- Properly add selected gene panels to YAML
- Add STR-vcf to YAML
- Rename sample in STR-vcf to agree with sample name instead of bam filename
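
Renaming the sample in the STR VCF amounts to rewriting the sample column of the `#CHROM` header line. A minimal Python sketch, assuming an uncompressed single-sample VCF; the helper `rename_sample` is hypothetical and not the pipeline's own code (for bgzipped files, `bcftools reheader -s` does the same job):

```python
from typing import Iterable, Iterator


def rename_sample(lines: Iterable[str], new_name: str) -> Iterator[str]:
    """Replace the sample name in the #CHROM header of a single-sample VCF.

    Hypothetical helper for illustration; all other lines pass through unchanged.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            cols = line.rstrip("\n").split("\t")
            # Columns 0-8 are fixed (CHROM .. FORMAT); column 9 is the sample.
            if len(cols) != 10:
                raise ValueError("expected exactly one sample column")
            cols[9] = new_name
            line = "\t".join(cols) + "\n"
        yield line
```

Failing loudly on multi-sample headers keeps the sketch from silently renaming the wrong column.
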