feat: Handling complex variants in snv_indel #521
Merged
This PR:
New version of module snv_indels.
Added: Python script for handling complex variants that are reported across several VCF records.
Some variant callers (e.g. VarDict) combine variants that lie within a short physical distance of each other and report them as a single complex variant. vt_decompose separates these complex variants into individual records. However, after decomposition the same allele at the same position may be reported in several records: one originating from the single variant and one or more originating from complex variants. These records will also have different allele frequencies, because the frequency in a record derived from a complex variant may reflect the allele only in combination with a specific allele at another position within that complex variant.
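To make the duplication concrete, the sketch below groups records by (chrom, pos, ref, alt) and keeps only the groups that contain more than one record. It is a minimal illustration, not the script in this PR: the tuple layout `(chrom, pos, ref, alt, af)`, the function name `find_duplicate_alleles`, and the allele frequencies are hypothetical toy values chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical minimal record: (chrom, pos, ref, alt, af).
# The real script works on VCF records; this layout is an assumption.
Record = Tuple[str, int, str, str, float]
Key = Tuple[str, int, str, str]


def find_duplicate_alleles(records: List[Record]) -> Dict[Key, List[Record]]:
    """Group records by (chrom, pos, ref, alt); return only groups with >1 record."""
    groups: Dict[Key, List[Record]] = defaultdict(list)
    for rec in records:
        chrom, pos, ref, alt, _af = rec
        groups[(chrom, pos, ref, alt)].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Toy data: the same C>T allele at chr1:1000 appears once as a single variant
# and once more after decomposition of a complex variant (with a lower AF).
records = [
    ("chr1", 1000, "C", "T", 0.30),
    ("chr1", 1000, "C", "T", 0.12),
    ("chr1", 1003, "G", "A", 0.12),
]
duplicates = find_duplicate_alleles(records)
print(duplicates)
```

Only chr1:1000 C>T is reported, since the G>A allele at chr1:1003 occurs in a single record.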
This Python script can merge several records describing the same allele at the same position into one record. How the allele frequency, and the metrics derived from it, are reported depends on the method chosen by the user. "skip" is the default method: no records are altered, and all records for a complex variant are returned in the output VCF. "max" keeps the record with the highest allele frequency and discards any other records with the same allele and position from the output VCF. "sum" sums the allele frequencies of all records with the same allele and position.
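The three methods can be sketched as below. This is a hedged illustration of the described behaviour, not the script itself: the record tuple `(chrom, pos, ref, alt, af)` and the function name `merge_records` are hypothetical, and the sketch only handles the allele frequency. The real script also adjusts metrics derived from the AF, which is not modelled here.

```python
from typing import Dict, List, Tuple

# Hypothetical minimal record: (chrom, pos, ref, alt, af).
Record = Tuple[str, int, str, str, float]
Key = Tuple[str, int, str, str]


def merge_records(records: List[Record], method: str = "skip") -> List[Record]:
    """Merge records sharing (chrom, pos, ref, alt) according to `method`.

    skip: return all records unchanged (default).
    max:  keep only the record with the highest AF per allele.
    sum:  keep one record per allele whose AF is the sum of all AFs.
    """
    if method == "skip":
        return list(records)
    if method not in ("max", "sum"):
        raise ValueError(f"unknown method: {method!r}")

    # dicts keep insertion order, so output follows first occurrence of each allele
    merged: Dict[Key, Record] = {}
    for rec in records:
        key = rec[:4]
        if key not in merged:
            merged[key] = rec
        elif method == "max":
            if rec[4] > merged[key][4]:
                merged[key] = rec
        else:  # "sum": accumulate AF onto the existing entry
            merged[key] = key + (merged[key][4] + rec[4],)
    return list(merged.values())


# Toy data: two records for chr1:1000 C>T, one for chr1:1003 G>A.
records = [
    ("chr1", 1000, "C", "T", 0.30),
    ("chr1", 1000, "C", "T", 0.12),
    ("chr1", 1003, "G", "A", 0.12),
]
for method in ("skip", "max", "sum"):
    print(method, merge_records(records, method))
```

With "skip" all three records are kept. With "max" and "sum" only one record per allele remains. Note that summing can in principle push the AF of a merged record above what any single record reports, which is why the choice of method is left to the user.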