Releases: rbutleriii/Clinotator
New ClinVar XML
To accommodate the new format of the ClinVar XML, the code has been extensively reworked. While most things could be relocated in this new format, there are some lasting changes.
Fixes:
- new xml table
- the output in vcf mode now has no whitespace in the INFO column to be compatible with older VCF format and GATK.
- ncbi query times and reconnections have been improved
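
The INFO-column fix above can be illustrated with a minimal sketch. It assumes that each run of whitespace in an annotation value is replaced by an underscore; the replacement character Clinotator actually uses is not stated in these notes. The function names and the sample CVVT value are hypothetical.

```python
import re


def sanitize_info_value(value, replacement="_"):
    """Collapse each run of whitespace in an INFO value into one character.

    The underscore default is an assumption made for illustration only.
    """
    return re.sub(r"\s+", replacement, value.strip())


def build_info_field(pairs):
    """Join (key, value) pairs into a whitespace-free VCF INFO string."""
    return ";".join(
        f"{key}={sanitize_info_value(str(val))}" for key, val in pairs
    )


if __name__ == "__main__":
    # Illustrative values; real CVVT/CVAL content comes from ClinVar.
    print(build_info_field([("CVVT", "single nucleotide variant"), ("CVAL", "A")]))
```

Sanitizing each value before joining keeps the ";" and "=" delimiters intact, which is what older VCF parsers and GATK rely on.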
Updates:
- --long-log now has more thorough reporting of the classification of assertions, for better debugging.
- CVDS now returns disease IDs to LinkOut. This is due to text descriptions being removed from the new XML format entirely. However, this is easier to parse as an output.
- Variant types have been implemented in the new xml beyond 'Simple' and 'Haplotype'. These show up in CVVT.
- Based on the current variant assertions in ClinVar, the assertion weights have been recalibrated.
The Phantom Variant
Bugfixes:
- Move the sleep timer to the end of the batch_local function to make sure it pauses after the first query if that is the only query (<1000 rsids).
- If any of a few Haplotype Variation IDs are included in the vcf annotation mode, it can create an issue where the other locus in the pair will end up in the annotation table (expected). But this also creates an rsid/alt pair of .|A which then causes all variants with no rsID and an A alt allele to get this clinotator annotation (see VID: 560517, which is rs111033524|A and .|A). Fixed the vcf annotation to prevent rsIDs with . from being annotated, but this then means the Haplotype variant will not be annotated at the .|A locus (which isn't really possible anyways). Keeping track of Haplotype variants requires special attention, and both loci can be retrieved from the vcfmatch field of the tsv table.
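
The phantom-variant problem described above comes down to a lookup key whose rsID is ".". The sketch below shows one way to guard against it. The tuple layout, function names, and annotation strings are hypothetical stand-ins, not Clinotator's actual data structures.

```python
def build_annotation_lookup(rows):
    """Map (rsID, alt) pairs to annotations, skipping rows with no rsID.

    `rows` is an iterable of (rsid, alt, annotation) tuples. This layout is
    hypothetical and only stands in for the real annotation table.
    """
    lookup = {}
    for rsid, alt, annotation in rows:
        if rsid == ".":
            # A "." key would match every unnamed variant with the same alt.
            continue
        lookup[(rsid, alt)] = annotation
    return lookup


def annotate(vcf_records, lookup):
    """Return the annotation (or None) for each (rsid, alt) VCF record."""
    return [
        lookup.get((rsid, alt)) if rsid != "." else None
        for rsid, alt in vcf_records
    ]
```

With the rows for VID 560517 (rs111033524|A and .|A), only the rs111033524|A key survives, so an unrelated record with no rsID and an A alt allele no longer picks up the annotation.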
Restoring query functionality
Updates:
- The default value for the max header size has been increased from 200 to 500 lines.
- Multiple logging messages have been added or modified to better log the run.
Bugfixes:
- batch_local() has been given a sleep(0.37) to accommodate the stricter query limits from Entrez.
- Empty returns from Entrez are now properly handled without raising an IndexError (#6).
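
The two bugfixes above can be sketched together as a batching loop. This is not the code of batch_local() itself: the `fetch` callable is a hypothetical stand-in for the Entrez request, and the batch size of 1000 is an assumption. Only the 0.37-second pause comes from the notes above.

```python
import time


def batch_query(ids, fetch, batch_size=1000, pause=0.37, sleep=time.sleep):
    """Query `ids` in batches, pausing after every batch.

    `fetch` is a hypothetical callable that takes a list of IDs and returns a
    list of results (possibly empty). It stands in for the real Entrez call.
    """
    results = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        returned = fetch(batch)
        if returned:
            results.extend(returned)
        # An empty return is skipped rather than indexed, so no IndexError.
        # Sleeping at the end of the loop body means a single batch also pauses.
        sleep(pause)
    return results
```

Passing `sleep` as a parameter is a design choice for this sketch only: it lets the pause be checked without actually waiting.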
Compatibility updates
Bugfixes:
- Readme updates for dependencies (must use biopython 1.73 now).
- Pandas updates to replace components deprecated since 0.22.0 (read_table).
- Fixed error in py3.7 due to PEP 479.
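
For context on the PEP 479 fix: from Python 3.7 onward, a StopIteration that escapes inside a generator is converted into a RuntimeError. The notes above do not say which code triggered the error, so the two generators below are hypothetical examples of the general pattern and its usual repair.

```python
def take_until_blank_old_style(lines):
    """Pre-PEP 479 pattern: relies on StopIteration leaking out of next()."""
    it = iter(lines)
    while True:
        line = next(it)  # on exhaustion this becomes RuntimeError in 3.7+
        if not line:
            return
        yield line


def take_until_blank(lines):
    """PEP 479-safe version: the generator ends with an explicit return."""
    for line in lines:
        if not line:
            return
        yield line
```

The safe version lets the for loop absorb exhaustion of the input, so the generator only ever finishes through a return.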
Post Reviewer Comments: Part III
Updates:
- CVDS has been modified to provide the assertion clinical significance associated with each condition, in the form "Disease A(P);Disease B(US);Condition zero(P)", in preparation for downstream parsing of phenotype significance. It also now excludes conditions not associated with a valid clinical assertion.
- The prediction intervals have been recalibrated with all of the recent changes. Clinotator was rerun on the same set of variants from 2/24/2018. CTPS boundaries have changed slightly: B<-26.7<=BLB<-8.4<=LB<-4.2<=US=>4.2>LP=>8.4>PLP=>14.7>P
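
As an illustration of the downstream parsing mentioned above, here is a minimal sketch that splits a CVDS string of the form given in this release into condition and significance pairs. The function name is hypothetical, and the sketch assumes condition names never contain a semicolon.

```python
import re

# One entry looks like "Disease A(P)": a name, then a parenthesised code.
_ENTRY = re.compile(r"^(?P<name>.*)\((?P<sig>[^()]+)\)$")


def parse_cvds(field):
    """Split a CVDS string into (condition, significance) tuples."""
    pairs = []
    for entry in field.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unrecognised CVDS entry: {entry!r}")
        pairs.append((match.group("name").strip(), match.group("sig")))
    return pairs


if __name__ == "__main__":
    print(parse_cvds("Disease A(P);Disease B(US);Condition zero(P)"))
```

Anchoring the significance code to the final pair of parentheses means a condition name that itself contains parentheses is still kept whole.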
Post Reviewer Comments: Part II
Updates:
- The ClinVar Alternate Allele, CVAL (formerly CVMA) now has a description in the readme.
- Consistency updates, with all instances of 'RSID' now replaced with 'rsID'.
- A Clinotator run date has been added to the info reported in the terminal or log file.
- Vcf files are given an additional metadata line: "##annotation=CLINOTATORvX.X.X_run_YYYY-MM-DD".
- Additional readme updates.
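
The metadata line described above can be assembled as in the following sketch. The function name and arguments are hypothetical; only the line format is taken from these notes.

```python
from datetime import date


def annotation_header(version, run_date=None):
    """Build the '##annotation=' metadata line for a version and run date."""
    run_date = run_date or date.today()
    # isoformat() yields YYYY-MM-DD, matching the documented pattern.
    return f"##annotation=CLINOTATORv{version}_run_{run_date.isoformat()}"
```

Called without a date, the helper stamps the line with the current day, which matches the idea of recording when the annotation run took place.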
Post Reviewer Comments
Updates:
- The CTRR metric has changed: there is a new value "." for insufficient evidence. A "0" value is now solely for Consistent Identity.
Bugfixes:
- There is currently an issue with installing pandas >0.22 under Python 3.4.x. If encountered, see the installation notes or consult the pandas creators.
Zenodo linked
Updates:
- added Zenodo DOI for publication
Bugfix:
- corrected the version number in global_vars.py
Manuscript Public Release
v1.0.0 Update README.md
Batch rsID: Redux, now with more ePostage
Bugfix:
- The rsID -> VID lookup was overhauled again, as 0.3.0 resulted in frequent HTTP errors and was kind of slow. It is now much smoother and faster.
Updates:
- Modifications to error logging for consistency.