Evaluate further returned GP/DSP quality metrics and GQ #88

Open
matren395 opened this issue Apr 10, 2024 · 3 comments

Comments

@matren395
Contributor

As they are returned, evaluate other general quality metrics from GP and DSP, such as contamination, coverage, and any other Picard metrics. Will update this ticket as I learn more about what will be delivered!

@matren395 matren395 added the methods for filtering the project board. All issues should be tagged with this. label Apr 10, 2024
@matren395 matren395 self-assigned this Apr 10, 2024
@matren395 matren395 changed the title Evaluate further returned GP/DSP quality metrics Evaluate further returned GP/DSP quality metrics and GQ Apr 11, 2024
@matren395
Contributor Author

Text:

So we created an accidental natural experiment for this in seqr, because one of the GREGoR sites loaded a DRAGEN VCF with all the data we had submitted to GREGoR, including many RGP families. Yesterday I spot-checked a discovery variant Stephanie found in an RGP family and found that while it had a GQ of 99 in our GATK callset, it has a GQ of 33 in the DRAGEN callset: https://seqr.broadinstitute.org/summary_data/variant_lookup?genomeVersion=38&variantId=18-35067754-A-G
Looking at the research you did here, the GQ histograms for the DRAGEN and GATK data look really similar, in that both have a huge spike around 40 and a smaller spike around 100. However, I wonder if there's a difference in the distributions if we break it down a bit differently. Could you run a couple of other comparisons of the GQ distribution with the following adjustments:
- Remove the X chromosome. Stephanie mentioned that in GATK, males on the X chromosome have a ton of GQs between 30 and 40, so that might be skewing our numbers.
- Look at the GQ distribution for non-ref calls only, to see if there's any difference there.
It's possible that it was just bad luck and I happened to spot-check the one example variant where the GQ dropped off like this, but I would really want to confirm this isn't a sign of something more severe.
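
For reference, the two adjustments requested above could be sketched roughly as below. This is a minimal illustration only, not the code used for the actual comparison: it assumes plain-text, tab-delimited VCF lines with per-sample `GT` and `GQ` FORMAT fields, and the names `gq_histogram`, `is_autosome`, and `is_non_ref` as well as the example records are hypothetical.

```python
from collections import Counter


def is_autosome(chrom):
    """True for chromosomes 1-22, with or without a 'chr' prefix."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return name.isdigit() and 1 <= int(name) <= 22


def is_non_ref(gt):
    """True if the genotype carries at least one non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def gq_histogram(vcf_lines, autosomes_only=True, non_ref_only=True):
    """Count GQ values across all samples, optionally restricted to
    autosomes and to non-ref genotype calls."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, fmt_keys = fields[0], fields[8].split(":")
        if autosomes_only and not is_autosome(chrom):
            continue
        if "GT" not in fmt_keys or "GQ" not in fmt_keys:
            continue
        gt_i, gq_i = fmt_keys.index("GT"), fmt_keys.index("GQ")
        for sample in fields[9:]:
            values = sample.split(":")
            # Trailing FORMAT fields may be dropped; GQ may be missing (".")
            if len(values) <= max(gt_i, gq_i) or values[gq_i] == ".":
                continue
            if non_ref_only and not is_non_ref(values[gt_i]):
                continue
            counts[int(values[gq_i])] += 1
    return counts


# Made-up records for illustration only (not real callset data).
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr2\t1000\t.\tA\tG\t.\tPASS\t.\tGT:GQ\t0/1:33\t0/0:40",
    "chrX\t2000\t.\tC\tT\t.\tPASS\t.\tGT:GQ\t1:35\t0/1:99",
    "chr1\t3000\t.\tG\tA\t.\tPASS\t.\tGT:DP:GQ\t1|1:30:99\t./.:0:.",
]

autosomal_non_ref = gq_histogram(example_lines)
all_calls = gq_histogram(example_lines, autosomes_only=False, non_ref_only=False)
```

Running the same function over the GATK and DRAGEN callsets and comparing the two resulting histograms would give the autosomal non-ref comparison; for callsets of realistic size, a dedicated VCF library or Hail would likely be a better fit than line-by-line parsing.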

@matren395
Contributor Author

Fun news: it is good that we investigated this! As it turns out, for autosomal non-ref calls, GQ is much lower in the DRAGEN-called data than in our GATK-called example. See the attached PDFs:
Seqr GATK and DRAGEN GQ Comparison Autosomal NonRef GQ 20240411.pdf
Seqr GREGoR Sanity Checks - Autosomal NonRef GQ 20240411.pdf
Seqr_GATK_DRAGEN_GQ_graphs.pdf
Seqr GATK and DRAGEN GQ Comparison.pdf
Seqr DRAGEN Sanity Checks - 20240411.pdf
