region extraction pipeline failing in allele flip handling #132
Comments
@dianacornejo I have a question about the example. Does it mean that there are duplicated SNPs (one of them flipped) in the regenie sumstats?
@changebio yes, in the sumstats. I think what's happening is that one allele comes from the exome data and the other one from the imputed data. In this particular case both the exome and imputed data have the exact same sample size, unlike before, when the imputed data had more samples.
@dianacornejo I figured out why you got the error. There are duplicated SNPs in the genotype data with a0 and a1 swapped, which my function does not account for. I am fixing it.
@gaow How should we deal with SNPs that become the "same" after swapping a0 and a1 but have different beta in the sumstats? That means they are not just a0/a1 swaps; they also have different genotypes.
@changebio I'm confused about the context in which this is being discussed. The data you show above look perfectly normal to me if they come from two different association tests. In @dianacornejo's original example, the two variants in question are claimed to have the same sample size (and possibly almost the same samples), which I would say is the case because the summary stats differ only a little. In @changebio's last ticket, these variants have different summary stats because they are association tests on different data. That's not a duplicate; to me it's a merger artifact. I thought that when merging the imputed and sequence data we keep whichever has the larger sample size and simply drop the other one, if they are the same variant (after flipping as necessary)?
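To make the merge rule described above concrete, here is a minimal sketch, not the pipeline's actual code. The record fields (`chrom`, `pos`, `a0`, `a1`, `n`, `beta`, `source`) and the function names are hypothetical, and the toy values are made up for illustration. It treats two records as the same variant when they share chromosome, position, and the unordered allele pair, and it keeps the record with the larger sample size. Strand flips (complemented alleles) are not handled, and ties keep the first record seen, which is an arbitrary choice.

```python
def variant_key(rec):
    """Allele-order-independent key: same chrom/pos and same unordered allele pair."""
    return (rec["chrom"], rec["pos"], frozenset((rec["a0"], rec["a1"])))


def keep_larger_n(records):
    """For each variant (up to an a0/a1 swap), keep the record with the larger n."""
    best = {}
    for rec in records:
        key = variant_key(rec)
        # Strict '>' means that on a tie the first record seen is kept (assumption).
        if key not in best or rec["n"] > best[key]["n"]:
            best[key] = rec
    return list(best.values())


# Toy records (hypothetical values, for illustration only).
toy_records = [
    {"chrom": "22", "pos": 100, "a0": "A", "a1": "G", "n": 1000, "beta": 0.10, "source": "imputed"},
    {"chrom": "22", "pos": 100, "a0": "G", "a1": "A", "n": 800, "beta": -0.12, "source": "exome"},
    {"chrom": "22", "pos": 200, "a0": "C", "a1": "T", "n": 800, "beta": 0.05, "source": "exome"},
]
merged = keep_larger_n(toy_records)
```

In the equal-sample-size case reported in this issue, a tie-breaking rule would have to be chosen explicitly; the sketch simply keeps whichever record comes first.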
@dianacornejo It turns out that what he found and what you reported are both legitimate issues, and they are separate problems. @changebio told me offline that he ran into the second issue while investigating your initial issue. Moving forward, @changebio will fix the issue you reported, but we need your help with the issue he observed above. I'm going to open a ticket in the UKB repo since the problem is dataset-specific. @changebio will also formalize this into QC on sumstats to help catch the issue before merging summary stats.
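One way such a sumstats QC step could look is sketched below. This is only an illustration under assumptions: the record fields, the function names, the toy values, and the tolerance `tol` are all hypothetical and not taken from the pipeline. It groups records that match up to an a0/a1 swap, negates beta for the swapped record so both effects refer to the same allele order, and flags pairs whose aligned effects still disagree by more than `tol`.

```python
from collections import defaultdict


def allele_pair_key(rec):
    """Same chrom/pos and same unordered allele pair, regardless of a0/a1 order."""
    return (rec["chrom"], rec["pos"], frozenset((rec["a0"], rec["a1"])))


def flag_flipped_duplicates(records, tol=0.05):
    """Report duplicates up to an allele swap, with betas aligned to the first record.

    tol is an arbitrary illustrative threshold, not a recommended value.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[allele_pair_key(rec)].append(rec)

    flagged = []
    for recs in groups.values():
        if len(recs) < 2:
            continue
        ref = recs[0]
        for other in recs[1:]:
            same_order = (other["a0"], other["a1"]) == (ref["a0"], ref["a1"])
            # Swapping a0 and a1 flips the sign of the effect estimate.
            aligned = other["beta"] if same_order else -other["beta"]
            flagged.append({
                "ref": ref,
                "other": other,
                "aligned_beta": aligned,
                "discordant": abs(aligned - ref["beta"]) > tol,
            })
    return flagged


# Toy records (hypothetical values, for illustration only).
qc_records = [
    {"chrom": "22", "pos": 100, "a0": "A", "a1": "G", "beta": 0.10},
    {"chrom": "22", "pos": 100, "a0": "G", "a1": "A", "beta": -0.12},
    {"chrom": "22", "pos": 300, "a0": "C", "a1": "T", "beta": 0.30},
    {"chrom": "22", "pos": 300, "a0": "T", "a1": "C", "beta": 0.25},
]
qc_flags = flag_flipped_duplicates(qc_records)
```

In this toy input the first pair is a plain flip with nearly the same effect, while the second pair still disagrees after alignment, which is the kind of case raised in the second issue above.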
@gaow yes, I can see that the second issue is different from the first one I reported, because what he shows is only in the exome data, not in the merged data where I reported the first problem. Sure, if there is anything I can do, just let me know.
Hi @changebio I'm going to bring the discussion here so that I don't forget about it.
I ran the region extraction pipeline with the merged exome and imputed data from the UKB, and then proceeded to run the fine-mapping analysis. I noticed that something is wrong with how the allele flip is being handled.
Here is one example in chr22:
That led to weird results in the fine-mapping analysis. As you can see, the same variant (with flipped alleles) is being used for fine mapping, giving 2 different PIP results.
Any thoughts on how to solve this issue are appreciated!
Thank you