region extraction pipeline failing in allele flip handling #132

dianacornejo · 2022-01-12T18:18:30Z

Hi @changebio, I'm moving the discussion here so that I don't forget about it.

I ran the region extraction pipeline on the merged exome and imputed data from the UKB and then proceeded to run the fine-mapping analysis. I noticed that something was wrong with how allele flips are being handled.
Here is one example on chr22:

22	50549676	A	G	chr22:50549676:A:G	-0.122039	0.0200183	1.3205048480570123e-09
22	50549676	G	A	chr22:50549676:G:A	0.123389	0.0200053	8.481838520726142e-10

That led to weird results in the fine-mapping analysis. As you can see, the same variant (with flipped alleles) is being used twice for fine mapping, giving two different PIPs:

chr22.50549067.A.AG 0.667913905147005
chr22.50549676.A.G 0.399442941057422
chr22.50549676.G.A 0.60547973816813
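A minimal sketch of why the two sumstats rows above describe the same variant. It assumes the columns are chrom, pos, a0, a1, ID, beta, SE, p, and that beta is the effect of a1 (regenie reports BETA for ALLELE1). Under that assumption, swapping a0 and a1 should only negate beta and leave SE unchanged. `SumstatRow` and `align_to` are illustrative names, not functions from the pipeline, and strand flips (complementary alleles) are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SumstatRow:
    chrom: str
    pos: int
    a0: str
    a1: str
    beta: float  # assumed to be the effect of a1
    se: float


def align_to(row: SumstatRow, ref_a0: str, ref_a1: str) -> SumstatRow:
    """Express `row` with a0=ref_a0 and a1=ref_a1, negating beta if the alleles are swapped."""
    if (row.a0, row.a1) == (ref_a0, ref_a1):
        return row
    if (row.a0, row.a1) == (ref_a1, ref_a0):
        # Counting the other allele negates the per-allele effect; SE is unchanged.
        return SumstatRow(row.chrom, row.pos, ref_a0, ref_a1, -row.beta, row.se)
    raise ValueError(f"alleles {row.a0}/{row.a1} do not match {ref_a0}/{ref_a1}")


# The two chr22 rows from the sumstats above
rows = [
    SumstatRow("22", 50549676, "A", "G", -0.122039, 0.0200183),
    SumstatRow("22", 50549676, "G", "A", 0.123389, 0.0200053),
]
aligned = [align_to(r, "A", "G") for r in rows]
for r in aligned:
    print(r.a0, r.a1, r.beta)
# A G -0.122039
# A G -0.123389
```

Once aligned, the two betas are nearly identical, which is what one would expect from the same variant tested in largely the same samples.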

Any thoughts on how to solve this issue are appreciated!

Thank you


changebio · 2022-01-13T02:32:56Z

@dianacornejo I have a question about the example. Does it mean that there are duplicated SNPs (one of them flipped) in the regenie sumstats?

dianacornejo · 2022-01-13T14:48:18Z

@changebio Yes, in the sumstats. I think what's happening is that one allele ordering comes from the exome data and the other from the imputed data. In this particular case, the exome and imputed data have exactly the same sample size, unlike before, when the imputed data had more samples.

changebio · 2022-01-17T17:07:40Z

@dianacornejo I figured out why you got the error. There are duplicated SNPs in the genotype data with swapped a0 and a1, a case my function does not handle. I am fixing it.
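A minimal sketch of the kind of check that was missing, assuming variants are identified by (chrom, pos, a0, a1). Keying each variant on its unordered allele pair makes A/G and G/A collide, so swapped duplicates surface as groups with more than one entry. `find_swapped_duplicates` is a hypothetical helper, not the pipeline's actual function; the example variants are the ones from the fine-mapping output above.

```python
from collections import defaultdict


def find_swapped_duplicates(variants):
    """Group (chrom, pos, a0, a1) tuples that describe the same site once
    allele order is ignored; return only the groups with more than one entry."""
    groups = defaultdict(list)
    for chrom, pos, a0, a1 in variants:
        # Sorting the allele pair makes A/G and G/A share one key,
        # while different ALT alleles at the same position stay separate.
        key = (chrom, pos, tuple(sorted((a0, a1))))
        groups[key].append((chrom, pos, a0, a1))
    return [group for group in groups.values() if len(group) > 1]


variants = [
    ("22", 50549067, "A", "AG"),
    ("22", 50549676, "A", "G"),
    ("22", 50549676, "G", "A"),
]
print(find_swapped_duplicates(variants))
# [[('22', 50549676, 'A', 'G'), ('22', 50549676, 'G', 'A')]]
```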

changebio · 2022-01-17T17:40:59Z

@gaow How should we deal with SNPs that become the "same" after swapping a0 and a1 but have different betas in the sumstats? That means they are not just copies with a0 and a1 swapped; they also have different genotypes.
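One way to make this distinction concrete, sketched under assumptions: if two records differ only by swapped a0/a1, then at a diploid autosomal site each sample's a1 dosage in one record should equal 2 minus its a1 dosage in the other. If that fails, the records carry different genotypes and are not a simple flip. `is_swapped_copy` is a hypothetical helper, and the dosages in the usage lines are toy values, not UKB data.

```python
import math


def is_swapped_copy(dosage_1, dosage_2, tol=1e-6):
    """Return True if record 2 looks like record 1 with a0/a1 swapped.

    Both inputs are per-sample a1 dosages (0..2) in the same sample order.
    A swapped copy satisfies dosage_2 == 2 - dosage_1 for every sample.
    Samples missing (None or NaN) in either record are skipped.
    """
    if len(dosage_1) != len(dosage_2):
        raise ValueError("dosage vectors must cover the same samples")
    compared = 0
    for x1, x2 in zip(dosage_1, dosage_2):
        if x1 is None or x2 is None or math.isnan(x1) or math.isnan(x2):
            continue
        compared += 1
        if abs(x1 + x2 - 2.0) > tol:
            return False  # genotypes differ, not just the allele order
    return compared > 0  # no overlapping non-missing samples: cannot confirm


# Toy dosages for four samples (illustrative only)
print(is_swapped_copy([0, 1, 2, 1], [2, 1, 0, 1]))  # True
print(is_swapped_copy([0, 1, 2, 1], [2, 1, 1, 1]))  # False
```

Exact mirroring is only expected when both records come from the same genotype calls; comparing imputed dosages against sequencing calls would need a looser criterion.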

gaow · 2022-01-17T18:21:09Z

@changebio I'm confused about the context in which this is being discussed. The data you show above look perfectly normal to me if they come from two different association tests.

In @dianacornejo's original example, the two variants in question are said to have the same sample size (and possibly nearly the same samples), which I would agree with, because their summary stats differ only slightly. In @changebio's last comment, these variants have different summary stats because they come from association tests on different data. It's not a duplicate; to me it's a merging artifact. I thought that when merging the imputed and sequence data, we keep whichever has the larger sample size and simply drop the other one if they are the same variant (after flipping as necessary)?
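The merge rule described here can be sketched as follows, assuming each record carries its sample size `n` and that two records describe the same variant when chrom, pos, and the unordered allele pair match. `Record` and `merge_by_sample_size` are hypothetical names, and the sample sizes in the usage lines are placeholders. The tie-breaking rule (relevant to @dianacornejo's case, where both sources have the same sample size) is an open decision; this sketch simply keeps the first record seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    chrom: str
    pos: int
    a0: str
    a1: str
    beta: float
    se: float
    n: int  # sample size of the association test


def merge_by_sample_size(records):
    """Keep one record per site (chrom, pos, unordered allele pair): the one
    with the largest n. Ties keep the first record seen. Kept records retain
    their own allele orientation."""
    best = {}
    for rec in records:
        key = (rec.chrom, rec.pos, tuple(sorted((rec.a0, rec.a1))))
        if key not in best or rec.n > best[key].n:
            best[key] = rec
    return list(best.values())


# Placeholder sample sizes; betas and SEs are from the chr22 example
recs = [
    Record("22", 50549676, "A", "G", -0.122039, 0.0200183, 100),
    Record("22", 50549676, "G", "A", 0.123389, 0.0200053, 200),
]
kept = merge_by_sample_size(recs)
print([(r.a0, r.a1, r.n) for r in kept])
# [('G', 'A', 200)]
```

Because the kept record keeps its own orientation, it may still need to be aligned to a reference allele order before fine mapping.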

gaow · 2022-01-17T22:07:14Z

@dianacornejo It turns out that what he found and what you reported are both legitimate issues, and they are separate problems. @changebio told me offline that he ran into the second issue while investigating your initial one. Moving forward, @changebio will fix the issue you reported, but we need your help with the issue he observed above. I'm going to open a ticket in the UKB repo since that problem is dataset-specific. In addition, @changebio will formalize this into a QC step on the sumstats to help catch the issue before summary stats are merged.

dianacornejo · 2022-01-18T14:31:46Z

@gaow Yes, I can see that the second issue is different from the first one I reported, because what he shows is only in the exome data, not in the merged data where I reported the first problem. Sure, if there is anything I can do, just let me know.
