Sample size for LD estimation (EUR) #106
Comments
BTW: Are there any pre-calculated clumped SNPs for GTEx V8 that can be downloaded directly?
@Shicheng-Guo which workflow are you referring to? For most workflows in this repo, as far as I can recall, we have the matching genotypes, so we don't really use reference panels.
Thanks, Gao, for your response. I mean the workflow below: https://github.com/cumc/bioworkflows/blob/master/GWAS/LD_Clumping.ipynb
Thanks,
Shicheng
I notice lots of papers use 1000 Genomes EUR as the reference; however, I prefer to use UKB WGS individual-level data as the reference. My question is: what is the best sample size to use? Using the 150K WGS samples will make the process very time-consuming, while a small sample size may cause biased LD clumping.
@Shicheng-Guo our LD clumping application was for association analysis with UK Biobank data, which is why we selected a subset of UKB genotypes and used it as the reference panel. We used 2000 samples, I believe. I don't think LD clumping is as picky about the LD panel as, e.g., fine-mapping applications. Since our application was on UKB data itself, we believe 2000 samples is a good enough approximation. We don't have a reference for GTEx V8 data. I have not formally assessed this, but if you are concerned, perhaps you can take a few regions of UKB data, compute the LD panel from sample sizes of 500 to 10K, and see how robust your estimates are?
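A minimal sketch of the kind of robustness check suggested above, assuming simulated toy genotypes rather than real UKB data. The functions `r2`, `simulate_pair`, and `r2_by_sample_size` are hypothetical helpers written for illustration and are not part of the LD_Clumping workflow. The sketch draws repeated random subsamples of several sizes and reports the range of the pairwise LD estimate (squared genotype correlation) at each size.

```python
import random


def r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a SNP is monomorphic in this sample
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def simulate_pair(n, maf=0.3, switch=0.1, seed=0):
    """Toy genotypes for two SNPs in LD. Simulated data, NOT UKB genotypes."""
    rng = random.Random(seed)
    snp_a, snp_b = [], []
    for _ in range(n):
        a = b = 0
        for _ in range(2):  # two haplotypes per individual
            allele_a = 1 if rng.random() < maf else 0
            if rng.random() < switch:
                # with probability `switch`, SNP B is drawn independently
                allele_b = 1 if rng.random() < maf else 0
            else:
                allele_b = allele_a  # otherwise SNP B copies SNP A
            a += allele_a
            b += allele_b
        snp_a.append(a)
        snp_b.append(b)
    return snp_a, snp_b


def r2_by_sample_size(snp_a, snp_b, sizes, n_rep=20, seed=1):
    """For each subsample size, return (min, max) of r2 over n_rep random subsets."""
    rng = random.Random(seed)
    idx_all = range(len(snp_a))
    out = {}
    for size in sizes:
        vals = []
        for _ in range(n_rep):
            idx = rng.sample(idx_all, size)  # sample individuals without replacement
            vals.append(r2([snp_a[i] for i in idx], [snp_b[i] for i in idx]))
        out[size] = (min(vals), max(vals))
    return out


if __name__ == "__main__":
    a, b = simulate_pair(20000)  # stand-in for the "full" panel
    print("full panel r2:", round(r2(a, b), 4))
    for size, (lo, hi) in r2_by_sample_size(a, b, [500, 2000, 10000]).items():
        print(f"n={size}: r2 range over subsamples = [{lo:.4f}, {hi:.4f}]")
```

With real data, the simulated pair would be replaced by genotype dosages for SNP pairs in the chosen regions, and the spread at each sample size compared against the clumping r2 threshold in use.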
I notice you selected a random subset of unrelated samples. Two questions:
Thanks.
Shicheng