ADD new colocalization results #524

Open
sync-by-unito bot opened this issue Jul 16, 2024 · 19 comments
Comments


sync-by-unito bot commented Jul 16, 2024

Should adjust the DB schema and add a new column there + UI.

Coloc data is here: gs://zz-red/pipeline/resources/R12_coloc/colocQC.tsv.gz

The main new column we want to show is PP.H4.abf, and in the UI the table should be sorted by it by default.

┆Issue is synchronized with this Wrike task by Unito
┆Attachments: columns_mapping.xlsx

sync-by-unito bot commented Sep 30, 2024

➤ Anastasia Kytölä commented:

It looks like the new data is in a different format compared to the previous one (example: gs://r12-data/colocalization/release/formatted_v1/fg_r12_ukbb_ppp.txt.gz), meaning that I cannot use the established workflow for importing the data as-is. Currently, data has to follow the finngen common data model format in order to be imported into the colocalization/causal_variant tables in the SQL instance by the pheweb colocalization cli, so that the models can then be used by the PheWeb backend. Mitja Kurki, please advise what should be done. To me, re-formatting the data sounds like the easiest approach compared to large updates of the finngen common data model and the pheweb backend, but maybe there is some other solution that I don't see.

This is the error I get when using the pheweb colocalization cli for data import:

AssertionError: header expected '['source1', 'source2', 'pheno1', 'pheno1_description', 'pheno2', 'pheno2_description', 'quant1', 'quant2', 'tissue1', 'tissue2', 'locus_id1', 'locus_id2', 'chrom', 'start', 'stop', 'clpp', 'clpa', 'vars', 'len_cs1', 'len_cs2', 'len_inter', 'vars1_info', 'vars2_info', 'source2_displayname', 'beta1', 'beta2', 'pval1', 'pval2']' 

got '['dataset1', 'dataset2', 'trait1', 'trait2', 'region1', 'region2', 'cs1', 'cs2', 'nsnps', 'hit1', 'hit2', 'PP.H0.abf', 'PP.H1.abf', 'PP.H2.abf', 'PP.H3.abf', 'PP.H4.abf', 'low_purity1', 'low_purity2', 'nsnps1', 'nsnps2', 'cs1_log10bf', 'cs2_log10bf', 'csj1_log10bf', 'csj2_log10bf', 'clpp', 'clpa', 'cs1_size', 'cs2_size', 'cs_overlap', 'topInOverlap', 'hit1_info', 'hit2_info', 'colocRes']'

Looks like some of these columns can simply be renamed or slightly reformatted, but some of them also seem to be missing completely (and of course, even after modifying the file accordingly, I will still have to update the finngen common data model / pheweb backend in order to include the new PP.H4.abf).
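For illustration only, a minimal pandas sketch of the kind of header re-mapping this would need; the rename dictionary below is guessed from the two header lists and is not the agreed mapping (the actual mapping is worked out further down in this thread / in columns_mapping.xlsx):

```python
import pandas as pd

# Guessed renames for illustration only; the real column mapping is
# settled later in this thread and in columns_mapping.xlsx.
GUESSED_RENAMES = {
    "dataset1": "source1",
    "dataset2": "source2",
    "trait1": "pheno1",
    "trait2": "pheno2",
}

def reformat_coloc(path_in: str, path_out: str) -> None:
    df = pd.read_csv(path_in, sep="\t")
    df = df.rename(columns=GUESSED_RENAMES)
    # Columns required by the common data model but missing from the
    # new data would have to be filled with placeholders (e.g. NA) here.
    df.to_csv(path_out, sep="\t", index=False)
```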

sync-by-unito bot commented Oct 1, 2024

➤ Mitja Kurki commented:

Anastasia Kytölä yes, reformatting would probably be a good idea, plus adding a column. Could you give me example rows of both formats and suggest mappings, and I will fill in the rest ..

sync-by-unito bot commented Oct 2, 2024

➤ Anastasia Kytölä commented:

Mitja Kurki I used the documentation from the previous release, gs://finngen-production-library-green/finngen_R12/finngen_R12_analysis_data/colocalization/data_dictionary.txt, and marked the columns that seem to map and those that don't. I think all of the columns from the previous format are required by the finngen common data model, plus an additional source2_displayname that we use for a prettier source name representation. There are a lot of extra columns in the new data that might still be used, and it would be easier to map them if there were similar documentation for the new data as well.

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

Anastasia Kytölä the new format description is here: https://finngen.gitbook.io/finngen-handbook/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-colocalization-pipeline

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

quant and tissue need to be parsed from the dataset2 field, e.g. Alasoo_2018--macrophage_IFNg--exon--eQTL_Catalogue: datasource = Alasoo_2018--eQTLcatalog, tissue2 = macrophage-ifn-g, quant2 = exon.

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

there can also be fewer double-dash-separated fields, e.g. FG endpoints have 2: source + type
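A minimal sketch of the dataset-field parsing described in the two comments above, assuming the 4-field layout from the example and a 2-field layout for FG endpoints; the exact normalisation of the returned values (lower-casing, dashes, display names) is an assumption, and any other field counts would need to be enumerated first:

```python
def parse_dataset(dataset: str):
    """Split a dataset field such as
    Alasoo_2018--macrophage_IFNg--exon--eQTL_Catalogue
    into (source, tissue, quant)."""
    parts = dataset.split("--")
    if len(parts) == 4:                 # study--tissue--quant--catalogue
        study, tissue, quant, catalogue = parts
        return f"{study}--{catalogue}", tissue, quant
    if len(parts) == 2:                 # e.g. FG endpoints: source + type
        source, quant = parts
        return source, None, quant
    # Other combinations should be enumerated first, as asked above.
    raise ValueError(f"unexpected dataset format: {dataset}")
```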

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

Can you enumerate all possible combos of the dataset fields for checking?

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

start/stop need to be inferred from the intersection of these regions, region1 and region2,
so just report the region that overlaps between them
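A small sketch of that region intersection, assuming region1/region2 are strings of the form "chr1:12345-67890" (the exact format is an assumption to check against the data):

```python
def region_overlap(region1: str, region2: str):
    """Return (chrom, start, stop) of the overlapping part of two regions."""
    def parse(region):
        chrom, rng = region.split(":")
        start, stop = (int(x) for x in rng.split("-"))
        return chrom, start, stop

    c1, s1, e1 = parse(region1)
    c2, s2, e2 = parse(region2)
    assert c1 == c2, "regions are on different chromosomes"
    # intersection of [s1, e1] and [s2, e2]
    return c1, max(s1, s2), min(e1, e2)
```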

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

vars info: Hit1;NA;2 fields from hitX_info

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

vars: leave empty

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

See above, Anastasia Kytölä. We have team leader meetings almost all day, but there is a lunch break 11.30-12.30. We are in Biomedicum 1 (3rd floor), meeting room 5-6. We could have a chat outside of the meetings at 11.30?

sync-by-unito bot commented Oct 4, 2024

➤ Anastasia Kytölä commented:

OK Mitja Kurki
Thanks for the specifications

sync-by-unito bot commented Oct 4, 2024

➤ Anastasia Kytölä commented:

Region1 & region2, from which we have to deduce chrom, start, and stop, have chromosome values chr1-23 plus chrX, which confuses me a bit. Is there a reason why some of the entries have chrX and some chr23? If not, I will rename chrX -> chr23.
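If the rename goes ahead, it is a one-liner; a trivial sketch, assuming chrom values look like "chr1"..."chr23"/"chrX":

```python
def normalise_chrom(chrom: str) -> str:
    # rename chrX -> chr23; all other values pass through unchanged
    return "chr23" if chrom == "chrX" else chrom
```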

sync-by-unito bot commented Oct 4, 2024

➤ Anastasia Kytölä commented:

What do these postfixes in the endpoint names mean: EXMORE, EXALLC, ALLW (needed for constructing the pheno description column)? I saw something like this in the old data:

  • EXMORE - (more control exclusions) 
  • EXALLC - (controls excluding all cancers)
  • ALLW - ???

sync-by-unito bot commented Oct 7, 2024

➤ Anastasia Kytölä commented:

I guess "ALLW" stands for "all women as controls"?

sync-by-unito bot commented Oct 8, 2024

➤ Anastasia Kytölä commented:

It seems that we don't have enough values to construct the varsN_info columns: in the previous data we had VAR_ID,PIP,BETA values listing all variants from the credible sets, separated by semicolons. In the new data we can use the hitN and hitN_info columns to get VAR_ID and BETA, but we don't have the PIP; we have the P-VALUE of the top variant instead. The varsN_info columns are used to generate the causal_variant table, which is shown on a couple of pages in PheWeb. I will put NAs in the PIP values for now and will document this.
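A minimal sketch of the NA-filled varsN_info construction described above, assuming the top-variant BETA has already been extracted from hitN_info (the exact layout of hitN_info is not documented here):

```python
def build_vars_info(hit: str, beta: str) -> str:
    # Previous format: semicolon-separated VAR_ID,PIP,BETA entries for
    # every credible-set variant. Only the top variant is available in
    # the new data and its PIP is unknown, so PIP is written as NA.
    return f"{hit},NA,{beta}"
```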

sync-by-unito bot commented Oct 8, 2024

➤ Anastasia Kytölä commented:

Mitja Kurki

sync-by-unito bot commented Oct 8, 2024

➤ Mitja Kurki commented:

Yea, NA for now is good!

sync-by-unito bot commented Oct 10, 2024

➤ Anastasia Kytölä commented:

Mitja Kurki Arto Lehisto

Reformatted the data and imported it to the temporary database analysis_r12_v1 in the production-releases-pheweb-database Cloud SQL instance. Made 2 PRs for updates:

Next steps would be:

  1. Once the PRs are merged, update PheWeb with the new docker image and updated configs (one small addition to the sources in the colocalizationSourceTypes config).
  2. Move temporary colocalization tables from database analysis_r12_v1 to the current R12 database analysis_r12. 

One issue that I had in the past is that I couldn't update the colocalization/causal_variant tables directly while they were being used by PheWeb. This is the workaround I used (see the sketch after this list):

  • First, I reconfigured PheWeb to use the temporary analysis_r12_v1 database in colocalizationDAO and restarted PheWeb;
  • Then, I renamed the previous colocalization/causal_variant tables to _backup;
  • Then, I copied the new coloc data from the temp database to the analysis_r12 db;
  • Once the data was imported, I reconfigured the PheWeb colocalization DAO to use the updated analysis_r12 database and removed the temp database.
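A rough sketch of those table-swap steps in code, using pymysql with placeholder connection details and assuming the table names colocalization / causal_variant; this is only an illustration, not the exact procedure that was run:

```python
import pymysql

# Placeholder connection details for the Cloud SQL instance.
conn = pymysql.connect(host="<sql-instance-host>", user="<user>",
                       password="<password>", database="analysis_r12")
with conn.cursor() as cur:
    # Keep the old tables around as *_backup while PheWeb reads from
    # the temporary analysis_r12_v1 database.
    cur.execute("RENAME TABLE colocalization TO colocalization_backup")
    cur.execute("RENAME TABLE causal_variant TO causal_variant_backup")
    # Copy the reformatted data over from the temporary database.
    cur.execute("CREATE TABLE colocalization AS "
                "SELECT * FROM analysis_r12_v1.colocalization")
    cur.execute("CREATE TABLE causal_variant AS "
                "SELECT * FROM analysis_r12_v1.causal_variant")
conn.commit()
```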

Here is full documentation for the updates: gs://bucket-anastasia/pheweb/colocalization/r12/new_colocs_04092024/readme.md (phewas-development project).
