ADD new colocalization results #524

Open
sync-by-unito bot opened this issue Jul 16, 2024 · 19 comments
Comments


sync-by-unito bot commented Jul 16, 2024

Should adjust the DB schema and add a new column there + UI.

Coloc data is here: gs://zz-red/pipeline/resources/R12_coloc/colocQC.tsv.gz

The main new column we want to show is PP.H4.abf, and in the UI the table should be sorted by it by default.

┆Issue is synchronized with this Wrike task by Unito
┆Attachments: columns_mapping.xlsx

sync-by-unito bot commented Sep 30, 2024

➤ Anastasia Kytölä commented:

It looks like the new data is in a different format compared to the previous one (example: gs://r12-data/colocalization/release/formatted_v1/fg_r12_ukbb_ppp.txt.gz), meaning that I cannot use the established workflow for importing the data as-is. Currently, data has to follow the finngen common data model format in order to be imported into the colocalization/causal_variant tables in the SQL instance by the pheweb colocalization cli, so that the models can then be used by the PheWeb backend. Mitja Kurki, please advise what should be done. To me, re-formatting the data sounds like the easiest approach compared to large updates of the finngen common data model and the pheweb backend, but maybe there is some other solution that I don't see.

This is the error I get when using the pheweb colocalization cli for data import:

AssertionError: header expected '['source1', 'source2', 'pheno1', 'pheno1_description', 'pheno2', 'pheno2_description', 'quant1', 'quant2', 'tissue1', 'tissue2', 'locus_id1', 'locus_id2', 'chrom', 'start', 'stop', 'clpp', 'clpa', 'vars', 'len_cs1', 'len_cs2', 'len_inter', 'vars1_info', 'vars2_info', 'source2_displayname', 'beta1', 'beta2', 'pval1', 'pval2']' 

got '['dataset1', 'dataset2', 'trait1', 'trait2', 'region1', 'region2', 'cs1', 'cs2', 'nsnps', 'hit1', 'hit2', 'PP.H0.abf', 'PP.H1.abf', 'PP.H2.abf', 'PP.H3.abf', 'PP.H4.abf', 'low_purity1', 'low_purity2', 'nsnps1', 'nsnps2', 'cs1_log10bf', 'cs2_log10bf', 'csj1_log10bf', 'csj2_log10bf', 'clpp', 'clpa', 'cs1_size', 'cs2_size', 'cs_overlap', 'topInOverlap', 'hit1_info', 'hit2_info', 'colocRes']'

Looks like some of these columns can simply be renamed or slightly reformatted, but some of them also seem to be missing completely (and of course, even after modifying the file accordingly, I will still have to update the finngen common data model / pheweb backend in order to include the new PP.H4.abf).
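For illustration only, a minimal pandas sketch of the kind of header re-mapping this would need; the rename dictionary below is guessed from the two header lists and is not the agreed mapping (the actual mapping is worked out further down in this thread / in columns_mapping.xlsx):

```python
import pandas as pd

# Guessed renames for illustration only; the real column mapping is
# settled later in this thread and in columns_mapping.xlsx.
GUESSED_RENAMES = {
    "dataset1": "source1",
    "dataset2": "source2",
    "trait1": "pheno1",
    "trait2": "pheno2",
}

def reformat_coloc(path_in: str, path_out: str) -> None:
    df = pd.read_csv(path_in, sep="\t")
    df = df.rename(columns=GUESSED_RENAMES)
    # Columns required by the common data model but missing from the
    # new data would have to be filled with placeholders (e.g. NA) here.
    df.to_csv(path_out, sep="\t", index=False)
```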

sync-by-unito bot commented Oct 1, 2024

➤ Mitja Kurki commented:

Anastasia Kytölä yes, reformatting would probably be a good idea, plus adding a column. Could you give me example rows of both formats and suggest mappings, and I will fill in the rest ..

sync-by-unito bot commented Oct 2, 2024

➤ Anastasia Kytölä commented:

Mitja Kurki I used the documentation from the previous release, gs://finngen-production-library-green/finngen_R12/finngen_R12_analysis_data/colocalization/data_dictionary.txt, and marked the columns that seem to map and those that don't. I think all of the columns from the previous format are required by the finngen common data model, plus an additional source2_displayname that we use for a prettier source name representation. There are a lot of extra columns in the new data that might still be used, and it would be easier to map them if there were similar documentation for the new data as well.

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

Anastasia Kytölä the new format description is here: https://finngen.gitbook.io/finngen-handbook/working-in-the-sandbox/running-analyses-in-sandbox/how-to-run-colocalization-pipeline

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

quant and tissue need to be parsed from the dataset2 field, e.g. Alasoo_2018--macrophage_IFNg--exon--eQTL_Catalogue: datasource = Alasoo_2018--eQTLcatalog, tissue2 = macrophage-ifn-g, quant2 = exon.

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

there can also be fewer double-dash-separated fields, e.g. FG endpoints have 2: source + type
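A minimal sketch of the dataset-field parsing described in the two comments above, assuming the 4-field layout from the example and a 2-field layout for FG endpoints; the exact normalisation of the returned values (lower-casing, dashes, display names) is an assumption, and any other field counts would need to be enumerated first:

```python
def parse_dataset(dataset: str):
    """Split a dataset field such as
    Alasoo_2018--macrophage_IFNg--exon--eQTL_Catalogue
    into (source, tissue, quant)."""
    parts = dataset.split("--")
    if len(parts) == 4:                 # study--tissue--quant--catalogue
        study, tissue, quant, catalogue = parts
        return f"{study}--{catalogue}", tissue, quant
    if len(parts) == 2:                 # e.g. FG endpoints: source + type
        source, quant = parts
        return source, None, quant
    # Other combinations should be enumerated first, as asked above.
    raise ValueError(f"unexpected dataset format: {dataset}")
```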

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

Can you enumerate all possible combos of the dataset fields for checking?

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

start/stop need to be inferred from the intersection of these regions, region1 and region2,
so just report the region that overlaps between them
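A small sketch of that region intersection, assuming region1/region2 are strings of the form "chr1:12345-67890" (the exact format is an assumption to check against the data):

```python
def region_overlap(region1: str, region2: str):
    """Return (chrom, start, stop) of the overlapping part of two regions."""
    def parse(region):
        chrom, rng = region.split(":")
        start, stop = (int(x) for x in rng.split("-"))
        return chrom, start, stop

    c1, s1, e1 = parse(region1)
    c2, s2, e2 = parse(region2)
    assert c1 == c2, "regions are on different chromosomes"
    # intersection of [s1, e1] and [s2, e2]
    return c1, max(s1, s2), min(e1, e2)
```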

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

vars info: Hit1;NA;2 fields from hitX_info

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

vars: leave empty

sync-by-unito bot commented Oct 3, 2024

➤ Mitja Kurki commented:

See above, Anastasia Kytölä. We have team leader meetings almost all day, but there is a lunch break 11.30-12.30. We are in Biomedicum 1 (3rd floor), meeting room 5-6. We could have a chat outside of the meetings at 11.30?

sync-by-unito bot commented Oct 4, 2024

➤ Anastasia Kytölä commented:

OK Mitja Kurki
Thanks for the specifications

sync-by-unito bot commented Oct 4, 2024

➤ Anastasia Kytölä commented:

Region1 & region2, from which we have to deduce chrom, start, and stop, have chromosome values chr1-23 plus chrX, which confuses me a bit. Is there a reason why some of the entries have chrX and some chr23? If not, I will rename chrX -> chr23.
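If the rename goes ahead, it is a one-liner; a trivial sketch, assuming chrom values look like "chr1"..."chr23"/"chrX":

```python
def normalise_chrom(chrom: str) -> str:
    # rename chrX -> chr23; all other values pass through unchanged
    return "chr23" if chrom == "chrX" else chrom
```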

sync-by-unito bot commented Oct 4, 2024

➤ Anastasia Kytölä commented:

What do these postfixes in the endpoint names mean: EXMORE, EXALLC, ALLW (needed for constructing the pheno description column)? I saw something like this in the old data:

  • EXMORE - (more control exclusions) 
  • EXALLC - (controls excluding all cancers)
  • ALLW - ???

sync-by-unito bot commented Oct 7, 2024

➤ Anastasia Kytölä commented:

I guess "ALLW" stands for "all women as controls"?

sync-by-unito bot commented Oct 8, 2024

➤ Anastasia Kytölä commented:

It seems that we don't have enough values to construct the varsN_info columns: in the previous data we had VAR_ID,PIP,BETA values listing all variants from the credible sets, separated by semicolons. In the new data we can use the hitN and hitN_info columns to get VAR_ID and BETA, but we don't have the PIP; we have the P-VALUE of the top variant instead. The varsN_info columns are used to generate the causal_variant table, which is shown on a couple of pages in PheWeb. I will put NAs in the PIP values for now and will document this.
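A minimal sketch of the NA-filled varsN_info construction described above, assuming the top-variant BETA has already been extracted from hitN_info (the exact layout of hitN_info is not documented here):

```python
def build_vars_info(hit: str, beta: str) -> str:
    # Previous format: semicolon-separated VAR_ID,PIP,BETA entries for
    # every credible-set variant. Only the top variant is available in
    # the new data and its PIP is unknown, so PIP is written as NA.
    return f"{hit},NA,{beta}"
```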

sync-by-unito bot commented Oct 8, 2024

➤ Anastasia Kytölä commented:

Mitja Kurki

sync-by-unito bot commented Oct 8, 2024

➤ Mitja Kurki commented:

Yea, NA for now is good!

sync-by-unito bot commented Oct 10, 2024

➤ Anastasia Kytölä commented:

Mitja Kurki Arto Lehisto

Reformatted the data and imported it to the temporary database analysis_r12_v1 in the production-releases-pheweb-database Cloud SQL instance. Made 2 PRs for updates:

Next steps would be:

  1. Once the PRs are merged, update PheWeb with the new docker image and updated configs (one small addition to the sources in the colocalizationSourceTypes config).
  2. Move temporary colocalization tables from database analysis_r12_v1 to the current R12 database analysis_r12. 

One issue that I had in the past is that I couldn't update the colocalization/causal_variant tables directly while they were being used by PheWeb. This is the workaround I used (see the sketch after this list):

  • First, I reconfigured PheWeb to use the temporary analysis_r12_v1 database in colocalizationDAO and restarted PheWeb;
  • Then, I renamed the previous colocalization/causal_variant tables to _backup;
  • Then, I copied the new coloc data from the temp database to the analysis_r12 db;
  • Once the data was imported, I reconfigured the PheWeb colocalization DAO to use the updated analysis_r12 database and removed the temp database.
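A rough sketch of those table-swap steps in code, using pymysql with placeholder connection details and assuming the table names colocalization / causal_variant; this is only an illustration, not the exact procedure that was run:

```python
import pymysql

# Placeholder connection details for the Cloud SQL instance.
conn = pymysql.connect(host="<sql-instance-host>", user="<user>",
                       password="<password>", database="analysis_r12")
with conn.cursor() as cur:
    # Keep the old tables around as *_backup while PheWeb reads from
    # the temporary analysis_r12_v1 database.
    cur.execute("RENAME TABLE colocalization TO colocalization_backup")
    cur.execute("RENAME TABLE causal_variant TO causal_variant_backup")
    # Copy the reformatted data over from the temporary database.
    cur.execute("CREATE TABLE colocalization AS "
                "SELECT * FROM analysis_r12_v1.colocalization")
    cur.execute("CREATE TABLE causal_variant AS "
                "SELECT * FROM analysis_r12_v1.causal_variant")
conn.commit()
```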

Here is full documentation for the updates: gs://bucket-anastasia/pheweb/colocalization/r12/new_colocs_04092024/readme.md (phewas-development project).
