Column name handling: start/end #96

bschilder · 2022-03-22T19:36:57Z

Currently, we don't have start/end in the mapping file. This isn't a problem for 1bp-wide features like SNPs (in the strict sense of the term), but if you include indels/structural variants you can have both start/end columns spanning some range.

Might we want to come up with some way to rename these so that they're compatible with the full pipeline? I.e. we probably want to keep "start" as "BP", since this is a cornerstone of how MSS works. But perhaps when synonyms of "end" occur, we can rename them to something like "BP2".

Mappings that come to mind:

--> "BP"

"pos1"
"position 1"
"position1"
"start"
"start position"
"position start"
"start pos"
"pos start"
"bp1"
"bp start"
"start bp"
"begin"

--> "BP2"

"pos2"
"position 2"
"position2"
"end"
"end position"
"position end"
"end pos"
"pos end"
"bp2"
"bp end"
"end bp"

bschilder · 2022-03-22T19:52:05Z

Oh also, a few other minor additions to the CHR mapping that was updated recently:

--> "CHR"

"seqs"
"seqname"
"CHROMS"

Al-Murphy · 2022-03-23T08:21:38Z

This makes sense, though we should probably only do it if the Indel parameter is set to true? Next time one of us is making changes to the dev branch, it will probably be worth testing the effect of this. Do you know of any downstream software that uses start & end? What names do they require? Just want to make sure BP2 will be understandable to users and downstream applications.

Al-Murphy · 2022-03-23T08:23:26Z

Yep, happy to add the extra CHR mappings, but put them in as upper case, as all inputted headers are pushed through toupper() anyway (this doesn't really matter since the same is done to the entries in the mapping file, but just so they are the same as what's there currently).
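
The case handling described here, together with the BP/BP2/CHR synonyms proposed earlier in the thread, can be sketched roughly as follows. This is only an illustration in Python (MSS itself is an R package); `SYNONYMS`, `normalise`, and `standardise_headers` are hypothetical names, and the dict stands in for the real mapping file, whose layout is not shown in this thread.

```python
# Hypothetical synonym table: a few entries from the lists proposed in this thread.
# The real mapping file's layout is not shown here, so this dict is an assumption.
SYNONYMS = {
    "CHR": ["seqs", "seqname", "CHROMS"],
    "BP": ["pos1", "position 1", "start", "start position", "bp1", "begin"],
    "BP2": ["pos2", "position 2", "end", "end position", "bp2"],
}


def normalise(name: str) -> str:
    """Uppercase a header so that matching is case-insensitive."""
    return name.strip().upper()


# Uppercase the mapping entries as well, so both sides of the comparison agree.
LOOKUP = {
    normalise(syn): target
    for target, syns in SYNONYMS.items()
    for syn in syns
}


def standardise_headers(headers):
    """Return uppercased headers, with known synonyms replaced by the standard name."""
    result = []
    for header in headers:
        key = normalise(header)
        result.append(LOOKUP.get(key, key))
    return result


print(standardise_headers(["seqname", "Start", "end position", "P"]))
# -> ['CHR', 'BP', 'BP2', 'P']
```

Because both the input headers and the mapping entries are uppercased before comparison, the case used when adding new synonyms does not affect matching.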

bschilder · 2022-03-23T14:16:13Z

> This makes sense, we should probably only do it if the Indel parameter is set to true?

I think this makes sense for now, given that MSS currently only covers SNPs/indels (not larger SVs).

SVs are a bit more complicated, but you can run a GWAS with them very similarly to the way you would run one with SNPs only. My former labmate did this in AD, so if we eventually decide to go in that direction it would be great to get his input. @ricardovialle would you mind giving some initial thoughts on this?

> Next time one of us is making changes to the dev branch it will probably be worth testing the effect of this.

Definitely!

> Do you know of any downstream software that uses start & end? What names do they require? Just want to make sure BP2 will be understandable to users and downstream applications

That's a good point, I picked BP2 for brevity but the typical nomenclature is "start"/"end" when dealing with ranged data (based on GenomicRanges). So maybe "BPEND" would be closer? That seems less obvious though when it's all uppercase.

That said, anyone who wants to do downstream analysis with ranged data is likely going to convert to GRanges anyway, which fortunately is already an export option for MSS. So as long as MSS has a way of converting back and forth between data.table and GRanges format, that should cover most use cases. One other nice thing would be to add BED as one of the write formats. That way, it can automatically be read in as a GRanges object by tools like rtracklayer::import.
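
One detail worth keeping in mind for a BED writer: BED intervals are 0-based and half-open, so a start/end pair needs shifting if the sumstats positions are 1-based and inclusive. The Python sketch below illustrates that conversion only; `to_bed_line` is a hypothetical helper, and the assumption that BP/BP2 are 1-based inclusive is mine rather than something confirmed in this thread.

```python
from typing import Optional


def to_bed_line(chrom: str, bp: int, bp2: Optional[int] = None) -> str:
    """Format one variant as a minimal 3-column BED line.

    Assumes bp/bp2 are 1-based inclusive coordinates (an assumption, see above).
    A variant without bp2 is treated as a single-base feature.
    """
    end = bp if bp2 is None else bp2
    if bp < 1 or end < bp:
        raise ValueError(f"invalid range: {bp}-{end}")
    # BED start is 0-based (subtract 1); BED end is exclusive, which equals
    # the 1-based inclusive end, so it is written unchanged.
    return f"{chrom}\t{bp - 1}\t{end}"


print(to_bed_line("chr1", 100))       # single-bp SNP
print(to_bed_line("chr1", 100, 105))  # 6-bp ranged variant
```

With this convention, the width of each BED interval (end minus start) equals the number of bases the variant spans.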

Regarding analysis software, I suppose it depends on how you want to use your sumstats. One that comes to mind is goshifter. It takes a set of ranged annotations and tests for enrichment against a list of non-ranged SNPs. In theory, you could input ranged annotations derived from a GWAS with SNPs/indels and run enrichment against a list of SNPs from another GWAS that only has single-bp SNPs (or some other source).

> Yep happy to add the extra CHR mappings but put them in as upper case as all inputted headers are pushed toupper() anyway (this doesn't really matter since the same is done to the entries in the mapping file but just so they are the same as what's there currently)

Makes sense!

Al-Murphy · 2022-03-24T07:44:19Z

I actually had a few other concerns around checks on this with the reference datasets. Probably worth discussing in person before we commit to adding the functionality (and maybe including Nathan too)

ricardovialle · 2022-03-28T23:25:20Z

> SVs are a bit more complicated, but you can run GWAS with them very similar to the way you would run one with SNPs only. My former labmate did this in AD so if we eventually decide to go in that direction it would be great to get his input. @ricardovialle would you mind giving some initial thoughts on this?

Hi guys, this would definitely be nice to have. I'm not aware of a specific standard for reporting SV summary stats. Even the VCFs created by SV callers sometimes do not follow the same specification. The VCF format specification usually expects an END field. In our results, we reported pos and sv_end (link). Also, you might want to consider CIPOS and CIEND, as many SVs have imprecise breakpoints.
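
For illustration, the END, CIPOS, and CIEND fields mentioned above could be pulled out of a VCF INFO string roughly like this. It is a minimal Python sketch rather than anything MSS does today; `parse_sv_info` and the example INFO string are hypothetical, and real SV callers may deviate from this layout, as noted above.

```python
def parse_sv_info(info: str) -> dict:
    """Extract END, CIPOS and CIEND from a semicolon-separated VCF INFO string.

    Keys without '=' (e.g. IMPRECISE) are treated as flags.
    Missing fields are returned as None.
    """
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True

    def ints(key):
        raw = fields.get(key)
        if not isinstance(raw, str):
            return None
        return [int(x) for x in raw.split(",")]

    end = ints("END")
    return {
        "end": end[0] if end else None,
        # CIPOS/CIEND are offsets around POS and END respectively.
        "cipos": ints("CIPOS"),
        "ciend": ints("CIEND"),
        "imprecise": fields.get("IMPRECISE", False) is True,
    }


example = "SVTYPE=DEL;END=12500;CIPOS=-10,10;CIEND=-5,5;IMPRECISE"
print(parse_sv_info(example))
```

If ranged columns were added, CIPOS and CIEND could be carried along as extra columns so that imprecise breakpoints are not silently treated as exact.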

bschilder · 2022-03-29T09:34:23Z

Thanks so much for the input @ricardovialle! Alan and I will chat about all this in our next meeting and we'll let you know the plan here.

Al-Murphy · 2022-04-05T10:13:04Z

Added extra CHR mappings

bschilder added the "enhancement" (New feature or request) label on Mar 22, 2022
