Column name handling: start/end #96

Open
bschilder opened this issue Mar 22, 2022 · 8 comments
Labels
enhancement New feature or request

Comments

@bschilder
Collaborator

bschilder commented Mar 22, 2022

Currently, we don't have start/end in the mapping file. This isn't a problem for 1bp-wide features like SNPs (in the strict sense of the term), but if you include indels/structural variants you can have both start/end columns spanning some range.

Might we want to come up with some way to rename these so that they are compatible with the full pipeline? That is, we probably want to keep "start" as "BP", since this is a cornerstone of how MSS works. But when synonyms of "end" occur, we could rename them to something like "BP2".

Mappings that come to mind:

--> "BP"

  • "pos1"
  • "position 1"
  • "position1"
  • "start"
  • "start position"
  • "position start"
  • "start pos"
  • "pos start"
  • "bp1"
  • "bp start"
  • "start bp"
  • "begin"

--> "BP2"

  • "pos2"
  • "position 2"
  • "position2"
  • "end"
  • "end position"
  • "position end"
  • "end pos"
  • "pos end"
  • "bp2"
  • "bp end"
  • "end bp"
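
To make the proposed mapping concrete, here is a rough sketch of the lookup in Python. It is only an illustration of the idea, not the package's R code or its actual mapping file: the names `START_SYNONYMS`, `END_SYNONYMS`, `build_lookup` and `standardise_header` are hypothetical, and it assumes headers are upper-cased before matching.

```python
# Hypothetical synonym lists copied from the proposal above;
# this is not the package's real mapping file.
START_SYNONYMS = [
    "pos1", "position 1", "position1", "start", "start position",
    "position start", "start pos", "pos start", "bp1", "bp start",
    "start bp", "begin",
]
END_SYNONYMS = [
    "pos2", "position 2", "position2", "end", "end position",
    "position end", "end pos", "pos end", "bp2", "bp end", "end bp",
]


def build_lookup():
    """Map every upper-cased synonym to its standard column name."""
    lookup = {}
    for name in START_SYNONYMS:
        lookup[name.upper()] = "BP"
    for name in END_SYNONYMS:
        lookup[name.upper()] = "BP2"
    return lookup


def standardise_header(columns, lookup=None):
    """Upper-case each header and replace known synonyms with BP / BP2."""
    if lookup is None:
        lookup = build_lookup()
    result = []
    for col in columns:
        key = col.strip().upper()
        # Unknown headers are kept, but in upper case.
        result.append(lookup.get(key, key))
    return result


print(standardise_header(["SNP", "chr", "Start", "end pos", "P"]))
```

The sketch does not handle a file that already contains both "BP" and a start synonym, which would produce duplicate column names and would need its own check.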
@bschilder bschilder added the enhancement New feature or request label Mar 22, 2022
@bschilder
Collaborator Author

Oh, also a few other minor edits to the CHR mapping that was updated recently, adding:

--> "CHR"

  • "seqs"
  • "seqname"
  • "CHROMS"

@Al-Murphy
Owner

This makes sense, we should probably only do it if the Indel parameter is set to true? Next time one of us is making changes to the dev branch it will probably be worth testing the effect of this. Do you know of any downstream software that uses start & end? What names do they require? Just want to make sure BP2 will be understandable to users and downstream applications.

@Al-Murphy
Owner

Yep, happy to add the extra CHR mappings, but put them in as upper case since all inputted headers are pushed through toupper() anyway. (This doesn't really matter, since the same is done to the entries in the mapping file, but it keeps them consistent with what's there currently.)

@bschilder
Collaborator Author

bschilder commented Mar 23, 2022

> This makes sense, we should probably only do it if the Indel parameter is set to true?

I think this makes sense, given that MSS currently only covers SNPs/indels (not larger SVs).
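
As a rough illustration of that gating (in Python, not the package's R code), the end column would only be renamed when indels are enabled. Everything here is an assumption for the sake of the example: the function name `rename_end_column`, the `indels` flag standing in for the Indel parameter, and the shortened synonym set. The target name is left configurable because the final name has not been settled.

```python
# Hypothetical, shortened set of upper-cased "end" synonyms.
END_SYNONYMS = {"END", "POS2", "POSITION2", "END POS", "BP END", "END BP"}


def rename_end_column(columns, indels=True, end_name="BP2"):
    """Rename end-coordinate synonyms, but only when indels are enabled.

    `indels` is a stand-in for the Indel parameter discussed above;
    `end_name` is the standard name to rename to.
    """
    if not indels:
        # SNP-only mode: leave the header untouched.
        return list(columns)
    renamed = []
    for col in columns:
        if col.strip().upper() in END_SYNONYMS:
            renamed.append(end_name)
        else:
            renamed.append(col)
    return renamed


header = ["SNP", "CHR", "BP", "end", "P"]
print(rename_end_column(header, indels=True))
print(rename_end_column(header, indels=False))
```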

SVs are a bit more complicated, but you can run a GWAS with them in much the same way as you would with SNPs only. My former labmate did this in AD, so if we eventually decide to go in that direction it would be great to get his input. @ricardovialle would you mind giving some initial thoughts on this?

> Next time one of us is making changes to the dev branch it will probably be worth testing the effect of this.

Definitely!

> Do you know of any downstream software that uses start & end? What names do they require? Just want to make sure BP2 will be understandable to users and downstream applications

That's a good point, I picked BP2 for brevity but the typical nomenclature is "start"/"end" when dealing with ranged data (based on GenomicRanges). So maybe "BPEND" would be closer? That seems less obvious though when it's all uppercase.

That said, anyone who wants to do downstream analysis with ranged data is likely going to convert to GRanges anyway, which fortunately is already an export option for MSS. So as long as MSS has a way of converting back and forth between data.table and GRanges format, that should cover most use cases. One other nice thing would be to add BED as one of the write formats. That way, it can automatically be read in as a GRanges object by tools like rtracklayer::import.
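
One detail a BED writer would need to handle is the coordinate convention: BED uses 0-based, half-open intervals, while sumstats positions and GRanges are 1-based and closed. Below is a minimal Python sketch of that conversion (not the package's R code). It assumes rows already carry the standardised CHR, BP and optional BP2 columns; `to_bed_lines` is a hypothetical helper name, and using SNP as the BED name field is also an assumption.

```python
def to_bed_lines(rows):
    """Convert 1-based, inclusive CHR/BP/BP2 rows to BED lines.

    BED is 0-based and half-open, so the start is shifted down by one
    and the inclusive end is used unchanged as the exclusive end.
    Rows without BP2 are treated as 1bp-wide features.
    """
    lines = []
    for row in rows:
        start = int(row["BP"])
        bp2 = row.get("BP2")
        end = start if bp2 is None else int(bp2)
        fields = [
            str(row["CHR"]),
            str(start - 1),           # chromStart (0-based)
            str(end),                 # chromEnd (exclusive)
            str(row.get("SNP", ".")),  # name field, "." if missing
        ]
        lines.append("\t".join(fields))
    return lines


rows = [
    {"SNP": "rs1", "CHR": "1", "BP": 100},              # single-bp SNP
    {"SNP": "indel1", "CHR": "2", "BP": 200, "BP2": 205},  # ranged indel
]
for line in to_bed_lines(rows):
    print(line)
```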

Regarding analysis software, I suppose it depends on how you want to use your sumstats. One tool that comes to mind is goshifter. It takes a set of ranged annotations and tests for enrichment against a list of non-ranged SNPs. In theory, you could input ranged annotations derived from a GWAS with SNPs/indels and run enrichment against a list of SNPs from another GWAS that only has single-bp SNPs (or some other source).

> Yep happy to add the extra CHR mappings but put them in as upper case as all inputted headers are pushed toupper() anyway (this doesn't really matter since the same is done to the entries in the mapping file but just so they are the same as what's there currently)

Makes sense!

@Al-Murphy
Owner

I actually had a few other concerns around checks on this with the reference datasets. Probably worth discussing in person before we commit to adding the functionality (and maybe including Nathan too)

@ricardovialle

> SVs are a bit more complicated, but you can run GWAS with them very similar to the way you would run one with SNPs only. My former labmate did this in AD so if we eventually decide to go in that direction it would be great to get his input. @ricardovialle would you mind giving some initial thoughts on this?

Hi guys, this would definitely be something nice to have. I'm not aware of a specific standard for reporting SV summary stats. Even the VCFs created by SV callers sometimes do not follow the same specification. The VCF format specification usually expects an END field. In our results, we reported pos and sv_end (link). Also, you might want to consider CIPOS and CIEND, as many SVs have imprecise breakpoints.
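
For reference, here is a small Python sketch of how END, CIPOS and CIEND could be pulled from a VCF INFO string and turned into breakpoint ranges. It is only an illustration: `parse_info` and `sv_breakpoints` are hypothetical helpers, the example record is made up, and the output keys `pos` and `sv_end` simply follow the column names mentioned above.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def sv_breakpoints(pos, info):
    """Return start/end plus their confidence ranges for one SV record.

    CIPOS and CIEND are offsets around POS and END; when they are
    absent the breakpoint is treated as precise (offsets of 0).
    """
    fields = parse_info(info)
    end = int(fields.get("END", pos))
    cipos = [int(x) for x in fields.get("CIPOS", "0,0").split(",")]
    ciend = [int(x) for x in fields.get("CIEND", "0,0").split(",")]
    return {
        "pos": pos,
        "sv_end": end,
        "pos_range": (pos + cipos[0], pos + cipos[1]),
        "end_range": (end + ciend[0], end + ciend[1]),
    }


# Made-up record for illustration only.
record = sv_breakpoints(
    321682, "IMPRECISE;SVTYPE=DEL;END=321887;CIPOS=-56,20;CIEND=-10,62"
)
print(record)
```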

@bschilder
Collaborator Author

Thanks so much for the input @ricardovialle! Alan and I will chat about all this in our next meeting and we'll let you know the plan here.

@Al-Murphy
Owner

Added extra CHR mappings
