Column name handling: start/end #96
Oh, also: a few other minor edits to the CHR mapping, which was updated recently, by adding: --> "CHR"
This makes sense, though we should probably only do it if the Indel parameter is set to true? The next time one of us makes changes to the dev branch, it will probably be worth testing the effect of this. Do you know of any downstream software that uses start & end? What names does it require? I just want to make sure BP2 will be understandable to users and downstream applications.
Yep, happy to add the extra CHR mappings, but let's put them in as upper case, since all input headers are passed through toupper() anyway. (This doesn't really matter, because the same is done to the entries in the mapping file, but it keeps them consistent with what's there currently.)
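To illustrate why the case of the mapping entries doesn't affect matching, here is a minimal Python sketch, assuming the mapping file can be read as (uncorrected, corrected) pairs. The names build_lookup and standardise_headers and the example synonyms are hypothetical, not MSS functions. Because both the input headers and the mapping entries are uppercased before comparison, "Chrom", "CHROM" and "chrom" all resolve to the same key.

```python
def build_lookup(mapping_rows):
    """Build an uppercase synonym -> standard-name lookup.

    mapping_rows: iterable of (uncorrected, corrected) pairs, a hypothetical
    stand-in for the rows of the MSS mapping file.
    """
    return {raw.upper(): std.upper() for raw, std in mapping_rows}


def standardise_headers(headers, lookup):
    """Uppercase each input header, then rename it if it has a known synonym."""
    renamed = []
    for header in headers:
        key = header.upper()
        # Unknown headers are kept, but in their uppercased form.
        renamed.append(lookup.get(key, key))
    return renamed


# Illustrative synonyms only; mixed case on purpose.
rows = [("chrom", "CHR"), ("Chromosome", "CHR"), ("pos", "BP")]
lookup = build_lookup(rows)
print(standardise_headers(["Chrom", "POS", "beta"], lookup))
# ['CHR', 'BP', 'BETA']
```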
I think this makes sense for now, given that MSS currently only covers SNPs/indels (not larger SVs). SVs are a bit more complicated, but you can run a GWAS with them in much the same way as you would with SNPs only. My former labmate did this in AD, so if we eventually decide to go in that direction it would be great to get his input. @ricardovialle would you mind giving some initial thoughts on this?
Definitely!
That's a good point. I picked BP2 for brevity, but the typical nomenclature is "start"/"end" when dealing with ranged data (based on That said, anyone who wants to do downstream analysis with ranged data is likely going to convert to Regarding the analysis software, I suppose it depends on how you want to use your sumstats. One that comes to mind is
Makes sense!
I actually have a few other concerns about how this would be checked against the reference datasets. It's probably worth discussing in person (maybe including Nathan too) before we commit to adding the functionality.
Hi guys, this would definitely be nice to have. I'm not aware of a specific standard for reporting SV summary stats. Even the VCFs created by SV callers sometimes do not follow the same specification. The VCF format specification usually expects an END field. In our results, we reported pos and sv_end (link). You might also want to consider CIPOS and CIEND, as many SVs have imprecise breakpoints.
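As a rough illustration of the VCF fields mentioned above, the Python sketch below derives a start/end interval, plus the confidence ranges implied by CIPOS and CIEND, from a single record's POS, REF and INFO values. It assumes a well-formed record; when END is absent it falls back to POS + len(REF) - 1, the span implied for non-symbolic alleles. The function names are hypothetical, and since real SV callers may deviate from the specification, treat this as a sketch rather than a robust parser.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag keys (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item:
            fields[item] = True
    return fields


def sv_interval(pos, ref, info):
    """Return (start, end, start_range, end_range) for one VCF record.

    END defaults to pos + len(ref) - 1 when absent (e.g. SNPs, small indels).
    CIPOS/CIEND are offsets relative to POS and END; missing ones default to 0,0.
    """
    fields = parse_info(info)
    end = int(fields.get("END", pos + len(ref) - 1))
    ci_pos = [int(x) for x in fields.get("CIPOS", "0,0").split(",")]
    ci_end = [int(x) for x in fields.get("CIEND", "0,0").split(",")]
    start_range = (pos + ci_pos[0], pos + ci_pos[1])
    end_range = (end + ci_end[0], end + ci_end[1])
    return pos, end, start_range, end_range


# An imprecise deletion with breakpoint confidence intervals.
print(sv_interval(1000, "A", "SVTYPE=DEL;END=1500;CIPOS=-10,10;CIEND=-5,20"))
# (1000, 1500, (990, 1010), (1495, 1520))
```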
Thanks so much for the input, @ricardovialle! Alan and I will discuss all this at our next meeting and let you know the plan here.
Added extra CHR mappings
Currently, we don't have start/end in the mapping file. This isn't a problem for 1bp-wide features like SNPs (in the strict sense of the term), but once you include indels or structural variants you can have separate start and end columns spanning a range.
Might we want to come up with some way to rename these so that they're compatible with the full pipeline? That is, we probably want to keep "start" as "BP", since this is a cornerstone of how MSS works. But perhaps when synonyms of "end" occur, we can rename them to something like "BP2".
Mappings that come to mind:
--> "BP"
--> "BP2"
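A minimal Python sketch of the proposed renaming, assuming headers arrive as a list of strings. The synonym sets are hypothetical placeholders chosen for illustration, not a proposed list, and the indels flag is a stand-in for the Indel parameter raised in the discussion, so end columns are only renamed to BP2 when it is True. check_ranges is a hypothetical helper for the kind of sanity check a ranged BP/BP2 pair would need; for a 1bp feature such as a SNP, BP and BP2 would be equal.

```python
# Hypothetical synonym lists, for illustration only.
START_SYNONYMS = {"START", "POS_START"}
END_SYNONYMS = {"END", "STOP", "POS_END"}


def rename_range_columns(headers, indels=True):
    """Map start synonyms to BP and, when indels is True, end synonyms to BP2.

    Headers are uppercased first, mirroring how MSS treats input headers.
    With indels=False, end columns keep their (uppercased) original name.
    """
    renamed = []
    for header in headers:
        key = header.upper()
        if key in START_SYNONYMS:
            renamed.append("BP")
        elif indels and key in END_SYNONYMS:
            renamed.append("BP2")
        else:
            renamed.append(key)
    return renamed


def check_ranges(bp, bp2):
    """Return indices of rows whose end (BP2) lies before their start (BP)."""
    return [i for i, (start, end) in enumerate(zip(bp, bp2)) if end < start]


print(rename_range_columns(["chr", "Start", "end", "P"]))
# ['CHR', 'BP', 'BP2', 'P']
print(rename_range_columns(["Start", "end"], indels=False))
# ['BP', 'END']
```

Gating on the flag keeps the default SNP-only behaviour unchanged, which is one way to address the concern about only doing this when indels are enabled.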