how to use masking repeat file (.bed) in the reference genome #25

Open
sekhwal opened this issue Feb 7, 2022 · 4 comments

sekhwal commented Feb 7, 2022

Hi,
I am working with 70 assembled genomes to identify core SNPs, and I would like to mask repeats using a .bed file (generated with MUMmer). Please let me know how to use this file in the analysis with the following command.

parsnp -g /ref/ref_genomic -d scaffolds/*.fasta -c

Thank you!

gongyh commented Jul 21, 2023

I have a similar issue. How can I use soft- or hard-masked genomes?

bkille (Contributor) commented Nov 16, 2023

Hi @sekhwal and @gongyh,

There is currently no way to mask files through Parsnp. However, you can provide Parsnp with soft-masked or hard-masked genomes.
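
For anyone starting from a repeat .bed file, the masking therefore has to happen before Parsnp is run. Below is a minimal sketch in Python, assuming standard BED coordinates (0-based, half-open) and sequence names that match the first word of each FASTA header. The helper names `parse_bed`, `mask_sequence`, and `mask_fasta` are hypothetical and are not part of Parsnp.

```python
from collections import defaultdict


def parse_bed(bed_text):
    """Return {sequence_name: [(start, end), ...]} from BED text."""
    intervals = defaultdict(list)
    for line in bed_text.splitlines():
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        intervals[fields[0]].append((int(fields[1]), int(fields[2])))
    return intervals


def mask_sequence(seq, intervals, soft=False):
    """Hard-mask (N) or soft-mask (lowercase) the given intervals."""
    chars = list(seq)
    for start, end in intervals:
        # Clamp to the sequence so out-of-range intervals are ignored.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


def mask_fasta(fasta_text, bed_text, soft=False, width=60):
    """Return FASTA text with the BED intervals masked."""
    intervals = parse_bed(bed_text)
    records = []  # each entry is [header, list_of_sequence_lines]
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], []])
        elif records:
            records[-1][1].append(line)

    out = []
    for header, chunks in records:
        # Assumption: the BED name equals the first word of the header.
        seq_id = header.split()[0] if header.split() else ""
        seq = mask_sequence("".join(chunks), intervals.get(seq_id, []), soft)
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    fasta = ">chr1 example\nACGTACGTAC\n>chr2\nGGGGCCCC\n"
    bed = "chr1\t2\t5\nchr2\t0\t3\n"
    print(mask_fasta(fasta, bed), end="")
```

The masked FASTA files would then be passed to Parsnp in place of the originals. `bedtools maskfasta` performs the same operation on files (hard-masking by default, soft-masking with `-soft`) and is likely the more practical choice for 70 genomes.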

bkille closed this as completed Nov 16, 2023

gongyh commented Nov 17, 2023

Thanks for your reply. When I tested, soft-masked genomes gave the same results as unmasked ones. However, hard-masked bases are identified as SNPs unless all of the genomes are strictly and correctly hard-masked.

bkille reopened this Nov 17, 2023

bkille (Contributor) commented Nov 17, 2023

Ahh I see, sorry for misunderstanding the issue.

In terms of the core-genome alignment:
Soft-masking the genomes won't impact the resulting core-genome alignment; hard-masking, however, might. If a hard-masked region exists between two anchors for the alignment, MUSCLE will likely align through the region. However, hard-masked regions cannot be selected as anchors.

In terms of the variants and resulting tree:
This is a good point, and there should be an option to only use SNPs if the reference allele is not hard-masked. I'll transfer this ticket to harvesttools, the program responsible for identifying the variants from the XMFA.
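
To illustrate the proposed option, here is a rough sketch in Python. It is not harvesttools code: it assumes a naive caller that reports any alignment column with more than one distinct non-gap character as a SNP, and the function name `snp_columns` and its `skip_masked_ref` flag are hypothetical.

```python
def snp_columns(alignment, ref_index=0, skip_masked_ref=True):
    """Return [(column, alleles)] for variable columns of an alignment.

    `alignment` is a list of equal-length aligned sequences. When
    `skip_masked_ref` is true, columns where the reference base is a
    hard-masked N are dropped instead of being reported as SNPs.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")

    snps = []
    for col in range(length):
        column = [seq[col].upper() for seq in alignment]
        if skip_masked_ref and column[ref_index] == "N":
            continue  # reference allele is hard-masked
        # Assumption: gaps are not treated as alleles in this sketch.
        alleles = set(column) - {"-"}
        if len(alleles) > 1:
            snps.append((col, "".join(column)))
    return snps


if __name__ == "__main__":
    # Only the reference (first row) is hard-masked at column 2.
    aln = ["ACNTA", "ACGTA", "ATGTA"]
    print(snp_columns(aln, skip_masked_ref=False))
    print(snp_columns(aln, skip_masked_ref=True))
```

Without the filter, the column where only the reference is masked is reported as a variant, which matches the behaviour described above. With the filter, only the genuine C/T column remains. The sketch covers the reference allele only; a column where a non-reference genome is masked would still be reported.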

bkille transferred this issue from marbl/parsnp Nov 17, 2023