Compatibility between EnsDb objects and BSgenome objects #159

Open
mschubert opened this issue Aug 14, 2024 · 1 comment

Comments

@mschubert

I'm trying to use the VariantAnnotation package to annotate a VCF file from nf-core/sarek (UCSC style) using AnnotationHub EnsDb objects (Ensembl style), where I encountered the following issues:

  1. The VariantAnnotation package does not accept EnsDb objects. I raised this upstream (Bioconductor/VariantAnnotation#74, "Allow 'ANY' txdb, eg. EnsDb objects"), but it will probably not get fixed, so I worked around it by providing my own S4 method in my code
  2. The seqlevelsStyle() of the VCF (UCSC) and of the EnsDb (Ensembl) do not match

For point (2), I can change the EnsDb style to UCSC:

library(ensembldb)  # attaches the EnsDb class and its seqlevelsStyle<- method
ens106 = AnnotationHub::AnnotationHub()[["AH100643"]]
seqlevelsStyle(ens106) = "UCSC"

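and then either change the Ensembl-style genome to UCSC or load a UCSC-style genome directly:

# Option A: NCBI/Ensembl-style assembly, renamed to UCSC style
asm = BSgenome.Hsapiens.NCBI.GRCh38::BSgenome.Hsapiens.NCBI.GRCh38
seqlevelsStyle(asm) = "UCSC"

# Option B: assembly that already uses UCSC style
asm = BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38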

However, this is not enough: the genome() of the EnsDb object is GRCh38, while that of the assembly is hg38, which raises an assertion error in VariantAnnotation. Working around it requires an additional line that changes the internal state of the BSgenome S4 object, because the genome of the EnsDb object cannot be changed:

genome(ens106)[] = "hg38"
# Error: unable to find an inherited method for function 'seqinfo<-' for signature 'x = "EnsDb"'

asm@seqinfo@genome[] = "GRCh38" # works, but is messing with internals
# this was previously the default, but they explicitly changed it

I'm not sure what a good solution is here. It seems to me that checking whether the genomes are identical (as VariantAnnotation does) is reasonable. I'm raising this issue to document the problem rather than to suggest a change in ensembldb.
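
To make the mismatch described above visible before handing both objects to VariantAnnotation, a small diagnostic along these lines may help. This is only a sketch: `compare_seqinfo` is a hypothetical helper, not part of ensembldb, GenomeInfoDb or VariantAnnotation, and it assumes both inputs are plain Seqinfo objects (for example from `seqinfo(ens106)` and `seqinfo(asm)`).

```r
library(GenomeInfoDb)

# Hypothetical helper (not from any package): report the genome labels of two
# Seqinfo objects and the sequence names they have in common.
compare_seqinfo <- function(a, b) {
  genome_a <- unique(unname(genome(a)))
  genome_b <- unique(unname(genome(b)))
  list(
    genome_a = genome_a,
    genome_b = genome_b,
    same_genome = identical(genome_a, genome_b),
    shared_seqlevels = intersect(seqlevels(a), seqlevels(b))
  )
}

# Toy input standing in for the real objects: same naming style,
# different genome labels.
ucsc_like <- Seqinfo(c("chr1", "chr2"), genome = "hg38")
ens_like  <- Seqinfo(c("chr1", "chrX"), genome = "GRCh38")
res <- compare_seqinfo(ucsc_like, ens_like)

# With the objects from this issue one would call, for example:
# compare_seqinfo(seqinfo(ens106), seqinfo(asm))
```

A `same_genome` value of `FALSE` together with a non-empty `shared_seqlevels` corresponds to the situation in this issue: the sequence names already agree, but the genome labels do not.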

@jorainer
Owner

Thanks for reporting. Is there no way to create or load a BSgenome or other supported genome sequence object from AnnotationHub? The TwoBit files from Ensembl should also be accessible through AnnotationHub, so maybe these can be used? The advantage would be that you could make sure the EnsDb and the genome sequence come from the same Ensembl release. I'm just a bit worried that hg38 is not exactly identical to the GRCh38 version used by Ensembl...
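
A rough sketch of how such TwoBit records might be looked up is shown below. `find_twobit` is a hypothetical wrapper around `AnnotationHub::query()`, and the search terms are assumptions about how the records are labelled in the hub metadata. Whether VariantAnnotation accepts the resulting TwoBitFile as a sequence source has not been checked here.

```r
# Hypothetical helper: list AnnotationHub records that look like TwoBit
# genome files for a given species and genome build.
find_twobit <- function(hub, species = "Homo sapiens", build = "GRCh38") {
  AnnotationHub::query(hub, c("TwoBitFile", species, build))
}

# Usage (needs network access). Inspect the hits and pick the record whose
# Ensembl release matches the release of the EnsDb:
# hub  <- AnnotationHub::AnnotationHub()
# hits <- find_twobit(hub)
# S4Vectors::mcols(hits)[, c("title", "rdatadateadded")]
```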
