Building the DB from prefetched txt files #98

Open
harish0201 opened this issue May 8, 2019 · 5 comments

@harish0201

Hi!

I was able to build a test package for a few plants I'm currently working on, using the GFF files. Unfortunately, as mentioned in the vignette, it has no other information relating to proteins, pathways, ontologies etc.

I downloaded the text files from Ensembl Plants for one of the genomes (A. thaliana), and I'll do the same for some of the other plants for local use afterwards. I'm using the following command to build the local database. The text files were retrieved from the mysql folder on the FTP site.

DB<- makeEnsemblSQLiteFromTables(path="arabidopsis", dbname="a_thal")

Error in makeEnsemblSQLiteFromTables(path = "arabidopsis", dbname = "a_thal") :

Something went wrong! I'm missing some of the txt files the perl script should have generated.

The files are attached in the screenshot.
[screenshot: arabidopsis]

What should I do so as to get the build to progress? I'm curious to know if I'm doing something wrong.

This may be a stupid question, but do I need to substitute the link somewhere so that the DB is fetched internally from http://mysql-eg-publicsql.ebi.ac.uk/ ?

@jorainer
Owner

jorainer commented May 8, 2019

Hi @harish0201 !

I was probably not clear in the vignette, but in order to use the makeEnsemblSQLiteFromTables function you first need to extract the corresponding data from an Ensembl database using the fetchTablesFromEnsembl function (which in turn uses the Ensembl Perl API to extract the data).

You now have two options:

  1. the hard way: install the Ensembl Perl API (https://www.ensembl.org/info/docs/api/core/core_tutorial.html) locally on your computer and use the fetchTablesFromEnsembl to create the database.
  2. the easy way: tell me which species and Ensembl/Ensemblgenome version you need and I will build the EnsDb database for you.
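Before retrying the build, it can help to see which table files are actually absent from the folder. The helper below is only a sketch: `missing_ensdb_tables` is a hypothetical name, and the list of file names is an assumption about what the ensembldb Perl script writes, so check it against the installed package version.

```shell
# Hypothetical helper: print the expected table files that are missing
# from a folder (one name per line; prints nothing if all are present).
# ASSUMPTION: these file names match what ensembldb's Perl script
# generates; verify against your installed ensembldb version.
missing_ensdb_tables() {
    local dir="$1" f
    for f in ens_gene.txt ens_tx.txt ens_exon.txt \
             ens_tx2exon.txt ens_chromosome.txt ens_metadata.txt; do
        # Report the file only when it does not exist in the folder.
        [ -f "$dir/$f" ] || echo "$f"
    done
}

# Usage (the folder name is taken from the call in the first post):
#   missing_ensdb_tables arabidopsis
```

If the raw MySQL dump files are in the folder instead, every name in the list will be reported as missing, which would be consistent with the error above.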

@harish0201
Author

harish0201 commented May 9, 2019

Ah, thank you. I'm planning on doing it the hard way because I don't want to pester you again and again. And I might as well learn something new :)

Currently I'm planning on building the database for Vigna radiata. I'll also be working on the transcriptomes of many other plants from Ensembl, so I'd rather do it here than hope for a miracle from your side.

Is this link valid though? http://mysql-eg-publicsql.ebi.ac.uk/ or do I need to substitute something there?

Would it be possible to build the SQLite DB from the fetched txt files if they are functionally the same? Then, instead of waiting for the API calls to go through or fail, I could automate the downloads on my side and just build the databases.

I did try the following:

fetchTablesFromEnsembl(43, user="anonymous", host="ftp://ftp.ensemblgenomes.org/pub/release-43/plants/mysql/", pass="", port=4157, species="arabidopsis_thaliana_core_43_96_11")

The submitted Ensembl version (43) does not match the version of the Ensembl API (96). Please configure the environment variable ENS to point to the correct API. at /home/harish/R/x86_64-pc-linux-gnu-library/3.4/ensembldb/perl/get_gene_transcript_exon_tables.pl line 101.

Edit: (In hindsight, I should have searched for this beforehand: http://ensemblgenomes.org/info/access/mysql)
However, this does work:
fetchTablesFromEnsembl(96, species = "arabidopsis thaliana", host="mysql-eg-publicsql.ebi.ac.uk", port=4157)

In the meantime I'll figure out a way to build this.

Thanks for the help!

@jorainer
Owner

jorainer commented May 9, 2019

> Ah, thank you. I'm planning on doing it the hard way because I don't want to pester you again and again. And I might as well learn something new :)

Very brave! Just keep me updated!

> Is this link valid though? http://mysql-eg-publicsql.ebi.ac.uk/ or do I need to substitute something there?

Honestly, I don't know what the public database for ensemblgenomes is - but definitely without the http.

> Would it be possible to build the sqlite db from the fetched txt files if they are functionally the same? Because then instead of waiting for the api calls to go through/fail, I can probably automate the downloads from my side and then just build the databases.

There is a possibility, and it is actually how I do it: there are some functions in inst/scripts of the installed package (or see here https://github.com/jorainer/ensembldb/blob/master/inst/scripts/generate-EnsDBs.R). What you need for that is a local MySQL server (5.6, or better MariaDB 10.0 - higher versions won't work) to which you have write access. You could then use the createEnsDbForSpecies function. This function downloads the MySQL database dump for a species from Ensembl, imports the database into your local MySQL server and then uses the ensembldb tools to create the EnsDb SQLite database.

I'd suggest you try it first with something from Ensembl, like

createEnsDbForSpecies(ens_version = 96, species = "mus_musculus", user = <your local mysql user>, pass = <your local mysql pass>, host = <your local host running the mysql server, e.g. "localhost">)

For ensemblgenomes you would have to specify the ftp_folder, and I guess the ens_version would then be the ensemblgenome release number.
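As a rough illustration of where such a dump folder would live, the sketch below rebuilds the URL layout that appears earlier in this thread (`ftp://ftp.ensemblgenomes.org/pub/release-43/plants/mysql/`). `eg_mysql_dump_url` is a hypothetical helper, and whether createEnsDbForSpecies expects exactly this string as `ftp_folder` is an assumption that should be checked against generate-EnsDBs.R.

```shell
# Hypothetical helper: build the URL of an Ensembl Genomes MySQL dump folder.
# ASSUMPTION: the layout is
#   ftp://ftp.ensemblgenomes.org/pub/release-<release>/<division>/mysql/<database>/
# as suggested by the FTP path quoted earlier in this thread.
eg_mysql_dump_url() {
    local release="$1" division="$2" db="$3"
    printf 'ftp://ftp.ensemblgenomes.org/pub/release-%s/%s/mysql/%s/\n' \
        "$release" "$division" "$db"
}

# Usage:
#   eg_mysql_dump_url 43 plants arabidopsis_thaliana_core_43_96_11
```

The printed URL could then be browsed or handed to a download tool to fetch the dump files for local import.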

@harish0201
Author

Ah well,

I've taken to using it as such:

perl /home/harish/R/x86_64-pc-linux-gnu-library/3.4/ensembldb/perl/get_gene_transcript_exon_tables.pl -e 96 -H mysql-eg-publicsql.ebi.ac.uk -p 4157 -U anonymous -s "vigna_radiata" &

And it seems to be working currently. I don't know if it's supposed to be this slow, because I've got some downloads going on as well :)

But looking at the dumps the Perl script seems to be generating, I gather that the same could be done using BioMart as well, so I'm looking at the alternatives.

fetchTablesFromEnsembl(96, species = "vigna radiata", host="mysql-eg-publicsql.ebi.ac.uk", port=4157)

Note that the first argument is definitely the Ensembl API version (96), not the Ensembl Genomes release version (43), which is what I had thought initially. Other than that it works!
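The two version numbers can be read off the core database name itself. The sketch below assumes the Ensembl Genomes naming pattern `<species>_core_<EG release>_<Ensembl version>_<assembly>`, which matches the name `arabidopsis_thaliana_core_43_96_11` and the 43 vs. 96 error message earlier in this thread. The function names are hypothetical.

```shell
# Hypothetical helpers: split an Ensembl Genomes core database name into
# its version fields.
# ASSUMPTION: names follow <species>_core_<EG release>_<Ensembl version>_<assembly>.

# Ensembl (API) version: the second field after "_core_".
ensembl_api_version() {
    printf '%s\n' "${1##*_core_}" | cut -d_ -f2
}

# Ensembl Genomes release: the first field after "_core_".
eg_release() {
    printf '%s\n' "${1##*_core_}" | cut -d_ -f1
}

# Usage:
#   ensembl_api_version arabidopsis_thaliana_core_43_96_11
#   eg_release arabidopsis_thaliana_core_43_96_11
```

The value from `ensembl_api_version` is the one that would go into the `-e` option of the Perl script or the first argument of fetchTablesFromEnsembl.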

@jorainer
Owner

jorainer commented May 9, 2019

Regarding speed: yes, it is slow. I had the impression that it is faster when I downloaded the mysql dumps locally and ran the code locally.

Regarding BioMart - I don't know if you can get all the data from there. BioMart and Ensembl are different databases, and not everything that is in Ensembl is necessarily also in BioMart. I prefer to use the Ensembl Perl API, which queries the original Ensembl databases.
