Building the DB from prefetched txt files #98

harish0201 · 2019-05-08T12:44:17Z

Hi!

I was able to build a test package for a few plants I'm currently working on using the GFF files. Unfortunately, as mentioned in the vignette, it has no other information relating to proteins, pathways, ontologies etc.

I downloaded the text files from Ensembl Plants for one of the genomes (A. thaliana), and I plan to do the same for some of the other plants for local use. I'm using the following command to build the local database; the input is the set of text files retrieved from the mysql folder on the FTP server.

DB <- makeEnsemblSQLiteFromTables(path = "arabidopsis", dbname = "a_thal")

Error in makeEnsemblSQLiteFromTables(path = "arabidopsis", dbname = "a_thal") :

Something went wrong! I'm missing some of the txt files the perl script should have generated.

The files are attached in the screenshot.

What should I do so as to get the build to progress? I'm curious to know if I'm doing something wrong.

This is going to be stupid, but do I substitute the link somewhere in order to fetch the db internally from the link: http://mysql-eg-publicsql.ebi.ac.uk/

jorainer · 2019-05-08T16:42:30Z

Hi @harish0201 !

Probably I was not clear in the vignette, but to use makeEnsemblSQLiteFromTables you first need to extract the corresponding data from an Ensembl database using the fetchTablesFromEnsembl function (which in turn uses the Ensembl Perl API to extract the data). The txt files in the mysql folder on the FTP server are raw database dumps, not the files that function writes.
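To see which input files the build step is complaining about, a small base R check can help. This is only a sketch: the file names in `expected_files` are an assumption about what the Perl script wrote at the time of this thread, and `missing_ensdb_tables` is a hypothetical helper, not part of ensembldb.

```r
# ASSUMPTION: these are the txt files the Perl script is expected to write.
# Adjust the vector if your ensembldb version produces a different set.
expected_files <- c("ens_gene.txt", "ens_tx.txt", "ens_exon.txt",
                    "ens_tx2exon.txt", "ens_chromosome.txt",
                    "ens_metadata.txt")

# Hypothetical helper: return the expected files that are absent from `path`.
missing_ensdb_tables <- function(path, expected = expected_files) {
    expected[!file.exists(file.path(path, expected))]
}

# Example: a freshly created, empty directory is missing every file.
empty_dir <- tempfile("ensdb_check_")
dir.create(empty_dir)
missing_ensdb_tables(empty_dir)
```

Running `missing_ensdb_tables("arabidopsis")` on a folder of raw MySQL dump files would be expected to list all of them as missing, since the dumps use the database's own table names.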

You now have two possibilities:

- the hard way: install the Ensembl Perl API (https://www.ensembl.org/info/docs/api/core/core_tutorial.html) locally on your computer and use fetchTablesFromEnsembl to extract the data and create the database.
- the easy way: tell me which species and Ensembl/Ensemblgenomes version you need and I will build the EnsDb database for you.
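The hard way boils down to two calls, roughly as sketched here. `build_ensdb` is a hypothetical wrapper, not part of ensembldb; it assumes the Perl API is installed and matches `ens_version`, and that fetchTablesFromEnsembl writes its txt files into the current working directory.

```r
# Sketch only: needs the ensembldb package, a local Ensembl Perl API whose
# version matches `ens_version`, and network access to the database host.
# `build_ensdb` is a hypothetical wrapper name.
build_ensdb <- function(ens_version, species, host, port) {
    # Step 1: query the Ensembl database through the Perl API. This is
    # assumed to write the intermediate txt files to the working directory.
    ensembldb::fetchTablesFromEnsembl(ens_version, species = species,
                                      host = host, port = port)
    # Step 2: build the SQLite database from those txt files.
    ensembldb::makeEnsemblSQLiteFromTables(path = ".")
}

# Not run here (slow, needs the Perl API); values taken from later in this
# thread:
# build_ensdb(96, "vigna radiata", "mysql-eg-publicsql.ebi.ac.uk", 4157)
```

Using `ensembldb::` inside the function means the wrapper can be defined without loading the package; the calls only fail when it is actually run without the prerequisites.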

harish0201 · 2019-05-09T04:34:12Z

Ah, thank you. I'm planning on doing it the hard way because I don't want to pester you again and again. And I might as well learn something new :)

Currently I'm planning on building the database for Vigna radiata. I'll also be working on the transcriptomes of many other plants from Ensembl, so I'd rather do it here than hope for a miracle from your side.

Is this link valid though? http://mysql-eg-publicsql.ebi.ac.uk/ or do I need to substitute something there?

Would it be possible to build the sqlite db from the fetched txt files if they are functionally the same? Because then, instead of waiting for the API calls to go through/fail, I can probably automate the downloads from my side and then just build the databases.

I did try the following:

`fetchTablesFromEnsembl(43, user="anonymous", host="ftp://ftp.ensemblgenomes.org/pub/release-43/plants/mysql/", pass="", port=4157, species="arabidopsis_thaliana_core_43_96_11")`

which failed with:

`The submitted Ensembl version (43) does not match the version of the Ensembl API (96). Please configure the environment variable ENS to point to the correct API. at /home/harish/R/x86_64-pc-linux-gnu-library/3.4/ensembldb/perl/get_gene_transcript_exon_tables.pl line 101.`

Edit: (In hindsight, I should have searched for this beforehand: http://ensemblgenomes.org/info/access/mysql)
However, this does work:
fetchTablesFromEnsembl(96, species = "arabidopsis thaliana", host="mysql-eg-publicsql.ebi.ac.uk", port=4157)

In the meantime I'll figure out a way to build this.

Thanks for the help!

jorainer · 2019-05-09T06:59:09Z

> Ah, thank you. I'm planning on doing it the hard way because I don't want to pester you again and again. And I might as well learn something new :)

Very brave! Just keep me updated!

> Is this link valid though? http://mysql-eg-publicsql.ebi.ac.uk/ or do I need to substitute something there?

Honestly, I don't know what the public database for ensemblgenomes is - but definitely without the http.

> Would it be possible to build the sqlite db from the fetched txt files if they are functionally the same? Because then, instead of waiting for the API calls to go through/fail, I can probably automate the downloads from my side and then just build the databases.

That is possible, and it is actually how I do it: there are some functions in inst/scripts of the installed package (or see https://github.com/jorainer/ensembldb/blob/master/inst/scripts/generate-EnsDBs.R). What you need is a local MySQL server (5.6, or better MariaDB 10.0; higher versions won't work) to which you have write access. You can then use the createEnsDbForSpecies function, which downloads the MySQL database dump for a species from Ensembl, imports it into your local MySQL server and then uses the ensembldb tools to create the EnsDb SQLite database.

I'd suggest you try it first with something from Ensembl, like

createEnsDbForSpecies(ens_version = 96, species = "mus_musculus", user = <your local mysql user>, pass = <your local mysql pass>, host = <your local host running the mysql server, e.g. "localhost">)

For ensemblgenomes you would have to specify the ftp_folder, and I guess the ens_version would then be the ensemblgenome release number.

harish0201 · 2019-05-09T11:33:40Z

Ah well,

I've taken to using it as such:

perl /home/harish/R/x86_64-pc-linux-gnu-library/3.4/ensembldb/perl/get_gene_transcript_exon_tables.pl -e 96 -H mysql-eg-publicsql.ebi.ac.uk -p 4157 -U anonymous -s "vigna_radiata" &

And it seems to be working currently. I don't know if it's supposed to be this slow, because I've got some downloads going on as well :)

But looking at the dumps the Perl script seems to be generating, I'd gather that the same can be done using BioMart as well, so I'm looking at the alternatives.

fetchTablesFromEnsembl(96, species = "vigna radiata", host="mysql-eg-publicsql.ebi.ac.uk", port=4157)

The first argument is definitely the Ensembl API version (96), not the Ensembl Genomes release version (43) as I had initially thought. Other than that it works!
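The two version numbers can be read off the core database name. The sketch below assumes the naming pattern `<species>_core_<Ensembl Genomes release>_<Ensembl version>_<assembly>`, which is inferred from the single name used earlier in this thread; `parse_core_db_name` is a hypothetical helper, not part of ensembldb.

```r
# ASSUMPTION: core database names follow
# <species>_core_<Ensembl Genomes release>_<Ensembl version>_<assembly>.
# Hypothetical helper: split such a name into its parts.
parse_core_db_name <- function(db_name) {
    pattern <- "^([a-z0-9_]+)_core_([0-9]+)_([0-9]+)_([0-9a-z]+)$"
    m <- regmatches(db_name, regexec(pattern, db_name))[[1]]
    if (length(m) == 0)
        stop("not a recognised core database name: ", db_name)
    list(species = gsub("_", " ", m[2]),       # form used in the working call
         eg_release = as.integer(m[3]),        # Ensembl Genomes release
         ensembl_version = as.integer(m[4]))   # value for fetchTablesFromEnsembl
}

info <- parse_core_db_name("arabidopsis_thaliana_core_43_96_11")
info$species          # "arabidopsis thaliana"
info$eg_release       # 43
info$ensembl_version  # 96
```

Here `ensembl_version` is the number that worked as the first argument of fetchTablesFromEnsembl, while `eg_release` is the one that triggered the version mismatch error.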

jorainer · 2019-05-09T12:40:41Z

Regarding speed: yes, it is slow. I had the impression that it is faster when I downloaded the mysql dumps locally and ran the code locally.

Regarding BioMart: I don't know if you can get all the data from there. BioMart and Ensembl are different databases, and not everything that is in Ensembl is necessarily also in BioMart. I prefer to use the Ensembl Perl API, which queries the original Ensembl databases.
