Provide MassBank as SQLite database #181

Open
jorainer opened this issue Sep 8, 2021 · 9 comments

@jorainer

jorainer commented Sep 8, 2021

For some users a MySQL database might be a little too demanding (see also sneumann/xcms#534). It would be nice to additionally provide the MassBank data as an SQLite database. This would be more or less straightforward:

Convert the MySQL dump into SQLite SQL statements using mysql2sqlite:

```sh
./mysql2sqlite MassBank.sql | sqlite3 MassBank-2021.03.db
```

The only problem is that the views are not converted correctly (actually, it seems mysql2sqlite ignores them completely). To fix that, one could simply add the views afterwards using:

```sh
sqlite3 MassBank-2021.03.db < views.sql
```

where views.sql is simply a file containing the `CREATE VIEW ...` statements from the 01-init-massbank.sql script.
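The views step could also be scripted instead of maintaining views.sql by hand. The following is a minimal Python sketch, not part of the original proposal, that pulls the `CREATE VIEW` statements out of an init script and runs them against the converted SQLite database. It assumes the views in 01-init-massbank.sql are plain `CREATE [OR REPLACE] VIEW ... ;` statements with no `;` inside their bodies, and that their SQL is otherwise valid in SQLite; MySQL-only functions in a view body would still need manual translation. `extract_views` and `add_views` are hypothetical helper names.

```python
import re
import sqlite3

# Matches "CREATE VIEW ... ;" and "CREATE OR REPLACE VIEW ... ;", possibly
# spanning several lines. Assumes no ";" appears inside a view definition.
VIEW_RE = re.compile(r"CREATE\s+(?:OR\s+REPLACE\s+)?VIEW\b.*?;",
                     re.IGNORECASE | re.DOTALL)


def extract_views(sql_text: str) -> list[str]:
    """Return all CREATE VIEW statements found in a SQL script."""
    return [m.group(0) for m in VIEW_RE.finditer(sql_text)]


def add_views(con: sqlite3.Connection, sql_text: str) -> int:
    """Create the views from sql_text in an SQLite database; return their count."""
    statements = extract_views(sql_text)
    for stmt in statements:
        # SQLite has no "CREATE OR REPLACE VIEW"; fall back to plain CREATE VIEW.
        stmt = re.sub(r"^CREATE\s+OR\s+REPLACE\s+VIEW", "CREATE VIEW", stmt,
                      flags=re.IGNORECASE)
        con.execute(stmt)
    con.commit()
    return len(statements)
```

Usage would be along the lines of `add_views(sqlite3.connect("MassBank-2021.03.db"), open("01-init-massbank.sql").read())`. Passing a connection rather than a file path keeps the helper easy to try out against an in-memory database.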

With this SQLite database it would be super-easy to use the MassBank data in R:

```r
library(Spectra)
library(MsBackendMassbank)
library(DBI)      # provides dbConnect()
library(RSQLite)  # provides the SQLite() driver
con <- dbConnect(SQLite(), "MassBank-2021.03.db")
sps <- Spectra(con, source = MsBackendMassbankSql())
sps
```

```
MSn data (Spectra) with 86576 spectra in a MsBackendMassbankSql backend:
        msLevel precursorMz  polarity
      <integer>   <numeric> <integer>
1             2      179.07         1
2             2      179.07         1
...         ...         ...       ...
86575        NA          NA         0
86576        NA          NA         0
 ... 42 more variables/columns.
 Use  'spectraVariables' to list all of them.
```

So one would have full access to MassBank (cc @tsufz).

@YANGJJ93research

Dear @jorainer, I was wondering where the MassBank SQL data comes from. Is it the compilation of all spectra provided on the MassBank web page? I am asking so that I can be confident in my target identification.

@tsufz
Member

tsufz commented Oct 9, 2021

Hey @YangjjMSresearch, you can find massbank.sql on our GitHub site. It is available starting with database version 2020.11. We usually announce new releases on Twitter or on the MassBank Europe website; alternatively, you can watch the MassBank data repository.

Best wishes,
Tobias

@YANGJJ93research

@tsufz Noted, many thanks! I noticed that the massbank.sql database contains around 80,000 spectra, whereas the MoNA website lists 196,159 spectra. Is the massbank.sql repository different from the MoNA repository?

Best regards,
Junjie

@tsufz
Member

tsufz commented Oct 11, 2021

@YangjjMSresearch, I reviewed MassBank of America (MoNA). It actually provides 73k MassBank Europe records and 175k GC and LC records in total. MoNA and MassBank hold different datasets: in MoNA, the MassBank Europe records are only one part, alongside GNPS, HMDB, ReSPECT and others. Thus, the contents of MassBank Europe and MoNA differ. I am also not sure how often MoNA is updated.

The structure of the SQL files is also different: we provide a dump of our internal database, and MoNA provides a dump of theirs. @jorainer may be able to explain whether the MoNA SQL files are also usable.

If you want to use MassBank Europe records only, please use our databases. They guarantee reproducibility, as we provide versioned releases. In addition, you may use the MoNA SQL files, which contain various other libraries, as appropriate.

Best wishes,
Tobias

@jorainer
Author

I've never looked into the MoNA SQL files; I wasn't even aware that they provide their data as SQL. Maybe I'll have a look at that someday.

@meowcat

meowcat commented Jun 2, 2022

Hi all, I still think it would be great to serve an SQLite database that can be used directly with RforMassSpectrometry; otherwise, users have to do the conversion themselves...

@jorainer
Author

jorainer commented Jun 2, 2022

Agreed, @meowcat! Note that I have one pre-built database here: https://github.com/jorainer/SpectraTutorials/releases/tag/2021.03

I've also added a super-simple short function to MsBackendMassbank that extracts the database tables from the MySQL MassBank database and stores them in an SQLite database: https://github.com/rformassspectrometry/MsBackendMassbank/blob/master/inst/scripts/massbank-to-sqlite.R
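The linked script is written in R. As a rough illustration of the same idea (copy every table from the MySQL database into a fresh SQLite file), here is a minimal Python sketch. It assumes `src` is any DB-API 2.0 connection, for example one opened with a MySQL driver such as PyMySQL, and that table names are trusted, since they are interpolated into the SQL. Column types are not declared in the SQLite copy, and the values must be types the `sqlite3` module can bind (int, float, str, bytes, None); `Decimal` values from a MySQL DECIMAL column, for instance, would need converting first. `copy_tables` is a hypothetical helper, not part of MsBackendMassbank.

```python
import sqlite3
from typing import Iterable


def copy_tables(src, dest: sqlite3.Connection, tables: Iterable[str],
                batch_size: int = 10_000) -> dict[str, int]:
    """Copy each table from a DB-API connection `src` into SQLite `dest`.

    Returns the number of rows copied per table. Column types are not
    declared, and table names are assumed to be trusted identifiers.
    """
    counts: dict[str, int] = {}
    for table in tables:
        cur = src.cursor()
        cur.execute(f"SELECT * FROM {table}")
        columns = [d[0] for d in cur.description]
        column_list = ", ".join(f'"{c}"' for c in columns)
        dest.execute(f'CREATE TABLE "{table}" ({column_list})')
        placeholders = ", ".join("?" for _ in columns)
        insert = f'INSERT INTO "{table}" VALUES ({placeholders})'
        n = 0
        # Stream rows in batches so large tables are not loaded into memory at once.
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            dest.executemany(insert, rows)
            n += len(rows)
        cur.close()
        counts[table] = n
    dest.commit()
    return counts
```

Fetching in batches trades a few extra round trips for bounded memory use, which matters for the larger MassBank tables.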

My other plan is (if I finally find the time) to create such MassBank SQLite versions (maybe as CompDb databases, because they use the same database layout) for each new MassBank release and distribute them via Bioconductor's AnnotationHub. That would make it super-simple for users to search for and install any MassBank release.

@meowcat

meowcat commented Jan 30, 2023

Hi, I made a small converter that takes MassBank dumps and converts them to SQLite as well as mzVault format. It runs as a Docker container and has no external requirements.

https://github.com/meowcat/MassBank-convert

Note that the mzVault converter collapses compound information by InChIKey. Another function, which does not work well, collapses by 1D InChIKey, but it is deactivated in the config.
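To make the collapsing step concrete, here is a minimal Python sketch of grouping compound records by InChIKey. It is not taken from MassBank-convert: the record fields (`accession`, `name`, `inchikey`) and the choice to keep each distinct name plus the list of contributing records are assumptions for illustration, and every record is assumed to carry an InChIKey. The 1D variant mentioned above is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    inchikey: str
    names: list[str] = field(default_factory=list)
    accessions: list[str] = field(default_factory=list)


def collapse_by_inchikey(records: list[dict]) -> list[Compound]:
    """Merge records that share an InChIKey into one compound entry."""
    compounds: dict[str, Compound] = {}  # dicts keep insertion order
    for rec in records:
        key = rec["inchikey"]
        comp = compounds.setdefault(key, Compound(inchikey=key))
        # Keep each distinct name once, but remember every contributing record.
        if rec["name"] not in comp.names:
            comp.names.append(rec["name"])
        comp.accessions.append(rec["accession"])
    return list(compounds.values())
```

Grouping on the full key only merges records whose structures are identical down to stereochemistry and protonation; a coarser key would merge more aggressively, which is presumably why such a variant is harder to get right.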

Can we add that to some CI that generates "best-effort community-contributed conversions"?

@sneumann
Member

+1 on "best-effort community-contributed conversions"; the OpenMS team has an mzML converter mentioned in #31. Yours, Steffen

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants