Welcome to the Database Infrastructure for Mass Spectrometry project. This project is the result of work from the National Institute of Standards and Technology's Material Measurement Laboratory, Chemical Sciences Division. We seek to provide a comprehensive portable database toolkit supporting non-targeted analysis of high resolution mass spectrometry experiments for exposure-based analyte targets (e.g. per- and polyfluorinated alkyl substances (PFAS)) including descriptive metadata for analytical instrument method, quality analysis, and samples. If you would like to get involved, or just to keep track of the project, please give this repository a watch or star, or send an email to [email protected] to receive updates.
2024 February (@jmr-nist-gov)
A video tutorial series is now available for DIMSpec, discussing download and setup, file conversion to .mzML, and using the MSMatch application.
2024 February (@jmr-nist-gov)
Minor changes to the quick install guide were made to clarify some language, especially with regard to what is required versus recommended versus suggested, and under which circumstances each applies.
A bug was fixed in the `molecule_picture` function where invalid filenames were produced from InChI (and other) strings. Invalid filename characters are now substituted with descriptive characters; as a result, filenames no longer match 1:1 with molecular notation in many cases, though most SMILES strings should remain intact. Also, use of the `show` argument should be more intuitive and will now display the resulting picture in the system viewer.
These changes will be included in the next release, but can be downloaded directly from the current repository.
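To illustrate the kind of substitution described above, the following is a minimal sketch and not the actual `molecule_picture` implementation. The function name `sanitize_filename` and the substitution tokens are hypothetical, chosen only to show how characters that are invalid in filenames can be replaced with descriptive text.

```r
# Hypothetical helper: replace characters that are invalid in filenames
# (on common operating systems) with descriptive tokens.
# The tokens below are assumptions for illustration only.
sanitize_filename <- function(x) {
  subs <- c(
    "/"  = "_fslash_",
    "\\" = "_bslash_",
    ":"  = "_colon_",
    "*"  = "_star_",
    "?"  = "_qmark_",
    "\"" = "_quote_",
    "<"  = "_lt_",
    ">"  = "_gt_",
    "|"  = "_pipe_"
  )
  for (ch in names(subs)) {
    # fixed = TRUE treats each character literally rather than as a regex
    x <- gsub(ch, subs[[ch]], x, fixed = TRUE)
  }
  x
}

inchi_name  <- sanitize_filename("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
smiles_name <- sanitize_filename("CCO")
```

In this sketch the InChI string loses its forward slashes, so the filename no longer matches the notation 1:1, while a simple SMILES string such as `CCO` passes through unchanged.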
2024 January (@jmr-nist-gov)
The DIMSpec project was featured as part of the SERDP Webinar Series on December 7, 2023. A recording of that webinar, the first half of which is dedicated to DIMSpec, is now available.
Older news items
2023 December (@jmr-nist-gov)
This update provides quality of life improvements and minor bug fixes in MSMatch, and addresses certain functionality issues related to package versioning when installed on R v4.3 as of Nov 2023. If you are running with R v4.1 and certain package combinations, you may run into an issue with logging and receive a console message regarding `log_formatter`. If so, turn off logging by setting `LOGGING_ON <- FALSE` in the `config/env_log.txt` file or update your packages. Furthermore, this update (a) fixes certain instances of alert messages failing to render, (b) fixes a rare issue with uncertainty calculation inheriting NaN values, (c) adds support for advanced settings on the match uncertainty evaluation tool, and (d) fixes the location of alert messages which could occasionally run past the bottom of the browser.
2023 July (@jmr-nist-gov)
DIMSpec has been updated to its first release candidate version. Changes include schema tightening for annotated fragments, PFAS data updates (including consistency updates to analyte nomenclature and aliases), and other minor bug fixes.
In analytical chemistry, the objective of non-targeted analysis (NTA) is to detect and identify unknown (generally organic) compounds using a combination of advanced analytical instrumentation (e.g. high-resolution mass spectrometry) and computational tools. For NTA using mass spectrometry, the use of reference libraries containing fragmentation mass spectra of known compounds is essential to successfully identifying unknown compounds in complex mixtures. However, due to the diversity of vendors of mass spectrometers and mass spectrometry software, it is difficult to share mass spectral data sets between laboratories using different instrument vendor software packages while maintaining the quality and detail of complex data and metadata that makes the mass spectra commutable and useful. Additionally, this diversity can alter fragmentation patterns, as instrument engineering and method settings can differ between analyses.
This report describes a set of tools developed in the NIST Chemical Sciences Division to provide a database infrastructure for the management and use of NTA data and associated metadata. In addition, as part of a NIST-wide effort to make data more Findable, Accessible, Interoperable, and Reusable (FAIR), the database and affiliated tools were designed using only open-source resources that can be easily shared and reused by researchers within and outside of NIST. The information provided in this report includes guidance for the setup, population, and use of the database and its affiliated analysis tools. This effort has been primarily supported by the Department of Defense Strategic Environmental Research and Development Program (DOD-SERDP), project number ER20-1056. As that project focuses on per- and polyfluoroalkyl substances (PFAS), DIMSpec is distributed with mass spectra including compounds on the NIST Suspect List of Possible PFAS as collected using the Non-Targeted Analysis Method Reporting Tool.
- Portable and reusable database infrastructure for linking sample and method details to high resolution mass spectrometry data.
- Easily extendable schema for new data extensions or views.
- Open source from inception to delivery using only R, Python, and SQLite.
- Application programming interface (API) support using the plumber framework.
- Web applications for exploration and data processing, including a template web application to quickly build new GUI functionality using the shiny framework.
- Development support through flexible logging and function argument validation frameworks.
- Includes curated high resolution mass spectra for 132 per- and polyfluorinated alkyl substances from over 100 samples using ESI-, ESI+, and APCI- detection methods (as of 2023-03-16). The DIMSpec for PFAS database is provided here as an example, and is published on the NIST Public Data Repository at https://doi.org/10.18434/mds2-2905. If you use the DIMSpec for PFAS database, please cite both this repository and that file.
While the only hard requirement for using DIMSpec is R version 4.1 or later (packages will be installed as part of the installation compliance script, though users on Windows systems should also install RTools), to get the most out of DIMSpec users may want to include other software such as (but in no way limited to):
- Java (with bit architecture matching that of R)
- MSConvert >= 3.0.21050 (from ProteoWizard)
- SQLite >= v3.32.0
- Mini/Anaconda w/ Python >= 3.8 (if not already installed, R will install it as part of the compliance script, though advanced users may want to explicitly install this themselves)
Note: As of the December 2023 release, use of R v4.3 is encouraged as support for older versions of R will sunset in 2024.
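As a quick way to see which of the items above are already present, the following sketch uses only base R. The helper name `check_prereqs` is hypothetical, and the executable names (`java`, `msconvert`, `sqlite3`, `conda`) are assumptions about how these tools appear on the system path; they may differ on your machine, and the sketch does not check tool versions.

```r
# Hypothetical helper: report whether the running R meets a minimum version
# and whether optional command line tools can be found on the system path.
check_prereqs <- function(min_r = "4.1.0",
                          tools = c("java", "msconvert", "sqlite3", "conda")) {
  # Sys.which() returns the full path of each executable, or "" if not found
  paths <- unname(Sys.which(tools))
  list(
    r_version = as.character(getRversion()),
    r_ok      = getRversion() >= min_r,
    tools     = data.frame(
      tool  = tools,
      path  = paths,
      found = !is.na(paths) & nzchar(paths),
      stringsAsFactors = FALSE
    )
  )
}

prereqs <- check_prereqs()
```

Inspecting `prereqs$r_ok` and `prereqs$tools` gives a rough picture before running the compliance script; a tool reported as not found may still be installed somewhere off the path.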
To get started in most cases from a blank slate:
- Ensure R v4.1+ is installed (download)
- Download the project by forking this repository or downloading the zip file.
- If using Windows, ensure RTools (download) matching your R version is installed to build certain packages.
- Run the compliance script, which should install everything needed for the project.
  - The easiest way is to load the project using RStudio (download).
    - Open RStudio and click "File" > "Open Project..." and navigate to the location where you downloaded the project.
    - Either open the file at "R/compliance.R" from the "Files" pane and click the "Source" button, or enter the command `source(file.path("R", "compliance.R"))` in the console pane.
  - If not using RStudio, open an R terminal at the project directory (or `setwd(file.path("path", "to", "project"))`) and enter the command `source(file.path("R", "compliance.R"))`.
  - The first installation typically takes around half an hour from start to finish, depending on the speed of your internet connection and computer.
A quick guide is available describing the install process.
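For users working outside RStudio, the "set the working directory, then source the script" step can be wrapped in a small guard that fails early if the path is wrong. This is a minimal sketch: the function name `run_compliance` is hypothetical and is not part of DIMSpec; only the script location "R/compliance.R" comes from the steps above.

```r
# Hypothetical wrapper: move to the project directory and source the
# compliance script, stopping with a clear message if it cannot be found.
run_compliance <- function(project_dir = getwd()) {
  script <- file.path(project_dir, "R", "compliance.R")
  if (!file.exists(script)) {
    stop("Could not find '", script, "'. Is 'project_dir' the project root?")
  }
  # The session stays in the project directory afterwards, mirroring the
  # manual setwd() step described in the installation instructions.
  setwd(project_dir)
  source(file.path("R", "compliance.R"))
  invisible(normalizePath(file.path("R", "compliance.R")))
}

# Example call (replace the placeholder path with your download location):
# run_compliance(file.path("path", "to", "project"))
```

Checking for the file first avoids a less informative error from `source()` when the session was started in the wrong directory.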
For evaluation and distribution purposes, DIMSpec is distributed with a populated database of per- and polyfluorinated alkyl substances (PFAS), but supporting functionality is present to easily create new databases. This enables DIMSpec to support multiple efforts simultaneously as research needs require.
For a full description of the project and its different aspects, please see the DIMSpec User Guide.
A series of Quick Guides have been made available focusing on various aspects of the project.
- DIMSpec Quick Guide - Installation
- DIMSpec Quick Guide - Plumber
- DIMSpec Quick Guide - Web Applications
- DIMSpec Quick Guide - Advanced Use
- DIMSpec Quick Guide - Importing Data
- File Conversion using msconvert
In addition, a series of short video tutorials are available discussing certain topics.
- Download and installation
- mzML conversion of instrument data files
- Import files and process on MSMatch
- Library searching and data mining
- Fragmentation searching and data mining
Several links can provide additional contextual information about this project. If any of the resource links below are broken, please report them so we may address them. The user guide is also available in running DIMSpec sessions using the `user_guide()` function, which will load a local version of the user guide if the web version is unavailable or your computer is offline.
- PFAS Program at the US National Institute of Standards and Technology
- DoD SERDP Program Project ER20-1056
- NIST Suspect List of Possible PFAS
- NIST Method Reporting Tool for Non-Targeted Analysis (NTA MRT)
- Database Infrastructure for Mass Spectrometry - Per- and Polyfluoroalkyl Substances
If you have any issues with any portion of the repository, please feel free to contact the NIST PFAS program at [email protected] directly or post an issue in the repository itself.
The main contributors to this project from NIST were members of the Material Measurement Laboratory's Chemical Sciences Division:
- Jared M. Ragland (@jmr-nist-gov) (email) (staff page) (Chemical Informatics Group)
- Benjamin J. Place (@benjaminplace) (email) (staff page) (Organic Chemical Metrology Group)
NIST projects are provided as a public service, and we always appreciate feedback and contributions. If you have a contribution, feel free to fork this project, open a PR, or start a discussion. The authors hope this effort spurs further innovations in the NTA open data space for environmental mass spectrometry.
Certain commercial equipment, instruments, software, or materials are identified in this documentation in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.
This work is provided by NIST as a public service and is expressly provided "AS IS." Please see the license statement for details.
The work included in this repository has been funded in large part by the Department of Defense's Strategic Environmental Research and Development Program (SERDP), project number ER20-1056.