-
Notifications
You must be signed in to change notification settings - Fork 4
Schema
The principle kind of data held in a XAFS Data Library is XAFS Spectra. Additionally, several other kinds of data can be useful to include, such as for sample preparation, measurement conditions, and so on. Of course, a key feature of a multi-spectral database is to be able to combine several spectra into a Suite of Spectra, and also to identify the people adding data to a library. Thus the XAFS Data Library contains the following main tables:
Table Name | Description |
---|---|
spectra | main XAS spectra, pointers to other tables |
sample | Samples |
crystal_structure | Crystal structures |
person | People |
citation | Literature or Other Citations |
format | Data Formats |
suite | Spectra Suites |
facility | Facilities |
beamline | Beamlines |
monochromator | Monochromators |
mode | Modes of Data Collection |
ligand | Ligands |
element | names of Elements |
edge | names of x-ray Edges |
energy_units | units for energies stored for a spectra |
info | General Information, version etc |
While some of these tables (spectra, sample) are fairly complex, many of the tables are really quite simple, holding a few pieces of information.
In addition there are a few Join Tables to tie together information and allow Many-to-One and Many-to-Many relations. These tables include
Table Name | Description |
---|---|
spectra_mode | mode(s) used for a particular spectra |
spectra_ligand | ligand(s) present in a particular spectra |
spectra_suite | spectra contained in a suite |
spectra_rating | People's comments and scores for Spectra |
suite_rating | People's comments and scores for Suites |
A key feature of this layout is that a Suite is very light-weight, simply comprising lists of spectra. Multiple suites can contain an individual spectra, and a particular spectra can be contained in multiple suites without repeated data.
The tables are described in more detail below. While many are straightforward, a few tables may need further explanation.
This is the principle table for the entire database, and needs extensive discussion. Several of the thorniest issues have to be dealt with in this table, making this likely to be the place where most attention and discussion should probably be focused.
--
create table spectra (
id integer primary key
name text not null,
notes text,
attributes text,
file_link text,
data_energy text,
data_i0 text default '[1.0]',
data_itrans text default '[1.0]',
data_iemit text default '[1.0]',
data_irefer text default '[1.0]',
data_dtime_corr text default '[1.0]',
calc_mu_trans text default '-log(itrans/i0)',
calc_mu_emit text default '(iemit*dtime_corr/i0)',
calc_mu_refer text default '-log(irefer/itrans)',
notes_i0 text,
notes_itrans text,
notes_iemit text,
notes_irefer text,
temperature text,
submission_date datetime,
collection_date datetime,
reference_used integer,
energy_units_id -- > energy_units table
monochromator_id -- > monochromator table
person_id -- > person table
edge_id -- > edge table
element_z -- > element table
sample_id -- > sample table
beamline_id -- > beamline table
format_id -- > format table
citation_id -- > citatione table
reference_id -- > sample table (for sample used as reference
);
We'll discuss the table entries more by grouping several of them together. First, Each entry in the spectra table contains links to many other tables.
spectra Column Name | Description |
---|---|
energy_units_id | index of energy_units table |
person_id | index of person table for person donating spectra |
edge_id | index of edge table for X-ray Edge |
element_z | index of element table for absorbing element |
sample_id | index of sample table, describing the sample |
reference_id | index of sample table, describing the reference sample |
beamline_id | index of the beamline where data was collected |
monochromator_id | index of the monochromator table for mono used |
format_id | index of the format table for data format used |
citation_id | index of the citation table for literature citation |
Next, the table contains ancillary information (you may ask why some of these are explicit while others are allowed to be put in the attributes field).
spectra Column Name | Description |
---|---|
notes | any notes on data |
attributes | JSON-encoded hash table of extra attributes |
temperature | Sample temperature during measurement |
submission_date | date of submission |
reference_used | Boolean (0=False, 1=True) of whether a Reference was used |
file_link | link to external file |
Here, reference_used* means whether data was also measured in the reference channel for additional energy calibration . If 1 (True), the reference sample must be given. The file_link entry would be the file and path name for an external file. This must be relative to the directory containing database file itself, and cannot be an absolute path. It may be possible to include URLs, ....
Finally, we have the information for internally stored data arrays themselves
spectra Column Name | Description | Default |
---|---|---|
data_energy | JSON data for energy | -- |
data_i0 | JSON data for I_0 (Monitor) | 1.0 |
data_itrans | JSON data for I_transmission (I_1) | 1.0 |
data_iemit | JSON data for I_emisssion (fluorescence, electron yield) | 1.0 |
data_irefer | JSON data for I_trans for reference channel | 1.0 |
data_dtime_corr | JSON data for Multiplicative Deadtime Correction for I_emit | 1.0 |
calc_mu_trans | calculation for mu_transmission | -log(dat_itrans/dat_i0) |
calc_mu_emit | calculation for mu_emission | dat_iemit * dat_dtime_corr / dat_i0 |
calc_mu_refer | calculation for mu_reference | -log(dat_irefer/dat_itrans) |
calc_energy_ev | calculation to convert energy to eV | None |
notes_energy | notes on energy | |
notes_i0 | notes on dat_i0 | |
notes_itrans | notes on dat_itrans | |
notes_iemit | notes on dat_iemit | |
notes_irefer | notes on dat_irefer |
The data*****_ entries will be JSON encoded strings of the array data. The calculations will be covered in more detail below. Note that the spectra_mode table below will be used to determine in which modes the data is recorded.
As alluded to above, the data*****_ will be stored as JSON-encoded strings.
The calculations of mu in the various modes (transmission, fluorescence) are generally well defined, but it is possible to override them by explicitly documenting the expression used to calculate mu. Note that this expression should not be expected to be fully and correctly evaluated by the database -- it is meant for human reading.
There are several distinct conventions for storing the data for energy values from monochromators in the XAFS community. As most beamlines select energy with a double-crystal monochromator, many beamlines store "raw motor positions" or steps for the monochromator angle. Some beamlines store the angle in degrees, and ome store energy in eV or keV. If the details of the monochromator crystal (d-spacing and motor steps per angular unit) are known, then all of these data can be inter-converted.
A possible advantage of saving angular information instead of energy in eV is that energy re-calibration may be easier in some cases. This generally assumes that re-calibration would involve changing an angular offset or monochromator d-spacing, each of which is difficult to tell from a single spectra without additional information about the system. In any event, saving all the "meta data" about the monochromator (d spacing used to convert angle to energy, and possibly steps per angular unit) is recommended.
To use XAFS data, one generally wants the energy units to be eV or keV. Thus, in all cases, knowing how to convert the measured "energy" data, whether it be in steps, degrees, or keV into eV is necessary to use the data.
The energy_units table simply holds conventional names for energy units:
create table energy_units (
id INTEGER NOT NULL,
units TEXT NOT NULL,
notes TEXT,
attributes TEXT,
PRIMARY KEY (id),
UNIQUE (units)
);
and will typically be initialized to have these entries:
energy_units units | Notes |
---|---|
eV | electron Volts |
keV | kilo-electron Volts |
degrees | angle in degrees for a Bragg monochromator. Need mono d-spacing!! |
step | angular steps for a Bragg monochromator. Need mono d-spacing and steps_per_degree!! |
more entries can be added as needed.
The monochromator table will hold the details of d-spacing and steps per degree for any particular monochromator.
Since each spectra has an entry for which energy_units and monochromator are used, the data in a Spectra's energy column should be convertible to eV.
The Sample table holds basic information about the sample measured. Each Spectra has a pointer to a Sample entry, and of course one Sample may be measured several times.
-- sample information
create table sample (
id integer not null, -- primary key
name text not null, -- must be unique!
notes text,
attributes text,
formula text,
material_source text,
person_id integer, --> person table
crystal_structure_id integer, --> crystal_structure table
);
Note that the material_source and formula information here are plain text. There are also pointers to the Person providing (or documenting) the Sample, and possibly to the Crystal Structure.
The Crystal Structure table is meant to hold crystallographic information about a Sample. The idea is that the contents of a file describing the crystal structure (CIFS, PDB format, atoms.inp for example) can be held here. These are plain text values, meant for human reading.
-- crystal information (example format = CIFS , PDB, atoms.inp)
create table crystal (
id integer primary key , -- primary key
format text not null,
data text not null,
notes text,
attributes text);
The Ligand table holds names of ligands that might be present in a Spectra.
create table ligand (
id integer primary key,
name text,
notes text);
create table spectra_ligand (
id integer primary key,
ligand integer not null, --> ligand table
spectra integer not null); --> spectra table
The Person table holds names and email addresses of people who enter Spectra and Samples, and who make Suites and enter ratings.
create table person (
id integer primary key ,
email text not null unique,
first_name text not null,
last_name text not null,
sha_password text not null);
create table citation (
id integer primary key ,
journal text,
authors text,
title text,
volume text,
pages text,
year text,
notes text,
attributes text,
doi text);
-- spectra format: table of data formats
--
-- name='internal-json' means data is stored as json data in spectra table
--
create table format (
id integer primary key,
name text,
notes text,
attributes text);
insert into format (name, notes) values ('internal-json', 'Read dat_*** columns of spectra table as json');
-- Suite: collection of spectra
create table suite (
id integer primary key ,
person integer not null, -- > person table
name text not null,
notes text,
attributes text);
-- SUITE_SPECTRA: Join table for suite and spectra
create table spectra_suite (
id integer primary key ,
suite integer not null, -- > suite table
spectra integer not null); -- > spectra table
A rating is a numerical score given to a Spectra or a Suite of Spectra by a particular person. Each score can also be accompanied by a comment.
While not enforced within the database itself, the scoring convention should be Amazon Scoring: a scale of 1 to 5, with 5 being best.
create table rating (
id integer primary key ,
person integer not null, -- > person table
spectra integer, -- > spectra table
suite integer, -- > suite table
score integer,
comments text);
These two tables simply list standard monochromator types and data collection modes.
-- Monochromator descriptions
create table monochomator (
id integer primary key,
name text,
lattice_constant text,
steps_per_degree text,
notes text,
attributes text);
-- XAS collection modes ('transmission', 'fluorescence', ...)
create table mode (
id integer primary key,
name text,
notes text);
insert into mode (name, notes) values ('transmission',
'transmission intensity through sample');
insert into mode (name, notes) values ('fluorescence, total yield',
'total x-ray fluorescence intensity, as measured with ion chamber');
insert into mode (name, notes) values ('fluorescence, energy analyzed',
'x-ray fluorescence measured with an energy dispersive (solid-state) detector. These measurements will often need to be corrected for dead-time effects');
insert into mode (name, notes) values ('electron emission', 'emitted electrons from sample');
insert into mode (name, notes) values ('xeol', 'visible or uv light emission');
create table spectra_modes (
id integer primary key ,
mode integer not null, -- > collection_mode
spectra integer not null); -- > spectra table
These two tables list X-ray (synchrotron) facilities and particular beamlines.
-- beamline description
-- must have a facility
-- a single, physical beamline can be represented many times for different configurations
create table beamline (
id integer primary key ,
facility integer not null, --> facility table
name text,
xray_source text,
monochromator integer, -- > monochromator table (optional)
notes text,
attributes text);
-- facilities
create table facility (
id integer primary key,
name text not null unique,
notes text,
attributes text);
Note that a monochromator is optional for a beamline.
These two tables simply list standard symbols and names of the elements of the periodic table, and the standard names for the x-ray absorption edges. The schema are
create table element (z integer primary key,
symbol text not null unique,
name text);
insert into element (z, symbol, name) values (1, 'H', 'hydrogen');
insert into element (z, symbol, name) values (2, 'He', 'helium');
create table edge (id integer primary key,
name text not null unique,
level text);
insert into edge (name, level) values ('K', '1s');
insert into edge (name, level) values ('L3', '2p3/2');
insert into edge (name, level) values ('L2', '2p1/2');
insert into edge (name, level) values ('L1', '2s');