Skip to content

Data Blending and Interpolation

Bren Moushall edited this page Aug 25, 2020 · 7 revisions

Data Blending and Interpolation

Updated 25/08/2020

Updates and extensions have been made to the Point Blending workflow, which is explained below.

Workflow

The workflow allows for multiple datasets to be combined and then interpolated over a grid, which can then be plotted and also output as GIS files or GMT compatible data files. The entry point for the workflow is moho_workflow, which takes a JSON config file as the only argument.

Configuration

An example config is available, shown below:

  {
      "methods":
      [
          {
              "name": "ccp_1",
              "data": "sandbox/PV-227/ccp_line_data_sample1.csv",
              "weight": 1.0,
              "scale_length": 0.2
          },
          {
              "name": "ccp_2",
              "data": "sandbox/PV-227/ccp_line_data_sample2.csv",
              "weight": 1.0,
              "scale_length": 0.5
          }
      ],
      "plotting":
      {
          "output_plot": true,
          "plot_parameters":
          {
              "scale": [10.0, 60.0],
              "format": "png",
              "show": true,
              "title": "Moho depth from blended data",
              "cb_label": "Moho depth (km)"
          },
          "output_gmt": true,
          "output_gis": true
      },
      "bounds": [130.0, -20.0, 140.0, -17.0],
      "grid_interval": 0.25,
      "output_directory":  "./output"
  }
  • methods: A list of inversion/stacking/etc. methods, defining the data and parameters for blending and gridding. Each method is stored as a dictionary.

    • name: The name of the method, which is required and must be unique.
    • data: The data file. See the section below on 'Data Format' for more information.
    • weight: The overall weight of the method. This is multiplied by each sample weight to produce a total relative weight for each sample. See 1 for more information about weighting.
    • scale_length: The scale length variable for the spatial spread algorithm, in degrees. See 1 for more information about the spatial spread algorithm.
  • plotting: A dictionary containing plotting parameters. This is optional and if not included no plots or map data will be produced.

    • output_plot: A boolean flag. If true, then a plot of the grid and vector map will be produced. This figure will be saved as plot.{format} in the output directory.

    • plot_parameters: Parameters used for generating the plot. All of these parameters are optional.

      • scale: The minimum and maximum values to define the scale of the colormap, used when plotting the grid. If not provided, the values will be dervied from the minimum and maximum of the grid data.
      • format: Output format of the plot, must be compatible with matplotlib. By default, 'png' is used.
      • show: A boolean flag. Whether or not to display the plot as part of the workflow. If true, the plot will be displayed after generation, useful for verification and debugging purposes.
      • title: The title of the plot. Defaults to 'Moho depth from blended data'.
      • cb_label: Label for the grid colorbar. Defaults to 'Moho depth (km)'.
    • output_gmt: A boolean flag. Whether or not to produce GMT compatible data files. See section below on GMT mapping for more information.

    • output_gis: A boolean flag. Whether or not to produce GIS data. See section below on GIS mapping for more information.

  • bounds: A bounding box of format [min lon, min lat, max lon, max lat]. This defines the extent of the grid to be interpolated and the extent of plots and maps produced. If not provided, the extent is derived from the minimum and maximum bounds of the aggregate datasets.

  • grid_interval: Required. The grid interval in degrees. E.g. 0.25 will interpolate the data to a grid with cell size 0.25 x 0.25 degrees.

  • output_directory: Directory to contain the output. If not provided, the current working directory is used.

python moho_workflow.py config_moho_workflow_example.json will run the workflow with the configured parameters.

File Format

The method data files must be in a specific format, as below:

# Sta,Lon,Lat,Depth,Weight
OA.BS24,133.036,-19.4734,1.0

Note that coordinates must be in lat/lon. The interpolation algorithm relies on ellipsoidal distance calculations and requires a WGS84 CRS to operate.

Each row contains a sample to be blended. It is up to the user to ensure their data is in this format. Some conversion scripts exist in the sandbox folder. These show examples of converting data stored in Excel spreadsheets and whitespace delimited text files.

The sample weight column is required. If your raw data does not contain sample weights, a helper script exists to add these weights. This conversion helper can be passed a CSV file in format Sta,Lon,Lat,Depth and will add a weights column.

By default, all weights will be 1.0, which is useful if neutral weighting is desired. Otherwise, a second file containing weights can be provided, of format:

STA WEIGHT

Each row is a station name with a desired sample weight.

This can be executed using python conversion_helper.py data.csv --weights-file weights.txt. Every reading from the stations listed in the text file will be set to the specified weight. Unlisted stations will have their weight set to 1.0.

GIS Data

If enabled, GIS data will be produced and stored in the output directory under gis_data. A singleband geotiff of the grid is produced. A multiband geotiff of the vector map is produced, with the bands containing U and V components respectively. This can be rendered as a vector field using ArcGIS' vector field renderer or equivalent function in your GIS software. Each method used will also produce a shapefile containg the locations of samples used, along with other information such as station name and weight.

GMT Data

If enabled, GMT compatible data files will be produced and stored in the output directory under gmt_data. A grid.txt file of format LON LAT DEPTH is produced, which can be converted to a NetCDF grid using gmt xyz2grd and plotted using gmt grdimage. A gradient.txt file of format LON LAT ANGLE MAGNITUDE can be used to plot the vector field using gmt psxy -Sv. For each method used, a text file of format LON LAT TOTAL_WEIGHT is produced, which can be plotted using gmt psxy with the weight used as symbol size or other differentiator.

Gridding other data

This workflow is developed around Moho depth, but any data can be gridded, so long as the data files follow the format of:

# Sta,Lon,Lat,Value,Weight

Corrections

The workflow supports applying corrections and other preprocessing to data used. Currently CCP correction by H-K values is supported. Config example:

 {
      "data_preperation":
      [
          {
              "data": "/home/bren/data_passive/MOHO/hk_ccp_correction/ccp_weights.csv",
              "correction_data": "/home/bren/data_passive/MOHO/hk_ccp_correction/hk_weights.csv",
              "correction_func": "ccp_correction",
              "output_file": "/home/bren/data_passive/MOHO/hk_ccp_correction/ccp_corrected.csv"
          }
      ],
      "methods":
      ...
}

Adding this block will correct the CCP readings by H-k readings by applying the difference between the H-k median value and CCP median value to the CCP readings for each station.

The corrected data (specified by output_file) can then be blended and plotted by providing it as the data for a method.

Developer notes

The moho_config module contains constants for the configuration keys and config validation. If the config schema is modified, provide the new key a constant and add it to the relevant SUPPORTED_KEYS list so it can pass validation. It's recommended to validate the parameter's type, check if it exists if required etc., in the validate function.

If further correction or preprocessing functions are added, add a mapping of the correction_func parameter to the correction function to CORR_FUNC_MAP.

Original documentation

Note this has been superceded, but is preserved to help understand the original implementation.

Point Dataset Blending and Plotting

This section deals with the issue of handling multiple, disjointed spatial datasets in such a way that they can be combined into a single, plottable dataset.

The source data format for the datasets in question is assumed to be unknown or arbitrary, and the user takes responsibility for any bespoke code required to read and convert them into a standard point data format. Examples of such bespoke scripts can be found under this sandbox folder.

The "standard point data format" referred to above is the starting point for this workflow. As long as various different bespoke source data can be converted into this simple format, then this workflow can applied to combine them together into a unified dataset and plot them on common axes (assuming they have a common metric to plot). The standard point data format is a simple text file format in table layout (having rows and columns) where each column is a variable, and each row is a sample, and each sample refers to a particular (longitude, latitude) location. The first two columns need to be the (longitude, latitude) coordinates respectively, like so:

# Lon,Lat,Depth
133.035951,-19.473353,37.9
133.006100,-20.003900,45.9
132.997000,-20.486800,40.8
132.991205,-20.997177,47.3
132.989100,-21.506900,nan
133.494690,-18.994886,41.9
133.505800,-19.505400,43.7
133.501410,-20.002186,45.2

As shown in this example, the first row should have a header comment specifying the column names. For reference, numpy.savetxt applied to a 2D numpy array can readily generate this format. The value 'nan' can be used to represent floating point Not A Number.

Blending Workflow

The purpose of this workflow is to blend multiple point datasets, each with unique measurement locations, together onto a common, uniform grid in a defined cartesian projected coordinate reference system (CRS). The method used is the weighted Gaussian method of Kennett (2019), and is implemented in script pointsets2grid.py. The default projected CRS is EPSG code 3395.

The workflow is illustrated in the following schematic: As shown in the diagram, script pointsets2grid.py takes a configuration file and an output file name as required command line options.

The fields specified in the JSON configuration file are documented by example in config_pts2grid_example.json. The collection of source files to blend, and the blending parameters (see Kennett (2019) for explanation of parameters), are defined by the source_files key. The gridding projected CRS is given by the proj_crs_code key, and the grid spacing in km is specified by the output_spacing_km key.

The overall bounding box of the unified data in the projected CRS is determined automatically according to the union of the input datasets. Note that even though the gridding is done on a uniform grid in the projected CRS, the results are transformed back to (longitude, latitude) for final output. This saves the projected CRS code from having to be saved with the output file and allows the plotting step of the workflow to use a different projected CRS if it so wishes.

The first 2 lines of the output file state the gridded point array dimensions in horizontal and vertical directions, followed by CSV data. This output file format generated by pointsets2grid.py is immediately ready for ingestion into the plotting script plot_spatial_map.py.

Note that currently the implementation of blending ignores the point-wise weighting factors wkd from eq.(2.1), effectively treating them as 1, as a simplification. The individual dataset weighting factors wd are supported.

Plotting Workflow

The data file generated by the above blending workflow can be plotted on a map using script plot_spatial_map.py. Under the hood, these plots are based on cartopy. The workflow is illustrated schematically as follows: The following inputs are required to plot_spatial_map.py. To see details of command line arguments by running plot_spatial_map.py --help:

  • EPSG code for projected CRS, e.g. 3577 is a good default for continental Australia
  • Input file name
  • [OPTIONAL]: Output file name (requires .png or .pdf file name extension). If not provided, plot will be displayed but not saved to file.