From 4921e2e75b6f2d7ff9faaa2b8b742d2bb1a42f74 Mon Sep 17 00:00:00 2001
From: Kay Robbins <1189050+VisLab@users.noreply.github.com>
Date: Sat, 13 Jul 2024 05:35:44 -0500
Subject: [PATCH 1/3] NWB tutorial

---
 docs/source/HedAnnotationInNWB.md | 121 ++++++++++++++++++++++++++++++
 docs/source/index.rst             |   1 +
 2 files changed, 122 insertions(+)
 create mode 100644 docs/source/HedAnnotationInNWB.md

diff --git a/docs/source/HedAnnotationInNWB.md b/docs/source/HedAnnotationInNWB.md
new file mode 100644
index 0000000..da4d50d
--- /dev/null
+++ b/docs/source/HedAnnotationInNWB.md
@@ -0,0 +1,121 @@
+# HED annotation in NWB (draft)
+
+[**Neurodata Without Borders (NWB)**](https://www.nwb.org/) is a data standard for organizing neurophysiology data.
+NWB is used extensively as the data representation for single cell and animal recordings as well as
+human neuroimaging modalities such as IEEG. HED (Hierarchical Event Descriptors) is a system of
+standardized vocabularies and supporting tools that allows fine-grained annotation of data.
+
+Each NWB file (extension `.nwb`) is a self-contained (and hopefully complete) representation of
+experimental data for a single experiment.
+The file should contain all experimental stimuli, acquired data, and metadata synchronized to a global
+timeline for the experiment.
+See [**Intro to NWB**](https://nwb-overview.readthedocs.io/en/latest/intro_to_nwb/1_intro_to_nwb.html)
+for a basic introduction to NWB.
+
+## The ndx-hed NWB extension
+The [**ndx-hed**](https://github.com/hed-standard/ndx-hed) extension
+extends the NWB [**VectorData**](https://hdmf-common-schema.readthedocs.io/en/stable/format.html#sec-dynamictable)
+class, allowing HED annotations to be added as a column to any NWB
+[**DynamicTable**](https://hdmf-common-schema.readthedocs.io/en/stable/format.html#sec-dynamictable).
+The `DynamicTable` class is the underlying base class for many data structures within NWB files,
+and this extension allows HED annotations to be easily added to NWB.
+See [**DynamicTable Tutorial**](https://hdmf.readthedocs.io/en/stable/tutorials/plot_dynamictable_tutorial.html#sphx-glr-tutorials-plot-dynamictable-tutorial-py)
+for a basic guide for usage in Python and
+[**DynamicTable Tutorial (MATLAB)**](https://neurodatawithoutborders.github.io/matnwb/tutorials/html/dynamic_tables.html)
+for introductory usage in MATLAB.
+
+The class that implements the ndx-hed extension is called `HedTags`.
+This class represents a column vector (`VectorData`) of HED tags and the version
+of the HedSchema needed to validate the tags.
+
+## Example HED usage in NWB
+
+### HED as a standalone vector
+
+The `HedTags` class has two required arguments (`hed_version` and `data`) and two optional arguments
+(`name` and `description`).
+
+````{admonition} Create a HedTags object.
+:class: tip
+
+```python
+tags = HedTags(hed_version='8.3.0', data=["Correct-action", "Incorrect-action"])
+```
+````
+The result is a `VectorData` object whose data vector includes 2 elements.
+Notice that data is a list with 2 values representing two distinct HED strings.
+The values of these elements are validated using HED schema version 8.3.0 when `tags` is created.
+If any of the tags had been invalid, the constructor would raise a `ValueError`.
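+The following sketch (illustrative, not part of the original tutorial) shows how this
+constructor-time validation can be observed. It assumes `HedTags` is imported from the
+`ndx_hed` package and that the constructor raises `ValueError` as described above;
+`"Not-a-real-tag"` is a deliberately invalid tag:
+
+````{admonition} Catch a validation error from the HedTags constructor.
+:class: tip
+
+```python
+from ndx_hed import HedTags  # assumed import path for the ndx-hed extension
+
+# "Not-a-real-tag" is not in the HED schema, so construction should fail.
+try:
+    bad_tags = HedTags(hed_version='8.3.0', data=["Correct-action", "Not-a-real-tag"])
+except ValueError as err:
+    print(f"Validation failed: {err}")
+```
+````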
+
+````{admonition} Add a row to an existing HED VectorData
+:class: tip
+
+```python
+tags.add_row("Sensory-event, Visual-presentation")
+```
+After this `add_row` operation, `tags` has 3 elements. Notice that "Sensory-event, Visual-presentation"
+is a single HED string, not two HED strings.
+
+````
+
+### HED in a table
+
+The following color table uses HED tags to define the meanings of integer codes:
+
+| color_code | HED |
+|----- | --- |
+| 1 | `Red` |
+| 2 | `Green` |
+| 3 | `Blue` |
+
+````{admonition} Create an NWB DynamicTable to represent the color table.
+:class: tip
+
+```python
+color_nums = VectorData(name="color_code", description="Internal color codes", data=[1,2,3])
+color_tags = HedTags(name="HED", hed_version="8.2.0", data=["Red", "Green", "Blue"])
+color_table = DynamicTable(
+    name="colors", description="Experimental colors", columns=[color_nums, color_tags])
+```
+````
+The example sets up a table with columns named `color_code` and `HED`.
+Here are some common operations that you might want to perform on such a table:
+
+````{admonition} Common operations on the color table.
+:class: tip
+
+Get row 0 of `color_table` as a Pandas DataFrame:
+```python
+df = color_table[0]
+```
+Append a row to `color_table`:
+```python
+color_table.add_row(color_code=4, HED="Black")
+```
+````
+The `HED` column can also be passed directly to the `DynamicTable` constructor:
+
+```python
+color_table = DynamicTable(name="colors", description="Color table for the experiment",
+    columns=[VectorData(name="code", description="Internal color codes", data=[1, 2, 3]),
+             HedTags(name="HED", hed_version="8.3.0", data=["Red", "Green", "Blue"])])
+```
+
+## Installation
+
+## Implementation
+The implementation is based around the `HedTags` class.
+
+### The HedTags class
+
+### In TrialTable
+
+### In EventTable
+
+HED columns can be added to many tables in an NWB
+file including the `TimeIntervals`, `Units`, `PlaneSegmentation`, and many of
+the `icephys` tables (`ExperimentalConditionsTable`, `IntracellularElectrodesTable`,
+`IntracellularResponsesTable`, `IntracellularStimuliTable`, `RepetitionsTable`, `SequentialRecordingsTable`,
+`SimultaneousRecordingsTable` and the `SweepTable`) (`TrialTable`, Epch
\ No newline at end of file
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 1359dd6..817e75d 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -116,6 +116,7 @@ Visit the `HED project homepage `_ for links to
    BidsAnnotationQuickstart.md
    HedAnnotationQuickstart.md
+   HedAnnotationInNWB.md
    HedValidationGuide.md
    HedSearchGuide.md
    HedSummaryGuide.md

From 16e31edbb1756139f183cb6353cfa363199c8932 Mon Sep 17 00:00:00 2001
From: Kay Robbins <1189050+VisLab@users.noreply.github.com>
Date: Tue, 16 Jul 2024 10:38:01 -0500
Subject: [PATCH 2/3] Updated the docs for HED and NWB

---
 docs/source/HedAnnotationInNWB.md             |  179 +-
 docs/source/HedMatlabTools.md                 |    4 +-
 docs/source/HedOnlineTools.md                 |    4 +-
 docs/source/HedPythonTools.md                 |  261 +-
 ...ickstart.md => HedRemodelingQuickstart.md} |  990 +--
 ...modelingTools.md => HedRemodelingTools.md} | 5592 ++++++++---------
 docs/source/HedSearchGuide.md                 |    4 +-
 docs/source/HedSummaryGuide.md                |    6 +-
 docs/source/HedValidationGuide.md             |    6 +-
 docs/source/HowCanYouUseHed.md                |   30 +-
 docs/source/UnderstandingHedVersions.md       |    1 +
 docs/source/index.rst                         |    5 +-
 src/jupyter_notebooks/remodeling/README.md    |    2 +-
 13 files changed, 3483 insertions(+), 3601 deletions(-)
 rename docs/source/{FileRemodelingQuickstart.md => HedRemodelingQuickstart.md} (94%)
 rename docs/source/{FileRemodelingTools.md => HedRemodelingTools.md} (97%)
 create mode 100644 docs/source/UnderstandingHedVersions.md

diff --git a/docs/source/HedAnnotationInNWB.md b/docs/source/HedAnnotationInNWB.md
index da4d50d..a075fb5 100644
--- a/docs/source/HedAnnotationInNWB.md
+++ b/docs/source/HedAnnotationInNWB.md
@@ -4,39 +4,37 @@
 NWB is used extensively as the data representation for single cell and animal recordings as well as
 human neuroimaging modalities such as IEEG. HED (Hierarchical Event Descriptors) is a system of
 standardized vocabularies and supporting tools that allows fine-grained annotation of data.
+HED annotations can now be used in NWB.
 
-Each NWB file (extension `.nwb`) is a self-contained (and hopefully complete) representation of
-experimental data for a single experiment.
-The file should contain all experimental stimuli, acquired data, and metadata synchronized to a global
-timeline for the experiment.
-See [**Intro to NWB**](https://nwb-overview.readthedocs.io/en/latest/intro_to_nwb/1_intro_to_nwb.html)
-for a basic introduction to NWB.
-
-## The ndx-hed NWB extension
-The [**ndx-hed**](https://github.com/hed-standard/ndx-hed) extension
-extends the NWB [**VectorData**](https://hdmf-common-schema.readthedocs.io/en/stable/format.html#sec-dynamictable)
-class, allowing HED annotations to be added as a column to any NWB
-[**DynamicTable**](https://hdmf-common-schema.readthedocs.io/en/stable/format.html#sec-dynamictable).
-The `DynamicTable` class is the underlying base class for many data structures within NWB files,
-and this extension allows HED annotations to be easily added to NWB.
-See [**DynamicTable Tutorial**](https://hdmf.readthedocs.io/en/stable/tutorials/plot_dynamictable_tutorial.html#sphx-glr-tutorials-plot-dynamictable-tutorial-py)
+A standardized HED vocabulary is referred to as a HED schema.
+A single term in a HED vocabulary is called a HED tag.
+A HED string consists of one or more HED tags separated by commas and possibly grouped using parentheses.
+
+The [**ndx-hed**](https://github.com/hed-standard/ndx-hed) extension consists of a `HedTags` class that extends
+the NWB [**VectorData**](https://hdmf-common-schema.readthedocs.io/en/stable/format.html#sec-dynamictable) class,
+allowing HED data to be added as a column to any NWB [**DynamicTable**](https://hdmf-common-schema.readthedocs.io/en/stable/format.html#sec-dynamictable).
+`VectorData` and `DynamicTable` are base classes for many NWB data structures.
+See the [**DynamicTable Tutorial**](https://hdmf.readthedocs.io/en/stable/tutorials/plot_dynamictable_tutorial.html#sphx-glr-tutorials-plot-dynamictable-tutorial-py)
 for a basic guide for usage in Python and
 [**DynamicTable Tutorial (MATLAB)**](https://neurodatawithoutborders.github.io/matnwb/tutorials/html/dynamic_tables.html)
 for introductory usage in MATLAB.
+The `ndx-hed` extension is not currently supported in MATLAB, although support is planned in the future.
+
+## NWB ndx-hed installation
+
+Should it be uploaded to PyPI?
 
-The class that implements the ndx-hed extension is called `HedTags`.
-This class represents a column vector (`VectorData`) of HED tags and the version
-of the HedSchema needed to validate the tags.
+## NWB ndx-hed examples
 
-## Example HED usage in NWB
+### HedTags as a standalone vector
 
-### HED as a standalone vector
-
-The `HedTags` class has two required arguments (`hed_version` and `data`) and two optional arguments
-(`name` and `description`).
+The `HedTags` class has two required arguments (`hed_version` and `data`) and two optional arguments
+(`name` and `description`).
+The result of the following example is a `HedTags` object whose data vector includes 2 elements.
+Notice that the `data` argument value is a list with 2 values representing two distinct HED strings.
+These values are validated using HED schema version 8.3.0 when `tags` is created.
+If any of the tags had been invalid, the constructor would have raised a `ValueError`.
+The example uses the default column name (`HED`) and the default column description.

@@ -45,21 +43,31 @@
 ````{admonition} Create a HedTags object.
 :class: tip
 
 ```python
 tags = HedTags(hed_version='8.3.0', data=["Correct-action", "Incorrect-action"])
 ```
 ````
-The result is a `VectorData` object whose data vector includes 2 elements.
-Notice that data is a list with 2 values representing two distinct HED strings.
-The values of these elements are validated using HED schema version 8.3.0 when `tags` is created.
-If any of the tags had been invalid, the constructor would raise a `ValueError`.
-````{admonition} Add a row to an existing HED VectorData
+You must specify the version of the HED vocabulary to be used.
+We recommend that you use the latest version of HED (currently 8.3.0).
+A separate HED version is used for each instance of the `HedTags` column,
+so in theory you could use a different version for each column.
+This is not recommended, as annotations across columns and tables may be combined for analysis.
+See [**Understanding HED versions**](./UnderstandingHedVersions.md) for a more detailed explanation
+of HED versioning.
+
+### Adding a row to HedTags
+
+The following example assumes that a `HedTags` object `tags` has already been
+created as illustrated in the previous example.
+
+````{admonition} Add a row to an existing HedTags object
 :class: tip
 
 ```python
 tags.add_row("Sensory-event, Visual-presentation")
 ```
+````
+
 After this `add_row` operation, `tags` has 3 elements. Notice that "Sensory-event, Visual-presentation"
 is a single HED string, not two HED strings.
-
-````
+In contrast, ["Correct-action", "Incorrect-action"] is a list with two HED strings.
 
 ### HED in a table
 
 The following color table uses HED tags to define the meanings of integer codes:
 | color_code | HED |
 |----- | --- |
 | 1 | `Red` |
 | 2 | `Green` |
 | 3 | `Blue` |
 
 ````{admonition} Create an NWB DynamicTable to represent the color table.
 :class: tip
 
 ```python
 color_nums = VectorData(name="color_code", description="Internal color codes", data=[1,2,3])
 color_tags = HedTags(name="HED", hed_version="8.2.0", data=["Red", "Green", "Blue"])
 color_table = DynamicTable(
     name="colors", description="Experimental colors", columns=[color_nums, color_tags])
 ```
 ````
@@ -82,9 +90,12 @@
 The example sets up a table with columns named `color_code` and `HED`.
-Here are some common operations that you might want to perform on such a table:
-
-````{admonition} Common operations on the color table.
-:class: tip
-
-Get row 0 of `color_table` as a Pandas DataFrame:
+Table `colors` has 3 rows.
+
+### Add a row to a `DynamicTable`
+Once a table has been created, you can add a row using the table's `add_row` method.
+
+````{admonition} Get row 0 of color_table as a Pandas DataFrame:
 ```python
 df = color_table[0]
 ```
 Append a row to `color_table`:
 ```python
 color_table.add_row(color_code=4, HED="Black")
 ```
 ````
@@ -93,29 +104,95 @@
-The `HED` column can also be passed directly to the `DynamicTable` constructor:
-
-```python
-color_table = DynamicTable(name="colors", description="Color table for the experiment",
-    columns=[VectorData(name="code", description="Internal color codes", data=[1, 2, 3]),
-             HedTags(name="HED", hed_version="8.3.0", data=["Red", "Green", "Blue"])])
-```
+As mentioned above, the `DynamicTable` class is used as the base class for many table classes including the
+`TimeIntervals`, `Units`, and `PlaneSegmentation`.
+For example, `icephys` classes that extend `DynamicTable` include `ExperimentalConditionsTable`, `IntracellularElectrodesTable`,
+`IntracellularResponsesTable`, `IntracellularStimuliTable`, `RepetitionsTable`, `SequentialRecordingsTable`,
+`SimultaneousRecordingsTable` and the `SweepTable`.
+This means that HED can be used to annotate a variety of NWB data.
+
+HED tools recognize a column as containing HED annotations if it is an instance of `HedTags`.
+This is in contrast to BIDS ([**Brain Imaging Data Structure**](https://bids.neuroimaging.io/)),
+which identifies HED in tabular files by the presence of a `HED` column,
+or by an accompanying JSON sidecar, which associates HED annotations with tabular column names.
+
+## HED and ndx-events
+
+The NWB [**ndx-events**](https://github.com/rly/ndx-events) extension provides data structures for
+representing event information about data recordings.
+The following table lists elements of the *ndx-events* extension that inherit from
+`DynamicTable` and can accommodate HED annotations.
+
+```{list-table} ndx-events tables that can use HED.
+:header-rows: 1
+:name: ndx-events-data-structures
+
+* - Table
+  - Purpose
+  - Comments
+* - `EventsTypesTable`
+  - Information about each event type.
+    One row per event type.
+  - Analogous to BIDS events.json.
+* - `EventsTable`
+  - Stores event instances.
+    One row per event instance.
+  - Analogous to BIDS events.tsv.
+* - `TtlTypesTable`
+  - Information about each TTL type.
+  -
+* - `TtlTable`
+  - Information about each TTL instance.
+  -
+```
+
+HED annotations that are common to a particular type of event can be added to the NWB `EventsTypesTable`,
+which is analogous to the `events.json` in BIDS.
+A `HED` column can be added to a BIDS `events.tsv` file to provide HED annotations specific
+to each event instance.
+Any number of `HedTags` columns can be added to the NWB `EventsTable` to provide different types
+of HED annotations for each event instance.
+The HEDTools ecosystem currently supports assembling the annotations from all sources to construct
+complete annotations for event instances in BIDS. Similar support is planned for NWB files.
+
+## HED in NWB files
+
+A single NWB recording and its supporting data are stored in an `NWBFile` object.
+The NWB infrastructure efficiently handles reading, writing, and accessing large `NWBFile` objects and their components.
+The following example shows the creation of a simple `NWBFile` using only the required constructor arguments.
+
+````{admonition} Create an NWBFile object called my_nwb.
+```python
+from datetime import datetime
+from dateutil.tz import tzutc
+from pynwb import NWBFile
+
+my_nwb = NWBFile(session_description='a test NWB File',
+                 identifier='TEST123',
+                 session_start_time=datetime(1970, 1, 1, 12, tzinfo=tzutc()))
+```
+````
+
+An `NWBFile` has many fields, which can be set using optional parameters to the constructor
+or set later using method calls.
+
+````{admonition} Add a HED trial column to an NWB trial table and add trial information.
+```python
+my_nwb.add_trial_column(name="HED", hed_version="8.3.0", col_cls=HedTags, data=[], description="temp")
+my_nwb.add_trial(start_time=0.0, stop_time=1.0, HED="Correct-action")
+my_nwb.add_trial(start_time=2.0, stop_time=3.0, HED="Incorrect-action")
+```
+````
+The optional parameters for the `NWBFile` constructor whose values can inherit from `DynamicTable`
+include `epochs`, `trials`, `invalid_times`, `units`, `electrodes`, `sweep_table`,
+`intracellular_recordings`, `icephys_simultaneous_recordings`, `icephys_repetitions`, and
+`icephys_experimental_conditions`.
+The `NWBFile` class has methods of the form `add_xxx_column` for the
+`epochs`, `electrodes`, `trials`, `units`, and `invalid_times` tables.
+The other tables also allow a HED column to be added by constructing the appropriate table
+prior to passing it to the `NWBFile` constructor.
+
+In addition, the `stimulus` input is a list or tuple of objects that could include `DynamicTable` objects.
+
+The NWB infrastructure provides IO functions to serialize these HED-augmented tables.
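+The following sketch (an illustration, not part of the original tutorial) shows one way to write
+and re-read such a file with the standard pynwb IO classes. It assumes the file name
+`hed_example.nwb` and that the `ndx_hed` package is installed, since importing it registers
+the extension namespace needed to read `HedTags` columns back:
+
+````{admonition} Write and re-read an NWB file containing HED annotations.
+:class: tip
+
+```python
+import ndx_hed  # assumed: importing registers the HedTags type with pynwb
+from pynwb import NWBHDF5IO
+
+# Serialize the HED-augmented NWBFile from the previous example to disk.
+with NWBHDF5IO('hed_example.nwb', mode='w') as io:
+    io.write(my_nwb)
+
+# Read it back and inspect the trials table, including its HED column.
+with NWBHDF5IO('hed_example.nwb', mode='r') as io:
+    nwb_in = io.read()
+    print(nwb_in.trials.to_dataframe())
+```
+````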
\ No newline at end of file
diff --git a/docs/source/HedMatlabTools.md b/docs/source/HedMatlabTools.md
index da8f934..4720b2b 100644
--- a/docs/source/HedMatlabTools.md
+++ b/docs/source/HedMatlabTools.md
@@ -722,8 +722,8 @@ For the remodeling operations, first and second operation must be the dataset ro
 directory and the remodeling file name, respectively. In this example, dataset
 `ds003645` has been downloaded from [**openNeuro**](https://openneuro.org) to the `G:\` drive.
 The remodeling file used in this example can be found at
-See [**File remodeling quickstart**](FileRemodelingQuickstart.md)
-and [**File remodeling tools**](FileRemodelingTools.md) for
+See [**HED remodeling quickstart**](HedRemodelingQuickstart)
+and [**HED remodeling tools**](HedRemodelingTools) for
 additional information.
 
 (web-service-matlab-demos-anchor)=
diff --git a/docs/source/HedOnlineTools.md b/docs/source/HedOnlineTools.md
index 4c70e5f..00e3c0f 100644
--- a/docs/source/HedOnlineTools.md
+++ b/docs/source/HedOnlineTools.md
@@ -151,8 +151,8 @@ to generate a JSON sidecar based on all the events files in a BIDS dataset.
 The HED remodeling tools provide an interface to nearly all the HED tools functionality without programming.
 To use the tools, create a JSON file containing the commands that you wish to execute on the events file.
 Commands are available to do various transformations and summaries of events files as explained in
-the [**File remodeling quickstart**](https://www.hed-resources.org/en/latest/FileRemodelingQuickstart.html) and the
-[**File remodeling tools**](https://www.hed-resources.org/en/latest/FileRemodelingTools.html).
+the [**HED remodeling quickstart**](https://www.hed-resources.org/en/latest/HedRemodelingQuickstart.html) and the
+[**HED remodeling tools**](https://www.hed-resources.org/en/latest/HedRemodelingTools.html).
 
 ``````{admonition} Execute a remodel script.
diff --git a/docs/source/HedPythonTools.md b/docs/source/HedPythonTools.md
index 02b3f57..5a60524 100644
--- a/docs/source/HedPythonTools.md
+++ b/docs/source/HedPythonTools.md
@@ -1,22 +1,44 @@
 # HED Python tools
+The primary codebase for HED support is in Python.
+Source code for the HED Python tools is available in the
+[**hed-python**](https://github.com/hed-standard/hed-python) GitHub repository.
+See the [**HED tools API documentation**](https://hed-python.readthedocs.io/en/latest/) for
+detailed information about the HED Tools API.
+
+Many of the most frequently used tools are available using the
+[**HED remodeling tools**](https://www.hed-resources.org/en/latest/HedRemodelingTools.html).
+Using the remodeling interface, users specify operations and parameters in a JSON
+file rather than writing code.
+
+The [**HED online tools**](https://hedtools.org/hed) provide an easy-to-use GUI and web service for accessing the tools.
+See the [**HED online tools documentation**](https://www.hed-resources.org/en/latest/HedOnlineTools.html)
+for more information.
+
+The [**HED MATLAB tools**](https://www.hed-resources.org/en/latest/HedMatlabTools.html)
+provide a MATLAB wrapper for the HED Python tools.
+For users who do not have an appropriate version of Python installed for their MATLAB,
+the tools access the online tool web service to perform the operation.
+
+## HED Python tool installation
 The HED (Hierarchical Event Descriptor) scripts and notebooks assume
 that the Python HedTools have been installed.
-The HedTools package is not yet available on PyPI, so you will need to install it
-directly from GitHub using:
+The HedTools package is available on PyPI and can be installed using:
 
 ```shell
-    pip install git+https://github.com/hed-standard/hed-python/@master
+    pip install hedtools
 ```
 
-There are several types of Jupyter notebooks and other HED support tools:
-* [**Jupyter notebooks for HED in BIDS**](jupyter-notebooks-for-hed-in-bids-anchor) - aids for HED annotation in BIDS.
-* [**Jupyter notebooks for data curation**](jupyter-curation-notebooks-anchor) - aids for
-summarizing and reorganizing event data.
-* [**Calling HED tools**](calling-hed-tools-anchor) - specific useful functions/classes.
+Prerelease versions of HedTools are available on the `develop` branch of the
+[**hed-python**](https://github.com/hed-standard/hed-python) GitHub repository
+and can be installed using:
+```shell
+    pip install git+https://github.com/hed-standard/hed-python/@develop
+```
 
-(jupyter-notebooks-for-hed-in-bids-anchor)=
-## Jupyter notebooks for HED in BIDS
+(jupyter-notebooks-for-hed-anchor)=
+## Jupyter notebooks for HED
 
 The following notebooks are specifically designed to support HED annotation
 for BIDS datasets.
@@ -225,222 +247,3 @@ This is very useful for testing new schemas that are under development.
 ### Validate BIDS datasets
 The [**validate_bids_datasets.ipynb**](https://github.com/hed-standard/hed-examples/blob/main/src/jupyter_notebooks/bids/validate_bids_datasets.ipynb)
 is similar to the other validation notebooks, but it takes a list of datasets to validate as a convenience.
-
-
-(jupyter-curation-notebooks-anchor)=
-## Jupyter notebooks for data curation
-
-All data curation notebooks and other examples can now be found
-in the [**hed-examples**](https://github.com/hed-standard/hed-examples) repository.
-
-
-(consistency-of-BIDS-event-files-anchor)=
-### Consistency of BIDS event files
-
-Some neuroimaging modalities, such as EEG, typically contain event information
-encoded in the data recording files, and the BIDS `events.tsv` files are
-generated post hoc.
-
-In general, the following things should be checked before data is released:
-1. The BIDS `events.tsv` files have the same number of events as the data
-recording, and the onset times of corresponding events agree.
-2. The associated information contained in the data recording and event files is consistent.
-3. The relevant metadata is present in both versions of the data.
-
-The example data curation scripts discussed in this section assume that two versions
-of each BIDS event file are present: `events.tsv` and a corresponding `events_temp.tsv` file.
-The example datasets that are used for these tutorials assume that the recordings
-are in EEG.set format.
-
-(calling-hed-tools-anchor)=
-## Calling HED tools
-
-This section shows examples of useful processing functions provided in HedTools:
-
-* [**Getting a list of filenames**](getting-a-list-of-files-anchor)
-* [**Dictionaries of filenames**](dictionaries-of-filenames-anchor)
-* [**Logging processing steps**](logging-processing-steps-anchor)
-
-
-(getting-a-list-of-files-anchor)=
-### Getting a list of files
-
-Many situations require the selection of files in a directory tree based on specified criteria.
-The `get_file_list` function allows you to pick out files with a specified filename -prefix and filename suffix and specified extensions - -The following example returns a list of full paths of the files whose names end in `_events.tsv` -or `_events.json` that are not in any `code` or `derivatives` directories in the `bids_root_path` -directory tree. -The search starts in the directory root `bids_root_path`: - -````{admonition} Get a list of specified files in a specified directory tree. -:class: tip -```python -file_list = get_file_list(bids_root_path, extensions=[ ".json", ".tsv"], name_suffix="_events", - name_prefix="", exclude_dirs=[ "code", "derivatives"]) -``` -```` - -(dictionaries-of-filenames-anchor)= -### Dictionaries of filenames - -The HED tools provide both generic and BIDS-specific classes for dictionaries of filenames. - -Many of the HED data processing tools make extensive use of dictionaries specifying both data and format. - -#### BIDS-specific dictionaries of files - -Files in BIDS have unique names that indicate not only what the file represents, -but also where that file is located within the BIDS dataset directory tree. - -##### BIDS file names and keys -A BIDS file name consists of an underbar-separated list of entities, -each specified as a name-value pair, -followed by suffix indicating the data modality. - -For example, the file name `sub-001_ses-3_task-target_run-01_events.tsv` -has entities subject (`sub`), task (`task`), and run (`run`). -The suffix is `events` indicating that the file contains events. -The extension `.tsv` gives the data format. - -Modality is not the same as data format, since some modalities allow -multiple formats. For example, `sub-001_ses-3_task-target_run-01_eeg.set` -and `sub-001_ses-3_task-target_run-01_eeg.edf` are both acceptable -representations of EEG files, but the data is in different formats. - -The BIDS file dictionaries represented by the class `BidsFileDictionary` -and its extension `BidsTabularDictionary` use a set combination of entities -as the file key. - -For a file name `sub-001_ses-3_task-target_run-01_events.tsv`, -the tuple ('sub', 'task') gives a key of `sub-001_task-target`, -while the tuple ('sub', 'ses', 'run') gives a key of `sub-001_ses-3_run-01`. -The use of dictionaries of file names with such keys makes it -easier to associate related files in the BIDS naming structure. - -Notice that specifying entities ('sub', 'ses', 'run') gives the -key `sub-001_ses-3_run-01` for all three files: -`sub-001_ses-3_task-target_run-01_events.tsv`, `sub-001_ses-3_task-target_run-01_eeg.set` -and `sub-001_ses-3_task-target_run-01_eeg.edf`. -Thus, the expected usage is to create a dictionary of files of one modality. - -````{admonition} Create a key-file dictionary for files ending in events.tsv in bids_root_path directory tree. -:class: tip -```python -from hed.tools import FileDictionary -from hed.util import get_file_list - -file_list = get_file_list(bids_root_path, extensions=[ ".set"], name_suffix="_eeg", - exclude_dirs=[ "code", "derivatives"]) -file_dict = BidsFileDictionary(file_list, entities=('sub', 'ses', 'run) ) -``` -```` - -In this example, the `get_file_list` filters the files of the appropriate type, -while the `BidsFileDictionary` creates a dictionary with keys such as -`sub-001_ses-3_run-01` and values that are `BidsFile` objects. -`BidsFile` can hold the file name of any BIDS file and keeps a parsed -version of the file name. 
- - - -#### A generic dictionary of filenames - - -````{admonition} Create a key-file dictionary for files ending in events.json in bids_root_path directory tree. -:class: tip -```python -from hed.tools import FileDictionary -from hed.util import get_file_list - -file_list = get_file_list(bids_root_path, extensions=[ ".json"], name_suffix="_events", - exclude_dirs=[ "code", "derivatives"]) -file_dict = FileDictionary(file_list, name_indices=name_indices) -``` -```` - -Keys are calculated from the filename using a `name_indices` tuple, -which indicates the positions of the name-value entity pairs in the -BIDS file name to use. - -The BIDS filename `sub-001_ses-3_task-target_run-01_events.tsv` has -three name-value entity pairs (`sub-001`, `ses-3`, `task-target`, -and `run-01`) separated by underbars. - -The tuple (0, 2) gives a key of `sub-001_task-target`, -while the tuple (0, 3) gives a key of `sub-001_run-01`. -Neither of these choices uniquely identifies the file. -The tuple (0, 1, 3) gives a unique key of `sub-001_ses-3_run-01`. -The tuple (0, 1, 2, 3) also works giving `sub-001_ses-3_task-target_run-01`. - -If you choose the `name_indices` incorrectly, the keys for the event files -will not be unique, and the notebook will throw a `HedFileError`. -If this happens, modify your `name_indices` key choice to include more entity pairs. - -For example, to compare the events stored in a recording file and the events -in the `events.tsv` file associated with that recording, -we might dump the recording events in files with the same name, but ending in `events_temp.tsv`. -The `FileDictionary` class allows us to create a keyed dictionary for each of these event files. - - -(logging-processing-steps-anchor)= -### Logging processing steps - -Often event data files require considerable processing to assure -internal consistency and compliance with the BIDS specification. -Once this processing is done and the files have been transformed, -it can be difficult to understand the relationship between the -transformed files and the original data. - -The `HedLogger` allows you to document processing steps associated -with the dataset by identifying key as illustrated in the following -log file excerpt: - -(example-output-hed-logger-anchor)= -`````{admonition} Example output from HED logger. -:class: tip -```text -sub-001_run-01 - Reordered BIDS columns as ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HED'] - Dropped BIDS skip columns ['trial_type', 'value', 'response_time', 'stim_file', 'HED'] - Reordered EEG columns as ['sample_offset', 'event_code', 'cond_code', 'type', 'latency', 'urevent', 'usertags'] - Dropped EEG skip columns ['urevent', 'usertags', 'type'] - Concatenated the BIDS and EEG event files for processing - Dropped the sample_offset and latency columns - Saved as _events_temp1.tsv -sub-002_run-01 - Reordered BIDS columns as ['onset', 'duration', 'sample', 'trial_type', 'response_time', 'stim_file', 'value', 'HED'] - Dropped BIDS skip columns ['trial_type', 'value', 'response_time', 'stim_file', 'HED'] - Reordered EEG columns as ['sample_offset', 'event_code', 'cond_code', 'type', 'latency', 'urevent', 'usertags'] - Dropped EEG skip columns ['urevent', 'usertags', 'type'] - Concatenated the BIDS and EEG event files for processing - . . . -``` -````` - -Each of the lines following a key represents a print message to the logger. 
- -The most common use for a logger is to create a file dictionary -using [**make_file_dict**](dictionaries-of-filenames-anchor) -and then to log each processing step using the file's key. -This allows a processing step to be applied to all the relevant files in the dataset. -After all the processing is complete, the `print_log` method -outputs the logged messages by key, thus showing all the -processing steps that have been applied to each file -as shown in the [**previous example**](example-output-hed-logger-anchor). - -(using-hed-logger-example-anchor)= -`````{admonition} Using the HED logger. -:class: tip -```python -from hed.tools import HedLogger -status = HedLogger() -status.add(key, f"Concatenated the BIDS and EEG event files") - -# ... after processing is complete output or save the log -status.print_log() -``` -````` - -The `HedLogger` is used throughout the processing notebooks in this repository. diff --git a/docs/source/FileRemodelingQuickstart.md b/docs/source/HedRemodelingQuickstart.md similarity index 94% rename from docs/source/FileRemodelingQuickstart.md rename to docs/source/HedRemodelingQuickstart.md index 52be3e2..357212e 100644 --- a/docs/source/FileRemodelingQuickstart.md +++ b/docs/source/HedRemodelingQuickstart.md @@ -1,495 +1,495 @@ -(file-remodeling-quickstart-anchor)= -# File remodeling quickstart - -This tutorial works through the process of restructuring tabular (`.tsv`) files using the HED file remodeling tools. -These tools particularly useful for creating event files from -information in experimental logs and for restructuring event files to enable a particular analysis. - -The tools, which are written in Python, are designed to be run on an entire dataset. -This dataset can be in BIDS -([**Brain Imaging Data Structure**](https://bids.neuroimaging.io/)), -Alternative users can specify files with a particular suffix and extension appearing -in a specified directory tree. -The later format is useful for restructuring that occurs early in the experimental process, -for example, during the conversion from the experimental control software formats. - -The tools can be run using a command-line script, called from a Jupyter notebook, -or run using online tools. This quickstart covers the basic concepts of remodeling and -develops some basic examples of how remodeling is used. See the -[**File remodeling tools**](./FileRemodelingTools.md) -guide for detailed descriptions of the available operations. - -* [**What is remodeling?**](what-is-remodeling-anchor) -* [**The remodeling process**](the-remodeling-process-anchor) -* [**JSON remodeling files**](json-remodeling-files-anchor) - * [**Basic remodel operation syntax**](basic-remodel-operation-syntax-anchor) - * [**Applying multiple remodel operations**](applying-multiple-remodel-operations-anchor) - * [**More complex remodeling**](more-complex-remodeling-anchor) - * [**Remodeling file locations**](remodeling-file-locations-anchor) -* [**Using the remodeling tools**](using-the-remodeling-tools-anchor) - * [**Online tools for debugging**](online-tools-for-debugging-anchor) - * [**The command-line interface**](the-command-line-interface-anchor) - * [**Jupyter notebooks for remodeling**](jupyter-notebooks-for-remodeling-anchor) - -(what-is-remodeling-anchor)= -## What is remodeling? - -Although the remodeling process can be applied to any tabular file, -they are most often used for restructuring event files. 
-Event files, which consist of identified time markers linked to the timeline of the experiment, -provide a crucial bridge between what happens in -the experiment and the experimental data. - -Event files are often initially created using information in the log files -generated by the experiment control software. -The entries in the log files mark time points within the experimental record at which something -changes or happens (such as the onset or offset of a stimulus or a participant response). -These event files are then used to identify portions of the data -corresponding to particular points or blocks of data to be analyzed or compared. - -**Remodeling** refers to the process of file restructuring including creating, modifying, and -reorganizing tabular files in order to -disambiguate or clarify their information to enable or streamline -their analysis and/or further distribution. -HED-based remodeling can occur at several stages during the acquisition and processing -of experimental data as shown in this schematic diagram: -![schematic diagram](./_static/images/WebWorkflow.png). - -In addition to restructuring during initial structuring of the tabular files, -further event file restructuring may be useful when the event files are not suited to the requirements of a particular analysis. Thus, restructuring can be an iterative process, which is supported by the HED Remodeling Tools for datasets with tabular event files. - -The following table gives a summary of the tools available in the HED remodeling toolbox. - -(summary-of-hed-remodeling-operations-anchor)= -````{table} Summary of the HED remodeling operations for tabular files. -| Category | Operation | Example use case | -| -------- | ------- | -----| -| **clean-up** | | | -| | [*remove_columns*](remove-columns-anchor) | Remove temporary columns created during restructuring. | -| | [*remove_rows*](remove-rows-anchor) | Remove rows with a particular value in a specified column. | -| | [*rename_columns*](rename-columns-anchor) | Make columns names consistent across a dataset. | -| | [*reorder_columns*](reorder-columns-anchor) | Make column order consistent across a dataset. | -| **factor** | | | -| | [*factor_column*](factor-column-anchor) | Extract factor vectors from a column of condition variables. | -| | [*factor_hed_tags*](factor-hed-tags-anchor) | Extract factor vectors from search queries of HED annotations. | -| | [*factor_hed_type*](factor-hed-type-anchor) | Extract design matrices and/or condition variables. | -| **restructure** | | | -| | [*merge_consecutive*](merge-consecutive-anchor) | Replace multiple consecutive events of the same type
with one event of longer duration. | -| | [*remap_columns*](remap-columns-anchor) | Create *m* columns from values in *n* columns (for recoding). | -| | [*split_rows*](split-rows-anchor) | Split trial-encoded rows into multiple events. | -| **summarization** | | | -| | [*summarize_column_names*](summarize-column-names-anchor) | Summarize column names and order in the files. | -| | [*summarize_column_values*](summarize-column-values-anchor) |Count the occurrences of the unique column values. | -| | [*summarize_definitions*](summarize-definitions-anchor) |Summarize definitions used and report inconsistencies. | -| | [*summarize_hed_tags*](summarize-hed-tags-anchor) | Summarize the HED tags present in the
HED annotations for the dataset. | -| | [*summarize_hed_type*](summarize-hed-type-anchor) | Summarize the detailed usage of a particular type tag
such as *Condition-variable* or *Task*
(used to automatically extract experimental designs). | -| | [*summarize_hed_validation*](summarize-hed-validation-anchor) | Validate the data files and report any errors. | -| | [*summarize_sidecar_from_events*](summarize-sidecar-from-events-anchor) | Generate a sidecar template from an event file. | -```` - -The **clean-up** operations are used at various phases of restructuring to assure consistency -across files in the dataset. - -The **factor** operations produce column vectors of the same length as the number of rows in a file -in order to encode condition variables, design matrices, or the results of other search criteria. -See the -[**HED conditions and design matrices**](./HedConditionsAndDesignMatrices.md) -for more information on factoring and analysis. - -The **restructure** operations modify the way that files represent information. - -The **summarization** operations produce dataset-wide and individual file -summaries of various aspects of the data. - -More detailed information about the remodeling operations can be found in -the [**File remodeling tools**](file-remodeling-tools-anchor) guide. - -(the-remodeling-process-anchor)= -## The remodeling process - -Remodeling consists of applying a list of operations to a tabular file -to restructure or modify the file in some way. -The following diagram shows a schematic of the remodeling process. - -![Event remodeling process](./_static/images/EventRemappingProcess.png) - -Initially, the user creates a backup of the selected files. -This backup process is performed only once, and the results are -stored in the `derivatives/remodel/backups` subdirectory of the dataset. - -Restructuring applies a sequence of remodeling operations given in a JSON remodeling -file to produce a final result. -By convention, we name these remodeling instruction files `_rmdl.json` -and store them in the `derivatives/remodel/remodeling_files` directory relative -to the dataset root directory. - -The restructuring always proceeds by looking up each data file in the backup -and applying the transformation to the backup before overwriting the non-backed up version. - -The remodeling file provides a record of the operations performed on the file -starting with the original file. -If the user detects a mistake in the transformation instructions, -he/she can correct the remodeling JSON file and rerun. - -Usually, users will use the default backup, run the backup request once, and -work from the original backup. -However, user may also elect to create a named backup, use the backup -as a checkpoint mechanism, and develop scripts that use the check-pointed versions as the starting point. -This is useful if different versions of the events files are needed for different purposes. - - -(json-remodeling-files-anchor)= -## JSON remodeling files - -The operations to restructure a tabular file are stored in a remodel file in JSON format. -The file consists of a list of JSON dictionaries. - -(basic-remodel-operation-syntax-anchor)= -### Basic remodel operation syntax - -Each dictionary specifies an operation, a description of the purpose, and the operation parameters. -The basic syntax of a remodeler operation is illustrated in the following example which renames -the *trial_type* column to *event_type*. - - -````{admonition} Example of a remodeler operation. 
-:class: tip - -```json -{ - "operation": "rename_columns", - "description": "Rename a trial type column to more specific event_type", - "parameters": { - "column_mapping": { - "trial_type": "event_type" - }, - "ignore_missing": true - } -} -``` -```` - -Each remodeler operation has its own specific set of required parameters -that can be found under [**File remodeling tools**](./FileRemodelingTools.md). -For *rename_columns*, the required operations are *column_mapping* and *ignore_missing*. -Some operations also have optional parameters. - -(applying-multiple-remodel-operations-anchor)= -### Applying multiple remodel operations - -A remodel JSON file consists of a list of one or remodel operations, -each specified in a dictionary. -These operations are performed by the remodeler in the order they appear in the file. -In the example below, a summary is performed after renaming, -so the result reflects the new column names. - -````{admonition} An example JSON remodeler file with multiple operations. -:class: tip - -```json -[ - { - "operation": "rename_columns", - "description": "Rename a trial type column to more specific event_type.", - "parameters": { - "column_mapping": { - "trial_type": "event_type" - }, - "ignore_missing": true - } - }, - { - "operation": "summarize_column_names", - "description": "Get column names across files to find any missing columns.", - "parameters": { - "summary_name": "Columns after remodeling", - "summary_filename": "columns_after_remodel" - } - } -] -``` -```` - -By stacking operations you can make several changes to a data file, -which is important because the changes are always applied to a copy of the original backup. -If you are planning new changes to the file, note that you are always changing -a copy of the original backed up file, not a previously remodeled `.tsv`. - -(more-complex-remodeling-anchor)= -### More complex remodeling - -This section discusses a complex example using the -[**sub-0013_task-stopsignal_acq-seq_events.tsv**](./_static/data/sub-0013_task-stopsignal_acq-seq_events.tsv) -events file of AOMIC-PIOP2 dataset available on [OpenNeuro](https://openneuro.org) as ds002790. -Here is an excerpt of the event file. - - -(sample-remodeling-events-file-anchor)= -````{admonition} Excerpt from an event file from the stop-go task of AOMIC-PIOP2 (ds002790). -| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex | -| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- | -| 0.0776 | 0.5083 | go | n/a | 0.565 | |correct | right | female -| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female | -| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female | -| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female | -| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male | -| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male | -```` - -This event file corresponds to a stop-signal experiment. -Participants were presented with faces and had to decide the sex of the face -by pressing a button with left or right hand. -However, if a stop signal occurred before this selection, the participant was to refrain from responding. - -The structure of this file corresponds to the [**BIDS**](https://bids.neuroimaging.io/) format -for event files. 
-The first column, which must be called `onset` represents the time from the start of recording -in seconds of the temporal marker represented by that row in the file. -In this case that temporal marker represents the presentation of a face image. - -Notice that the *stop_signal_delay* and *response_time* columns contain information -about additional events (when a trial stop signal was presented and when the participant -pushed a button). -These events are encoded implicitly as offsets from the presentation of the go signal. -Each row is the file encodes information for an entire trial rather than what occurred -at a single temporal marker. -This strategy is known as *trial-level* encoding. - -Our goal is to represent all the trial events (e.g., go signal, stop signal, and response) -in separate rows of the event file using the *split_rows* restructuring operation. -The following example shows the remodeling operations to perform the splitting. - -````{admonition} Example of split_rows operation for the AOMIC stop signal task. -:class: tip - -```json -[ - { - "operation": "split_rows", - "description": "Split response event from trial event based on response_time column.", - "parameters": { - "anchor_column": "trial_type", - "new_events": { - "response": { - "onset_source": ["response_time"], - "duration": [0], - "copy_columns": ["response_accuracy", "response_hand"] - }, - "stop_signal": { - "onset_source": ["stop_signal_delay"], - "duration": [0.5] - } - }, - "remove_parent_row": false - } - } -] -``` -```` - -The example uses the *split_rows* operation to convert this -file from trial encoding to event encoding. -In trial encoding each event marker (row in the event file) represents -all the information in a single trial. -Event markers such as the participant's response key-press are encoded implicitly -as an offset from the stimulus presentation. -while event encoding includes event markers for each individual event within the trial. - -The [**Split rows**](./FileRemodelingTools.md#split-rows) -explanation under [**File remodeling tools**](./FileRemodelingTools.md) -shows the required parameters for the *split_rows* operation. -The required parameters are *anchor_column*, *new_events*, and *remove_parent_row*. - -The *anchor_column* is the column we want to add new events corresponding to the stop signal and the response. -In this case we are going to add events to an existing column: *trial_type*. -The new events will be in new rows and the existing rows will not be overwritten -because *remove_parent_event* is false. -(After splitting we may want to rename *trial_type* to *event_type* since the individual -rows in the data file no longer represent trials, but individual events within the trial.) - -Next we specify how the new events are generated in the *new_events* dictionary. -Each type of new event has a name, which is a key in the *new_events* dictionary. -Each key is associated with a dictionary -specifying the values of the following parameters. - -* *onset_source* -* *duration* -* *copy_columns* - -The *onset_source* is a list indicating how to calculate the onset for the new event -relative to the onset of the anchor event. -The list contains any combination of column names and numerical values, -which are evaluated and added to the onset value of the row being split. -Column names are evaluated to the row values in the corresponding columns. 
- -In our example, the response time and stop signal delay are calculated relative to the trial's onset, -so we only need to add the value from the respective column. -Note that these new events do not exist for every trial. -Rows where there was no stop signal have an `n/a` in the *stop_signal_delay* column. -This is processed automatically, and remodeler does not create new events -when any items in the *onset_source* list is missing or `n/a`. - -The *duration* specifies the duration for the new events. -The AOMIC data did not measure the durations of the button presses, -so we set the duration of the response event to 0. -The AOMIC data report indicates that the stop signal lasted 500 ms. - -The *copy_columns* is an optional parameter indicating which columns from the parent event should be copied to the -newly-created event. -We would like to transfer the *response_accuracy* and the *response_hand* information to the *response* event. -Since no extra column values are to be transferred for *stop_signal*, columns other than *onset*, *duration*, -and *trial_type* are filled with `n/a`. - - -The final remodeling file can be found at: -[**finished json remodeler**](./_static/data/AOMIC_splitevents_rmdl.json) - -(remodeling-file-locations-anchor)= -### Remodeling file locations - -The remodeling tools expect the full path for the JSON remodeling operation file to be given -when the remodeling is executed. -However, it is a good practice to include all remodeling files used with the dataset. -The JSON remodeling operation files are usually located in the -`derivatives/remodel/remodeling_files` subdirectory below the dataset root, -and have file names ending in `_rmdl.json`. - -The backups are always in the `derivatives/remodel/backups` subdirectory under the dataset root. -Summaries produced by the restructuring tools are located in `derivatives/remodel/summaries`. - -In the next section we will go over several ways to call the remodeler. - -(using-the-remodeling-tools-anchor)= -## Using the remodeling tools - -The remodeler can be called in a number of ways including using online tools and from the command line. -The following sections explain various ways to use the available tools. - -(online-tools-for-debugging-anchor)= -### Online tools for debugging - -Although the event restructuring tools are designed to be run on an entire dataset, -you should consider working with a single data file during debugging. -The HED online tools provide support for debugging your remodeling script and for -seeing the effect of remodeling on a single data file before running on the entire dataset. -You can access these tools on the [**HED tools online tools server**](https://hedtools.ucsd.edu/hed). - -To use the online remodeling tools, navigate to the events page and select the *Execute remodel script* action. -Browse to select the data file to be remodeled and the JSON remodel file -containing the remodeling operations. -The following screenshot shows these selections for the split rows example of the previous section. - -![Remodeling tools online](./_static/images/RemodelingOnline.png) - -Press the *Process* button to complete the action. -If the remodeling script has errors, -the result will be a downloaded text file with the errors identified. -If the remodeling script is correct, -the result will be a data file with the remodeling transformations applied. -If the remodeling script contains summarization operations, -the result will be a zip file with the modified data file and the summaries included. 
- -If you are using one of the remodeling operations that relies on HED tags, you will -also need to upload a suitable JSON sidecar file containing the HED annotations for the data file -if you turn the *Include summaries* option on. - -(the-command-line-interface-anchor)= -### The command-line interface - -After [**installing the remodeler**](./FileRemodelingTools.md#installing-the-remodel-tools), -you can run the tools on a full BIDS dataset, -or on any directory using the command-line interface using -`run_remodel_backup`, `run_remodel`, and `run_remodel_restore`. -A full overview of all arguments is available at -[**File remodeling tools**](./FileRemodelingTools.md#remodel-command-line-arguments). - -The `run_remodel_backup` is usually run only once for a dataset. -It makes the baseline backup of the event files to assure that nothing will be lost. -The remodeling always starts from the backup files. - -The `run_remodel` restores the data files from the corresponding backup files and then -executes remodeling operations from a JSON file. -A sample command line call for `run_remodel` is shown in the following example. - -(remodel-run-anchor)= -````{admonition} Command to run a summary for the AOMIC dataset. -:class: tip - -```bash -python run_remodel /data/ds002790 /data/ds002790/derivatives/remodel/remodeling_files/AOMIC_summarize_rmdl.json \ - -b -s .txt -x derivatives - -``` -```` - -The parameters are as follows: - -* `data_dir` - (Required first argument) Root directory of the dataset. -* `model_path` - (Required second argument) Path of JSON file with remodeling operations. -* `-b` - (Optional) If present, assume BIDS formatted data. -* `-s` - (Optional) list of formats to save summaries in. -* `-x` - (Optional) List of directories to exclude from event processing. - -There are three types of command line arguments: - -[**Positional arguments**](./FileRemodelingTools.md#positional-arguments), -[**Named arguments**](./FileRemodelingTools.md#named-arguments), -and [**Named arguments with values**](./FileRemodelingTools.md#named-arguments). - -The positional arguments, `data_dir` and `model_path` are not optional and must -be the first and second arguments to `run_remodel`, respectively. -The named arguments (with and without values) are optional. -They all have default values if omitted. - -The `-b` option is a named argument indicating whether the dataset is in BIDS format. -If in BIDS format, the remodeling tools can extract information such as the HED schema -and the HED annotations from the dataset. -BIDS data file names are unique, which is convenient for reporting summary information. -Name arguments are flags-- their presence indicates true and absence indicates false. - -The `-s` and `-x` options are examples of named arguments with values. -The `-s .txt` specifies that summaries should be saved in text format. -The `-x derivatives` indicates that the `derivatives` subdirectory should not -be processed during remodeling. - -This script can be run multiple times without doing backups and restores, -since it always starts with the backed up files. - -The first argument of the command line scripts is the full path to the root directory of the dataset. -The `run_remodel` requires the full path of the json remodeler file as the second argument. -A number of optional key-value arguments are also available. - -After the `run_remodel` finishes, it overwrites the data files (not the backups) -and writes any requested summaries in `derivatives/remodel/summaries`. 
The summaries will be written to the `/data/ds002790/derivatives/remodel/summaries` folder in text format.
By default, the summary operations will return both text and JSON versions.

The [**summary file**](./_static/data/AOMIC_column_names_2022_12_21_T_13_12_35_641062.txt) lists all different column combinations and, for each combination, the files with those columns.
Looking at the different column combinations, you can see there are three, one for each task that was performed for this dataset.

Going back to the [**split rows example**](#more-complex-remodeling) of remodeling,
we see that splitting the rows into multiple rows only makes sense if the event files have the same columns.
Only the event files for the stop signal task contain the `stop_signal_delay` column and the `response_time` column.
Summarizing the column names across the dataset allows users to check whether the column
names are consistent across the dataset.
A common use case for BIDS datasets is that the event files have a different structure
for different tasks.
The `-t` command-line option allows users to specify which tasks to perform remodeling on.
Using this option allows users to select only the files that have the specified task names
in their filenames.

Now you can try out the *split_rows* operation on the full dataset!

(jupyter-notebooks-for-remodeling-anchor)=
### Jupyter notebooks for remodeling

Three Jupyter remodeling notebooks are available at
[**Jupyter notebooks for remodeling**](https://github.com/hed-standard/hed-examples/tree/main/src/jupyter_notebooks/remodeling).

These notebooks are wrappers that create the backup as well as run restructuring operations on data files.
If you do not have access to a Jupyter notebook facility, the article
[Six easy ways to run your Jupyter Notebook in the cloud](https://www.dataschool.io/cloud-services-for-jupyter-notebook/) discusses various no-cost options
for running Jupyter notebooks online.
+(hed-remodeling-quickstart-anchor)=
+# HED remodeling quickstart
+
+This tutorial works through the process of restructuring tabular (`.tsv`) files using the HED file remodeling tools.
+These tools are particularly useful for creating event files from
+information in experimental logs and for restructuring event files to enable a particular analysis.
+
+The tools, which are written in Python, are designed to be run on an entire dataset.
+This dataset can be in BIDS
+([**Brain Imaging Data Structure**](https://bids.neuroimaging.io/)) format.
+Alternatively, users can specify files with a particular suffix and extension appearing
+in a specified directory tree.
+The latter format is useful for restructuring that occurs early in the experimental process,
+for example, during the conversion from the experimental control software formats.
+
+The tools can be run using a command-line script, called from a Jupyter notebook,
+or run using online tools. This quickstart covers the basic concepts of remodeling and
+develops some basic examples of how remodeling is used. See the
+[**HED remodeling tools**](./HedRemodelingTools)
+guide for detailed descriptions of the available operations.
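+To make the notion of a tabular event file concrete before diving in, here is a small
+illustrative sketch (plain pandas, not the remodeling tools themselves) that loads a few
+rows modeled on the stop-signal example developed later in this tutorial:
+
+````{admonition} Load a BIDS-style events file into a DataFrame.
+:class: tip
+
+```python
+import io
+import pandas as pd
+
+# A few tab-separated rows in the style of a BIDS events.tsv file.
+tsv = ("onset\tduration\ttrial_type\tresponse_time\n"
+       "0.0776\t0.5083\tgo\t0.565\n"
+       "5.5774\t0.5083\tunsuccesful_stop\t0.49\n")
+events = pd.read_csv(io.StringIO(tsv), sep="\t")
+print(events)
+```
+````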
+
+* [**What is remodeling?**](what-is-remodeling-anchor)
+* [**The remodeling process**](the-remodeling-process-anchor)
+* [**JSON remodeling files**](json-remodeling-files-anchor)
+  * [**Basic remodel operation syntax**](basic-remodel-operation-syntax-anchor)
+  * [**Applying multiple remodel operations**](applying-multiple-remodel-operations-anchor)
+  * [**More complex remodeling**](more-complex-remodeling-anchor)
+  * [**Remodeling file locations**](remodeling-file-locations-anchor)
+* [**Using the remodeling tools**](using-the-remodeling-tools-anchor)
+  * [**Online tools for debugging**](online-tools-for-debugging-anchor)
+  * [**The command-line interface**](the-command-line-interface-anchor)
+  * [**Jupyter notebooks for remodeling**](jupyter-notebooks-for-remodeling-anchor)
+
+(what-is-remodeling-anchor)=
+## What is remodeling?
+
+Although the remodeling process can be applied to any tabular file,
+it is most often used for restructuring event files.
+Event files, which consist of identified time markers linked to the timeline of the experiment,
+provide a crucial bridge between what happens in
+the experiment and the experimental data.
+
+Event files are often initially created using information in the log files
+generated by the experiment control software.
+The entries in the log files mark time points within the experimental record at which something
+changes or happens (such as the onset or offset of a stimulus or a participant response).
+These event files are then used to identify portions of the data
+corresponding to particular points or blocks of data to be analyzed or compared.
+
+**Remodeling** refers to the process of creating, modifying, and
+reorganizing tabular files in order to
+disambiguate or clarify their information and to enable or streamline
+their analysis and/or further distribution.
+HED-based remodeling can occur at several stages during the acquisition and processing
+of experimental data as shown in this schematic diagram:
+![schematic diagram](./_static/images/WebWorkflow.png)
+
+Beyond the initial structuring of the tabular files,
+further restructuring may be useful when the event files are not suited to the requirements of a particular analysis.
+Thus, restructuring can be an iterative process, which the HED remodeling tools support for datasets with tabular event files.
+
+The following table gives a summary of the tools available in the HED remodeling toolbox.
+
+(summary-of-hed-remodeling-operations-anchor)=
+````{table} Summary of the HED remodeling operations for tabular files.
+| Category | Operation | Example use case |
+| -------- | ------- | -----|
+| **clean-up** | | |
+| | [*remove_columns*](remove-columns-anchor) | Remove temporary columns created during restructuring. |
+| | [*remove_rows*](remove-rows-anchor) | Remove rows with a particular value in a specified column. |
+| | [*rename_columns*](rename-columns-anchor) | Make column names consistent across a dataset. |
+| | [*reorder_columns*](reorder-columns-anchor) | Make column order consistent across a dataset. |
+| **factor** | | |
+| | [*factor_column*](factor-column-anchor) | Extract factor vectors from a column of condition variables. |
+| | [*factor_hed_tags*](factor-hed-tags-anchor) | Extract factor vectors from search queries of HED annotations. |
+| | [*factor_hed_type*](factor-hed-type-anchor) | Extract design matrices and/or condition variables. 
| +| **restructure** | | | +| | [*merge_consecutive*](merge-consecutive-anchor) | Replace multiple consecutive events of the same type
with one event of longer duration. |
+| | [*remap_columns*](remap-columns-anchor) | Create *m* columns from values in *n* columns (for recoding). |
+| | [*split_rows*](split-rows-anchor) | Split trial-encoded rows into multiple events. |
+| **summarization** | | |
+| | [*summarize_column_names*](summarize-column-names-anchor) | Summarize column names and order in the files. |
+| | [*summarize_column_values*](summarize-column-values-anchor) | Count the occurrences of the unique column values. |
+| | [*summarize_definitions*](summarize-definitions-anchor) | Summarize definitions used and report inconsistencies. |
+| | [*summarize_hed_tags*](summarize-hed-tags-anchor) | Summarize the HED tags present in the<br/>
HED annotations for the dataset. | +| | [*summarize_hed_type*](summarize-hed-type-anchor) | Summarize the detailed usage of a particular type tag
such as *Condition-variable* or *Task*
(used to automatically extract experimental designs). |
+| | [*summarize_hed_validation*](summarize-hed-validation-anchor) | Validate the data files and report any errors. |
+| | [*summarize_sidecar_from_events*](summarize-sidecar-from-events-anchor) | Generate a sidecar template from an event file. |
+````
+
+The **clean-up** operations are used at various phases of restructuring to assure consistency
+across files in the dataset.
+
+The **factor** operations produce column vectors of the same length as the number of rows in a file
+in order to encode condition variables, design matrices, or the results of other search criteria.
+See the
+[**HED conditions and design matrices**](./HedConditionsAndDesignMatrices.md)
+guide for more information on factoring and analysis.
+
+The **restructure** operations modify the way that files represent information.
+
+The **summarization** operations produce dataset-wide and individual file
+summaries of various aspects of the data.
+
+More detailed information about the remodeling operations can be found in
+the [**HED remodeling tools**](hed-remodeling-tools-anchor) guide.
+
+(the-remodeling-process-anchor)=
+## The remodeling process
+
+Remodeling consists of applying a list of operations to a tabular file
+to restructure or modify the file in some way.
+The following diagram shows a schematic of the remodeling process.
+
+![Event remodeling process](./_static/images/EventRemappingProcess.png)
+
+Initially, the user creates a backup of the selected files.
+This backup process is performed only once, and the results are
+stored in the `derivatives/remodel/backups` subdirectory of the dataset.
+
+Restructuring applies a sequence of remodeling operations given in a JSON remodeling
+file to produce a final result.
+By convention, these remodeling instruction files have names ending in `_rmdl.json`
+and are stored in the `derivatives/remodel/remodeling_files` directory relative
+to the dataset root directory.
+
+The restructuring always proceeds by looking up each data file in the backup
+and applying the transformations to the backup before overwriting the non-backed-up version.
+
+The remodeling file provides a record of the operations performed on the file,
+starting with the original file.
+If the user detects a mistake in the transformation instructions,
+he/she can correct the remodeling JSON file and rerun.
+
+Usually, users will use the default backup, run the backup request once, and
+work from the original backup.
+However, users may also elect to create a named backup, use the backup
+as a checkpoint mechanism, and develop scripts that use the check-pointed versions as the starting point.
+This is useful if different versions of the event files are needed for different purposes.
+
+
+(json-remodeling-files-anchor)=
+## JSON remodeling files
+
+The operations to restructure a tabular file are stored in a remodel file in JSON format.
+The file consists of a list of JSON dictionaries.
+
+(basic-remodel-operation-syntax-anchor)=
+### Basic remodel operation syntax
+
+Each dictionary specifies an operation, a description of the purpose, and the operation parameters.
+The basic syntax of a remodeler operation is illustrated in the following example, which renames
+the *trial_type* column to *event_type*.
+
+
+````{admonition} Example of a remodeler operation.
+:class: tip
+
+```json
+{
+    "operation": "rename_columns",
+    "description": "Rename a trial type column to more specific event_type",
+    "parameters": {
+        "column_mapping": {
+            "trial_type": "event_type"
+        },
+        "ignore_missing": true
+    }
+}
+```
+````
+
+Each remodel operation has its own specific set of required parameters
+that can be found under [**HED remodeling tools**](./HedRemodelingTools).
+For *rename_columns*, the required parameters are *column_mapping* and *ignore_missing*.
+Some operations also have optional parameters.
+
+(applying-multiple-remodel-operations-anchor)=
+### Applying multiple remodel operations
+
+A remodel JSON file consists of a list of one or more remodel operations,
+each specified in a dictionary.
+These operations are performed by the remodeler in the order they appear in the file.
+In the example below, a summary is performed after renaming,
+so the result reflects the new column names.
+
+````{admonition} An example JSON remodeler file with multiple operations.
+:class: tip
+
+```json
+[
+    {
+        "operation": "rename_columns",
+        "description": "Rename a trial type column to more specific event_type.",
+        "parameters": {
+            "column_mapping": {
+                "trial_type": "event_type"
+            },
+            "ignore_missing": true
+        }
+    },
+    {
+        "operation": "summarize_column_names",
+        "description": "Get column names across files to find any missing columns.",
+        "parameters": {
+            "summary_name": "Columns after remodeling",
+            "summary_filename": "columns_after_remodel"
+        }
+    }
+]
+```
+````
+
+By stacking operations you can make several changes to a data file.
+Note that the operations are always applied to a copy of the original backed-up file,
+not to a previously remodeled `.tsv`.
+
+(more-complex-remodeling-anchor)=
+### More complex remodeling
+
+This section discusses a complex example using the
+[**sub-0013_task-stopsignal_acq-seq_events.tsv**](./_static/data/sub-0013_task-stopsignal_acq-seq_events.tsv)
+events file of the AOMIC-PIOP2 dataset available on [OpenNeuro](https://openneuro.org) as ds002790.
+Here is an excerpt of the event file.
+
+
+(sample-remodeling-events-file-anchor)=
+````{admonition} Excerpt from an event file from the stop-go task of AOMIC-PIOP2 (ds002790).
+| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex |
+| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- |
+| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female |
+| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female |
+| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female |
+| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female |
+| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male |
+| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male |
+````
+
+This event file corresponds to a stop-signal experiment.
+Participants were presented with faces and had to decide the sex of the face
+by pressing a button with the left or right hand.
+However, if a stop signal occurred before this selection, the participant was to refrain from responding.
+
+The structure of this file corresponds to the [**BIDS**](https://bids.neuroimaging.io/) format
+for event files.
+
+The first column, which must be called `onset`, represents the time in seconds,
+measured from the start of the recording, of the temporal marker represented by that row in the file.
+In this case that temporal marker represents the presentation of a face image.
+
+Notice that the *stop_signal_delay* and *response_time* columns contain information
+about additional events (when a trial stop signal was presented and when the participant
+pushed a button).
+These events are encoded implicitly as offsets from the presentation of the go signal.
+Each row in the file encodes information for an entire trial rather than what occurred
+at a single temporal marker.
+This strategy is known as *trial-level* encoding.
+
+Our goal is to represent all the trial events (e.g., go signal, stop signal, and response)
+in separate rows of the event file using the *split_rows* restructuring operation.
+The following example shows the remodeling operations to perform the splitting.
+
+````{admonition} Example of split_rows operation for the AOMIC stop signal task.
+:class: tip
+
+```json
+[
+    {
+        "operation": "split_rows",
+        "description": "Split response event from trial event based on response_time column.",
+        "parameters": {
+            "anchor_column": "trial_type",
+            "new_events": {
+                "response": {
+                    "onset_source": ["response_time"],
+                    "duration": [0],
+                    "copy_columns": ["response_accuracy", "response_hand"]
+                },
+                "stop_signal": {
+                    "onset_source": ["stop_signal_delay"],
+                    "duration": [0.5]
+                }
+            },
+            "remove_parent_row": false
+        }
+    }
+]
+```
+````
+
+The example uses the *split_rows* operation to convert this
+file from trial encoding to event encoding.
+In trial encoding, each event marker (row in the event file) represents
+all the information in a single trial,
+and event markers such as the participant's response key-press are encoded implicitly
+as offsets from the stimulus presentation.
+In contrast, event encoding includes event markers for each individual event within the trial.
+
+The [**Split rows**](./HedRemodelingTools#split-rows)
+explanation under [**HED remodeling tools**](./HedRemodelingTools)
+shows the required parameters for the *split_rows* operation.
+The required parameters are *anchor_column*, *new_events*, and *remove_parent_row*.
+
+The *anchor_column* is the column in which we want to add new events corresponding to the stop signal and the response.
+In this case we are going to add events to an existing column: *trial_type*.
+The new events will be in new rows, and the existing rows will not be overwritten
+because *remove_parent_row* is false.
+(After splitting we may want to rename *trial_type* to *event_type* since the individual
+rows in the data file no longer represent trials, but individual events within the trial.)
+
+Next we specify how the new events are generated in the *new_events* dictionary.
+Each type of new event has a name, which is a key in the *new_events* dictionary.
+Each key is associated with a dictionary
+specifying the values of the following parameters.
+
+* *onset_source*
+* *duration*
+* *copy_columns*
+
+The *onset_source* is a list indicating how to calculate the onset for the new event
+relative to the onset of the anchor event.
+The list contains any combination of column names and numerical values,
+which are evaluated and added to the onset value of the row being split.
+Column names evaluate to the row values in the corresponding columns.
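+
+For instance, consider the second row of the sample event file above.
+The following minimal Python sketch of the computation (illustrative only, not the
+remodeler's actual code) shows how the new onsets are obtained:
+
+````{admonition} Sketch of how new event onsets are computed for one row.
+:class: tip
+
+```python
+# Values taken from the second row of the sample AOMIC event file.
+onset = 5.5774               # onset of the anchor (trial) row
+stop_signal_delay = 0.2      # value in the stop_signal_delay column
+response_time = 0.49         # value in the response_time column
+
+# Each onset_source entry is evaluated and added to the anchor onset.
+stop_signal_onset = onset + stop_signal_delay   # 5.7774
+response_onset = onset + response_time          # 6.0674
+print(stop_signal_onset, response_onset)
+```
+````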
+
+In our example, the response time and stop signal delay are measured relative to the trial's onset,
+so we only need to add the value from the respective column.
+Note that these new events do not exist for every trial.
+Rows where there was no stop signal have an `n/a` in the *stop_signal_delay* column.
+This is handled automatically: the remodeler does not create a new event
+when any item in the *onset_source* list is missing or `n/a`.
+
+The *duration* specifies the duration for the new events.
+The AOMIC data did not measure the durations of the button presses,
+so we set the duration of the response event to 0.
+The AOMIC data report indicates that the stop signal lasted 500 ms.
+
+The *copy_columns* is an optional parameter indicating which columns from the parent event should be copied to the
+newly-created event.
+We would like to transfer the *response_accuracy* and the *response_hand* information to the *response* event.
+Since no extra column values are to be transferred for *stop_signal*, columns other than *onset*, *duration*,
+and *trial_type* are filled with `n/a`.
+
+
+The final remodeling file can be found at
+[**AOMIC_splitevents_rmdl.json**](./_static/data/AOMIC_splitevents_rmdl.json).
+
+(remodeling-file-locations-anchor)=
+### Remodeling file locations
+
+The remodeling tools expect the full path of the JSON remodeling operation file to be given
+when the remodeling is executed.
+However, it is good practice to include all remodeling files used with the dataset.
+The JSON remodeling operation files are usually located in the
+`derivatives/remodel/remodeling_files` subdirectory below the dataset root,
+and have file names ending in `_rmdl.json`.
+
+The backups are always in the `derivatives/remodel/backups` subdirectory under the dataset root.
+Summaries produced by the restructuring tools are located in `derivatives/remodel/summaries`.
+
+In the next section we will go over several ways to call the remodeler.
+
+(using-the-remodeling-tools-anchor)=
+## Using the remodeling tools
+
+The remodeler can be called in a number of ways, including from online tools and from the command line.
+The following sections explain various ways to use the available tools.
+
+(online-tools-for-debugging-anchor)=
+### Online tools for debugging
+
+Although the event restructuring tools are designed to be run on an entire dataset,
+you should consider working with a single data file during debugging.
+The HED online tools provide support for debugging your remodeling script and for
+seeing the effect of remodeling on a single data file before running on the entire dataset.
+You can access these tools on the [**HED online tools server**](https://hedtools.ucsd.edu/hed).
+
+To use the online remodeling tools, navigate to the events page and select the *Execute remodel script* action.
+Browse to select the data file to be remodeled and the JSON remodel file
+containing the remodeling operations.
+The following screenshot shows these selections for the split rows example of the previous section.
+
+![Remodeling tools online](./_static/images/RemodelingOnline.png)
+
+Press the *Process* button to complete the action.
+If the remodeling script has errors,
+the result will be a downloaded text file with the errors identified.
+If the remodeling script is correct,
+the result will be a data file with the remodeling transformations applied.
+If the remodeling script contains summarization operations,
+the result will be a zip file with the modified data file and the summaries included.
+
+If you are using one of the remodeling operations that relies on HED tags
+and you turn the *Include summaries* option on, you will
+also need to upload a suitable JSON sidecar file containing the HED annotations for the data file.
+
+(the-command-line-interface-anchor)=
+### The command-line interface
+
+After [**installing the remodeler**](./HedRemodelingTools#installing-the-remodel-tools),
+you can run the tools on a full BIDS dataset,
+or on any directory, using the command-line programs
+`run_remodel_backup`, `run_remodel`, and `run_remodel_restore`.
+A full overview of all arguments is available at
+[**HED remodeling tools**](./HedRemodelingTools#remodel-command-line-arguments).
+
+The `run_remodel_backup` is usually run only once for a dataset.
+It makes the baseline backup of the event files to ensure that nothing will be lost.
+The remodeling always starts from the backup files.
+
+The `run_remodel` restores the data files from the corresponding backup files and then
+executes remodeling operations from a JSON file.
+A sample command line call for `run_remodel` is shown in the following example.
+
+(remodel-run-anchor)=
+````{admonition} Command to run a summary for the AOMIC dataset.
+:class: tip
+
+```bash
+python run_remodel /data/ds002790 /data/ds002790/derivatives/remodel/remodeling_files/AOMIC_summarize_rmdl.json \
+    -b -s .txt -x derivatives
+
+```
+````
+
+The parameters are as follows:
+
+* `data_dir` - (Required first argument) Root directory of the dataset.
+* `model_path` - (Required second argument) Path of the JSON file with remodeling operations.
+* `-b` - (Optional) If present, assume BIDS-formatted data.
+* `-s` - (Optional) List of formats in which to save summaries.
+* `-x` - (Optional) List of directories to exclude from event processing.
+
+There are three types of command line arguments:
+
+[**Positional arguments**](./HedRemodelingTools#positional-arguments),
+[**Named arguments**](./HedRemodelingTools#named-arguments),
+and [**Named arguments with values**](./HedRemodelingTools#named-arguments).
+
+The positional arguments, `data_dir` and `model_path`, are not optional and must
+be the first and second arguments to `run_remodel`, respectively.
+The named arguments (with and without values) are optional.
+They all have default values if omitted.
+
+The `-b` option is a named argument indicating whether the dataset is in BIDS format.
+If in BIDS format, the remodeling tools can extract information such as the HED schema
+and the HED annotations from the dataset.
+BIDS data file names are unique, which is convenient for reporting summary information.
+Named arguments without values are flags: their presence indicates true and their absence indicates false.
+
+The `-s` and `-x` options are examples of named arguments with values.
+The `-s .txt` specifies that summaries should be saved in text format.
+The `-x derivatives` indicates that the `derivatives` subdirectory should not
+be processed during remodeling.
+
+This script can be run multiple times without doing backups and restores,
+since it always starts with the backed-up files.
+
+After `run_remodel` finishes, it overwrites the data files (not the backups)
+and writes any requested summaries in `derivatives/remodel/summaries`.
+
+The summaries will be written to the `/data/ds002790/derivatives/remodel/summaries` folder in text format.
+By default, the summary operations return summaries in both text and JSON formats.
+
+The [**summary file**](./_static/data/AOMIC_column_names_2022_12_21_T_13_12_35_641062.txt) lists all different column combinations and, for each combination, the files with those columns.
+Looking at the different column combinations, you can see there are three: one for each task performed in this dataset.
+
+Going back to the [**split rows example**](#more-complex-remodeling) of remodeling,
+we see that splitting the rows into multiple rows only makes sense if the event files have the same columns.
+Only the event files for the stop signal task contain the `stop_signal_delay` column and the `response_time` column.
+Summarizing the column names allows users to check whether the column
+names are consistent across the dataset.
+A common use case for BIDS datasets is that the event files have a different structure
+for different tasks.
+The `-t` command-line option allows users to restrict remodeling to the files
+that have the specified task names in their filenames.
+
+
+Now you can try out *split_rows* on the full dataset!
+
+
+(jupyter-notebooks-for-remodeling-anchor)=
+### Jupyter notebooks for remodeling
+
+Three Jupyter remodeling notebooks are available at
+[**Jupyter notebooks for remodeling**](https://github.com/hed-standard/hed-examples/tree/main/src/jupyter_notebooks/remodeling).
+
+These notebooks are wrappers that create the backup as well as run restructuring operations on data files.
+If you do not have access to a Jupyter notebook facility, the article
+[Six easy ways to run your Jupyter Notebook in the cloud](https://www.dataschool.io/cloud-services-for-jupyter-notebook/) discusses various no-cost options
+for running Jupyter notebooks online.
diff --git a/docs/source/FileRemodelingTools.md b/docs/source/HedRemodelingTools.md
similarity index 97%
rename from docs/source/FileRemodelingTools.md
rename to docs/source/HedRemodelingTools.md
index 65216bf..ce3decb 100644
--- a/docs/source/FileRemodelingTools.md
+++ b/docs/source/HedRemodelingTools.md
@@ -1,2796 +1,2796 @@
-(file-remodeling-tools-anchor)=
-# File remodeling tools
-
-**Remodeling** refers to the process of transforming a tabular file
-into a different form in order to disambiguate the
-information or to facilitate a particular analysis.
-The remodeling operations are specified in a JSON (`.json`) file,
-giving a record of the transformations performed.
-
-There are two types of remodeling operations: **transformation** and **summarization**.
-The **transformation** operations modify the tabular files,
-while **summarization** produces an auxiliary information file but leaves
-the tabular files unchanged.
-
-The file remodeling tools can be applied to any tab-separated value (`.tsv`) file
-but are particularly useful for restructuring files representing experimental events.
-Please read the [**File remodeling quickstart**](./FileRemodelingQuickstart.md)
-tutorial for an introduction and basic use of the tools.
- -The file remodeling tools can be applied to individual files using the -[**HED online tools**](https://hedtools.ucsd.edu/hed) or to entire datasets -using the [**remodel command-line interface**](remodel-command-line-interface-anchor) -either by calling Python scripts directly from the command line -or by embedding calls in a Jupyter notebook. -The tools are also available as -[**HED RESTful services**](./HedOnlineTools.md#hed-restful-services). -The online tools are particularly useful for debugging. - -This user's guide contains the following topics: - -* [**Overview of remodeling**](overview-of-remodeling-anchor) -* [**Installing the remodel tools**](installing-the-remodel-tools-anchor) -* [**Remodel command-line interface**](remodel-command-line-interface-anchor) -* [**Remodel scripts**](remodel-scripts-anchor) - * [**Backing up files**](backing-up-files-anchor) - * [**Remodeling files**](remodeling-files-anchor) - * [**Restoring files**](restoring-files-anchor) -* [**Remodel with HED**](remodel-with-hed-anchor) -* [**Remodel sample files**](remodel-sample-files-anchor) - * [**Sample remodel file**](sample-remodel-file-anchor) - * [**Sample remodel event file**](sample-remodel-event-file-anchor) - * [**Sample remodel sidecar file**](sample-remodel-sidecar-file-anchor) -* [**Remodel transformations**](remodel-transformations-anchor) - * [**Factor column**](factor-column-anchor) - * [**Factor HED tags**](factor-hed-tags-anchor) - * [**Factor HED type**](factor-hed-type-anchor) - * [**Merge consecutive**](merge-consecutive-anchor) - * [**Remap columns**](remap-columns-anchor) - * [**Remove columns**](remove-columns-anchor) - * [**Remove rows**](remove-rows-anchor) - * [**Rename columns**](rename-columns-anchor) - * [**Reorder columns**](reorder-columns-anchor) - * [**Split rows**](split-rows-anchor) -* [**Remodel summarizations**](remodel-summarizations-anchor) - * [**Summarize column names**](summarize-column-names-anchor) - * [**Summarize column values**](summarize-column-values-anchor) - * [**Summarize definitions**](summarize-definitions-anchor) - * [**Summarize hed tags**](summarize-hed-tags-anchor) - * [**Summarize hed type**](summarize-hed-type-anchor) - * [**Summarize hed validation**](summarize-hed-validation-anchor) - * [**Summarize sidecar from events**](summarize-sidecar-from-events-anchor) -* [**Remodel implementation**](remodel-implementation-anchor) - - -(overview-of-remodeling-anchor)= -## Overview of remodeling - -Remodeling consists of restructuring and/or extracting information from tab-separated -value files based on a specified list of operations contained in a JSON file. - -Internally, the remodeling operations represent the tabular file using a -[**Pandas DataFrame**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html). - -(transformation-operations-anchor)= -### Transformation operations - -**Transformation** operations, shown schematically in the following -figure, are designed to transform an incoming tabular file -into a new DataFrame without modifying the incoming data. - -![Transformation operations](./_static/images/TransformationOperations.png) - -Transformation operations are stateless and do not save any context information or -affect future applications of the transformation. - -Transformations, themselves, do not have any output and just return a new, -transformed DataFrame. -In other words, transformations do not operate in place on the incoming DataFrame, -but rather, they create a new DataFrame containing the result. 
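-
-The following minimal sketch illustrates this pattern (illustrative only, using a
-hypothetical `remove_columns`-style transform, not the remodeler's actual implementation):
-
-````{admonition} Sketch of the transformation pattern.
-:class: tip
-
-```python
-import pandas as pd
-
-# A transformation returns a new DataFrame and leaves its input unchanged.
-def remove_columns(df, remove_names):
-    # errors="ignore" skips any listed names that are not present.
-    return df.drop(columns=remove_names, errors="ignore")
-
-events = pd.DataFrame({"onset": [0.0776], "duration": [0.5083], "sample": [5]})
-new_events = remove_columns(events, ["value", "sample"])
-print(list(events.columns))      # ['onset', 'duration', 'sample'] -- unchanged
-print(list(new_events.columns))  # ['onset', 'duration']
-```
-````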
-
-Typically, the calling program is responsible for reading and saving the tabular file,
-so the user can choose whether to overwrite or create a new file.
-
-See the [**remodeling tool program interface**](remodel-command-line-interface-anchor)
-section for information on how to call the operations.
-
-(summarization-operations-anchor)=
-### Summarization operations
-
-**Summarization** operations do not modify the input DataFrame but rather extract and save information in an internally stored summary dictionary as shown schematically in the following figure.
-
-![Summary operations](./_static/images/SummaryOperation.png)
-
-The dispatcher that executes remodeling operations can be interrogated at any time
-for the state information contained in the global summary dictionary, and additional summary information can be saved at any time during execution.
-Usually, summaries are dumped at the end of processing to the `derivatives/remodel/summaries`
-subdirectory under the dataset root.
-
-Summarization operations may appear anywhere in the operation list,
-and the same type of summary may appear multiple times under different names in order to track progress.
-
-The dispatcher stores information from each uniquely named summarization operation
-as a separate summary dictionary entry.
-Within its summary information, most summarization operations keep a separate
-summary for each individual file and have methods to create an overall summary
-of the information for all the files that have been processed by the summarization.
-
-Summarization results are available in JSON (`.json`) and text (`.txt`) formats.
-
-(available-operations-anchor)=
-### Available operations
-
-The following table lists the available remodeling operations with brief example use cases
-and links to further documentation. Operations not listed in the summarize section are transformations.
-
-(remodel-operation-summary-anchor)=
-````{table} Summary of the HED remodeling operations for tabular files.
-| Category | Operation | Example use case |
-| -------- | ------- | -----|
-| **clean-up** | | |
-| | [*remove_columns*](remove-columns-anchor) | Remove temporary columns created during restructuring. |
-| | [*remove_rows*](remove-rows-anchor) | Remove rows with n/a values in a specified column. |
-| | [*rename_columns*](rename-columns-anchor) | Make column names consistent across a dataset. |
-| | [*reorder_columns*](reorder-columns-anchor) | Make column order consistent across a dataset. |
-| **factor** | | |
-| | [*factor_column*](factor-column-anchor) | Extract factor vectors from a column of condition variables. |
-| | [*factor_hed_tags*](factor-hed-tags-anchor) | Extract factor vectors from search queries of HED annotations. |
-| | [*factor_hed_type*](factor-hed-type-anchor) | Extract design matrices and/or condition variables. |
-| **restructure** | | |
-| | [*merge_consecutive*](merge-consecutive-anchor) | Replace multiple consecutive events of the same type<br/>
with one event of longer duration. | -| | [*remap_columns*](remap-columns-anchor) | Create m columns from values in n columns (for recoding). | -| | [*split_rows*](split-rows-anchor) | Split trial-encoded rows into multiple events. | -| **summarize** | | | -| | [*summarize_column_names*](summarize-column-names-anchor) | Summarize column names and order in the files. | -| | [*summarize_column_values*](summarize-column-values-anchor) | Count the occurrences of the unique column values. | -| | [*summarize_hed_tags*](summarize-hed-tags-anchor) | Summarize the HED tags present in the
HED annotations for the dataset. | -| | [*summarize_hed_type*](summarize-hed-type-anchor) | Summarize the detailed usage of a particular type tag
such as *Condition-variable* or *Task*
(used to automatically extract experimental designs). | -| | [*summarize_hed_validation*](summarize-hed-validation-anchor) | Validate the data files and report any errors. | -| | [*summarize_sidecar_from_events*](summarize-sidecar-from-events-anchor) | Generate a sidecar template from an event file. | -```` - -The **clean-up** operations are used at various phases of restructuring to assure consistency -across dataset files. - -The **factor** operations produce column vectors with the same number of rows as the data file -from which they were calculated. -They encode condition variables, design matrices, or other search criteria. -See the [**HED conditions and design matrices**](./HedConditionsAndDesignMatrices.md) -for more information on factoring and analysis. - -The **restructure** operations modify the way in which a data file represents its information. - -The **summarize** operations produce dataset-wide summaries of various aspects of the data files -as well as summaries of the individual files. - -(installing-the-remodel-tools-anchor)= -## Installing the remodel tools - -The remodeling tools are available in the GitHub -[**hed-python**](https://github.com/hed-standard/hed-python) repository -along with other tools for data cleaning and curation. -Although version 0.1.0 of this repository is available on [**PyPI**](https://pypi.org/) -as `hedtools`, the version containing the restructuring tools (Version 0.2.0) -is still under development and has not been officially released. -However, the code is publicly available on the `develop` branch of the -hed-python repository and -can be directly installed from GitHub using `pip`: - -```text -pip install git+https://github.com/hed-standard/hed-python/@develop -``` - -The web services and online tools supporting remodeling are available -on the [**HED online tools dev server**](https://hedtools.ucsd.edu/hed_dev). -When version 0.2.0 of `hedtools` is officially released on PyPI, restructuring -will become available on the released [**HED online tools**](https://hedtools.ucsd.edu/hed). -A docker version is also under development. - -The following diagram shows a schematic of the remodeling process. - -![Event remodeling process](./_static/images/EventRemappingProcess.png) - -Initially, the user creates a backup of the specified tabular files (usually `events.tsv` files). -This backup is a mirror of the data files in the dataset, -but is located in the `derivatives/remodel/backups` directory and never modified once the backup is created. - -Remodeling applies a sequence of operations specified in a JSON remodel file -to the backup versions of the data files. -The JSON remodel file provides a record of the operations performed on the file. -If the user detects a mistake in the transformations, -he/she can correct the transformation file and rerun the transformations. - -Remodeling always runs on the original backup version of the file rather than -the transformed version, so the transformations can always be corrected and rerun. -It is possible to by-pass the backup, particularly if only using summarization operations, -but this is not recommended and should be done with care. - -(remodel-command-line-interface-anchor)= -## Remodel command-line interface - -The remodeling toolbox provides Python scripts with command-line interfaces -to create or restore backups and to apply operations to the files in a dataset. 
-
-The file remodeling tools may be applied to datasets that are in free form under a directory root
-or that are in [**BIDS-format**](https://bids.neuroimaging.io/).
-Some operations use [**HED (Hierarchical Event Descriptors)**](./IntroductionToHed.md) annotations.
-See the [**Remodel with HED**](remodel-with-hed-anchor) section for a discussion
-of these operations and how to use them.
-
-The remodeling command-line interface can be used from the command line,
-called from another Python program, or used in a Jupyter notebook.
-Example Jupyter notebooks using the remodeling commands can be found
-[**here**](https://github.com/hed-standard/hed-examples/tree/main/src/jupyter_notebooks/remodeling).
-
-
-(calling-remodel-tools-anchor)=
-### Calling remodel tools
-
-The remodeling tools provide three Python programs for backup (`run_remodel_backup`),
-remodeling (`run_remodel`), and restoring (`run_remodel_restore`) event files.
-These programs can be called from the command line or from another Python program.
-
-The programs use a standard command-line argument list for specifying input as summarized in the following table.
-
-(remodeling-operation-summary-anchor)=
-````{table} Summary of command-line arguments for the remodeling programs.
-| Script name | Arguments | Purpose |
-| ----------- | -------- | ------- |
-|*run_remodel_backup* | *data_dir*<br/>
*-bd -\\-backup-dir*
*-bn -\\-backup-name*
*-e -\\-extensions*
*-f -\\-file-suffix*
*-t -\\-task-names*
*-v -\\-verbose*
*-x -\\-exclude-dirs*| Create a backup of event files. |
-|*run_remodel* | *data_dir*<br/>
*model_path*
*-b -\\-bids-format*
*-bd -\\-backup-dir*
*-bn -\\-backup-name*
*-e -\\-extensions*
*-f -\\-file-suffix*
*-i -\\-individual-summaries*
*-j -\\-json-sidecar*
*-ld -\\-log-dir*
*-nb -\\-no-backup*
*-ns -\\-no-summaries*
*-nu -\\-no-update*
*-r -\\-hed-version*
*-s -\\-save-formats*
*-t -\\-task-names*
*-v -\\-verbose*
*-w -\\-work-dir*
*-x -\\-exclude-dirs* | Restructure or summarize the event files. | -|*run_remodel_restore* | *data_dir*
*-bd -\\-backup-dir*
*-bn -\\-backup-name*
*-t -\\-task-names*
*-v -\\-verbose*
| Restore a backup of event files. |
-
-````
-All the scripts have a required argument, which is the full path of the dataset root (*data_dir*).
-The `run_remodel` program has a second required argument, which is the full path of a JSON file
-containing a specification of the remodeling commands to be run.
-
-(remodel-command-line-arguments-anchor)=
-### Remodel command-line arguments
-
-This section describes the arguments that are used for the remodeling command-line interface
-with examples and more details.
-
-#### Positional arguments
-
-Positional arguments are required and must be given in the order specified.
-
-`data_dir`
-> The full path of the dataset root directory.
-
-`model_path`
-> The full path of the JSON remodel file (for *run_remodel* only).
-
-#### Named arguments
-
-Named arguments consist of a key starting with a hyphen and are possibly followed by a value.
-Named arguments can be given in any order or omitted.
-If omitted, a specified default is used.
-Argument keys and values are separated by spaces.
-
-For argument values that are lists, the key is given followed by the items in the list,
-all separated by spaces.
-
-Each command has two different forms of the key name: a short form (a single hyphen followed by a single character)
-and a longer form (two hyphens followed by a more self-explanatory name).
-Users are free to use either form.
-
-`-b`, `--bids-format`
-> If this flag is present, the dataset is in BIDS format with sidecars. Tabular files and their associated sidecars are located using BIDS naming.
-
-`-bd`, `--backup-dir`
-> The path to the directory holding the backups (default: `[data_root]/derivatives/remodel/backups`).
-> Use the `-nb` option if you wish to omit the backup (in `run_remodel`).
-
-`-bn`, `--backup-name`
-> The name of the backup used for the remodeling (default: `default_back`).
-
-`-e`, `--extensions`
-> This option is followed by a list of file extension(s) of the data files to process.
-> The default is `.tsv`. Comma-separated tabular files are not permitted.
-
-`-f`, `--file-suffix`
-> This option is followed by the suffix names of the files to be processed.
-> For example `events` (the default) captures files named `events.tsv` if the default extension is used.
-> The filename without the extension must end in one of the specified suffixes in order to be
-> backed up or transformed.
-
-`-i`, `--individual-summaries`
-> This option offers a choice among three options:
-> - `separate`: Individual summaries for each file in separate files in addition to the overall summary.
-> - `consolidated`: Individual summaries written in the same file as the overall summary.
-> - `none`: Only an overall summary.
-
-`-j`, `--json-sidecar`
-> This option is followed by the full path of the JSON sidecar with HED annotations to be
-> applied during the processing of HED-related remodeling operations.
-
-`-ld`, `--log-dir`
-> This option is followed by the full path of a directory for writing log files.
-> A log file is written if the remodeling tools raise an exception and the program terminates.
-> Note that a log file is not written for issues gathered during operations such as `summarize_hed_validation`
-> because reporting HED validation errors is a normal part of this operation.
-> On the other hand, errors in the JSON remodeling file do raise an exception and are reported in the log.
-
-`-nb`, `--no-backup`
-> If present, no backup is used. Rather, operations are performed directly on the files.
-
-`-ns`, `--no-summaries`
-> If present, no summary files are output.
-
-`-nu`, `--no-update`
-> If present, the modified files are not output.
-
-`-r`, `--hed-versions`
-> This option is followed by one or more HED versions. Versions of the standard schema are specified
-> by their semantic versions (e.g., `8.1.0`), while library schema versions are prefixed by their
-> library name (e.g., `score_1.0.0`).
-
-> If more than one HED schema version is given, all but one of the versions must start with an
-> additional namespace designator (e.g., `sc:`). At most one version can omit the namespace designator
-> when multiple schemas are being used. In annotations, tags must start with the namespace
-> designator of the corresponding schema from which they were selected (e.g., `sc:Sleep-modulator`
-> if the SCORE library was designated by `sc:score_1.0.0`).
-
-`-s`, `--save-formats`
-> This option is followed by the extensions (including the dot) of the formats in which
-> to save summaries (default: `.txt` `.json`).
-
-`-t`, `--task-names`
-> The name(s) of the tasks to be included (for BIDS-formatted files only).
-> When a dataset includes multiple tasks, the event files are often structured
-> differently for each task and thus require different transformation files.
-> This option allows the backups and operations to be restricted to an individual task.
-
-> If this option is omitted, all tasks are used. This means that all `events.tsv` files are
-> restored from a backup if the backup is used, the operations are performed on all `events.tsv` files, and summaries are combined over all tasks.
-
-> If a list of specific task names follows this option, only datafiles corresponding to
-> the listed tasks are processed, giving separate summaries for each listed task.
-
-> If a "*" follows this option, all event files are processed and separate summaries are created for each task.
-
-> Task detection follows the BIDS convention. Tasks are detected by finding "task-x" in the file names of `events.tsv` files. Here x is the name of the task. The task name is followed by an underbar or a period, or appears at the end of the filename.
-
-`-v`, `--verbose`
-> If present, more comprehensive messages documenting transformation progress
-> are printed to standard output.
-
-`-w`, `--work-dir`
-> The path of the remodeling work root directory, which holds both the backups and the summaries
-> (default: `[data_root]/derivatives/remodel`).
-
-`-x`, `--exclude-dirs`
-> The directories to exclude when gathering the data files to process.
-> For BIDS datasets, these are typically `derivatives`, `stimuli`, and `sourcecode`.
-> Any subdirectory with a path component named `remodel` is automatically excluded from remodeling, as
-> these directories are reserved for storing backup, state, and result information for the remodeling process itself.
-
-(remodel-scripts-anchor)=
-## Remodel scripts
-
-This section discusses the three main remodeling scripts with command-line interfaces
-to support backing up, remodeling, and restoring the tabular files used in the remodeling process.
-These scripts can be run from the command line or from another Python program using a function call.
-
-(backing-up-files-anchor)=
-### Backing up files
-
-The `run_remodel_backup` Python program creates a backup of the specified files.
-
-The backup is always created in the `derivatives/remodel/backups` subdirectory
-under the dataset root as shown in the following example for the
-sample dataset `eeg_ds003645s_hed_remodel`,
-which can be found in the `datasets` subdirectory of the
-[**hed-examples**](https://github.com/hed-standard/hed-examples) GitHub repository.
-
-![Remodeling backup structure](./_static/images/RemodelingBackupStructure.png)
-
-
-The backup process creates a mirror of the directory structure of the source files to be backed up
-in the directory `derivatives/remodel/backups/backup_name/backup_root` as shown in the figure above.
-The default backup name is `default_back`.
-
-In the above example, the backup has subdirectories `sub-002` and `sub-003` just
-like the main directory of the dataset.
-These subdirectories only contain backups of the files to be transformed
-(by default files with names ending in `events.tsv`).
-
-In addition to the `backup_root`, the backup directory also contains a dictionary of backup files
-in the `backup_lock.json` file. This dictionary is used internally by the remodeling tools.
-The backup should be created once and not modified by the user.
-
-The following example shows how to run the `run_remodel_backup` program from the command line
-to back up the dataset located at `/datasets/eeg_ds003645s_hed_remodel`.
-
-(remodel-backup-anchor)=
-````{admonition} Example of calling run_remodel_backup from the command line.
-:class: tip
-
-```bash
-python run_remodel_backup /datasets/eeg_ds003645s_hed_remodel -x derivatives stimuli
-
-```
-````
-
-Since the `-f` and `-e` arguments are not given, the default file suffix and extension values
-apply, so only files of the form `events.tsv` are backed up.
-The `-x` option excludes any source files from the `derivatives` and `stimuli` subdirectories.
-These choices can be overridden using additional command-line arguments.
-
-The following shows how the `run_remodel_backup` program can be called from a
-Python program or a Jupyter notebook.
-The command-line arguments are given in a list instead of on the command line.
-
-(remodel-backup-jupyter-anchor)=
-````{admonition} Example of Python code to call run_remodel_backup using a function call.
-:class: tip
-
-```python
-
-import hed.tools.remodeling.cli.run_remodel_backup as cli_backup
-
-data_root = '/datasets/eeg_ds003645s_hed_remodel'
-arg_list = [data_root, '-x', 'derivatives', 'stimuli']
-cli_backup.main(arg_list)
-
-```
-````
-
-During remodeling, each file in the source is associated with a backup file using
-its relative path from the dataset root.
-Remodeling is performed by reading the backup file, performing the operations specified in the
-JSON remodel file, and overwriting the source file as needed.
-
-Users can also create alternatively named backups by providing the `-bn` argument with a backup name to
-the `run_remodel_backup` program.
-To use backup files from another named backup, call the remodeling program with
-the `-bn` argument and the correct backup name (see the example at the end of this section).
-Named backups can provide checkpoints to allow the execution of
-transformations to start from intermediate points.
-
-**NOTE**: You should not delete backups, even if you have created multiple named backups.
-The backups provide useful state and provenance information about the data.
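-
-For example, the following call (a sketch using the same sample dataset and the `-bn`
-option described above) creates a named backup that can later be selected by passing
-the matching `-bn` value to `run_remodel`:
-
-````{admonition} Example of creating a named backup from the command line.
-:class: tip
-
-```bash
-python run_remodel_backup /datasets/eeg_ds003645s_hed_remodel -bn checkpoint_1 -x derivatives stimuli
-
-```
-````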
-
-(remodeling-files-anchor)=
-### Remodeling files
-
-Remodeling consists of applying a sequence of operations from the
-[**remodel operation summary**](remodel-operation-summary-anchor)
-to successively transform each backup file according to the instructions
-and to overwrite the actual files with the final result.
-
-If the dataset has no backups, the actual data files rather than the backups are transformed.
-You are expected to [**create the backup**](backing-up-files-anchor) (just once)
-before running the remodeling operations.
-Going without a backup is not recommended unless you are only doing summarization operations.
-
-The operations are specified as a list of dictionaries in a JSON file in the
-[**remodel sample files**](remodel-sample-files-anchor) as discussed below.
-
-Before running remodeling transformations on an entire dataset,
-consider using the [**HED online tools**](https://hedtools.ucsd.edu/hed)
-to debug your remodeling operation file on a single file.
-The remodeling process always starts with the original backup files,
-so the usual development path is to incrementally add operations to the end
-of your transformation JSON file as you develop and test on a single file
-until you have the desired end result.
-
-The following example shows how to run a remodeling script from the command line.
-The example assumes that the backup has already been created for the dataset.
-
-(run-remodel-anchor)=
-````{admonition} Example of calling run_remodel from the command line.
-:class: tip
-
-```bash
-python run_remodel /datasets/eeg_ds003645s_hed_remodel /datasets/remove_extra_rmdl.json -x derivatives stimuli
-
-```
-````
-
-The script has two required arguments: the dataset root and the path to the JSON remodel file.
-Usually, the JSON remodel files are stored with the dataset itself in the
-`derivatives/remodel/remodeling_files` subdirectory, but common scripts can be stored in a central place elsewhere.
-
-The additional keyword option `-x` in the example indicates that directory paths containing the component `derivatives` or the component `stimuli` should be excluded.
-Excluded directories need not have their excluded path component at the top level of the dataset.
-Subdirectory paths containing the `remodel` path component are automatically excluded.
-
-The command-line interface can also be used in a Jupyter notebook or as part of a larger Python
-program by calling the `main` function with the equivalent command-line arguments provided
-in a list with the positional arguments appearing first.
-
-The following example shows Python code to remodel a dataset using the command-line interface.
-This code can be used in a Jupyter notebook or in another Python program.
-
-````{admonition} Example Python code to call run_remodel using a function call.
-:class: tip
-
-```python
-import hed.tools.remodeling.cli.run_remodel as cli_remodel
-
-data_root = '/datasets/eeg_ds003645s_hed_remodel'
-model_path = '/datasets/remove_extra_rmdl.json'
-arg_list = [data_root, model_path, '-x', 'derivatives', 'stimuli']
-cli_remodel.main(arg_list)
-
-```
-````
-
-(restoring-files-anchor)=
-### Restoring files
-
-Since remodeling always uses the backed-up version of each data file,
-there is no need to restore these files to their original state
-between remodeling runs.
-However, when finished with an analysis,
-you may want to restore the data files to their original state.
-
-The following example shows how to call `run_remodel_restore` to
-restore the data files from the default backup.
-
-The restore operation restores all the files in the specified backup.
-
-(run-remodel-restore-anchor)=
-````{admonition} Example of calling run_remodel_restore from the command line.
-:class: tip
-
-```bash
-python run_remodel_restore /datasets/eeg_ds003645s_hed_remodel
-
-```
-````
-
-As with the other command-line programs, `run_remodel_restore` can also be called using a function call.
-
-````{admonition} Example Python code to call *run_remodel_restore* using a function call.
-:class: tip
-
-```python
-import hed.tools.remodeling.cli.run_remodel_restore as cli_remodel
-
-data_root = '/datasets/eeg_ds003645s_hed_remodel'
-cli_remodel.main([data_root])
-
-```
-````
-(remodel-with-hed-anchor)=
-## Remodel with HED
-
-[**HED**](introduction-to-hed-anchor) (Hierarchical Event Descriptors) is a
-system for annotating data in a manner that is both human-understandable and machine-actionable.
-HED provides much more detail about the events and their meanings.
-If you are new to HED, see the
-[**HED annotation quickstart**](./HedAnnotationQuickstart.md).
-For information about HED's integration into BIDS (Brain Imaging Data Structure) see
-[**BIDS annotation quickstart**](./BidsAnnotationQuickstart.md).
-
-Currently, five remodeling operations rely on HED annotations:
-- [**factor_hed_tags**](factor-hed-tags-anchor)
-- [**factor_hed_type**](factor-hed-type-anchor)
-- [**summarize_hed_tags**](summarize-hed-tags-anchor)
-- [**summarize_hed_type**](summarize-hed-type-anchor)
-- [**summarize_hed_validation**](summarize-hed-validation-anchor)
-
-HED tags provide a mechanism for advanced data analysis and for
-extracting experiment-specific information from the data files.
-However, since HED information is not always stored in the data files themselves,
-you may need to provide a HED schema and a JSON sidecar.
-
-The HED schema defines the allowed HED tag vocabulary, and the JSON sidecar
-associates HED annotations with the information in the columns of the event files.
-If you are not using any of the HED operations in your remodeling,
-you do not have to provide this information.
-
-
-(extracting-hed-information-from-bids-anchor)=
-### Extracting HED information from BIDS
-
-The simplest way to use HED with `run_remodel` is to use the `-b` option,
-which indicates that the dataset is in [**BIDS**](https://bids.neuroimaging.io/) (Brain Imaging Data Structure) format.
-
-BIDS is a standardized way of organizing neuroimaging data.
-HED and BIDS are well integrated.
-If you are new to BIDS, see the
-[**BIDS annotation quickstart**](./BidsAnnotationQuickstart.md).
-
-A HED-annotated BIDS dataset provides the HED schema version in the `dataset_description.json`
-file located directly under the BIDS dataset root.
-
-BIDS datasets must have filenames in a specific format,
-and the HED tools can locate the relevant JSON sidecars for each data file based on this information.
-
-
-(directly-specifying-hed-information-anchor)=
-### Directly specifying HED information
-
-If your data is already in BIDS format, using the `-b` option is ideal since
-the needed information can be located automatically.
-However, early in the experimental process,
-your datafiles are not likely to be organized in BIDS format,
-and this option will not be available if you want to use HED.
-
-Without the `-b` option, the remodeling tools locate the appropriate files based
-on specified filename suffixes and extensions.
-In order to use HED operations, you must explicitly specify the HED versions
-using the `-r` option.
-
-The `-r` option supports a list of HED versions if multiple HED schemas are used.
-For example: `-r 8.1.0 sc:score_1.0.0` specifies that vocabulary will be drawn
-from standard HED Version 8.1.0 and from
-HED SCORE library version 1.0.0.
-Annotations containing tags from SCORE should be prefixed with `sc:`.
-Note: both of the schemas can be viewed with the [**HED Schema Viewer**](https://www.hedtags.org/display_hed.html).
-
-Usually, annotators will consolidate HED annotations in a single JSON sidecar file
-located at the top level of the dataset.
-The path of this sidecar can be passed as a command-line argument using the `-j` option.
-If more than one JSON sidecar file contains HED annotations, users will need to call the lower-level
-remodeling functions to perform these operations.
-
-The following example illustrates a command-line call that passes both a HED schema version and
-the path to the JSON file with the HED annotations.
-
-(run-remodel-with-hed-direct-anchor)=
-````{admonition} Remodeling a non-BIDS dataset using HED.
-:class: tip
-
-```bash
-python run_remodel /datasets/eeg_ds003645s_hed_remodel /datasets/summarize_conditions_rmdl.json \
--x derivatives stimuli -r 8.1.0 -j /datasets/eeg_ds003645s_hed_remodel/task-FacePerception_events.json
-
-```
-````
-
-(remodel-with-hed-direct-python-anchor)=
-````{admonition} Example Python code to use run_remodel on a non-BIDS dataset.
-:class: tip
-
-```python
-import hed.tools.remodeling.cli.run_remodel as cli_remodel
-
-data_root = '/datasets/eeg_ds003645s_hed_remodel'
-model_path = '/datasets/summarize_conditions_rmdl.json'
-json_path = '/datasets/eeg_ds003645s_hed_remodel/task-FacePerception_events.json'
-arg_list = [data_root, model_path, '-x', 'derivatives', 'stimuli', '-r', '8.1.0', '-j', json_path]
-cli_remodel.main(arg_list)
-
-```
-````
-
-(remodel-error-handling-anchor)=
-## Remodel error handling
-
-Errors can occur at several stages during remodeling, and how they are
-handled depends on the type of error and where the error occurs.
-Except for the validation summary, the underlying remodeling code raises exceptions for most errors.
-
-
-(errors-in-the-remodel-file-anchor)=
-### Errors in the remodel file
-
-Each operation requires specific parameters to execute properly.
-The underlying implementation for each operation defines these parameters using a [**JSON schema**](https://json-schema.org/)
-as the `PARAMS` property of the operation's class definition.
-The use of the JSON schema allows the remodeler to specify and validate requirements on most of an
-operation's parameters using standardized methods.
-
-The [**remodeler_validator**](https://github.com/hed-standard/hed-python/blob/master/hed/tools/remodeling/remodeler_validator.py)
-compiles a JSON schema for the remodeler from the individual operations and validates
-the remodel file against the compiled JSON schema. The validator should always be run before executing any remodel operations.
-
-For example, the command line [**run_remodel**](https://raw.githubusercontent.com/hed-standard/hed-python/develop/hed/tools/remodeling/cli/run_remodel.py)
-program calls the validator before executing any operations.
-If there are errors, `run_remodel` reports the errors for all operations and exits.
-This allows users to correct errors in all operations in one pass without any data modification.
-The [**HED online tools**](https://hedtools.org/hed) are particularly useful for debugging
-the syntax and other issues in the remodeling process.
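-
-As an illustration (constructed from the parameter requirements described earlier, not
-taken from an actual dataset), the validator would flag the following remodel file
-because the required *ignore_missing* parameter of *rename_columns* is missing:
-
-````{admonition} Example of a remodel file that fails validation.
-:class: tip
-
-```json
-[
-    {
-        "operation": "rename_columns",
-        "description": "Invalid because the required ignore_missing parameter is omitted.",
-        "parameters": {
-            "column_mapping": {
-                "trial_type": "event_type"
-            }
-        }
-    }
-]
-```
-````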
- -(execution-time-remodel-errors-anchor)= -### Execution-time remodel errors - -When an error occurs during execution, an exception is raised. -Exceptions are raised for invalid or missing files or if a transformed file -is unable to be rewritten due to improper file permissions. -Each individual operation may also raise an exception if the -data file being processed does not have the expected information, -such as a column with a particular name. - -Exceptions raised during execution cause the process to be terminated and no -further files are processed. - - -(remodel-sample-files-anchor)= -## Remodel sample files - -All remodeling operations are specified in a standardized JSON remodel input file. -The following shows the contents of the JSON remodeling file `remove_extra_rmdl.json`, -which contains a single operation with instructions to remove the `value` and `sample` columns -from the data file if the columns exist. - -(sample-remodel-file-anchor)= -### Sample remodel file - -````{admonition} A sample JSON remodeling file with a single remove_columns transformation operation. -:class: tip - -```json -[ - { - "operation": "remove_columns", - "description": "Remove unwanted columns prior to analysis", - "parameters": { - "remove_names": ["value", "sample"] - } - } -] - -``` -```` - -Each operation is specified in a dictionary with three top-level keys: "operation", "description", -and "parameters". The value of the "operation" is the name of the operation. -The "description" value should include the reason this operation was needed, -not just a description of the operation itself. -Finally, the "parameters" value is a dictionary mapping parameter name to -parameter value. - -The parameters for each operation are listed in -[**Remodel transformations**](remodel-transformations-anchor) and -[**Remodel summarizations**](remodel-summarizations-anchor) sections. -An operation may have both required and optional parameters. -Optional parameters may be omitted if unneeded, but all parameters are specified in -the "parameters" section of the dictionary. -The full specification of the remodel file is also provided as a [**JSON schema**](https://json-schema.org/). - -The remodeling JSON files should have names ending in `_rmdl.json` to more easily -distinguish them from other JSON files. -Although these files can be stored anywhere, their preferred location is -in the `derivatives/remodel/models` subdirectory under the dataset root so -that they can provide provenance for the dataset. - -(sample-remodel-event-file-anchor)= -### Sample remodel event file - -Several examples illustrating the remodeling operations use the following excerpt of the stop-go task from sub-0013 -of the AOMIC-PIOP2 dataset available on [**OpenNeuro**](https://openneuro.org) as ds002790. -The full event file is -[**sub-0013_task-stopsignal_acq-seq_events.tsv**](./_static/data/sub-0013_task-stopsignal_acq-seq_events.tsv). - - -````{admonition} Excerpt from an event file from the stop-go task of AOMIC-PIOP2 (ds002790). 
-
-| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex |
-| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- |
-| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female |
-| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female |
-| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female |
-| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female |
-| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male |
-| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male |
-````
-
-(Sample-remodel-sidecar-file-anchor)=
-### Sample remodel sidecar file
-
-For remodeling operations that use HED, a JSON sidecar is usually required to provide the
-necessary HED annotations. The following JSON sidecar excerpt is used in several examples to
-illustrate some of these operations.
-The full JSON file can be found at
-[**task-stopsignal_acq-seq_events.json**](./_static/data/task-stopsignal_acq-seq_events.json).
-
-
-````{admonition} Excerpt of JSON sidecar with HED annotations for the stop-go task of AOMIC-PIOP2.
-:class: tip
-
-```json
-{
-    "trial_type": {
-        "HED": {
-            "succesful_stop": "Sensory-presentation, Visual-presentation, Correct-action, Image, Label/succesful_stop",
-            "unsuccesful_stop": "Sensory-presentation, Visual-presentation, Incorrect-action, Image, Label/unsuccesful_stop",
-            "go": "Sensory-presentation, Visual-presentation, Image, Label/go"
-        }
-    },
-    "stop_signal_delay": {
-        "HED": "(Auditory-presentation, Delay/# s)"
-    },
-    "sex": {
-        "HED": {
-            "male": "Def/Male-image-cond",
-            "female": "Def/Female-image-cond"
-        }
-    },
-    "hed_defs": {
-        "HED": {
-            "def_male": "(Definition/Male-image-cond, (Condition-variable/Image-sex, (Male, (Image, Face))))",
-            "def_female": "(Definition/Female-image-cond, (Condition-variable/Image-sex, (Female, (Image, Face))))"
-        }
-    }
-}
-```
-````
-Notice that the JSON file has some keys (e.g., "trial_type", "stop_signal_delay", and "sex")
-that also correspond to columns in the events file.
-The "hed_defs" key corresponds to an extra entry in the JSON file that, in this case, provides the definitions needed in the HED annotation.
-
-HED operations also require the HED schema. Most of the examples use HED standard schema version 8.1.0.
-
-(remodel-transformations-anchor)=
-## Remodel transformations
-
-(factor-column-anchor)=
-### Factor column
-
-The *factor_column* operation appends factor vectors to tabular files
-based on the values in a specified file column.
-Each factor vector contains a 1 if the corresponding row had that column value and a 0 otherwise.
-The *factor_column* operation is used to reformat event files for analyses such as linear regression
-based on column values.
-
-(factor-column-parameters-anchor)=
-#### Factor column parameters
-
-```{admonition} Parameters for the *factor_column* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *column_name* | str | The name of the column to be factored.|
-| *factor_values* | list | Column values to be included as factors. |
-| *factor_names* | list | (**Optional**) Column names for created factors. |
-```
-
-If *column_name* is not a column in the data file, a `ValueError` is raised.
-
-If *factor_values* is empty, factors are created for each unique value in *column_name*.
-Otherwise, only factors for the specified column values are generated.
-If a specified value is missing in a particular file, the corresponding factor column contains all zeros.
-
-If *factor_names* is empty, the newly created columns are of the
-form *column_name.factor_value*.
-Otherwise, the newly created columns have the names given in *factor_names*.
-If *factor_names* is not empty, then *factor_values* must also be specified
-and both lists must be of the same length.
-
-(factor-column-example-anchor)=
-#### Factor column example
-
-The *factor_column* operation in the following example specifies that factor columns
-should be created for the *succesful_stop* and *unsuccesful_stop* values of the *trial_type* column.
-The resulting columns are called *stopped* and *stop_failed*, respectively.
-
-
-````{admonition} A sample JSON file with a single *factor_column* transformation operation.
-:class: tip
-
-```json
-[{
-    "operation": "factor_column",
-    "description": "Create factors for the succesful_stop and unsuccesful_stop values.",
-    "parameters": {
-        "column_name": "trial_type",
-        "factor_values": ["succesful_stop", "unsuccesful_stop"],
-        "factor_names": ["stopped", "stop_failed"]
-    }
-}]
-```
-````
-
-The results of executing this *factor_column* operation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor) are:
-
-````{admonition} Results of the factor_column operation on the sample data.
-
-| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex | stopped | stop_failed |
-| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- | ---------- | ---------- |
-| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female | 0 | 0 |
-| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female | 0 | 1 |
-| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female | 0 | 0 |
-| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female | 1 | 0 |
-| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male | 0 | 1 |
-| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male | 0 | 0 |
-````
-
-(factor-hed-tags-anchor)=
-### Factor HED tags
-
-The *factor_hed_tags* operation is similar to the *factor_column* operation
-in that it produces factor vectors containing 0's and 1's,
-which are appended to the returned DataFrame.
-However, rather than basing these vectors on values in a specified column,
-the factors are computed by determining whether the assembled HED annotation for each row
-satisfies a specified search query.
-
-An example search query is whether the assembled HED annotation contains a particular HED tag.
-The [**HED search guide**](./HedSearchGuide.md) tutorial discusses the HED search facility in more detail.
-
-
-(factor-hed-tags-parameters-anchor)=
-#### Factor HED tags parameters
-
-```{admonition} Parameters for the *factor_hed_tags* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *queries* | list | A list of HED query strings. |
-| *query_names* | list | (**Optional**) A list of names for the factor columns generated by the queries. |
-| *remove_types* | list | (**Optional**) Structural HED tags to be removed (usually `Condition-variable` and `Task`). |
-| *expand_context* | bool | (**Optional**: Default true) Expand the context and remove `Onset` and `Offset` tags before the query. |
-
-```
-The *query_names* list, which must be empty or the same length as *queries*,
-contains the names of the factor columns produced by the search.
-If the *query_names* list is empty, the result columns are titled "query_1",
-"query_2", etc.
-
-Most of the time, *remove_types* should be set to `["Condition-variable", "Task"]`, with the effects of
-the experimental design captured using the *factor_hed_type* operation.
-If *expand_context* is set to *false*, the additional context provided by `Onset`, `Offset`, and `Duration`
-is ignored.
-
-(factor-hed-tags-example-anchor)=
-#### Factor HED tags example
-
-The *factor_hed_tags* operation in the following example produces two factor
-columns with 1's where the HED string for a row contains the `Correct-action`
-or `Incorrect-action` tag, respectively.
-The resulting factor columns are named *correct* and *incorrect*, respectively.
-
-````{admonition} A sample JSON file with a single *factor_hed_tags* transformation operation.
-:class: tip
-
-```json
-[{
-    "operation": "factor_hed_tags",
-    "description": "Create factors based on whether the event represented a correct or incorrect action.",
-    "parameters": {
-        "queries": ["correct-action", "incorrect-action"],
-        "query_names": ["correct", "incorrect"],
-        "remove_types": ["Condition-variable", "Task"],
-        "expand_context": false
-    }
-}]
-```
-````
-
-The results of executing this *factor_hed_tags* operation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor) using the
-[**sample remodel sidecar file**](sample-remodel-sidecar-file-anchor) for HED annotations are:
-
-
-````{admonition} Results of *factor_hed_tags*.
-
-| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex | correct | incorrect |
-| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- | ---------- | ---------- |
-| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female | 0 | 0 |
-| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female | 0 | 1 |
-| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female | 0 | 0 |
-| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female | 1 | 0 |
-| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male | 0 | 1 |
-| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male | 0 | 0 |
-````
-
-(factor-hed-type-anchor)=
-### Factor HED type
-
-The *factor_hed_type* operation produces factor columns
-based on values of the specified HED type tag.
-The most common type is the HED *Condition-variable* tag, which corresponds to
-factor vectors based on the experimental design.
-Other commonly used type tags include *Task*, *Control-variable*, and *Time-block*.
-
-We assume that the dataset has been annotated using HED tags to properly document
-information such as experimental conditions, and focus on how such an annotated dataset can be
-used with remodeling to produce factor columns corresponding to these
-type variables.
-
-For additional information on how to encode experimental designs using HED, see
-[**HED conditions and design matrices**](./HedConditionsAndDesignMatrices.md).
-
-(factor-hed-type-parameters-anchor)=
-#### Factor HED type parameters
-
-```{admonition} Parameters for the *factor_hed_type* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *type_tag* | str | HED tag used to find the factors (most commonly *Condition-variable*).|
-| *type_values* | list | (**Optional**) Values to factor for the *type_tag*. If omitted, all values of that *type_tag* are used. |
-```
-The event context (as defined by onsets, offsets, and durations) is always expanded, and one-hot (0's and 1's)
-encoding is used for the factors.
-
-(factor-hed-type-example-anchor)=
-#### Factor HED type example
-
-The *factor_hed_type* operation in the following example appends
-additional columns to each data file corresponding to
-each possible value of each *Condition-variable* tag.
-The columns contain 1's in rows (e.g., events) for which that condition
-applies and 0's otherwise.
-
-````{admonition} A JSON file with a single *factor_hed_type* transformation operation.
-:class: tip
-
-```json
-[{
-    "operation": "factor_hed_type",
-    "description": "Factor based on the sex of the images being presented.",
-    "parameters": {
-        "type_tag": "Condition-variable"
-    }
-}]
-```
-````
-
-The results of executing this *factor_hed_type* operation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor) using the
-[**sample remodel sidecar file**](sample-remodel-sidecar-file-anchor) for HED annotations are:
-
-
-````{admonition} Results of *factor_hed_type*.
-
-| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex | Image-sex.Female-image-cond | Image-sex.Male-image-cond |
-| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- | ------- | ---------- |
-| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female | 1 | 0 |
-| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female | 1 | 0 |
-| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female | 1 | 0 |
-| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female | 1 | 0 |
-| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male | 0 | 1 |
-| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male | 0 | 1 |
-````
-
-(merge-consecutive-anchor)=
-### Merge consecutive
-
-Sometimes a single long event in experimental logs is represented by multiple repeat events.
-The *merge_consecutive* operation collapses these consecutive repeat events into one event with
-duration updated to encompass the temporal extent of the merged events.
-
-(merge-consecutive-parameters-anchor)=
-#### Merge consecutive parameters
-
-```{admonition} Parameters for the *merge_consecutive* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *column_name* | str | The name of the column which is the basis of the merge.|
-| *event_code* | str, int, float | The value in *column_name* that triggers the merge. |
-| *set_durations* | bool | If true, set durations based on merged events. |
-| *ignore_missing* | bool | If true, missing *column_name* or *match_columns* do not raise an error. |
-| *match_columns* | list | (**Optional**) Columns whose values must match to collapse events. |
-```
-
-The first of the group of rows (each representing an event) to be merged is called the anchor
-for the merge. After the merge, it is the only row in the group
-that remains in the data file. This anchor row is identical
-to its original version, except for the value in its `duration` column.
-
-If the *set_durations* parameter is true, the new duration is calculated as though
-the event began with the onset of the first event (the anchor row) in the group and
-ended at the point where all the events in the group have ended.
-This method allows for small gaps between events and for groups in which an
-intermediate event ends after later events.
-If the *set_durations* parameter is false, the duration of the merged row is set to `n/a`.
-
-If the data file has other columns besides `onset`, `duration`, and *column_name*,
-the values in the other columns must be considered during the merging process.
-The *match_columns* parameter is a list of the other columns whose values must agree with those
-of the anchor row in order for a merge to occur. If *match_columns* is empty, the
-other columns in each row are not taken into account during the merge.
-
-(merge-consecutive-example-anchor)=
-#### Merge consecutive example
-
-The *merge_consecutive* operation in the following example causes consecutive
-`succesful_stop` events whose `stop_signal_delay`, `response_hand`, and `sex` columns
-have the same values to be merged into a single event.
-
-
-````{admonition} A JSON file with a single *merge_consecutive* transformation operation.
-:class: tip
-
-```json
-[{
-    "operation": "merge_consecutive",
-    "description": "Merge consecutive succesful_stop events that match on the match_columns values.",
-    "parameters": {
-        "column_name": "trial_type",
-        "event_code": "succesful_stop",
-        "set_durations": true,
-        "ignore_missing": true,
-        "match_columns": ["stop_signal_delay", "response_hand", "sex"]
-    }
-}]
-```
-````
-
-When this operation is applied to the following input file,
-the three events with a value of `succesful_stop` in the `trial_type` column starting
-at `onset` value 13.5939 are merged into a single event.
-
-````{admonition} Input file for a *merge_consecutive* operation.
-
-| onset | duration | trial_type | stop_signal_delay | response_hand | sex |
-| ----- | -------- | ---------- | ----------------- | ------------- | --- |
-| 0.0776 | 0.5083 | go | n/a | right | female |
-| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | right | female |
-| 9.5856 | 0.5084 | go | n/a | right | female |
-| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | female |
-| 14.2 | 0.5083 | succesful_stop | 0.2 | n/a | female |
-| 15.3 | 0.7083 | succesful_stop | 0.2 | n/a | female |
-| 17.3 | 0.5083 | unsuccesful_stop | 0.25 | n/a | female |
-| 19.0 | 0.5083 | unsuccesful_stop | 0.25 | n/a | female |
-| 21.1021 | 0.5083 | unsuccesful_stop | 0.25 | left | male |
-| 22.6103 | 0.5083 | go | n/a | left | male |
-````
-
-Notice that the `unsuccesful_stop` event at `onset` value `17.3` is not
-merged with the preceding `succesful_stop` events because its `trial_type` and
-`stop_signal_delay` values do not match those of the previous events.
-The merged event has `duration` computed as `2.4144` = `15.3` + `0.7083` - `13.5939`.
-
-````{admonition} The results of the *merge_consecutive* operation.
-
-| onset | duration | trial_type | stop_signal_delay | response_hand | sex |
-| ----- | -------- | ---------- | ------------------ | ------------- | --- |
-| 0.0776 | 0.5083 | go | n/a | right | female |
-| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | right | female |
-| 9.5856 | 0.5084 | go | n/a | right | female |
-| 13.5939 | 2.4144 | succesful_stop | 0.2 | n/a | female |
-| 17.3 | 2.2083 | unsuccesful_stop | 0.25 | n/a | female |
-| 21.1021 | 0.5083 | unsuccesful_stop | 0.25 | left | male |
-| 22.6103 | 0.5083 | go | n/a | left | male |
-````
-
-The events that had onsets at `17.3` and `19.0` are also merged in this example.
-
-(remap-columns-anchor)=
-### Remap columns
-
-The *remap_columns* operation maps combinations of values in *m* specified columns of a data file
-into values in *n* columns using a defined mapping.
-Remapping is useful during analysis to create columns in event files that are more directly useful
-or informative for a particular analysis.
-
-Remapping is also important during the initial generation of event files from experimental logs.
-The log files generated by experimental control software often contain a code for each type of log entry.
-Remapping can be used to convert the column containing these codes into one or more columns with more meaningful values.
-
-
-(remap-columns-parameters-anchor)=
-#### Remap columns parameters
-
-
-```{admonition} Parameters for the *remap_columns* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *source_columns* | list | A list of *m* names of the source columns for the map.|
-| *destination_columns* | list | A list of *n* names of the destination columns for the map. |
-| *map_list* | list | A list of mappings. Each element is a list of *m* source column values followed by *n* destination values. Mapping source values are treated as strings. |
-| *ignore_missing* | bool | If true, source column values not in the map generate "n/a" destination values instead of errors. |
-| *integer_sources* | list | (**Optional**) A list of source columns that are integers. The *integer_sources* must be a subset of *source_columns*. |
-```
-A column cannot be both a source and a destination,
-and all source columns must be present in the data files.
-New columns are created for destination columns that are missing from a data file.
-
-The *remap_columns* operation only works for columns containing strings or integers,
-as it is meant for remapping categorical codes.
-You must specify which source columns contain integers so that `n/a` values
-can be handled appropriately.
-
-The *map_list* parameter specifies how each unique combination of values from the source
-columns will be mapped into the destination columns.
-If there are *m* source columns and *n* destination columns,
-then each entry in *map_list* must be a list with *m* + *n* elements.
-The first *m* elements are the key values from the source columns.
-The *map_list* should have targets for all combinations of values that appear in the *m* source columns
-unless *ignore_missing* is true.
-
-After remapping, the tabular file will contain both source and destination columns.
-If you wish to replace the source columns with the destination columns,
-use a *remove_columns* transformation after the *remap_columns*.
-
-
-(remap-columns-example-anchor)=
-#### Remap columns example
-
-The *remap_columns* operation in the following example creates a new column called *response_type*
-based on the unique values in the combination of columns *response_accuracy* and *response_hand*.
-
-````{admonition} A JSON file with a single *remap_columns* transformation operation.
-:class: tip
-
-```json
-[{
-    "operation": "remap_columns",
-    "description": "Map response_accuracy and response_hand into a single column.",
-    "parameters": {
-        "source_columns": ["response_accuracy", "response_hand"],
-        "destination_columns": ["response_type"],
-        "map_list": [["correct", "left", "correct_left"],
-                     ["correct", "right", "correct_right"],
-                     ["incorrect", "left", "incorrect_left"],
-                     ["incorrect", "right", "incorrect_right"],
-                     ["n/a", "n/a", "n/a"]],
-        "ignore_missing": true
-    }
-}]
-```
-````
-In this example there are two source columns and one destination column,
-so each entry in *map_list* must be a list with three elements: two source values and one destination value.
-Since all the values in *map_list* are strings,
-the optional *integer_sources* list is not needed.
-
-The results of executing the previous *remap_columns* operation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor) are:
-
-````{admonition} Mapping columns *response_accuracy* and *response_hand* into a *response_type* column.
-
-| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex | response_type |
-| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- | ------------- |
-| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female | correct_right |
-| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female | correct_right |
-| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female | correct_right |
-| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female | n/a |
-| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male | correct_left |
-| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male | correct_left |
-````
-
-In this example, *remap_columns* combines the values from columns `response_accuracy` and
-`response_hand` to produce a new column called `response_type` that specifies both response hand and correctness information using a single code.
-
-(remove-columns-anchor)=
-### Remove columns
-
-Sometimes columns are added during intermediate processing steps. The *remove_columns*
-operation is useful for cleaning up unnecessary columns after these processing steps complete.
-
-(remove-columns-parameters-anchor)=
-#### Remove columns parameters
-
-```{admonition} Parameters for the *remove_columns* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *column_names* | list of str | A list of columns to remove.|
-| *ignore_missing* | bool | If true, missing columns are ignored; otherwise a `KeyError` is raised. |
-```
-
-If one of the specified columns is not in the file and the *ignore_missing*
-parameter is *false*, a `KeyError` is raised for the missing column.
-
-(remove-columns-example-anchor)=
-#### Remove columns example
-
-The following example specifies that the *remove_columns* operation should remove the `stop_signal_delay`,
-`response_accuracy`, and `face` columns from the tabular data.
-
-````{admonition} A JSON file with a single *remove_columns* transformation operation.
-:class: tip
-
-```json
-[{
-    "operation": "remove_columns",
-    "description": "Remove extra columns before the next step.",
-    "parameters": {
-        "column_names": ["stop_signal_delay", "response_accuracy", "face"],
-        "ignore_missing": true
-    }
-}]
-```
-````
-
-The results of executing this operation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor)
-are shown below.
-The *face* column is not in the data, but it is ignored, since *ignore_missing* is true.
-If *ignore_missing* had been false, a `KeyError` would have been raised.
-
-````{admonition} Results of executing the *remove_columns* operation.
-
-| onset | duration | trial_type | response_time | response_hand | sex |
-| ----- | -------- | ---------- | ------------- | ------------- | --- |
-| 0.0776 | 0.5083 | go | 0.565 | right | female |
-| 5.5774 | 0.5083 | unsuccesful_stop | 0.49 | right | female |
-| 9.5856 | 0.5084 | go | 0.45 | right | female |
-| 13.5939 | 0.5083 | succesful_stop | n/a | n/a | female |
-| 17.1021 | 0.5083 | unsuccesful_stop | 0.633 | left | male |
-| 21.6103 | 0.5083 | go | 0.443 | left | male |
-````
-
-(remove-rows-anchor)=
-### Remove rows
-
-The *remove_rows* operation eliminates rows in which the named column has one of the specified values.
-This operation is useful for removing event markers corresponding to particular types of events
-or, for example, rows having `n/a` in a particular column.
-
-
-(remove-rows-parameters-anchor)=
-#### Remove rows parameters
-
-```{admonition} Parameters for *remove_rows*.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *column_name* | str | The name of the column to be tested.|
-| *remove_values* | list | A list of values to be tested for removal. |
-```
-The operation does not raise an error if a data file does not have a column named
-*column_name* or does not contain any of the values in *remove_values*.
-
-(remove-rows-example-anchor)=
-#### Remove rows example
-
-The following *remove_rows* operation removes the rows whose *trial_type* column
-contains either `succesful_stop` or `unsuccesful_stop`.
-
-````{admonition} A JSON file with a single *remove_rows* transformation operation.
-:class: tip
-
-```json
-[{
-    "operation": "remove_rows",
-    "description": "Remove rows where trial_type is either succesful_stop or unsuccesful_stop.",
-    "parameters": {
-        "column_name": "trial_type",
-        "remove_values": ["succesful_stop", "unsuccesful_stop"]
-    }
-}]
-```
-````
-
-The results of executing the previous *remove_rows* operation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor) are:
-
-````{admonition} The results of executing the previous *remove_rows* operation.
-
-| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex |
-| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- |
-| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female |
-| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female |
-| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male |
-````
-
-After removing rows with `trial_type` equal to `succesful_stop` or `unsuccesful_stop`, only the
-three `go` trials remain.
-
-
-(rename-columns-anchor)=
-### Rename columns
-
-The *rename_columns* operation uses a dictionary to map old column names into new ones.
-
-(rename-columns-parameters-anchor)=
-#### Rename columns parameters
-
-```{admonition} Parameters for *rename_columns*.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *column_mapping* | dict | The keys are the old column names and the values are the new names.|
-| *ignore_missing* | bool | If false, a `KeyError` is raised if a dictionary key is not a column name. |
-
-```
-
-If *ignore_missing* is false, a `KeyError` is raised if a column specified in
-the mapping does not correspond to a column name in the data file.
-
-(rename-columns-example-anchor)=
-#### Rename columns example
-
-The following example renames the `stop_signal_delay` column to be `stop_delay` and
-the `response_hand` column to be `hand_used`.
-
-````{admonition} A JSON file with a single *rename_columns* transformation operation.
-:class: tip
-
-```json
-[{
-    "operation": "rename_columns",
-    "description": "Rename columns to be more descriptive.",
-    "parameters": {
-        "column_mapping": {
-            "stop_signal_delay": "stop_delay",
-            "response_hand": "hand_used"
-        },
-        "ignore_missing": true
-    }
-}]
-
-```
-````
-
-The results of executing the previous *rename_columns* operation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor) are:
-
-````{admonition} After the *rename_columns* operation is executed, the sample events file is:
-
-| onset | duration | trial_type | stop_delay | response_time | response_accuracy | hand_used | sex |
-| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- |
-| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female |
-| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female |
-| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female |
-| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female |
-| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male |
-| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male |
-````
-
-(reorder-columns-anchor)=
-### Reorder columns
-
-The *reorder_columns* operation reorders the indicated columns in the specified order.
-This operation is often used to place the most important columns near the beginning of the file for readability
-or to ensure that all the data files in a dataset have the same column order.
-Additional parameters control how non-specified columns are treated.
-
-(reorder-columns-parameters-anchor)=
-#### Reorder columns parameters
-
-```{admonition} Parameters for the *reorder_columns* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *column_order* | list | A list of columns in the order they should appear in the data.|
-| *ignore_missing* | bool | Controls handling of column names in the reorder list that aren't in the data. |
-| *keep_others* | bool | Controls handling of columns not in the reorder list. |
-
-```
-
-If *ignore_missing* is true
-and items in the reorder list do not exist in the file, the missing columns are ignored.
-On the other hand, if *ignore_missing* is false,
-a column name in the reorder list that is missing from the data raises a *ValueError*.
-
-The *keep_others* parameter controls whether columns in the data that
-do not appear in the *column_order* list are dropped (*keep_others* is false) or
-put at the end in the relative order that they appear in the file (*keep_others* is true).
-
-BIDS event files are required to have `onset` and `duration` as the first and second columns, respectively.
-
-(reorder-columns-example-anchor)=
-#### Reorder columns example
-
-The *reorder_columns* operation in the following example specifies that the first four
-columns of the dataset should be: `onset`, `duration`, `response_time`, and `trial_type`.
-Since *keep_others* is false, these will be the only columns retained.
-
-````{admonition} A JSON file with a single *reorder_columns* transformation operation.
-:class: tip
-
-```json
-[{
-    "operation": "reorder_columns",
-    "description": "Reorder columns.",
-    "parameters": {
-        "column_order": ["onset", "duration", "response_time", "trial_type"],
-        "ignore_missing": true,
-        "keep_others": false
-    }
-}]
-```
-````
-
-
-The results of executing the previous *reorder_columns* transformation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor) are:
-
-````{admonition} Results of *reorder_columns*.
-
-| onset | duration | response_time | trial_type |
-| ----- | -------- | ---------- | ------------- |
-| 0.0776 | 0.5083 | 0.565 | go |
-| 5.5774 | 0.5083 | 0.49 | unsuccesful_stop |
-| 9.5856 | 0.5084 | 0.45 | go |
-| 13.5939 | 0.5083 | n/a | succesful_stop |
-| 17.1021 | 0.5083 | 0.633 | unsuccesful_stop |
-| 21.6103 | 0.5083 | 0.443 | go |
-````
-
-(split-rows-anchor)=
-### Split rows
-
-The *split_rows* operation
-is often used to convert event files from trial-level encoding to event-level encoding.
-This operation is meant only for tabular files that have `onset` and `duration` columns.
-
-In **trial-level** encoding, all the events in a single trial
-(usually some variation of the cue-stimulus-response-feedback-ready sequence)
-are represented by a single row in the data file.
-Often, the onset corresponds to the presentation of the stimulus,
-and the other events are not reported or are implicitly reported.
-
-In **event-level** encoding, each row represents the temporal marker for a single event.
-In this case a trial consists of a sequence of multiple events.
-
-
-(split-rows-parameters-anchor)=
-#### Split rows parameters
-
-```{admonition} Parameters for the *split_rows* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *anchor_column* | str | The name of the column that will hold the codes for the newly split rows.|
-| *new_events* | dict | Dictionary whose keys are the codes to be inserted as new events in the *anchor_column* and whose values are dictionaries with keys *onset_source*, *duration*, and *copy_columns* (**Optional**). |
-| *remove_parent_event* | bool | If true, remove the parent event. |
-
-```
-
-The *split_rows* operation requires an *anchor_column*, which could be an existing
-column or a new column to be appended to the data.
-The purpose of the *anchor_column* is to hold the codes for the new events.
-
-The *new_events* dictionary has the new events to be created.
-The keys are the new event codes to be inserted into the *anchor_column*.
-The values in *new_events* are themselves dictionaries.
-Each of these dictionaries has three keys:
-
-- *onset_source* is a list of items to be added to the *onset*
-of the event row being split to produce the `onset` column value for the new event. These items can be any combination of numerical values and column names.
-- *duration* is a list of numerical values and/or column names whose values are to be added
-to compute the `duration` column value for the new event.
-- *copy_columns* is a list of column names whose values should be copied into each new event.
-Unlisted columns are filled with `n/a`.
-
-
-The *split_rows* operation sorts the split rows by the `onset` column and raises a `TypeError`
-if the `onset` and `duration` values are improperly defined.
-The `onset` column is converted to numeric values as part of the splitting process.
-
-(split-rows-example-anchor)=
-#### Split rows example
-
-The *split_rows* operation in the following example specifies that new rows should be added
-to encode the response and stop signal. The anchor column is `trial_type`.
-
-
-````{admonition} A JSON file with a single *split_rows* transformation operation.
-:class: tip
-
-```json
-[{
-    "operation": "split_rows",
-    "description": "Add response events to the trials.",
-    "parameters": {
-        "anchor_column": "trial_type",
-        "new_events": {
-            "response": {
-                "onset_source": ["response_time"],
-                "duration": [0],
-                "copy_columns": ["response_accuracy", "response_hand", "sex", "trial_number"]
-            },
-            "stop_signal": {
-                "onset_source": ["stop_signal_delay"],
-                "duration": [0.5],
-                "copy_columns": ["trial_number"]
-            }
-        },
-        "remove_parent_event": false
-    }
-}]
-```
-````
-
-The results of executing this *split_rows* operation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor) are:
-
-````{admonition} Results of the previous *split_rows* operation.
-
-| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex |
-| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- |
-| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female |
-| 0.6426 | 0 | response | n/a | n/a | correct | right | female |
-| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female |
-| 5.7774 | 0.5 | stop_signal | n/a | n/a | n/a | n/a | n/a |
-| 6.0674 | 0 | response | n/a | n/a | correct | right | female |
-| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female |
-| 10.0356 | 0 | response | n/a | n/a | correct | right | female |
-| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female |
-| 13.7939 | 0.5 | stop_signal | n/a | n/a | n/a | n/a | n/a |
-| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male |
-| 17.3521 | 0.5 | stop_signal | n/a | n/a | n/a | n/a | n/a |
-| 17.7351 | 0 | response | n/a | n/a | correct | left | male |
-| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male |
-| 22.0533 | 0 | response | n/a | n/a | correct | left | male |
-````
-
-In a full processing example, it might make sense to rename `trial_type` to be
-`event_type` and to delete the `response_time` and the `stop_signal_delay` columns,
-since these items have been unfolded into separate events.
-This could be accomplished in subsequent clean-up operations.
-
-(remodel-summarizations-anchor)=
-## Remodel summarizations
-
-Summarizations differ from transformations in two respects: they do not modify the input data file,
-and they keep information about the results from each file that has been processed.
-Summarization operations may be used at several points in the operation list as checkpoints
-during debugging as well as for their more typical informational uses.
-
-All summary operations have two required parameters: *summary_name* and *summary_filename*.
-
-The *summary_name* is the unique key used to identify the
-particular incarnation of this summary in the dispatcher.
-Care should be taken to make sure that the *summary_name* is unique within
-a given JSON remodeling file if the same summary operation is used more than
-once within the file (e.g., for before and after summary information).
-
-The *summary_filename* should also be unique and is used for saving the summary upon request.
-When the remodeler is applied to full datasets rather than single files,
-the summaries are saved in the `derivatives/remodel/summaries` directory under the dataset root.
-A time stamp and file extension are appended to the *summary_filename* when the
-summary is saved.
-
-(summarize-column-names-anchor)=
-### Summarize column names
-
-The *summarize_column_names* operation tracks the unique column name patterns found in data files across
-the dataset and which files have these column names.
-This summary is useful for determining whether there are any non-conforming data files.
-
-Often event files associated with different tasks have different column names,
-and this summary can be used to verify that the files corresponding to the same task
-have the same column names.
-
-A more problematic issue is when some event files for the same task
-have reordered column names or use different column names.
-
-(summarize-columns-names-parameters-anchor)=
-#### Summarize column names parameters
-
-The *summarize_column_names* operation has no operation-specific parameters;
-it requires only the *summary_name* and *summary_filename* parameters common to all summaries.
-
-```{admonition} Parameters for the *summarize_column_names* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *summary_name* | str | A unique name used to identify this summary.|
-| *summary_filename* | str | A unique file basename to use for saving this summary. |
-| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to the filename. |
-```
-
-(summarize-column-names-example-anchor)=
-#### Summarize column names example
-
-The following example remodeling file produces a summary, which when saved
-will appear with file name `AOMIC_column_names_xxx.txt` or
-`AOMIC_column_names_xxx.json` where `xxx` is a timestamp.
-
-````{admonition} A JSON file with a single *summarize_column_names* summarization operation.
-:class: tip
-```json
-[{
-    "operation": "summarize_column_names",
-    "description": "Summarize column names.",
-    "parameters": {
-        "summary_name": "AOMIC_column_names",
-        "summary_filename": "AOMIC_column_names"
-    }
-}]
-```
-````
-
-When this operation is applied to the [**sample remodel event file**](sample-remodel-event-file-anchor),
-the following text summary is produced.
-
-````{admonition} Result of applying *summarize_column_names* to the sample remodel file.
-:class: tip
-
-```text
-
-Summary name: AOMIC_column_names
-Summary type: column_names
-Summary filename: AOMIC_column_names
-
-Summary details:
-
-Dataset: Number of files=1
-    Columns: ['onset', 'duration', 'trial_type', 'stop_signal_delay', 'response_time', 'response_accuracy', 'response_hand', 'sex']
-        sub-0013_task-stopsignal_acq-seq_events.tsv
-
-Individual files:
-
-sub-0013_task-stopsignal_acq-seq_events.tsv:
-    ['onset', 'duration', 'trial_type', 'stop_signal_delay', 'response_time', 'response_accuracy', 'response_hand', 'sex']
-
-```
-````
-
-Since we are only summarizing one event file, there is only one unique pattern -- corresponding
-to the columns: *onset*, *duration*, *trial_type*, *stop_signal_delay*, *response_time*, *response_accuracy*, *response_hand*, and *sex*.
-
-When the dataset has multiple column name patterns, the summary lists each unique pattern separately along
-with the names of the data files that have that pattern.
-
-The JSON version of the summary is useful for programmatic manipulation,
-while the text version shown above is more readable.
-
-
-(summarize-column-values-anchor)=
-### Summarize column values
-
-The *summarize_column_values* operation provides a summary of the number of times various
-column values appear in event files across the dataset.
-
-
-(summarize-columns-values-parameters-anchor)=
-#### Summarize column values parameters
-
-The following table lists the parameters used by this summary.
-
-```{admonition} Parameters for the *summarize_column_values* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *summary_name* | str | A unique name used to identify this summary.|
-| *summary_filename* | str | A unique file basename to use for saving this summary. |
-| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to the filename. |
-| *max_categorical* | int | (**Optional**: Default 50) If given, the text summary shows the top *max_categorical* values. Otherwise the text summary displays all categorical values. |
-| *skip_columns* | list | (**Optional**) A list of column names to omit from the summary.|
-| *value_columns* | list | (**Optional**) A list of columns for which individual unique values are not listed. |
-| *values_per_line* | int | (**Optional**: Default 5) If given, the text summary displays this number of values per line. |
-
-```
-
-In addition to the standard parameters *summary_name* and *summary_filename* required of all summaries,
-the *summarize_column_values* operation accepts two additional lists.
-The *skip_columns* list specifies the names of columns to skip entirely in the summary.
-Typically, the `onset`, `duration`, and `sample` columns are skipped, since they have unique values for
-each row and their values carry little summary information.
-
-The *summarize_column_values* operation is mainly meant for creating summary information about columns
-containing a finite number of distinct values.
-Columns that contain numeric information will usually have distinct entries for
-each row in a tabular file and are not amenable to such summarization.
-These columns could be specified as *skip_columns*, but another option is to
-designate them as *value_columns*. The *value_columns* are reported in the summary,
-but their distinct values are not reported individually.
-
-For datasets that include multiple tasks, the event values for each task may be distinct.
-The *summarize_column_values* operation does not separate by task, but expects the
-calling program to filter the files by task as desired.
-The `run_remodel` program supports selecting files corresponding to a particular task.
-
-Two additional optional parameters are available for specifying aspects of the text summary output.
-The *max_categorical* optional parameter specifies how many unique values should be displayed
-for each column. The *values_per_line* parameter controls how many categorical column values (with counts)
-are displayed on each line of the output. By default, 5 values are displayed.
-
-(summarize-column-values-example-anchor)=
-#### Summarize column values example
-
-The following example shows the JSON for including this operation in a remodeling file.
-
-````{admonition} A JSON file with a single *summarize_column_values* summarization operation.
-:class: tip
-```json
-[{
-    "operation": "summarize_column_values",
-    "description": "Summarize the column values in an excerpt.",
-    "parameters": {
-        "summary_name": "AOMIC_column_values",
-        "summary_filename": "AOMIC_column_values",
-        "skip_columns": ["onset", "duration"],
-        "value_columns": ["response_time", "stop_signal_delay"]
-    }
-}]
-```
-````
-
-A text format summary of the results of executing this operation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor)
-is shown in the following example.
-
-````{admonition} Sample *summarize_column_values* operation results in text format.
-:class: tip
-```text
-Summary name: AOMIC_column_values
-Summary type: column_values
-Summary filename: AOMIC_column_values
-
-Overall summary:
-Dataset: Total events=6 Total files=1
-    Categorical column values[Events, Files]:
-        response_accuracy:
-            correct[5, 1] n/a[1, 1]
-        response_hand:
-            left[2, 1] n/a[1, 1] right[3, 1]
-        sex:
-            female[4, 1] male[2, 1]
-        trial_type:
-            go[3, 1] succesful_stop[1, 1] unsuccesful_stop[2, 1]
-    Value columns[Events, Files]:
-        response_time[6, 1]
-        stop_signal_delay[6, 1]
-
-Individual files:
-
-sub-0013_task-stopsignal_acq-seq_events.tsv:
-Total events=6
-    Categorical column values[Events, Files]:
-        response_accuracy:
-            correct[5, 1] n/a[1, 1]
-        response_hand:
-            left[2, 1] n/a[1, 1] right[3, 1]
-        sex:
-            female[4, 1] male[2, 1]
-        trial_type:
-            go[3, 1] succesful_stop[1, 1] unsuccesful_stop[2, 1]
-    Value columns[Events, Files]:
-        response_time[6, 1]
-        stop_signal_delay[6, 1]
-```
-````
-
-Because the [**sample remodel event file**](sample-remodel-event-file-anchor)
-only has 6 events, we expect that no value will be represented in more than 6 events.
-The column names corresponding to value columns just have the event counts in them.
-
-Because this command was executed with the `-i` option of `run_remodel`,
-results from the individual data files are shown after the overall summary.
-The individual results are similar to the overall summary because only one data file
-was processed.
-
-For a more extensive example, see the
-[**text**](./_static/data/summaries/FacePerception_column_values_summary.txt)
-and [**JSON**](./_static/data/summaries/FacePerception_column_values_summary.json)
-format summaries of the sample dataset
-[**ds003645s_hed**](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003645s_hed)
-using the [**summarize_columns_rmdl.json**](./_static/data/summaries/summarize_columns_rmdl.json)
-remodeling file.
-
-
-(summarize-definitions-anchor)=
-### Summarize definitions
-
-The *summarize_definitions* operation provides a summary of the `Def-expand` tags found across the dataset,
-noting any ambiguous or erroneous ones. If working on a BIDS dataset, it will initialize with the known definitions
-from the sidecar, reporting any deviations from the known definitions as errors.
-
-(summarize-definitions-parameters-anchor)=
-#### Summarize definitions parameters
-
-**NOTE: This summary is still under development.**
-The following table lists the parameters required for using the summary.
-
-```{admonition} Parameters for the *summarize_definitions* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *summary_name* | str | A unique name used to identify this summary.|
-| *summary_filename* | str | A unique file basename to use for saving this summary. |
-| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to the filename. |
-```
-
-The *summarize_definitions* operation is mainly meant for verifying consistency in unknown `Def-expand` tags.
-This comes up when you have an assembled dataset but no longer have the definitions stored (or never created them to begin with).
-
-
-(summarize-definitions-example-anchor)=
-#### Summarize definitions example
-
-The following example shows the JSON for including this operation in a remodeling file.
-
-````{admonition} A JSON file with a single *summarize_definitions* summarization operation.
-:class: tip
-```json
-[{
-    "operation": "summarize_definitions",
-    "description": "Summarize the definitions used in this dataset.",
-    "parameters": {
-        "summary_name": "HED_column_definition_summary",
-        "summary_filename": "HED_column_definition_summary"
-    }
-}]
-```
-````
-
-A text format summary of the results of executing this operation on the
-[**sub-003_task-FacePerception_run-3_events.tsv**](_static/data/sub-003_task-FacePerception_run-3_events.tsv) file
-of the [**eeg_ds003645s_hed_column**](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003645s_hed_column) dataset is shown in the following example.
-
-````{admonition} Sample *summarize_definitions* operation results in text format.
-:class: tip
-```text
-Summary name: HED_column_definition_summary
-Summary type: definitions
-Summary filename: HED_column_definition_summary
-
-Overall summary:
-    Known Definitions: 17 items
-        cross-only: 2 items
-            description: A white fixation cross on a black background in the center of the screen.
-            contents: (Visual-presentation,(Background-view,Black),(Foreground-view,(Center-of,Computer-screen),(Cross,White)))
-        face-image: 2 items
-            description: A happy or neutral face in frontal or three-quarters frontal pose with long hair cropped presented as an achromatic foreground image on a black background with a white fixation cross superposed.
-            contents: (Visual-presentation,(Background-view,Black),(Foreground-view,((Center-of,Computer-screen),(Cross,White)),(Grayscale,(Face,Hair,Image))))
-        circle-only: 2 items
-            description: A white circle on a black background in the center of the screen.
-            contents: (Visual-presentation,(Background-view,Black),(Foreground-view,((Center-of,Computer-screen),(Circle,White))))
-        press-left-finger: 2 items
-            description: The participant presses a key with the left index finger to indicate a face symmetry judgment.
-            contents: ((Index-finger,(Experiment-participant,Left-side-of)),(Keyboard-key,Press))
-        press-right-finger: 2 items
-            description: The participant presses a key with the right index finger to indicate a face symmetry evaluation.
-            contents: ((Index-finger,(Experiment-participant,Right-side-of)),(Keyboard-key,Press))
-        famous-face-cond: 2 items
-            description: A face that should be recognized by the participants
-            contents: (Condition-variable/Face-type,(Image,(Face,Famous)))
-        unfamiliar-face-cond: 2 items
-            description: A face that should not be recognized by the participants.
-            contents: (Condition-variable/Face-type,(Image,(Face,Unfamiliar)))
-        scrambled-face-cond: 2 items
-            description: A scrambled face image generated by taking face 2D FFT.
-            contents: (Condition-variable/Face-type,(Image,(Disordered,Face)))
-        first-show-cond: 2 items
-            description: Factor level indicating the first display of this face.
-            contents: ((Condition-variable/Repetition-type,Item-interval/0,(Face,Item-count/1)))
-        immediate-repeat-cond: 2 items
-            description: Factor level indicating this face was the same as previous one.
-            contents: ((Condition-variable/Repetition-type,Item-interval/1,(Face,Item-count/2)))
-        delayed-repeat-cond: 2 items
-            description: Factor level indicating face was seen 5 to 15 trials ago.
-            contents: (Condition-variable/Repetition-type,(Face,Item-count/2),(Item-interval,(Greater-than-or-equal-to,Item-interval/5)))
-        left-sym-cond: 2 items
-            description: Left index finger key press indicates a face with above average symmetry.
-            contents: (Condition-variable/Key-assignment,((Asymmetrical,Behavioral-evidence),(Index-finger,(Experiment-participant,Right-side-of))),((Behavioral-evidence,Symmetrical),(Index-finger,(Experiment-participant,Left-side-of))))
-        right-sym-cond: 2 items
-            description: Right index finger key press indicates a face with above average symmetry.
-            contents: (Condition-variable/Key-assignment,((Asymmetrical,Behavioral-evidence),(Index-finger,(Experiment-participant,Left-side-of))),((Behavioral-evidence,Symmetrical),(Index-finger,(Experiment-participant,Right-side-of))))
-        face-symmetry-evaluation-task: 2 items
-            description: Evaluate degree of image symmetry and respond with key press evaluation.
-            contents: (Experiment-participant,Task,(Discriminate,(Face,Symmetrical)),(Face,See),(Keyboard-key,Press))
-        blink-inhibition-task: 2 items
-            description: Do not blink while the face image is displayed.
-            contents: (Experiment-participant,Inhibit-blinks,Task)
-        fixation-task: 2 items
-            description: Fixate on the cross at the screen center.
-            contents: (Experiment-participant,Task,(Cross,Fixate))
-        initialize-recording: 2 items
-            description:
-            contents: (Recording)
-    Ambiguous Definitions: 0 items
-
-    Errors: 0 items
-```
-````
-
-Since this file didn't have any ambiguous or incorrect `Def-expand` groups, those sections are empty.
-Ambiguous definitions are those that take a placeholder but do not provide enough information
-to determine which tag the placeholder applies to.
-Erroneous definitions are those with conflicting expanded forms.
-
-Currently, summaries are not generated for individual files,
-but this is likely to change in the future.
-
-Below is a simple example showing the format when erroneous or ambiguous definitions are found.
-
-````{admonition} Sample input for *summarize_definitions* operation documenting ambiguous/erroneous definitions.
-:class: tip
-```text
-((Def-expand/Initialize-recording,(Recording)),Onset)
-((Def-expand/Initialize-recording,(Recording, Event)),Onset)
-(Def-expand/Specify-age/1,(Age/1, Item-count/1))
-```
-````
-
-````{admonition} Sample *summarize_definitions* operation error results in text format.
-:class: tip
-```text
-Summary name: HED_column_definition_summary
-Summary type: definitions
-Summary filename: HED_column_definition_summary
-
-Overall summary:
-    Known Definitions: 1 items
-        initialize-recording: 2 items
-            description:
-            contents: (Recording)
-    Ambiguous Definitions: 1 items
-        specify-age/#: (Age/#,Item-count/#)
-    Errors: 1 items
-        initialize-recording:
-            (Event,Recording)
-```
-````
-
-It is assumed that the first definition encountered is the correct one, unless the first one is ambiguous.
-Thus, the operation finds (`Def-expand/Initialize-recording`,(`Recording`)) and considers it valid, before encountering
-(`Def-expand/Initialize-recording`,(`Recording`, `Event`)), which is then deemed an error.
-
-
-(summarize-hed-tags-anchor)=
-### Summarize HED tags
-
-The *summarize_hed_tags* operation extracts a summary of the HED tags present
-in the annotations of a dataset.
-This summary operation assumes that the data in question is suitably
-annotated with HED (Hierarchical Event Descriptors).
-You must provide a HED schema version.
-If the data has annotations in a JSON sidecar, you must also provide its path.
-
-(summarize-hed-tags-parameters-anchor)=
-#### Summarize HED tags parameters
-
-The *summarize_hed_tags* operation has one required parameter (*tags*)
-in addition to the standard *summary_name* and *summary_filename* parameters.
-
-```{admonition} Parameters for the *summarize_hed_tags* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *summary_name* | str | A unique name used to identify this summary.|
-| *summary_filename* | str | A unique file basename to use for saving this summary. |
-| *tags* | dict | Dictionary with category title keys and tags in that category as values. |
-| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to filename. |
-| *include_context* | bool | (**Optional**: Default true) If true, expand the event context to account for onsets and offsets. |
-| *remove_types* | list | (**Optional**) A list of types such as `Condition-variable` and `Task` to remove. |
-| *replace_defs* | bool | (**Optional**: Default true) If true, the `Def` tags are replaced with the contents of the definition (no `Def` or `Def-expand`). |
-| *word_cloud* | dict | (**Optional**) If present, the operation produces a word cloud image in addition to the summaries. |
-```
-
-The *tags* dictionary has keys that specify how the user wishes the tags
-to be categorized for display.
-Note that these keys are titles designating display categories, not HED tags.
-
-The *tags* dictionary values are lists of actual HED tags (or their children)
-that should be listed under the respective display categories.
-
-If the optional parameter *include_context* is true, the counts include tags contributing
-to the event context in events intermediate between onsets and offsets.
-
-If the optional parameter *replace_defs* is true, the tag counts include
-tags contributed by the contents of the definitions.
-
-If the *word_cloud* parameter is provided but its value is empty, the default word cloud settings are used.
-The following table lists the optional parameters used to control the appearance of the word cloud image.
-
-```{admonition} Optional keys in the word cloud dictionary value.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *background_color* | str | The matplotlib name of the background color (default "black").|
-| *contour_color* | str | The matplotlib name of the contour color if mask provided. |
-| *contour_width* | float | Width of contour if mask provided (default 3). |
-| *font_path* | str | The path of the system font to use in place of the default font. |
-| *height* | int | Height in pixels of the image (default 300).|
-| *mask_path* | str | The path of the mask image to use if *use_mask* is true and an image other than the brain is needed. |
-| *max_font_size* | float | The maximum font size to use in the image (default 15). |
-| *min_font_size* | float | The minimum font size to use in the image (default 8).|
-| *prefer_horizontal* | float | Fraction of horizontal words in image (default 0.75). |
-| *scale_adjustment* | float | Constant to add to log10 count transformation (default 7). |
-| *use_mask* | bool | If true, a mask image is used to provide a contour around the words. |
-| *width* | int | Width in pixels of image (default 400). |
-```
-
-A sketch of an operation that uses *word_cloud* appears at the end of this section.
-
-(summarize-hed-tags-example-anchor)=
-#### Summarize HED tags example
-
-The following remodeling command specifies that the tag counts should be grouped
-under the titles: *Sensory events*, *Agent actions*, and *Objects*.
-Any leftover tags will appear under the title "Other tags".
-
-````{admonition} A JSON file with a single *summarize_hed_tags* summarization operation.
-:class: tip
-```json
-[{
-    "operation": "summarize_hed_tags",
-    "description": "Summarize the HED tags in the dataset.",
-    "parameters": {
-        "summary_name": "summarize_hed_tags",
-        "summary_filename": "summarize_hed_tags",
-        "tags": {
-            "Sensory events": ["Sensory-event", "Sensory-presentation",
-                               "Task-stimulus-role", "Experimental-stimulus"],
-            "Agent actions": ["Agent-action", "Agent", "Action", "Agent-task-role",
-                              "Task-action-type", "Participant-response"],
-            "Objects": ["Item"]
-        }
-    }
-}]
-```
-````
-
-The results of executing this operation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor) are shown below.
-
-````{admonition} Text summary of *summarize_hed_tags* operation on the sample remodel file.
-:class: tip
-
-```text
-Summary name: summarize_hed_tags
-Summary type: hed_tag_summary
-Summary filename: summarize_hed_tags
-
-Overall summary:
-Dataset: Total events=6 Total files=1
- Main tags[events,files]:
- Sensory events:
- Sensory-presentation[6,1] Visual-presentation[6,1] Auditory-presentation[3,1]
- Agent actions:
- Incorrect-action[2,1] Correct-action[1,1]
- Objects:
- Image[6,1]
- Other tags[events,files]:
- Label[6,1] Def[6,1] Delay[3,1]
-
-Individual files:
-
-aomic_sub-0013_excerpt_events.tsv:
-Total events=6
- Main tags[events,files]:
- Sensory events:
- Sensory-presentation[6,1] Visual-presentation[6,1] Auditory-presentation[3,1]
- Agent actions:
- Incorrect-action[2,1] Correct-action[1,1]
- Objects:
- Image[6,1]
- Other tags[events,files]:
- Label[6,1] Def[6,1] Delay[3,1]
-
-```
-````
-
-Because the HED tag *Task-action-type* was specified in the "Agent actions" category,
-*Incorrect-action* and *Correct-action*, which are children of *Task-action-type*
-in the [**HED schema**](https://www.hedtags.org/display_hed.html),
-appear with counts in the list under this category.
-
-The sample events file had 6 events, including 1 correct action and 2 incorrect actions.
-Since only one file was processed, the information for *Dataset* was
-similar to that presented under *Individual files*.
-
-For a more extensive example, see the
-[**text**](./_static/data/summaries/FacePerception_hed_tag_summary.txt)
-and [**JSON**](./_static/data/summaries/FacePerception_hed_tag_summary.json)
-format summaries of the sample dataset
-[**ds003645s_hed**](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003645s_hed)
-using the [**summarize_hed_tags_rmdl.json**](./_static/data/summaries/summarize_hed_tags_rmdl.json)
-remodeling file.
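-
-Since no example above uses the *word_cloud* parameter, the following sketch shows how
-a *summarize_hed_tags* operation might enable word cloud output.
-The operation is written as a Python dictionary (it would appear as JSON in a remodel file),
-and the particular option values are illustrative assumptions, not recommendations.
-
-````{admonition} A sketch of a *summarize_hed_tags* operation with word cloud output.
-:class: tip
-```python
-# Sketch: enable word cloud output for a summarize_hed_tags operation.
-# The word_cloud keys come from the table above; the values shown here
-# are illustrative only.
-word_cloud_op = {
-    "operation": "summarize_hed_tags",
-    "description": "Summarize HED tags and produce a word cloud image.",
-    "parameters": {
-        "summary_name": "hed_tag_word_cloud",
-        "summary_filename": "hed_tag_word_cloud",
-        "tags": {"Sensory events": ["Sensory-event"]},
-        "word_cloud": {
-            "background_color": "white",  # matplotlib color name
-            "height": 300,                # image height in pixels
-            "width": 400,                 # image width in pixels
-            "prefer_horizontal": 0.75     # fraction of horizontal words
-        }
-    }
-}
-```
-````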
-
-(summarize-hed-type-anchor)=
-### Summarize HED type
-
-The *summarize_hed_type* operation is designed to extract experimental design matrices or other
-experimental structure.
-This summary operation assumes that the structure in question is suitably
-annotated with HED (Hierarchical Event Descriptors).
-The [**HED conditions and design matrices**](https://hed-examples.readthedocs.io/en/latest/HedConditionsAndDesignMatrices.html)
-tutorial explains how this works.
-
-(summarize-hed-type-parameters-anchor)=
-#### Summarize HED type parameters
-
-The *summarize_hed_type* operation provides detailed information about a specified tag,
-usually `Condition-variable` or `Task`.
-This summary provides useful information about experimental design.
-
-```{admonition} Parameters for the *summarize_hed_type* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *summary_name* | str | A unique name used to identify this summary.|
-| *summary_filename* | str | A unique file basename to use for saving this summary. |
-| *type_tag* | str | Tag to produce a summary for (most often *condition-variable*).|
-| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to filename.|
-```
-In addition to the two standard parameters (*summary_name* and *summary_filename*),
-the *type_tag* parameter is required.
-Only one tag can be given, so you must provide a separate operation in the remodel file
-for each type tag.
-
-(summarize-hed-type-example-anchor)=
-#### Summarize HED type example
-
-````{admonition} A JSON file with a single *summarize_hed_type* summarization operation.
-:class: tip
-```json
-[{
-    "operation": "summarize_hed_type",
-    "description": "Summarize the condition variables.",
-    "parameters": {
-        "summary_name": "AOMIC_condition_variables",
-        "summary_filename": "AOMIC_condition_variables",
-        "type_tag": "condition-variable"
-    }
-}]
-```
-````
-
-The results of executing this operation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor) are shown below.
-
-````{admonition} Text summary of *summarize_hed_type* operation on the sample remodel file.
-:class: tip
-
-```text
-Summary name: AOMIC_condition_variables
-Summary type: hed_type_summary
-Summary filename: AOMIC_condition_variables
-
-Overall summary:
-
-Dataset: Type=condition-variable Type values=1 Total events=6 Total files=1
- image-sex: 2 levels in 6 event(s) out of 6 total events in 1 file(s)
- female-image-cond [4,1]: ['Female', 'Image', 'Face']
- male-image-cond [2,1]: ['Male', 'Image', 'Face']
-
-Individual files:
-
-aomic_sub-0013_excerpt_events.tsv:
-Type=condition-variable Total events=6
- image-sex: 2 levels in 6 events
- female-image-cond [4 events, 1 files]:
- Tags: ['Female', 'Image', 'Face']
- male-image-cond [2 events, 1 files]:
- Tags: ['Male', 'Image', 'Face']
-```
-````
-
-Because *summarize_hed_type* is a HED operation,
-a HED schema version is required and a JSON sidecar is also usually needed.
-This summary was produced by using `hed_version="8.1.0"` when creating the `dispatcher`
-and using the [**sample remodel sidecar file**](sample-remodel-sidecar-file-anchor) in the `do_op`.
-The sidecar provides the annotations that use the `condition-variable` tag in the summary.
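-
-The following minimal sketch shows how such a summary might be produced programmatically.
-The `Dispatcher` constructor arguments, the `run_operations` signature, and the file paths
-are assumptions for illustration; consult the hedtools API documentation for your installed version.
-
-````{admonition} A minimal Python sketch producing a HED type summary with the Dispatcher.
-:class: tip
-```python
-# Sketch only: constructor arguments, method signatures, and paths are
-# assumptions for illustration -- check the hedtools API for details.
-from hed.tools.remodeling.dispatcher import Dispatcher
-
-op_list = [{
-    "operation": "summarize_hed_type",
-    "description": "Summarize the condition variables.",
-    "parameters": {
-        "summary_name": "AOMIC_condition_variables",
-        "summary_filename": "AOMIC_condition_variables",
-        "type_tag": "condition-variable"
-    }
-}]
-
-# Create a dispatcher with a HED schema version and run on one file,
-# passing the JSON sidecar that holds the HED annotations.
-dispatcher = Dispatcher(op_list, hed_versions=['8.1.0'])
-dispatcher.run_operations('aomic_sub-0013_excerpt_events.tsv',
-                          sidecar='aomic_sub-0013_excerpt_events.json')
-
-# The accumulated summary object is kept in the dispatcher's summary_dict.
-summary = dispatcher.summary_dict['AOMIC_condition_variables']
-details = summary.get_summary_details()
-```
-````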
-
-For a more extensive example, see the
-[**text**](./_static/data/summaries/FacePerception_hed_type_summary.txt)
-and [**JSON**](./_static/data/summaries/FacePerception_hed_type_summary.json)
-format summaries of the sample dataset
-[**ds003645s_hed**](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003645s_hed)
-using the [**summarize_hed_types_rmdl.json**](./_static/data/summaries/summarize_hed_types_rmdl.json)
-remodeling file.
-
-(summarize-hed-validation-anchor)=
-### Summarize HED validation
-
-The *summarize_hed_validation* operation runs the HED validator on the requested data
-and produces a summary of the errors.
-See the [**HED validation guide**](./HedValidationGuide.md) for available methods of
-running the HED validator.
-
-
-(summarize-hed-validation-parameters-anchor)=
-#### Summarize HED validation parameters
-
-In addition to the required *summary_name* and *summary_filename* parameters,
-the *summarize_hed_validation* operation has an optional boolean parameter *check_for_warnings*.
-If *check_for_warnings* is false (the default), the summary does not report warnings.
-
-```{admonition} Parameters for the *summarize_hed_validation* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *summary_name* | str | A unique name used to identify this summary.|
-| *summary_filename* | str | A unique file basename to use for saving this summary. |
-| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to filename. |
-| *check_for_warnings* | bool | (**Optional**: Default false) If true, warnings are reported in addition to errors. |
-```
-Since *summarize_hed_validation* is a HED operation, the calling program must provide a HED schema version
-and usually a JSON sidecar containing the HED annotations.
-
-The validation process takes place in two stages: the JSON sidecar is validated first.
-This strategy is used because a single error in the JSON sidecar can generate an error message
-for every line in the corresponding data file.
-
-If the JSON sidecar has errors (warnings don't count), the validation process is terminated
-without validation of the data file and its assembled HED annotations.
-
-If the JSON sidecar does not have errors,
-the validator assembles the annotations for each line in the data files and validates
-the assembled HED annotation.
-Consistency across the data file, such as matched onsets and offsets, is also checked.
-
-
-(summarize-hed-validation-example-anchor)=
-#### Summarize HED validation example
-
-````{admonition} A JSON file with a single *summarize_hed_validation* summarization operation.
-:class: tip
-```json
-[{
-    "operation": "summarize_hed_validation",
-    "description": "Summarize validation errors in the sample dataset.",
-    "parameters": {
-        "summary_name": "AOMIC_sample_validation",
-        "summary_filename": "AOMIC_sample_validation",
-        "check_for_warnings": true
-    }
-}]
-```
-````
-
-To demonstrate the output of the validation operation, we modified the first row of the
-[**sample remodel event file**](sample-remodel-event-file-anchor)
-so that the `trial_type` column contained the value `baloney` rather than `go`.
-This modification generates a warning because the meaning of `baloney` is not defined
-in the [**sample remodel sidecar file**](sample-remodel-sidecar-file-anchor).
-The results of executing the example operation with the modified file are shown
-in the following example.
-
-
-````{admonition} Text summary of *summarize_hed_validation* operation on a modified sample data file.
-:class: tip
-
-```text
-Summary name: AOMIC_sample_validation
-Summary type: hed_validation
-Summary filename: AOMIC_sample_validation
-
-Summary details:
-
-Dataset: [1 sidecar files, 1 event files]
- task-stopsignal_acq-seq_events.json: 0 issues
- sub-0013_task-stopsignal_acq-seq_events.tsv: 6 issues
-
-Individual files:
-
- sub-0013_task-stopsignal_acq-seq_events.tsv: 1 sidecar files
- task-stopsignal_acq-seq_events.json has no issues
- sub-0013_task-stopsignal_acq-seq_events.tsv issues:
- HED_UNKNOWN_COLUMN: WARNING: Column named 'onset' found in file, but not specified as a tag column or identified in sidecars.
- HED_UNKNOWN_COLUMN: WARNING: Column named 'duration' found in file, but not specified as a tag column or identified in sidecars.
- HED_UNKNOWN_COLUMN: WARNING: Column named 'response_time' found in file, but not specified as a tag column or identified in sidecars.
- HED_UNKNOWN_COLUMN: WARNING: Column named 'response_accuracy' found in file, but not specified as a tag column or identified in sidecars.
- HED_UNKNOWN_COLUMN: WARNING: Column named 'response_hand' found in file, but not specified as a tag column or identified in sidecars.
- HED_SIDECAR_KEY_MISSING[row=0,column=2]: WARNING: Category key 'baloney' does not exist in column. Valid keys are: ['succesful_stop', 'unsuccesful_stop', 'go']
-
-```
-````
-
-This summary was produced using HED schema version `hed_version="8.1.0"` when creating the `dispatcher`
-and using the [**sample remodel sidecar file**](sample-remodel-sidecar-file-anchor) in the `do_op`.
-
-
-(summarize-sidecar-from-events-anchor)=
-### Summarize sidecar from events
-
-The *summarize_sidecar_from_events* operation generates a sidecar template from the event
-files in the dataset.
-
-
-(summarize-sidecar-from-events-parameters-anchor)=
-#### Summarize sidecar from events parameters
-
-The following table lists the parameters for this operation.
-
-```{admonition} Parameters for the *summarize_sidecar_from_events* operation.
-:class: tip
-
-| Parameter | Type | Description |
-| ------------ | ---- | ----------- |
-| *summary_name* | str | A unique name used to identify this summary.|
-| *summary_filename* | str | A unique file basename to use for saving this summary. |
-| *skip_columns* | list | A list of column names to omit from the sidecar.|
-| *value_columns* | list | A list of columns to treat as value columns in the sidecar. |
-| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to the filename. |
-```
-The standard summary parameters, *summary_name* and *summary_filename*, are required.
-The *summary_name* is the unique key used to identify the
-particular incarnation of this summary in the dispatcher.
-Since a particular operation file may use a given operation multiple times,
-care should be taken to make sure that each summary name is unique.
-
-The *summary_filename* should also be unique and is used for saving the summary upon request.
-When the remodeler is applied to full datasets rather than single files,
-the summaries are saved in the `derivatives/remodel/summaries` directory under the dataset root.
-A time stamp and file extension are appended to the *summary_filename* when the
-summary is saved.
-
-In addition to the standard parameters, *summary_name* and *summary_filename*, required of all summaries,
-the *summarize_sidecar_from_events* operation requires two additional lists to be supplied.
-The *skip_columns* list specifies the names of columns to skip entirely in
-generating the sidecar template.
-The *value_columns* list specifies the names of columns to treat as value columns
-when generating the sidecar template.
-
-(summarize-sidecar-from-events-example-anchor)=
-#### Summarize sidecar from events example
-
-The following example shows the JSON for including this operation in a remodeling file.
-
-````{admonition} A JSON file with a single *summarize_sidecar_from_events* summarization operation.
-:class: tip
-```json
-[{
-    "operation": "summarize_sidecar_from_events",
-    "description": "Generate a sidecar from the excerpted events file.",
-    "parameters": {
-        "summary_name": "AOMIC_generate_sidecar",
-        "summary_filename": "AOMIC_generate_sidecar",
-        "skip_columns": ["onset", "duration"],
-        "value_columns": ["response_time", "stop_signal_delay"]
-    }
-}]
-```
-````
-
-The results of executing this operation on the
-[**sample remodel event file**](sample-remodel-event-file-anchor)
-are shown in the following example using the text format.
-
-````{admonition} Sample *summarize_sidecar_from_events* operation results in text format.
-:class: tip
-```text
-Summary name: AOMIC_generate_sidecar
-Summary type: events_to_sidecar
-Summary filename: AOMIC_generate_sidecar
-
-Dataset: Currently no overall sidecar extraction is available
-
-Individual files:
-
-aomic_sub-0013_excerpt_events.tsv: Total events=6 Skip columns: ['onset', 'duration']
-Sidecar:
-{
-    "trial_type": {
-        "Description": "Description for trial_type",
-        "HED": {
-            "go": "(Label/trial_type, Label/go)",
-            "succesful_stop": "(Label/trial_type, Label/succesful_stop)",
-            "unsuccesful_stop": "(Label/trial_type, Label/unsuccesful_stop)"
-        },
-        "Levels": {
-            "go": "Here describe column value go of column trial_type",
-            "succesful_stop": "Here describe column value succesful_stop of column trial_type",
-            "unsuccesful_stop": "Here describe column value unsuccesful_stop of column trial_type"
-        }
-    },
-    "response_accuracy": {
-        "Description": "Description for response_accuracy",
-        "HED": {
-            "correct": "(Label/response_accuracy, Label/correct)"
-        },
-        "Levels": {
-            "correct": "Here describe column value correct of column response_accuracy"
-        }
-    },
-    "response_hand": {
-        "Description": "Description for response_hand",
-        "HED": {
-            "left": "(Label/response_hand, Label/left)",
-            "right": "(Label/response_hand, Label/right)"
-        },
-        "Levels": {
-            "left": "Here describe column value left of column response_hand",
-            "right": "Here describe column value right of column response_hand"
-        }
-    },
-    "sex": {
-        "Description": "Description for sex",
-        "HED": {
-            "female": "(Label/sex, Label/female)",
-            "male": "(Label/sex, Label/male)"
-        },
-        "Levels": {
-            "female": "Here describe column value female of column sex",
-            "male": "Here describe column value male of column sex"
-        }
-    },
-    "response_time": {
-        "Description": "Description for response_time",
-        "HED": "(Label/response_time, Label/#)"
-    },
-    "stop_signal_delay": {
-        "Description": "Description for stop_signal_delay",
-        "HED": "(Label/stop_signal_delay, Label/#)"
-    }
-}
-```
-````
-
-(remodel-implementation-anchor)=
-## Remodel implementation
-
-Operations are defined as classes that extend `BaseOp` regardless of whether
-they are transformations or summaries. However, summaries must also implement
-an additional supporting class that extends `BaseSummary` to hold the summary information.
-
-In order to be executed by the remodeling functions,
-an operation must appear in the `valid_operations` dictionary.
-
-Each operation class must have a `NAME` class variable specifying the operation name (a string) and a
-`PARAMS` class variable containing a dictionary of the operation's parameters represented as a JSON schema.
-The operation's constructor extends the `BaseOp` class constructor by calling:
-
-````{admonition} A remodel operation class must call the BaseOp constructor first.
-:class: tip
-```python
-    super().__init__(parameters)
-```
-````
-
-A remodel operation class must implement the `BaseOp` abstract methods `do_op` and `validate_input_data`.
-
-### The PARAMS dictionary
-
-The class-wide `PARAMS` dictionary specifies the required and optional parameters of the operation as a [**JSON schema**](https://json-schema.org/).
-We currently use draft-2020-12.
-The basic vocabulary allows specifying the type of parameters that are expected and
-whether a parameter is required or optional.
-
-It is also possible to add dependencies between parameters, as sketched below.
-More information can be found in the JSON schema
-[**documentation**](https://json-schema.org/learn/getting-started-step-by-step).
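-
-As an illustration of how such a dependency might look in a `PARAMS` class variable,
-the following hypothetical fragment uses `dependentRequired` to state that *factor_names*
-may only appear when *factor_values* is present (the *factor_column* dependency discussed
-in the validator section below). The exact schema used by hedtools may differ.
-
-````{admonition} A sketch of a PARAMS dictionary with a dependency between parameters.
-:class: tip
-```python
-# Hypothetical PARAMS fragment: factor_names is only allowed if
-# factor_values is also present (cf. the factor_column operation).
-PARAMS = {
-    "type": "object",
-    "properties": {
-        "column_name": {"type": "string"},
-        "factor_values": {"type": "array", "items": {"type": "string"}},
-        "factor_names": {"type": "array", "items": {"type": "string"}}
-    },
-    "required": ["column_name"],
-    "dependentRequired": {"factor_names": ["factor_values"]},
-    "additionalProperties": False
-}
-```
-````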
-
-At the highest level, the type should always be specified as an object, since the parameters are always provided as a dictionary (JSON object).
-Under the `properties` key, the expected parameters should be listed, along with the datatype expected for each parameter.
-The specifications can be nested. For example, the `rename_columns` operation requires a parameter `column_mapping`,
-which should be a JSON object whose keys are arbitrary strings and whose values are also strings.
-This is represented in the following way:
-
-```json
-{
-    "type": "object",
-    "properties": {
-        "column_mapping": {
-            "type": "object",
-            "patternProperties": {
-                ".*": {
-                    "type": "string"
-                }
-            },
-            "minProperties": 1
-        },
-        "ignore_missing": {
-            "type": "boolean"
-        }
-    },
-    "required": [
-        "column_mapping",
-        "ignore_missing"
-    ],
-    "additionalProperties": false
-}
-```
-
-The `PARAMS` dictionaries for all available operations are read by the `validator` and compiled into a
-single JSON schema which represents the specification for remodeler files.
-The `properties` dictionary explicitly specifies the parameters that are allowed for this operation.
-The `required` list specifies which parameters must be included when calling the operation.
-Parameters that are not required may be omitted in the operation call.
-Setting `"additionalProperties": false` is the way JSON schema indicates that
-no other parameters are allowed in the call to the operation.
-
-A limitation of JSON schema is that, although it can handle specific dependencies between keys in the data,
-it cannot validate one value provided in the JSON file against another value in the same file.
-For example, if the requirement is a list of elements whose length should be specified by another parameter,
-JSON schema does not provide a vocabulary for setting this dependency.
-Instead, we handle these types of dependencies in the `validate_input_data` method.
-
-(operation-class-constructor-anchor)=
-### Operation class constructor
-
-All the operation classes have constructors that start with a call to the superclass (`BaseOp`) constructor.
-The following example shows the constructor for the `RemoveColumnsOp` class.
-
-````{admonition} The constructor for the RemoveColumnsOp class.
-:class: tip
-```python
-    def __init__(self, parameters):
-        super().__init__(parameters)
-        self.column_names = parameters['column_names']
-        ignore_missing = parameters['ignore_missing']
-        if ignore_missing:
-            self.error_handling = 'ignore'
-        else:
-            self.error_handling = 'raise'
-```
-````
-
-After the call to the base class constructor, the operation constructor assigns the operation-specific
-values to class properties. Validation takes place before the operation classes are initialized.
-
-
-(the-do_op-implementation-anchor)=
-### The do_op implementation
-The remodeling script is meant to be executed by the `Dispatcher`,
-which keeps a compiled version of the remodeling script to execute on each tabular file to be remodeled.
-
-The main method that must be implemented by each operation is `do_op`, which takes
-an instance of the `Dispatcher` class as the first parameter and a Pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)
-representing the tabular file as the second parameter.
-A third required parameter is a name used to identify the tabular file in error messages and summaries.
-This name is usually the filename or the filepath from the dataset root.
-An additional optional argument, a sidecar containing HED annotations,
-need only be included for HED operations.
-Note that the `Dispatcher` is responsible for holding the appropriate version of the HED schema if
-HED remodeling operations are included.
-
-The following example shows a sample implementation of `do_op`.
-
-````{admonition} The implementation of do_op for the RemoveColumnsOp class.
-:class: tip
-```python
-    def do_op(self, dispatcher, df, name, sidecar=None):
-        return df.drop(self.column_names, axis=1, errors=self.error_handling)
-```
-````
-
-The `do_op` in this case is a wrapper for the underlying Pandas `DataFrame`
-operation for removing columns.
-
-**IMPORTANT NOTE**: The `do_op` operation always assumes that `n/a` values have been
-replaced by `numpy.NaN` values in the incoming DataFrame `df`.
-The `Dispatcher` class has a static method `prep_data` that does this replacement.
-At the end of running all the remodeling operations on a data file, the `Dispatcher`
-method `run_operations` replaces all of the `numpy.NaN` values with `n/a`, the value expected by BIDS.
-This replacement is performed by the `Dispatcher` static method `post_proc_data`.
-
-
-### The validate_input_data implementation
-
-This method exists to handle additional input data validation that cannot be specified in JSON schema.
-It is a static method called by the `validator`.
-If there is no additional validation to be done,
-a minimal implementation of this method should take in a dictionary with the operation parameters and return an empty list.
-In case additional validation is required, the method should directly implement the validation and return a list of user-friendly
-error messages (strings) if validation fails, or an empty list if there are no errors.
-
-The following implementation of the `validate_input_data` method for the `factor_hed_tags` operation
-checks whether the parameter `query_names` has the same length as the input for the parameter `queries`,
-since the names specified in the first parameter are meant to label the queries provided in the second.
-The check only takes place if `query_names` exists, since naming is handled automatically otherwise.
-
-```python
-@staticmethod
-def validate_input_data(parameters):
-    errors = []
-    if parameters.get("query_names", False):
-        if len(parameters.get("query_names")) != len(parameters.get("queries")):
-            errors.append("The list in query_names, in the factor_hed_tags operation, should have the same number of items as queries.")
-    return errors
-```
-
-
-(the-do_op-for-summarization-anchor)=
-### The do_op for summarization
-
-The `do_op` operation for summarization operations has a slightly different form,
-as it serves primarily as a wrapper for the actual summary information as illustrated
-by the following example.
-
-(implementation-of-do-op_summarize-column-names-anchor)=
-````{admonition} The implementation of do_op for SummarizeColumnNamesOp.
-:class: tip
-```python
-    def do_op(self, dispatcher, df, name, sidecar=None):
-        summary = dispatcher.summary_dict.get(self.summary_name, None)
-        if not summary:
-            summary = ColumnNameSummary(self)
-            dispatcher.summary_dict[self.summary_name] = summary
-        summary.update_summary({"name": name, "column_names": list(df.columns)})
-        return df
-```
-````
-
-A `do_op` operation for a summarization checks the `dispatcher` to see if the
-summary name is already in the dispatcher's `summary_dict`.
-If that summary is not yet in the `summary_dict`,
-the operation creates a `BaseSummary` object for its summary (e.g., `ColumnNameSummary`)
-and adds this object to the dispatcher's `summary_dict`;
-otherwise, the operation fetches the existing `BaseSummary` object from the dispatcher's `summary_dict`.
-It then asks this `BaseSummary` object to update the summary based on the `DataFrame`
-as explained in the next section.
-
-(additional-requirements-for-summarization-anchor)=
-### Additional requirements for summarization
-
-Any summary operation must implement a supporting class that extends `BaseSummary`.
-This class is used to hold and accumulate the information specific to the summary.
-This support class must implement two methods: `update_summary` and `get_summary_details`.
-
-The `update_summary` method is called by its associated `BaseOp` operation during the `do_op`
-to update the summary information based on the current `DataFrame`.
-The `update_summary` method takes a single parameter, which is a dictionary of information
-specific to this operation.
-
-````{admonition} The update_summary method required to be implemented by all BaseSummary objects.
-:class: tip
-```python
-    def update_summary(self, summary_dict)
-```
-````
-
-In the example [do_op for SummarizeColumnNamesOp](implementation-of-do-op_summarize-column-names-anchor),
-the dictionary contains keys for `name` and `column_names`.
-
-The `get_summary_details` method returns a dictionary with the summary-specific information
-currently in the summary.
-The `BaseSummary` class provides universal methods for converting this summary to JSON or text format.
-
-
-````{admonition} The get_summary_details method required to be implemented by all BaseSummary objects.
-:class: tip
-```python
-    get_summary_details(self, verbose=True)
-```
-````
-The `BaseSummary` object associated with a summarization operation uses this method
-to produce the summary in each of the supported output formats.
-
-### Validator implementation
-
-The required input for the remodeler is specified in JSON format and must follow
-the rules laid out by the JSON schema.
-The parameters in the remodeler file must conform to the properties specified
-in the corresponding JSON schema associated with each operation.
-Errors retrieved from the underlying validator are not passed on directly but are instead
-modified for display as user-friendly error messages.
-
-Validation errors are organized by stages as follows.
-
-#### Stage 0: top-level structure
-
-Stage 0 refers to the top-level structure of the remodel JSON file.
-As specified by the validator's `BASE_ARRAY`,
-a JSON remodel file must be an array of operation dictionaries containing at least 1 item.
-
-#### Stage 1: operation dictionary format
-
-Stage 1 validation refers to the structure of the individual operations as specified by the validator's `OPERATION_DICT`.
-Every operation dictionary should have exactly the keys: `operation`, `description`, and `parameters`.
-
-#### Stage 2: operation dictionary values
-
-Stage 2 validates the values associated with the keys in each operation dictionary.
-The `operation` and `description` keys should have string values,
-while the `parameters` key should have a dictionary value.
-
-Stage 2 validation also verifies that the operation value is one of the valid operations as
-enumerated in the `valid_operations` dictionary.
-
-Several checks are also applied to the `parameters` dictionary.
-The properties listed as `required` in the schema must appear as keys in the `parameters` dictionary.
-
-If additional properties are not allowed, as designated by `"additionalProperties": false` in the JSON schema,
-the validator verifies that parameters not mentioned in the schema do not appear.
-Note this is currently true for all operations and recommended for new operations.
-
-If the schema for the operation has a `dependentRequired` dictionary, the validator
-verifies that the indicated keys are present if the listed parameter values are also present.
-For example, the `factor_column` operation only allows the `factor_names` parameter if the `factor_values`
-parameter is also present. In this case the dependency works only one way, so that `factor_values`
-can be provided without `factor_names`. If `factor_values` is provided alone, the operation automatically generates the
-factor names based on the column values; however, without `factor_values` the names provided
-in `factor_names` do not correspond to anything, so `factor_names` cannot appear on its own.
-
-#### Later validation stages
-
-Later stages in validation concern the values given within the parameter object, which can be nested to an arbitrary level
-and are handled in a general way.
-The user is provided with the operation index, name, and the 'path' of the value that is invalid.
-Note that while `parameters` always contains an object, the values in `parameters` can be of any type.
-Thus, parameter values can be objects whose values might also be expected to be objects, arrays, or arrays of objects.
-The validator has appropriate messages for many of the conditions that can be set with JSON schema,
-but if a new operation parameter has a condition that has not been used yet, a new error message will need to be added to the validator.
-
-
-When validation against the JSON schema passes,
-the validator performs additional data-specific validation by calling `validate_input_data`
-for each operation to verify that the input data satisfies the
-constraints that fall outside the scope of JSON schema.
-Also see [**The validate_input_data implementation**](#the-validate_input_data-implementation) and
-[**The PARAMS dictionary**](#the-params-dictionary) sections for additional information.
+(hed-remodeling-tools-anchor)=
+# HED remodeling tools
+
+**Remodeling** refers to the process of transforming a tabular file
+into a different form in order to disambiguate the
+information or to facilitate a particular analysis.
+The remodeling operations are specified in a JSON (`.json`) file,
+giving a record of the transformations performed.
+
+There are two types of remodeling operations: **transformation** and **summarization**.
+The **transformation** operations modify the tabular files,
+while **summarization** produces an auxiliary information file but leaves
+the tabular files unchanged.
+
+The file remodeling tools can be applied to any tab-separated value (`.tsv`) file
+but are particularly useful for restructuring files representing experimental events.
+Please read the [**HED remodeling quickstart**](./HedRemodelingQuickstart.md)
+tutorial for an introduction and basic use of the tools.
+
+The file remodeling tools can be applied to individual files using the
+[**HED online tools**](https://hedtools.ucsd.edu/hed) or to entire datasets
+using the [**remodel command-line interface**](remodel-command-line-interface-anchor)
+either by calling Python scripts directly from the command line
+or by embedding calls in a Jupyter notebook.
+The tools are also available as
+[**HED RESTful services**](./HedOnlineTools.md#hed-restful-services).
+The online tools are particularly useful for debugging.
+
+This user's guide contains the following topics:
+
+* [**Overview of remodeling**](overview-of-remodeling-anchor)
+* [**Installing the remodel tools**](installing-the-remodel-tools-anchor)
+* [**Remodel command-line interface**](remodel-command-line-interface-anchor)
+* [**Remodel scripts**](remodel-scripts-anchor)
+  * [**Backing up files**](backing-up-files-anchor)
+  * [**Remodeling files**](remodeling-files-anchor)
+  * [**Restoring files**](restoring-files-anchor)
+* [**Remodel with HED**](remodel-with-hed-anchor)
+* [**Remodel sample files**](remodel-sample-files-anchor)
+  * [**Sample remodel file**](sample-remodel-file-anchor)
+  * [**Sample remodel event file**](sample-remodel-event-file-anchor)
+  * [**Sample remodel sidecar file**](sample-remodel-sidecar-file-anchor)
+* [**Remodel transformations**](remodel-transformations-anchor)
+  * [**Factor column**](factor-column-anchor)
+  * [**Factor HED tags**](factor-hed-tags-anchor)
+  * [**Factor HED type**](factor-hed-type-anchor)
+  * [**Merge consecutive**](merge-consecutive-anchor)
+  * [**Remap columns**](remap-columns-anchor)
+  * [**Remove columns**](remove-columns-anchor)
+  * [**Remove rows**](remove-rows-anchor)
+  * [**Rename columns**](rename-columns-anchor)
+  * [**Reorder columns**](reorder-columns-anchor)
+  * [**Split rows**](split-rows-anchor)
+* [**Remodel summarizations**](remodel-summarizations-anchor)
+  * [**Summarize column names**](summarize-column-names-anchor)
+  * [**Summarize column values**](summarize-column-values-anchor)
+  * [**Summarize definitions**](summarize-definitions-anchor)
+  * [**Summarize HED tags**](summarize-hed-tags-anchor)
+  * [**Summarize HED type**](summarize-hed-type-anchor)
+  * [**Summarize HED validation**](summarize-hed-validation-anchor)
+  * [**Summarize sidecar from events**](summarize-sidecar-from-events-anchor)
+* [**Remodel implementation**](remodel-implementation-anchor)
+
+
+(overview-of-remodeling-anchor)=
+## Overview of remodeling
+
+Remodeling consists of restructuring and/or extracting information from tab-separated
+value files based on a specified list of operations contained in a JSON file.
+
+Internally, the remodeling operations represent the tabular file using a
+[**Pandas DataFrame**](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).
+
+(transformation-operations-anchor)=
+### Transformation operations
+
+**Transformation** operations, shown schematically in the following
+figure, are designed to transform an incoming tabular file
+into a new DataFrame without modifying the incoming data.
+
+![Transformation operations](./_static/images/TransformationOperations.png)
+
+Transformation operations are stateless and do not save any context information or
+affect future applications of the transformation.
+
+Transformations themselves do not produce any file output; they simply return a new,
+transformed DataFrame.
+In other words, transformations do not operate in place on the incoming DataFrame,
+but rather create a new DataFrame containing the result.
+
+Typically, the calling program is responsible for reading and saving the tabular file,
+so the user can choose whether to overwrite the original or create a new file.
+A minimal sketch of this pattern appears below.
+
+See the [**remodeling tool program interface**](remodel-command-line-interface-anchor)
+section for information on how to call the operations.
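+
+The following sketch illustrates this pattern by applying a single transformation to one
+file and saving the result. The `Dispatcher` constructor arguments, the `run_operations`
+signature, and the file path are assumptions for illustration, not a definitive API.
+
+````{admonition} A minimal Python sketch of applying a transformation and saving the result.
+:class: tip
+```python
+# Sketch only: check the hedtools API of your installed version for the
+# exact Dispatcher constructor and method signatures.
+from hed.tools.remodeling.dispatcher import Dispatcher
+
+op_list = [{
+    "operation": "remove_columns",
+    "description": "Remove a temporary column.",
+    "parameters": {"column_names": ["temp"], "ignore_missing": True}
+}]
+
+dispatcher = Dispatcher(op_list)
+new_df = dispatcher.run_operations('sub-001_task-test_events.tsv')  # returns a new DataFrame
+
+# The caller decides where to save the transformed result.
+new_df.to_csv('sub-001_task-test_events.tsv', sep='\t', index=False)
+```
+````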
+
+(summarization-operations-anchor)=
+### Summarization operations
+
+**Summarization** operations do not modify the input DataFrame but rather extract and save information in an internally stored summary dictionary as shown schematically in the following figure.
+
+![Summary operations](./_static/images/SummaryOperation.png)
+
+The dispatcher that executes remodeling operations can be interrogated at any time
+for the state information contained in the global summary dictionary,
+and the accumulated summary information can be saved at any point during execution.
+Usually summaries are dumped at the end of processing to the `derivatives/remodel/summaries`
+subdirectory under the dataset root.
+
+Summarization operations may appear anywhere in the operation list,
+and the same type of summary may appear multiple times under different names in order to track progress.
+
+The dispatcher stores information from each uniquely named summarization operation
+as a separate summary dictionary entry.
+Within its summary information, most summarization operations keep a separate
+summary for each individual file and have methods to create an overall summary
+of the information for all the files that have been processed by the summarization.
+
+Summarization results are available in JSON (`.json`) and text (`.txt`) formats.
+
+(available-operations-anchor)=
+### Available operations
+
+The following table lists the available remodeling operations with brief example use cases
+and links to further documentation. Operations not listed in the summarize section are transformations.
+
+(remodel-operation-summary-anchor)=
+````{table} Summary of the HED remodeling operations for tabular files.
+| Category | Operation | Example use case |
+| -------- | ------- | -----|
+| **clean-up** | | |
+| | [*remove_columns*](remove-columns-anchor) | Remove temporary columns created during restructuring. |
+| | [*remove_rows*](remove-rows-anchor) | Remove rows with n/a values in a specified column. |
+| | [*rename_columns*](rename-columns-anchor) | Make column names consistent across a dataset. |
+| | [*reorder_columns*](reorder-columns-anchor) | Make column order consistent across a dataset. |
+| **factor** | | |
+| | [*factor_column*](factor-column-anchor) | Extract factor vectors from a column of condition variables. |
+| | [*factor_hed_tags*](factor-hed-tags-anchor) | Extract factor vectors from search queries of HED annotations. |
+| | [*factor_hed_type*](factor-hed-type-anchor) | Extract design matrices and/or condition variables. |
+| **restructure** | | |
+| | [*merge_consecutive*](merge-consecutive-anchor) | Replace multiple consecutive events of the same type with one event of longer duration. |
+| | [*remap_columns*](remap-columns-anchor) | Create m columns from values in n columns (for recoding). |
+| | [*split_rows*](split-rows-anchor) | Split trial-encoded rows into multiple events. |
+| **summarize** | | |
+| | [*summarize_column_names*](summarize-column-names-anchor) | Summarize column names and order in the files. |
+| | [*summarize_column_values*](summarize-column-values-anchor) | Count the occurrences of the unique column values. |
+| | [*summarize_hed_tags*](summarize-hed-tags-anchor) | Summarize the HED tags present in the HED annotations for the dataset. |
+| | [*summarize_hed_type*](summarize-hed-type-anchor) | Summarize the detailed usage of a particular type tag such as *Condition-variable* or *Task* (used to automatically extract experimental designs). |
+| | [*summarize_hed_validation*](summarize-hed-validation-anchor) | Validate the data files and report any errors. |
+| | [*summarize_sidecar_from_events*](summarize-sidecar-from-events-anchor) | Generate a sidecar template from an event file. |
+````
+
+The **clean-up** operations are used at various phases of restructuring to assure consistency
+across dataset files.
+
+The **factor** operations produce column vectors with the same number of rows as the data file
+from which they were calculated.
+They encode condition variables, design matrices, or other search criteria.
+See the [**HED conditions and design matrices**](./HedConditionsAndDesignMatrices.md)
+tutorial for more information on factoring and analysis.
+
+The **restructure** operations modify the way in which a data file represents its information.
+
+The **summarize** operations produce dataset-wide summaries of various aspects of the data files
+as well as summaries of the individual files.
+
+(installing-the-remodel-tools-anchor)=
+## Installing the remodel tools
+
+The remodeling tools are available in the GitHub
+[**hed-python**](https://github.com/hed-standard/hed-python) repository
+along with other tools for data cleaning and curation.
+Although version 0.1.0 of this repository is available on [**PyPI**](https://pypi.org/)
+as `hedtools`, the version containing the restructuring tools (Version 0.2.0)
+is still under development and has not been officially released.
+However, the code is publicly available on the `develop` branch of the
+hed-python repository and
+can be directly installed from GitHub using `pip`:
+
+```text
+pip install git+https://github.com/hed-standard/hed-python/@develop
+```
+
+The web services and online tools supporting remodeling are available
+on the [**HED online tools dev server**](https://hedtools.ucsd.edu/hed_dev).
+When version 0.2.0 of `hedtools` is officially released on PyPI, restructuring
+will become available on the released [**HED online tools**](https://hedtools.ucsd.edu/hed).
+A docker version is also under development.
+
+The following diagram shows a schematic of the remodeling process.
+
+![Event remodeling process](./_static/images/EventRemappingProcess.png)
+
+Initially, the user creates a backup of the specified tabular files (usually `events.tsv` files).
+This backup is a mirror of the data files in the dataset,
+but is located in the `derivatives/remodel/backups` directory and never modified once the backup is created.
+
+Remodeling applies a sequence of operations specified in a JSON remodel file
+to the backup versions of the data files.
+The JSON remodel file provides a record of the operations performed on the file.
+If the user detects a mistake in the transformations,
+he/she can correct the transformation file and rerun the transformations.
+
+Remodeling always runs on the original backup version of the file rather than
+the transformed version, so the transformations can always be corrected and rerun.
+It is possible to bypass the backup, particularly if only using summarization operations,
+but this is not recommended and should be done with care.
+
+(remodel-command-line-interface-anchor)=
+## Remodel command-line interface
+
+The remodeling toolbox provides Python scripts with command-line interfaces
+to create or restore backups and to apply operations to the files in a dataset.
+The file remodeling tools may be applied to datasets that are in free form under a directory root
+or that are in [**BIDS format**](https://bids.neuroimaging.io/).
+Some operations use [**HED (Hierarchical Event Descriptors)**](./IntroductionToHed.md) annotations.
+See the [**Remodel with HED**](remodel-with-hed-anchor) section for a discussion
+of these operations and how to use them.
+
+The remodeling command-line interface can be used from the command line,
+called from another Python program, or used in a Jupyter notebook.
+Example Jupyter notebooks using the remodeling commands can be found
+[**here**](https://github.com/hed-standard/hed-examples/tree/main/src/jupyter_notebooks/remodeling).
+
+
+(calling-remodel-tools-anchor)=
+### Calling remodel tools
+
+The remodeling tools provide three Python programs for backing up (`run_remodel_backup`),
+remodeling (`run_remodel`), and restoring (`run_remodel_restore`) event files.
+These programs can be called from the command line or from another Python program.
+
+The programs use a standard command-line argument list for specifying input as summarized in the following table.
+
+(remodeling-operation-summary-anchor)=
+````{table} Summary of command-line arguments for the remodeling programs.
+| Script name | Arguments | Purpose |
+| ----------- | -------- | ------- |
+|*run_remodel_backup* | *data_dir*<br/>*-bd -\\-backup-dir*<br/>*-bn -\\-backup-name*<br/>*-e -\\-extensions*<br/>*-f -\\-file-suffix*<br/>*-t -\\-task-names*<br/>*-v -\\-verbose*<br/>*-x -\\-exclude-dirs*| Create a backup of the event files. |
+|*run_remodel* | *data_dir*<br/>*model_path*<br/>*-b -\\-bids-format*<br/>*-bd -\\-backup-dir*<br/>*-bn -\\-backup-name*<br/>*-e -\\-extensions*<br/>*-f -\\-file-suffix*<br/>*-i -\\-individual-summaries*<br/>*-j -\\-json-sidecar*<br/>*-ld -\\-log-dir*<br/>*-nb -\\-no-backup*<br/>*-ns -\\-no-summaries*<br/>*-nu -\\-no-update*<br/>*-r -\\-hed-versions*<br/>*-s -\\-save-formats*<br/>*-t -\\-task-names*<br/>*-v -\\-verbose*<br/>*-w -\\-work-dir*<br/>*-x -\\-exclude-dirs* | Restructure or summarize the event files. |
+|*run_remodel_restore* | *data_dir*<br/>*-bd -\\-backup-dir*<br/>*-bn -\\-backup-name*<br/>*-t -\\-task-names*<br/>*-v -\\-verbose* | Restore a backup of event files. |
+
+````
+All the scripts have a required argument, which is the full path of the dataset root (*data_dir*).
+The `run_remodel` program has a second required parameter, which is the full path of a JSON file
+containing a specification of the remodeling operations to be run.
+
+(remodel-command-line-arguments-anchor)=
+### Remodel command-line arguments
+
+This section describes the arguments that are used for the remodeling command-line interface
+with examples and more details.
+
+#### Positional arguments
+
+Positional arguments are required and must be given in the order specified.
+
+`data_dir`
+> The full path of the dataset root directory.
+
+`model_path`
+> The full path of the JSON remodel file (for *run_remodel* only).
+
+#### Named arguments
+
+Named arguments consist of a key starting with a hyphen, possibly followed by a value.
+Named arguments can be given in any order or omitted.
+If omitted, the specified default is used.
+Argument keys and values are separated by spaces.
+
+For argument values that are lists, the key is given followed by the items in the list,
+all separated by spaces.
+
+Each command has two different forms of the key name: a short form (a single hyphen followed by a single character)
+and a longer form (two hyphens followed by a more self-explanatory name).
+Users are free to use either form.
+
+`-b`, `--bids-format`
+> If this flag is present, the dataset is in BIDS format with sidecars. Tabular files and their associated sidecars are located using BIDS naming.
+
+`-bd`, `--backup-dir`
+> The path to the directory holding the backups (default: `[data_root]/derivatives/remodel/backups`).
+> Use the `-nb` option if you wish to omit the backup (in `run_remodel`).
+
+`-bn`, `--backup-name`
+> The name of the backup used for the remodeling (default: `default_back`).
+
+`-e`, `--extensions`
+> This option is followed by a list of file extension(s) of the data files to process.
+> The default is `.tsv`. Comma-separated tabular files are not permitted.
+
+`-f`, `--file-suffix`
+> This option is followed by the suffix names of the files to be processed.
+> For example `events` (the default) captures files named `events.tsv` if the default extension is used.
+> The filename without the extension must end in one of the specified suffixes in order to be
+> backed up or transformed.
+
+`-i`, `--individual-summaries`
+> This option offers a choice among three options:
+> - `separate`: Individual summaries for each file in separate files in addition to the overall summary.
+> - `consolidated`: Individual summaries written in the same file as the overall summary.
+> - `none`: Only an overall summary.
+
+`-j`, `--json-sidecar`
+> This option is followed by the full path of the JSON sidecar with HED annotations to be
+> applied during the processing of HED-related remodeling operations.
+
+`-ld`, `--log-dir`
+> This option is followed by the full path of a directory for writing log files.
+> A log file is written if the remodeling tools raise an exception and the program terminates.
+> Note that a log file is not written for issues gathered during operations such as `summarize_hed_validation`
+> because reporting HED validation errors is a normal part of this operation.
+> On the other hand, errors in the JSON remodeling file do raise an exception and are reported in the log.
+
+`-nb`, `--no-backup`
+> If present, no backup is used. Rather, operations are performed directly on the files.
+
+`-ns`, `--no-summaries`
+> If present, no summary files are output.
+
+`-nu`, `--no-update`
+> If present, the modified files are not output.
+
+`-r`, `--hed-versions`
+> This option is followed by one or more HED versions. Versions of the standard schema are specified
+> by their semantic versions (e.g., `8.1.0`), while library schema versions are prefixed by their
+> library name (e.g., `score_1.0.0`).
+
+> If more than one HED schema version is given, all but one of the versions must start with an
+> additional namespace designator (e.g., `sc:`). At most one version can omit the namespace designator
+> when multiple schemas are being used. In annotations, tags must start with the namespace
+> designator of the corresponding schema from which they were selected (e.g., `sc:Sleep-modulator`
+> if the SCORE library was designated by `sc:score_1.0.0`).
+
+`-s`, `--save-formats`
+> This option is followed by the extensions (including the dot) of the formats in which
+> to save summaries (default: `.txt` `.json`).
+
+`-t`, `--task-names`
+> The name(s) of the tasks to be included (for BIDS-formatted files only).
+> When a dataset includes multiple tasks, the event files are often structured
+> differently for each task and thus require different transformation files.
+> This option allows the backups and operations to be restricted to an individual task.
+
+> If this option is omitted, all tasks are used. This means that all `events.tsv` files are
+> restored from a backup if the backup is used, the operations are performed on all `events.tsv` files, and summaries are combined over all tasks.
+
+> If a list of specific task names follows this option, only datafiles corresponding to
+> the listed tasks are processed, giving separate summaries for each listed task.
+
+> If a "*" follows this option, all event files are processed and separate summaries are created for each task.
+
+> Task detection follows the BIDS convention. Tasks are detected by finding "task-x" in the file names of `events.tsv` files. Here x is the name of the task. The task name must be followed by an underbar or a period, or must appear at the end of the filename.
+
+`-v`, `--verbose`
+> If present, more comprehensive messages documenting transformation progress
+> are printed to standard output.
+
+`-w`, `--work-dir`
+> The path to the remodeling work root directory, which holds the summaries and other
+> remodeling state information (default: `[data_root]/derivatives/remodel`).
+
+`-x`, `--exclude-dirs`
+> The directories to exclude when gathering the data files to process.
+> For BIDS datasets, these are typically `derivatives`, `stimuli`, and `sourcecode`.
+> Any subdirectory with a path component named `remodel` is automatically excluded from remodeling, as
+> these directories are reserved for storing backup, state, and result information for the remodeling process itself.
+
+(remodel-scripts-anchor)=
+## Remodel scripts
+
+This section discusses the three main remodeling scripts with command-line interfaces
+to support backing up, remodeling, and restoring the tabular files used in the remodeling process.
+These scripts can be run from the command line or from another Python program using a function call.
+
+(backing-up-files-anchor)=
+### Backing up files
+
+The `run_remodel_backup` Python program creates a backup of the specified files.
+The backup is created by default in the `derivatives/remodel/backups` subdirectory
+under the dataset root as shown in the following example for the
+sample dataset `eeg_ds003645s_hed_remodel`,
+which can be found in the `datasets` subdirectory of the
+[**hed-examples**](https://github.com/hed-standard/hed-examples) GitHub repository.
+
+![Remodeling backup structure](./_static/images/RemodelingBackupStructure.png)
+
+
+The backup process creates a mirror of the directory structure of the source files to be backed up
+in the directory `derivatives/remodel/backups/backup_name/backup_root` as shown in the figure above.
+The default backup name is `default_back`.
+
+In the above example, the backup has subdirectories `sub-002` and `sub-003` just
+like the main directory of the dataset.
+These subdirectories only contain backups of the files to be transformed
+(by default files with names ending in `events.tsv`).
+
+In addition to the `backup_root`, the backup directory also contains a dictionary of backup files
+in the `backup_lock.json` file. This dictionary is used internally by the remodeling tools.
+The backup should be created once and not modified by the user.
+
+The following example shows how to run the `run_remodel_backup` program from the command line
+to back up the dataset located at `/datasets/eeg_ds003645s_hed_remodel`.
+
+(remodel-backup-anchor)=
+````{admonition} Example of calling run_remodel_backup from the command line.
+:class: tip
+
+```bash
+python run_remodel_backup /datasets/eeg_ds003645s_hed_remodel -x derivatives stimuli
+
+```
+````
+
+Since the `-f` and `-e` arguments are not given, the default file suffix and extension values
+apply, so only files of the form `events.tsv` are backed up.
+The `-x` option excludes any source files from the `derivatives` and `stimuli` subdirectories.
+These choices can be overridden using additional command-line arguments.
+
+The following shows how the `run_remodel_backup` program can be called from a
+Python program or a Jupyter notebook.
+The command-line arguments are given in a list instead of on the command line.
+
+(remodel-backup-jupyter-anchor)=
+````{admonition} Example of Python code to call run_remodel_backup using a function call.
+:class: tip
+
+```python
+
+import hed.tools.remodeling.cli.run_remodel_backup as cli_backup
+
+data_root = '/datasets/eeg_ds003645s_hed_remodel'
+arg_list = [data_root, '-x', 'derivatives', 'stimuli']
+cli_backup.main(arg_list)
+
+```
+````
+
+During remodeling, each file in the source is associated with a backup file using
+its relative path from the dataset root.
+Remodeling is performed by reading the backup file, performing the operations specified in the
+JSON remodel file, and overwriting the source file as needed.
+
+Users can also create alternatively named backups by providing the `-bn` argument with a backup name to
+the `run_remodel_backup` program.
+To use backup files from another named backup, call the remodeling program with
+the `-bn` argument and the correct backup name.
+Named backups can provide checkpoints to allow the execution of
+transformations to start from intermediate points.
+
+**NOTE**: You should not delete backups, even if you have created multiple named backups.
+The backups provide useful state and provenance information about the data.
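+
+The following sketch shows how a named backup might be created and then used during
+remodeling. The dataset and remodel file paths are hypothetical examples;
+`-bn` and `-x` are the documented backup-name and exclude-directory options.
+
+````{admonition} A sketch of creating and using a named backup.
+:class: tip
+```python
+# Sketch: create a named backup (checkpoint_1) and remodel against it.
+# Paths are hypothetical examples.
+import hed.tools.remodeling.cli.run_remodel_backup as cli_backup
+import hed.tools.remodeling.cli.run_remodel as cli_remodel
+
+data_root = '/datasets/eeg_ds003645s_hed_remodel'
+model_path = '/datasets/remove_extra_rmdl.json'
+
+# Create the named backup (done once).
+cli_backup.main([data_root, '-bn', 'checkpoint_1', '-x', 'derivatives', 'stimuli'])
+
+# Remodel using the named backup as the starting point.
+cli_remodel.main([data_root, model_path, '-bn', 'checkpoint_1',
+                  '-x', 'derivatives', 'stimuli'])
+```
+````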
+
+(remodeling-files-anchor)=
+### Remodeling files
+
+Remodeling consists of applying a sequence of operations from the
+[**remodel operation summary**](remodel-operation-summary-anchor)
+to successively transform each backup file according to the instructions
+and then overwrite the actual data file with the final result.
+
+If the dataset has no backups, the actual data files rather than the backups are transformed.
+You are expected to [**create the backup**](backing-up-files-anchor) (just once)
+before running the remodeling operations.
+Proceeding without a backup is not recommended unless you are only performing summarization operations.
+
+The operations are specified as a list of dictionaries in a JSON file,
+as illustrated by the [**remodel sample files**](remodel-sample-files-anchor) discussed below.
+
+Before running remodeling transformations on an entire dataset,
+consider using the [**HED online tools**](https://hedtools.ucsd.edu/hed)
+to debug your remodeling operation file on a single file.
+The remodeling process always starts with the original backup files,
+so the usual development path is to incrementally add operations to the end
+of your transformation JSON file as you develop and test on a single file
+until you have the desired end result.
+
+The following example shows how to run a remodeling script from the command line.
+The example assumes that the backup has already been created for the dataset.
+
+(run-remodel-anchor)=
+````{admonition} Example of calling run_remodel from the command line.
+:class: tip
+
+```bash
+python run_remodel /datasets/eeg_ds003645s_hed_remodel /datasets/remove_extra_rmdl.json -x derivatives stimuli
+
+```
+````
+
+The script has two required arguments: the dataset root and the path to the JSON remodel file.
+Usually, the JSON remodel files are stored with the dataset itself in the
+`derivatives/remodel/remodeling_files` subdirectory, but common scripts can be stored in a central place elsewhere.
+
+The additional keyword option `-x` in the example indicates that directory paths containing
+the component `derivatives` or the component `stimuli` should be excluded.
+Excluded directories need not have their excluded path component at the top level of the dataset.
+Subdirectory paths containing the `remodel` path component are automatically excluded.
+
+The command-line interface can also be used in a Jupyter notebook or as part of a larger Python
+program by calling the `main` function with the equivalent command-line arguments provided
+in a list with the positional arguments appearing first.
+
+The following example shows Python code to remodel a dataset using the command-line interface.
+This code can be used in a Jupyter notebook or in another Python program.
+
+````{admonition} Example Python code to call run_remodel using a function call.
+:class: tip
+
+```python
+import hed.tools.remodeling.cli.run_remodel as cli_remodel
+
+data_root = '/datasets/eeg_ds003645s_hed_remodel'
+model_path = '/datasets/remove_extra_rmdl.json'
+arg_list = [data_root, model_path, '-x', 'derivatives', 'stimuli']
+cli_remodel.main(arg_list)
+
+```
+````
+
+(restoring-files-anchor)=
+### Restoring files
+
+Since remodeling always uses the backed-up version of each data file,
+there is no need to restore these files to their original state
+between remodeling runs.
+However, when finished with an analysis,
+you may want to restore the data files to their original state.
+
+The following example shows how to call `run_remodel_restore` to
+restore the data files from the default backup.
+
+The restore operation restores all the files in the specified backup.
+
+(run-remodel-restore-anchor)=
+````{admonition} Example of calling run_remodel_restore from the command line.
+:class: tip
+
+```bash
+python run_remodel_restore /datasets/eeg_ds003645s_hed_remodel
+
+```
+````
+
+As with the other command-line programs, `run_remodel_restore` can also be called using a function call.
+
+````{admonition} Example Python code to call *run_remodel_restore* using a function call.
+:class: tip
+
+```python
+import hed.tools.remodeling.cli.run_remodel_restore as cli_restore
+
+data_root = '/datasets/eeg_ds003645s_hed_remodel'
+cli_restore.main([data_root])
+
+```
+````
+
+(remodel-with-hed-anchor)=
+## Remodel with HED
+
+[**HED**](introduction-to-hed-anchor) (Hierarchical Event Descriptors) is a
+system for annotating data in a manner that is both human-understandable and machine-actionable.
+HED provides much more detail about the events and their meanings than the column values alone.
+If you are new to HED, see the
+[**HED annotation quickstart**](./HedAnnotationQuickstart.md).
+For information about HED's integration into BIDS (Brain Imaging Data Structure) see
+[**BIDS annotation quickstart**](./BidsAnnotationQuickstart.md).
+
+Currently, five remodeling operations rely on HED annotations:
+- [**factor_hed_tags**](factor-hed-tags-anchor)
+- [**factor_hed_type**](factor-hed-type-anchor)
+- [**summarize_hed_tags**](summarize-hed-tags-anchor)
+- [**summarize_hed_type**](summarize-hed-type-anchor)
+- [**summarize_hed_validation**](summarize-hed-validation-anchor).
+
+HED tags provide a mechanism for advanced data analysis and for
+extracting experiment-specific information from the data files.
+However, since HED information is not always stored in the data files themselves,
+you may need to provide a HED schema and a JSON sidecar.
+
+The HED schema defines the allowed HED tag vocabulary, and the JSON sidecar
+associates HED annotations with the information in the columns of the event files.
+If you are not using any of the HED operations in your remodeling,
+you do not have to provide this information.
+
+
+(extracting-hed-information-from-bids-anchor)=
+### Extracting HED information from BIDS
+
+The simplest way to use HED with `run_remodel` is to use the `-b` option,
+which indicates that the dataset is in [**BIDS**](https://bids.neuroimaging.io/) (Brain Imaging Data Structure) format.
+
+BIDS is a standardized way of organizing neuroimaging data.
+HED and BIDS are well integrated.
+If you are new to BIDS, see the
+[**BIDS annotation quickstart**](./BidsAnnotationQuickstart.md).
+
+A HED-annotated BIDS dataset provides the HED schema version in the `dataset_description.json`
+file located directly under the BIDS dataset root.
+
+BIDS datasets must have filenames in a specific format,
+and the HED tools can locate the relevant JSON sidecars for each data file based on this information.
+
+
+(directly-specifying-hed-information-anchor)=
+### Directly specifying HED information
+
+If your data is already in BIDS format, using the `-b` option is ideal since
+the needed information can be located automatically.
+However, early in the experimental process,
+your data files are not likely to be organized in BIDS format,
+so this option will not be available if you want to use HED.
+
+Without the `-b` option, the remodeling tools locate the appropriate files based
+on specified filename suffixes and extensions.
+In order to use HED operations, you must explicitly specify the HED versions
+using the `-r` option.
+
+The `-r` option supports a list of HED versions if multiple HED schemas are used.
+For example, `-r 8.1.0 sc:score_1.0.0` specifies that vocabulary will be drawn
+from standard HED Version 8.1.0 and from
+HED SCORE library version 1.0.0.
+Annotations containing tags from SCORE should be prefixed with `sc:`.
+Note: both of these schemas can be viewed with the [**HED Schema Viewer**](https://www.hedtags.org/display_hed.html).
+
+Usually, annotators will consolidate HED annotations in a single JSON sidecar file
+located at the top level of the dataset.
+The path of this sidecar can be passed as a command-line argument using the `-j` option.
+If more than one JSON sidecar file contains HED annotations, users will need to call the lower-level
+remodeling functions to perform these operations.
+
+The following example illustrates a command-line call that passes both a HED schema version and
+the path to the JSON file with the HED annotations.
+
+(run-remodel-with-hed-direct-anchor)=
+````{admonition} Remodeling a non-BIDS dataset using HED.
+:class: tip
+
+```bash
+python run_remodel /datasets/eeg_ds003645s_hed_remodel /datasets/summarize_conditions_rmdl.json \
+-x derivatives stimuli -r 8.1.0 -j /datasets/eeg_ds003645s_hed_remodel/task-FacePerception_events.json
+
+```
+````
+
+(remodel-with-hed-direct-python-anchor)=
+````{admonition} Example Python code to use run_remodel on a non-BIDS dataset.
+:class: tip
+
+```python
+import hed.tools.remodeling.cli.run_remodel as cli_remodel
+
+data_root = '/datasets/eeg_ds003645s_hed_remodel'
+model_path = '/datasets/summarize_conditions_rmdl.json'
+json_path = '/datasets/eeg_ds003645s_hed_remodel/task-FacePerception_events.json'
+arg_list = [data_root, model_path, '-x', 'derivatives', 'stimuli', '-r', '8.1.0', '-j', json_path]
+cli_remodel.main(arg_list)
+
+```
+````
+
+(remodel-error-handling-anchor)=
+## Remodel error handling
+
+Errors can occur at several stages during remodeling, and how they are
+handled depends on the type of error and where it occurs.
+Except for the validation summary, the underlying remodeling code raises exceptions for most errors.
+
+
+(errors-in-the-remodel-file-anchor)=
+### Errors in the remodel file
+
+Each operation requires specific parameters to execute properly.
+The underlying implementation for each operation defines these parameters using a [**JSON schema**](https://json-schema.org/)
+as the `PARAMS` property of the operation's class definition.
+The use of the JSON schema allows the remodeler to specify and validate requirements on most of an
+operation's parameters using standardized methods.
+
+The [**remodeler_validator**](https://github.com/hed-standard/hed-python/blob/master/hed/tools/remodeling/remodeler_validator.py)
+compiles a JSON schema for the remodeler from the individual operations and validates
+the remodel file against the compiled JSON schema. The validator should always be run before executing any remodel operations.
+
+For example, the command-line [**run_remodel**](https://raw.githubusercontent.com/hed-standard/hed-python/develop/hed/tools/remodeling/cli/run_remodel.py)
+program calls the validator before executing any operations.
+If there are errors, `run_remodel` reports the errors for all operations and exits.
+This allows users to correct errors in all operations in one pass without any data modification.
+The [**HED online tools**](https://hedtools.org/hed) are particularly useful for debugging
+the syntax and other issues in the remodeling process.
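+
+While editing a remodel file, it can be convenient to do a quick structural sanity check
+of your own before invoking the tools. The following sketch is a minimal, hand-rolled
+pre-check (not a substitute for the remodeler's JSON-schema validation, and the helper
+name is our own); it only verifies that the file parses and that each operation dictionary
+has the three required top-level keys described under
+[**Remodel sample files**](remodel-sample-files-anchor).
+
+````{admonition} A minimal structural pre-check of a remodel JSON file (illustrative sketch).
+:class: tip
+
+```python
+import json
+
+REQUIRED_KEYS = {"operation", "description", "parameters"}
+
+def precheck_remodel_file(file_path):
+    """Return a list of structural problems found in a remodel JSON file."""
+    with open(file_path, "r") as fp:
+        operations = json.load(fp)
+    if not isinstance(operations, list):
+        return ["top-level JSON must be a list of operations"]
+    issues = []
+    for index, entry in enumerate(operations):
+        if not isinstance(entry, dict):
+            issues.append(f"operation {index}: entry is not a dictionary")
+            continue
+        missing = REQUIRED_KEYS - set(entry)
+        if missing:
+            issues.append(f"operation {index}: missing keys {sorted(missing)}")
+    return issues
+
+print(precheck_remodel_file('/datasets/remove_extra_rmdl.json') or "No structural problems found")
+```
+````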
+
+(execution-time-remodel-errors-anchor)=
+### Execution-time remodel errors
+
+When an error occurs during execution, an exception is raised.
+Exceptions are raised for invalid or missing files or if a transformed file
+cannot be rewritten due to improper file permissions.
+Each individual operation may also raise an exception if the
+data file being processed does not have the expected information,
+such as a column with a particular name.
+
+Exceptions raised during execution cause the process to be terminated, and no
+further files are processed.
+
+
+(remodel-sample-files-anchor)=
+## Remodel sample files
+
+All remodeling operations are specified in a standardized JSON remodel input file.
+The following shows the contents of the JSON remodeling file `remove_extra_rmdl.json`,
+which contains a single operation with instructions to remove the `value` and `sample` columns
+from the data file if the columns exist.
+
+(sample-remodel-file-anchor)=
+### Sample remodel file
+
+````{admonition} A sample JSON remodeling file with a single remove_columns transformation operation.
+:class: tip
+
+```json
+[
+    {
+        "operation": "remove_columns",
+        "description": "Remove unwanted columns prior to analysis",
+        "parameters": {
+            "column_names": ["value", "sample"],
+            "ignore_missing": true
+        }
+    }
+]
+
+```
+````
+
+Each operation is specified in a dictionary with three top-level keys: "operation", "description",
+and "parameters". The value of "operation" is the name of the operation.
+The "description" value should include the reason this operation was needed,
+not just a description of the operation itself.
+Finally, the "parameters" value is a dictionary mapping parameter name to
+parameter value.
+
+The parameters for each operation are listed in the
+[**Remodel transformations**](remodel-transformations-anchor) and
+[**Remodel summarizations**](remodel-summarizations-anchor) sections.
+An operation may have both required and optional parameters.
+Optional parameters may be omitted if not needed, but every parameter that is given,
+required or optional, appears in the "parameters" dictionary.
+The full specification of the remodel file is also provided as a [**JSON schema**](https://json-schema.org/).
+
+The remodeling JSON files should have names ending in `_rmdl.json` to more easily
+distinguish them from other JSON files.
+Although these files can be stored anywhere, their preferred location is
+in the `derivatives/remodel/models` subdirectory under the dataset root so
+that they can provide provenance for the dataset.
+
+(sample-remodel-event-file-anchor)=
+### Sample remodel event file
+
+Several examples illustrating the remodeling operations use the following excerpt of the stop-go task from sub-0013
+of the AOMIC-PIOP2 dataset available on [**OpenNeuro**](https://openneuro.org) as ds002790.
+The full event file is
+[**sub-0013_task-stopsignal_acq-seq_events.tsv**](./_static/data/sub-0013_task-stopsignal_acq-seq_events.tsv).
+
+
+````{admonition} Excerpt from an event file from the stop-go task of AOMIC-PIOP2 (ds002790).
+
+| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex |
+| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- |
+| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female |
+| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female |
+| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female |
+| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female |
+| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male |
+| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male |
+````
+
+(Sample-remodel-sidecar-file-anchor)=
+### Sample remodel sidecar file
+
+For remodeling operations that use HED, a JSON sidecar is usually required to provide the
+necessary HED annotations. The following JSON sidecar excerpt is used in several examples to
+illustrate some of these operations.
+The full JSON file can be found at
+[**task-stopsignal_acq-seq_events.json**](./_static/data/task-stopsignal_acq-seq_events.json).
+
+
+````{admonition} Excerpt of JSON sidecar with HED annotations for the stop-go task of AOMIC-PIOP2.
+:class: tip
+
+```json
+{
+    "trial_type": {
+        "HED": {
+            "succesful_stop": "Sensory-presentation, Visual-presentation, Correct-action, Image, Label/succesful_stop",
+            "unsuccesful_stop": "Sensory-presentation, Visual-presentation, Incorrect-action, Image, Label/unsuccesful_stop",
+            "go": "Sensory-presentation, Visual-presentation, Image, Label/go"
+        }
+    },
+    "stop_signal_delay": {
+        "HED": "(Auditory-presentation, Delay/# s)"
+    },
+    "sex": {
+        "HED": {
+            "male": "Def/Male-image-cond",
+            "female": "Def/Female-image-cond"
+        }
+    },
+    "hed_defs": {
+        "HED": {
+            "def_male": "(Definition/Male-image-cond, (Condition-variable/Image-sex, (Male, (Image, Face))))",
+            "def_female": "(Definition/Female-image-cond, (Condition-variable/Image-sex, (Female, (Image, Face))))"
+        }
+    }
+}
+```
+````
+Notice that the JSON file has some keys (e.g., "trial_type", "stop_signal_delay", and "sex")
+which also correspond to columns in the events file.
+The "hed_defs" key corresponds to an extra entry in the JSON file that, in this case, provides the definitions needed in the HED annotations.
+
+HED operations also require the HED schema. Most of the examples use HED standard schema version 8.1.0.
+
+(remodel-transformations-anchor)=
+## Remodel transformations
+
+(factor-column-anchor)=
+### Factor column
+
+The *factor_column* operation appends factor vectors to tabular files
+based on the values in a specified file column.
+Each factor vector contains a 1 if the corresponding row has that column value and a 0 otherwise.
+The *factor_column* operation is used to reformat event files for analyses such as linear regression
+based on column values.
+
+(factor-column-parameters-anchor)=
+#### Factor column parameters
+
+```{admonition} Parameters for the *factor_column* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *column_name* | str | The name of the column to be factored.|
+| *factor_values* | list | Column values to be included as factors. |
+| *factor_names* | list | (**Optional**) Column names for created factors. |
+```
+
+If *column_name* is not a column in the data file, a `ValueError` is raised.
+
+If *factor_values* is empty, factors are created for each unique value in *column_name*.
+Otherwise, only factors for the specified column values are generated.
+If a specified value is missing in a particular file, the corresponding factor column contains all zeros.
+
+If *factor_names* is empty, the newly created columns are of the
+form *column_name.factor_value*.
+Otherwise, the newly created columns have the names given in *factor_names*.
+If *factor_names* is not empty, then *factor_values* must also be specified
+and both lists must be of the same length.
+
+(factor-column-example-anchor)=
+#### Factor column example
+
+The *factor_column* operation in the following example specifies that factor columns
+should be created for the *succesful_stop* and *unsuccesful_stop* values of the *trial_type* column.
+The resulting columns are called *stopped* and *stop_failed*, respectively.
+
+
+````{admonition} A sample JSON file with a single *factor_column* transformation operation.
+:class: tip
+
+```json
+[{
+    "operation": "factor_column",
+    "description": "Create factors for the succesful_stop and unsuccesful_stop values.",
+    "parameters": {
+        "column_name": "trial_type",
+        "factor_values": ["succesful_stop", "unsuccesful_stop"],
+        "factor_names": ["stopped", "stop_failed"]
+    }
+}]
+```
+````
+
+The results of executing this *factor_column* operation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor) are:
+
+````{admonition} Results of the factor_column operation on the sample data.
+
+| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex | stopped | stop_failed |
+| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- | ---------- | ---------- |
+| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female | 0 | 0 |
+| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female | 0 | 1 |
+| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female | 0 | 0 |
+| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female | 1 | 0 |
+| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male | 0 | 1 |
+| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male | 0 | 0 |
+````
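+
+Conceptually, *factor_column* behaves like the following pandas sketch.
+This is a simplified illustration under our own helper name, not the remodeler's
+actual implementation, and it assumes the factor values are given as strings.
+
+````{admonition} A pandas sketch of the *factor_column* logic (illustrative only).
+:class: tip
+
+```python
+import pandas as pd
+
+def factor_column(df, column_name, factor_values, factor_names):
+    """Append one 0/1 factor column per entry in factor_values."""
+    if column_name not in df.columns:
+        raise ValueError(f"{column_name} is not a column in the data file")
+    if not factor_values:
+        # Default: one factor per unique value, named column_name.factor_value.
+        factor_values = [str(value) for value in df[column_name].unique()]
+        factor_names = [f"{column_name}.{value}" for value in factor_values]
+    for value, name in zip(factor_values, factor_names):
+        df[name] = (df[column_name].astype(str) == value).astype(int)
+    return df
+
+df = pd.DataFrame({"trial_type": ["go", "unsuccesful_stop", "succesful_stop"]})
+df = factor_column(df, "trial_type",
+                   ["succesful_stop", "unsuccesful_stop"], ["stopped", "stop_failed"])
+print(df)
+```
+````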
+
+(factor-hed-tags-anchor)=
+### Factor HED tags
+
+The *factor_hed_tags* operation is similar to the *factor_column* operation
+in that it produces factor vectors containing 0s and 1s,
+which are appended to the returned DataFrame.
+However, rather than basing these vectors on values in a specified column,
+the factors are computed by determining whether the assembled HED annotation for each row
+satisfies a specified search query.
+
+An example search query is whether the assembled HED annotation contains a particular HED tag.
+The [**HED search guide**](./HedSearchGuide.md) tutorial discusses the HED search facility in more detail.
+
+
+(factor-hed-tags-parameters-anchor)=
+#### Factor HED tags parameters
+
+```{admonition} Parameters for the *factor_hed_tags* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *queries* | list | A list of HED query strings. |
+| *query_names* | list | (**Optional**) A list of names for the factor columns generated by the queries. |
+| *remove_types* | list | (**Optional**) Structural HED tags to be removed (usually `Condition-variable` and `Task`). |
+| *expand_context* | bool | (**Optional**: default true) Expand the context and remove `Onset` and `Offset` tags before the query. |
+
+```
+The *query_names* list, which must be empty or the same length as *queries*,
+contains the names of the factor columns produced by the search.
+If the *query_names* list is empty, the result columns are titled "query_1",
+"query_2", etc.
+
+Most of the time *remove_types* should be set to `["Condition-variable", "Task"]`,
+with the effects of the experimental design captured using the *factor_hed_type* operation.
+If *expand_context* is set to *false*, the additional context provided by `Onset`, `Offset`, and `Duration`
+is ignored.
+
+(factor-hed-tags-example-anchor)=
+#### Factor HED tags example
+
+The *factor_hed_tags* operation in the following example produces two factor
+columns with 1s where the HED string for a row contains the `Correct-action`
+and `Incorrect-action` tags, respectively.
+The resulting factor columns are named *correct* and *incorrect*, respectively.
+
+````{admonition} A sample JSON file with a single *factor_hed_tags* transformation operation.
+:class: tip
+
+```json
+[{
+    "operation": "factor_hed_tags",
+    "description": "Create factors based on whether the event represented a correct or incorrect action.",
+    "parameters": {
+        "queries": ["correct-action", "incorrect-action"],
+        "query_names": ["correct", "incorrect"],
+        "remove_types": ["Condition-variable", "Task"],
+        "expand_context": false
+    }
+}]
+```
+````
+
+The results of executing this *factor_hed_tags* operation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor) using the
+[**sample remodel sidecar file**](sample-remodel-sidecar-file-anchor) for HED annotations are:
+
+
+````{admonition} Results of *factor_hed_tags*.
+
+| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex | correct | incorrect |
+| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- | ---------- | ---------- |
+| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female | 0 | 0 |
+| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female | 0 | 1 |
+| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female | 0 | 0 |
+| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female | 1 | 0 |
+| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male | 0 | 1 |
+| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male | 0 | 0 |
+````
+
+(factor-hed-type-anchor)=
+### Factor HED type
+
+The *factor_hed_type* operation produces factor columns
+based on values of the specified HED type tag.
+The most common type is the HED *Condition-variable* tag, which corresponds to
+factor vectors based on the experimental design.
+Other commonly used type tags include *Task*, *Control-variable*, and *Time-block*.
+
+We assume that the dataset has been annotated using HED tags to properly document
+information such as experimental conditions, and focus on how such an annotated dataset can be
+used with remodeling to produce factor columns corresponding to these
+type variables.
+
+For additional information on how to encode experimental designs using HED, see
+[**HED conditions and design matrices**](./HedConditionsAndDesignMatrices.md).
+
+(factor-hed-type-parameters-anchor)=
+#### Factor HED type parameters
+
+```{admonition} Parameters for the *factor_hed_type* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *type_tag* | str | HED tag used to find the factors (most commonly *Condition-variable*).|
+| *type_values* | list | (**Optional**) Values to factor for the *type_tag*. If omitted, all values of that *type_tag* are used. |
+```
+The event context (as defined by onsets, offsets, and durations) is always expanded, and one-hot (0s and 1s)
+encoding is used for the factors.
+
+(factor-hed-type-example-anchor)=
+#### Factor HED type example
+
+The *factor_hed_type* operation in the following example appends
+additional columns to each data file corresponding to
+each possible value of each *Condition-variable* tag.
+The columns contain 1s in rows (e.g., events) for which that condition
+applies and 0s otherwise.
+
+````{admonition} A JSON file with a single *factor_hed_type* transformation operation.
+:class: tip
+
+```json
+[{
+    "operation": "factor_hed_type",
+    "description": "Factor based on the sex of the images being presented.",
+    "parameters": {
+        "type_tag": "Condition-variable"
+    }
+}]
+```
+````
+
+The results of executing this *factor_hed_type* operation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor) using the
+[**sample remodel sidecar file**](sample-remodel-sidecar-file-anchor) for HED annotations are:
+
+
+````{admonition} Results of *factor_hed_type*.
+
+| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex | Image-sex.Female-image-cond | Image-sex.Male-image-cond |
+| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- | ------- | ---------- |
+| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female | 1 | 0 |
+| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female | 1 | 0 |
+| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female | 1 | 0 |
+| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female | 1 | 0 |
+| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male | 0 | 1 |
+| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male | 0 | 1 |
+````
+
+(merge-consecutive-anchor)=
+### Merge consecutive
+
+Sometimes a single long event in experimental logs is represented by multiple repeat events.
+The *merge_consecutive* operation collapses these consecutive repeat events into one event with
+duration updated to encompass the temporal extent of the merged events.
+
+(merge-consecutive-parameters-anchor)=
+#### Merge consecutive parameters
+
+```{admonition} Parameters for the *merge_consecutive* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *column_name* | str | The name of the column which is the basis of the merge.|
+| *event_code* | str, int, float | The value in *column_name* that triggers the merge. |
+| *set_durations* | bool | If true, set durations based on merged events. |
+| *ignore_missing* | bool | If true, missing *column_name* or *match_columns* do not raise an error. |
+| *match_columns* | list | (**Optional**) Columns whose values must match to collapse events. |
+```
+
+The first of the group of rows (each representing an event) to be merged is called the anchor
+for the merge. After the merge, it is the only row in the group
+that remains in the data file. The result is identical
+to its original version, except for the value in the `duration` column.
+
+If the *set_durations* parameter is true, the new duration is calculated as though
+the event began with the onset of the first event (the anchor row) in the group and
+ended at the point where all the events in the group have ended.
+This method allows for small gaps between events and for events in which an
+intermediate event in the group ends after later events.
+If the *set_durations* parameter is false, the duration of the merged row is set to `n/a`.
+
+If the data file has other columns besides `onset`, `duration`, and *column_name*,
+the values in the other columns must be considered during the merging process.
+The *match_columns* parameter is a list of the other columns whose values must agree with those
+of the anchor row in order for a merge to occur. If *match_columns* is empty, the
+other columns in each row are not taken into account during the merge.
+
+(merge-consecutive-example-anchor)=
+#### Merge consecutive example
+
+The *merge_consecutive* operation in the following example causes consecutive
+`succesful_stop` events whose `stop_signal_delay`, `response_hand`, and `sex` columns
+have the same values to be merged into a single event.
+
+
+````{admonition} A JSON file with a single *merge_consecutive* transformation operation.
+:class: tip
+
+```json
+[{
+    "operation": "merge_consecutive",
+    "description": "Merge consecutive *succesful_stop* events that match the *match_columns*.",
+    "parameters": {
+        "column_name": "trial_type",
+        "event_code": "succesful_stop",
+        "set_durations": true,
+        "ignore_missing": true,
+        "match_columns": ["stop_signal_delay", "response_hand", "sex"]
+    }
+}]
+```
+````
+
+When this operation is applied to the following input file,
+the three events with a value of `succesful_stop` in the `trial_type` column starting
+at `onset` value 13.5939 are merged into a single event.
+
+````{admonition} Input file for a *merge_consecutive* operation.
+
+| onset | duration | trial_type | stop_signal_delay | response_hand | sex |
+| ----- | -------- | ---------- | ----------------- | ------------- | --- |
+| 0.0776 | 0.5083 | go | n/a | right | female |
+| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | right | female |
+| 9.5856 | 0.5084 | go | n/a | right | female |
+| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | female |
+| 14.2 | 0.5083 | succesful_stop | 0.2 | n/a | female |
+| 15.3 | 0.7083 | succesful_stop | 0.2 | n/a | female |
+| 17.3 | 0.5083 | unsuccesful_stop | 0.25 | n/a | female |
+| 19.0 | 0.5083 | unsuccesful_stop | 0.25 | n/a | female |
+| 21.1021 | 0.5083 | unsuccesful_stop | 0.25 | left | male |
+| 22.6103 | 0.5083 | go | n/a | left | male |
+````
+
+Notice that the `unsuccesful_stop` event at `onset` value `17.3` is not
+merged into the preceding `succesful_stop` group because its `trial_type` and
+`stop_signal_delay` values do not match those of the previous event.
+The final result has `duration` computed as `2.4144` = `15.3` + `0.7083` - `13.5939`.
+
+````{admonition} The results of the *merge_consecutive* operation.
+
+| onset | duration | trial_type | stop_signal_delay | response_hand | sex |
+| ----- | -------- | ---------- | ------------------ | ------------- | --- |
+| 0.0776 | 0.5083 | go | n/a | right | female |
+| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | right | female |
+| 9.5856 | 0.5084 | go | n/a | right | female |
+| 13.5939 | 2.4144 | succesful_stop | 0.2 | n/a | female |
+| 17.3 | 2.2083 | unsuccesful_stop | 0.25 | n/a | female |
+| 21.1021 | 0.5083 | unsuccesful_stop | 0.25 | left | male |
+| 22.6103 | 0.5083 | go | n/a | left | male |
+````
+
+The `unsuccesful_stop` events that had onsets at `17.3` and `19.0` are also merged in this example,
+while the event at `21.1021` is not merged with them because its `response_hand` and `sex`
+values do not match.
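+
+The duration arithmetic behind *set_durations* can be summarized in a few lines of Python.
+The following sketch (an illustration under our own helper name, not the remodeler's
+implementation) reproduces the merged duration for the `succesful_stop` group in the tables above.
+
+````{admonition} Computing the merged duration for a group of consecutive events (sketch).
+:class: tip
+
+```python
+def merged_duration(onsets, durations):
+    """Duration spanning a merged group, allowing gaps between events and
+    intermediate events that end after later ones (set_durations=true)."""
+    group_end = max(onset + duration for onset, duration in zip(onsets, durations))
+    return group_end - onsets[0]
+
+# The succesful_stop group from the example tables:
+print(round(merged_duration([13.5939, 14.2, 15.3], [0.5083, 0.5083, 0.7083]), 4))  # 2.4144
+```
+````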
+
+(remap-columns-anchor)=
+### Remap columns
+
+The *remap_columns* operation maps combinations of values in *m* specified columns of a data file
+into values in *n* columns using a defined mapping.
+Remapping is useful during analysis to create columns in event files that are more directly
+informative for a particular analysis.
+
+Remapping is also important during the initial generation of event files from experimental logs.
+The log files generated by experimental control software often contain a code for each type of log entry.
+Remapping can be used to convert the column containing these codes into one or more columns with more meaningful values.
+
+
+(remap-columns-parameters-anchor)=
+#### Remap columns parameters
+
+
+```{admonition} Parameters for the *remap_columns* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *source_columns* | list | A list of *m* names of the source columns for the map.|
+| *destination_columns* | list | A list of *n* names of the destination columns for the map. |
+| *map_list* | list | A list of mappings. Each element is a list of *m* source column values followed by *n* destination values. Mapping source values are treated as strings. |
+| *ignore_missing* | bool | If true, source column values not in the map generate "n/a" destination values instead of errors. |
+| *integer_sources* | list | (**Optional**) A list of source columns that are integers. The *integer_sources* must be a subset of *source_columns*. |
+```
+
+A column cannot be both a source and a destination,
+and all source columns must be present in the data files.
+New columns are created for destination columns that are missing from a data file.
+
+The *remap_columns* operation only works for columns containing strings or integers,
+as it is meant for remapping categorical codes.
+You must specify which source columns contain integers so that `n/a` values
+can be handled appropriately.
+
+The *map_list* parameter specifies how each unique combination of values from the source
+columns will be mapped into the destination columns.
+If there are *m* source columns and *n* destination columns,
+then each entry in *map_list* must be a list with *m* + *n* elements.
+The first *m* elements are the key values from the source columns.
+The *map_list* should have targets for all combinations of values that appear in the *m* source columns
+unless *ignore_missing* is true.
+
+After remapping, the tabular file will contain both source and destination columns.
+If you wish to replace the source columns with the destination columns,
+use a *remove_columns* transformation after the *remap_columns*.
+
+
+(remap-columns-example-anchor)=
+#### Remap columns example
+
+The *remap_columns* operation in the following example creates a new column called *response_type*
+based on the unique values in the combination of columns *response_accuracy* and *response_hand*.
+
+````{admonition} A JSON file with a single *remap_columns* transformation operation.
+:class: tip
+
+```json
+[{
+    "operation": "remap_columns",
+    "description": "Map response_accuracy and response hand into a single column.",
+    "parameters": {
+        "source_columns": ["response_accuracy", "response_hand"],
+        "destination_columns": ["response_type"],
+        "map_list": [["correct", "left", "correct_left"],
+                     ["correct", "right", "correct_right"],
+                     ["incorrect", "left", "incorrect_left"],
+                     ["incorrect", "right", "incorrect_right"],
+                     ["n/a", "n/a", "n/a"]],
+        "ignore_missing": true
+    }
+}]
+```
+````
+In this example there are two source columns and one destination column,
+so each entry in *map_list* must be a list with three elements:
+two source values followed by one destination value.
+Since all the values in *map_list* are strings,
+the optional *integer_sources* list is not needed.
+
+The results of executing the previous *remap_columns* operation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor) are:
+
+````{admonition} Mapping columns *response_accuracy* and *response_hand* into a *response_type* column.
+
+| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex | response_type |
+| ----- | -------- | ---------- | ---------- | ----------------- | ------------- | ----------------- | --- | ------------------- |
+| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female | correct_right |
+| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female | correct_right |
+| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female | correct_right |
+| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female | n/a |
+| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male | correct_left |
+| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male | correct_left |
+````
+
+In this example, *remap_columns* combines the values from columns `response_accuracy` and
+`response_hand` to produce a new column called `response_type` that specifies both response hand and correctness information using a single code.
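+
+Conceptually, *remap_columns* builds a dictionary keyed on tuples of source values and
+looks up each row in it. The following pandas sketch (an illustration, not the remodeler's
+implementation) shows the idea for a two-source, one-destination map like the one above.
+
+````{admonition} A pandas sketch of the *remap_columns* lookup (illustrative only).
+:class: tip
+
+```python
+import pandas as pd
+
+map_list = [["correct", "right", "correct_right"], ["correct", "left", "correct_left"],
+            ["n/a", "n/a", "n/a"]]
+# Keys are tuples of the m source values; values are the lists of n destination values.
+mapping = {tuple(entry[:2]): entry[2:] for entry in map_list}
+
+df = pd.DataFrame({"response_accuracy": ["correct", "n/a"],
+                   "response_hand": ["right", "n/a"]})
+keys = df[["response_accuracy", "response_hand"]].astype(str).apply(tuple, axis=1)
+df["response_type"] = [mapping.get(key, ["n/a"])[0] for key in keys]
+print(df)
+```
+````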
+
+(remove-columns-anchor)=
+### Remove columns
+
+Sometimes columns are added during intermediate processing steps. The *remove_columns*
+operation is useful for cleaning up unnecessary columns after these processing steps complete.
+
+(remove-columns-parameters-anchor)=
+#### Remove columns parameters
+
+```{admonition} Parameters for the *remove_columns* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *column_names* | list of str | A list of columns to remove.|
+| *ignore_missing* | bool | If true, missing columns are ignored; otherwise a `KeyError` is raised. |
+```
+
+If one of the specified columns is not in the file and the *ignore_missing*
+parameter is *false*, a `KeyError` is raised for the missing column.
+
+(remove-columns-example-anchor)=
+#### Remove columns example
+
+The following example specifies that the *remove_columns* operation should remove the `stop_signal_delay`,
+`response_accuracy`, and `face` columns from the tabular data.
+
+````{admonition} A JSON file with a single *remove_columns* transformation operation.
+:class: tip
+
+```json
+[{
+    "operation": "remove_columns",
+    "description": "Remove extra columns before the next step.",
+    "parameters": {
+        "column_names": ["stop_signal_delay", "response_accuracy", "face"],
+        "ignore_missing": true
+    }
+}]
+```
+````
+
+The results of executing this operation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor)
+are shown below.
+The *face* column is not in the data, but it is ignored, since *ignore_missing* is true.
+If *ignore_missing* had been false, a `KeyError` would have been raised.
+
+````{admonition} Results of executing the *remove_columns* operation.
+
+| onset | duration | trial_type | response_time | response_hand | sex |
+| ----- | -------- | ---------- | ------------- | ------------- | --- |
+| 0.0776 | 0.5083 | go | 0.565 | right | female |
+| 5.5774 | 0.5083 | unsuccesful_stop | 0.49 | right | female |
+| 9.5856 | 0.5084 | go | 0.45 | right | female |
+| 13.5939 | 0.5083 | succesful_stop | n/a | n/a | female |
+| 17.1021 | 0.5083 | unsuccesful_stop | 0.633 | left | male |
+| 21.6103 | 0.5083 | go | 0.443 | left | male |
+````
+
+(remove-rows-anchor)=
+### Remove rows
+
+The *remove_rows* operation eliminates rows in which the named column has one of the specified values.
+This operation is useful for removing event markers corresponding to particular types of events
+or, for example, rows having `n/a` in a particular column.
+
+
+(remove-rows-parameters-anchor)=
+#### Remove rows parameters
+
+```{admonition} Parameters for *remove_rows*.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *column_name* | str | The name of the column to be tested.|
+| *remove_values* | list | A list of values to be tested for removal. |
+```
+The operation does not raise an error if a data file does not have a column named
+*column_name* or is missing a value in *remove_values*.
+
+(remove-rows-example-anchor)=
+#### Remove rows example
+
+The following *remove_rows* operation removes the rows whose *trial_type* column
+contains either `succesful_stop` or `unsuccesful_stop`.
+
+````{admonition} A JSON file with a single *remove_rows* transformation operation.
+:class: tip
+
+```json
+[{
+    "operation": "remove_rows",
+    "description": "Remove rows where trial_type is either succesful_stop or unsuccesful_stop.",
+    "parameters": {
+        "column_name": "trial_type",
+        "remove_values": ["succesful_stop", "unsuccesful_stop"]
+    }
+}]
+```
+````
+
+The results of executing the previous *remove_rows* operation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor) are:
+
+````{admonition} The results of executing the previous *remove_rows* operation.
+
+| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex |
+| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- |
+| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female |
+| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female |
+| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male |
+````
+
+After removing rows with `trial_type` equal to `succesful_stop` or `unsuccesful_stop`, only the
+three `go` trials remain.
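+
+The row filtering itself is a one-line pandas idiom, as the following sketch
+(an illustration, not the remodeler's implementation) shows.
+
+````{admonition} A pandas sketch of the *remove_rows* filtering (illustrative only).
+:class: tip
+
+```python
+import pandas as pd
+
+df = pd.DataFrame({"onset": [0.0776, 5.5774, 13.5939],
+                   "trial_type": ["go", "unsuccesful_stop", "succesful_stop"]})
+remove_values = ["succesful_stop", "unsuccesful_stop"]
+
+# Keep only the rows whose trial_type is not in remove_values.
+df = df[~df["trial_type"].isin(remove_values)].reset_index(drop=True)
+print(df)  # only the go row remains
+```
+````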
+
+(rename-columns-anchor)=
+### Rename columns
+
+The *rename_columns* operation uses a dictionary to map old column names into new ones.
+
+(rename-columns-parameters-anchor)=
+#### Rename columns parameters
+
+```{admonition} Parameters for *rename_columns*.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *column_mapping* | dict | The keys are the old column names and the values are the new names.|
+| *ignore_missing* | bool | If false, a `KeyError` is raised if a dictionary key is not a column name. |
+
+```
+
+If *ignore_missing* is false, a `KeyError` is raised if a column specified in
+the mapping does not correspond to a column name in the data file.
+
+(rename-columns-example-anchor)=
+#### Rename columns example
+
+The following example renames the `stop_signal_delay` column to `stop_delay` and
+the `response_hand` column to `hand_used`.
+
+````{admonition} A JSON file with a single *rename_columns* transformation operation.
+:class: tip
+
+```json
+[{
+    "operation": "rename_columns",
+    "description": "Rename columns to be more descriptive.",
+    "parameters": {
+        "column_mapping": {
+            "stop_signal_delay": "stop_delay",
+            "response_hand": "hand_used"
+        },
+        "ignore_missing": true
+    }
+}]
+
+```
+````
+
+The results of executing the previous *rename_columns* operation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor) are:
+
+````{admonition} After the *rename_columns* operation is executed, the sample events file is:
+
+| onset | duration | trial_type | stop_delay | response_time | response_accuracy | hand_used | sex |
+| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- |
+| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female |
+| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female |
+| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female |
+| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female |
+| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male |
+| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male |
+````
+
+(reorder-columns-anchor)=
+### Reorder columns
+
+The *reorder_columns* operation reorders the indicated columns in the specified order.
+This operation is often used to place the most important columns near the beginning of the file for readability
+or to assure that all the data files in the dataset have the same column order.
+Additional parameters control how non-specified columns are treated.
+
+(reorder-columns-parameters-anchor)=
+#### Reorder columns parameters
+
+```{admonition} Parameters for the *reorder_columns* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *column_order* | list | A list of columns in the order they should appear in the data.|
+| *ignore_missing* | bool | Controls handling of column names in the reorder list that aren't in the data. |
+| *keep_others* | bool | Controls handling of columns not in the reorder list. |
+
+```
+
+If *ignore_missing* is true
+and items in the reorder list do not exist in the file, the missing columns are ignored.
+On the other hand, if *ignore_missing* is false,
+a column name in the reorder list that is missing from the data raises a *ValueError*.
+
+The *keep_others* parameter controls whether columns in the data that
+do not appear in the *column_order* list are dropped (*keep_others* is false) or
+put at the end in the relative order that they appear in the file (*keep_others* is true).
+
+BIDS event files are required to have `onset` and `duration` as the first and second columns, respectively.
+
+(reorder-columns-example-anchor)=
+#### Reorder columns example
+
+The *reorder_columns* operation in the following example specifies that the first four
+columns of the dataset should be: `onset`, `duration`, `response_time`, and `trial_type`.
+Since *keep_others* is false, these will be the only columns retained.
+
+````{admonition} A JSON file with a single *reorder_columns* transformation operation.
+:class: tip
+
+```json
+[{
+    "operation": "reorder_columns",
+    "description": "Reorder columns.",
+    "parameters": {
+        "column_order": ["onset", "duration", "response_time", "trial_type"],
+        "ignore_missing": true,
+        "keep_others": false
+    }
+}]
+```
+````
+
+
+The results of executing the previous *reorder_columns* transformation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor) are:
+
+````{admonition} Results of *reorder_columns*.
+
+| onset | duration | response_time | trial_type |
+| ----- | -------- | ---------- | ------------- |
+| 0.0776 | 0.5083 | 0.565 | go |
+| 5.5774 | 0.5083 | 0.49 | unsuccesful_stop |
+| 9.5856 | 0.5084 | 0.45 | go |
+| 13.5939 | 0.5083 | n/a | succesful_stop |
+| 17.1021 | 0.5083 | 0.633 | unsuccesful_stop |
+| 21.6103 | 0.5083 | 0.443 | go |
+````
+
+(split-rows-anchor)=
+### Split rows
+
+The *split_rows* operation
+is often used to convert event files from trial-level encoding to event-level encoding.
+This operation is meant only for tabular files that have `onset` and `duration` columns.
+
+In **trial-level** encoding, all the events in a single trial
+(usually some variation of the cue-stimulus-response-feedback-ready sequence)
+are represented by a single row in the data file.
+Often, the onset corresponds to the presentation of the stimulus,
+and the other events are not reported or are only implicitly reported.
+
+In **event-level** encoding, each row represents the temporal marker for a single event.
+In this case a trial consists of a sequence of multiple events.
+
+
+(split-rows-parameters-anchor)=
+#### Split rows parameters
+
+```{admonition} Parameters for the *split_rows* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *anchor_column* | str | The name of the column that will hold the codes of the new events.|
+| *new_events* | dict | Dictionary whose keys are the codes to be inserted as new events in the *anchor_column* and whose values are dictionaries with keys *onset_source*, *duration*, and *copy_columns* (**Optional**). |
+| *remove_parent_event* | bool | If true, remove the parent event after splitting. |
+
+```
+
+The *split_rows* operation requires an *anchor_column*, which could be an existing
+column or a new column to be appended to the data.
+The purpose of the *anchor_column* is to hold the codes for the new events.
+
+The *new_events* dictionary has the new events to be created.
+The keys are the new event codes to be inserted into the *anchor_column*.
+The values in *new_events* are themselves dictionaries.
+Each of these dictionaries has three keys:
+
+- *onset_source* is a list of items to be added to the *onset*
+of the event row being split to produce the `onset` column value for the new event. These items can be any combination of numerical values and column names.
+- *duration* is a list of numerical values and/or column names whose values are to be added
+to compute the `duration` column value for the new event.
+- *copy_columns* is a list of column names whose values should be copied into each new event.
+Unlisted columns are filled with `n/a`.
+
+
+The *split_rows* operation sorts the split rows by the `onset` column and raises a `TypeError`
+if the `onset` and `duration` are improperly defined.
+The `onset` column is converted to numeric values as part of the splitting process.
+
+(split-rows-example-anchor)=
+#### Split rows example
+
+The *split_rows* operation in the following example specifies that new rows should be added
+to encode the response and stop signal. The anchor column is `trial_type`.
+
+
+````{admonition} A JSON file with a single *split_rows* transformation operation.
+:class: tip
+
+```json
+[{
+    "operation": "split_rows",
+    "description": "Add response events to the trials.",
+    "parameters": {
+        "anchor_column": "trial_type",
+        "new_events": {
+            "response": {
+                "onset_source": ["response_time"],
+                "duration": [0],
+                "copy_columns": ["response_accuracy", "response_hand", "sex", "trial_number"]
+            },
+            "stop_signal": {
+                "onset_source": ["stop_signal_delay"],
+                "duration": [0.5],
+                "copy_columns": ["trial_number"]
+            }
+        },
+        "remove_parent_event": false
+    }
+}]
+```
+````
+
+The results of executing this *split_rows* operation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor) are:
+
+````{admonition} Results of the previous *split_rows* operation.
+
+| onset | duration | trial_type | stop_signal_delay | response_time | response_accuracy | response_hand | sex |
+| ----- | -------- | ---------- | ----------------- | ------------- | ----------------- | ------------- | --- |
+| 0.0776 | 0.5083 | go | n/a | 0.565 | correct | right | female |
+| 0.6426 | 0 | response | n/a | n/a | correct | right | female |
+| 5.5774 | 0.5083 | unsuccesful_stop | 0.2 | 0.49 | correct | right | female |
+| 5.7774 | 0.5 | stop_signal | n/a | n/a | n/a | n/a | n/a |
+| 6.0674 | 0 | response | n/a | n/a | correct | right | female |
+| 9.5856 | 0.5084 | go | n/a | 0.45 | correct | right | female |
+| 10.0356 | 0 | response | n/a | n/a | correct | right | female |
+| 13.5939 | 0.5083 | succesful_stop | 0.2 | n/a | n/a | n/a | female |
+| 13.7939 | 0.5 | stop_signal | n/a | n/a | n/a | n/a | n/a |
+| 17.1021 | 0.5083 | unsuccesful_stop | 0.25 | 0.633 | correct | left | male |
+| 17.3521 | 0.5 | stop_signal | n/a | n/a | n/a | n/a | n/a |
+| 17.7351 | 0 | response | n/a | n/a | correct | left | male |
+| 21.6103 | 0.5083 | go | n/a | 0.443 | correct | left | male |
+| 22.0533 | 0 | response | n/a | n/a | correct | left | male |
+````
+
+In a full processing example, it might make sense to rename `trial_type` to be
+`event_type` and to delete the `response_time` and the `stop_signal_delay` columns,
+since these items have been unfolded into separate events.
+This could be accomplished in subsequent clean-up operations.
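+
+The heart of *split_rows* is the construction of the new rows from each parent row.
+The following simplified sketch (an illustration under our own helper names, not the
+remodeler's implementation; `n/a` handling is omitted) reproduces the `response` and
+`stop_signal` rows generated from the second event of the example above.
+
+````{admonition} A pandas sketch of the *split_rows* row construction (illustrative only).
+:class: tip
+
+```python
+import pandas as pd
+
+def resolve(row, item):
+    # item is either a numeric offset or the name of a column holding a number
+    return float(row[item]) if isinstance(item, str) else float(item)
+
+def split_row(row, anchor_column, new_events):
+    """Build the extra rows generated from one parent row."""
+    extras = []
+    for code, spec in new_events.items():
+        new_row = {"onset": row["onset"] + sum(resolve(row, i) for i in spec["onset_source"]),
+                   "duration": sum(resolve(row, i) for i in spec["duration"]),
+                   anchor_column: code}
+        for column in spec.get("copy_columns", []):
+            new_row[column] = row[column]
+        extras.append(new_row)
+    return extras
+
+row = pd.Series({"onset": 5.5774, "duration": 0.5083, "trial_type": "unsuccesful_stop",
+                 "stop_signal_delay": 0.2, "response_time": 0.49, "response_hand": "right"})
+new_events = {"response": {"onset_source": ["response_time"], "duration": [0],
+                           "copy_columns": ["response_hand"]},
+              "stop_signal": {"onset_source": ["stop_signal_delay"], "duration": [0.5]}}
+new_rows = pd.DataFrame([row.to_dict()] + split_row(row, "trial_type", new_events))
+print(new_rows.sort_values("onset"))  # onsets 5.5774, 5.7774, 6.0674
+```
+````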
+
+(remodel-summarizations-anchor)=
+## Remodel summarizations
+
+Summarizations differ from transformations in two respects: they do not modify the input data file,
+and they keep information about the results from each file that has been processed.
+Summarization operations may be used at several points in the operation list as checkpoints
+during debugging as well as for their more typical informational uses.
+
+All summary operations have two required parameters: *summary_name* and *summary_filename*.
+
+The *summary_name* is the unique key used to identify the
+particular incarnation of this summary in the dispatcher.
+Care should be taken to make sure that the *summary_name* is unique within
+a given JSON remodeling file if the same summary operation is used more than
+once within the file (e.g., for before and after summary information).
+
+The *summary_filename* should also be unique and is used for saving the summary upon request.
+When the remodeler is applied to full datasets rather than single files,
+the summaries are saved in the `derivatives/remodel/summaries` directory under the dataset root.
+A time stamp and file extension are appended to the *summary_filename* when the
+summary is saved.
+
+(summarize-column-names-anchor)=
+### Summarize column names
+
+The *summarize_column_names* operation tracks the unique column name patterns found in data files across
+the dataset and which files have these column names.
+This summary is useful for determining whether there are any non-conforming data files.
+
+Often event files associated with different tasks have different column names,
+and this summary can be used to verify that the files corresponding to the same task
+have the same column names.
+
+A more problematic issue is when some event files for the same task
+have reordered column names or use different column names.
+
+(summarize-columns-names-parameters-anchor)=
+#### Summarize column names parameters
+
+The *summarize_column_names* operation has no operation-specific parameters;
+it requires only the *summary_name* and *summary_filename* common to all summaries,
+along with the optional *append_timecode*.
+
+```{admonition} Parameters for the *summarize_column_names* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *summary_name* | str | A unique name used to identify this summary.|
+| *summary_filename* | str | A unique file basename to use for saving this summary. |
+| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to the filename. |
+```
+
+(summarize-column-names-example-anchor)=
+#### Summarize column names example
+
+The following example remodeling file produces a summary, which when saved
+will appear with file name `AOMIC_column_names_xxx.txt` or
+`AOMIC_column_names_xxx.json` where `xxx` is a timestamp.
+
+````{admonition} A JSON file with a single *summarize_column_names* summarization operation.
+:class: tip
+```json
+[{
+    "operation": "summarize_column_names",
+    "description": "Summarize column names.",
+    "parameters": {
+        "summary_name": "AOMIC_column_names",
+        "summary_filename": "AOMIC_column_names"
+    }
+}]
+```
+````
+
+When this operation is applied to the [**sample remodel event file**](sample-remodel-event-file-anchor),
+the following text summary is produced.
+
+````{admonition} Result of applying *summarize_column_names* to the sample remodel file.
+:class: tip
+
+```text
+
+Summary name: AOMIC_column_names
+Summary type: column_names
+Summary filename: AOMIC_column_names
+
+Summary details:
+
+Dataset: Number of files=1
+   Columns: ['onset', 'duration', 'trial_type', 'stop_signal_delay', 'response_time', 'response_accuracy', 'response_hand', 'sex']
+      sub-0013_task-stopsignal_acq-seq_events.tsv
+
+Individual files:
+
+sub-0013_task-stopsignal_acq-seq_events.tsv:
+   ['onset', 'duration', 'trial_type', 'stop_signal_delay', 'response_time', 'response_accuracy', 'response_hand', 'sex']
+
+```
+````
+
+Since we are only summarizing one event file, there is only one unique pattern -- corresponding
+to the columns: *onset*, *duration*, *trial_type*, *stop_signal_delay*, *response_time*, *response_accuracy*, *response_hand*, and *sex*.
+
+When the dataset has multiple column name patterns, the summary lists each unique pattern separately along
+with the names of the data files that have this pattern.
+
+The JSON version of the summary is useful for programmatic manipulation,
+while the text version shown above is more readable.
+
+
+(summarize-column-values-anchor)=
+### Summarize column values
+
+The *summarize_column_values* operation provides a summary of the number of times various
+column values appear in event files across the dataset.
+
+
+(summarize-columns-values-parameters-anchor)=
+#### Summarize column values parameters
+
+The following table lists the parameters required for using the summary.
+
+```{admonition} Parameters for the *summarize_column_values* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *summary_name* | str | A unique name used to identify this summary.|
+| *summary_filename* | str | A unique file basename to use for saving this summary. |
+| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to the filename. |
+| *max_categorical* | int | (**Optional**: Default 50) The text summary shows at most *max_categorical* unique values per column. |
+| *skip_columns* | list | (**Optional**) A list of column names to omit from the summary.|
+| *value_columns* | list | (**Optional**) A list of columns for which individual unique values are not listed. |
+| *values_per_line* | int | (**Optional**: Default 5) The text summary displays this number of values per line.|
+
+```
+
+In addition to the standard parameters, *summary_name* and *summary_filename*, required of all summaries,
+the *summarize_column_values* operation supports two additional lists.
+The *skip_columns* list specifies the names of columns to skip entirely in the summary.
+Typically, the `onset`, `duration`, and `sample` columns are skipped, since they have unique values for
+each row, and summarizing them provides little information.
+
+The *summarize_column_values* operation is mainly meant for creating summary information about columns
+containing a finite number of distinct values.
+Columns that contain numeric information will usually have distinct entries for
+each row in a tabular file and are not amenable to such summarization.
+These columns could be specified as *skip_columns*, but another option is to
+designate them as *value_columns*. The *value_columns* are reported in the summary,
+but their distinct values are not reported individually.
+
+For datasets that include multiple tasks, the event values for each task may be distinct.
+The *summarize_column_values* operation does not separate by task, but expects the
+calling program to filter the files by task as desired.
+The `run_remodel` program supports selecting files corresponding to a particular task.
+
+Two additional optional parameters are available for specifying aspects of the text summary output.
+The *max_categorical* parameter specifies how many unique values should be displayed
+for each column. The *values_per_line* parameter controls how many categorical column values (with counts)
+are displayed on each line of the output. By default, 5 values are displayed.
+
+(summarize-column-values-example-anchor)=
+#### Summarize column values example
+
+The following example shows the JSON for including this operation in a remodeling file.
+
+````{admonition} A JSON file with a single *summarize_column_values* summarization operation.
+:class: tip
+```json
+[{
+    "operation": "summarize_column_values",
+    "description": "Summarize the column values in an excerpt.",
+    "parameters": {
+        "summary_name": "AOMIC_column_values",
+        "summary_filename": "AOMIC_column_values",
+        "skip_columns": ["onset", "duration"],
+        "value_columns": ["response_time", "stop_signal_delay"]
+    }
+}]
+```
+````
+
+A text format summary of the results of executing this operation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor)
+is shown in the following example.
+
+````{admonition} Sample *summarize_column_values* operation results in text format.
+:class: tip
+```text
+Summary name: AOMIC_column_values
+Summary type: column_values
+Summary filename: AOMIC_column_values
+
+Overall summary:
+Dataset: Total events=6 Total files=1
+   Categorical column values[Events, Files]:
+      response_accuracy:
+         correct[5, 1] n/a[1, 1]
+      response_hand:
+         left[2, 1] n/a[1, 1] right[3, 1]
+      sex:
+         female[4, 1] male[2, 1]
+      trial_type:
+         go[3, 1] succesful_stop[1, 1] unsuccesful_stop[2, 1]
+   Value columns[Events, Files]:
+      response_time[6, 1]
+      stop_signal_delay[6, 1]
+
+Individual files:
+
+sub-0013_task-stopsignal_acq-seq_events.tsv:
+Total events=6
+   Categorical column values[Events, Files]:
+      response_accuracy:
+         correct[5, 1] n/a[1, 1]
+      response_hand:
+         left[2, 1] n/a[1, 1] right[3, 1]
+      sex:
+         female[4, 1] male[2, 1]
+      trial_type:
+         go[3, 1] succesful_stop[1, 1] unsuccesful_stop[2, 1]
+   Value columns[Events, Files]:
+      response_time[6, 1]
+      stop_signal_delay[6, 1]
+```
+````
+
+Because the [**sample remodel event file**](sample-remodel-event-file-anchor)
+only has 6 events, we expect that no value will be represented in more than 6 events.
+The column names corresponding to value columns just have the event counts in them.
+
+Because this command was executed with the `-i` option of `run_remodel`,
+results from the individual data files are shown after the overall summary.
+The individual results are similar to the overall summary because only one data file
+was processed.
+
+For a more extensive example see the
+[**text**](./_static/data/summaries/FacePerception_column_values_summary.txt)
+and [**JSON**](./_static/data/summaries/FacePerception_column_values_summary.json)
+format summaries of the sample dataset
+[**ds003645s_hed**](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003645s_hed)
+using the [**summarize_columns_rmdl.json**](./_static/data/summaries/summarize_columns_rmdl.json)
+remodeling file.
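+
+If you just want a quick look at the categorical values in a single event file,
+the core counting that this summary performs can be approximated with pandas,
+as in the following sketch (an illustration, not the remodeler's implementation;
+it assumes the sample event file linked above is in the working directory).
+
+````{admonition} Approximating a column value summary for one file with pandas (sketch).
+:class: tip
+
+```python
+import pandas as pd
+
+# keep_default_na=False keeps "n/a" entries as ordinary string values.
+df = pd.read_csv("sub-0013_task-stopsignal_acq-seq_events.tsv", sep="\t",
+                 keep_default_na=False)
+skip_columns = ["onset", "duration"]
+value_columns = ["response_time", "stop_signal_delay"]
+
+for column in df.columns:
+    if column in skip_columns:
+        continue
+    if column in value_columns:
+        print(f"{column}[{len(df)} events]")                 # value columns: count only
+    else:
+        print(column, df[column].value_counts().to_dict())   # per-value counts
+```
+````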
+:class: tip
+```text
+Summary name: AOMIC_column_values
+Summary type: column_values
+Summary filename: AOMIC_column_values
+
+Overall summary:
+Dataset: Total events=6 Total files=1
+   Categorical column values[Events, Files]:
+      response_accuracy:
+         correct[5, 1] n/a[1, 1]
+      response_hand:
+         left[2, 1] n/a[1, 1] right[3, 1]
+      sex:
+         female[4, 1] male[2, 1]
+      trial_type:
+         go[3, 1] succesful_stop[1, 1] unsuccesful_stop[2, 1]
+   Value columns[Events, Files]:
+      response_time[6, 1]
+      stop_signal_delay[6, 1]
+
+Individual files:
+
+sub-0013_task-stopsignal_acq-seq_events.tsv:
+Total events=6
+   Categorical column values[Events, Files]:
+      response_accuracy:
+         correct[5, 1] n/a[1, 1]
+      response_hand:
+         left[2, 1] n/a[1, 1] right[3, 1]
+      sex:
+         female[4, 1] male[2, 1]
+      trial_type:
+         go[3, 1] succesful_stop[1, 1] unsuccesful_stop[2, 1]
+   Value columns[Events, Files]:
+      response_time[6, 1]
+      stop_signal_delay[6, 1]
+```
+````
+
+Because the [**sample remodel event file**](sample-remodel-event-file-anchor)
+only has 6 events, we expect that no value will be represented in more than 6 events.
+The column names corresponding to value columns just have the event counts in them.
+
+Since this command was executed with the `-i` option in `run_remodel`,
+results from the individual data files are shown after the overall summary.
+The individual results are similar to the overall summary because only one data file
+was processed.
+
+For a more extensive example, see the
+[**text**](./_static/data/summaries/FacePerception_column_values_summary.txt)
+and [**JSON**](./_static/data/summaries/FacePerception_column_values_summary.json)
+format summaries of the sample dataset
+[**ds003645s_hed**](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003645s_hed)
+using the [**summarize_columns_rmdl.json**](./_static/data/summaries/summarize_columns_rmdl.json)
+remodeling file.
+
+
+(summarize-definitions-anchor)=
+### Summarize definitions
+
+The summarize definitions operation provides a summary of the `Def-expand` tags found across the dataset,
+noting any ambiguous or erroneous ones. If working on a BIDS dataset, it will initialize with the known definitions
+from the sidecar, reporting any deviations from the known definitions as errors.
+
+(summarize-definitions-parameters-anchor)=
+#### Summarize definitions parameters
+
+**NOTE: This summary is still under development.**
+The following table lists the parameters required for using the summary.
+
+```{admonition} Parameters for the *summarize_definitions* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *summary_name* | str | A unique name used to identify this summary.|
+| *summary_filename* | str | A unique file basename to use for saving this summary. |
+| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to filename. |
+```
+
+The *summarize_definitions* operation is mainly meant for verifying the consistency of unknown `Def-expand` tags.
+This situation comes up when you have an assembled dataset but no longer have the definitions
+stored (or never created them in the first place).
+
+
+(summarize-definitions-example-anchor)=
+#### Summarize definitions example
+
+The following example shows the JSON for including this operation in a remodeling file.
+
+````{admonition} A JSON file with a single *summarize_definitions* summarization operation.
+:class: tip
+```json
+[{
+    "operation": "summarize_definitions",
+    "description": "Summarize the definitions used in this dataset.",
+    "parameters": {
+        "summary_name": "HED_column_definition_summary",
+        "summary_filename": "HED_column_definition_summary"
+    }
+}]
+```
+````
+
+A text format summary of the results of executing this operation on the
+[**sub-003_task-FacePerception_run-3_events.tsv**](_static/data/sub-003_task-FacePerception_run-3_events.tsv) file
+of the [**eeg_ds003645s_hed_column**](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003645s_hed_column) dataset is shown in the following example.
+
+````{admonition} Sample *summarize_definitions* operation results in text format.
+:class: tip
+```text
+Summary name: HED_column_definition_summary
+Summary type: definitions
+Summary filename: HED_column_definition_summary
+
+Overall summary:
+   Known Definitions: 17 items
+      cross-only: 2 items
+         description: A white fixation cross on a black background in the center of the screen.
+         contents: (Visual-presentation,(Background-view,Black),(Foreground-view,(Center-of,Computer-screen),(Cross,White)))
+      face-image: 2 items
+         description: A happy or neutral face in frontal or three-quarters frontal pose with long hair cropped presented as an achromatic foreground image on a black background with a white fixation cross superposed.
+         contents: (Visual-presentation,(Background-view,Black),(Foreground-view,((Center-of,Computer-screen),(Cross,White)),(Grayscale,(Face,Hair,Image))))
+      circle-only: 2 items
+         description: A white circle on a black background in the center of the screen.
+         contents: (Visual-presentation,(Background-view,Black),(Foreground-view,((Center-of,Computer-screen),(Circle,White))))
+      press-left-finger: 2 items
+         description: The participant presses a key with the left index finger to indicate a face symmetry judgment.
+         contents: ((Index-finger,(Experiment-participant,Left-side-of)),(Keyboard-key,Press))
+      press-right-finger: 2 items
+         description: The participant presses a key with the right index finger to indicate a face symmetry evaluation.
+         contents: ((Index-finger,(Experiment-participant,Right-side-of)),(Keyboard-key,Press))
+      famous-face-cond: 2 items
+         description: A face that should be recognized by the participants
+         contents: (Condition-variable/Face-type,(Image,(Face,Famous)))
+      unfamiliar-face-cond: 2 items
+         description: A face that should not be recognized by the participants.
+         contents: (Condition-variable/Face-type,(Image,(Face,Unfamiliar)))
+      scrambled-face-cond: 2 items
+         description: A scrambled face image generated by taking face 2D FFT.
+         contents: (Condition-variable/Face-type,(Image,(Disordered,Face)))
+      first-show-cond: 2 items
+         description: Factor level indicating the first display of this face.
+         contents: ((Condition-variable/Repetition-type,Item-interval/0,(Face,Item-count/1)))
+      immediate-repeat-cond: 2 items
+         description: Factor level indicating this face was the same as previous one.
+         contents: ((Condition-variable/Repetition-type,Item-interval/1,(Face,Item-count/2)))
+      delayed-repeat-cond: 2 items
+         description: Factor level indicating face was seen 5 to 15 trials ago.
+         contents: (Condition-variable/Repetition-type,(Face,Item-count/2),(Item-interval,(Greater-than-or-equal-to,Item-interval/5)))
+      left-sym-cond: 2 items
+         description: Left index finger key press indicates a face with above average symmetry.
+         contents: (Condition-variable/Key-assignment,((Asymmetrical,Behavioral-evidence),(Index-finger,(Experiment-participant,Right-side-of))),((Behavioral-evidence,Symmetrical),(Index-finger,(Experiment-participant,Left-side-of))))
+      right-sym-cond: 2 items
+         description: Right index finger key press indicates a face with above average symmetry.
+         contents: (Condition-variable/Key-assignment,((Asymmetrical,Behavioral-evidence),(Index-finger,(Experiment-participant,Left-side-of))),((Behavioral-evidence,Symmetrical),(Index-finger,(Experiment-participant,Right-side-of))))
+      face-symmetry-evaluation-task: 2 items
+         description: Evaluate degree of image symmetry and respond with key press evaluation.
+         contents: (Experiment-participant,Task,(Discriminate,(Face,Symmetrical)),(Face,See),(Keyboard-key,Press))
+      blink-inhibition-task: 2 items
+         description: Do not blink while the face image is displayed.
+         contents: (Experiment-participant,Inhibit-blinks,Task)
+      fixation-task: 2 items
+         description: Fixate on the cross at the screen center.
+         contents: (Experiment-participant,Task,(Cross,Fixate))
+      initialize-recording: 2 items
+         description:
+         contents: (Recording)
+   Ambiguous Definitions: 0 items
+
+   Errors: 0 items
+```
+````
+
+Since this file didn't have any ambiguous or incorrect `Def-expand` groups, those sections are empty.
+Ambiguous definitions are those that take a placeholder value but do not provide enough information
+to determine to which tag the placeholder applies.
+Erroneous definitions are those with conflicting expanded forms.
+
+Currently, summaries are not generated for individual files,
+but this is likely to change in the future.
+
+Below is a simple example showing the format when erroneous or ambiguous definitions are found.
+
+````{admonition} Sample input for *summarize_definitions* operation documenting ambiguous/erroneous definitions.
+:class: tip
+```text
+((Def-expand/Initialize-recording,(Recording)),Onset)
+((Def-expand/Initialize-recording,(Recording, Event)),Onset)
+(Def-expand/Specify-age/1,(Age/1, Item-count/1))
+```
+````
+
+````{admonition} Sample *summarize_definitions* operation error results in text format.
+:class: tip
+```text
+Summary name: HED_column_definition_summary
+Summary type: definitions
+Summary filename: HED_column_definition_summary
+
+Overall summary:
+   Known Definitions: 1 items
+      initialize-recording: 2 items
+         description:
+         contents: (Recording)
+   Ambiguous Definitions: 1 items
+      specify-age/#: (Age/#,Item-count/#)
+   Errors: 1 items
+      initialize-recording:
+         (Event,Recording)
+```
+````
+
+It is assumed that the first definition encountered is the correct definition, unless the first one is ambiguous.
+Thus, the summary finds (`Def-expand/Initialize-recording`,(`Recording`)) and considers it valid, before encountering
+(`Def-expand/Initialize-recording`,(`Recording`, `Event`)), which is then deemed an error.
+
+
+(summarize-hed-tags-anchor)=
+### Summarize HED tags
+
+The *summarize_hed_tags* operation extracts a summary of the HED tags present
+in the annotations of a dataset.
+This summary operation assumes that the structure in question is suitably
+annotated with HED (Hierarchical Event Descriptors).
+You must provide a HED schema version.
+If the data has annotations in a JSON sidecar, you must also provide its path.
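+
+The following sketch shows one plausible way to supply these inputs when running
+this summary programmatically from Python rather than through `run_remodel`.
+It is a minimal illustration under stated assumptions, not a definitive API reference:
+the `Dispatcher` keyword `hed_versions`, the `run_operations` signature, and the file
+names are assumptions patterned on the descriptions elsewhere in this guide.
+
+````{admonition} A minimal Python sketch of running *summarize_hed_tags* (assumed API).
+:class: tip
+```python
+from hed.tools.remodeling.dispatcher import Dispatcher
+
+# The in-memory (Python dictionary) equivalent of a JSON remodeling file
+# containing a single summarize_hed_tags operation.
+op_list = [{
+    "operation": "summarize_hed_tags",
+    "description": "Count the HED tags in one event file.",
+    "parameters": {
+        "summary_name": "tag_counts",
+        "summary_filename": "tag_counts",
+        "tags": {"Sensory events": ["Sensory-event"]}
+    }
+}]
+
+# HED operations need a schema version; the sidecar supplies the annotations.
+dispatch = Dispatcher(op_list, hed_versions=["8.1.0"])
+df = dispatch.run_operations("sub-0013_task-stopsignal_acq-seq_events.tsv",
+                             sidecar="task-stopsignal_acq-seq_events.json")
+```
+````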
+
+(summarize-hed-tags-parameters-anchor)=
+#### Summarize HED tags parameters
+
+The *summarize_hed_tags* operation has one required parameter
+(*tags*) in addition to the standard *summary_name* and *summary_filename* parameters.
+
+```{admonition} Parameters for the *summarize_hed_tags* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *summary_name* | str | A unique name used to identify this summary.|
+| *summary_filename* | str | A unique file basename to use for saving this summary. |
+| *tags* | dict | Dictionary with category title keys and tags in that category as values. |
+| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to filename. |
+| *include_context* | bool | (**Optional**: Default true) If true, expand the event context to
account for onsets and offsets. |
+| *remove_types* | list | (**Optional**) A list of types such as
`Condition-variable` and `Task` to remove. |
+| *replace_defs* | bool | (**Optional**: Default true) If true, the `Def` tags are replaced with the
contents of the definition (no `Def` or `Def-expand`). |
+| *word_cloud* | dict | (**Optional**) If present, the operation produces a
word cloud image in addition to the summaries. |
+```
+
+The *tags* dictionary has keys that specify how the user wishes the tags
+to be categorized for display.
+Note that these keys are titles designating display categories, not HED tags.
+
+The *tags* dictionary values are lists of actual HED tags (or their children)
+that should be listed under the respective display categories.
+
+If the optional parameter *include_context* is true, the counts include tags contributing
+to the event context in events intermediate between onsets and offsets.
+
+If the optional parameter *replace_defs* is true, the tag counts include
+tags contributed by the contents of the definitions.
+
+If the *word_cloud* parameter is provided but its value is empty, the default word cloud settings are used.
+The following table lists the optional parameters used to control the appearance of the word cloud image.
+
+```{admonition} Optional keys in the word cloud dictionary value.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *background_color* | str | The matplotlib name of the background color (default "black").|
+| *contour_color* | str | The matplotlib name of the contour color if mask provided. |
+| *contour_width* | float | Width of contour if mask provided (default 3). |
+| *font_path* | str | The path of the system font to use in place of the default font. |
+| *height* | int | Height in pixels of the image (default 300).|
+| *mask_path* | str | The path of the mask image to use if *use_mask* is true
and an image other than the brain is needed. |
+| *max_font_size* | float | The maximum font size to use in the image (default 15). |
+| *min_font_size* | float | The minimum font size to use in the image (default 8).|
+| *prefer_horizontal* | float | Fraction of horizontal words in image (default 0.75). |
+| *scale_adjustment* | float | Constant to add to log10 count transformation (default 7). |
+| *use_mask* | bool | If true, a mask image is used to provide a contour around the words. |
+| *width* | int | Width in pixels of image (default 400). |
+```
+
+(summarize-hed-tags-example-anchor)=
+#### Summarize HED tags example
+
+The following remodeling command specifies that the tag counts should be grouped
+under the titles: *Sensory events*, *Agent actions*, and *Objects*.
+Any leftover tags will appear under the title "Other tags".
+
+````{admonition} A JSON file with a single *summarize_hed_tags* summarization operation.
+:class: tip
+```json
+[{
+    "operation": "summarize_hed_tags",
+    "description": "Summarize the HED tags in the dataset.",
+    "parameters": {
+        "summary_name": "summarize_hed_tags",
+        "summary_filename": "summarize_hed_tags",
+        "tags": {
+            "Sensory events": ["Sensory-event", "Sensory-presentation",
+                               "Task-stimulus-role", "Experimental-stimulus"],
+            "Agent actions": ["Agent-action", "Agent", "Action", "Agent-task-role",
+                              "Task-action-type", "Participant-response"],
+            "Objects": ["Item"]
+        }
+    }
+}]
+```
+````
+
+The results of executing this operation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor) are shown below.
+
+````{admonition} Text summary of *summarize_hed_tags* operation on the sample remodel file.
+:class: tip
+
+```text
+Summary name: summarize_hed_tags
+Summary type: hed_tag_summary
+Summary filename: summarize_hed_tags
+
+Overall summary:
+Dataset: Total events=6 Total files=1
+   Main tags[events,files]:
+      Sensory events:
+         Sensory-presentation[6,1] Visual-presentation[6,1] Auditory-presentation[3,1]
+      Agent actions:
+         Incorrect-action[2,1] Correct-action[1,1]
+      Objects:
+         Image[6,1]
+   Other tags[events,files]:
+      Label[6,1] Def[6,1] Delay[3,1]
+
+Individual files:
+
+aomic_sub-0013_excerpt_events.tsv:
+Total events=6
+   Main tags[events,files]:
+      Sensory events:
+         Sensory-presentation[6,1] Visual-presentation[6,1] Auditory-presentation[3,1]
+      Agent actions:
+         Incorrect-action[2,1] Correct-action[1,1]
+      Objects:
+         Image[6,1]
+   Other tags[events,files]:
+      Label[6,1] Def[6,1] Delay[3,1]
+
+```
+````
+
+Because the HED tag *Task-action-type* was specified in the "Agent actions" category,
+*Incorrect-action* and *Correct-action*, which are children of *Task-action-type*
+in the [**HED schema**](https://www.hedtags.org/display_hed.html),
+appear with counts in the list under this category.
+
+The sample events file had 6 events, including 1 correct action and 2 incorrect actions.
+Since only one file was processed, the information for *Dataset* was
+similar to that presented under *Individual files*.
+
+For a more extensive example, see the
+[**text**](./_static/data/summaries/FacePerception_hed_tag_summary.txt)
+and [**JSON**](./_static/data/summaries/FacePerception_hed_tag_summary.json)
+format summaries of the sample dataset
+[**ds003645s_hed**](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003645s_hed)
+using the [**summarize_hed_tags_rmdl.json**](./_static/data/summaries/summarize_hed_tags_rmdl.json)
+remodeling file.
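+
+As a usage note, the word cloud image described earlier in this section is requested by
+adding a *word_cloud* dictionary to the operation parameters. The following sketch shows
+the operation as a Python dictionary (the in-memory equivalent of the JSON form above);
+the *word_cloud* keys come from the word cloud table earlier in this section, and the
+particular values chosen are illustrative assumptions.
+
+````{admonition} A sketch of a *summarize_hed_tags* operation that also produces a word cloud.
+:class: tip
+```python
+# Equivalent Python dictionary form of the JSON operation, with a word_cloud entry.
+word_cloud_op = {
+    "operation": "summarize_hed_tags",
+    "description": "Summarize HED tags and produce a word cloud image.",
+    "parameters": {
+        "summary_name": "summarize_hed_tags_cloud",
+        "summary_filename": "summarize_hed_tags_cloud",
+        "tags": {"Objects": ["Item"]},
+        # Keys from the word cloud table; the values here are illustrative only.
+        "word_cloud": {"height": 300, "width": 400, "prefer_horizontal": 0.75}
+    }
+}
+```
+````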
+
+(summarize-hed-type-anchor)=
+### Summarize HED type
+
+The *summarize_hed_type* operation is designed to extract experimental design matrices or other
+experimental structure.
+This summary operation assumes that the structure in question is suitably
+annotated with HED (Hierarchical Event Descriptors).
+The [**HED conditions and design matrices**](https://hed-examples.readthedocs.io/en/latest/HedConditionsAndDesignMatrices.html)
+guide explains how this works.
+
+(summarize-hed-type-parameters-anchor)=
+#### Summarize HED type parameters
+
+The *summarize_hed_type* operation provides detailed information about a specified tag,
+usually `Condition-variable` or `Task`.
+This summary provides useful information about experimental design.
+
+```{admonition} Parameters for the *summarize_hed_type* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *summary_name* | str | A unique name used to identify this summary.|
+| *summary_filename* | str | A unique file basename to use for saving this summary. |
+| *type_tag* | str | Tag to produce a summary for (most often *condition-variable*).|
+| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to filename.|
+```
+In addition to the two standard parameters (*summary_name* and *summary_filename*),
+the *type_tag* parameter is required.
+Only one tag can be given, so you must provide a separate operation in the remodel file
+for each type tag.
+
+(summarize-hed-type-example-anchor)=
+#### Summarize HED type example
+
+````{admonition} A JSON file with a single *summarize_hed_type* summarization operation.
+:class: tip
+```json
+[{
+    "operation": "summarize_hed_type",
+    "description": "Summarize the condition variables.",
+    "parameters": {
+        "summary_name": "AOMIC_condition_variables",
+        "summary_filename": "AOMIC_condition_variables",
+        "type_tag": "condition-variable"
+    }
+}]
+```
+````
+
+The results of executing this operation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor) are shown below.
+
+````{admonition} Text summary of *summarize_hed_type* operation on the sample remodel file.
+:class: tip
+
+```text
+Summary name: AOMIC_condition_variables
+Summary type: hed_type_summary
+Summary filename: AOMIC_condition_variables
+
+Overall summary:
+
+Dataset: Type=condition-variable Type values=1 Total events=6 Total files=1
+   image-sex: 2 levels in 6 event(s) out of 6 total events in 1 file(s)
+      female-image-cond [4,1]: ['Female', 'Image', 'Face']
+      male-image-cond [2,1]: ['Male', 'Image', 'Face']
+
+Individual files:
+
+aomic_sub-0013_excerpt_events.tsv:
+Type=condition-variable Total events=6
+   image-sex: 2 levels in 6 events
+      female-image-cond [4 events, 1 files]:
+         Tags: ['Female', 'Image', 'Face']
+      male-image-cond [2 events, 1 files]:
+         Tags: ['Male', 'Image', 'Face']
+```
+````
+
+Because *summarize_hed_type* is a HED operation,
+a HED schema version is required, and a JSON sidecar is also usually needed.
+This summary was produced by using `hed_version="8.1.0"` when creating the `dispatcher`
+and using the [**sample remodel sidecar file**](sample-remodel-sidecar-file-anchor) in the `do_op`.
+The sidecar provides the annotations that use the `condition-variable` tag in the summary.
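+
+When this operation is run through the `Dispatcher`, the accumulated summary object can
+be retrieved afterward from the dispatcher's `summary_dict`, whose role is described in
+the implementation sections below. The following minimal sketch assumes the same
+`Dispatcher` usage as the earlier sketch in this guide (the keyword `hed_versions`, the
+`run_operations` signature, and the sidecar file name are assumptions for illustration).
+
+````{admonition} A sketch of retrieving the HED type summary after running the operation.
+:class: tip
+```python
+from hed.tools.remodeling.dispatcher import Dispatcher
+
+op_list = [{
+    "operation": "summarize_hed_type",
+    "description": "Summarize the condition variables.",
+    "parameters": {
+        "summary_name": "AOMIC_condition_variables",
+        "summary_filename": "AOMIC_condition_variables",
+        "type_tag": "condition-variable"
+    }
+}]
+dispatch = Dispatcher(op_list, hed_versions=["8.1.0"])
+df = dispatch.run_operations("aomic_sub-0013_excerpt_events.tsv",
+                             sidecar="aomic_sub-0013_excerpt_events.json")
+
+# Summaries accumulate across files; fetch this one by its summary_name.
+summary = dispatch.summary_dict["AOMIC_condition_variables"]
+details = summary.get_summary_details(verbose=True)  # summary-specific dictionary
+```
+````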
+
+For a more extensive example, see the
+[**text**](./_static/data/summaries/FacePerception_hed_type_summary.txt)
+and [**JSON**](./_static/data/summaries/FacePerception_hed_type_summary.json)
+format summaries of the sample dataset
+[**ds003645s_hed**](https://github.com/hed-standard/hed-examples/tree/main/datasets/eeg_ds003645s_hed)
+using the [**summarize_hed_types_rmdl.json**](./_static/data/summaries/summarize_hed_types_rmdl.json)
+remodeling file.
+
+(summarize-hed-validation-anchor)=
+### Summarize HED validation
+
+The *summarize_hed_validation* operation runs the HED validator on the requested data
+and produces a summary of the errors.
+See the [**HED validation guide**](./HedValidationGuide.md) for available methods of
+running the HED validator.
+
+
+(summarize-hed-validation-parameters-anchor)=
+#### Summarize HED validation parameters
+
+In addition to the required *summary_name* and *summary_filename* parameters,
+the *summarize_hed_validation* operation has an optional boolean parameter *check_for_warnings*
+(default false).
+If *check_for_warnings* is false, the summary will not report warnings.
+
+```{admonition} Parameters for the *summarize_hed_validation* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *summary_name* | str | A unique name used to identify this summary.|
+| *summary_filename* | str | A unique file basename to use for saving this summary. |
+| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to filename. |
+| *check_for_warnings* | bool | (**Optional**: Default false) If true, warnings are reported in addition to errors. |
+```
+The *summarize_hed_validation* operation is a HED operation, so the calling program must provide a HED schema version
+and usually a JSON sidecar containing the HED annotations.
+
+The validation process takes place in two stages: first, the JSON sidecar is validated.
+This strategy is used because a single error in the JSON sidecar can generate an error message
+for every line in the corresponding data file.
+
+If the JSON sidecar has errors (warnings don't count), the validation process is terminated
+without validation of the data file and assembled HED annotations.
+
+If the JSON sidecar does not have errors,
+the validator assembles the annotations for each line in the data files and validates
+the assembled HED annotation.
+Data file-wide consistency, such as matched onsets and offsets, is also checked.
+
+
+(summarize-hed-validation-example-anchor)=
+#### Summarize HED validation example
+
+````{admonition} A JSON file with a single *summarize_hed_validation* summarization operation.
+:class: tip
+```json
+[{
+    "operation": "summarize_hed_validation",
+    "description": "Summarize validation errors in the sample dataset.",
+    "parameters": {
+        "summary_name": "AOMIC_sample_validation",
+        "summary_filename": "AOMIC_sample_validation",
+        "check_for_warnings": true
+    }
+}]
+```
+````
+
+To demonstrate the output of the validation operation, we modified the first row of the
+[**sample remodel event file**](sample-remodel-event-file-anchor)
+so that the `trial_type` column contained the value `baloney` rather than `go`.
+This modification generates a warning because the meaning of `baloney` is not defined
+in the [**sample remodel sidecar file**](sample-remodel-sidecar-file-anchor).
+The results of executing the example operation with the modified file are shown
+in the following example.
+
+
+````{admonition} Text summary of *summarize_hed_validation* operation on a modified sample data file.
+:class: tip
+
+```text
+Summary name: AOMIC_sample_validation
+Summary type: hed_validation
+Summary filename: AOMIC_sample_validation
+
+Summary details:
+
+Dataset: [1 sidecar files, 1 event files]
+   task-stopsignal_acq-seq_events.json: 0 issues
+   sub-0013_task-stopsignal_acq-seq_events.tsv: 6 issues
+
+Individual files:
+
+ sub-0013_task-stopsignal_acq-seq_events.tsv: 1 sidecar files
+   task-stopsignal_acq-seq_events.json has no issues
+   sub-0013_task-stopsignal_acq-seq_events.tsv issues:
+       HED_UNKNOWN_COLUMN: WARNING: Column named 'onset' found in file, but not specified as a tag column or identified in sidecars.
+       HED_UNKNOWN_COLUMN: WARNING: Column named 'duration' found in file, but not specified as a tag column or identified in sidecars.
+       HED_UNKNOWN_COLUMN: WARNING: Column named 'response_time' found in file, but not specified as a tag column or identified in sidecars.
+       HED_UNKNOWN_COLUMN: WARNING: Column named 'response_accuracy' found in file, but not specified as a tag column or identified in sidecars.
+       HED_UNKNOWN_COLUMN: WARNING: Column named 'response_hand' found in file, but not specified as a tag column or identified in sidecars.
+       HED_SIDECAR_KEY_MISSING[row=0,column=2]: WARNING: Category key 'baloney' does not exist in column. Valid keys are: ['succesful_stop', 'unsuccesful_stop', 'go']
+
+```
+````
+
+This summary was produced using HED schema version `hed_version="8.1.0"` when creating the `dispatcher`
+and using the [**sample remodel sidecar file**](sample-remodel-sidecar-file-anchor) in the `do_op`.
+
+
+(summarize-sidecar-from-events-anchor)=
+### Summarize sidecar from events
+
+The summarize sidecar from events operation generates a sidecar template from the event
+files in the dataset.
+
+
+(summarize-sidecar-from-events-parameters-anchor)=
+#### Summarize sidecar from events parameters
+
+The following table lists the parameters required for using the summary.
+
+```{admonition} Parameters for the *summarize_sidecar_from_events* operation.
+:class: tip
+
+| Parameter | Type | Description |
+| ------------ | ---- | ----------- |
+| *summary_name* | str | A unique name used to identify this summary.|
+| *summary_filename* | str | A unique file basename to use for saving this summary. |
+| *skip_columns* | list | A list of column names to omit from the sidecar.|
+| *value_columns* | list | A list of columns to treat as value columns in the sidecar. |
+| *append_timecode* | bool | (**Optional**: Default false) If true, append a time code to filename. |
+```
+The standard summary parameters, *summary_name* and *summary_filename*, are required.
+The *summary_name* is the unique key used to identify the
+particular incarnation of this summary in the dispatcher.
+Since a particular operation file may use a given operation multiple times,
+care should be taken to make sure that each summary name is unique.
+
+The *summary_filename* should also be unique and is used for saving the summary upon request.
+When the remodeler is applied to full datasets rather than single files,
+the summaries are saved in the `derivatives/remodel/summaries` directory under the dataset root.
+A time stamp and file extension are appended to the *summary_filename* when the
+summary is saved.
+
+In addition to the standard parameters, *summary_name* and *summary_filename*, required of all summaries,
+the *summarize_sidecar_from_events* operation requires two additional lists to be supplied.
+The *skip_columns* list specifies the names of columns to skip entirely in
+generating the sidecar template.
+The *value_columns* list specifies the names of columns to treat as value columns
+when generating the sidecar template.
+
+(summarize-sidecar-from-events-example-anchor)=
+#### Summarize sidecar from events example
+
+The following example shows the JSON for including this operation in a remodeling file.
+
+````{admonition} A JSON file with a single *summarize_sidecar_from_events* summarization operation.
+:class: tip
+```json
+[{
+    "operation": "summarize_sidecar_from_events",
+    "description": "Generate a sidecar from the excerpted events file.",
+    "parameters": {
+        "summary_name": "AOMIC_generate_sidecar",
+        "summary_filename": "AOMIC_generate_sidecar",
+        "skip_columns": ["onset", "duration"],
+        "value_columns": ["response_time", "stop_signal_delay"]
+    }
+}]
+
+```
+````
+
+The results of executing this operation on the
+[**sample remodel event file**](sample-remodel-event-file-anchor)
+are shown in the following example using the text format.
+
+````{admonition} Sample *summarize_sidecar_from_events* operation results in text format.
+:class: tip
+```text
+Summary name: AOMIC_generate_sidecar
+Summary type: events_to_sidecar
+Summary filename: AOMIC_generate_sidecar
+
+Dataset: Currently no overall sidecar extraction is available
+
+Individual files:
+
+aomic_sub-0013_excerpt_events.tsv: Total events=6 Skip columns: ['onset', 'duration']
+Sidecar:
+{
+    "trial_type": {
+        "Description": "Description for trial_type",
+        "HED": {
+            "go": "(Label/trial_type, Label/go)",
+            "succesful_stop": "(Label/trial_type, Label/succesful_stop)",
+            "unsuccesful_stop": "(Label/trial_type, Label/unsuccesful_stop)"
+        },
+        "Levels": {
+            "go": "Here describe column value go of column trial_type",
+            "succesful_stop": "Here describe column value succesful_stop of column trial_type",
+            "unsuccesful_stop": "Here describe column value unsuccesful_stop of column trial_type"
+        }
+    },
+    "response_accuracy": {
+        "Description": "Description for response_accuracy",
+        "HED": {
+            "correct": "(Label/response_accuracy, Label/correct)"
+        },
+        "Levels": {
+            "correct": "Here describe column value correct of column response_accuracy"
+        }
+    },
+    "response_hand": {
+        "Description": "Description for response_hand",
+        "HED": {
+            "left": "(Label/response_hand, Label/left)",
+            "right": "(Label/response_hand, Label/right)"
+        },
+        "Levels": {
+            "left": "Here describe column value left of column response_hand",
+            "right": "Here describe column value right of column response_hand"
+        }
+    },
+    "sex": {
+        "Description": "Description for sex",
+        "HED": {
+            "female": "(Label/sex, Label/female)",
+            "male": "(Label/sex, Label/male)"
+        },
+        "Levels": {
+            "female": "Here describe column value female of column sex",
+            "male": "Here describe column value male of column sex"
+        }
+    },
+    "response_time": {
+        "Description": "Description for response_time",
+        "HED": "(Label/response_time, Label/#)"
+    },
+    "stop_signal_delay": {
+        "Description": "Description for stop_signal_delay",
+        "HED": "(Label/stop_signal_delay, Label/#)"
+    }
+}
+```
+````
+
+(remodel-implementation-anchor)=
+## Remodel implementation
+
+Operations are defined as classes that extend `BaseOp` regardless of whether
+they are transformations or summaries. However, summaries must also implement
+an additional supporting class that extends `BaseSummary` to hold the summary information.
+
+In order to be executed by the remodeling functions,
+an operation must appear in the `valid_operations` dictionary.
+
+Each operation class must have a `NAME` class variable specifying the operation name (a string) and a
+`PARAMS` class variable containing a dictionary of the operation's parameters represented as a JSON schema.
+The operation's constructor extends the `BaseOp` class constructor by calling:
+
+````{admonition} A remodel operation class must call the BaseOp constructor first.
+:class: tip
+```python
+    super().__init__(parameters)
+```
+````
+
+A remodel operation class must implement the `BaseOp` abstract methods `do_op` and `validate_input_data`.
+
+### The PARAMS dictionary
+
+The class-wide `PARAMS` dictionary specifies the required and optional parameters of the operation as a [**JSON schema**](https://json-schema.org/).
+We currently use draft-2020-12.
+The basic vocabulary allows specifying the type of parameters that are expected and
+whether a parameter is required or optional.
+
+It is also possible to add dependencies between parameters. More information can be found in the JSON schema
+[**documentation**](https://json-schema.org/learn/getting-started-step-by-step).
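+
+Putting these requirements together, the following sketch shows the overall shape of a
+hypothetical operation class. It is illustrative only: the class name `ExampleOp`, the
+import path for `BaseOp`, and the schema contents are assumptions patterned on the
+descriptions above rather than code taken from the remodeler itself.
+
+````{admonition} A sketch of the structure of a hypothetical remodel operation class.
+:class: tip
+```python
+from hed.tools.remodeling.operations.base_op import BaseOp
+
+
+class ExampleOp(BaseOp):
+    """ Hypothetical operation illustrating the required class structure. """
+
+    NAME = "example_op"
+    PARAMS = {
+        "type": "object",
+        "properties": {
+            "column_name": {"type": "string"}
+        },
+        "required": ["column_name"],
+        "additionalProperties": False
+    }
+
+    def __init__(self, parameters):
+        super().__init__(parameters)  # always call the BaseOp constructor first
+        self.column_name = parameters["column_name"]
+
+    def do_op(self, dispatcher, df, name, sidecar=None):
+        # A transformation returns the (possibly modified) DataFrame;
+        # this illustrative operation leaves the data unchanged.
+        return df
+
+    @staticmethod
+    def validate_input_data(parameters):
+        # No constraints beyond what the JSON schema already expresses.
+        return []
+```
+````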
+
+At the highest level, the type should always be specified as `object`, since the parameters are always provided as a dictionary (JSON object).
+Under the `properties` key, the expected parameters should be listed, along with the datatype expected for each parameter.
+The specifications can be nested. For example, the `rename_columns` operation requires a parameter `column_mapping`,
+which should be a JSON object whose keys are any valid string and whose values are also strings.
+This is represented in the following way:
+
+```json
+{
+    "type": "object",
+    "properties": {
+        "column_mapping": {
+            "type": "object",
+            "patternProperties": {
+                ".*": {
+                    "type": "string"
+                }
+            },
+            "minProperties": 1
+        },
+        "ignore_missing": {
+            "type": "boolean"
+        }
+    },
+    "required": [
+        "column_mapping",
+        "ignore_missing"
+    ],
+    "additionalProperties": false
+}
+```
+
+The `PARAMS` dictionaries for all available operations are read by the `validator` and compiled into a
+single JSON schema that represents the specification for remodeler files.
+The `properties` dictionary explicitly specifies the parameters that are allowed for this operation.
+The `required` list specifies which parameters must be included when calling the operation.
+Parameters that are not required may be omitted in the operation call.
+The `"additionalProperties": false` entry is the way that JSON schema
+indicates that no other parameters are allowed in the call to the operation.
+
+A limitation of JSON schema is that although it can handle specific dependencies between keys in the data,
+it cannot validate data provided in the JSON file against other data in the same file.
+For example, if the requirement is a list of elements whose length should be specified by another parameter,
+JSON schema does not provide a vocabulary for expressing this dependency.
+Instead, we handle these types of dependencies in the `validate_input_data` method.
+
+(operation-class-constructor-anchor)=
+### Operation class constructor
+
+All the operation classes have constructors that start with a call to the `BaseOp` superclass constructor.
+The following example shows the constructor for the `RemoveColumnsOp` class.
+
+````{admonition} The constructor for the RemoveColumnsOp class.
+:class: tip
+```python
+    def __init__(self, parameters):
+        super().__init__(parameters)
+        self.column_names = parameters['column_names']
+        ignore_missing = parameters['ignore_missing']
+        if ignore_missing:
+            self.error_handling = 'ignore'
+        else:
+            self.error_handling = 'raise'
+```
+````
+
+After the call to the base class constructor, the operation constructor assigns the operation-specific
+values to class properties. Validation takes place before the operation classes are initialized.
+
+
+(the-do_op-implementation-anchor)=
+### The do_op implementation
+The remodeling script is meant to be executed by the `Dispatcher`,
+which keeps a compiled version of the remodeling script to execute on each tabular file to be remodeled.
+
+The main method that must be implemented by each operation is `do_op`, which takes
+an instance of the `Dispatcher` class as the first parameter and a Pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)
+representing the tabular file as the second parameter.
+A third required parameter is a name used to identify the tabular file in error messages and summaries.
+This name is usually the filename or the filepath from the dataset root.
+An additional optional argument, a sidecar containing HED annotations,
+only needs to be included for HED operations.
+Note that the `Dispatcher` is responsible for holding the appropriate version of the HED schema if
+HED remodeling operations are included.
+
+The following example shows a sample implementation of `do_op`.
+
+````{admonition} The implementation of do_op for the RemoveColumnsOp class.
+:class: tip
+```python
+
+    def do_op(self, dispatcher, df, name, sidecar=None):
+        return df.drop(self.column_names, axis=1, errors=self.error_handling)
+```
+````
+
+The `do_op` in this case is a wrapper for the underlying Pandas `DataFrame`
+operation for removing columns.
+
+**IMPORTANT NOTE**: The `do_op` operation always assumes that `n/a` values have been
+replaced by `numpy.NaN` values in the incoming dataframe `df`.
+The `Dispatcher` class has a static method `prep_data` that does this replacement.
+At the end of running all the remodeling operations on a data file, the `Dispatcher`
+method `run_operations` replaces all of the `numpy.NaN` values with `n/a`, the value expected by BIDS.
+This replacement is performed by the `Dispatcher` static method `post_proc_data`.
+
+
+### The validate_input_data implementation
+
+This method exists to handle additional input data validation that cannot be specified in JSON schema.
+It is a static method called by the `validator`.
+If there is no additional validation to be done,
+a minimal implementation of this method should take in a dictionary with the operation parameters and return an empty list.
+In case additional validation is required, the method should directly implement the validation and return a list of user-friendly
+error messages (strings) if validation fails, or an empty list if there are no errors.
+
+The following implementation of the `validate_input_data` method, for the `factor_hed_tags` operation,
+checks whether the parameter `query_names` has the same length as the input for parameter `queries`,
+since the names specified in the first parameter are meant to represent the queries provided in the latter.
+The check only takes place if `query_names` exists, since naming is handled automatically otherwise.
+
+```python
+@staticmethod
+def validate_input_data(parameters):
+    errors = []
+    if parameters.get("query_names", False):
+        if len(parameters.get("query_names")) != len(parameters.get("queries")):
+            errors.append("The list in query_names, in the factor_hed_tags operation, should have the same number of items as queries.")
+    return errors
+```
+
+
+(the-do_op-for-summarization-anchor)=
+### The do_op for summarization
+
+The `do_op` operation for summarization operations has a slightly different form,
+as it serves primarily as a wrapper for the actual summary information as illustrated
+by the following example.
+
+(implementation-of-do-op_summarize-column-names-anchor)=
+````{admonition} The implementation of do_op for SummarizeColumnNamesOp.
+:class: tip
+```python
+    def do_op(self, dispatcher, df, name, sidecar=None):
+        summary = dispatcher.summary_dict.get(self.summary_name, None)
+        if not summary:
+            summary = ColumnNameSummary(self)
+            dispatcher.summary_dict[self.summary_name] = summary
+        summary.update_summary({"name": name, "column_names": list(df.columns)})
+        return df
+
+```
+````
+
+A `do_op` operation for a summarization checks the `dispatcher` to see if the
+summary name is already in the dispatcher's `summary_dict`.
+If that summary is not yet in the `summary_dict`,
+the operation creates a `BaseSummary` object for its summary (e.g., `ColumnNameSummary`)
+and adds this object to the dispatcher's `summary_dict`;
+otherwise, the operation fetches the existing `BaseSummary` object from the dispatcher's `summary_dict`.
+It then asks this `BaseSummary` object to update the summary based on the `DataFrame`
+as explained in the next section.
+
+(additional-requirements-for-summarization-anchor)=
+### Additional requirements for summarization
+
+Any summary operation must implement a supporting class that extends `BaseSummary`.
+This class is used to hold and accumulate the information specific to the summary.
+This support class must implement two methods: `update_summary` and `get_summary_details`.
+
+The `update_summary` method is called by its associated `BaseOp` operation during the `do_op`
+to update the summary information based on the current `DataFrame`.
+The `update_summary` method takes a single parameter, which is a dictionary of information
+specific to this operation.
+
+````{admonition} The update_summary method required to be implemented by all BaseSummary objects.
+:class: tip
+```python
+    def update_summary(self, summary_dict)
+```
+````
+
+In the example [do_op for ColumnNamesOp](implementation-of-do-op_summarize-column-names-anchor),
+the dictionary contains keys for `name` and `column_names`.
+
+The `get_summary_details` method returns a dictionary with the summary-specific information
+currently in the summary.
+The `BaseSummary` class provides universal methods for converting this summary to JSON or text format.
+
+
+````{admonition} The get_summary_details method required to be implemented by all BaseSummary objects.
+:class: tip
+```python
+    def get_summary_details(self, verbose=True)
+```
+````
+
+### Validator implementation
+
+The required input for the remodeler is specified in JSON format and must follow
+the rules laid out by the JSON schema.
+The parameters in the remodeler file must conform to the properties specified
+in the corresponding JSON schema associated with each operation.
+The errors are retrieved from the validator but are not passed on directly; instead,
+they are modified for display as user-friendly error messages.
+
+Validation errors are organized by stages as follows.
+
+#### Stage 0: top-level structure
+
+Stage 0 refers to the top-level structure of the remodel JSON file.
+As specified by the validator's `BASE_ARRAY`,
+a JSON remodel file must be an array of operation dictionaries containing at least one item.
+
+#### Stage 1: operation dictionary format
+
+Stage 1 validation refers to the structure of the individual operations as specified by the validator's `OPERATION_DICT`.
+Every operation dictionary should have exactly the keys: `operation`, `description`, and `parameters`.
+
+#### Stage 2: operation dictionary values
+
+Stage 2 validates the values associated with the keys in each operation dictionary.
+The `operation` and `description` keys should have string values,
+while the `parameters` key should have a dictionary value.
+
+Stage 2 validation also verifies that the operation value is one of the valid operations as
+enumerated in the `valid_operations` dictionary.
+
+Several checks are also applied to the `parameters` dictionary.
+The properties listed as `required` in the schema must appear as keys in the `parameters` dictionary.
+
+If additional properties are not allowed, as designated by `"additionalProperties": false` in the JSON schema,
+the validator verifies that parameters not mentioned in the schema do not appear.
+Note this is currently true for all operations and recommended for new operations.
+
+If the schema for the operation has a `dependentRequired` dictionary, the validator
+verifies that the indicated keys are present if the listed parameter values are also present.
+For example, the `factor_column_op` only allows the `factor_names` parameter if the `factor_values`
+parameter is also present. In this case the dependency works only one way: `factor_values`
+can be provided without `factor_names`. If only `factor_values` is provided, the operation automatically generates the
+factor names; however, without `factor_values` the names provided
+in `factor_names` do not correspond to anything, so `factor_names` cannot appear on its own.
+
+#### Later validation stages
+
+Later stages in validation concern the values given within the parameter object, which can be nested to an arbitrary level
+and are handled in a general way.
+The user is provided with the operation index, the operation name, and the 'path' of the value that is invalid.
+Note that while `parameters` always contains an object, the values in `parameters` can be of any type.
+Thus, parameter values can be objects whose values might also be expected to be objects, arrays, or arrays of objects.
+The validator has appropriate messages for many of the conditions that can be set with JSON schema,
+but if a new operation parameter has a condition that has not been used yet, a new error message will need to be added to the validator.
+
+
+When validation against the JSON schema passes,
+the validator performs additional data-specific validation by calling `validate_input_data`
+for each operation to verify that the input data satisfies the
+constraints that fall outside the scope of JSON schema.
+Also see [**The validate_input_data implementation**](#the-validate_input_data-implementation) and
+[**The PARAMS dictionary**](#the-params-dictionary) sections for additional information.
diff --git a/docs/source/HedSearchGuide.md b/docs/source/HedSearchGuide.md
index c4dde7d..e4df36e 100644
--- a/docs/source/HedSearchGuide.md
+++ b/docs/source/HedSearchGuide.md
@@ -241,8 +241,8 @@ based on HED annotations in a dataset-independent manner.
 These queries can be used to locate data
 sets satisfying the specified criteria and to find the relevant event markers in that data.
 
-For example, the [**factor_hed_tags**](https://www.hed-resources.org/en/latest/FileRemodelingTools.html#factor-hed-tags)
-operation of the HED [**file remodeling tools**](https://www.hed-resources.org/en/latest/FileRemodelingTools.html)
+For example, the [**factor_hed_tags**](https://www.hed-resources.org/en/latest/HedRemodelingTools.html#factor-hed-tags)
+operation of the [**HED remodeling tools**](https://www.hed-resources.org/en/latest/HedRemodelingTools.html)
 creates factor vectors for selecting events satisfying general HED queries.
 The [**HED-based epoching**](https://www.hed-resources.org/en/latest/HedMatlabTools.html#hed-based-epoching)
 tools in [**EEGLAB**](https://sccn.ucsd.edu/eeglab/index.php)
diff --git a/docs/source/HedSummaryGuide.md b/docs/source/HedSummaryGuide.md
index 51af930..27c362b 100644
--- a/docs/source/HedSummaryGuide.md
+++ b/docs/source/HedSummaryGuide.md
@@ -1,7 +1,7 @@
 (hed-summary-guide-anchor)=
 # HED summary guide
 
-The HED [**File remodeling tools**](https://www.hed-resources.org/en/latest/FileRemodelingTools.html) provide a number of event summaries
+The [**HED remodeling tools**](https://www.hed-resources.org/en/latest/HedRemodelingTools.html) provide a number of event summaries
 and event file transformations that are very useful during curation and analysis.
 
 The summaries described in this guide are:
@@ -10,8 +10,8 @@ The summaries described in this guide are:
 * [**HED tag summary**](hed-tag-summary-anchor)
 * [**Experimental design summary**](experimental-design-summary-anchor)
 
-As described in more detail in the [**File remodeling quickstart**](https://www.hed-resources.org/en/latest/FileRemodelingQuickstart.html) tutorial and the
-[**File remodeling tools**](https://www.hed-resources.org/en/latest/FileRemodelingTools.html)
+As described in more detail in the [**HED remodeling quickstart**](https://www.hed-resources.org/en/latest/HedRemodelingQuickstart.html) tutorial and the
+[**HED remodeling tools**](https://www.hed-resources.org/en/latest/HedRemodelingTools.html)
 user manual, these tools have as input, a JSON file with a list of remodeling commands and an event file.
 
 Summaries involving HED also require a HED schema version and possibly a JSON sidecar
 containing HED annotations.
diff --git a/docs/source/HedValidationGuide.md b/docs/source/HedValidationGuide.md
index 80d524f..757d3f8 100644
--- a/docs/source/HedValidationGuide.md
+++ b/docs/source/HedValidationGuide.md
@@ -250,7 +250,7 @@ Errors, if any are printed to the command line.
 
 #### Remodeling validation summaries
 
 Validation is also available through HED remodeling tool interface.
-As explained in [**File remodeling quickstart**](./FileRemodelingQuickstart.md),
+As explained in [**HED remodeling quickstart**](./HedRemodelingQuickstart.md),
 the HED remodeling tools allow users to restructure their event files
 and/or summarize their contents in various ways.
 Users specify a list of operations in a JSON remodeling file,
@@ -318,7 +318,7 @@ For example, the text file is:
 `/root_path/derivatives/remodel/summaries/validate_initial_xxx.txt` where
 xxx is the time of generation.
 
-For more information see [**File remodeling quickstart**](./FileRemodelingQuickstart.md)
+For more information see [**HED remodeling quickstart**](./HedRemodelingQuickstart.md)
 for an overview of the remodeling process and
-[**File remodeling tools**](./FileRemodelingTools.md) for detailed descriptions of
+[**HED remodeling tools**](./HedRemodelingTools.md) for detailed descriptions of
 the operations that are currently supported.
\ No newline at end of file
diff --git a/docs/source/HowCanYouUseHed.md b/docs/source/HowCanYouUseHed.md
index 81aff08..46d9595 100644
--- a/docs/source/HowCanYouUseHed.md
+++ b/docs/source/HowCanYouUseHed.md
@@ -92,7 +92,7 @@ in post-processing and assure that the conditions are correctly marked.
#### Logs to event files Although the HED tools do not yet directly support any particular experimental presentation/control -software packages, the HED [**File remodeling tools**](./FileRemodelingTools.md) can +software packages, the HED [**HED remodeling tools**](./HedRemodelingTools) can be useful in working with logged data. Assuming that you can put the information from your experimental log into a tabular form such as: @@ -108,8 +108,8 @@ Assuming that you can put the information from your experimental log into a tabu ```` -The [**summarize column values**](./FileRemodelingTools.md#summarize-column-values) -operation in the HED [**file remodeling tools**](./FileRemodelingTools.md) +The [**summarize column values**](./HedRemodelingTools#summarize-column-values) +operation in the HED [**file remodeling tools**](./HedRemodelingTools) compiles detailed summaries of the contents of tabular files. Use the following remodeling file and your tabular log file as input to the HED online [**event remodeling**](https://hedtools.ucsd.edu/hed_dev/events) tools @@ -135,10 +135,10 @@ to quickly get an overview of its contents. ### Post-processing the event data The information that first comes off the experimental logs is usually not directly usable for -sharing and analysis. A number of HED [**File remodeling tools**](./FileRemodelingTools.md) +sharing and analysis. A number of HED [**HED remodeling tools**](./HedRemodelingTools) might be helpful for restructuring your first pass at the event files. -The [**remap columns**](./FileRemodelingTools.md#remap-columns) transformation is +The [**remap columns**](./HedRemodelingTools#remap-columns) transformation is particularly useful during the initial processing of tabular log information as exemplified by the following example @@ -395,7 +395,7 @@ to improperly handle these situations, reducing the accuracy of analysis. At this time, your only option is to do manual checks or write custom code to detect these types of experiment-specific inconsistencies. However, work is underway to include some standard types of checks in the -HED [**File remodeling tools**](./FileRemodelingTools.md) in future releases. +HED [**HED remodeling tools**](./HedRemodelingTools) in future releases. You may also want to reorganize the event files using the remodeling tools. See the [**Remap columns**](remap-columns-anchor) @@ -431,7 +431,7 @@ work and possibly contact with the data authors for correct use and interpretati You can get a preliminary sense about what is actually in the data by downloading a single event file (e.g., a BIDS `_events.tsv`) and its associated JSON sidecar (e.g., a BIDS `_events.json`) and creating HED remodeling tool summaries using the -[**HED online tools for debugging**](./FileRemodelingQuickstart.md#online-tools-for-debugging). +[**HED online tools for debugging**](./HedRemodelingQuickstart#online-tools-for-debugging). Summaries of particular use for analysts include: - The [**column value summary**](./HedSummaryGuide.md#column-value-summary) compiles a summary of @@ -449,11 +449,11 @@ or temporal layout of the experiment. While HED tag summary and the experimental design summaries require that the dataset have HED annotations, these summaries do not rely on the experiment-specific event-coding used in each experiment and can be used to compare information for different datasets. 
-The [**File remodeling quickstart**](./FileRemodelingQuickstart.md) tutorial
+The [**HED remodeling quickstart**](./HedRemodelingQuickstart.md) tutorial
 gives an overview of the remodeling tools and how to use them.
-More detailed information can be found in [**File remodeling tools**](./FileRemodelingTools.md).
+More detailed information can be found in [**HED remodeling tools**](./HedRemodelingTools.md).
 
-The [**Online tools for debugging**](./FileRemodelingQuickstart.md#online-tools-for-debugging)
+The [**Online tools for debugging**](./HedRemodelingQuickstart.md#online-tools-for-debugging)
 shows how to use remodeling tools to obtain these summaries without writing any code.
 
 The [**HED conditions and design matrices**](HedConditionsAndDesignMatrices.md) guide explains how
@@ -474,8 +474,8 @@ additional code, while generality allows comparison of criteria
 across different experiments.
 
 The factor generation as described in the next section relies on the HED
-[**File remodeling tools**](FileRemodelingTools.md).
-See [**File remodeling tools**](FileRemodelingTools.md).
+[**remodeling tools**](HedRemodelingTools.md).
+See [**HED remodeling tools**](HedRemodelingTools.md) for detailed descriptions of the operations.
 
 (factor-vectors-and-selection-anchor)=
 #### Factor vectors and selection
@@ -490,18 +490,18 @@ rows as the event file (each row corresponding to an event marker).
 
 Factor vectors contain 1's for rows in which a specified criterion is satisfied and 0's otherwise.
 
-- The [**factor column operation**](./FileRemodelingTools.md#factor-column)
+- The [**factor column operation**](./HedRemodelingTools.md#factor-column)
 creates factor vectors based on the unique values in specified columns.
 This factor operation does not require any HED information.

-- The [**factor HED tags**](./FileRemodelingTools.md#factor-hed-tags)
+- The [**factor HED tags**](./HedRemodelingTools.md#factor-hed-tags)
 creates factor vectors based on a HED tag query.
 The [**HED search guide**](./HedSearchGuide.md)
 explains the HED query structure and available search options.

-- The [**factor HED type**](./FileRemodelingTools.md#factor-hed-type)
+- The [**factor HED type**](./HedRemodelingTools.md#factor-hed-type)
 creates factors based on a HED tag representing structural information about the data
 such as *Condition-variable* (for experimental design and experimental conditions) or *Task*.
diff --git a/docs/source/UnderstandingHedVersions.md b/docs/source/UnderstandingHedVersions.md
new file mode 100644
index 0000000..a21d44a
--- /dev/null
+++ b/docs/source/UnderstandingHedVersions.md
@@ -0,0 +1 @@
+# Understanding HED versions (draft)
\ No newline at end of file
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 817e75d..7910d11 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -116,12 +116,13 @@ Visit the `HED project homepage <https://www.hedtags.org/>`_ for links to
 
    BidsAnnotationQuickstart.md
    HedAnnotationQuickstart.md
+   UnderstandingHedVersions.md
    HedAnnotationInNWB.md
    HedValidationGuide.md
    HedSearchGuide.md
    HedSummaryGuide.md
    HedConditionsAndDesignMatrices.md
-   FileRemodelingQuickstart.md
+   HedRemodelingQuickstart.md
 
    HedSchemaDevelopersGuide.md
 
@@ -132,7 +133,7 @@ Visit the `HED project homepage <https://www.hedtags.org/>`_ for links to
 
    HedOnlineTools.md
    CTaggerGuiTaggingTool.md
-   FileRemodelingTools.md
+   HedRemodelingTools.md
    HedPythonTools.md
    HedJavascriptTools.md
    HedMatlabTools.md
diff --git a/src/jupyter_notebooks/remodeling/README.md b/src/jupyter_notebooks/remodeling/README.md
index 3cbf7ba..3d806dd 100644
--- a/src/jupyter_notebooks/remodeling/README.md
+++ b/src/jupyter_notebooks/remodeling/README.md
@@ -7,7 +7,7 @@ restructuring tools through its command line interface.
 ## Quickstart
 
 A quickstart overview of the remodeling process and supporting tools can be
-found at [**File remodeling quickstart](https://www.hed-resources.org/en/latest/FileRemodelingQuickstart.html).
+found at [**HED remodeling quickstart**](https://www.hed-resources.org/en/latest/HedRemodelingQuickstart.html).
 
 ## Installation requirements
 
From 07d70cbcef4ed897a94710fd467be68ad6043753 Mon Sep 17 00:00:00 2001
From: Kay Robbins <1189050+VisLab@users.noreply.github.com>
Date: Fri, 26 Jul 2024 12:12:43 -0500
Subject: [PATCH 3/3] Updated Whats new and HED version tutorial

---
 docs/source/HedAnnotationInNWB.md       | 11 +++-
 docs/source/UnderstandingHedVersions.md | 71 ++++++++++++++++++++++++-
 docs/source/WhatsNew.md                 |  4 ++
 3 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/docs/source/HedAnnotationInNWB.md b/docs/source/HedAnnotationInNWB.md
index a075fb5..0f9a417 100644
--- a/docs/source/HedAnnotationInNWB.md
+++ b/docs/source/HedAnnotationInNWB.md
@@ -1,4 +1,4 @@
-# HED annotation in NWB (draft)
+# HED annotation in NWB
 
 [**Neurodata Without Borders (NWB)**](https://www.nwb.org/) is a data standard for organizing neurophysiology data.
 NWB is used extensively as the data representation for single cell and animal recordings as well as
@@ -22,7 +22,14 @@ The `ndx-hed` extension is not currently supported in MATLAB, although support i
 
 ## NWB ndx-hed installation
 
-Should it be uploaded to PyPi?
+The `ndx-hed` extension for Python can be installed using `pip`:
+
+```bash
+pip install -U ndx-hed
+```
+
+The `ndx-hed` extension for MATLAB is under development and not yet available.
diff --git a/docs/source/UnderstandingHedVersions.md b/docs/source/UnderstandingHedVersions.md
new file mode 100644
index 0000000..a21d44a
--- /dev/null
+++ b/docs/source/UnderstandingHedVersions.md
@@ -0,0 +1 @@
+# Understanding HED versions (draft)
\ No newline at end of file
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 817e75d..7910d11 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -116,12 +116,13 @@ Visit the `HED project homepage <https://www.hedtags.org/>`_ for links to
 
    BidsAnnotationQuickstart.md
    HedAnnotationQuickstart.md
+   UnderstandingHedVersions.md
    HedAnnotationInNWB.md
    HedValidationGuide.md
    HedSearchGuide.md
    HedSummaryGuide.md
    HedConditionsAndDesignMatrices.md
-   FileRemodelingQuickstart.md
+   HedRemodelingQuickstart.md
 
    HedSchemaDevelopersGuide.md
 
@@ -132,7 +133,7 @@ Visit the `HED project homepage <https://www.hedtags.org/>`_ for links to
 
    HedOnlineTools.md
    CTaggerGuiTaggingTool.md
-   FileRemodelingTools.md
+   HedRemodelingTools.md
    HedPythonTools.md
    HedJavascriptTools.md
    HedMatlabTools.md
diff --git a/src/jupyter_notebooks/remodeling/README.md b/src/jupyter_notebooks/remodeling/README.md
index 3cbf7ba..3d806dd 100644
--- a/src/jupyter_notebooks/remodeling/README.md
+++ b/src/jupyter_notebooks/remodeling/README.md
@@ -7,7 +7,7 @@ restructuring tools through its command line interface.
 
 ## Quickstart
 
 A quickstart overview of the remodeling process and supporting tools can be
-found at [**File remodeling quickstart](https://www.hed-resources.org/en/latest/FileRemodelingQuickstart.html).
+found at [**HED remodeling quickstart**](https://www.hed-resources.org/en/latest/HedRemodelingQuickstart.html).
 
 ## Installation requirements

From 07d70cbcef4ed897a94710fd467be68ad6043753 Mon Sep 17 00:00:00 2001
From: Kay Robbins <1189050+VisLab@users.noreply.github.com>
Date: Fri, 26 Jul 2024 12:12:43 -0500
Subject: [PATCH 3/3] Updated Whats new and HED version tutorial

---
 docs/source/HedAnnotationInNWB.md       | 11 +++-
 docs/source/UnderstandingHedVersions.md | 71 ++++++++++++++++++++++++-
 docs/source/WhatsNew.md                 |  4 ++
 3 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/docs/source/HedAnnotationInNWB.md b/docs/source/HedAnnotationInNWB.md
index a075fb5..0f9a417 100644
--- a/docs/source/HedAnnotationInNWB.md
+++ b/docs/source/HedAnnotationInNWB.md
@@ -1,4 +1,4 @@
-# HED annotation in NWB (draft)
+# HED annotation in NWB
 
 [**Neurodata Without Borders (NWB)**](https://www.nwb.org/) is a data standard for organizing neurophysiology data.
 NWB is used extensively as the data representation for single cell and animal recordings as well as
@@ -22,7 +22,14 @@ The `ndx-hed` extension is not currently supported in MATLAB, although support i
 
 ## NWB ndx-hed installation
 
-Should it be uploaded to PyPi?
+The `ndx-hed` extension for Python can be installed using `pip`:
+
+```bash
+pip install -U ndx-hed
+```
+
+The `ndx-hed` extension for MATLAB is under development and is not yet available.
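+
+As a quick check that the installation works, you can create a standalone
+`HedTags` column (a minimal sketch, assuming `ndx-hed` 0.1.0, which exports
+the `HedTags` class at the package top level):
+
+```python
+from ndx_hed import HedTags
+
+# The tags are validated against the specified HED schema version (8.3.0)
+# when the object is constructed.
+tags = HedTags(hed_version="8.3.0", data=["Sensory-event", "Red"])
+print(tags.data)
+```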
 
 ## NWB ndx-hed examples
diff --git a/docs/source/UnderstandingHedVersions.md b/docs/source/UnderstandingHedVersions.md
index a21d44a..f142430 100644
--- a/docs/source/UnderstandingHedVersions.md
+++ b/docs/source/UnderstandingHedVersions.md
@@ -1 +1,70 @@
-# Understanding HED versions (draft)
\ No newline at end of file
+# Understanding HED versions
+
+HED (Hierarchical Event Descriptors) schemas are standardized tree-structured vocabularies for annotating experimental data, particularly
+neuroimaging and behavioral data.
+The **HED standard schema** contains a base vocabulary of terms that are common to most experiments,
+while various **HED library schemas** contain discipline-specific vocabularies.
+The [**HED schema viewer**](https://www.hedtags.org/display_hed.html) allows users to view the available vocabularies.
+
+Applications that use HED must specify which versions of the HED schemas they are using.
+The definitive HED vocabulary files are available in the
+[**hed-schemas**](https://github.com/hed-standard/hed-schemas) GitHub repository.
+Tools retrieve the XML files corresponding to the designated HED versions either from GitHub or from
+their internal caches to use in validation and analysis.
+
+This tutorial explains HED versioning and how to specify HED versions.
+
+## HED version basics
+
+HED uses semantic versioning of the form *Major.Minor.Patch* for the standard schema.
+For a library schema, the library name followed by an underscore is prepended to the version (e.g., *score_1.0.0*).
+
+When multiple schemas are used together, you specify the versions as a list of strings.
+
+```{list-table} HED version examples
+:header-rows: 1
+:name: hed-version-examples
+
+* - Version
+  - Meaning
+* - *"8.3.0"*
+  - HED standard schema version *8.3.0*.
+* - *"score_1.0.0"*
+  - SCORE library schema version *1.0.0*.
+* - *"score_1.2.0"*
+  - SCORE library schema version *1.2.0*,
+    partnered with standard schema *8.2.0*.
+* - *["score_1.2.0", "bc:testlib_4.0.0"]*
+  - SCORE library schema version *1.2.0* and TESTLIB library schema version *4.0.0*.
+* - *["score_1.0.0", "ac:8.3.0"]*
+  - SCORE library schema version *1.0.0* and standard schema version *8.3.0*.
+* - *["lang_1.0.0", "score_2.0.0"]*
+  - LANG library schema version *1.0.0* and
+    SCORE library schema version *2.0.0*,
+    both partnered with standard schema *8.3.0*.
+```
+
+SCORE library schema version 1.0.0 is an **unpartnered schema**.
+This means that if you want to use any tags from the standard schema, you must explicitly give its version.
+Multiple unpartnered schemas must use prefixes for all but one of the schema versions.
+Annotations that use tags from a schema whose version is prefixed must also include the prefix in the tag.
+So if the version specification is *["score_1.0.0", "ac:8.3.0"]*, an annotation using the HED tag `Event`
+must be written as `ac:Event`.
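+
+The following minimal sketch shows how such version specifications can be used
+to load schemas in Python (it assumes a recent HEDTools release in which
+`load_schema_version` accepts either a single version string or a list):
+
+```python
+from hed.schema import load_schema_version
+
+# A single standard schema version.
+schema = load_schema_version("8.3.0")
+
+# An unpartnered SCORE library plus the standard schema: tags drawn from
+# the standard schema must then carry the "ac:" prefix (e.g., ac:Event).
+schemas = load_schema_version(["score_1.0.0", "ac:8.3.0"])
+```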
+
+**Partnered schemas** automatically include a specific version of the standard schema.
+LANG library schema 1.0.0 and SCORE library schema 2.0.0 (both in prerelease)
+are both partnered with standard schema 8.3.0.
+Further, these library schemas have no conflicts with each other.
+Hence, the version specification *["lang_1.0.0", "score_2.0.0"]* does not require prefixes.
+All three schemas are loaded as a single schema at runtime.
+
+## Using HED versions
+
+In BIDS (Brain Imaging Data Structure) datasets, the HED version is
+specified in the `dataset_description.json` file at the top level of the dataset.
+See [**7.5. Library schemas in BIDS**](https://hed-specification.readthedocs.io/en/latest/07_Library_schemas.html#library-schemas-in-bids)
+in the HED specification for information about the rules.
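+
+For example, a `dataset_description.json` might specify an unpartnered SCORE
+library together with a prefixed standard schema as follows (the *Name* and
+*BIDSVersion* values below are placeholders):
+
+```json
+{
+    "Name": "A hypothetical experiment",
+    "BIDSVersion": "1.8.0",
+    "HEDVersion": ["score_1.0.0", "ac:8.3.0"]
+}
+```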
+
+In NWB (Neurodata Without Borders) datasets, the HED version is specified
+when `HedTags` objects are created.
+See [**HED annotation in NWB**](https://www.hed-resources.org/en/develop/HedAnnotationInNWB.html)
+for additional information and examples.
+
diff --git a/docs/source/WhatsNew.md b/docs/source/WhatsNew.md
index 729e61b..f89508b 100644
--- a/docs/source/WhatsNew.md
+++ b/docs/source/WhatsNew.md
@@ -1,6 +1,10 @@
 (whats-new-anchor)=
 # What's new?
 
+**July 25, 2024**: **ndx-hed 0.1.0 HED extension for NWB released on PyPI.**
+> Initial support includes the `HedTags` class extending `VectorData`.
+> `HedTags` objects may be added as columns to any NWB `DynamicTable`.
+
 **July 4, 2024**: **HED specification v3.3.0 released.**
 > [**https://zenodo.org/records/12664745**](https://zenodo.org/records/12664745)
 > Includes lazy partnering and HED ontology formats.