[Feature spec] Regulations import dialog #39

daemontus · 2024-05-29T09:15:22Z

Currently, there is no way to import interaction data from (reasonably) standard formats, like .csv or .sif, into the regulations editor. These formats are machine-readable, but typically don't have a truly standard structure that we can import correctly every time. For example, a .csv file could contain many columns, each with an unknown meaning if the header is missing. Similarly, .sif has a known column count, but it is not always clear which interaction types map to which regulation properties. As such, there is a need for a comprehensive import dialog that can handle the task of "extract interactions from a structured data file".

This does not concern formats like .sbml and .aeon which combine regulation data with other types of information (functions, parameters, properties, ...). For these, we need to explore a dedicated import dialog which will let users pick what portion of the file they want to import and what they want to ignore.

Requirements

Note that the following are not separated into frontend/backend, but generally mix both.

Domain requirements

The import should ideally be facilitated by a single dialog (i.e. there is no multi-step "flow", just one window where everything gets resolved).
The goal is to extract the following information from the data file for each interaction:
- Identification of a source entity and a target entity. These are typically alphanumeric strings, but can have some associated issues: First, an entity can have both a "display name" and an "ID". The display name can be almost anything, while the ID is relatively strictly defined. Second, the names can often be "close enough" but not exactly the same between the input file and the current state of the editor. This covers cases like "CDK46" vs. "Cdk46" (i.e. the name is clearly the same but written in a different "style"), as well as functionally equivalent cases, like "AP-1" vs. "Jun-Fos-complex".
- Identification of interaction essentiality, i.e. whether the interaction is required, unused, or unknown. There are some relatively common names for these situations (especially for .sif), but there is no clear standard.
- Similarly, a sign (or monotonicity) is either positive, negative, dual, or unknown. Again, in .sif, there are some default values that we can assume, but there is mostly no truly standard way to map these values.
- If possible, an annotation. This can be almost any text, but typically is some sort of reference to literature/data where said interaction is discussed. In theory, this could also be constructed as a combination of multiple columns.
Due to the issues outlined above, the import process needs to be highly customizable and allow the user to essentially pick what data is imported and how.
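To make the target of the import concrete, here is a minimal sketch of the per-interaction record described above. All names (`ImportedInteraction`, `Essentiality`, `Sign`, `build_annotation`) are hypothetical and not taken from the existing codebase; the enum values simply mirror the categories listed in this spec.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Essentiality(Enum):
    REQUIRED = "required"
    UNUSED = "unused"
    UNKNOWN = "unknown"


class Sign(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    DUAL = "dual"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ImportedInteraction:
    # Entity names as they should appear in the editor after resolution.
    source: str
    target: str
    # Both properties default to "unknown" when nothing is imported for them.
    essentiality: Essentiality = Essentiality.UNKNOWN
    sign: Sign = Sign.UNKNOWN
    # Free text, typically a literature/data reference.
    annotation: Optional[str] = None


def build_annotation(row: dict, columns: list, separator: str = "; ") -> Optional[str]:
    """Combine several columns of one row into a single annotation.

    Empty or missing cells are skipped; returns None if nothing remains.
    """
    parts = [row[c].strip() for c in columns if row.get(c, "").strip()]
    return separator.join(parts) if parts else None
```

Whether the annotation separator should be configurable in the dialog is left open here.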

Functional requirements

The dialog will contain a preview of the loaded structured data file as a table (we already have this when importing observations).
The preview table should be separated into two parts: The columns with the raw data that was loaded, plus the columns that show a live preview of what will actually be imported (i.e. what the raw data maps to).
Ideally, if the number of columns is large, this "imported preview" part should be locked at a fixed position, while the rest of the table scrolls horizontally (i.e. the preview is always visible). However, I'm not sure we can implement this with our current table component, so if this is an issue, we might be able to implement this using two tables instead?
In the data table, it should be possible to manually exclude rows (similar to the observations table), in which case they remain visible in the table, but their "imported preview" will be empty.
In the data table, it should be possible to manually edit individual cells, e.g. to fix typos.
In the preview table, if a row is not excluded but we have issues mapping it to a valid interaction, show some sort of error indicator for that row.
Provide a toggle that only shows items with issues in the preview/data tables. Currently I am imagining a switch with "All items (XXX) / Issues (XXX)" as labels, but we can probably consider various options.
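The relationship between raw rows, excluded rows, and the "imported preview" could be modelled roughly as follows. This is only a sketch under assumptions: `PreviewRow`, `build_preview`, `map_source_target` and `toggle_labels` are hypothetical names, and a real mapping function would produce a full interaction rather than just a (source, target) pair.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence, Set, Tuple


@dataclass
class PreviewRow:
    raw: Sequence[str]           # the raw cells, always shown
    excluded: bool               # manually excluded by the user
    preview: Optional[tuple]     # mapped result; None if excluded or invalid
    error: Optional[str]         # reason shown by the error indicator


def map_source_target(raw: Sequence[str]) -> Tuple[str, str]:
    """Example mapping: first two cells are source and target."""
    if len(raw) < 2 or not raw[0].strip() or not raw[1].strip():
        raise ValueError("missing source or target")
    return raw[0].strip(), raw[1].strip()


def build_preview(
    rows: Iterable[Sequence[str]],
    excluded: Set[int],
    map_row: Callable[[Sequence[str]], tuple],
) -> List[PreviewRow]:
    result = []
    for index, raw in enumerate(rows):
        if index in excluded:
            # Excluded rows stay visible, but their preview is empty
            # and they are not reported as issues.
            result.append(PreviewRow(raw, True, None, None))
            continue
        try:
            result.append(PreviewRow(raw, False, map_row(raw), None))
        except ValueError as e:
            result.append(PreviewRow(raw, False, None, str(e)))
    return result


def toggle_labels(preview: List[PreviewRow]) -> Tuple[str, str]:
    """Labels for the 'All items (XXX) / Issues (XXX)' switch."""
    issues = sum(1 for r in preview if r.error is not None)
    return f"All items ({len(preview)})", f"Issues ({issues})"
```

Note that in this sketch an excluded row never counts as an issue, which seems consistent with exclusion being a way to dismiss problematic rows.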
For .csv files, we need a checkbox that indicates "First row is a header". For .sif, this is defined by the format, so we don't need it as user input.
Following the table is the configuration of the import process, first variables, then monotonicity, and then essentiality.
For .csv files, we need two dropdowns, "source" and "target", where one can select a column for each item (if a header is selected, values from the first row are used as names; if no header is available, each column should be labeled as Column 1 ("value...")). For .sif, this part is not shown as source/target columns are known.
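As an illustration of the header checkbox and the column labels for the dropdowns, here is a small sketch using Python's standard `csv` module. The function names and the 12-character truncation limit are assumptions made for the example, not part of the spec.

```python
import csv
import io
from typing import List, Tuple


def shorten(value: str, limit: int = 12) -> str:
    """Truncate a sample value so the dropdown label stays short."""
    return value if len(value) <= limit else value[:limit] + "..."


def load_csv(text: str, first_row_is_header: bool) -> Tuple[List[str], List[List[str]]]:
    """Return (column labels, data rows) for the preview table and dropdowns."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    if first_row_is_header:
        # The header row provides the names and is not part of the data.
        return rows[0], rows[1:]
    # No header: label columns by position plus a sample value from the first row.
    labels = [f'Column {i + 1} ("{shorten(v)}")' for i, v in enumerate(rows[0])]
    return labels, rows
```

Toggling the checkbox then only requires calling `load_csv` again on the same text, so the first row moves between the header and the data without reloading the file.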
Then we show information about unrecognized entities and options to fix them. If there are no unrecognized entities, this is not shown. There should be a global switch with "Ignore / Create / Custom". Ignore and create globally apply to all interactions that involve unknown variables. Unrecognized also includes entities that have an ambiguous interpretation, but this should be rare.
If global "ignore"/"create" is selected, a message should be shown confirming the consequence ("XX interactions will be ignored", "XX variables will be created"). A badge/indicator should be added to the preview table to reflect this as well.
If "custom" is selected, a list of unknown variables is shown with options to "fix" each variable. This should be a dropdown with "create/ignore/remap". If "remap" is selected, another dropdown is shown with a list of existing entities.
For each variable in the "custom" list, we can also show "suggestions", if there are some variables that have similar names, or names that are matched based on "functional equivalence". For each of these suggestions, an indicator should be shown as to why this is a suggestion.
If the number of unknown entities is large, we should probably paginate this list (but it does not need to be loaded dynamically, it is ok to paginate this only on the frontend).
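The suggestions for unknown variables could be computed along these lines. This is a rough sketch, not a proposal for the final matching logic: `normalize` and `suggest` are hypothetical names, the 0.8 similarity cutoff is arbitrary, and the `synonyms` table (the source of "functional equivalence") is assumed to come from somewhere else, since this spec does not say where such data would originate.

```python
import difflib
from typing import Dict, List, Optional, Tuple


def normalize(name: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest(
    unknown: str,
    existing: List[str],
    synonyms: Optional[Dict[str, List[str]]] = None,
    cutoff: float = 0.8,
) -> List[Tuple[str, str]]:
    """Return (existing name, reason) pairs for one unknown entity."""
    synonyms = synonyms or {}
    key = normalize(unknown)
    equivalent = {normalize(s) for s in synonyms.get(unknown, [])}
    out = []
    for name in existing:
        candidate = normalize(name)
        if candidate == key:
            out.append((name, "same name, different style"))
        elif candidate in equivalent:
            out.append((name, "functionally equivalent"))
        elif difflib.SequenceMatcher(None, key, candidate).ratio() >= cutoff:
            out.append((name, "similar spelling"))
    return out
```

The reason string is what the "why is this a suggestion" indicator would display. With the examples from the domain requirements, "CDK46" matches an existing "Cdk46" by style, while "AP-1" only matches "Jun-Fos-complex" if the synonym table says so.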
In the custom case, we should also show a summary of how issues are resolved (X entities created, Y entities remapped, Z interactions ignored).
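One possible way to apply the "Ignore / Create / Custom" choice and produce the summary counts is sketched below. The function name, the string-based modes, and the `(action, remap_target)` tuples are assumptions made for illustration. Unknown variables without an explicit custom choice are treated as "ignore" here, which is also just an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple


def resolve_entities(
    interactions: List[Tuple[str, str]],
    known: Set[str],
    mode: str,  # "ignore", "create" or "custom"
    custom: Optional[Dict[str, Tuple[str, Optional[str]]]] = None,
):
    """Resolve unknown entities and return (kept interactions, summary)."""
    custom = custom or {}
    created, remapped = set(), set()
    kept, ignored = [], 0

    def resolve(name: str) -> Optional[str]:
        if name in known:
            return name
        if mode == "custom":
            action, remap_to = custom.get(name, ("ignore", None))
        else:
            action, remap_to = mode, None
        if action == "create":
            created.add(name)
            return name
        if action == "remap" and remap_to in known:
            remapped.add(name)
            return remap_to
        return None  # "ignore", or a remap without a valid target

    for source, target in interactions:
        s, t = resolve(source), resolve(target)
        if s is None or t is None:
            ignored += 1  # any ignored endpoint drops the whole interaction
        else:
            kept.append((s, t))

    summary = {"created": len(created), "remapped": len(remapped), "ignored": ignored}
    return kept, summary
```

In this sketch a variable marked "create" is counted as created even if all of its interactions end up ignored because of the other endpoint; whether that is the desired behaviour would need to be decided.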
After entities, we show monotonicity mapping. This starts with a checkbox indicating whether monotonicity is imported at all (if not, all monotonicity is "unknown").
If the format is .csv, we also show a dropdown for selecting the relevant column. For .sif, the column is known.
Then, we have one input field for each monotonicity enum value. Here, the content for the "unknown" field is auto-generated and contains all values that are not assigned to anything else. The remaining categories can be assigned manually, with autocomplete showing the possible values taken from the selected column. There should be an indicator for each category to show the count of values in each input field.
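The "listing" mapping with an auto-generated "unknown" field might look like the sketch below. `classify_by_listing` is a hypothetical name; the categories are passed in by the caller, so the same function would work for any set of categories.

```python
from collections import Counter
from typing import Dict, List, Set


def classify_by_listing(values: List[str], listing: Dict[str, Set[str]]):
    """Map raw column values to categories based on explicit value lists.

    values:  one raw value per interaction (the selected column)
    listing: category -> raw values the user typed into that input field
    """
    # If a raw value is listed under two categories, the later one wins here.
    lookup = {raw: cat for cat, raws in listing.items() for raw in raws}
    # One category per interaction; anything unassigned becomes "unknown".
    assigned = [lookup.get(v, "unknown") for v in values]
    # Auto-generated content of the "unknown" input field.
    unknown_values = sorted(set(values) - set(lookup))
    # Indicator per input field: number of distinct values in that field.
    field_counts = {cat: len(raws) for cat, raws in listing.items()}
    field_counts["unknown"] = len(unknown_values)
    # Number of interactions that fall into each category.
    summary = Counter(assigned)
    return assigned, unknown_values, field_counts, summary
```

The autocomplete options for the manual fields would then be the distinct values of the column, i.e. `sorted(set(values))`.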
It would be nice to also have the option to switch the checking from "listing" to "pattern", in which case each input field would contain a regular expression. But this seems a bit complicated to implement.
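For reference, the "pattern" mode may be less complicated than it seems, at least on the matching side. The sketch below uses Python's `re` module; the function names are hypothetical, and the choice of full-string matching with "first matching category wins" is an assumption that would need to be agreed on.

```python
import re
from typing import Dict, List, Optional


def pattern_error(pattern: str) -> Optional[str]:
    """Return an error message for an invalid regular expression, else None."""
    try:
        re.compile(pattern)
        return None
    except re.error as e:
        return str(e)


def classify_by_pattern(values: List[str], patterns: Dict[str, str]) -> List[str]:
    """Assign each raw value to the first category whose pattern matches it fully.

    Categories with an empty pattern are skipped; unmatched values are "unknown".
    """
    compiled = [(cat, re.compile(p)) for cat, p in patterns.items() if p]
    result = []
    for value in values:
        for cat, rx in compiled:
            if rx.fullmatch(value):
                result.append(cat)
                break
        else:
            # No category matched this value.
            result.append("unknown")
    return result
```

The harder part is probably the UI: validating each field as the user types (which `pattern_error` could support) and explaining overlapping patterns.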
Similar to variable listing, a summary of how many interactions fall into which category is shown.
Essentiality works the same way, just the number of categories is different.

Design preview

Any of this can change; these are mostly just suggestions:

daemontus · 2024-05-30T13:46:26Z

What if interactions already exist?
