History tracking, provenance and states #690
Replies: 1 comment
In GitLab by @yoavnash on May 6, 2020, 11:01
To clarify the point about ontology independence: a solution may use an ontology, but I think it should be general, so that it does not only work for EMMO. In this case, extending CUBA might be a good idea. Another point to keep in mind is that a wrapper is not the only thing that can create an event relevant to the history of a CUDS object. A user working with the semantic layer can create such events too. This might be relevant for keeping track of workflows that go beyond a specific wrapper. Concerning the implementation, allowing metadata to be stored for a CUDS object can also be useful here. We could have a metadata module that keeps track of the changes (if the user wishes so) and exposes its own API, through which the user can query the changes. See a previous discussion about the metadata here.
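A minimal sketch of what such a metadata module could look like, in plain Python and independent of any ontology. Everything here is hypothetical and for illustration only: `ChangeEvent`, `MetadataLog`, and the methods `track`, `record` and `changes` are not part of any existing API, and objects are identified by plain string ids rather than real CUDS objects. The point is only that both wrappers and users can record events, tracking is opt-in, and the changes can be queried afterwards.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, List, Optional, Set


@dataclass(frozen=True)
class ChangeEvent:
    """One recorded change to a tracked object (hypothetical structure)."""
    object_id: str
    actor: str        # who caused the change, e.g. "wrapper:SimLammps" or "user"
    attribute: str
    old_value: Any
    new_value: Any
    timestamp: datetime


class MetadataLog:
    """Hypothetical metadata module: opt-in change tracking with a query API."""

    def __init__(self) -> None:
        self._tracked: Set[str] = set()
        self._events: List[ChangeEvent] = []

    def track(self, object_id: str) -> None:
        """Opt in: only objects registered here have their changes stored."""
        self._tracked.add(object_id)

    def record(self, object_id: str, actor: str, attribute: str,
               old_value: Any, new_value: Any) -> None:
        """Store a change event; silently ignored for untracked objects."""
        if object_id not in self._tracked:
            return
        self._events.append(ChangeEvent(
            object_id, actor, attribute, old_value, new_value,
            datetime.now(timezone.utc),
        ))

    def changes(self, object_id: str,
                actor: Optional[str] = None) -> List[ChangeEvent]:
        """Return the events of one object in order, optionally for one actor."""
        return [
            e for e in self._events
            if e.object_id == object_id and (actor is None or e.actor == actor)
        ]
```

A wrapper would call `record` with its own name as `actor`, while a user working directly with the semantic layer would pass `"user"`, so a workflow spanning several wrappers ends up in one log.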
In GitLab by @pablo-de-andres on May 5, 2020, 10:58
Keeping track of different versions of an object has been a recurring topic in our discussions for a long time. This issue will group the motivation, approaches and decisions regarding this topic.
Relates to #127
Motivation
Wrappers
The behaviour of the wrappers could follow two main paradigms: modify data or create data.
1. Modify data
Advocate: @pablo-de-andres
Example: Current implementation of SimLammps
Description: The wrapper takes some input data, runs for a number of steps, and overwrites the input data with the latest values. The wrapper here behaves as a process generating an output from an input. However, it has no memory, and there is no direct way of tracing back the changes.
History tracking: Has to be implemented outside of the wrapper, and controlled by the user. The user decides what to store in a snapshot and when.
Pros:
Cons:
2. Create data
Advocate: @urbanmatthias
Example: Current implementation of Gromacs Wrapper
Description: The wrapper has some input data (connected through a `hasInput` or similar relationship) and generates output data. When the user queries the wrapper, the data will be stored under a `hasOutput` (or equivalent) relationship. If multiple runs are called sequentially, the output of one simulation becomes the input of the next one. This means loading the full output state of the engine into the output, so that it is available as an input. The behaviour of the wrapper would more closely mimic a full workflow, where every run is a process with its own input and output.
History tracking: An inherent part of the wrapper. Requires keeping a connection between an entity through all its states (possibly through a relationship).
Pros:
Cons:
Bonus option: Internal engine output file storage
Advocate: @ahashibon
Description: A hybrid of the first option, where the engine is asked to generate output files every step (or every fixed number of steps) and these files are stored internally. The files could be parsed on demand if the data is required.
History tracking: Done through the tracking of the files generated by the engine and kept internally.
Pros:
Cons:
Decision
Standardise option 1.
This requires the design and implementation of an external history tracking mechanism.
A desirable requirement following from this is that the history tracking is integrated in such a way that the user can easily define some parameters and get an optional, automatic behaviour similar to option 2. This means the tracking should become a part of the semantic or the interoperability (session class) layer.
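As a rough illustration of the decision above, the following sketch keeps the wrapper in "modify data" mode and puts the tracking outside of it. All names are assumptions made for this example: `HistoryTracker`, its `auto_every` parameter, and the `engine_step` callable standing in for a wrapper run are hypothetical, and the state is a plain dictionary rather than CUDS objects. The user can take snapshots manually, or set one parameter to have them taken automatically.

```python
import copy
from typing import Any, Callable, Dict, List, Optional, Tuple

State = Dict[str, Any]


class HistoryTracker:
    """Hypothetical external tracker for a wrapper that overwrites its data."""

    def __init__(self, auto_every: Optional[int] = None) -> None:
        # auto_every=None: snapshots only when the user asks for them.
        # auto_every=n:    additionally store a snapshot every n steps.
        self.auto_every = auto_every
        self.snapshots: List[Tuple[int, State]] = []

    def snapshot(self, step: int, state: State) -> None:
        """Store a deep copy, so later in-place changes do not alter history."""
        self.snapshots.append((step, copy.deepcopy(state)))

    def run(self, engine_step: Callable[[State], None],
            state: State, steps: int) -> None:
        """Advance the state in place, snapshotting automatically if enabled."""
        for step in range(1, steps + 1):
            engine_step(state)  # overwrites the data, as in option 1
            if self.auto_every and step % self.auto_every == 0:
                self.snapshot(step, state)


def advance(state: State) -> None:
    """Stand-in for one engine step (illustrative only)."""
    state["x"] += 1


state = {"x": 0}
tracker = HistoryTracker(auto_every=2)
tracker.snapshot(0, state)       # manual snapshot of the initial state
tracker.run(advance, state, 5)   # automatic snapshots at steps 2 and 4
```

With `auto_every` left unset, the behaviour is exactly the user-controlled option 1. Setting it gives a chain of stored states comparable to option 2 without changing the wrapper itself.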
Implementation considerations
Implementation ideas