Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable-Based iteration layout #855

Merged
merged 11 commits into from
Apr 23, 2021
Merged

Conversation

franzpoeschel
Copy link
Contributor

@franzpoeschel franzpoeschel commented Dec 21, 2020

Based on topic-adios-variables-for-attributes since "variable attributes" are needed for this.
For comparing the changes: franzpoeschel/openPMD-api@topic-adios-variables-for-attributes...franzpoeschel:topic-stepbased

This adds a third iteration layout to the openPMD API: Next to file-based and group-based, a step-based layout.

Idea: ADIOS prefers consuming data in steps. Until now, the openPMD API created entirely new datasets for each iteration, while ADIOS(2) actually allows to reuse datasets across steps. A simple dataset may now look like this in ADIOS2:

  string    /basePath                        scalar
  uint64_t  /data/__step__                   10*scalar
  uint8_t   /data/closed                     10*scalar
  double    /data/dt                         10*scalar
  int8_t    /data/meshes/E/axisLabels        10*{1, 2}
  string    /data/meshes/E/dataOrder         10*scalar
  string    /data/meshes/E/geometry          10*scalar
  double    /data/meshes/E/gridGlobalOffset  10*{1}
  double    /data/meshes/E/gridSpacing       10*{1}
  double    /data/meshes/E/gridUnitSI        10*scalar
  float     /data/meshes/E/timeOffset        10*scalar
  double    /data/meshes/E/unitDimension     10*{7}
  int32_t   /data/meshes/E/x/__data__        10*{1000}
  double    /data/meshes/E/x/position        10*{1}
  double    /data/meshes/E/x/unitSI          10*scalar
  int32_t   /data/meshes/E/y/__data__        10*{__}
  double    /data/meshes/E/y/position        10*{1}
  double    /data/meshes/E/y/unitSI          10*scalar
  double    /data/time                       10*scalar
  double    /data/timeUnitSI                 10*scalar
  string    /date                            scalar
  string    /iterationEncoding               scalar
  string    /iterationFormat                 scalar
  string    /meshesPath                      scalar
  string    /openPMD                         scalar
  uint32_t  /openPMDextension                scalar
  string    /software                        scalar
  string    /softwareVersion                 scalar

This helps controlling the amount of metadata by reusing attributes and variables across steps.

TODO:

  • Remove currentIteration group and merge contents with layer above.
    EDIT: See comment below, there are pros and cons and this should be discussed first.
  • Merge topic-adios-variables-for-attributes before this.
  • Merge lazy parsing of iterations #938 before this
  • CI failures are because I made the use of steps the default behavior for the new schema and there are bugs in ADIOS2 2.6.0 that prevent that. So, merge this only after bumping the required version to ADIOS2 2.7.0.
  • Documentation
  • More thorough testing
  • Formalize this in the openPMD standard: IterationEncoding: variableBased openPMD-standard#250
  • iterationFormat and similar attributes
  • Use something like data%V.bp for automatic recognition of the iteration layout?
  • test that changing extents (shapes) over time work for meshes & particles
    Update: Doesn't work at the moment. I know the reason, need to find a way to fix it.
    Update: I've reordered the PRs, this is now based on lazy parsing of iterations #938 (lazy parsing). In lazy parsing mode, this now works. In eager parsing, parsing is done before any step is opened and datasets are not (yet) re-parsed. Fix that.
    Update: Both should be working now.
  • Naming: rename __step__ to snapshot (see Rename Iteration openPMD-standard#148)
  • Expose iteration encoding as JSON option – maybe postpone to a more thorough JSON PR
  • Guarantee removal of stale content when reparsing

@franzpoeschel
Copy link
Contributor Author

franzpoeschel commented Dec 22, 2020

The latest commit removes the currentIteration group:

  string    /basePath                        scalar
  uint8_t   /data/closed                     10*scalar
  double    /data/dt                         10*scalar
  uint64_t  /data/iterationIndex             10*scalar
  int8_t    /data/meshes/E/axisLabels        10*{1, 2}
  string    /data/meshes/E/dataOrder         10*scalar
  string    /data/meshes/E/geometry          10*scalar
  double    /data/meshes/E/gridGlobalOffset  10*{1}
  double    /data/meshes/E/gridSpacing       10*{1}
  double    /data/meshes/E/gridUnitSI        10*scalar
  float     /data/meshes/E/timeOffset        10*scalar
  double    /data/meshes/E/unitDimension     10*{7}
  int32_t   /data/meshes/E/x/__data__        10*{1000}
  double    /data/meshes/E/x/position        10*{1}
  double    /data/meshes/E/x/unitSI          10*scalar
  double    /data/time                       10*scalar
  double    /data/timeUnitSI                 10*scalar
  string    /date                            scalar
  string    /iterationEncoding               scalar
  string    /iterationFormat                 scalar
  string    /meshesPath                      scalar
  string    /openPMD                         scalar
  uint32_t  /openPMDextension                scalar
  string    /software                        scalar
  string    /softwareVersion                 scalar

I'm not sure how to feel about this.
Pro: Looks nicer, is simpler
Con: It's no longer possible to distinguish attributes written to series.iterations and series.iterations[0].

Would this lead to problems? Are attributes commonly written to series.iterations?
Also if we go with this:

  • Even if the other backends don't support step-based iteration layout, make sure they deal correctly with openPath(path=""). The ADIOS2 backend didn't.

@ax3l
Copy link
Member

ax3l commented Feb 8, 2021

Hi @franzpoeschel, can you please rebase this PR for review? :)

#855 (comment)

Pro: Looks nicer, is simpler
Con: It's no longer possible to distinguish attributes written to series.iterations and series.iterations[0].

Would this lead to problems? Are attributes commonly written to series.iterations?

The layout without the extra group is exactly what we want. Attributes that were on each /data/<iteration>/ should not clash with anything in /data by design of openPMD-standard, so we are good to go. This is similar to how we handle scalar and vector records already.

I would rename iterationIndex to __steps__ as we already do for __data__.

For tests, I would say we can add as we do the "ext"(ension) argument for many steps another "encoding" argument and just increase write/read coverage by running many more unit tests through it.

@franzpoeschel
Copy link
Contributor Author

For tests, I would say we can add as we do the "ext"(ension) argument for many steps another "encoding" argument and just increase write/read coverage by running many more unit tests through it.

This iteration layout requires use of the streaming API, so the number of tests we can hijack is limited by that. But testing can surely be extended there.
Currently, picking this iteration layout is supported only via Series::setIterationLayout. I think we should add some kind of pattern for this as well, so applications won't have to explicitly add support for this layout. I propose something like series%V.bp. That would also simplify hijacking tests.

@franzpoeschel
Copy link
Contributor Author

Hi @franzpoeschel, can you please rebase this PR for review? :)

Done

#855 (comment)
Pro: Looks nicer, is simpler
Con: It's no longer possible to distinguish attributes written to series.iterations and series.iterations[0].
Would this lead to problems? Are attributes commonly written to series.iterations?

The layout without the extra group is exactly what we want. Attributes that were on each /data/<iteration>/ should not clash with anything in /data by design of openPMD-standard, so we are good to go. This is similar to how we handle scalar and vector records already.

It's also a bit helpful for implementation. This way, I can inquire Iteration::getAttribute("__step__") also via Series::iterations::getAttribute("__step__"), so we know which iteration to open in the first place. Otherwise, things get a bit hen-and-egg ;)
So, since that one is currently implemented, I'm keeping it.

I would rename iterationIndex to __steps__ as we already do for __data__.

Done

@ax3l
Copy link
Member

ax3l commented Feb 8, 2021

Should we default to the new attribute layout #813 with stepBased from the beginning?

Maybe I overlook it in the review, but I cannot spot where this is exactly triggered in this PR if it is.

@franzpoeschel franzpoeschel force-pushed the topic-stepbased branch 2 times, most recently from 52f311e to 480d343 Compare April 8, 2021 12:55
@franzpoeschel
Copy link
Contributor Author

Think I found an ancient bug: ADIOS1 reports the datatype of unitDimension as VEC_DOUBLE. This never came up since the unitDimension attributes are default-initialized and the defaults were not overwritten by what was actually in the file. They now are.

@franzpoeschel
Copy link
Contributor Author

Should we default to the new attribute layout #813 with stepBased from the beginning?

Maybe I overlook it in the review, but I cannot spot where this is exactly triggered in this PR if it is.

The ADIOS2 backend will now automatically switch to the new layout when selecting step-based iteration encoding.

Otherwise, this is ready for review from my side :) @ax3l

@ax3l
Copy link
Member

ax3l commented Apr 21, 2021

Awesome, can you please rebase out the little merge conflicts from the last merge and I'll get to it before I am on PTO :-)

@franzpoeschel
Copy link
Contributor Author

Passing apart from a CI run that fails to initialize. I think this one is ready @ax3l

@ax3l
Copy link
Member

ax3l commented Apr 23, 2021

Awesome! Addressing the brew macOS CI thingy in #970

@ax3l ax3l added the api: new additions to the API label Apr 23, 2021
Copy link
Member

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! 🚀 ✨

@ax3l ax3l merged commit 86a15ba into openPMD:dev Apr 23, 2021
@ax3l ax3l changed the title Step-Based iteration layout Variable-Based iteration layout Apr 24, 2021
@ax3l
Copy link
Member

ax3l commented Apr 24, 2021

@franzpoeschel I just realized: did we intentionally call this now variableBased? We designed this as stepBased:
openPMD/openPMD-standard#250 I can change this in the standards as well, I just cannot recall if we intentionally renamed it.

@franzpoeschel
Copy link
Contributor Author

franzpoeschel commented Apr 29, 2021

I initially named the new encoding "step-based", but in an offline discussion you suggested the use of "variable-based" encoding to stress the fact that we are using variables, i.e. datasets with version numbers, to encode openPMD iterations. @ax3l

@ax3l
Copy link
Member

ax3l commented May 20, 2021

@franzpoeschel Thanks. I looked at this again from an openPMD perspective, where we don't define the term "variable".
We currently have a "Series of Iterations", which will likely be called "Series of Snapshots" in 2.0 openPMD/openPMD-standard#148

It's generally ok-ish to go with variableBased since it's ADIOS specific, but as I said it introduces a new term (stepBased would do so as well). Have no better idea at the moment either, though :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: new additions to the API backend: ADIOS2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants