Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MD simulation extension #218

Open
wants to merge 4 commits into
base: upcoming-2.0.0
Choose a base branch
from

Conversation

JunCEEE
Copy link

@JunCEEE JunCEEE commented Jan 26, 2020

Implements issue: #212 @CFGrote

Description

What is introduced, removed or renamed and why?

  • Molecular Dynamics simulation extension is introduced for MD data IO.

Example

  • The example data structure can be found HERE.

Affected Components

  • EXT-MD

Writer Changes

What would a writer need to change?

  • The writer needs to be able to write observables as a group under each Iteration, in the same level of particles to describe the thermodynamic parameters of the whole atomic system.

Does this pull request change the interpretation of existing data writers?

  • Not yet.

Reader Changes

What would a reader need to change? Link implementation examples!

@ax3l ax3l self-assigned this Feb 2, 2020
@ax3l ax3l self-requested a review February 2, 2020 05:28
@ax3l ax3l added the EXT: MD molecular dynamics label Feb 2, 2020
@ax3l
Copy link
Member

ax3l commented Feb 2, 2020

Thank you for the PR!
Before we start the review, can you please rebase against the upcoming-2.0.0 branch and squash your commits into one?

so basically you do:

$ git remote -v
# ... list of remotes, remember the name of https://github.com/openPMD/openPMD-standard

$ git rebase -i <nameFromAbove>/upcoming-2.0.0
# now a text edit opens.
#   replace the first word "pick" in each line with "squash"
#   save and close the editor
# a new editor window opens with all commit messages
#   edit and keep the one you want to keep, save

$ git push -f

In case anything goes south in the process: first create a backup branch and you can always git rebase --abort as it runs.

I'll be on a working trip next week but try to take a look as soon as the PR is updated.

@ax3l ax3l requested a review from RemiLehe February 2, 2020 05:35
@ax3l ax3l added this to the openPMD 2.X milestone Feb 2, 2020
- add missed scopes, coordinate attribute
- correct description of the triclinic simulation box
- remove irrelevant simex and lammps references
@JunCEEE
Copy link
Author

JunCEEE commented Feb 2, 2020

@ax3l Thanks. I've followed your instruction and squashed the commits. Please check that.

Copy link
Member

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry for the delay.

This looks wonderful, great work! I added a round of questions and formatting suggestions.

EXT_MD.md Outdated Show resolved Hide resolved
EXT_MD.md Outdated Show resolved Hide resolved
EXT_MD.md Outdated Show resolved Hide resolved
EXT_MD.md Outdated Show resolved Hide resolved
The following additional attributes are defined in this extension.
The individual requirement is given in `scope`.

- `forceField`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for forceField and forceFieldParameters, do you want to define the syntax of the values further or keep values as a human-readable-only free text?
You can check out the SpeciesType extension for ideas how to define syntax and alternatives.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ideal case is that there could be a well-defined syntax to describe the forceField, and to be read by MD simulation code directly. But in practice, for different forceFields, there will be different parameters. Like the LJ potential and EAM potential, the parameter syntax are different.

Since I cannot understand all the potentials used in MD simulation, maybe I can start with defining the syntax from some potentials I understand? If it's other potential, just make it a human-readable-only free text.

Should we also ask the opinion of some other experts about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, feel free to reach out. Otherwise you can also just keep it undefined for now.

You can always add more attributes in openPMD and if you arrive at something that is worth standardizing for those attributes, then we can just specify them later.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't particularly like this constraint in the name. ;)

Consider a case where one wishes to find the ground state of an atomic configuration. This is not a forceField but just a method. Would this make sense to change?

Copy link
Member

@ax3l ax3l Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think @JunCEEE? Sounds reasonable to me, if storing the used method is the intent for this attribute :)

Note that if we use a very general attribute name like method, we need to add a better description to compensate what we mean :)

EXT_MD.md Outdated Show resolved Hide resolved
EXT_MD.md Outdated Show resolved Hide resolved
EXT_MD.md Outdated Show resolved Hide resolved
- description: the representation of the stored position coordinates
- available values:
- `absolute` the unscaled coordinates
- `fractional` the coordinates that are scaled in the range of [0,1] relative to the length of each box edge; in this coordinate system, the `unitSI` of each position component should be `1.0`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For fractional, do you intentionally include the upper value? (E.g. [0.0, 1.0] vs. [0.0, 1.0)) Just double-checking.
Formatting suggestion:

Suggested change
- `fractional` the coordinates that are scaled in the range of [0,1] relative to the length of each box edge; in this coordinate system, the `unitSI` of each position component should be `1.0`
- `fractional` the coordinates that are scaled in the range of `[0.0, 1.0]` relative to the length of each box edge; in this coordinate system, the `unitSI` of each position component should be `1.0`

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think it deeply. But intuitively, for the atom located at the edge corner of the simulation box, it's fractional coordinate is [1.0,1.0,1.0]. Thus I set the range to be [0.0,1.0].

Please correct me if there are some occasions I missed.

Copy link
Member

@ax3l ax3l Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I am just wondering about double-counting @JunCEEE - when a particle belongs to one box or another. But maybe I misunderstood and there is only one (simulation) box here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @JunCEEE here. It may be valuable to have an atom at the box-edge on one side and not the other. I guess the code should it-self check for mirrored positions. Especially considering that not all directions need to be periodic [0 ; 1] seems like the correct choice here.


### Records for Sub-Group `box`

The following records are defined in this extension.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to store these small data sets as records or as attributes? Both is fine, just be aware that attributes can only be scalars or 1D-arrays, due to limitations in some file formats.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would like to use records here.

@JunCEEE
Copy link
Author

JunCEEE commented Apr 25, 2020

I am sorry for the delay.

This looks wonderful, great work! I added a round of questions and formatting suggestions.

Thanks Axel. I've replied to your questions. Do we need more suggestions from experts in MD simulation?

@ax3l
Copy link
Member

ax3l commented Feb 1, 2021

@CFGrote @JunCEEE do you like to update the PR for merging? :)

@ax3l ax3l mentioned this pull request Feb 4, 2021
@ax3l
Copy link
Member

ax3l commented Feb 19, 2021

VC meeting today:

  • after initial LAMMPS experiments, further XMD and GROMACS workflows are being explored
  • classic MD [atomic positions] & hybrid MD (DFT, ab initio) [additional particle attributes]
  • we'll start with the start-to-end workflows needed for EU XFEL
    • as usual we'll, iterate in follow-up PRs if more people show interest and want to generalize it to their MD workflows, at the moment we focus on ours
  • converter scripts: some outputs are already in HDF5. If existing data is large, a conversion could avoid a fully copy and instead use Virtual Dataset (VDS) and external links so create a light-weight openPMD file that re-interprets the original non-openPMD data

Action items:

  • this PR will get polished with new XMD & GROMACS experience
  • then we'll review & meet on it again

@CFGrote
Copy link

CFGrote commented Apr 13, 2021

hi @ax3l and @JunCEEE , is anything missing on this PR? just wondering. would be great to have it included in 2.0.0

@JunCEEE
Copy link
Author

JunCEEE commented Apr 14, 2021

@CFGrote I realized that the information of scattering factors and electronic charges related is missed in the standard. when I drafted the standard, I only considered the output of traditional/classical MD simulations, where the scattering process was absent. I am not sure whether we add it to this standard or have a different one. Any suggestions?

Copy link

@zerothi zerothi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comments, I come from the DFT community and would like to explore whether OpenPMD could be of use to us!
For that this PR seems very much needed.

EXT_MD.md Outdated

The following attributes are defined in this extension. The individual requirement is given in `scope`.

- `coordinate`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the regular openmpd scheme coordinates are labelled position. Is there need to introduce another naming convention?

Wouldn't it be better to sub-inherit the position spec here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the coordinate attribute is a string indicating if the position records are in absolute coordinates or fractional coordinates. It's not a replacement/equivalent of position.

Maybe the name is a little confusing. We replace it with a bool attribute, fractionalCoordinate. Do you think it would be clearer to be with the bool attribute?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it relates directly to position, then why not positionFormat? One could also imagine scaled coordinates according to some lattice parameter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like a good suggestion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXT_MD.md Outdated
- description: the boundary condition of the box along each dimension. The allowed values in `boundary` are either **periodic** or **none**.
- example values:
- `["periodic","periodic","periodic"]` periodic in all the three dimensions
- `["none","periodic","periodic"]` periodic in only *y* and *z* dimensions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might I suggest we add dirichlet and neumann for a more complete basis?.

Also, I would refrain from using y and z as specifiers. When the box is a (3, 3) D=3 one may have a skewed box where then the boundary refers to the box-boundaries which are not necessarily Cartesian directions. As below the triclinic box is suggested.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestions. I'll add the two conditions. Also, the other dimensions will be referred to as the second and third dimensions in the example.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- `[[1.,0.,0.],[0.,1.,0.],[0.,0.,1.]]` 3D cubic simulation box, Ax = (1.,0.,0.), Ay = (0.,1.,0.), Az = (0.,0.,1.)
- `[[3.46,0.,0.],[1.73,2.997,0.],[0.,0.,10.]]` 3D triclinic simulation box, Ax = (3.46,0.,0.), Ay = (1.73,2.997,0.), Az = (0.,0.,10.)
- `[[1.,0.],[1.,1.]]` 2D rectangle simulation box, Ax = (1.,0.), Ay = (0.,1.)
- `limit`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the difference between limit and edge they are both required?
Why is limit needed? Could you make an example?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edge is a collection of the edge direction vector of each dimension of the box. It basically defines the direction of each edge/dimension of the box. edge=[[1.,0.,0.],[0.,1.,0.],[0.,0.,1.]] means a 3D orthorhombic (instead of cubic, my fault.) simulation box. edge= [[3.46,0.,0.],[1.73,2.997,0.],[0.,0.,10.]] means a 3D triclinic simulation box.

limit defines the starting and ending point on each edge of the box. It basically defines the size and position of the box.

Maybe, still, the naming is a bit confusing. It would be nice if you could suggest something better.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand! ;) You write exactly what the document says.
Why isn't edge fully explaining the box? What does limit tell you that edge doesn't?
Could you make a picture (2D I guess would suffice)?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. what is the relation between edge and limit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for asking. Actually, I just realized the problem of this definition.

The idea was inspired by xhi and xlo in https://docs.lammps.org/create_box.html. However, I didn't really implement the iead clearly in this definition. What I really need (as shown in the graph) is actually edge for length and direction and origin instead of limit for the origin of the box in a cartesian coordinate. Please let me know how you think if we implement it in this way.
image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I think we should replace edge with vectors.

In addition to that, I'd also introduce a lengths keyword to define the length of box edges to ease the defining and changing of box sizes (vectors will only represent the direction.). Considering a box with vectors = [[3.46,0.,0.],[1.73,2.997,0.],[0.,0.,10.]], it's not straightforward to change the box size to 5x5x5 using only vectors representing both directions and lengths. With the lengths keyword, this will be more straightforward and practical.

It will be like this:
https://github.com/JunCEEE/openPMD-standard/blob/85369d1dd4119bd27413de90a64caccaac604520/EXT_MD.md?plain=1#L125-L139

What do you think?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think length should be there at all. The vectors should contain the length of the vectors.
I don't understand your:

Considering a box with vectors = [[3.46,0.,0.],[1.73,2.997,0.],[0.,0.,10.]], it's not straightforward to change the box size to 5x5x5 using only vectors representing both directions and lengths. With the lengths keyword, this will be more straightforward and practical.

I really have no clue what you mean here? Do you just want each vector to have a length of 5?

One parameter is more than enough to cover all cases.

My problem with your proposal is that it requires users to do this: 1) read vectors, 2) normalize vectors, 3) read lengths, 4) scale vectors
With only 1 parameter you just read it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. lengths is intended to help users change the box size easily: only change the lengths without re-calculate the vectors, and sizes are changed more frequently than vector directions. But surely your comment also makes sense. The implementation about being user-friendly can also be done on the user interface level. If we follow the principle of storing essential information, we should set only one parameter.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I think this should be a data format. Not a convenience format for simulators. :) So keeping things simple is much more important to me ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like you arrived at a solution :)

@JunCEEE
Copy link
Author

JunCEEE commented Oct 20, 2021

General comments, I come from the DFT community and would like to explore whether OpenPMD could be of use to us! For that this PR seems very much needed.

Hello @zerothi! Welcome! Your comments would be important for us to see if this MD standard can also fulfill the need of DFT community.

@zerothi
Copy link

zerothi commented Oct 20, 2021

I would be happy to review a bit more. Is there something in particular that needs attention?


- `positionFormat`
- type: *(string)*
- scope: *required*
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, another ping here ;)

Since generally position is considered absolute wouldn't it make sense to make this optional, thus defaulting to absolute?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a good idea for a program that writes in this format. However, as a data format, I feel it's better to make the positionFormat clear in metadata to ensure the data is self-described. Thus I tend to keep it required.

What do you think?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what the OpenPMD community has of rules for these somewhat optional things. I don't mind having it required, it is just weird given that the OpenPMD specification without the MD simulation extension does not have this. If anything this format shouldn't belong to the MD extension.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if I understand correctly, but there is indeed the scope with optional in the "PIC extension" which I take as a template. And there are actually some records belonging to the topic of MD simulation but not essential (e.g. observables).

Maybe @ax3l can comment on this?

Copy link
Member

@ax3l ax3l Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for thinking about this!

What I understand is that this attribute tries to modify how we interpret position records. While we generally can do such modifications in extensions, they might be a bit tricky to keep neatly compatible with the base standard.

For the specific task here, it looks to me like you try to achieve a "fine" and "coarse" position, similar to what we do in some particle-in-cell codes. For this purpose, the base standard has already he "position" and "positionOffset" records defined. Could we potentially use exactly that mechanism here or does it differ?

As an example, in PIConGPU, we defined positionOffset to a cell index, scaled via unitSI back to m. Then we use position with values between [0,1) to store the fine-grained position in a cell. The unitSI for position can be a different value than the one for positionOffset.

For data sets that do not need this splitting in fine and coarse positions, we define positionOffset to zero values (constant record).


### Additional Sub-Group for each Particle Species

`box` is an *optional* sub-group for each particle species to contain the information of the simulation box.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the box optional for each particle species? I mean the box is globally defined for the entire simulation box. Not by each particle specie?

@ax3l
Copy link
Member

ax3l commented Jan 25, 2022

@JunCEEE @zerothi thank you for iterating on this, that is great input!
Do you two like to update the PR together to get it closer to merge-able state? :) It looks to me like most threads started arrived at a solution/improvement suggestion now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
EXT: MD molecular dynamics
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

5 participants