You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am trying to use Interchange to generate LAMMPS input files for some large polymer systems, which consist of many repeated-and-translated copies of oligomers, plasticizers, and solvent water molecules. I've found that in every case, a unique molecule ID in the atoms table section of the resulting LAMMPS file is only generated for each distinct Molecule chemistry, rather than for each individual Molecule in the Topology.
Some further digging reveals that the culprit is a speed-up update to the LAMMPS valence writers, in particular the _write_atoms() function, which generates molecule IDs from a chemical table hash of each molecule, allegedly to bypass an extra toolkit wrapper conversion step. This molecule CTAB hash, in turn, is only based on basic chemical info of the atoms and bonds in the Molecule object (namely atom symbols, formal charges, stereo, and bond orders).
The consequence of this is that any distinct Molecules with identical chemical tables in the same Topology will unavoidably assigned the same molecule ID in the atoms table when writing to LAMMPS
Reproduction
This turns out to affect even really basic systems, such as a box filled with nothing but water.
To illustrate, one could take the waterbox.pdb file found here and execute the following code:
All atoms in the resulting water.lammps file will be labelled with molecule ID 500 (the number of unique water molecules in the Topology), rather than unique IDs from 1-500.
Solution
The simplest solution from my point of view is to salt the molecule ID hash in _write_atoms() with the molecule ID to ensure distinct molecules with identical CTABs are hashed as such. This would look something like (interop/lammps/export/export.py, lines 296-300):
for molecule_index, molecule in enumerate(interchange.topology.molecules):
molecule_hash = hash(f'{molecule_index}_{molecule.ordered_connection_table_hash()}') # salt with mol ID
molecule_map[molecule_hash] = molecule_index
for atom in molecule.atoms:
atom_molecule_map[atom] = molecule_hash
I've tested on my machine (via manual source code edits in a conda env) that this circumvents the non-uniqueness issue, and have put together a simple PR to do exactly this. Software versions
OS: running on Linux Ubuntu 20.04.6 LTS
Installed OpenFF Toolkit (0.15.2) and Interchange (0.3.23) via mamba, though the offending portions of code are still present in the most recent versions on GitHub.
The text was updated successfully, but these errors were encountered:
I was just working on this very issue and this seems to prove it's not isolated since it persists within interchange.to_gromacs as well. A broader solution may be necessary to fix this (i.e. somewhere in interop or just copied in each export).
Thanks @timbernat for the great writeup and quick PR. Matt is on a much-needed vacation right now so I'll help polish up the PR but his final review won't come until next week :-)
Description
I am trying to use Interchange to generate LAMMPS input files for some large polymer systems, which consist of many repeated-and-translated copies of oligomers, plasticizers, and solvent water molecules. I've found that in every case, a unique molecule ID in the atoms table section of the resulting LAMMPS file is only generated for each distinct Molecule chemistry, rather than for each individual Molecule in the Topology.
Some further digging reveals that the culprit is a speed-up update to the LAMMPS valence writers, in particular the
_write_atoms()
function, which generates molecule IDs from a chemical table hash of each molecule, allegedly to bypass an extra toolkit wrapper conversion step. This molecule CTAB hash, in turn, is only based on basic chemical info of the atoms and bonds in the Molecule object (namely atom symbols, formal charges, stereo, and bond orders).The consequence of this is that any distinct Molecules with identical chemical tables in the same Topology will unavoidably assigned the same molecule ID in the atoms table when writing to LAMMPS
Reproduction
This turns out to affect even really basic systems, such as a box filled with nothing but water.
To illustrate, one could take the waterbox.pdb file found here and execute the following code:
Output
All atoms in the resulting
water.lammps
file will be labelled with molecule ID 500 (the number of unique water molecules in the Topology), rather than unique IDs from 1-500.Solution
The simplest solution from my point of view is to salt the molecule ID hash in
_write_atoms()
with the molecule ID to ensure distinct molecules with identical CTABs are hashed as such. This would look something like (interop/lammps/export/export.py, lines 296-300):I've tested on my machine (via manual source code edits in a conda env) that this circumvents the non-uniqueness issue, and have put together a simple PR to do exactly this.
Software versions
The text was updated successfully, but these errors were encountered: