LAMMPS atom writer makes assigning unique IDs to chemically-identical molecules impossible #998

timbernat · 2024-07-01T21:01:18Z

Description

I am trying to use Interchange to generate LAMMPS input files for some large polymer systems, which consist of many repeated-and-translated copies of oligomers, plasticizers, and solvent water molecules. I've found that in every case, a unique molecule ID in the atoms table section of the resulting LAMMPS file is only generated for each distinct Molecule chemistry, rather than for each individual Molecule in the Topology.

Some further digging reveals that the culprit is a speed-up update to the LAMMPS valence writers, in particular the _write_atoms() function, which generates molecule IDs from a chemical table hash of each molecule, allegedly to bypass an extra toolkit wrapper conversion step. This molecule CTAB hash, in turn, is only based on basic chemical info of the atoms and bonds in the Molecule object (namely atom symbols, formal charges, stereo, and bond orders).

The consequence of this is that any distinct Molecules with identical chemical tables in the same Topology will unavoidably assigned the same molecule ID in the atoms table when writing to LAMMPS

Reproduction

This turns out to affect even really basic systems, such as a box filled with nothing but water.
To illustrate, one could take the waterbox.pdb file found here and execute the following code:

from pathlib import Path
from openff.toolkit import Topology, ForceField


waterbox_path = Path('waterbox.pdb')
watertop = Topology.from_pdb(waterbox_path)

ff = ForceField('tip3p.offxml')
winc = ff.create_interchange(watertop)
winc.to_lammps('water.lammps')

Output

All atoms in the resulting water.lammps file will be labelled with molecule ID 500 (the number of unique water molecules in the Topology), rather than unique IDs from 1-500.

Solution
The simplest solution from my point of view is to salt the molecule ID hash in _write_atoms() with the molecule ID to ensure distinct molecules with identical CTABs are hashed as such. This would look something like (interop/lammps/export/export.py, lines 296-300):

    for molecule_index, molecule in enumerate(interchange.topology.molecules):
        molecule_hash = hash(f'{molecule_index}_{molecule.ordered_connection_table_hash()}') # salt with mol ID
        molecule_map[molecule_hash] = molecule_index
        for atom in molecule.atoms:
            atom_molecule_map[atom] = molecule_hash

I've tested on my machine (via manual source code edits in a conda env) that this circumvents the non-uniqueness issue, and have put together a simple PR to do exactly this.
Software versions

OS: running on Linux Ubuntu 20.04.6 LTS
Installed OpenFF Toolkit (0.15.2) and Interchange (0.3.23) via mamba, though the offending portions of code are still present in the most recent versions on GitHub.

The text was updated successfully, but these errors were encountered:

PEFrankel · 2024-07-01T21:22:26Z

I was just working on this very issue and this seems to prove it's not isolated since it persists within interchange.to_gromacs as well. A broader solution may be necessary to fix this (i.e. somewhere in interop or just copied in each export).

j-wags · 2024-07-02T20:29:11Z

Thanks @timbernat for the great writeup and quick PR. Matt is on a much-needed vacation right now so I'll help polish up the PR but his final review won't come until next week :-)

mattwthompson · 2024-07-16T18:21:05Z

We're pretty sure this is fixed in #1000, which I expect to include in 0.3.28

timbernat mentioned this issue Jul 1, 2024

Fix to non-unique LAMMPS molecule IDs #999

Closed

3 tasks

mattwthompson added the lammps Relating to LAMMPS label Jul 16, 2024

mattwthompson closed this as completed Jul 16, 2024

mattwthompson mentioned this issue Jul 16, 2024

LAMMPS improvements trunk #1011

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LAMMPS atom writer makes assigning unique IDs to chemically-identical molecules impossible #998

LAMMPS atom writer makes assigning unique IDs to chemically-identical molecules impossible #998

timbernat commented Jul 1, 2024 •

edited

Loading

PEFrankel commented Jul 1, 2024 •

edited

Loading

j-wags commented Jul 2, 2024

mattwthompson commented Jul 16, 2024

LAMMPS atom writer makes assigning unique IDs to chemically-identical molecules impossible #998

LAMMPS atom writer makes assigning unique IDs to chemically-identical molecules impossible #998

Comments

timbernat commented Jul 1, 2024 • edited Loading

PEFrankel commented Jul 1, 2024 • edited Loading

j-wags commented Jul 2, 2024

mattwthompson commented Jul 16, 2024

timbernat commented Jul 1, 2024 •

edited

Loading

PEFrankel commented Jul 1, 2024 •

edited

Loading