Enable Metal, Facet, site and terrace specification in Molecule #2452

mjohnson541 · 2023-06-03T00:51:30Z

Also adds a "Reaction" bond order for creating bonds between atoms participating in a reaction so the structure can be considered as a whole without specific labels and enables sites to be bonded to each other.

This enables creation of molecules from adjacency lists such as:

    multiplicity 1
    metal Cu
    facet 111
    1 *3 X u0 p0 c0 s"atop" m"terrace"  {2,S} {4,S}
    2 *1 O u0 p2 c0 {1,S} {3,R}
    3 *2 H u0 p0 c0 {2,R} {4,R}
    4 *4 X u0 p0 c0 s"hcp" m"terrace" {3,R} {1,S}

codecov · 2023-06-03T19:32:56Z

Codecov Report

Merging #2452 (c66f580) into main (7234fb6) will increase coverage by 0.11%.
The diff coverage is 87.96%.

@@            Coverage Diff             @@
##             main    #2452      +/-   ##
==========================================
+ Coverage   48.15%   48.26%   +0.11%     
==========================================
  Files         110      110              
  Lines       30627    30726      +99     
  Branches     7989     8032      +43     
==========================================
+ Hits        14748    14830      +82     
- Misses      14350    14365      +15     
- Partials     1529     1531       +2

Impacted Files	Coverage Δ
rmgpy/molecule/adjlist.py	`76.88% <87.37%> (+1.99%)`	⬆️
rmgpy/molecule/fragment.py	`73.61% <100.00%> (+0.10%)`	⬆️

... and 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

rwest

I made some comments and edits.

My main concern is lack of unit tests and documentation.

rwest · 2023-06-03T22:59:11Z

rmgpy/molecule/adjlist.py

+
+        if line.split()[0] == 'metal':
+            if group and "[" in line:
+                out = line.split('[')[1][:-1]


I would have used a regex, and perhaps with more error handling. But my main concern is this sort of thing (splits, selecting by index, slicing) is a devil to review for correctness by reading the code (I got it wrong the first 4 times I drafted this comment), especially without any comments saying what its goal is. Can we have some unit tests? (codecov says it's not tested) and should it be documented?

Okay, added regex checking and error handling. Added unittests and I've added it to the adjacency list documentation.

rwest · 2023-06-05T14:36:49Z

rmgpy/molecule/adjlist.py

+        if line.split()[0] == 'metal':
+            if group and "[" in line:
+                out = line.split('[')[1][:-1]
+                metal = out.split(',')


Does this mean it must be [Cu,Sn,Ag] and not [Cu, Sn, Ag]?

Just noticed we have a check above in mistake1, but that only checks for spaces inside {} and not [] (despite the example in the comment on line 541)

We now loop through the split and strip all of the items so any whitespace there should be fine.

rmgpy/molecule/adjlist.py

rmgpy/molecule/group.py

rwest · 2023-06-05T15:18:41Z

rmgpy/molecule/molecule.py

-    def __init__(self, element=None, radical_electrons=0, charge=0, label='', lone_pairs=-100, coords=np.array([]),
-                 id=-1, props=None):
+    def __init__(self, element=None, radical_electrons=0, charge=0, label='', lone_pairs=-100, site='', morphology='', 
+                 coords=np.array([]), id=-1, props=None):


just noticed the coords=np.array([]). I know it's not being changed in this PR, but it did catch my eye. Isn't having a mutable (like an array) default argument in the def statement a classic code smell? (eg.). Do we have a good reason for it?
Update: I guess a numpy array can't be appended to, and if it's starting empty then it is therefore immutable after all. And by keeping it always an array type, (never a NoneType) we're perhaps better able to cythonize. So you can ignore this comment.

rmgpy/molecule/molecule.py

hwpang · 2023-06-06T19:00:12Z

rmgpy/molecule/group.py

+        """
+        if wildcards:
+            for order in self.order:
+                if abs(order) <= 1e-9:


Why "1e-9"? Can you add explanation in the docstring or comments?

I think this was part of an existing bug that I copied over where the H bonds used to be 0 value, but then they were changed to 0.1 and check was still left against order instead of (order - 0.1). I've fixed that issue here now for H and R bonds.

hwpang · 2023-06-06T19:01:13Z

rmgpy/molecule/molecule.py

@@ -873,7 +884,7 @@ def get_bond_string(self):
        the atom labels in alphabetical order (i.e. 'C-H' is possible but not 'H-C')
        :return: str
        """
-        bond_symbol_mapping = {0.1: '~', 1: '-', 1.5: ':', 2: '=', 3: '#'}
+        bond_symbol_mapping = {0.05: '~', 0.1: '~', 1: '-', 1.5: ':', 2: '=', 3: '#'}


What does 0.1 stand for currently? Is it intended for 0.05 reaction bond to share the same symbol as the 0.1 bond?

0.1 is an H bond, 0 is vdW bond, now 0.05 is R bond. Yes that's intentional.

hwpang · 2023-06-06T19:01:56Z

rmgpy/molecule/adjlist.py

@@ -524,11 +524,17 @@ def from_adjacency_list(adjlist, group=False, saturate_h=False, check_consistenc
            multiplicity = int(line.split()[1])
        if len(lines) == 0:
            raise InvalidAdjacencyListError('No atoms specified in adjacency list: \n{0}'.format(adjlist))
-
+        


Can you run an auto-formatter to remove all these unnecessary blanks?

I think it's inappropriate to run the auto-formatter on RMG files that are commonly modified, because I think that will create very large nasty merge conflicts if anyone modified the file and tries to rebase?

There are tools like "darker" (that runs "black" only on lines of code that you modified). Maybe there are others. Anyway, I agree please don't run something wholesale across the whole file, especially during a topic pull request. But we can try to avoid making it worse :)

Besides what @rwest mentioned, if you use VSCode, there are options where you can choose only run auto-formatter on the lines you modified as well!

hwpang · 2023-06-06T19:02:04Z

rmgpy/molecule/adjlist.py

    # Iterate over the remaining lines, generating Atom or GroupAtom objects
    for line in lines:
-
+            


unnecessary blanks

hwpang · 2023-06-06T19:09:54Z

rmgpy/molecule/atomtype.py

@@ -249,8 +249,8 @@ def get_features(self):
                             benzene=[0], lone_pairs=[0])
 # Occupied surface site:
 ATOMTYPES['Xo']   = AtomType('Xo', generic=['X'], specific=[],
-                             single=[0, 1], all_double=[0, 1], r_double=[], o_double=[], s_double=[], triple=[0, 1],
-                             quadruple=[0, 1], benzene=[0], lone_pairs=[0])
+                             single=[0, 1, 2, 3, 4, 5, 6, 7, 8], all_double=[0, 1, 2, 3, 4, 5, 6, 7, 8], r_double=[], o_double=[], s_double=[], triple=[0, 1, 2, 3, 4, 5, 6, 7, 8],


What does this mean? Do we allow up to 8 single bonds on surface site or something? Can you add comments to this change

Yes, this makes it possible for the surface site to have up to 8 bonds...the intention is that you can represent local surface structure (what sites neighbor other sites) which affects adsorbate energies and reaction rates on surfaces.

hwpang

Left some questions and comments.

hwpang · 2023-06-06T19:13:21Z

rmgpy/molecule/adjlist.py

@@ -772,7 +836,7 @@ def from_adjacency_list(adjlist, group=False, saturate_h=False):
            Saturator.saturate(atoms)

    # Consistency checks
-    if not group:
+    if not group and check_consistency:


Why adding the option to skip consistency check?

If you allow consistency checking for a molecule with reaction bonds, it will get upset because it has bonds RMG doesn't expect it to have, it naturally won't match RMG atomtypes properly.

I might merge this PR anyway, but should we instead (or as well) make the consistency checkers cope with reaction bonds? Currently you could create an adjacancy list with reaction bonds AND other inconsistencies, and not notice the other inconsistencies.

This is true, it would be good to at least consistency check parts of the Molecule that don't have reaction bonds.

I'll let you open a new issue (or mental note) to fix this :)

hwpang · 2023-06-07T14:28:28Z

I assume the final goal is to enable RMG to generate mechanisms with metal, facet, site, and terrace specification? Can one use these specifications to generate a RMG model at this point, or is that intended for another PR? If the former, can you add a regression test?

rmgpy/molecule/molecule.py

mjohnson541 · 2023-06-08T00:07:36Z

I assume the final goal is to enable RMG to generate mechanisms with metal, facet, site, and terrace specification? Can one use these specifications to generate a RMG model at this point, or is that intended for another PR? If the former, can you add a regression test?

Not right now on my end at least. My intention for now is to make it possible to generate good 2D representations for SIDT of surface configurations.

rwest

Looking good. (I mentioned a few minor questions)

rmgpy/molecule/adjlist.py

rmgpy/molecule/group.py

…up objects consistency check

reaction bonds

mjohnson541 · 2023-06-08T23:22:33Z

This passes all unit, functional and database tests locally and no regression changes are expected. Should be ready. If we're comfortable merging this without CI that would be appreciated, but is not absolutely necessary.

rwest · 2023-06-09T13:30:27Z

I restarted the CI and all the unit tests etc ran, but quite a few of the regression tests have different core and edges -- is this to be expected by random fluctuations? what should we do? tests we are in the habit of ignoring are not helpful.

aromatics Failed Edge Comparison
liquid_oxidation Failed Core Comparison
liquid_oxidation Failed Edge Comparison
sulfur Failed Edge Comparison
RMS_CSTR_liquid_oxidation Failed Edge Comparison

See
https://github.com/ReactionMechanismGenerator/RMG-Py/actions/runs/5216023343?pr=2452

(Unfortunately there seems to be a bug causing a crash when comparing RMS_liquidSurface_ch4o2cat
see #2446 (comment) )

mjohnson541 · 2023-06-09T16:49:01Z

It looks like it doesn't print the differences anymore so we can't tell what's changed?

mjohnson541 · 2023-06-09T16:59:42Z

The only way I can currently imagine this changing anything would be if it broke isomorphism comparison, but that would change nearly every rate coefficient from families, which should break all or nearly all regression tests on both core and edge.

rwest · 2023-06-09T18:20:38Z

Comparing the model edge for the liquid_oxidation

Non-identical kinetics!
original:
rxn: CCCCCO[O](104) + CC(CC(C)OO)O[O](103) <=> oxygen(1) + CCCCC[O](128) + CC([O])CC(C)OO(127)		origin: Peroxyl_Disproportionation
tested:
rxn: CCCCCO[O](104) + CC(CC(C)OO)O[O](103) <=> oxygen(1) + CCCCC[O](127) + CC([O])CC(C)OO(129)		origin: Peroxyl_Disproportionation
k(1bar)|300K   |400K   |500K   |600K   |800K   |1000K  |1500K  |2000K  

k(T):  |   3.52|   4.27|   4.71|   5.01|   5.39|   5.61|   5.91|   6.06
k(T):  |   7.79|   7.46|   7.21|   7.00|   6.67|   6.41|   5.94|   5.60

Kinetics: Arrhenius(A=(3.2e+12,'cm^3/(mol*s)'), n=0, Ea=(4.096,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R""")
Kinetics: Arrhenius(A=(3.18266e+20,'cm^3/(mol*s)'), n=-2.694, Ea=(0.053,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing Ea raised from 0.0 to 0.2 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing Ea raised from 0.0 to 0.2 kJ/mol to match endothermicity of reaction.

Perhaps an instance of something like #2010 ?

This may be related #1027 but is 6 years old and quite likely outdated.

We need to decide what counts as a regression worth failing the tests for!

mjohnson541 · 2023-06-09T18:36:21Z

Those two nodes are parent and child and look like this:

I'm wondering a little bit if this is something related to how the atoms are labeled in that family...in R_Recombination we handle the fact that the radicals aren't unique by using the same label for both radicals...but here the -O[O]'s are labeled differently so I could imagine the prediction is dependent on where the isomorphism (that will match both ways) puts the labels. (I.E. if it puts *1 and *2 on the branched species and *3 and *4 on the other it will match the less specific group while if it puts *1 and *2 on the non-branched species it will match the more specific group.

stale

mjohnson541 · 2023-06-09T18:39:28Z

Regardless I don't think this is related to this PR.

rwest · 2023-06-09T18:40:06Z

Merged, since I think all Hao-Wei's comments have been addressed, as well as mine.

We might revisit the skipping of consistency checks when creating things from adjacency lists.
see Enable Metal, Facet, site and terrace specification in Molecule #2452 (comment)
The regression test failures seem to be unrelated and random non-deterministic errors,
that we should look into and address as a separate issue

rwest · 2023-06-22T17:40:11Z

The followup points mentioned when I closed this now have their own issues:
#2465 enabling consistency checks
#2490 random kinetics

mjohnson541 force-pushed the site_specific branch 2 times, most recently from 2a4c236 to a813ff0 Compare June 3, 2023 17:23

mjohnson541 requested review from rwest and hwpang June 3, 2023 20:02

rwest requested changes Jun 5, 2023

View reviewed changes

hwpang reviewed Jun 6, 2023

View reviewed changes

hwpang previously requested changes Jun 6, 2023

View reviewed changes

hwpang reviewed Jun 7, 2023

View reviewed changes

rmgpy/molecule/molecule.py Outdated Show resolved Hide resolved

mjohnson541 force-pushed the site_specific branch from a813ff0 to d52a2b6 Compare June 8, 2023 01:27

rwest reviewed Jun 8, 2023

View reviewed changes

rmgpy/molecule/adjlist.py Outdated Show resolved Hide resolved

rmgpy/molecule/group.py Show resolved Hide resolved

mjohnson541 force-pushed the site_specific branch from d52a2b6 to b43d423 Compare June 8, 2023 20:15

mjohnson541 added 10 commits June 8, 2023 14:48

allow skipping of consistency checks when generating Molecule and Gro…

8e09538

…up objects consistency check

add reaction 'R' bonds

3e2c2cc

reaction bonds

add metal and facet attributes to Molecule and Group

05709b3

add site and morphology attributes to Group and Molecule

75ad284

allow read and write of adjlists with site and morphology

c5d2138

add site and morphology to fragment to make compatible with adjlist

db32c59

enable sites to be bonded to each other to represent neighbors

9825706

set metal and facet for fragment

b9e1eb7

add test for metal, facet, site and morphology specifications

c252dc3

add metal, facet, site and morphology to adjacency list documentation

c66f580

mjohnson541 force-pushed the site_specific branch from b43d423 to c66f580 Compare June 8, 2023 21:48

rwest approved these changes Jun 9, 2023

View reviewed changes

rwest merged commit e7261a8 into main Jun 9, 2023

rwest deleted the site_specific branch June 9, 2023 18:39

rwest mentioned this pull request Jun 22, 2023

Consistent Non-deterministic Behavior in Thermo (polycylic) and Kinetics (Peroxyl_Disproportionation) #2490

Open

		# Iterate over the remaining lines, generating Atom or GroupAtom objects
		for line in lines:

Enable Metal, Facet, site and terrace specification in Molecule #2452

Enable Metal, Facet, site and terrace specification in Molecule #2452

Conversation

mjohnson541 commented Jun 3, 2023 • edited Loading

codecov bot commented Jun 3, 2023 • edited Loading

Codecov Report

rwest left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hwpang left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hwpang commented Jun 7, 2023

mjohnson541 commented Jun 8, 2023

rwest left a comment

Choose a reason for hiding this comment

mjohnson541 commented Jun 8, 2023

rwest commented Jun 9, 2023

mjohnson541 commented Jun 9, 2023

mjohnson541 commented Jun 9, 2023

rwest commented Jun 9, 2023

mjohnson541 commented Jun 9, 2023

mjohnson541 commented Jun 9, 2023

rwest commented Jun 9, 2023

rwest commented Jun 22, 2023

mjohnson541 commented Jun 3, 2023 •

edited

Loading

codecov bot commented Jun 3, 2023 •

edited

Loading