Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Metal, Facet, site and terrace specification in Molecule #2452

Merged
merged 10 commits into from
Jun 9, 2023

Conversation

mjohnson541
Copy link
Contributor

@mjohnson541 mjohnson541 commented Jun 3, 2023

Also adds a "Reaction" bond order for creating bonds between atoms participating in a reaction so the structure can be considered as a whole without specific labels and enables sites to be bonded to each other.

This enables creation of molecules from adjacency lists such as:

    multiplicity 1
    metal Cu
    facet 111
    1 *3 X u0 p0 c0 s"atop" m"terrace"  {2,S} {4,S}
    2 *1 O u0 p2 c0 {1,S} {3,R}
    3 *2 H u0 p0 c0 {2,R} {4,R}
    4 *4 X u0 p0 c0 s"hcp" m"terrace" {3,R} {1,S}

@mjohnson541 mjohnson541 force-pushed the site_specific branch 2 times, most recently from 2a4c236 to a813ff0 Compare June 3, 2023 17:23
@codecov
Copy link

codecov bot commented Jun 3, 2023

Codecov Report

Merging #2452 (c66f580) into main (7234fb6) will increase coverage by 0.11%.
The diff coverage is 87.96%.

@@            Coverage Diff             @@
##             main    #2452      +/-   ##
==========================================
+ Coverage   48.15%   48.26%   +0.11%     
==========================================
  Files         110      110              
  Lines       30627    30726      +99     
  Branches     7989     8032      +43     
==========================================
+ Hits        14748    14830      +82     
- Misses      14350    14365      +15     
- Partials     1529     1531       +2     
Impacted Files Coverage Δ
rmgpy/molecule/adjlist.py 76.88% <87.37%> (+1.99%) ⬆️
rmgpy/molecule/fragment.py 73.61% <100.00%> (+0.10%) ⬆️

... and 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mjohnson541 mjohnson541 requested review from rwest and hwpang June 3, 2023 20:02
Copy link
Member

@rwest rwest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some comments and edits.

My main concern is lack of unit tests and documentation.


if line.split()[0] == 'metal':
if group and "[" in line:
out = line.split('[')[1][:-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have used a regex, and perhaps with more error handling. But my main concern is this sort of thing (splits, selecting by index, slicing) is a devil to review for correctness by reading the code (I got it wrong the first 4 times I drafted this comment), especially without any comments saying what its goal is. Can we have some unit tests? (codecov says it's not tested) and should it be documented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, added regex checking and error handling. Added unittests and I've added it to the adjacency list documentation.

if line.split()[0] == 'metal':
if group and "[" in line:
out = line.split('[')[1][:-1]
metal = out.split(',')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean it must be [Cu,Sn,Ag] and not [Cu, Sn, Ag]?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed we have a check above in mistake1, but that only checks for spaces inside {} and not [] (despite the example in the comment on line 541)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now loop through the split and strip all of the items so any whitespace there should be fine.

rmgpy/molecule/adjlist.py Outdated Show resolved Hide resolved
rmgpy/molecule/adjlist.py Outdated Show resolved Hide resolved
rmgpy/molecule/adjlist.py Outdated Show resolved Hide resolved
rmgpy/molecule/group.py Outdated Show resolved Hide resolved
rmgpy/molecule/group.py Outdated Show resolved Hide resolved
rmgpy/molecule/group.py Outdated Show resolved Hide resolved
def __init__(self, element=None, radical_electrons=0, charge=0, label='', lone_pairs=-100, coords=np.array([]),
id=-1, props=None):
def __init__(self, element=None, radical_electrons=0, charge=0, label='', lone_pairs=-100, site='', morphology='',
coords=np.array([]), id=-1, props=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noticed the coords=np.array([]). I know it's not being changed in this PR, but it did catch my eye. Isn't having a mutable (like an array) default argument in the def statement a classic code smell? (eg.). Do we have a good reason for it?
Update: I guess a numpy array can't be appended to, and if it's starting empty then it is therefore immutable after all. And by keeping it always an array type, (never a NoneType) we're perhaps better able to cythonize. So you can ignore this comment.

rmgpy/molecule/molecule.py Outdated Show resolved Hide resolved
"""
if wildcards:
for order in self.order:
if abs(order) <= 1e-9:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "1e-9"? Can you add explanation in the docstring or comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was part of an existing bug that I copied over where the H bonds used to be 0 value, but then they were changed to 0.1 and check was still left against order instead of (order - 0.1). I've fixed that issue here now for H and R bonds.

@@ -873,7 +884,7 @@ def get_bond_string(self):
the atom labels in alphabetical order (i.e. 'C-H' is possible but not 'H-C')
:return: str
"""
bond_symbol_mapping = {0.1: '~', 1: '-', 1.5: ':', 2: '=', 3: '#'}
bond_symbol_mapping = {0.05: '~', 0.1: '~', 1: '-', 1.5: ':', 2: '=', 3: '#'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does 0.1 stand for currently? Is it intended for 0.05 reaction bond to share the same symbol as the 0.1 bond?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.1 is an H bond, 0 is vdW bond, now 0.05 is R bond. Yes that's intentional.

@@ -524,11 +524,17 @@ def from_adjacency_list(adjlist, group=False, saturate_h=False, check_consistenc
multiplicity = int(line.split()[1])
if len(lines) == 0:
raise InvalidAdjacencyListError('No atoms specified in adjacency list: \n{0}'.format(adjlist))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run an auto-formatter to remove all these unnecessary blanks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's inappropriate to run the auto-formatter on RMG files that are commonly modified, because I think that will create very large nasty merge conflicts if anyone modified the file and tries to rebase?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are tools like "darker" (that runs "black" only on lines of code that you modified). Maybe there are others. Anyway, I agree please don't run something wholesale across the whole file, especially during a topic pull request. But we can try to avoid making it worse :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides what @rwest mentioned, if you use VSCode, there are options where you can choose only run auto-formatter on the lines you modified as well!

# Iterate over the remaining lines, generating Atom or GroupAtom objects
for line in lines:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary blanks

@@ -249,8 +249,8 @@ def get_features(self):
benzene=[0], lone_pairs=[0])
# Occupied surface site:
ATOMTYPES['Xo'] = AtomType('Xo', generic=['X'], specific=[],
single=[0, 1], all_double=[0, 1], r_double=[], o_double=[], s_double=[], triple=[0, 1],
quadruple=[0, 1], benzene=[0], lone_pairs=[0])
single=[0, 1, 2, 3, 4, 5, 6, 7, 8], all_double=[0, 1, 2, 3, 4, 5, 6, 7, 8], r_double=[], o_double=[], s_double=[], triple=[0, 1, 2, 3, 4, 5, 6, 7, 8],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? Do we allow up to 8 single bonds on surface site or something? Can you add comments to this change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this makes it possible for the surface site to have up to 8 bonds...the intention is that you can represent local surface structure (what sites neighbor other sites) which affects adsorbate energies and reaction rates on surfaces.

hwpang
hwpang previously requested changes Jun 6, 2023
Copy link
Contributor

@hwpang hwpang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some questions and comments.

@@ -772,7 +836,7 @@ def from_adjacency_list(adjlist, group=False, saturate_h=False):
Saturator.saturate(atoms)

# Consistency checks
if not group:
if not group and check_consistency:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why adding the option to skip consistency check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you allow consistency checking for a molecule with reaction bonds, it will get upset because it has bonds RMG doesn't expect it to have, it naturally won't match RMG atomtypes properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might merge this PR anyway, but should we instead (or as well) make the consistency checkers cope with reaction bonds? Currently you could create an adjacancy list with reaction bonds AND other inconsistencies, and not notice the other inconsistencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true, it would be good to at least consistency check parts of the Molecule that don't have reaction bonds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let you open a new issue (or mental note) to fix this :)

@hwpang
Copy link
Contributor

hwpang commented Jun 7, 2023

I assume the final goal is to enable RMG to generate mechanisms with metal, facet, site, and terrace specification? Can one use these specifications to generate a RMG model at this point, or is that intended for another PR? If the former, can you add a regression test?

rmgpy/molecule/molecule.py Outdated Show resolved Hide resolved
@mjohnson541
Copy link
Contributor Author

I assume the final goal is to enable RMG to generate mechanisms with metal, facet, site, and terrace specification? Can one use these specifications to generate a RMG model at this point, or is that intended for another PR? If the former, can you add a regression test?

Not right now on my end at least. My intention for now is to make it possible to generate good 2D representations for SIDT of surface configurations.

Copy link
Member

@rwest rwest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. (I mentioned a few minor questions)

rmgpy/molecule/adjlist.py Outdated Show resolved Hide resolved
rmgpy/molecule/group.py Show resolved Hide resolved
@mjohnson541
Copy link
Contributor Author

This passes all unit, functional and database tests locally and no regression changes are expected. Should be ready. If we're comfortable merging this without CI that would be appreciated, but is not absolutely necessary.

@rwest
Copy link
Member

rwest commented Jun 9, 2023

I restarted the CI and all the unit tests etc ran, but quite a few of the regression tests have different core and edges -- is this to be expected by random fluctuations? what should we do? tests we are in the habit of ignoring are not helpful.

  • aromatics Failed Edge Comparison
  • liquid_oxidation Failed Core Comparison
  • liquid_oxidation Failed Edge Comparison
  • sulfur Failed Edge Comparison
  • RMS_CSTR_liquid_oxidation Failed Edge Comparison

See
https://github.com/ReactionMechanismGenerator/RMG-Py/actions/runs/5216023343?pr=2452

(Unfortunately there seems to be a bug causing a crash when comparing RMS_liquidSurface_ch4o2cat
see #2446 (comment) )

@mjohnson541
Copy link
Contributor Author

It looks like it doesn't print the differences anymore so we can't tell what's changed?

@mjohnson541
Copy link
Contributor Author

The only way I can currently imagine this changing anything would be if it broke isomorphism comparison, but that would change nearly every rate coefficient from families, which should break all or nearly all regression tests on both core and edge.

@rwest
Copy link
Member

rwest commented Jun 9, 2023

Comparing the model edge for the liquid_oxidation

Screenshot 2023-06-09 at 2 13 34 PM
Non-identical kinetics!
original:
rxn: CCCCCO[O](104) + CC(CC(C)OO)O[O](103) <=> oxygen(1) + CCCCC[O](128) + CC([O])CC(C)OO(127)		origin: Peroxyl_Disproportionation
tested:
rxn: CCCCCO[O](104) + CC(CC(C)OO)O[O](103) <=> oxygen(1) + CCCCC[O](127) + CC([O])CC(C)OO(129)		origin: Peroxyl_Disproportionation
k(1bar)|300K   |400K   |500K   |600K   |800K   |1000K  |1500K  |2000K  

k(T):  |   3.52|   4.27|   4.71|   5.01|   5.39|   5.61|   5.91|   6.06
k(T):  |   7.79|   7.46|   7.21|   7.00|   6.67|   6.41|   5.94|   5.60

Kinetics: Arrhenius(A=(3.2e+12,'cm^3/(mol*s)'), n=0, Ea=(4.096,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R""")
Kinetics: Arrhenius(A=(3.18266e+20,'cm^3/(mol*s)'), n=-2.694, Ea=(0.053,'kcal/mol'), T0=(1,'K'), comment="""Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing Ea raised from 0.0 to 0.2 kJ/mol to match endothermicity of reaction.""")
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing_Ext-5R-R
kinetics: Estimated from node Root_Ext-5R-R_7R!H->C_N-7C-inRing Ea raised from 0.0 to 0.2 kJ/mol to match endothermicity of reaction.

Perhaps an instance of something like #2010 ?

This may be related #1027 but is 6 years old and quite likely outdated.

We need to decide what counts as a regression worth failing the tests for!

@mjohnson541
Copy link
Contributor Author

Those two nodes are parent and child and look like this:
image
image
I'm wondering a little bit if this is something related to how the atoms are labeled in that family...in R_Recombination we handle the fact that the radicals aren't unique by using the same label for both radicals...but here the -O[O]'s are labeled differently so I could imagine the prediction is dependent on where the isomorphism (that will match both ways) puts the labels. (I.E. if it puts *1 and *2 on the branched species and *3 and *4 on the other it will match the less specific group while if it puts *1 and *2 on the non-branched species it will match the more specific group.

@mjohnson541
Copy link
Contributor Author

Regardless I don't think this is related to this PR.

@rwest rwest merged commit e7261a8 into main Jun 9, 2023
@rwest rwest deleted the site_specific branch June 9, 2023 18:39
@rwest
Copy link
Member

rwest commented Jun 9, 2023

Merged, since I think all Hao-Wei's comments have been addressed, as well as mine.

@rwest
Copy link
Member

rwest commented Jun 22, 2023

The followup points mentioned when I closed this now have their own issues:
#2465 enabling consistency checks
#2490 random kinetics

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants