
Next Generation RMG Ideas

Mark Goldman edited this page Jul 16, 2019 · 6 revisions

Introduction

The purpose of this page is to document ideas for the next generation of RMG software. These include ideas that could be implemented in the current version of RMG as well as innovative ideas that would be hard to implement without a complete rewrite. The ideas on this page could be useful for new students trying to figure out how they can improve RMG, or for the next jwallen who wants to rewrite RMG from scratch.

Ideas

Input/Output

Custom RMG mechanism format

Design a custom file format for inputting and outputting mechanisms, to replace reaction/thermo libraries, Chemkin files, and Cantera files. The requirements and goals of such a format would include the following:

  • Move away from using Python syntax and loading via an exec-type command.
  • Maintain reasonable human readability
  • Include RMG generated metadata in a way that can be easily read by RMG
  • Fulfill the following roles:
    • Reaction and thermo libraries
    • Seed mechanisms
    • Restart files
    • Input for RMG-based post-processing, including
      • Uncertainty/sensitivity analysis
      • Mechanism reduction
      • Conversion to other file formats such as Chemkin or Cantera

Robust metadata storage would be a key feature of the new format. In all of the current formats, metadata storage is extremely fragile because it relies on writing and parsing comments.
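As a minimal sketch of the first and third goals, a mechanism could be stored as plain data and read with a parser instead of exec, with metadata kept in ordinary fields rather than comments. JSON is used here only because it ships with Python; YAML or a custom grammar would serve equally well. The schema (keys such as "species", "reactions", "metadata", "source") and the helper names load_mechanism and dump_mechanism are hypothetical, and the rate numbers are placeholders, not real kinetics.

```python
import json

# Hypothetical schema. Numbers are placeholders, not real kinetics.
MECHANISM_TEXT = """
{
  "species": [
    {"label": "CH3", "smiles": "[CH3]", "metadata": {"source": "thermo library"}},
    {"label": "C2H6", "smiles": "CC", "metadata": {"source": "group additivity"}}
  ],
  "reactions": [
    {
      "equation": "CH3 + CH3 <=> C2H6",
      "arrhenius": {"A": 1.0, "n": 0.0, "Ea": 0.0},
      "metadata": {"source": "rate rule (placeholder)"}
    }
  ]
}
"""


def load_mechanism(text):
    """Parse plain data (no exec) and check that every reaction uses defined species."""
    data = json.loads(text)
    labels = {s["label"] for s in data["species"]}
    for rxn in data["reactions"]:
        # Crude tokenizer: assumes labels never contain '+' or whitespace.
        tokens = rxn["equation"].replace("<=>", " ").replace("+", " ").split()
        for token in tokens:
            if token not in labels:
                raise ValueError(f"undefined species {token!r} in {rxn['equation']!r}")
    return data


def dump_mechanism(data):
    """Write the mechanism back out; metadata are ordinary fields, not comments."""
    return json.dumps(data, indent=2, sort_keys=True)


mech = load_mechanism(MECHANISM_TEXT)
# Metadata survive a write/read round trip unchanged.
assert load_mechanism(dump_mechanism(mech)) == mech
```

Because the loader only parses data, a malformed or malicious file can at worst raise an error; it cannot execute code.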

Database organization

thermo

transport

frequency estimation

kinetics estimation methods

Kinetics is currently estimated from rate rules, including rate rules derived from training reactions. Ideally, we would be able to change how the estimation is conducted without losing the kinetics data. This would mean storing the kinetics data in a format that is semi-separate from the estimation technique but can still be used to derive the estimation parameters. Rate rules store specific groups that must exist in the current tree structure, which prevents switching to better estimation methods because the data is tied to the method. Training reactions are better, but they involve labeling specific atoms based on the reaction recipe (which may also change).
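One way to read this proposal: keep each rate as a method-independent record (reactants, products, fitted Arrhenius parameters) and let each estimation method derive its own parameters from those records on demand. The sketch below assumes this design; KineticsRecord, GroupAverageEstimator, and the key functions are hypothetical names, the species labels are abstract, and the numbers are illustrative only. Averaging per key stands in for a rate-rule tree; the point is that swapping the key function (the "method") never touches the stored data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class KineticsRecord:
    """Method-independent kinetics datum (hypothetical fields)."""
    reactants: tuple  # species identifiers, e.g. SMILES strings
    products: tuple
    log10_A: float    # log10 of the pre-exponential factor
    n: float
    Ea: float         # activation energy, kJ/mol


class GroupAverageEstimator:
    """Derives averaged parameters per key; a stand-in for a rate-rule tree."""

    def __init__(self, key_fn):
        self.key_fn = key_fn  # maps reactants to a group key: this is the "method"
        self.table = {}

    def fit(self, records):
        buckets = defaultdict(list)
        for r in records:
            buckets[self.key_fn(r.reactants)].append(r)
        self.table = {
            key: (mean(r.log10_A for r in rs), mean(r.n for r in rs), mean(r.Ea for r in rs))
            for key, rs in buckets.items()
        }
        return self

    def estimate(self, reactants):
        """Return (log10_A, n, Ea) for the matching key, or None if no data applies."""
        return self.table.get(self.key_fn(reactants))


# Illustrative records; the same data feed two different estimation methods.
records = [
    KineticsRecord(("X", "OH"), ("Z",), 12.0, 0.0, 10.0),
    KineticsRecord(("Y", "OH"), ("W",), 13.0, 0.0, 20.0),
]
by_count = GroupAverageEstimator(lambda rs: len(rs)).fit(records)
by_first = GroupAverageEstimator(lambda rs: rs[0]).fit(records)
print(by_count.estimate(("Q", "OH")))  # (12.5, 0.0, 15.0)
print(by_first.estimate(("X", "H")))   # (12.0, 0.0, 10.0)
```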

Order of reaction generation and kinetics estimation

Currently in RMG, reaction rates are estimated after processNewReactions, which was designed to avoid estimating kinetics for reactions RMG has already generated. This structure might not be ideal, since estimating kinetics is not time-intensive and the fraction of reactions removed between generation and kinetics estimation might not be large. If these two assumptions hold (they have not been checked), the code should estimate kinetics during or right after reaction generation.

There is also an idea that reactions with minor branching ratios could be removed after the reaction generation step. This would be significantly easier to implement if kinetics were estimated right after reaction generation.
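If kinetics were estimated as soon as reactions are generated, pruning by branching ratio reduces to a filter over channels that share the same reactants. The sketch below assumes modified Arrhenius rates k = A T^n exp(-Ea/RT), a hypothetical 1% threshold, and a made-up tuple layout for reactions; prune_minor_channels is not an RMG function. Channels with the same reactants have the same molecularity, so their A values share units and the ratio is meaningful.

```python
import math
from collections import defaultdict

R = 8.314462618e-3  # gas constant, kJ/(mol*K)


def arrhenius(A, n, Ea, T):
    """Modified Arrhenius rate coefficient; Ea in kJ/mol, T in K."""
    return A * T**n * math.exp(-Ea / (R * T))


def prune_minor_channels(reactions, T, threshold=0.01):
    """Keep reactions whose branching ratio among same-reactant channels is >= threshold.

    Each reaction is (reactants, products, (A, n, Ea)); reactants is a tuple of labels.
    """
    channels = defaultdict(list)
    for rxn in reactions:
        reactants, _, (A, n, Ea) = rxn
        # Sort so that ("A", "B") and ("B", "A") land in the same channel group.
        channels[tuple(sorted(reactants))].append((rxn, arrhenius(A, n, Ea, T)))
    kept = []
    for group in channels.values():
        total = sum(k for _, k in group)
        if total == 0.0:  # every rate underflowed; keep the group rather than divide by zero
            kept.extend(rxn for rxn, _ in group)
            continue
        kept.extend(rxn for rxn, k in group if k / total >= threshold)
    return kept


# Illustrative numbers: at 1000 K the Ea = 100 kJ/mol channel of X carries ~0.2% of the flux.
rxns = [
    (("X",), ("P1",), (1e13, 0.0, 50.0)),
    (("X",), ("P2",), (1e13, 0.0, 100.0)),
    (("Y", "OH"), ("P3",), (1e12, 0.0, 10.0)),
]
print([products for _, products, _ in prune_minor_channels(rxns, T=1000.0)])
```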

Model enlargement

Ideally, the different methods for enlarging models should be modular. For example, a change to how the flux-based algorithm works should not cause an error when the user chooses a different enlargement method (such as including all first- and second-generation products). This would require a well-thought-out inheritance structure for model growth.

Currently there is one class for model enlargement and one class for the solver, and both are tightly coupled to the main algorithm in the RMG object.
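A minimal sketch of the modular structure described above, assuming an abstract base class that every growth method implements and a main loop that only talks to that interface. EdgeInfo, EnlargementStrategy, FluxStrategy, GenerationStrategy, and enlarge are hypothetical names; the flux values are assumed to come from a solver run that is not shown.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class EdgeInfo:
    flux: float      # normalized flux from the latest solver run (assumed input)
    generation: int  # 1 = first-generation product, 2 = second-generation, ...


class EnlargementStrategy(ABC):
    """Hypothetical interface that every model-growth method implements."""

    @abstractmethod
    def select(self, core, edge):
        """Return the set of edge species to move into the core."""


class FluxStrategy(EnlargementStrategy):
    def __init__(self, tolerance):
        self.tolerance = tolerance

    def select(self, core, edge):
        return {s for s, info in edge.items() if info.flux > self.tolerance}


class GenerationStrategy(EnlargementStrategy):
    def __init__(self, max_generation):
        self.max_generation = max_generation

    def select(self, core, edge):
        return {s for s, info in edge.items() if info.generation <= self.max_generation}


def enlarge(core, edge, strategy):
    """The main loop depends only on the interface, so strategies can change independently."""
    return core | strategy.select(core, edge)


edge = {"A": EdgeInfo(0.5, 1), "B": EdgeInfo(0.01, 2), "C": EdgeInfo(0.2, 3)}
print(sorted(enlarge({"X"}, edge, FluxStrategy(tolerance=0.1))))         # ['A', 'C', 'X']
print(sorted(enlarge({"X"}, edge, GenerationStrategy(max_generation=2))))  # ['A', 'B', 'X']
```

Because FluxStrategy and GenerationStrategy share only the abstract interface, editing one cannot break the other; the solver would likewise be passed in rather than hard-wired into the RMG object.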

Reaction generation

Thermo estimation

Kinetics estimation

RMG has an option for pressure-dependent kinetics estimation. This approach has not been documented in a paper, introduces kinetics errors whose magnitude cannot be determined, and follows a remarkably different flow from typical reaction generation. Bill Green mentioned that debugging it and writing a paper on it might be useful. Another option is to investigate whether converting a high-pressure-limit model to a pdep model after generation gives similar accuracy (with less headache). This still needs to be investigated.

Chemical structure representation

Semi-implicit electrons

Electron tracking is currently an important part of the representation, through resonance structures and the recipes used for reaction generation, but it also creates unnecessary difficulties. This idea would try to make RMG behave more like quantum chemistry software, in that electrons are not explicitly part of the molecular structure. The benefit would be reducing each chemical species to a single, unique representation; the challenge would be teaching RMG more about how electrons work.

Potential implementation:

  • Eliminate bond orders. Molecular graph would only indicate basic connectivity of atoms.
  • Introduce more detailed atom electron attributes, which would indicate electrons in pi bonds, electrons not in bonds (radicals), and lone pairs.
  • For reaction recipes, bond formation would entail decrementing one of the electron attributes, followed by updating the electron status of the other atoms in the molecule if necessary.
  • Aromaticity would then be handled as a separate flag for both atoms and bonds, based on whatever perception algorithm is used.
  • This representation could be converted to a standard representation by perceiving bonds based on the number of bonding electrons in adjacent atoms.
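The implementation points above can be sketched as follows, under several assumptions: hydrogens are implicit, the graph stores connectivity only, bond formation decrements the radical-electron count on both atoms, and bond orders are recovered by greedily pairing pi electrons across edges. Atom, Molecule, form_bond, and perceive_bond_orders are hypothetical names. The greedy pass is only illustrative; it handles simple chains but would not always find a valid assignment, where a real implementation would need a proper matching algorithm and a separate aromaticity perception step.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    element: str
    pi_electrons: int = 0       # electrons available for pi bonding
    radical_electrons: int = 0  # unpaired, non-bonding electrons
    lone_pairs: int = 0


@dataclass
class Molecule:
    atoms: list
    # Connectivity only: each edge is frozenset({i, j}); there are no bond orders.
    edges: set = field(default_factory=set)


def form_bond(mol, i, j):
    """Hypothetical recipe step: join two radical sites by decrementing their radical counts."""
    a, b = mol.atoms[i], mol.atoms[j]
    if a.radical_electrons < 1 or b.radical_electrons < 1:
        raise ValueError("both atoms need an unpaired electron")
    a.radical_electrons -= 1
    b.radical_electrons -= 1
    mol.edges.add(frozenset((i, j)))


def perceive_bond_orders(mol):
    """Greedily pair pi electrons across edges to recover conventional bond orders."""
    remaining = [atom.pi_electrons for atom in mol.atoms]
    orders = {}
    for i, j in sorted(tuple(sorted(edge)) for edge in mol.edges):
        extra = min(remaining[i], remaining[j])
        remaining[i] -= extra
        remaining[j] -= extra
        orders[(i, j)] = 1 + extra
    if any(remaining):
        raise ValueError("unpaired pi electrons left; greedy perception failed")
    return orders


# Two methyl radicals (heavy atoms only) recombine into ethane.
methyls = Molecule([Atom("C", radical_electrons=1), Atom("C", radical_electrons=1)])
form_bond(methyls, 0, 1)
print(perceive_bond_orders(methyls))  # {(0, 1): 1}
```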

Parallelization

Currently, RMG has a parallelization feature, but it actually slows the process down, most likely because of the cost of transferring memory between CPUs. Options for improving this include removing parallelization altogether, or implementing it so that processes share memory instead of transmitting copies of it ('global memory').
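Process-based parallelism in Python (for example the multiprocessing module) pickles each task's arguments to send them to a worker, so shipping model data with every task multiplies the transfer cost. A rough way to check the suspected bottleneck is to compare serialized payload sizes. The model dictionary and species pairs below are hypothetical stand-ins, not RMG data structures, and payload size is only a proxy for transfer time.

```python
import pickle


def payload_bytes(tasks):
    """Bytes serialized if each task's arguments are pickled and sent separately."""
    return sum(len(pickle.dumps(task)) for task in tasks)


# Hypothetical stand-in for model data needed by every task (e.g. thermo tables).
model = {f"S{i}": [float(j) for j in range(50)] for i in range(100)}
# Hypothetical work items, e.g. species pairs to react.
pairs = [(f"S{i}", f"S{i + 1}") for i in range(99)]

copy_everything = [(model, pair) for pair in pairs]  # every task ships the whole model
send_keys_only = pairs                                # workers already hold the model

ratio = payload_bytes(copy_everything) / payload_bytes(send_keys_only)
print(f"shipping the model with each task costs about {ratio:.0f}x more serialized data")
```

If profiling confirms this is the bottleneck, one standard-library option is multiprocessing.Pool(initializer=..., initargs=...), which lets each worker load the model once so that only small task descriptions are sent per call.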

Class restructuring

This is where we discuss how the current class structures could lead to issues, and ways to improve them.

Conformer.pyx

This class has the attributes number and mass, which are arrays that are just a scalar multiple of each other. Storing the two separately can cause unexpected issues when one of them is updated but the other still holds an older value.
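Whatever the exact relation between the two arrays, one way to rule out stale data is to store only one of them and compute the other on access. The plain-Python sketch below derives mass from atomic number through a lookup table; ConformerSketch and ATOMIC_MASS are hypothetical names, and the table holds approximate standard atomic weights for only three elements.

```python
# Approximate standard atomic weights in amu; illustrative subset only.
ATOMIC_MASS = {1: 1.008, 6: 12.011, 8: 15.999}


class ConformerSketch:
    """Hypothetical stand-in for Conformer: stores `number` only; `mass` is derived."""

    def __init__(self, number):
        self.number = list(number)

    @property
    def mass(self):
        # Recomputed on every access, so it can never lag behind `number`.
        return [ATOMIC_MASS[z] for z in self.number]


c = ConformerSketch([6, 8])
c.number = [6, 1, 1]  # update one attribute...
print(c.mass)         # ...and the other follows: [12.011, 1.008, 1.008]
```

Since mass has no setter, an attempt to assign it directly raises AttributeError instead of silently creating a second, inconsistent copy.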
