Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize conformer comparison algorithm #765

Merged
merged 3 commits into from
Sep 2, 2024
Merged

Optimize conformer comparison algorithm #765

merged 3 commits into from
Sep 2, 2024

Conversation

JintaoWu98
Copy link
Member

The previous conformer comparison algorithm required a distance matrix (d-matrix) for operations. In this update, we introduce a new variable, fl_distance, which represents the distance between the first and last elements. This method provides a less costly preliminary comparison before invoking the d-matrix comparison. Now, the d-matrix comparison will only be executed when necessary, optimizing the overall efficiency of the algorithm.

Copy link

codecov bot commented Aug 20, 2024

Codecov Report

Attention: Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.

Project coverage is 73.98%. Comparing base (8ba45be) to head (693b2f3).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
arc/species/conformers.py 92.85% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #765      +/-   ##
==========================================
+ Coverage   73.95%   73.98%   +0.02%     
==========================================
  Files         101      101              
  Lines       27777    27804      +27     
  Branches     5817     5825       +8     
==========================================
+ Hits        20542    20570      +28     
+ Misses       5774     5773       -1     
  Partials     1461     1461              
Flag Coverage Δ
unittests 73.98% <96.96%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I added my initial review.

fl_distance1 = np.linalg.norm(np.array(xyz1['coords'][0]) - np.array(xyz1['coords'][-1]))
if fl_distance2 is None:
fl_distance2 = np.linalg.norm(np.array(xyz2['coords'][0]) - np.array(xyz2['coords'][-1]))
if abs(fl_distance1-fl_distance2)/fl_distance1>rtol:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a space before and after operators

fl_distance2 = np.linalg.norm(np.array(xyz2['coords'][0]) - np.array(xyz2['coords'][-1]))
if abs(fl_distance1-fl_distance2)/fl_distance1>rtol:
return fl_distance1, fl_distance2, dmat1, dmat2
if dmat1 is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at this point we know for certain that dmat1 is None

exists = True
dmat1, fl_distance1 = None, None
for conf in new_conformers + newest_conformer_list:
conf['dmat'] = conf.get('dmat')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why this is needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think otherwise we may get into a case where conf dictionary doesn't have dmat at the beginning.

fl_distance2=conf['fl_distance'])
conf['fl_distance'] = fl_distance2 if conf['fl_distance'] is None else conf['fl_distance']
conf['dmat'] = dmat2 if conf['dmat'] is None else conf['dmat']
if dmat1 is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that if dmat1 is None, this implies that the fl_distance was too large. But this code is not self-explanatory, therefore hard to maintain and understand.
I recommend to pass the two conformer dictionaries to compare_confs_fl, then have compare_confs_fl return three things: whether the conformers are similar or not and the two updated conformer dictionaries. The we update our confs with the returned dictionaries

Copy link
Member Author

@JintaoWu98 JintaoWu98 Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I agree. I modified compare_confs_fl with your advice. However, it's a bit inconvenient to pass both conf dicts under conformers_combinations_by_lowest_conformer, since we are looping xyz, as shown here.
However, if you would like me to modify it further, please let me know.

@JintaoWu98 JintaoWu98 force-pushed the compare_confs_opt branch 4 times, most recently from 8a7b5b0 to 89ce5f5 Compare August 21, 2024 10:48
Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Please see some comments

fl_distance1 = np.linalg.norm(np.array(xyz1['coords'][0]) - np.array(xyz1['coords'][-1]))
if conf2['fl_distance'] is None:
conf2['fl_distance'] = np.linalg.norm(np.array(xyz2['coords'][0]) - np.array(xyz2['coords'][-1]))
if abs(fl_distance1-conf2['fl_distance']) / fl_distance1 > rtol:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use np.isclose here

if abs(fl_distance1-conf2['fl_distance']) / fl_distance1 > rtol:
return fl_distance1, dmat1, conf2, simliar
simliar = True
dmat1 = xyz_to_dmat(xyz1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we compute dmat1 in this function?
This function also doesn't use conf2['dmat'], maybe we can simplify it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this function compare_confs_fl, is as following:

  1. To compare two Cartesian coordinates representing conformers by examining the distances between the first and last atoms.
  2. Distance matrices are computed and returned only if these distances are identical.

Considering potential modifications to the second purpose of this process, here are three options:

  1. Rename the function to compare_confs_pre to indicate that it performs a preliminary, quicker comparison before the more resource-intensive compare_confs calculation.
  2. Create a new function to do this calculation.
  3. Merge this part to compare_confs, though feasible, I do not recommend this due to the complexity it would add, especially in returning the variable conf2, which could complicate the primary function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

if xyz is not None and energy is not None:
conformer = {'index': len_conformers + len(new_conformers) + len(newest_conformer_list),
'xyz': xyz,
'FF energy': round(energy, 3),
'source': f'Changing dihedrals on most stable conformer, iteration {i}',
'torsion': tor,
'dihedral': round(dihedral, 2)}
'dihedral': round(dihedral, 2),
'dmat': dmat1 if dmat1 is not None else None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If dmat1 either has a value or can be None, and you want its value or else None, then considering just keeping 'dmat': dmat1 instead of the entire line. Same for the line below

if index > 0 and not any([converter.compare_confs(lowest_conf['xyz'], conformer_list[index]['xyz'])
for lowest_conf in lowest_confs]):
lowest_confs.append(conformer_list[index])
if index>0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a space before and after operators

@JintaoWu98 JintaoWu98 force-pushed the compare_confs_opt branch 4 times, most recently from 4f2177a to 847440f Compare September 1, 2024 06:47
Compare two Cartesian coordinates representing conformers using first and last atom distances.
If the `fl_distances` are the similar, the distance matrices are computed and returned.
If we don't need the conversion, meaning `dmat1` and `dmat2` are both given, we can start the subsequent steps directly.
Save `fl_distance` and `dmat` to each of the `conformer` for efficient processing.
Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Thanks!

@alongd alongd merged commit 9e5c210 into main Sep 2, 2024
7 checks passed
@alongd alongd deleted the compare_confs_opt branch September 2, 2024 06:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants