Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove class-level caches from CheckpointFile #2810

Merged
merged 2 commits into from
Mar 10, 2023

Conversation

connorjward
Copy link
Contributor

Fixes #2809 (and hopefully fixes CI).

Is it appropriate to provide a test for something non-deterministic like this?

@connorjward
Copy link
Contributor Author

@ksagiyam this is failing the backwards compatibility tests due to mismatching SFs. Rather than expunging the caches as I do here should they be stored on the instance instead of the class? I find the code quite hard to follow.

@connorjward connorjward closed this Mar 9, 2023
@connorjward connorjward reopened this Mar 9, 2023
@ksagiyam
Copy link
Contributor

ksagiyam commented Mar 9, 2023

The backward_compat tests are failing I think because I relied on the fact that meshes are cached in those tests.
Here

mesh = afile.load_mesh(_generate_mesh_name(cell_type))
we end up creating multiple instances of the same mesh with the same name (say meshA and meshB, both named, e.g., "mesh_triangle"). As function spaces are cached based on "mesh_triangle" (and other names) on the CheckpointFile instance (afile), when we load a function space on meshB, which has already been loaded for meshA and cached, we end up grabbing the cached one. This causes conflict as meshA has pointSFA, the loaded function space inherits pointSFA, but meshB has poinSFB; though pointSFA and pointSFB are identical, they are different objects, which is what PETSc is complaining.

I think we can just use with CheckpointFile(..) as afile: as usual inside the for loop in the backward_compat tests.

@connorjward connorjward force-pushed the connorjward/remove-checkpoint-caches branch from 7d912a2 to daf2a65 Compare March 9, 2023 18:30
Copy link
Contributor

@ksagiyam ksagiyam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Connor!

@connorjward
Copy link
Contributor Author

Can we merge this? We are getting random fails in the link checker for the docs build which I'm certain is nothing to do with this PR.

@ksagiyam ksagiyam merged commit f55e968 into master Mar 10, 2023
@ksagiyam ksagiyam deleted the connorjward/remove-checkpoint-caches branch March 10, 2023 08:44
@ksagiyam
Copy link
Contributor

Unfortunately, the issue seems not resolved:

#2787
#2804

These branches have been rebased.

@connorjward
Copy link
Contributor Author

Blast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CheckpointFile can hang (likely cause of CI failures)
2 participants