Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SampleSupervisor #101

Open
wants to merge 99 commits into
base: v3.2
Choose a base branch
from
Open

SampleSupervisor #101

wants to merge 99 commits into from

Conversation

rabii-chaarani
Copy link
Member

Description

  • This is the new SampleSupervisor class, it handles all storage and processing operations within Project() and outside of it. The class allow simple calls to get samples for any calculations.
  • Various tests are added in this PR for SampleSupervisor, the class passes all tests and works fine, if you have any other test cases to propose, please let me know.
  • It is expected that as we are adding more features over time, I will rewrite the SampleSupervisor in the future to use graphs to process the necessary computations in the correct order.

Fixes #(issue)

Type of change

  • Documentation update
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Test improvement

How Has This Been Tested?

All tests performed are included in tests/sample_storage/test_sample_storage.py

Checklist:

  • This branch is up-to-date with master
  • All gh-action checks are passing
  • I have performed a self-review of my own code
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • My tests run with pytest from the map2loop folder
  • New and existing tests pass locally with my changes

Checklist continued (if PR includes changes to documentation)

  • I have built the documentation locally with make.bat
  • I have built this documentation in docker, following the docker configuration in map2loop/docs
  • I have checked my spelling and grammar

rabii-chaarani and others added 30 commits June 3, 2024 14:01
Copy link
Member

@lachlangrose lachlangrose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of changes in this pull request which make it a bit difficult to follow - I have left some comments on the code but I am not sure I follow the design/reasoning completely. I think it is important to think about how this class will interact with the other classes/methods within map2loop. I think a plan either as a design document and/or a flow chart would really help understanding these changes.

When I initially suggested this class my intention was to have a class which has a get_sample() method for each different type of sample - it then either calls the appropriate sampler or returns the stored data depending on the current state. This would mean that within other samplers there could be a call to the sample_supervisor.get_samples for any other data type (although some care would be needed to prevent circular calls). The motivation behind this method was to allow for the underlying data or sampling algorithm parameters to be changed by the user and then map2loop would only recompute what was/is required. From what I could follow this current implementation does allow for this but the order of the calls is defined by the supervisor instead of the samplers - I think it would make more sense for the samplers to control the ordr because what is actually used by each sampler would be dependent on the algorithm.

I think that the calls to the project should not exists and these functions calculate_fault_orientations, summarise_fault_data, extract_geology_contacts should all be run by sampler type classes that are managed by this class.

My last comment is there seem to some changes to m2l that are included in this pull request that are not really related to the sample supervisor. I know it is a pain to separate but it really makes reviewing the pull request easier if only the relevant changes are in the PR.

from .project import Project


class AccessStorage(ABC):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to have an abstract class for this? I can't think of a case where there are multiple sample supervisors

def type(self):
return self.storage_label

def set_default_samplers(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't default be chosen with appropriate parameterisation given the map/dataset?

self.sampler_dirtyflags[sampletype] = True

@beartype.beartype
def get_sampler(self, sampletype: SampleType):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return the sample not the sample type? and it should be type hinted for the return type

return self.samplers[sampletype].sampler_label

@beartype.beartype
def store(self, sampletype: SampleType, data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What type is data? does this matter? Should there be any validation?

Comment on lines +159 to +166
datatype = Datatype(sampletype)

if datatype == Datatype.DTM:
self.map_data.load_raster_map_data(datatype)

else:
# load map data
self.map_data.load_map_data(datatype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is from the mapdata class, but why can't we just have map_data.load(datatype)? And it takes care of what type the data is? Would probably be more future proof as raster data could also include geophysical grids

sampletype (SampleType): The type of the sample.
"""

if sampletype == SampleType.CONTACT:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the samplers are changed so that they have a __call__ method this would avoid having to pass different arguments to the sample methods.

self.store(
SampleType.CONTACT,
self.samplers[SampleType.CONTACT].sample(
self.map_data.basal_contacts, self.map_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is basal contacts not an attribute of the sample supervisor? Isn't is something that is sampled?

)

@beartype.beartype
def reprocess(self, sampletype: SampleType):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see why there is a reprocess. Shouldn't reprocess just be process?

self.process(SampleType.FOLD)

@beartype.beartype
def __call__(self, sampletype: SampleType):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like having a call method, I think the logic is a bit complicated. It should just be self.process and return the data. All other checks should be done within process.

@rabii-chaarani
Copy link
Member Author

Rebasing did not solve the problem of having too many file changes. I will start over in another branch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants