Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic coding for GridPROTEUS #162

Open
nichollsh opened this issue Aug 30, 2024 · 13 comments
Open

Dynamic coding for GridPROTEUS #162

nichollsh opened this issue Aug 30, 2024 · 13 comments
Labels
enhancement New feature or request ensembles Relating to grids or forward models JOSS publication: PROTEUS TBD before PROTEUS JOSS publication Priority 2: high Priority level 2: high time criticality or importance

Comments

@nichollsh
Copy link
Contributor

Currently, a large amount of the configuration of GridPROTEUS.py is hardcoded in the if __name__ statement. This would ideally be configured in another manner, such as by a configuration file or with a CLI.

@stefsmeets
Copy link
Contributor

I would suggest to hold off major work on this until we refactored the proteus (single) run code. My hope is to fold GridProteus into there as well eventually.

@EmmaPostolec
Copy link
Contributor

EmmaPostolec commented Nov 13, 2024

Hi, I edited tools/grid_proteus.py on my computer with new parameters that I wanted to test for the escape (Pxuv, efficiency and semimajoraxis) and I ran into the following error:

[2024-11-13 11:25:32] Sleeping...
[2024-11-13 11:25:32]     7 
[2024-11-13 11:25:33]     6 
[2024-11-13 11:25:34]     5 
[2024-11-13 11:25:35]     4 
[2024-11-13 11:25:36]     3 
[2024-11-13 11:25:37]     2 
[2024-11-13 11:25:38]     1 
[2024-11-13 11:25:39]  
[2024-11-13 11:25:39] Writing config files
[2024-11-13 11:25:45] Starting process manager (75 grid points, 40 threads)
[2024-11-13 11:25:45]  75 queued (100.0%),   0 running (  0.0%),   0 completed (  0.0%)
[2024-11-13 11:25:45] Uncaught exception
Traceback (most recent call last):
  File "/Users/emmapostolec/Documents/PHD/SCIENCE/CODES/PROTEUS/tools/grid_proteus.py", line 492, in <module>
    pg.run(40, test_run=False)
  File "/Users/emmapostolec/Documents/PHD/SCIENCE/CODES/PROTEUS/tools/grid_proteus.py", line 405, in run
    threads[i].start()
  File "/Users/emmapostolec/miniconda3/envs/proteus/lib/python3.12/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/Users/emmapostolec/miniconda3/envs/proteus/lib/python3.12/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/emmapostolec/miniconda3/envs/proteus/lib/python3.12/multiprocessing/context.py", line 289, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/emmapostolec/miniconda3/envs/proteus/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/emmapostolec/miniconda3/envs/proteus/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/emmapostolec/miniconda3/envs/proteus/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/emmapostolec/miniconda3/envs/proteus/lib/python3.12/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'Grid.run.<locals>._thread_target'

@nichollsh suggested having multiprocessing.set_start_method('fork') on line 25, and with this change it seems to work on my computer now! Should I update grid_proteus.py and open a new pull request to merge into main ? Let me know what you think is the best. @nichollsh @timlichtenberg @lsoucasse @stefsmeets

@stefsmeets
Copy link
Contributor

Do you know when the error was introduced?

@nichollsh
Copy link
Contributor Author

This might have been here since the script was created. I have not found this error on my system at any point in the past.

This article (with the fix mentioned above) suggest that it depends on the platform in some strange way. https://medium.com/devopss-hole/python-multiprocessing-pickle-issue-e2d35ccf96a9

@stefsmeets
Copy link
Contributor

stefsmeets commented Nov 13, 2024

You could argue whether fork vs. spawn is better, but note that fork is not available on Windows, buggy on macs, and discouraged because forking has lots of undefined behaviour. This can lead to very hard to reproduce bugs.

Looking at the code, my take is that this code should use threading module instead of multiprocessing (e.g. via a Threadpool). The code already calls a subprocess for the actual work, so that means that every task essentially creates 2 processes, which is very inefficient.

@nichollsh
Copy link
Contributor Author

nichollsh commented Nov 14, 2024

There's definitely a lot of room for improvement in this script. Using threadpools makes more sense than the current bespoke formulation for queueing. Ideally, it would also have an option to run the grid across multiple nodes using the Python SLURM library.

This is not particularly high-priority at the moment though - as long as the current script works for everyone.

@stefsmeets
Copy link
Contributor

I agree with your assesment that it is not very high priority. SLURM or subprocess is not the major development bottleneck here imo, if the rest of the code is well factored.

@timlichtenberg timlichtenberg added Priority 2: high Priority level 2: high time criticality or importance JOSS publication: PROTEUS TBD before PROTEUS JOSS publication labels Nov 19, 2024
@timlichtenberg
Copy link
Collaborator

Coming back to this a bit, because I believe dynamic coding for grid runs is quite essential to the functioning of the code. How could this be implemented with the use of the current config files?

For example, instead of
mass = 1.0 # M_earth
could the user set:
mass = [1.0, 1.5, 2.0] # M_earth
and the code automatically detects this to be a grid run, and accordingly launches multiple simulations in parallel?

If not, what are the alternatives?

@nichollsh
Copy link
Contributor Author

nichollsh commented Nov 19, 2024

I like this idea a lot. It's intuitive and doesn't require the user to learn a new configuration file format. Seems quite doable, since TOML already supports arrays/lists.

I am not so much a fan of the automatic detection thing, since the user should be clear on what exactly is going to be run when they pass a configuration file. Maybe we could have proteus start -c file.toml as usual, and then also proteus grid -c file.toml. The former case should raise an error if it thinks that the toml file is meant to be a grid of parameters.

@timlichtenberg
Copy link
Collaborator

timlichtenberg commented Nov 19, 2024

This is a good idea and will simplify the implementation. I want to shortly connect this to #204. Assuming we would use pymultinest for an inverse-PROTEUS implementation, the call function to run a PROTEUS grid will be a critical choice to also be passed to an external module. Alternatively, pymultinest could become a standard PROTEUS sub-module, and the config file would get an additional "retrieval" section of parameters that can be retrieved. Once the appropriate flag is passed (for example, via proteus retrieval -c file.toml), the user could specify both the priors of all model parameters and the data (i.e., mass, radius, age, spectrum, for example). This latter option would homogenise the PROTEUS call sequence for both forward and inverse modelling approaches.

@stefsmeets
Copy link
Contributor

stefsmeets commented Nov 19, 2024

I like the idea as well from a user point of view, but overloading the config object in this way will be a hugely complex task. There are libraries that can help with this sort of work (e.g. for use in uncertainty quantification studies).

@nichollsh
Copy link
Contributor Author

I can see how this might get complicated under the current formulation. Which libraries did you have in mind?

@stefsmeets
Copy link
Contributor

stefsmeets commented Nov 19, 2024

Here are a bunch from when I last looked for them. There might be more, these are in the context of UQ work:

They can all help with setting up a cartesian grid, hypercube sampling, and plotting the results if you are interested in UQ work. All we need to do is code the interface.

Let's say you want to vary mass = [1.0, 1.5, 2.0], you would have to define the connector interface to write input.toml files, probably from a template (easy), and the connector back to read the output (also relatively easy with the csv file). The last bit is optional if you just want to have these tools write the input files. The example above would then yield 3 different input files.

Some of these tools have support for submitting them to a cluster as well. I know that at least easyvvuq works with SLURM.

The point is, that libraries exist to help with this sort of work and I think it is worth investing a bit of time there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request ensembles Relating to grids or forward models JOSS publication: PROTEUS TBD before PROTEUS JOSS publication Priority 2: high Priority level 2: high time criticality or importance
Projects
Status: JOSS Publication
Development

No branches or pull requests

5 participants