Parallel version taking more time to complete than serial version #688
I don't know that we have material documenting when to expect parallel speedup, but my intuition is that it should be worthwhile in your case. The team is all at a conference this week, but I'd be happy to take a look at this next week. I suspect there's a configuration issue getting in the way, related to the use of multiple threads per core. In the meantime, could you post the scripts or command lines you used?
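For example, a minimal sketch of what I'd check first (assuming Linux and Open MPI; the binding flags differ for other MPI launchers, and the ngen arguments are placeholders):

```bash
# Compare logical CPUs vs. physical cores vs. SMT threads per core
lscpu | grep -E '^(CPU\(s\)|Core\(s\) per socket|Thread\(s\) per core)'

# Launch one rank per physical core, bound so ranks don't share a core
mpirun -n 4 --map-by core --bind-to core ./ngen <args...>
```

Running more ranks than physical cores, or letting two ranks share one core's SMT threads, can easily make the MPI build slower than the serial one.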
Thank you for offering to take a look at this, @PhilMiller. The commands I used are given as follows. Please let me know if you find them incomplete or in need of explanation.

Experiment 1

Serial:
Parallel (for 4 workers):
Note: I tried both with and without the flag.

Experiment 2

After git-cloning and entering the folder "NGIAB-CloudInfra":

Serial:
Parallel:
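For reference, the general shape of these invocations (a generic sketch with placeholder file names, not my exact paths):

```bash
# Serial: hydrofabric catchment/nexus files plus the realization config
./ngen catchments.geojson all nexus.geojson all realization.json

# Parallel: generate a partition file for 4 workers, then pass it to ngen
# as the extra final argument under mpirun
./partitionGenerator catchments.geojson nexus.geojson partitions_4.json 4 '' ''
mpirun -n 4 ./ngen catchments.geojson all nexus.geojson all realization.json partitions_4.json
```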
Can you also add your realization.json file contents?
For experiment 1 (experiment 2 is in the following comment):

```json
{
    "global": {
        "formulations": [
            {
                "name": "bmi_c++",
                "params": {
                    "model_type_name": "raven",
                    "library_file": "/(...)/build/libravenbmi.so",
                    "init_config": "../inputs/rainfall-runoff_models/{{id}}/{{id}}.yaml",
                    "main_output_variable": "streamflow",
                    "variables_names_map": {
                        "temp_ave": "TMP_2maboveground",
                        "precipitation": "precip_rate"
                    },
                    "create_function": "bmi_model_create",
                    "destroy_function": "bmi_model_destroy",
                    "uses_forcing_file": false
                }
            }
        ],
        "forcing": {
            "path": "../inputs/forcings/uniform.csv",
            "provider": "CsvPerFeature"
        }
    },
    "time": {
        "start_time": "2023-11-28 22:00:00",
        "end_time": "2023-11-29 22:00:00",
        "output_interval": 3600
    },
    "catchments": {}
}
```
For experiment 2 (experiment 1 is in the previous comment):

```json
{
    "global": {
        "formulations": [
            {
                "name": "bmi_multi",
                "params": {
                    "name": "bmi_multi",
                    "model_type_name": "NoahOWP_CFE",
                    "main_output_variable": "Q_OUT",
                    "init_config": "",
                    "allow_exceed_end_time": false,
                    "fixed_time_step": false,
                    "uses_forcing_file": false,
                    "modules": [
                        {
                            "name": "bmi_c++",
                            "params": {
                                "name": "bmi_c++",
                                "model_type_name": "SLOTH",
                                "main_output_variable": "z",
                                "init_config": "/dev/null",
                                "allow_exceed_end_time": true,
                                "fixed_time_step": false,
                                "uses_forcing_file": false,
                                "model_params": {
                                    "sloth_ice_fraction_schaake(1,double,m,node)": "0.0",
                                    "sloth_ice_fraction_xinan(1,double,1,node)": "0.0",
                                    "sloth_smp(1,double,1,node)": "0.0",
                                    "EVAPOTRANS": "0.0"
                                },
                                "library_file": "/dmod/shared_libs/libslothmodel.so",
                                "registration_function": "none"
                            }
                        },
                        {
                            "name": "bmi_c",
                            "params": {
                                "name": "bmi_c",
                                "model_type_name": "CFE",
                                "main_output_variable": "Q_OUT",
                                "init_config": "/ngen/ngen/data/config/awi_config.ini",
                                "allow_exceed_end_time": true,
                                "fixed_time_step": false,
                                "uses_forcing_file": false,
                                "variables_names_map": {
                                    "atmosphere_water__liquid_equivalent_precipitation_rate": "precip_rate",
                                    "water_potential_evaporation_flux": "EVAPOTRANS",
                                    "ice_fraction_schaake": "sloth_ice_fraction_schaake",
                                    "ice_fraction_xinan": "sloth_ice_fraction_xinan",
                                    "soil_moisture_profile": "sloth_smp"
                                },
                                "model_params": {
                                    "b": 8.660529385231255,
                                    "satdk": 0.00011760880965802808,
                                    "maxsmc": 0.543673362985325,
                                    "refkdt": 3.6613440504586134,
                                    "slope": 0.8154788969461678,
                                    "max_gw_storage": 0.04021994414923359,
                                    "expon": 7.308820146231674,
                                    "Cgw": 0.0004609207383395736,
                                    "Klf": 0.1681695665829872,
                                    "Kn": 0.4017865685354076
                                },
                                "library_file": "/dmod/shared_libs/libcfebmi.so.1.0.0",
                                "registration_function": "register_bmi_cfe"
                            }
                        }
                    ]
                }
            }
        ],
        "forcing": {
            "file_pattern": "cat03w_{{id}}*.csv",
            "path": "/ngen/ngen/data/forcings/",
            "provider": "CsvPerFeature"
        }
    },
    "time": {
        "start_time": "2022-08-24 13:00:00",
        "end_time": "2022-09-03 12:00:00",
        "output_interval": 3600
    },
    "routing": {
        "t_route_config_file_with_path": "/ngen/ngen/data/config/ngen.yaml"
    },
    "catchments": {}
}
```
It'll probably be easiest to analyze what's happening here with some interactive discussion. Could you email me at philip.miller AT noaa.gov so we can arrange some time to talk?
In the meantime, how much RAM does your laptop have, and what storage hardware (HDD vs. SSD especially, and SATA vs. NVMe if you know)?
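If it helps, these are quick to check with standard Linux tools (nothing NGen-specific):

```bash
free -h                           # total and available RAM
lsblk -d -o NAME,SIZE,ROTA,TRAN   # ROTA=1 means spinning HDD; TRAN shows sata vs. nvme
```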
@JoshCu -- do we know anything that might help here?
Yes we do! At least for the second experiment with NGIAB.

Before the fix: [timing attachment]
After the fix: [timing attachment]
My machine has 16 GB RAM and a 1 TB NVMe SSD. @JoshCu Wow, from a little more than 2 minutes to a little less than 10 seconds is a huge drop!
Yeah, that was just from testing with the example data in the NGIAB image. Both tests used identical config, but I did tweak some settings to try to speed it up before making changes to t-route. I modified the image here and ngen.yaml so it was using 20 MPI processes for the model run and a cpu_pool of 56 for the routing. I also changed my parallel_compute_method in ngen.yaml to by-subnetwork-jit:

```yaml
compute_parameters:
    #----------
    parallel_compute_method: by-subnetwork-jit  # serial?
    compute_kernel: V02-structured
    assume_short_ts: True
    subnetwork_target_size: 100
    cpu_pool: 56
```

The example ngen.yaml has a subnetwork_target_size of 10000 and there are only ~700 catchments in the example data. It's possible that all the catchments are getting assigned to one core regardless of the cpu_pool, but I'm not sure how the subnetworks are created in t-route (see the back-of-envelope sketch below).

Testing setup (my ancient Dell workstation): Ubuntu 22.04
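A back-of-envelope sketch of that concern (assumption: the subnetwork count is roughly ceil(catchments / subnetwork_target_size); t-route's actual graph partitioning may differ):

```bash
# ~700 catchments with the example target size of 10000: everything fits
# in a single subnetwork, so only one core in the cpu_pool has any work
echo $(( (700 + 10000 - 1) / 10000 ))   # -> 1

# Dropping the target size to 100 yields several subnetworks to spread out
echo $(( (700 + 100 - 1) / 100 ))       # -> 7
```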
I'm also adding a bit of code to ngen to make diagnosing issues like this at least a little easier in the future.
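Until that lands, one generic way to collect comparable wall-clock and peak-memory figures per run (GNU time's verbose mode, not the shell builtin; the ngen arguments are placeholders):

```bash
/usr/bin/time -v ./ngen <serial args...> 2> serial_time.log
/usr/bin/time -v mpirun -n 2 ./ngen <parallel args...> 2> parallel_time.log
grep -E 'Elapsed|Maximum resident' serial_time.log parallel_time.log
```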
I've been comparing the performance of NGen's serial vs. parallel (MPI) flavors on my personal laptop (Intel i7 processor, 4 cores, 2 threads each). Two experiments were set up, as described below.
Current behavior
Experiment 1: one basin segmented into 4,745 catchments and 2,694 nexuses, each catchment with its own Raven rainfall-runoff model fed with CSV-file forcings. One month of simulation time at hourly resolution. NGen compiled directly on my personal Ubuntu laptop. Tried serial, parallel with 2 processes, and parallel with 4 processes.
Result:
Experiment 2: the default example provided in the README.md of the NGIAB repository (uses the CFE rainfall-runoff model and NetCDF forcings). NGen running from NGIAB's Docker container. Tried serial and the default parallel (2 processes) options.
Result:
Expected behavior
Parallel simulation was expected to take significantly less time to complete than its serial counterpart, but it is instead taking slightly to significantly more time.
Question
Is there any documentation describing the conditions under which it is worth using the parallel version instead of the serial one?