Parallel version taking more time to complete than serial version #688

adlzanchetta opened this issue Dec 14, 2023 · 12 comments

@adlzanchetta commented Dec 14, 2023

I've been comparing the performance of NGen's serial vs. parallel (MPI) flavors on my personal laptop (Intel i7 processor, 4 cores, 2 threads each). Two experiments were set up, as described below.

Current behavior

Experiment 1: one basin segmented into 4,745 catchments and 2,694 nexuses, each catchment with its own Raven rainfall-runoff model fed by CSV file forcings. 1-month simulation period, hourly resolution. NGen compiled directly on my personal Ubuntu laptop. Tried serial, parallel with 2 processes, and parallel with 4 processes.

Result:

| mode     | workers | time to complete (minutes) |
|----------|---------|----------------------------|
| serial   | 1       | 47                         |
| parallel | 2       | 55                         |
| parallel | 4       | 56                         |

Experiment 2: the default example provided in the README.md of the NGIAB repository (uses the CFE rainfall-runoff model and NetCDF forcings). NGen running from the NGIAB Docker container. Tried serial and the default parallel (2 processes) options.

Result:

| mode     | workers | time to complete (minutes) |
|----------|---------|----------------------------|
| serial   | 1       | 3                          |
| parallel | 2       | 3.5                        |

Expected behavior

Parallel simulation was expected to take significantly less time to complete than its serial counterpart, but it is taking slightly to significantly more time to complete.

Question

Is there any documentation or material describing the conditions under which it is worth using the parallel version instead of the serial one?

@PhilMiller (Contributor)

I don't know that we have material documenting when to expect parallel speedup, but my intuition is that it should be worthwhile in your case.

The team is all at a conference this week, but I'd be happy to take a look at this next week. I suspect there's a configuration issue that's getting in the way, related to use of multiple threads per core.

In the meantime, could you post the scripts or command lines you used?
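For reference, a generic sketch (Open MPI syntax, not verified against your setup) of how to check physical cores vs. hardware threads and pin one rank per physical core:

# Show how many physical cores and hardware threads are available
$ lscpu | grep -E 'Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'

# Pin one rank per physical core and print the resulting bindings
$ mpirun -n 4 --bind-to core --map-by core --report-bindings <your ngen command>

On a 4-core/8-thread laptop, oversubscribing hardware threads or letting ranks migrate between cores can easily erase any parallel gain.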

@adlzanchetta (Author)

Thank you for offering to take a look at this, @PhilMiller.
Do you think it would be worth sharing materials and results for this sort of comparison (parallel vs. serial)?

The commands I used are given as follows. Please let me know if you find them incomplete or in need of explanation.

Experiment 1

Serial:

$ /.../build/ngen "/.../catchments.geojson" all "/.../nexus_out.geojson" all "/.../realization.json"

Parallel (for 4 workers):

$ partitionGenerator /.../catchments.geojson /.../nexus_out.geojson /.../partitions_4.json 4 '' ''

$ mpirun -n 4 [--use-hwthread-cpus] /.../build/ngen "/.../catchments.geojson" all "/.../nexus_out.geojson" all "/.../realization.json" "/.../partitions_4.json" --subdivided-hydrofabric

Note: I tried both with and without the flag '--use-hwthread-cpus' with no significant difference.
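For reference, one way (a sketch using GNU time, not what was used to produce the table above) to capture wall-clock time and peak memory for the same runs, with paths elided as above:

$ /usr/bin/time -v /.../build/ngen "/.../catchments.geojson" all "/.../nexus_out.geojson" all "/.../realization.json"

$ /usr/bin/time -v mpirun -n 4 /.../build/ngen "/.../catchments.geojson" all "/.../nexus_out.geojson" all "/.../realization.json" "/.../partitions_4.json" --subdivided-hydrofabric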

Experiment 2

After cloning and entering the "NGIAB-CloudInfra" folder:

Serial:

$ ./guide.sh
$ ...
$ Select an option (type a number):  ➔  1) Run NextGen Model using local docker image
$ Select an option (type a number):  ➔  1) Run NextGen model framework in serial mode
$ [copy-paste geojson found file paths]

Parallel:

$ ./guide.sh
$ ...
$ Select an option (type a number):  ➔  1) Run NextGen Model using local docker image
$ Select an option (type a number):  ➔  2) Run NextGen model framework in parallel mode
$ [copy-paste geojson found file paths]

@hellkite500 (Member)

Can you also add your realization.json file contents?

@adlzanchetta (Author) commented Dec 15, 2023

For experiment 1 (experiment 2 follows in the next comment):

{
    "global": {
        "formulations": [
            {
                "name": "bmi_c++",
                "params": {
                    "model_type_name": "raven",
                    "library_file": "/(...)/build/libravenbmi.so",
                    "init_config": "../inputs/rainfall-runoff_models/{{id}}/{{id}}.yaml",
                    "main_output_variable": "streamflow",
                    "variables_names_map": {
                        "temp_ave": "TMP_2maboveground",
                        "precipitation": "precip_rate"
                    },
                    "create_function": "bmi_model_create",
                    "destroy_function_": "bmi_model_destroy",
                    "uses_forcing_file": false
                }
            }
        ],
        "forcing": {
            "path": "../inputs/forcings/uniform.csv",
            "provider": "CsvPerFeature"
        }
    },
    "time": {
        "start_time": "2023-11-28 22:00:00",
        "end_time": "2023-11-29 22:00:00",
        "output_interval": 3600
    },
    "catchments": {}
}

@adlzanchetta (Author)

For experiment 2 (experiment 1 is in the previous comment):

{
    "global": {
        "formulations": [
            {
                "name": "bmi_multi",
                "params": {
                    "name": "bmi_multi",
                    "model_type_name": "NoahOWP_CFE",
                    "main_output_variable": "Q_OUT",
                    "init_config": "",
                    "allow_exceed_end_time": false,
                    "fixed_time_step": false,
                    "uses_forcing_file": false,
                    "modules": [
                        {
                            "name": "bmi_c++",
                            "params": {
                                "name": "bmi_c++",
                                "model_type_name": "SLOTH",
                                "main_output_variable": "z",
                                "init_config": "/dev/null",
                                "allow_exceed_end_time": true,
                                "fixed_time_step": false,
                                "uses_forcing_file": false,
                                "model_params": {
                                    "sloth_ice_fraction_schaake(1,double,m,node)": "0.0",
                                    "sloth_ice_fraction_xinan(1,double,1,node)": "0.0",
                                    "sloth_smp(1,double,1,node)": "0.0",
                                    "EVAPOTRANS": "0.0"
                                },
                                "library_file": "/dmod/shared_libs/libslothmodel.so",
                                "registration_function": "none"
                            }
                        },
                        {
                            "name": "bmi_c",
                            "params": {
                                "name": "bmi_c",
                                "model_type_name": "CFE",
                                "main_output_variable": "Q_OUT",
                                "init_config": "/ngen/ngen/data/config/awi_config.ini",
                                "allow_exceed_end_time": true,
                                "fixed_time_step": false,
                                "uses_forcing_file": false,
                                "variables_names_map": {
                                    "atmosphere_water__liquid_equivalent_precipitation_rate": "precip_rate",
                                    "water_potential_evaporation_flux": "EVAPOTRANS",
                                    "ice_fraction_schaake": "sloth_ice_fraction_schaake",
                                    "ice_fraction_xinan": "sloth_ice_fraction_xinan",
                                    "soil_moisture_profile": "sloth_smp"
                                },
                                "model_params": {
                                    "b": 8.660529385231255,
                                    "satdk": 0.00011760880965802808,
                                    "maxsmc": 0.543673362985325,
                                    "refkdt": 3.6613440504586134,
                                    "slope": 0.8154788969461678,
                                    "max_gw_storage": 0.04021994414923359,
                                    "expon": 7.308820146231674,
                                    "Cgw": 0.0004609207383395736,
                                    "Klf": 0.1681695665829872,
                                    "Kn": 0.4017865685354076
                                },
                                "library_file": "/dmod/shared_libs/libcfebmi.so.1.0.0",
                                "registration_function": "register_bmi_cfe"
                            }
                        }
                    ]
                }
            }
        ],
        "forcing": {
            "file_pattern": "cat03w_{{id}}*.csv",
            "path": "/ngen/ngen/data/forcings/",
            "provider": "CsvPerFeature"
        }
    },
    "time": {
        "start_time": "2022-08-24 13:00:00",
        "end_time": "2022-09-03 12:00:00",
        "output_interval": 3600
    },
    "routing": {
        "t_route_config_file_with_path": "/ngen/ngen/data/config/ngen.yaml"
    },
    "catchments": {}
}

@PhilMiller (Contributor)

It'll probably be easiest to analyze what's happening here with some interactive discussion. Could you email me at philip.miller AT noaa.gov and we can arrange some time to talk?

@PhilMiller (Contributor)

In the meantime, how much RAM does your laptop have, and what storage hardware does it use (HDD vs. SSD especially, and SATA vs. NVMe if you know)?
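If it helps, a generic Linux way (nothing ngen-specific) to pull those details:

$ free -h                      # total / available RAM
$ lsblk -d -o NAME,ROTA,TRAN   # ROTA=0 means SSD; TRAN shows sata vs. nvme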

@jameshalgren

@JoshCu -- do we know anything that might help here?

@JoshCu commented Jan 2, 2024

Yes we do! At least for the second experiment with NGIAB.
To fix it, we need to create a new NGIAB image with the latest t-route changes.
The reason the performance is so similar for both runs is that there's a step in t-route that converts the ngen output to a format that t-route can read.
It's discussed more here and was fixed by this PR, but the changes haven't made it into the Docker image yet.
The forcing array construction step is the culprit.

Before the fix:

[image: profiling output before the fix]

After the fix:

[image: profiling output after the fix]
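A rough way to check whether a given NGIAB image already carries the updated t-route packages (the image name below is a placeholder, and this assumes pip is available on the container's PATH):

$ docker run --rm <ngiab-image> pip list 2>/dev/null | grep -i rout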

@adlzanchetta (Author)

> In the meantime, how much RAM does your laptop have, and what storage hardware does it use (HDD vs. SSD especially, and SATA vs. NVMe if you know)?

My machine has 16 GB RAM and a 1 TB NVMe SSD.
Thank you for your interest, @PhilMiller. I've just sent you an email.

@JoshCu Wow, from a little more than 2 minutes to a little less than 10 seconds is a huge drop!
Are these results also for the default example provided in the root README.md of the NGIAB repo?

@JoshCu commented Jan 3, 2024

Yeah, that was just from testing with the example data and the NGIAB image. Both tests used an identical config, but I did tweak some settings to try to speed it up before making changes to t-route. I modified the image here and ngen.yaml so it was using 20 MPI processes for the model run and a cpu_pool of 56 for the routing. I also changed parallel_compute_method in ngen.yaml to by-subnetwork-jit.
From memory, the increased MPI processes did make a difference, but the extra t-route cores didn't have much effect. Such a large portion of the runtime was that I/O bottleneck, though, that the speedup might be more noticeable if I tested it again.

compute_parameters:
    #----------
    parallel_compute_method: by-subnetwork-jit # serial?
    compute_kernel         : V02-structured
    assume_short_ts        : True
    subnetwork_target_size : 100
    cpu_pool               : 56

The example ngen.yaml has a subnetwork_target_size of 10000 and there are only ~700 catchments in the example data. It's possible that all the catchments are getting assigned to one core regardless of the cpu_pool, too, but I'm not sure how the subnetworks are created in t-route.
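For the ~700-catchment example, an illustrative (untested) variant of the block above that would at least let t-route form more than one subnetwork, with cpu_pool closer to the physical core count:

compute_parameters:
    #----------
    parallel_compute_method: by-subnetwork-jit
    compute_kernel         : V02-structured
    assume_short_ts        : True
    subnetwork_target_size : 50   # illustrative: ~700 catchments would split into several subnetworks
    cpu_pool               : 8    # illustrative: roughly the number of physical cores in use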

Testing setup - my ancient Dell workstation:

- Ubuntu 22.04
- two 28-core Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
- SATA SSD, ~500 MB/s read/write
- 128 GB RAM (8 x 16 GiB RIMM DDR4 Synchronous 2133 MHz (0.5 ns))

@PhilMiller (Contributor)

I'm also adding a bit of code to ngen to make diagnosing issues like this at least a little bit easier in the future:
#696
