Parallel version taking more time to complete than serial version #688

adlzanchetta opened this issue Dec 14, 2023 · 12 comments

@adlzanchetta commented Dec 14, 2023

I've been comparing the performance of NGen's serial vs. parallel (MPI) flavors on my personal laptop (Intel i7 processor, 4 cores, 2 threads each). Two experiments were set up, as described below.

Current behavior

Experiment 1: one basin segmented into 4,745 catchments and 2,694 nexuses, each catchment with its own Raven rainfall-runoff model fed by CSV file forcings. 1-month simulation period, hourly resolution. NGen compiled directly on my personal Ubuntu laptop. Tried serial, parallel with 2 processes, and parallel with 4 processes.

Result:

| mode     | workers | time to complete (minutes) |
|----------|---------|----------------------------|
| serial   | 1       | 47                         |
| parallel | 2       | 55                         |
| parallel | 4       | 56                         |

Experiment 2: the default example provided in the README.md of the NGIAB repository (uses the CFE rainfall-runoff model and NetCDF forcings). NGen running from the NGIAB Docker container. Tried serial and the default parallel (2 processes) options.

Result:

| mode     | workers | time to complete (minutes) |
|----------|---------|----------------------------|
| serial   | 1       | 3                          |
| parallel | 2       | 3.5                        |

Expected behavior

Parallel simulation was expected to take significantly less time to complete than its serial counterpart, but it is taking slightly to significantly more time to complete.

Question

Is there any documentation or material describing the conditions under which it is worth using the parallel version instead of the serial one?

@PhilMiller (Contributor)

I don't know that we have material documenting when to expect parallel speedup, but my intuition is that it should be worthwhile in your case.

The team is all at a conference this week, but I'd be happy to take a look at this next week. I suspect there's a configuration issue that's getting in the way, related to use of multiple threads per core.

In the meantime, could you post the scripts or command lines you used?
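For reference, a generic sketch (Open MPI syntax, not verified against your setup) of how to check physical cores vs. hardware threads and pin one rank per physical core:

# Show how many physical cores and hardware threads are available
$ lscpu | grep -E 'Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'

# Pin one rank per physical core and print the resulting bindings
$ mpirun -n 4 --bind-to core --map-by core --report-bindings <your ngen command>

On a 4-core/8-thread laptop, oversubscribing hardware threads or letting ranks migrate between cores can easily erase any parallel gain.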

@adlzanchetta (Author)

Thank you for offering to take a look at this, @PhilMiller.
Do you think it would be worth sharing materials and results for this sort of comparison (parallel vs. serial)?

The commands I used are given as follows. Please let me know if you find them incomplete or in need of explanation.

Experiment 1

Serial:

$ /.../build/ngen "/.../catchments.geojson" all "/.../nexus_out.geojson" all "/.../realization.json"

Parallel (for 4 workers):

$ partitionGenerator /.../catchments.geojson /.../nexus_out.geojson /.../partitions_4.json 4 '' ''

$ mpirun -n 4 [--use-hwthread-cpus] /.../build/ngen "/.../catchments.geojson" all "/.../nexus_out.geojson" all "/.../realization.json" "/.../partitions_4.json" --subdivided-hydrofabric

Note: I tried both with and without the flag '--use-hwthread-cpus' with no significant difference.
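For reference, one way (a sketch using GNU time, not what was used to produce the table above) to capture wall-clock time and peak memory for the same runs, with paths elided as above:

$ /usr/bin/time -v /.../build/ngen "/.../catchments.geojson" all "/.../nexus_out.geojson" all "/.../realization.json"

$ /usr/bin/time -v mpirun -n 4 /.../build/ngen "/.../catchments.geojson" all "/.../nexus_out.geojson" all "/.../realization.json" "/.../partitions_4.json" --subdivided-hydrofabric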

Experiment 2

After cloning and entering the "NGIAB-CloudInfra" folder:

Serial:

$ ./guide.sh
$ ...
$ Select an option (type a number):  ➔  1) Run NextGen Model using local docker image
$ Select an option (type a number):  ➔  1) Run NextGen model framework in serial mode
$ [copy-paste geojson found file paths]

Parallel:

$ ./guide.sh
$ ...
$ Select an option (type a number):  ➔  1) Run NextGen Model using local docker image
$ Select an option (type a number):  ➔  2) Run NextGen model framework in parallel mode
$ [copy-paste geojson found file paths]

@hellkite500 (Member)

Can you also add your realization.json file contents?

@adlzanchetta (Author) commented Dec 15, 2023

For experiment 1 (experiment 2 follows in the next comment):

{
    "global": {
        "formulations": [
            {
                "name": "bmi_c++",
                "params": {
                    "model_type_name": "raven",
                    "library_file": "/(...)/build/libravenbmi.so",
                    "init_config": "../inputs/rainfall-runoff_models/{{id}}/{{id}}.yaml",
                    "main_output_variable": "streamflow",
                    "variables_names_map": {
                        "temp_ave": "TMP_2maboveground",
                        "precipitation": "precip_rate"
                    },
                    "create_function": "bmi_model_create",
                    "destroy_function_": "bmi_model_destroy",
                    "uses_forcing_file": false
                }
            }
        ],
        "forcing": {
            "path": "../inputs/forcings/uniform.csv",
            "provider": "CsvPerFeature"
        }
    },
    "time": {
        "start_time": "2023-11-28 22:00:00",
        "end_time": "2023-11-29 22:00:00",
        "output_interval": 3600
    },
    "catchments": {}
}

@adlzanchetta (Author)

For experiment 2 (experiment 1 is in the previous comment):

{
    "global": {
        "formulations": [
            {
                "name": "bmi_multi",
                "params": {
                    "name": "bmi_multi",
                    "model_type_name": "NoahOWP_CFE",
                    "main_output_variable": "Q_OUT",
                    "init_config": "",
                    "allow_exceed_end_time": false,
                    "fixed_time_step": false,
                    "uses_forcing_file": false,
                    "modules": [
                        {
                            "name": "bmi_c++",
                            "params": {
                                "name": "bmi_c++",
                                "model_type_name": "SLOTH",
                                "main_output_variable": "z",
                                "init_config": "/dev/null",
                                "allow_exceed_end_time": true,
                                "fixed_time_step": false,
                                "uses_forcing_file": false,
                                "model_params": {
                                    "sloth_ice_fraction_schaake(1,double,m,node)": "0.0",
                                    "sloth_ice_fraction_xinan(1,double,1,node)": "0.0",
                                    "sloth_smp(1,double,1,node)": "0.0",
                                    "EVAPOTRANS": "0.0"
                                },
                                "library_file": "/dmod/shared_libs/libslothmodel.so",
                                "registration_function": "none"
                            }
                        },
                        {
                            "name": "bmi_c",
                            "params": {
                                "name": "bmi_c",
                                "model_type_name": "CFE",
                                "main_output_variable": "Q_OUT",
                                "init_config": "/ngen/ngen/data/config/awi_config.ini",
                                "allow_exceed_end_time": true,
                                "fixed_time_step": false,
                                "uses_forcing_file": false,
                                "variables_names_map": {
                                    "atmosphere_water__liquid_equivalent_precipitation_rate": "precip_rate",
                                    "water_potential_evaporation_flux": "EVAPOTRANS",
                                    "ice_fraction_schaake": "sloth_ice_fraction_schaake",
                                    "ice_fraction_xinan": "sloth_ice_fraction_xinan",
                                    "soil_moisture_profile": "sloth_smp"
                                },
                                "model_params": {
                                    "b": 8.660529385231255,
                                    "satdk": 0.00011760880965802808,
                                    "maxsmc": 0.543673362985325,
                                    "refkdt": 3.6613440504586134,
                                    "slope": 0.8154788969461678,
                                    "max_gw_storage": 0.04021994414923359,
                                    "expon": 7.308820146231674,
                                    "Cgw": 0.0004609207383395736,
                                    "Klf": 0.1681695665829872,
                                    "Kn": 0.4017865685354076
                                },
                                "library_file": "/dmod/shared_libs/libcfebmi.so.1.0.0",
                                "registration_function": "register_bmi_cfe"
                            }
                        }
                    ]
                }
            }
        ],
        "forcing": {
            "file_pattern": "cat03w_{{id}}*.csv",
            "path": "/ngen/ngen/data/forcings/",
            "provider": "CsvPerFeature"
        }
    },
    "time": {
        "start_time": "2022-08-24 13:00:00",
        "end_time": "2022-09-03 12:00:00",
        "output_interval": 3600
    },
    "routing": {
        "t_route_config_file_with_path": "/ngen/ngen/data/config/ngen.yaml"
    },
    "catchments": {}
}

@PhilMiller (Contributor)

It'll probably be easiest to analyze what's happening here with some interactive discussion. Could you email me at philip.miller AT noaa.gov and we can arrange some time to talk?

@PhilMiller (Contributor)

In the meantime, how much RAM does your laptop have, and what storage hardware does it use (HDD vs. SSD especially, and SATA vs. NVMe if you know)?
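If it helps, a generic Linux way (nothing ngen-specific) to pull those details:

$ free -h                      # total / available RAM
$ lsblk -d -o NAME,ROTA,TRAN   # ROTA=0 means SSD; TRAN shows sata vs. nvme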

@jameshalgren

@JoshCu -- do we know anything that might help here?

@JoshCu commented Jan 2, 2024

Yes we do! At least for the second experiment with NGIAB.
To fix it, we need to create a new NGIAB image with the latest t-route changes.
The reason the performance is so similar for both runs is that there's a step in t-route that converts the ngen output to a format that t-route can read.
It's discussed more here and was fixed by this PR, but the changes haven't made it into the Docker image yet.
The forcing array construction step is the culprit.

Before the fix:

[image: profiling output before the fix]

After the fix:

[image: profiling output after the fix]
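A rough way to check whether a given NGIAB image already carries the updated t-route packages (the image name below is a placeholder, and this assumes pip is available on the container's PATH):

$ docker run --rm <ngiab-image> pip list 2>/dev/null | grep -i rout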

@adlzanchetta (Author)

> In the meantime, how much RAM does your laptop have, and what storage hardware does it use (HDD vs. SSD especially, and SATA vs. NVMe if you know)?

My machine has 16 GB RAM and a 1 TB NVMe SSD.
Thank you for your interest, @PhilMiller. I've just sent you an email.

@JoshCu Wow, from a little more than 2 minutes to a little less than 10 seconds is a huge drop!
Are these results also for the default example provided in the root README.md of the NGIAB repo?

@JoshCu commented Jan 3, 2024

Yeah, that was just from testing with the example data and the NGIAB image. Both tests used an identical config, but I did tweak some settings to try to speed it up before making changes to t-route. I modified the image here and ngen.yaml so it was using 20 MPI processes for the model run and a cpu_pool of 56 for the routing. I also changed parallel_compute_method in ngen.yaml to by-subnetwork-jit.
From memory, the increased MPI processes did make a difference, but the extra t-route cores didn't have much effect. Such a large portion of the runtime was that I/O bottleneck, though, that the speedup might be more noticeable if I tested it again.

compute_parameters:
    #----------
    parallel_compute_method: by-subnetwork-jit # serial?
    compute_kernel         : V02-structured
    assume_short_ts        : True
    subnetwork_target_size : 100
    cpu_pool               : 56

The example ngen.yaml has a subnetwork_target_size of 10000 and there are only ~700 catchments in the example data. It's possible that all the catchments are getting assigned to one core regardless of the cpu_pool, too, but I'm not sure how the subnetworks are created in t-route.
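For the ~700-catchment example, an illustrative (untested) variant of the block above that would at least let t-route form more than one subnetwork, with cpu_pool closer to the physical core count:

compute_parameters:
    #----------
    parallel_compute_method: by-subnetwork-jit
    compute_kernel         : V02-structured
    assume_short_ts        : True
    subnetwork_target_size : 50   # illustrative: ~700 catchments would split into several subnetworks
    cpu_pool               : 8    # illustrative: roughly the number of physical cores in use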

Testing setup - my ancient Dell workstation:

- Ubuntu 22.04
- two 28-core Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
- SATA SSD, ~500 MB/s read/write
- 128 GB RAM (8 x 16 GiB RIMM DDR4 Synchronous 2133 MHz (0.5 ns))

@PhilMiller (Contributor)

I'm also adding a bit of code to ngen to make diagnosing issues like this at least a little bit easier in the future:
#696
