Checklist

- I am using the current `master` branch or the latest release. Please indicate.
- I am running on an up-to-date `pypsa-eur` environment. Update via `conda env update -f envs/environment.yaml`.
Describe the Bug
PR #1013 introduced a new feature: shapes are now stored inside network files (in addition to the .geojson files kept in `resources/`). This is convenient for plotting; however, for large networks, reading a network now causes a massive memory spike compared to previous versions without `n.shapes`.
For example, take a workflow for the 50-node electricity-only network with an up-to-date pypsa-eur, and pick the `build_powerplants` rule from `rules/build_electricity.smk` (lines 31 to 50 at 885a881) with its default 7 GB memory allocation.
The script `build_powerplants.py` requires ~10.6 GB of memory for the 50-node network, whereas profiling the same script without the line that reads the base network shows that everything else needs only ~2.2 GB. The legacy 7 GB memory setting is therefore no longer sufficient, and the workflow breaks with default settings.
What's causing the memory spike?
Profiling the following test script shows that most of the ~10 GB memory spike occurs in `PyPSA/pypsa/io.py`, in the xarray call `self.ds = xr.open_dataset(path)`:
```python
import pypsa

n = pypsa.Network("resources/test-50/networks/base.nc")
```
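For reference, a minimal sketch of how the peak can be measured, assuming the `memory_profiler` package is installed; the file path is the same test network as above:

```python
# Minimal measurement sketch, assuming memory_profiler is installed
# (pip install memory-profiler).
from memory_profiler import memory_usage

import pypsa


def load():
    pypsa.Network("resources/test-50/networks/base.nc")


# Sample the process memory every 0.1 s while load() runs and report the peak.
peak = max(memory_usage(load, interval=0.1))
print(f"peak memory while loading: {peak:.0f} MiB")
```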
Now, if we drop `n.shapes`, write the network back to netCDF, and read it again, the same line requires roughly 80x less memory (~120 MB):
```python
n.mremove("Shape", n.shapes.index)
n.export_to_netcdf("resources/test-50/networks/base_noshapes.nc")
n = pypsa.Network("resources/test-50/networks/base_noshapes.nc")
```
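Note that dropping `n.shapes` from the network file loses no information: the same geometries remain available as .geojson files under `resources/`, so a plotting script can read them back directly. A sketch, assuming geopandas is available (the exact file name below is hypothetical):

```python
# Sketch: read the geometries back from the standalone .geojson files in
# resources/ instead of carrying them inside every network file.
# The file name (country_shapes.geojson) is an assumption.
import geopandas as gpd

shapes = gpd.read_file("resources/test-50/country_shapes.geojson")
shapes.plot()
```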
What can be done?
- increase the memory requirements within PyPSA-Eur and PyPSA-x (not ideal given the size of the spikes)
- make `n.shapes` optional in the config (a trade-off between convenience and sanity)
- find a workaround for `xr.open_dataset(..)` itself, e.g. skipping the shape variables on read (see the sketch below)
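On the last point, one possible direction is xarray's `drop_variables` argument, which excludes the listed variables when the dataset is opened. A sketch, assuming PyPSA stores the Shape component in variables prefixed with `shapes_` (an assumption about the serialization, not confirmed):

```python
# Possible workaround sketch, not PyPSA API: list the variables lazily,
# then reopen the file without the shape-related ones.
# The "shapes_" prefix is an assumption about how PyPSA names the
# Shape component's variables in the netCDF file.
import xarray as xr

path = "resources/test-50/networks/base.nc"

with xr.open_dataset(path) as ds:
    shape_vars = [v for v in ds.variables if str(v).startswith("shapes_")]

ds = xr.open_dataset(path, drop_variables=shape_vars)
```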