
Fix: Avoid _like function in Chunking #340

Merged
merged 1 commit into openPMD:dev from fix-chunkingOnlyForADIOS2 on Apr 14, 2022

Conversation

@ax3l (Member) commented Apr 12, 2022

When we prepare chunked reads, we assume a single chunk for all backends but ADIOS2.
Preparing the returned data, we use `data = np.full_like(record_component, np.nan)`. It turns out that numpy seems to trigger a `__getitem__` access or full copy of our `record_component` at this point, which causes severe slowdown.

Refs.:

This was first seen for particles, but affects every read where we do not slice a subset.
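
For illustration, a minimal standalone demonstration of why `full_like` is costly here, using a hypothetical LazyComponent class (not openPMD-api code): `np.full_like` materializes its prototype through numpy's array protocol, while `np.full` needs only a shape and a dtype.

import numpy as np

class LazyComponent:
    """Stand-in for a lazy, on-disk record component."""
    shape = (4,)
    dtype = np.dtype(np.float64)

    def __getitem__(self, key):
        print("expensive full read triggered!")
        return np.zeros(self.shape, dtype=self.dtype)

    def __array__(self, dtype=None):
        # numpy materializes array-likes through this protocol,
        # which here means a full __getitem__ read
        return self[()]

rc = LazyComponent()
np.full_like(rc, np.nan)                   # prints: expensive full read triggered!
np.full(rc.shape, np.nan, dtype=rc.dtype)  # no read happens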

Thanks a lot to @AlexanderSinn for debugging this with me ✨

This fixes the performance regression that @AlexanderSinn, @MaxThevenet and @SeverinDiederichs saw in Hi-PACE/hipace#725

Regression introduced in #332, #334.

@AlexanderSinn (Member)

Doesn't seem to work yet:

Traceback (most recent call last):
  File "/home/asinn/hipace//tests/checksum/checksumAPI.py", line 158, in <module>
    evaluate_checksum(args.test_name, args.file_name, rtol=args.rtol,
  File "/home/asinn/hipace//tests/checksum/checksumAPI.py", line 60, in evaluate_checksum
    test_checksum = Checksum(test_name, file_name, do_fields=do_fields,
  File "/home/asinn/hipace/tests/checksum/checksum.py", line 41, in __init__
    self.data = self.read_output_file(do_fields=do_fields,
  File "/home/asinn/hipace/tests/checksum/checksum.py", line 67, in read_output_file
    data_lev[field] = self.trim_digits(ds.get_field_checksum(lev, field, self.test_name))
  File "/home/asinn/hipace/tests/checksum/backend/openpmd_backend.py", line 49, in get_field_checksum
    Q = self.dataset.get_field(field=field, iteration=self.dataset.iterations[-1])[0]
  File "/home/asinn/openPMD-Viewer/openPMD-viewer/openpmd_viewer/openpmd_timeseries/main.py", line 503, in get_field
    F, info = self.data_reader.read_field_cartesian(
  File "/home/asinn/openPMD-Viewer/openPMD-viewer/openpmd_viewer/openpmd_timeseries/data_reader/data_reader.py", line 190, in read_field_cartesian
    return io_reader.read_field_cartesian(
  File "/home/asinn/openPMD-Viewer/openPMD-viewer/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/field_reader.py", line 113, in read_field_cartesian
    F = get_data( series, component )
  File "/home/asinn/openPMD-Viewer/openPMD-viewer/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/utilities.py", line 66, in get_data
    chunk = ChunkInfo()
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. openpmd_api.openpmd_api_cxx.ChunkInfo(offset: List[int], extent: List[int])
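
The error message spells out the one supported signature. For illustration, a valid construction would look like the following (offset/extent values are hypothetical; this assumes ChunkInfo is re-exported at the package top level, as the traceback's openpmd_api.openpmd_api_cxx.ChunkInfo suggests):

import openpmd_api as io

# ChunkInfo(offset, extent): a single chunk starting at the origin
# and spanning a hypothetical 64^3 dataset
chunk = io.ChunkInfo([0, 0, 0], [64, 64, 64])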

@ax3l (Member Author) commented Apr 13, 2022

@AlexanderSinn thanks, I fixed the constructor now. Can you please try again?

Could you also post a reproducer with the exact HiPACE compilation flags & input set that shows the problem?

@AlexanderSinn (Member) commented Apr 13, 2022

It now works again, but it's still very slow compared to h5py (in fact 10 times slower than the development branch: now 20 minutes instead of 1-2 minutes for the ~200 MB h5 file, while h5py takes 3 seconds).

To reproduce, go to hipace/tests/checksum/backend/openpmd_backend.py and insert something like

import sys
sys.path.insert(1, '/home/asinn/openPMD-viewer/')
from openpmd_viewer import OpenPMDTimeSeries

And change

self.dataset = OpenPMDTimeSeries(filename, backend='h5py')

to

self.dataset = OpenPMDTimeSeries(filename, backend='openpmd-api')

Then compile HiPACE for CPU (cmake .. -DHiPACE_OPENPMD=ON) and run the open-boundary benchmark:

bash ~/hipace/tests/beam_in_vacuum_open_boundary.normalized.1Rank.sh ~/hipace/build/bin/hipace ~/hipace/

@AlexanderSinn (Member)

This is the file https://syncandshare.desy.de/index.php/s/dX77jPNAofgwMwJ

# mask invalid regions with NaN
data = np.full_like(record_component, np.nan)
for chunk in chunks:
    chunk_slice = chunk_to_slice(chunk)
    print(f"chunk: {chunk}, {chunk_slice}")
    # read only valid region
    x = record_component[chunk_slice]
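
For context, a hypothetical reimplementation of the `chunk_to_slice` helper used above (the real helper lives in openpmd-viewer's io_reader utilities; this is only a sketch of its contract):

def chunk_to_slice(chunk):
    """Convert a ChunkInfo with .offset and .extent into a tuple of slices."""
    return tuple(slice(o, o + e) for o, e in zip(chunk.offset, chunk.extent))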
@ax3l (Member Author):

I wonder if this is related to openPMD/openPMD-api#1225.
Prior to #332, we did not make an extra copy.

@ax3l (Member Author):

Actually, @AlexanderSinn found that `x = record_component[()]` on its own is fast, but it becomes slow the moment the `data = np.full_like(record_component, np.nan)` line above is present.

We could try to turn that line into an `np.full` call with an explicit shape and dtype.
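
A sketch of that suggestion, assuming (as in openPMD-api's Python bindings) that the record component exposes numpy-style shape and dtype attributes:

import numpy as np

# np.full_like must inspect its prototype array, which materializes the
# lazy record_component; np.full needs only shape and dtype, so no read:
data = np.full(record_component.shape, np.nan, dtype=record_component.dtype)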

@ax3l ax3l changed the title from "[WIP] Fix: Chunking only for ADIOS2" to "Fix: Avoid _like function in Chunking" on Apr 13, 2022
@ax3l ax3l force-pushed the fix-chunkingOnlyForADIOS2 branch from 13665fc to 32696e7 on April 13, 2022 at 23:45
@ax3l ax3l requested a review from RemiLehe on April 13, 2022 at 23:46
@ax3l ax3l added the bug label on Apr 13, 2022
@ax3l ax3l force-pushed the fix-chunkingOnlyForADIOS2 branch from 32696e7 to 0a31e5a on April 13, 2022 at 23:52
When we prepare chunked reads, we assume a single chunk for all
backends but ADIOS2. Preparing the returned data, we use
`data = np.full_like(record_component, np.nan)`. It turns out
that numpy seems to trigger a `__getitem__` access or full copy
of our `record_component` at this point, which causes severe
slowdown.

This was first seen for particles, but affects every read where
we do not slice a subset.

Co-authored-by: AlexanderSinn <[email protected]>
@ax3l ax3l force-pushed the fix-chunkingOnlyForADIOS2 branch from 0a31e5a to fd41324 on April 13, 2022 at 23:58
@AlexanderSinn (Member) left a comment:

Now it works and is just as fast as h5py!

@ax3l (Member Author) commented Apr 14, 2022

Thank you so much for your help, Alex. Much appreciated 💖

@RemiLehe (Member)

@AlexanderSinn @ax3l Thank you so much for figuring this out, and for fixing it! ✨

@RemiLehe RemiLehe merged commit 80eeb5e into openPMD:dev Apr 14, 2022
@ax3l ax3l deleted the fix-chunkingOnlyForADIOS2 branch April 14, 2022 18:11