
Fix: Avoid _like function in Chunking #340

Merged
merged 1 commit into openPMD:dev from fix-chunkingOnlyForADIOS2 on Apr 14, 2022

Conversation

@ax3l (Member) commented Apr 12, 2022

When we prepare chunked reads, we assume a single chunk for all backends but ADIOS2.
Preparing the returned data, we use `data = np.full_like(record_component, np.nan)`. It turns out that numpy seems to trigger a `__getitem__` access or full copy of our `record_component` at this point, which causes severe slowdown.

Refs.:

This was first seen for particles, but affects every read where we do not slice a subset.
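
For illustration, a minimal standalone demonstration of why `full_like` is costly here, using a hypothetical LazyComponent class (not openPMD-api code): `np.full_like` materializes its prototype through numpy's array protocol, while `np.full` needs only a shape and a dtype.

import numpy as np

class LazyComponent:
    """Stand-in for a lazy, on-disk record component."""
    shape = (4,)
    dtype = np.dtype(np.float64)

    def __getitem__(self, key):
        print("expensive full read triggered!")
        return np.zeros(self.shape, dtype=self.dtype)

    def __array__(self, dtype=None):
        # numpy materializes array-likes through this protocol,
        # which here means a full __getitem__ read
        return self[()]

rc = LazyComponent()
np.full_like(rc, np.nan)                   # prints: expensive full read triggered!
np.full(rc.shape, np.nan, dtype=rc.dtype)  # no read happens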

Thanks a lot to @AlexanderSinn for debugging this with me ✨

This fixes the performance regression that @AlexanderSinn, @MaxThevenet and @SeverinDiederichs saw in Hi-PACE/hipace#725

Regression introduced in #332, #334.

@AlexanderSinn (Member)

Doesn't seem to work yet:

Traceback (most recent call last):
  File "/home/asinn/hipace//tests/checksum/checksumAPI.py", line 158, in <module>
    evaluate_checksum(args.test_name, args.file_name, rtol=args.rtol,
  File "/home/asinn/hipace//tests/checksum/checksumAPI.py", line 60, in evaluate_checksum
    test_checksum = Checksum(test_name, file_name, do_fields=do_fields,
  File "/home/asinn/hipace/tests/checksum/checksum.py", line 41, in __init__
    self.data = self.read_output_file(do_fields=do_fields,
  File "/home/asinn/hipace/tests/checksum/checksum.py", line 67, in read_output_file
    data_lev[field] = self.trim_digits(ds.get_field_checksum(lev, field, self.test_name))
  File "/home/asinn/hipace/tests/checksum/backend/openpmd_backend.py", line 49, in get_field_checksum
    Q = self.dataset.get_field(field=field, iteration=self.dataset.iterations[-1])[0]
  File "/home/asinn/openPMD-Viewer/openPMD-viewer/openpmd_viewer/openpmd_timeseries/main.py", line 503, in get_field
    F, info = self.data_reader.read_field_cartesian(
  File "/home/asinn/openPMD-Viewer/openPMD-viewer/openpmd_viewer/openpmd_timeseries/data_reader/data_reader.py", line 190, in read_field_cartesian
    return io_reader.read_field_cartesian(
  File "/home/asinn/openPMD-Viewer/openPMD-viewer/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/field_reader.py", line 113, in read_field_cartesian
    F = get_data( series, component )
  File "/home/asinn/openPMD-Viewer/openPMD-viewer/openpmd_viewer/openpmd_timeseries/data_reader/io_reader/utilities.py", line 66, in get_data
    chunk = ChunkInfo()
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. openpmd_api.openpmd_api_cxx.ChunkInfo(offset: List[int], extent: List[int])
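
The error message spells out the one supported signature. For illustration, a valid construction would look like the following (offset/extent values are hypothetical; this assumes ChunkInfo is re-exported at the package top level, as the traceback's openpmd_api.openpmd_api_cxx.ChunkInfo suggests):

import openpmd_api as io

# ChunkInfo(offset, extent): a single chunk starting at the origin
# and spanning a hypothetical 64^3 dataset
chunk = io.ChunkInfo([0, 0, 0], [64, 64, 64])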

@ax3l (Member Author) commented Apr 13, 2022

@AlexanderSinn thanks, I fixed the constructor now. Can you please try again?

Could you also post a reproducer with the exact HiPACE compilation flags & input set that shows the problem?

@AlexanderSinn (Member) commented Apr 13, 2022

It now works again, but it's still very slow compared to h5py (in fact 10 times slower than the development branch: now 20 minutes instead of 1-2 minutes for the ~200 MB h5 file, while h5py takes 3 seconds).

To reproduce, go to hipace/tests/checksum/backend/openpmd_backend.py and insert something like

import sys
sys.path.insert(1, '/home/asinn/openPMD-viewer/')
from openpmd_viewer import OpenPMDTimeSeries

And change

self.dataset = OpenPMDTimeSeries(filename, backend='h5py')

to

self.dataset = OpenPMDTimeSeries(filename, backend='openpmd-api')

Then compile HiPACE for CPU (cmake .. -DHiPACE_OPENPMD=ON) and run the open-boundary benchmark:

bash ~/hipace/tests/beam_in_vacuum_open_boundary.normalized.1Rank.sh ~/hipace/build/bin/hipace ~/hipace/

@AlexanderSinn (Member)

This is the file https://syncandshare.desy.de/index.php/s/dX77jPNAofgwMwJ

# mask invalid regions with NaN
data = np.full_like(record_component, np.nan)
for chunk in chunks:
    chunk_slice = chunk_to_slice(chunk)
    print(f"chunk: {chunk}, {chunk_slice}")
    # read only valid region
    x = record_component[chunk_slice]
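
For context, a hypothetical reimplementation of the `chunk_to_slice` helper used above (the real helper lives in openpmd-viewer's io_reader utilities; this is only a sketch of its contract):

def chunk_to_slice(chunk):
    """Convert a ChunkInfo with .offset and .extent into a tuple of slices."""
    return tuple(slice(o, o + e) for o, e in zip(chunk.offset, chunk.extent))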
@ax3l (Member Author):

I wonder if this is related to openPMD/openPMD-api#1225.
Prior to #332, we did not make an extra copy.

@ax3l (Member Author):

Actually, @AlexanderSinn found that `x = record_component[()]` on its own is fast, but it becomes slow the moment the `data = np.full_like(record_component, np.nan)` line above is present.

We could try to turn that line into an `np.full` call with an explicit shape and dtype.
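
A sketch of that suggestion, assuming (as in openPMD-api's Python bindings) that the record component exposes numpy-style shape and dtype attributes:

import numpy as np

# np.full_like must inspect its prototype array, which materializes the
# lazy record_component; np.full needs only shape and dtype, so no read:
data = np.full(record_component.shape, np.nan, dtype=record_component.dtype)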

@ax3l ax3l changed the title from "[WIP] Fix: Chunking only for ADIOS2" to "Fix: Avoid _like function in Chunking" on Apr 13, 2022
@ax3l ax3l force-pushed the fix-chunkingOnlyForADIOS2 branch from 13665fc to 32696e7 on April 13, 2022 at 23:45
@ax3l ax3l requested a review from RemiLehe on April 13, 2022 at 23:46
@ax3l ax3l added the bug label on Apr 13, 2022
@ax3l ax3l force-pushed the fix-chunkingOnlyForADIOS2 branch from 32696e7 to 0a31e5a on April 13, 2022 at 23:52
When we prepare chunked reads, we assume a single chunk for all
backends but ADIOS2. Preparing the returned data, we use
`data = np.full_like(record_component, np.nan)`. It turns out
that numpy seems to trigger a `__getitem__` access or full copy
of our `record_component` at this point, which causes severe
slowdown.

This was first seen for particles, but affects every read where
we do not slice a subset.

Co-authored-by: AlexanderSinn <[email protected]>
@ax3l ax3l force-pushed the fix-chunkingOnlyForADIOS2 branch from 0a31e5a to fd41324 on April 13, 2022 at 23:58
@AlexanderSinn (Member) left a comment:

Now it works and is just as fast as h5py!

@ax3l (Member Author) commented Apr 14, 2022

Thank you so much for your help, Alex. Much appreciated 💖

@RemiLehe (Member)

@AlexanderSinn @ax3l Thank you so much for figuring this out, and for fixing it! ✨

@RemiLehe RemiLehe merged commit 80eeb5e into openPMD:dev Apr 14, 2022
@ax3l ax3l deleted the fix-chunkingOnlyForADIOS2 branch April 14, 2022 18:11