
Commit

Fix: Avoid _like function in Chunking

When we prepare chunked reads, we assume a single chunk for all
backends but ADIOS2. When preparing the returned data, we use
`data = np.full_like(record_component, np.nan)`. It turns out
that NumPy seems to trigger a `__getitem__` access or a full copy
of our `record_component` at this point, which causes a severe
slowdown.

This was first seen for particles, but it affects every read
where we do not slice a subset.

Co-authored-by: AlexanderSinn <[email protected]>
ax3l and AlexanderSinn committed Apr 13, 2022
1 parent 5511da8 commit 32696e7
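
Editorial note, not part of the commit: the slowdown arises because
`np.full_like` must materialize its prototype argument. For a plain
ndarray that is cheap, but a lazily loaded record component is only
array-like, so NumPy converts it (via `__array__` / `__getitem__`)
and pulls the entire dataset just to learn its shape and dtype,
whereas `np.full(shape, fill, dtype)` consumes only metadata. A
minimal sketch of the failure mode, using a hypothetical `LazyArray`
stand-in rather than a real record component:

import numpy as np

class LazyArray:
    """Hypothetical stand-in for a lazily loaded record component."""
    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = np.dtype(dtype)

    def __array__(self, dtype=None, copy=None):
        # In the real reader, this would be a full (slow) read from disk.
        print("full read triggered")
        return np.zeros(self.shape, dtype=dtype or self.dtype)

rc = LazyArray((1024, 1024), np.float64)

# full_like must materialize its prototype -> prints "full read triggered"
a = np.full_like(rc, np.nan)

# full only consumes shape/dtype metadata -> no read happens
b = np.full(rc.shape, np.nan, dtype=rc.dtype)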
Showing 1 changed file with 7 additions and 3 deletions.
@@ -9,6 +9,7 @@
 License: 3-Clause-BSD-LBNL
 """
 import numpy as np
+from openpmd_api import ChunkInfo


 def chunk_to_slice(chunk):
@@ -56,17 +57,20 @@ def get_data(series, record_component, i_slice=None, pos_slice=None,
     if i_slice is not None and not isinstance(i_slice, list):
         i_slice = [i_slice]
 
     chunks = record_component.available_chunks()
 
     # read whole data set
     if pos_slice is None:
+        print(f"record_component: {record_component}, {record_component.shape}, {record_component.dtype}")
         # mask invalid regions with NaN
-        data = np.full_like(record_component, np.nan)
+        # note: full_like triggers a full read, thus we avoid it #340
+        data = np.full(record_component.shape, np.nan, record_component.dtype)
         for chunk in chunks:
             chunk_slice = chunk_to_slice(chunk)
+            print(f"chunk: {chunk}, {chunk_slice}")
             # read only valid region
             x = record_component[chunk_slice]
             series.flush()
             data[chunk_slice] = x
     # slice: read only part of the data set
     else:
         full_shape = record_component.shape
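
For context, `chunk_to_slice` (defined earlier in the same file) maps
an openPMD-api ChunkInfo, which carries per-axis `offset` and `extent`
lists, onto a tuple of Python slices. A plausible sketch, not copied
from the file:

def chunk_to_slice(chunk):
    """Convert a ChunkInfo with, e.g., offset=[2, 0] and extent=[3, 8]
    into the slice tuple (slice(2, 5), slice(0, 8))."""
    return tuple(slice(o, o + e)
                 for o, e in zip(chunk.offset, chunk.extent))

Note also the order of operations in the loop above: indexing
`record_component[chunk_slice]` only registers a deferred load in
openPMD-api, and the returned array is filled when `series.flush()`
executes, which is why the flush sits between the indexing call and
the assignment into `data`.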
