
Commit

Fix: Avoid _like function in Chunking

When we prepare chunked reads, we assume a single chunk for all
backends but ADIOS2. When preparing the returned data, we use
`data = np.full_like(record_component, np.nan)`. It turns out
that NumPy seems to trigger a `__getitem__` access or a full copy
of our `record_component` at this point, which causes a severe
slowdown.

This was first seen for particles, but it affects every read
where we do not slice a subset.

Co-authored-by: AlexanderSinn <[email protected]>
ax3l and AlexanderSinn committed Apr 13, 2022
1 parent 5511da8 commit 32696e7
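
Editorial note, not part of the commit: the slowdown arises because
`np.full_like` must materialize its prototype argument. For a plain
ndarray that is cheap, but a lazily loaded record component is only
array-like, so NumPy converts it (via `__array__` / `__getitem__`)
and pulls the entire dataset just to learn its shape and dtype,
whereas `np.full(shape, fill, dtype)` consumes only metadata. A
minimal sketch of the failure mode, using a hypothetical `LazyArray`
stand-in rather than a real record component:

import numpy as np

class LazyArray:
    """Hypothetical stand-in for a lazily loaded record component."""
    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = np.dtype(dtype)

    def __array__(self, dtype=None, copy=None):
        # In the real reader, this would be a full (slow) read from disk.
        print("full read triggered")
        return np.zeros(self.shape, dtype=dtype or self.dtype)

rc = LazyArray((1024, 1024), np.float64)

# full_like must materialize its prototype -> prints "full read triggered"
a = np.full_like(rc, np.nan)

# full only consumes shape/dtype metadata -> no read happens
b = np.full(rc.shape, np.nan, dtype=rc.dtype)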
Showing 1 changed file with 7 additions and 3 deletions.
@@ -9,6 +9,7 @@
 License: 3-Clause-BSD-LBNL
 """
 import numpy as np
+from openpmd_api import ChunkInfo


 def chunk_to_slice(chunk):
@@ -56,17 +57,20 @@ def get_data(series, record_component, i_slice=None, pos_slice=None,
     if i_slice is not None and not isinstance(i_slice, list):
         i_slice = [i_slice]
 
     chunks = record_component.available_chunks()
 
     # read whole data set
     if pos_slice is None:
+        print(f"record_component: {record_component}, {record_component.shape}, {record_component.dtype}")
         # mask invalid regions with NaN
-        data = np.full_like(record_component, np.nan)
+        # note: full_like triggers a full read, thus we avoid it #340
+        data = np.full(record_component.shape, np.nan, record_component.dtype)
         for chunk in chunks:
             chunk_slice = chunk_to_slice(chunk)
+            print(f"chunk: {chunk}, {chunk_slice}")
             # read only valid region
             x = record_component[chunk_slice]
             series.flush()
             data[chunk_slice] = x
     # slice: read only part of the data set
     else:
         full_shape = record_component.shape
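
For context, `chunk_to_slice` (defined earlier in the same file) maps
an openPMD-api ChunkInfo, which carries per-axis `offset` and `extent`
lists, onto a tuple of Python slices. A plausible sketch, not copied
from the file:

def chunk_to_slice(chunk):
    """Convert a ChunkInfo with, e.g., offset=[2, 0] and extent=[3, 8]
    into the slice tuple (slice(2, 5), slice(0, 8))."""
    return tuple(slice(o, o + e)
                 for o, e in zip(chunk.offset, chunk.extent))

Note also the order of operations in the loop above: indexing
`record_component[chunk_slice]` only registers a deferred load in
openPMD-api, and the returned array is filled when `series.flush()`
executes, which is why the flush sits between the indexing call and
the assignment into `data`.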
