Currently, Arrow lacks support for `pyarrow.compute.replace_with_mask` on struct arrays: apache/arrow#29558
That's why we have our own implementation, used by `NestedExtensionArray.__setitem__()`. It has the overhead of creating a `len(self)`-sized struct array to perform the replacement. This works well when we replace many elements, but when we replace just a few, it has a large memory footprint relative to the change and probably takes a while.
An alternative approach would be to copy the original array to `np.ndarray[pa.StructScalar]`, replace the elements in place, and convert it back:
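```python
import numpy as np
import pyarrow as pa


def replace_with_mask(array: pa.ChunkedArray, mask: pa.BooleanArray, value: pa.Array) -> pa.ChunkedArray:
    """Replace the elements of the array with the value where the mask is True"""
    # Copy into np.ndarray[pa.StructScalar]; object-dtype fromiter needs NumPy >= 1.23
    np_array = np.fromiter(array, dtype=object, count=len(array))
    # Assign element by element so each scalar is stored as-is in the object array
    for pos, scalar in zip(np.flatnonzero(mask.to_numpy(zero_copy_only=False)), value):
        np_array[pos] = scalar
    new_pa_array = pa.array([s.as_py() for s in np_array], type=array.type)
    return pa.chunked_array([new_pa_array])
```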
We should create a benchmark to see which approach is faster and has the smaller memory footprint.
Benchmarks reveal a performance problem with single-element assignment: the slowdown appeared after we switched from ArrowExtensionArray to a custom implementation of NestedExtensionArray.