Feature request
Today, we have these ways to aggregate the values of a single nested column:
- `nf.reduce(np.mean, "lc.mag")`: good, but not cheap, and the output has to be joined back to the frame.
- `nf.eval("lc.mag.groupby(by=lc.mag.index).mean()")`: expensive and not intuitive.
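The following minimal pandas sketch shows how a flat aggregation differs from a per-row aggregation. It does not use nested-pandas. As a hypothetical stand-in for what `lc.mag` looks like inside `nf.eval`, it uses a flat Series whose index repeats the label of the top-level row that owns each value.

```python
import pandas as pd

# Hypothetical flat view of a nested "lc.mag" column: every value carries the
# index label of the top-level row it belongs to (rows 0 and 1 here).
flat_mag = pd.Series([10.0, 12.0, 14.0, 20.0, 22.0], index=[0, 0, 0, 1, 1], name="mag")

# A plain .mean() aggregates over all flat values at once -> a single scalar.
overall = flat_mag.mean()

# Grouping by the index label gives one mean per top-level row, mirroring
# nf.eval("lc.mag.groupby(by=lc.mag.index).mean()").
per_row = flat_mag.groupby(level=0).mean()

print(overall)           # one number for the whole column
print(per_row.tolist())  # one number per top-level row
```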
It would be nice to have an easier way to do such aggregations. Options I see:
1. Currently, we can do `nf.eval("lc.mag.mean()")` / `nf["lc.mag"].mean()`, but it aggregates over all the flat values, which is not intuitive, especially in the first case. We could redefine it to aggregate per row.
2. Add a special interface for nested aggregations via the `.nest` accessor, e.g. `nf.lc.nest.mean()` would return `nf.shape[0]` mean values.
3. Add special methods that work only in the eval/query environment, e.g. `nf.eval("lc.mag.nest_mean()")`.
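Option 2 could plausibly build on the same groupby, plus one extra step so that the output always has `nf.shape[0]` rows. The helper below, `nested_mean`, is hypothetical and is not an existing nested-pandas API. It takes two inputs: the nested fields as flat columns, with each row's index repeating the label of its top-level row, and the parent frame's index. The reindex step is needed because `groupby` silently drops top-level rows whose nested list is empty.

```python
import pandas as pd


def nested_mean(flat: pd.DataFrame, parent_index: pd.Index) -> pd.DataFrame:
    """Hypothetical helper: per-row means of every nested field.

    `flat` holds the nested fields (e.g. lc.t, lc.mag) as flat columns whose
    index repeats the label of the owning top-level row.
    """
    # One output row per top-level row that has at least one nested element.
    per_row = flat.groupby(level=0, sort=False).mean()
    # Restore rows with empty nested lists as NaN, in the parent's order,
    # so the result always has len(parent_index) rows.
    return per_row.reindex(parent_index)


parent_index = pd.Index([0, 1, 2])  # row 1 has an empty light curve
flat_lc = pd.DataFrame(
    {"t": [1.0, 2.0, 3.0, 4.0, 5.0], "mag": [10.0, 12.0, 14.0, 20.0, 22.0]},
    index=[0, 0, 0, 2, 2],
)
means = nested_mean(flat_lc, parent_index)
print(means)
```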
However, I'm not sure how we would make all of these performant: pyarrow seems to provide almost no tooling for this. Maybe we could use things like `numpy.ufunc.reduceat` and `scipy.ndimage.mean`.
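One possible way to avoid `groupby` altogether is to work directly on the flat values and the list offsets with `np.add.reduceat`. The sketch below assumes the nested column can be exposed as a flat NumPy array plus Arrow-style offsets, where row `i` owns `values[offsets[i]:offsets[i+1]]` as in a pyarrow `ListArray`. The name `segment_mean` is hypothetical. If each value carries a row label instead of offsets, `scipy.ndimage.mean(values, labels, index)` would be an alternative.

```python
import numpy as np


def segment_mean(values: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """Mean of each segment values[offsets[i]:offsets[i+1]]; NaN if empty."""
    values = np.asarray(values, dtype=np.float64)
    offsets = np.asarray(offsets, dtype=np.int64)
    counts = np.diff(offsets)
    result = np.full(counts.shape, np.nan)
    nonempty = counts > 0
    if nonempty.any():
        # Only non-empty segments are passed to reduceat: for equal consecutive
        # indices it would return values[start] instead of an empty sum.
        # Skipped empty rows own no elements, so each segment stays correct.
        starts = offsets[:-1][nonempty]
        sums = np.add.reduceat(values[: offsets[-1]], starts)
        result[nonempty] = sums / counts[nonempty]
    return result


# Three rows: [1, 2, 3], [] and [10, 20].
print(segment_mean(np.array([1.0, 2.0, 3.0, 10.0, 20.0]), np.array([0, 3, 3, 5])))
```

This approach does a single vectorized pass over the flat values and allocates no per-row Python objects. Its cost is that empty rows need explicit handling.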
Before submitting
Please check the following:
- I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
- I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
- If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.