Can a struct be aggregated with 'list' on a table.group_by? #44383

Open
robhod opened this issue Oct 11, 2024 · 0 comments
Labels
Component: Python Type: usage Issue is a user question


robhod commented Oct 11, 2024

Describe the usage question you have. Please include as many useful details as possible.

I'm trying to build a list of structs for each value of a grouping column; see the example below.
The aggregation fails with a not-implemented error, but the hash_list docs (https://arrow.apache.org/docs/cpp/compute.html#aggregations) suggest it supports any input type. Is there a problem with how I set this up, or should I raise a feature request or bug?

import pyarrow as pa

# source data
table = pa.table(
    {
        "col1": [1, 1, 2, 2, 3],
        "struct_col": [
            {"a": 1, "b": "testa"},
            {"a": 1, "b": "testb"},
            {"a": 2, "b": "testc"},
            {"a": 2, "b": "testd"},
            {"a": 3, "b": "teste"},
        ],
    }
)

# required output
grouped_table = pa.table(
    {
        "grouped": [1, 2, 3],
        "agg_struct_col": [
            [{"a": 1, "b": "testa"}, {"a": 1, "b": "testb"}],
            [{"a": 2, "b": "testc"}, {"a": 2, "b": "testd"}],
            [{"a": 3, "b": "teste"}],
        ],
    }
)

# using group_by ("list" is resolved to the "hash_list" kernel; this call raises the error shown after it)
grouped = table.group_by("col1").aggregate([("struct_col", "list")])

File "scratch/pyarrowexample.py", line 30, in <module>
grouped = table.group_by("col1").aggregate([("struct_col", "hash_list")])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/table.pxi", line 6359, in pyarrow.lib.TableGroupBy.aggregate
File "l/.venv/lib/python3.11/site-packages/pyarrow/acero.py", line 403, in _group_by
return decl.to_table(use_threads=use_threads)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/_acero.pyx", line 590, in pyarrow._acero.Declaration.to_table
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Function 'hash_list' has no kernel matching input types (struct<a: int64, b: string>, uint32)

Component(s)

Python

@robhod robhod added the Type: usage Issue is a user question label Oct 11, 2024