Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/pandas api idxmax #25

Merged

Conversation

tortolavivo23
Copy link

@tortolavivo23 tortolavivo23 commented Jan 8, 2024

Feature idxmax

What does this change introduce?

  • Provides an implemementation for the idxmax method

Parameters:

Name Type Description Default
axis int The axis to calculate the minimum across 0 is columns, 1 is rows. 0
skipna bool Ignore any null values along the axis. True
numeric_only bool Only use columns of the table that are of a numeric data type. False

Returns:

Type Description
Dictionary A dictionary where the key represent the column name / row number and the values are the result of calling idxmax on that column / row.

It follow the pandas definition. It returns a dictionary instead of a pandas series.

General

  • Has an example been added to demo the new feature?
  • Have existing examples been updated or tested?
  • Have you added any new Environment Variables/Configuration Options? If yes please tick the boxes below as applicable
    • Addition to reimporter logic within src/pykx/pykx.q and src/pykx/reimporter.py
    • Have updated the src/pykx/util.py logic which is used for environment variable
  • If there have been any dependency updates have they been reflected in all files?
    • pyproject.toml
    • docs/getting-started/installing.md
    • conda-recipe/meta.yaml
    • README.md
  • If any examples have been updated has it's associated .zip been updated

Code

  • Has all temporary code used during development been removed?
  • Has all commented out (unused) code been removed?
  • Where reasonable have you ensured there is no duplication of existing code?
  • If applicable for your use-case have you ensured that the code is performant?

Testing

  • Have unit tests been created or existing ones updated to test this new functionality?

Documentation

  • Has documentation been added for all public code?
  • Has a release note been included for the new feature?
  • Has any documentation which would benefit from this feature been updated to use the most up to date functionality?
  • If a new class has been added has a documentation stub .md file associated with it been created?
  • If any documentation page has been created has it been added to mkdocs.yml
  • Have you checked your changes with a spell checker? (US English)

@tortolavivo23 tortolavivo23 self-assigned this Jan 8, 2024
@github-actions github-actions bot added documentation Improvements or additions to documentation python tests labels Jan 8, 2024
@tortolavivo23 tortolavivo23 linked an issue Jan 8, 2024 that may be closed by this pull request
src/pykx/pandas_api/pandas_meta.py Outdated Show resolved Hide resolved
docs/user-guide/advanced/Pandas_API.ipynb Show resolved Hide resolved
src/pykx/pandas_api/pandas_meta.py Outdated Show resolved Hide resolved
src/pykx/pandas_api/pandas_meta.py Outdated Show resolved Hide resolved
src/pykx/pandas_api/pandas_meta.py Outdated Show resolved Hide resolved
src/pykx/pandas_api/pandas_meta.py Outdated Show resolved Hide resolved
docs/user-guide/advanced/Pandas_API.ipynb Outdated Show resolved Hide resolved
@tortolavivo23 tortolavivo23 changed the base branch from main to feature/pandas-api-2nd-block-extra January 10, 2024 11:51
Copy link
Member

@neutropolis neutropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the implementation isn't correct. Take a detailed look at preparse_computations since I think you aren't handling its output correctly. I haven't reviewed further aspects of this contribution (like tests or docs).

@@ -228,6 +228,14 @@ def max(self, axis=0, skipna=True, numeric_only=False):
res
), cols)

@convert_result
def idxmax(self, axis=0, skipna=True, numeric_only=False):
tab = self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this line and use self directly. It's fine if you want to keep it, since it would be consistent with other functions, just a personal taste.

@@ -228,6 +228,14 @@ def max(self, axis=0, skipna=True, numeric_only=False):
res
), cols)

@convert_result
def idxmax(self, axis=0, skipna=True, numeric_only=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original API also accepts index or columns as possible inputs for axis. As far as I know, preparse_computations is unable to deal with symbols.

def idxmax(self, axis=0, skipna=True, numeric_only=False):
tab = self
res, cols = preparse_computations(tab, axis, skipna, numeric_only)
col_names = _get_numeric_only_subtable_with_bools(tab)[1] if numeric_only else tab.columns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why you use this expression here, isn't col_names exactly cols?

tab = self
res, cols = preparse_computations(tab, axis, skipna, numeric_only)
col_names = _get_numeric_only_subtable_with_bools(tab)[1] if numeric_only else tab.columns
max_vals = [elems.index(max(elems)) for elems in res]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is broken, elems has forgotten original indexes. Take a look at this case:

>>> t
pykx.Table(pykx.q('
a b
---
  1
1 2
2 3
'))
>>> t.pd()
     a  b
0  NaN  1
1  1.0  2
2  2.0  3
>>> t.pd().idxmax()
a    2
b    2
dtype: int64
>>> t.idxmax()
pykx.Dictionary(pykx.q('
a| 1
b| 2
'))

Co-authored-by: Jesús López-González <[email protected]>
@neutropolis neutropolis marked this pull request as ready for review February 13, 2024 08:06
@neutropolis neutropolis merged commit 4f59edd into feature/pandas-api-2nd-block-extra Feb 13, 2024
1 check passed
@neutropolis neutropolis deleted the feature/pandas-api-idxmax branch February 13, 2024 08:28
nipsn pushed a commit that referenced this pull request Mar 11, 2024
Fix broken link (remote-functions)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation python tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: Implement idxmax from Pandas API
3 participants