
Pandas sort_values with multiple columns does not work for AffineScalarFunc #186

Open
NelDav opened this issue Jan 12, 2024 · 1 comment


NelDav commented Jan 12, 2024

When sorting a pandas DataFrame by multiple columns, if one of the columns contains values of type AffineScalarFunc, the sort fails because the type has no __hash__ method:

TypeError: unhashable type: 'AffineScalarFunc'

The Variable type implements a __hash__ method. Therefore, sorting works with this type:

>>> a = pd.DataFrame([
... [ufloat(2,0.021), 8],
... [ufloat(3,0.002), 7],
... [ufloat(1,0.001), 9]])
>>> a
                 0  1
0    2.000+/-0.021  8
1  3.0000+/-0.0020  7
2  1.0000+/-0.0010  9

>>> a.sort_values(by=[0,1])
                 0  1
2  1.0000+/-0.0010  9
0    2.000+/-0.021  8
1  3.0000+/-0.0020  7

As soon as we start calculating, the DataFrame no longer contains values of type Variable but of type AffineScalarFunc.
Because of that, sorting by multiple columns no longer works:

>>> a[0] = a[0] * ufloat(1, 0.01)
>>> a
               0  1
0  2.000+/-0.029  8
1  3.000+/-0.030  7
2  1.000+/-0.010  9

>>> a.sort_values(by=[0,1])
.
.
.
TypeError: unhashable type: 'AffineScalarFunc'

To enable this functionality, 'AffineScalarFunc' must be hashable.
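
For illustration, here is a minimal sketch of why a missing __hash__ matters only when equal values have to be grouped. `ToyValue` and `factorize_with_dict` are hypothetical stand-ins, not code from uncertainties or pandas; the sketch only assumes that the multi-column sort groups equal values through a hash-based lookup, as the error above suggests.

```python
class ToyValue:
    """Hypothetical stand-in for a computed value: defines __eq__ but no __hash__."""

    def __init__(self, nominal):
        self.nominal = nominal

    def __eq__(self, other):
        return isinstance(other, ToyValue) and self.nominal == other.nominal

    def __lt__(self, other):
        return self.nominal < other.nominal


def factorize_with_dict(values):
    """Assign one integer code per distinct value using a dict (needs hashing)."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]


values = [ToyValue(2.0), ToyValue(3.0), ToyValue(1.0)]

# Sorting alone only needs an ordering, so it works.
ordered = sorted(values)

# Grouping equal values through a dict needs __hash__, so it raises TypeError.
try:
    factorize_with_dict(values)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

In Python, a class that defines __eq__ without also defining __hash__ is unhashable by default, which is the behaviour the toy class reproduces.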

I think this would be possible by implementing something like this:

def __hash__(self):
    ids = [id(d) for d in self.derivatives.keys()]
    return hash((self._nominal_value, self._linear_part, tuple(ids)))

I think the derivative ids must be part of the hash to make the hash depend on the derivatives.
Additionally, I think that the nominal value and the linear part must also be part of the hash to ensure different hashes when a value with uncertainty is multiplied by a regular float:

k = ufloat(3, 0.0021)
u = k * 2
hash(k) != hash(u)  # this should be the case, right?
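
As a sketch of the constraint any such __hash__ has to satisfy: Python requires that objects which compare equal have equal hashes, while unequal objects are merely allowed to differ. `ToyAffine`, its `coefficients` mapping, and `scaled` are hypothetical names for illustration; the equality rule used here is an assumption and not necessarily how AffineScalarFunc compares values.

```python
class ToyAffine:
    """Hypothetical model: a nominal value plus coefficients on named variables."""

    def __init__(self, nominal, coefficients):
        self.nominal = nominal
        self.coefficients = dict(coefficients)  # variable name -> coefficient

    def scaled(self, factor):
        """Return a new value with nominal part and coefficients multiplied."""
        return ToyAffine(
            self.nominal * factor,
            {name: c * factor for name, c in self.coefficients.items()},
        )

    def _key(self):
        # frozenset makes the key independent of dict insertion order.
        return (self.nominal, frozenset(self.coefficients.items()))

    def __eq__(self, other):
        if not isinstance(other, ToyAffine):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Hash exactly the data that __eq__ compares.
        return hash(self._key())


k = ToyAffine(3.0, {"x": 0.0021})
same_as_k = ToyAffine(3.0, {"x": 0.0021})
u = k.scaled(2)

# Equal objects must hash equally.
assert k == same_as_k and hash(k) == hash(same_as_k)
# k and u are unequal, so their hashes may differ, but nothing requires it.
assert k != u
```

Deriving both __eq__ and __hash__ from the same key is one simple way to keep the two consistent.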
@wshanks
Collaborator

wshanks commented Apr 8, 2024

I thought #184 might help with this, but after playing around with the ExtensionArray API I came to the conclusion that it cannot. I opened pandas-dev/pandas#58182 about that. I feel that if the ExtensionArray subclass knows how to sort its items, the items should not need to be hashable. However, the current implementation relies on hashability to track which items are equivalent (sorting by multiple columns only makes sense if some items are equivalent; otherwise you could just sort by a single column). What is missing is an API that sorts with equivalence: like argsort, but assigning the same index to equivalent items instead of choosing an order among them, so the output is not a one-to-one reordering of range(len(array)).
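
The idea of "argsort with equivalence" can be sketched in plain Python. `codes_with_ties` and `sort_rows_by_columns` are hypothetical helpers, not part of pandas or its ExtensionArray API; the sketch assumes only that items support `<` and `==`, and it never hashes them.

```python
def codes_with_ties(values):
    """Return a rank per position; equal values share a rank (no hashing used)."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # needs only <
    codes = [0] * len(values)
    rank = -1
    previous = None
    for position, index in enumerate(order):
        # Start a new rank whenever the value differs from its predecessor.
        if position == 0 or values[index] != previous:
            rank += 1
        codes[index] = rank
        previous = values[index]
    return codes


def sort_rows_by_columns(columns):
    """Return the lexicographic row order for several equally long columns."""
    per_column_codes = [codes_with_ties(col) for col in columns]
    n_rows = len(columns[0])
    return sorted(
        range(n_rows),
        key=lambda row: tuple(codes[row] for codes in per_column_codes),
    )


first = [2.0, 1.0, 2.0, 1.0]
second = [8, 9, 7, 3]

codes = codes_with_ties(first)                      # [1, 0, 1, 0]
row_order = sort_rows_by_columns([first, second])   # [3, 1, 2, 0]
```

Because equal items in the first column receive the same code, the second column decides their relative order, which is the case where sorting by multiple columns matters.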
