Support polars and other data libraries via dataframe interchange #3369
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## master #3369 +/- ##
==========================================
- Coverage 98.33% 98.32% -0.01%
==========================================
Files 77 77
Lines 24335 24381 +46
==========================================
+ Hits 23929 23973 +44
- Misses 406 408 +2
Thanks for the ping. I tried this out, but got a failure for the following:
tips = pl.from_pandas(sns.load_dataset('tips'))
g = sns.FacetGrid(tips, col="time", row="sex")
g.map(sns.scatterplot, "total_bill", "tip")
SchemaError Traceback (most recent call last)
Cell In[4], line 2
1 g = sns.FacetGrid(tips, col="time", row="sex")
----> 2 g.map(sns.scatterplot, "total_bill", "tip")
File ~/seaborn-dev/seaborn/axisgrid.py:720, in FacetGrid.map(self, func, *args, **kwargs)
717 warnings.warn(warning)
719 # Iterate over the data subsets
--> 720 for (row_i, col_j, hue_k), data_ijk in self.facet_data():
721
722 # If this subset is null, move on
723 if not data_ijk.values.size:
724 continue
File ~/seaborn-dev/seaborn/axisgrid.py:674, in FacetGrid.facet_data(self)
670 # Here is the main generator loop
671 for (i, row), (j, col), (k, hue) in product(enumerate(row_masks),
672 enumerate(col_masks),
673 enumerate(hue_masks)):
--> 674 data_ijk = data[row & col & hue & self._not_na]
675 yield (i, j, k), data_ijk
File ~/seaborn-dev/.venv/lib/python3.10/site-packages/polars/series/series.py:439, in Series.__and__(self, other)
437 if not isinstance(other, Series):
438 other = Series([other])
--> 439 return self._from_pyseries(self._s.bitand(other._s))
SchemaError: cannot unpack series of type `list[bool]` into `bool`
Right, this just addresses the objects interface; the older code will need to be handled separately. Are you seeing any errors in the |
I see, thanks! I tried
and both commands pass without errors (in my |
Alright, after a bit of additional work this should now support alternative dataframes throughout seaborn. Wouldn't surprise me if there are still a few weird edge cases here and there, but we'll need users to surface those, as I don't do any actual work with these libraries. Gonna merge over the failing codecov check, which is on the pandas <2.0.2 warning. There's not an easy way to exercise that code, and it's just a warning. Thanks @MarcoGorelli for getting the ball rolling and weighing in here.
This PR leverages the dataframe interchange protocol to let seaborn consume objects from other dataframe libraries, such as polars.
Dataframes are converted to pandas objects upon consumption, and that is what is used internally by seaborn, so seaborn's statistical operations don't take advantage of any parallelism / out of core / etc. functionality offered by these libraries. While that would be ideal, I don't see it happening any time soon.
Nevertheless, this should make it easy to prep data in a library of choice and then pipe it to seaborn without thinking too much about the representation.
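As a rough illustration of the "convert upon consumption" idea described above, the sketch below uses pandas' public `pandas.api.interchange.from_dataframe` function. The helper name `to_pandas` is hypothetical and is not taken from this PR; seaborn's actual internals may be organized differently.

```python
import pandas as pd
from pandas.api.interchange import from_dataframe


def to_pandas(data):
    """Hypothetical helper: return a pandas DataFrame for supported inputs."""
    # pandas objects pass through untouched
    if isinstance(data, pd.DataFrame):
        return data
    # Any object exposing __dataframe__ claims to support the
    # dataframe interchange protocol, so let pandas build a DataFrame from it
    if hasattr(data, "__dataframe__"):
        return from_dataframe(data)
    raise TypeError(
        f"Cannot convert object of type {type(data).__name__} to a pandas DataFrame"
    )
```

Under this assumption, a polars DataFrame would be handed to `to_pandas` once at the boundary, and everything downstream would see an ordinary pandas object.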
For testing, my approach is to use a simple mock object that is not (by inheritance) a pandas DataFrame, but that does "support" the interchange protocol. I think this is sufficient, with the assumption that pandas / other data libraries will together correctly implement the dataframe interchange itself. Testing whether that works correctly feels out of scope for seaborn's unit tests.
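A minimal sketch of such a mock object might look like the following. The class name and the toy data are illustrative assumptions rather than a copy of the test code in this PR; the mock simply delegates `__dataframe__` to a wrapped pandas DataFrame, so it "supports" the protocol without being a pandas object itself.

```python
import pandas as pd
from pandas.api.interchange import from_dataframe


class MockInterchangeableDataFrame:
    """Not a pandas.DataFrame by inheritance, but exposes the interchange protocol."""

    def __init__(self, data):
        self._data = data

    def __dataframe__(self, *args, **kwargs):
        # Delegate to pandas' own implementation of the protocol
        return self._data.__dataframe__(*args, **kwargs)


# Toy data for illustration only
original = pd.DataFrame({"total_bill": [16.99, 10.34, 21.01], "size": [2, 3, 3]})
mock = MockInterchangeableDataFrame(original)

# The mock fails an isinstance check, yet converts via the protocol
converted = from_dataframe(mock)
```

This keeps the test focused on whether the consuming code takes the interchange path, and leaves the correctness of the protocol itself to pandas and the other libraries.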
I wouldn't be surprised to learn of various edge cases as this rolls out to people using alternative dataframe libraries heavily (I only did some light testing with polars using toy datasets), so we'll address those as they happen.
Thanks to @MarcoGorelli for getting the ball rolling with #3340 and advising on the approach.
Closes #3368