[ENH] improved polars support - native logic #342

Open
fkiraly opened this issue May 18, 2024 · 1 comment
Labels
feature request (New feature or request); module:datatypes (datatypes module: data containers, checkers & converters); module:regression (probabilistic regression module)

Comments

@fkiraly
Collaborator

fkiraly commented May 18, 2024

Issue about next steps in extending polars support in skpro.

Background: skpro already supports both eager and lazy polars containers, as polars_eager_table and polars_lazy_table respectively.

However, there are key limitations:

  • support is currently exclusively via conversion back/forth
  • the round trip to numpy or pandas breaks the lazy chain

So, suggested next steps, using only the eager type:

  1. Try polars input/output with a few estimators, fit/predict only. Add tests in a dedicated polars test file, non-systematic to start with.
  2. sklearn now supports polars, so we should try to pass polars frames through to some of the estimators, extending X_inner_mtype to both pandas and polars (as a list of str that includes polars_eager_table). Example: GaussianProcess.
  3. If that works well, we should extend the other regressors wrapping sklearn in the same way.
  4. Next, I would work on Pipeline, a composite. Pipeline is native to skpro.

Once the above seems to work, I would look at the lazy type.
Here, we should use the boilerplate layer to make lazy state changes.
Design will follow when we are there, ideas appreciated.

FYI @julian-fong.

@fkiraly fkiraly added the module:regression, module:datatypes, and feature request labels May 18, 2024
@fkiraly fkiraly changed the title [ENH] improved polars support - lazy and native logic [ENH] improved polars support - native logic May 18, 2024
@julian-fong
Contributor

julian-fong commented May 19, 2024

Linking 6381 as the discussion and coordination thread.

@fkiraly fkiraly moved this to In Progress in 2024 May-Sep workstreams May 24, 2024
fkiraly pushed a commit that referenced this issue Jun 22, 2024
#### Reference Issues/PRs

#342 is the linked issue that covers the entire workflow for polars
support. I am opening this thread mainly for the test file I am planning
to implement for polars support in skpro.

#### What does this implement/fix? Explain your changes.

Implements a test file for testing the fit/predict methods of various
estimators using polars DataFrames. Ideas for further tests of polars
support in skpro are much appreciated!
Projects
Status: In Progress
Development

No branches or pull requests

2 participants