Remove PartitionedDataset and IncrementalDataset from kedro.io #3187

Merged
merged 16 commits into from
Oct 24, 2023

Conversation

SajidAlamQB
Contributor

@SajidAlamQB SajidAlamQB commented Oct 17, 2023

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in the kedro-plugins repository. Any changes to the dataset implementations should be made by opening a pull request in that repository.

Description

We migrated PartitionedDataset and IncrementalDataset to kedro-datasets in kedro-org/kedro-plugins#253, so we can remove them from the framework.
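
For downstream code, the practical effect of this move is a changed import path. The following is a minimal sketch, assuming the classes are exposed from `kedro_datasets.partitions` in kedro-datasets 1.8.0 or later and from `kedro.io` in older Kedro releases. The helper name `load_dataset_class` is hypothetical, and it returns `None` when neither location is importable:

```python
import importlib

# Candidate locations, newest first. Both module paths are assumptions about
# where the class lives in a given installation.
CANDIDATE_MODULES = ("kedro_datasets.partitions", "kedro.io")


def load_dataset_class(class_name, modules=CANDIDATE_MODULES):
    """Return the first importable class called class_name, or None."""
    for module_name in modules:
        try:
            module = importlib.import_module(module_name)
            # Attribute access can itself trigger a lazy import, so it stays
            # inside the try block.
            cls = getattr(module, class_name)
        except (ImportError, AttributeError):
            continue  # package missing, broken, or class not exposed there
        return cls
    return None


partitioned_cls = load_dataset_class("PartitionedDataset")
print(partitioned_cls)
```

Trying the new location first means the code keeps working after the classes disappear from `kedro.io`.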

Development notes

Updated test_data_catalog.py to test the confirm method using the moved IncrementalDataset.
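
The kind of check described here can be sketched without Kedro installed. The classes below are hypothetical stand-ins, not Kedro's `DataCatalog` or `IncrementalDataset`. They only illustrate the assumed contract that confirming a dataset through the catalog delegates to the dataset's own `confirm` method and fails for a dataset that lacks one:

```python
from unittest import mock


class StandInCatalog:
    """Hypothetical, minimal stand-in for a data catalog."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)

    def confirm(self, name):
        dataset = self._datasets[name]
        if not hasattr(dataset, "confirm"):
            raise TypeError(f"Dataset '{name}' does not have a 'confirm' method")
        dataset.confirm()


def test_confirm_delegates_to_dataset():
    # spec=["confirm"] makes the mock expose only a confirm attribute.
    incremental = mock.Mock(spec=["confirm"])
    catalog = StandInCatalog({"mocked": incremental})
    catalog.confirm("mocked")
    incremental.confirm.assert_called_once_with()


def test_confirm_rejects_dataset_without_confirm():
    catalog = StandInCatalog({"plain": object()})
    try:
        catalog.confirm("plain")
    except TypeError as exc:
        return str(exc)
    raise AssertionError("expected TypeError")


test_confirm_delegates_to_dataset()
print(test_confirm_rejects_dataset_without_confirm())
```

In the real test suite the stand-ins would be replaced by the catalog and the dataset imported from its new package.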

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

@SajidAlamQB SajidAlamQB self-assigned this Oct 17, 2023
Signed-off-by: SajidAlamQB <[email protected]>
Signed-off-by: SajidAlamQB <[email protected]>
@SajidAlamQB
Contributor Author

This needs a kedro-datasets release.

@SajidAlamQB SajidAlamQB marked this pull request as ready for review October 17, 2023 15:42
@noklam noklam linked an issue Oct 17, 2023 that may be closed by this pull request
@noklam
Contributor

noklam commented Oct 17, 2023

Is this going into kedro-org/kedro-plugins#388 or 2.0?

@SajidAlamQB
Contributor Author

> Is this going into kedro-org/kedro-plugins#388 or 2.0?

Isn't it going into 1.8.0? kedro-org/kedro-plugins#253

@merelcht
Member

> > Is this going into kedro-org/kedro-plugins#388 or 2.0?
>
> Isn't it going into 1.8.0? kedro-org/kedro-plugins#253

Yes it's going into 1.8.0!

Contributor

@stichbury stichbury left a comment

All good from my perspective. I adjusted a couple of words for Vale feedback but we can ignore most of the whining because there's no easier/better way to explain it.

Member

@merelcht merelcht left a comment

Don't forget to add this to the release notes! Kedro datasets 1.8.0 is released, so this should be good to go.

@astrojuanlu
Member

Ugh:

sphinx.errors.SphinxWarning: [autosummary] failed to import kedro_datasets.polars.LazyPolarsDataset.
Possible hints:
* ImportError: cannot import name 'AbstractVersionedDataSet' from 'kedro.io.core' (/home/docs/checkouts/readthedocs.org/user_builds/kedro/checkouts/3187/kedro/io/core.py)
* ModuleNotFoundError: No module named 'kedro_datasets.polars.LazyPolarsDataset'
* ImportError: 

Warning, treated as error:
[autosummary] failed to import kedro_datasets.polars.LazyPolarsDataset.
Possible hints:
* ImportError: cannot import name 'AbstractVersionedDataSet' from 'kedro.io.core' (/home/docs/checkouts/readthedocs.org/user_builds/kedro/checkouts/3187/kedro/io/core.py)
* ModuleNotFoundError: No module named 'kedro_datasets.polars.LazyPolarsDataset'
* ImportError:

This is because of kedro-datasets 1.8.0 of course. But why didn't this get caught by the RTD build over there...?

@astrojuanlu
Member

The build over there passed https://readthedocs.org/projects/kedro-datasets/builds/22330550/

@astrojuanlu
Member

Okay this is why:

The RTD project for kedro-datasets used Kedro 0.18.14. However, this PR is trying to build with the develop version of Kedro (obviously), and that causes this error:

In [1]: from kedro_datasets.polars import LazyPolarsDataset
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[1], line 1
----> 1 from kedro_datasets.polars import LazyPolarsDataset

File ~/.micromamba/envs/kedro38-dev/lib/python3.8/site-packages/lazy_loader/__init__.py:77, in attach.<locals>.__getattr__(name)
     75 elif name in attr_to_modules:
     76     submod_path = f"{package_name}.{attr_to_modules[name]}"
---> 77     submod = importlib.import_module(submod_path)
     78     attr = getattr(submod, name)
     80     # If the attribute lives in a file (module) with the same
     81     # name as the attribute, ensure that the attribute and *not*
     82     # the module is accessible on the package.

File ~/.micromamba/envs/kedro38-dev/lib/python3.8/importlib/__init__.py:127, in import_module(name, package)
    125             break
    126         level += 1
--> 127 return _bootstrap._gcd_import(name[level:], package, level)

File ~/.micromamba/envs/kedro38-dev/lib/python3.8/site-packages/kedro_datasets/polars/lazy_polars_dataset.py:14
     12 import polars as pl
     13 import pyarrow.dataset as ds
---> 14 from kedro.io.core import (
     15     AbstractVersionedDataSet,
     16     DatasetError,
     17     Version,
     18     get_filepath_str,
     19     get_protocol_and_path,
     20 )
     22 ACCEPTED_FILE_FORMATS = ["csv", "parquet"]
     24 PolarsFrame = Union[pl.LazyFrame, pl.DataFrame]

ImportError: cannot import name 'AbstractVersionedDataSet' from 'kedro.io.core' (/Users/juan_cano/Projects/QuantumBlack Labs/kedro/kedro/io/core.py)
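
A mismatch like the one above can be spotted in a given environment by checking which versions are installed and which class names `kedro.io.core` exposes. This is a minimal sketch using only the standard library. The helper names are hypothetical, and both helpers return a falsy value rather than raising when a package is absent:

```python
import importlib
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def module_exposes(module_name, attribute):
    """Return True if module_name imports and has the given attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


for dist in ("kedro", "kedro-datasets"):
    print(dist, installed_version(dist))

for name in ("AbstractVersionedDataSet", "AbstractVersionedDataset"):
    print(name, module_exposes("kedro.io.core", name))
```

Running this in both the docs build environment and a development checkout would show whether the two builds see the same set of names.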

It turns out the new datasets are using the old, deprecated names:

https://github.com/kedro-org/kedro-plugins/blob/87b446c92848f03521d79d03bc37a1caa83cea57/kedro-datasets/kedro_datasets/polars/lazy_polars_dataset.py#L14-L15

This was an oversight while reviewing kedro-org/kedro-plugins#350 ✋🏽
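
One way a dataset module could tolerate both spellings during such a transition is to try the new name first and fall back to the old one. This is a hedged illustration, not the fix applied in kedro-datasets. `first_available` is a hypothetical helper, and the demo call uses the standard-library `json` module so that it runs without Kedro installed:

```python
import importlib


def first_available(module_name, candidate_names):
    """Return the first attribute of module_name found among candidate_names."""
    module = importlib.import_module(module_name)
    for name in candidate_names:
        try:
            return getattr(module, name)
        except AttributeError:
            continue
    raise ImportError(f"none of {list(candidate_names)} found in {module_name!r}")


# Runnable demo: the first candidate does not exist, so the second is returned.
decoder_cls = first_available("json", ["JSONDecoderRenamed", "JSONDecoder"])
print(decoder_cls.__name__)  # JSONDecoder

# With Kedro installed, the same pattern would read (not executed here):
# base_cls = first_available(
#     "kedro.io.core",
#     ["AbstractVersionedDataset", "AbstractVersionedDataSet"],
# )
```

Listing the new spelling first avoids touching the deprecated alias on versions where both names still exist.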

The best course of action, I think, is to not have those datasets in the docs yet, move forward with this PR, and in parallel make a fix on kedro-datasets and release a 1.8.1 soon-ish.

@merelcht
Copy link
Member

> The best course of action, I think, is to not have those datasets in the docs yet, move forward with this PR, and in parallel make a fix on kedro-datasets and release a 1.8.1 soon-ish.

We'll have to do that ASAP then, because the whole reason for getting 1.8.0 out was that it was blocking changes that need to go in for 2.0.0.

@SajidAlamQB SajidAlamQB merged commit 60f676e into develop Oct 24, 2023
52 of 58 checks passed
@SajidAlamQB SajidAlamQB deleted the remove-partitoned-and-incremental branch October 24, 2023 17:11
@astrojuanlu
Member

The bugfix can maybe go directly into 2.0.0, depending on how long we intend to wait for that one. As long as it's released before Kedro 0.19.0...

Development

Successfully merging this pull request may close these issues.

Remove PartitionedDataSet and IncrementalDataSet from kedro.io
5 participants