Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plan for Renaming DataSet to Dataset #2740

Closed
6 tasks
noklam opened this issue Jun 28, 2023 · 11 comments
Closed
6 tasks

Plan for Renaming DataSet to Dataset #2740

noklam opened this issue Jun 28, 2023 · 11 comments
Assignees
Labels
Type: Parent Issue Type: Technical DR 💾 Decision Records (technical decisions made)

Comments

@noklam
Copy link
Contributor

noklam commented Jun 28, 2023

Background

This is a parent issue to document what've been done and what more to be done. This refactoring isn't simple as just changing the name of the class, and the success of this require coordination of dev and marketing. As this also affect various things (kedro.org, blogs for new Dataset contributions)

Done:

Ongoing:

Things that are missing

Questions:

  • Do we need to finish all of this before release or we can find some midpoint?
  • It is a bit awkward if we start showing Deprecation warnings in kedro but kedro-datasets are still using DataSet. Particular for users using kedro-datasets already, it may resulted in hybrid DataSet and Dataset in their catalog.yml @deepyaman
@astrojuanlu
Copy link
Member

@deepyaman
Copy link
Member

Not sure the way of suppressing warnings is right...

>>> from kedro.io import MemoryDataSet
>>> from kedro.io.memory_dataset import MemoryDataSet
<stdin>:1: DeprecationWarning: 'MemoryDataSet' has been renamed to 'MemoryDataset', and the alias will be removed in Kedro 0.19.0
>>> 

We should still show DeprecationWarning.

It's annoying, but there's definitely precedent for still showing warnings, even when the user can't control them (like all the numpy warnings because of what libraries that use numpy do). Let me think if there's something better, but I think the lesser of two evils is to raise the warnings.

@astrojuanlu
Copy link
Member

@deepyaman sorry, I clarify - I was not advocating to suppress the DeprecationWarnings. I was advocating for only showing them for code that users wrote, and when there's an alternative. In particular, kedro-datasets hasn't been migrated already, but doing from kedro_datasets.pandas import CSVDataSet will emit a DeprecationWarning that the user can't do anything about. That's what kedro-org/kedro-plugins#255 aims to fix.

@deepyaman
Copy link
Member

@deepyaman sorry, I clarify - I was not advocating to suppress the DeprecationWarnings. I was advocating for only showing them for code that users wrote, and when there's an alternative. In particular, kedro-datasets hasn't been migrated already, but doing from kedro_datasets.pandas import CSVDataSet will emit a DeprecationWarning that the user can't do anything about. That's what kedro-org/kedro-plugins#255 aims to fix.

Understood; so are we saying we can undo the suppression in kedro/io/__init__.py then? #2746 is actually semi-choking on that (just a test, I could ignore it), hence why I started looking into it. :)

@deepyaman
Copy link
Member

Do we need to update these extra install with Dataset as well?

kedro/setup.py

Lines 44 to 50 in 6e5e4e1

pandas_require = {
"pandas.CSVDataSet": [PANDAS],
"pandas.ExcelDataSet": [PANDAS, "openpyxl>=3.0.6, <4.0"],
"pandas.FeatherDataSet": [PANDAS],
"pandas.GBQTableDataSet": [PANDAS, "pandas-gbq>=0.12.0, <0.18.0"],
"pandas.GBQQueryDataSet": [PANDAS, "pandas-gbq>=0.12.0, <0.18.0"],
"pandas.HDFDataSet": [

pip standardizes non-lowercase extras, so this would be a nice change for consistency, but it's not strictly necessary/high priority.

@zedrdave
Copy link

At the moment, it seems like kedro itself is triggering its own DeprecationWarning:

For example, the io package __init__.py contains:
from .cached_dataset import CachedDataset, CachedDataSet

(and similarly for other renamed ***Dataset classes)

This does not seem like a desirable behaviour… Or am I missing something?

@astrojuanlu
Copy link
Member

For reference, @zedrdave question was discussed in #2882

@astrojuanlu
Copy link
Member

Relevant: kedro-org/kedro-plugins#313

@zedrdave
Copy link

For reference, @zedrdave question was discussed in #2882

I do not think these 2 issues are the same.

One has to do with Kedro's import order (which triggers warnings when using custom classes), and the other has to do with the fact that current codebase (as of a month ago) was attempting to load both new and old class names, triggering a DeprecationWarning that users could not do anything about.

@astrojuanlu
Copy link
Member

Ah, thanks for the clarification @zedrdave, I read too fast 👍🏽

@merelcht merelcht moved this to To Do in Kedro Framework Sep 1, 2023
@merelcht merelcht self-assigned this Sep 4, 2023
@merelcht
Copy link
Member

merelcht commented Sep 5, 2023

Closing this in favour of #2129 which is the full list of tasks for renaming assuming that datasets have by that point been removed from the core Kedro project.

@merelcht merelcht closed this as completed Sep 5, 2023
@github-project-automation github-project-automation bot moved this from To Do to Done in Kedro Framework Sep 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Parent Issue Type: Technical DR 💾 Decision Records (technical decisions made)
Projects
Archived in project
Development

No branches or pull requests

5 participants