Kedro dataset CLI commands #3714

datajoely · 2024-03-14T13:44:55Z

Description

Related to the overall plug-in epic (#583), I've been thinking about both the Kedro team's own maintenance burden and the friction users face when contributing datasets today.

Context

At a high level the following points contribute to this status quo:

Datasets are hard to maintain
Dataset contributions are welcome but the barrier is high, often prohibitively so
Datasets that should be contributed never are
Dataset PRs take ages to be merged/released
Lots of copying and pasting is happening
fsspec boilerplate overhead in every single file-based class
Poor metrics on popularity through docs/CLI telemetry

Possible Implementation

I suggest Kedro introduce a set of CLI commands focused on this dataset workflow. There is precedent for these ideas in the micropackaging journey as well.

They would all follow the `kedro dataset <command>` pattern:

| command | priority | description |
| --- | --- | --- |
| `pull` | P0 | Accepts a `kedro-datasets` name as it appears in the catalog, e.g. `polars.GenericDataSet`. It would pull the source code, add the dependencies and provide an example catalog entry. Longer term we could think about how 3rd-party polyrepos could work, e.g. (1) (2) |
| `create` | P0 | Creates a class in the user's environment with the correct structure; may need separate workflows for file-based (fsspec) and non-file-based datasets. Gets the user's contribution ready on day 1, and can even include test and lint rules. |
| `install` | P2 | Provides an easy wrapper over the correct `pip` command, adding the dependency to your project and providing an example catalog entry. |
| `contribute` | P2 | Provides a workflow for pushing the results of `pull`s/`create`s back into the open source project. |
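To make the proposal a bit more concrete, here is a rough, dependency-free sketch of how the command surface could look. Kedro's real CLI is built on Click; `argparse` is used here only so the example runs on its own. Everything in it is hypothetical: the function names, the class skeleton emitted by `create`, the example `filepath`, and the assumption that the `pip` extra matches the module prefix of the dataset name (e.g. `polars`). It only prints what each command would do and never touches the environment.

```python
import argparse
import sys
from string import Template

# Hypothetical skeleton that `create` might emit; not an official Kedro template.
DATASET_TEMPLATE = Template('''\
from kedro.io import AbstractDataset


class ${class_name}(AbstractDataset):
    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self):
        raise NotImplementedError

    def _save(self, data) -> None:
        raise NotImplementedError

    def _describe(self) -> dict:
        return {"filepath": self._filepath}
''')


def pip_command(dataset_name: str) -> list[str]:
    """Build the pip call for `install`.

    Assumption: the extra is the module prefix, e.g. `polars.GenericDataSet` -> `polars`.
    """
    extra = dataset_name.split(".", 1)[0]
    return [sys.executable, "-m", "pip", "install", f"kedro-datasets[{extra}]"]


def example_catalog_entry(dataset_name: str) -> str:
    """Return an illustrative catalog entry; the entry name and filepath are placeholders."""
    return (
        "my_dataset:\n"
        f"  type: {dataset_name}\n"
        "  filepath: data/01_raw/example\n"
    )


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="kedro dataset")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("pull", help="copy a dataset's source into the project").add_argument("name")
    sub.add_parser("create", help="scaffold a new dataset class").add_argument("class_name")
    sub.add_parser("install", help="install a dataset's dependencies").add_argument("name")
    sub.add_parser("contribute", help="push local datasets back upstream")
    return parser


def plan(argv: list[str]) -> str:
    """Describe what the given command would do, without side effects."""
    args = build_parser().parse_args(argv)
    if args.command == "create":
        return DATASET_TEMPLATE.substitute(class_name=args.class_name)
    if args.command == "install":
        return " ".join(pip_command(args.name)) + "\n" + example_catalog_entry(args.name)
    if args.command == "pull":
        return f"would fetch source for {args.name}\n" + example_catalog_entry(args.name)
    return "would open a contribution workflow for local datasets\n"


if __name__ == "__main__":
    print(plan(sys.argv[1:]))
```

Saved as, say, `dataset_cli.py` (a made-up filename), running `python dataset_cli.py install polars.GenericDataSet` prints the `pip` invocation followed by a sample catalog entry, which is roughly the experience the `install` row describes.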

datajoely added the "Issue: Feature Request" label Mar 14, 2024

noklam added this to Kedro Framework Mar 20, 2024

github-actions bot mentioned this issue Apr 1, 2024

Monthly issue metrics report #3764