[Debugging] Show which datasets are outdated #2151
Replies: 3 comments
-
Somewhat related: kedro-org/kedro#221, kedro-org/kedro#2307
-
Backlog grooming notes: This was also highlighted in #1750 and would build on the dataset preview and debugging workstream. We should consider implementing this.
-
I think showing whether a dataset is outdated is tricky: for remote datasets, there is no zero-cost way of computing, say, a hash, so a locally computed hash can become outdated without you noticing. For the record, this is what Dagster does:
What do you think, @francisduval?
-
Description
When running kedro viz run, there is no way to know which datasets are up to date and which are outdated. A dataset is outdated if the code upstream of it has changed since the dataset was last generated. This feature exists in the targets package in R: when you run a targets pipeline, only the outdated nodes are rerun, which saves computing time.
Context
This would be a useful feature because, without it, there is no effective way to tell which parts of the pipeline need rerunning after the code has changed. When you are unsure whether a dataset is up to date, you have to rerun it to be sure, which can take a long time.
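To make the definition of "outdated" above concrete, here is a minimal sketch in plain Python. It is not Kedro code: the node dictionaries, the fingerprint function, and the recorded mapping are all hypothetical, and it assumes the nodes are already listed in dependency order and that a fingerprint of each node's source was stored at the last run.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Return a SHA-256 hex digest of a node's source code (hypothetical helper)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def outdated_datasets(nodes, recorded):
    """Return the set of dataset names whose upstream code has changed.

    nodes:    list of dicts with "name", "source", "inputs", "outputs",
              assumed to be in dependency (topological) order.
    recorded: dict mapping node name -> fingerprint stored at the last run.
    """
    stale = set()
    for node in nodes:
        code_changed = recorded.get(node["name"]) != fingerprint(node["source"])
        upstream_stale = any(name in stale for name in node["inputs"])
        if code_changed or upstream_stale:
            # Everything this node produces must be regenerated.
            stale.update(node["outputs"])
    return stale


# Illustrative pipeline: raw -> cleaned -> model, and raw -> summary.
nodes = [
    {"name": "clean", "source": "v1", "inputs": ["raw"], "outputs": ["cleaned"]},
    {"name": "train", "source": "v1", "inputs": ["cleaned"], "outputs": ["model"]},
    {"name": "report", "source": "v1", "inputs": ["raw"], "outputs": ["summary"]},
]
recorded = {node["name"]: fingerprint(node["source"]) for node in nodes}

# Simulate editing the "clean" node after the last run.
nodes[0] = {**nodes[0], "source": "v2"}
print(sorted(outdated_datasets(nodes, recorded)))
```

In this example only cleaned and model are reported, because summary does not depend on the edited node. The sketch only tracks code changes; it does not detect changes to the stored data itself.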
Possible Implementation
Show outdated datasets in a different color. It would also be nice to have a kedro command that reruns only the outdated datasets, such as kedro run --only_outdated or kedro run --pipeline pipeline_name --only_outdated.
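As a rough illustration of what the proposed --only_outdated option might do (the flag is only a suggestion in this post, not an existing Kedro option), the sketch below picks the nodes to rerun given the set of nodes whose code changed. The pipeline dictionary and the nodes_to_rerun function are hypothetical; the only library call is graphlib.TopologicalSorter from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def nodes_to_rerun(pipeline, changed):
    """Return the changed nodes plus everything downstream, in run order.

    pipeline: dict mapping node name -> {"inputs": [...], "outputs": [...]}
    changed:  set of node names whose code changed since the last run.
    """
    # Map each dataset to the node that produces it.
    producer = {
        out: name for name, spec in pipeline.items() for out in spec["outputs"]
    }
    # Map each node to the set of nodes it directly depends on.
    deps = {
        name: {producer[i] for i in spec["inputs"] if i in producer}
        for name, spec in pipeline.items()
    }
    rerun = set()
    ordered = []
    # static_order() yields every node after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        if name in changed or deps[name] & rerun:
            rerun.add(name)
            ordered.append(name)
    return ordered


pipeline = {
    "clean": {"inputs": ["raw"], "outputs": ["cleaned"]},
    "features": {"inputs": ["cleaned"], "outputs": ["features"]},
    "train": {"inputs": ["features"], "outputs": ["model"]},
    "report": {"inputs": ["raw"], "outputs": ["summary"]},
}

print(nodes_to_rerun(pipeline, {"features"}))  # ['features', 'train']
```

Here clean and report are skipped because nothing upstream of them changed, which is the time saving described above.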