Replace "DataSet" with "Dataset" in Markdown files #2735

Merged: 22 commits into `main` from `docs/rename-datasets` on Aug 18, 2023

Changes from all commits (22 commits):
- 74a98f5 LambdaDataSet->LambdaDataset in .md files (deepyaman, Jun 27, 2023)
- 4a172cd MemoryDataSet->MemoryDataset in .md files (deepyaman, Jun 27, 2023)
- c881154 PartitionedDataSet->PartitionedDataset in .md files (deepyaman, Jun 27, 2023)
- 8136be0 IncrementalDataSet->IncrementalDataset in .md files (deepyaman, Jun 27, 2023)
- 35f3e78 CachedDataSet->CachedDataset in .md files (deepyaman, Jun 27, 2023)
- 942d7cd DataSetError->DatasetError in .md files (deepyaman, Jun 27, 2023)
- 7ebe6c3 DataSetNotFoundError->DatasetNotFoundError in .md files (deepyaman, Jun 27, 2023)
- 4156fef Replace "DataSet" with "Dataset" in Markdown files (deepyaman, Jun 27, 2023)
- eff96f4 Update RELEASE.md (deepyaman, Jun 27, 2023)
- e46fc4f Merge branch 'main' into docs/rename-datasets (deepyaman, Jun 28, 2023)
- 082c3de Merge branch 'main' into docs/rename-datasets (deepyaman, Jun 29, 2023)
- 7565519 Merge branch 'main' into docs/rename-datasets (deepyaman, Jun 30, 2023)
- 8a6e502 Merge branch 'main' into docs/rename-datasets (deepyaman, Jul 3, 2023)
- 212b7d9 Merge branch 'main' into docs/rename-datasets (deepyaman, Aug 14, 2023)
- 715d140 Fix remaining instance of "*DataSet*"->"*Dataset*" (deepyaman, Aug 16, 2023)
- e9dee82 `find . -name '*.md' -print0 | xargs -0 sed -i "" "s/\([^A-Za-z]\)Dat… (deepyaman, Aug 16, 2023)
- ad2c7b4 Change non-class instances of Dataset to dataset (deepyaman, Aug 16, 2023)
- 94dd319 Merge branch 'main' into docs/rename-datasets (stichbury, Aug 18, 2023)
- dba7503 Merge branch 'main' into docs/rename-datasets (deepyaman, Aug 18, 2023)
- 76732ea Replace any remaining instances of DataSet in docs (deepyaman, Aug 18, 2023)
- 65f84c2 Fix a broken link to docs for `PartitionedDataset` (deepyaman, Aug 18, 2023)
- aefbe54 Merge branch 'main' into docs/rename-datasets (astrojuanlu, Aug 18, 2023)
`docs/source/configuration/advanced_configuration.md` (6 changes: 3 additions & 3 deletions)

````diff
@@ -176,7 +176,7 @@ From version 0.17.0, `TemplatedConfigLoader` also supports the [Jinja2](https://
 ```
 {% for speed in ['fast', 'slow'] %}
 {{ speed }}-trains:
-    type: MemoryDataSet
+    type: MemoryDataset
 
 {{ speed }}-cars:
     type: pandas.CSVDataSet
````
````diff
@@ -197,13 +197,13 @@ The output Python dictionary will look as follows:
 
 ```python
 {
-    "fast-trains": {"type": "MemoryDataSet"},
+    "fast-trains": {"type": "MemoryDataset"},
     "fast-cars": {
         "type": "pandas.CSVDataSet",
         "filepath": "s3://my_s3_bucket/fast-cars.csv",
         "save_args": {"index": True},
     },
-    "slow-trains": {"type": "MemoryDataSet"},
+    "slow-trains": {"type": "MemoryDataset"},
     "slow-cars": {
         "type": "pandas.CSVDataSet",
         "filepath": "s3://my_s3_bucket/slow-cars.csv",
````
`docs/source/configuration/parameters.md` (2 changes: 1 addition & 1 deletion)

````diff
@@ -66,7 +66,7 @@ node(
 )
 ```
 
-In both cases, under the hood parameters are added to the Data Catalog through the method `add_feed_dict()` in [`DataCatalog`](/kedro.io.DataCatalog), where they live as `MemoryDataSet`s. This method is also what the `KedroContext` class uses when instantiating the catalog.
+In both cases, under the hood parameters are added to the Data Catalog through the method `add_feed_dict()` in [`DataCatalog`](/kedro.io.DataCatalog), where they live as `MemoryDataset`s. This method is also what the `KedroContext` class uses when instantiating the catalog.
 
 ```{note}
 You can use `add_feed_dict()` to inject any other entries into your `DataCatalog` as per your use case.
````
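
For context, a minimal sketch of the `add_feed_dict()` behaviour the renamed sentence describes (entry names are illustrative; assumes Kedro 0.18+, where plain values are wrapped in `MemoryDataset`):

```python
from kedro.io import DataCatalog

catalog = DataCatalog()

# Values that are not already datasets get wrapped in MemoryDataset instances;
# this is also how KedroContext injects `parameters` and `params:...` entries.
catalog.add_feed_dict({"params:test_size": 0.2, "parameters": {"test_size": 0.2}})

assert catalog.load("params:test_size") == 0.2
```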
`docs/source/data/advanced_data_catalog_usage.md` (10 changes: 5 additions & 5 deletions)

````diff
@@ -55,7 +55,7 @@ gear = cars["gear"].values
 The following steps happened behind the scenes when `load` was called:
 
 - The value `cars` was located in the Data Catalog
-- The corresponding `AbstractDataSet` object was retrieved
+- The corresponding `AbstractDataset` object was retrieved
 - The `load` method of this dataset was called
 - This `load` method delegated the loading to the underlying pandas `read_csv` function
 
````
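
A short sketch of those four steps end to end, assuming a local `data/01_raw/cars.csv` and the `pandas.CSVDataSet` type from kedro-datasets:

```python
from kedro.io import DataCatalog

io = DataCatalog.from_config(
    {
        "cars": {
            "type": "pandas.CSVDataSet",
            "filepath": "data/01_raw/cars.csv",
        }
    }
)

# Locates "cars", retrieves its AbstractDataset object, and calls its load(),
# which delegates to pandas.read_csv under the hood.
cars = io.load("cars")
gear = cars["gear"].values
```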
````diff
@@ -70,9 +70,9 @@ This pattern is not recommended unless you are using platform notebook environme
 To save data using an API similar to that used to load data:
 
 ```python
-from kedro.io import MemoryDataSet
+from kedro.io import MemoryDataset
 
-memory = MemoryDataSet(data=None)
+memory = MemoryDataset(data=None)
 io.add("cars_cache", memory)
 io.save("cars_cache", "Memory can store anything.")
 io.load("cars_cache")
````
````diff
@@ -190,7 +190,7 @@ io.save("test_data_set", data1)
 reloaded = io.load("test_data_set")
 assert data1.equals(reloaded)
 
-# raises DataSetError since the path
+# raises DatasetError since the path
 # data/01_raw/test.csv/my_exact_version/test.csv already exists
 io.save("test_data_set", data2)
 ```
````
````diff
@@ -219,7 +219,7 @@ io = DataCatalog({"test_data_set": test_data_set})
 
 io.save("test_data_set", data1) # emits a UserWarning due to version inconsistency
 
-# raises DataSetError since the data/01_raw/test.csv/exact_load_version/test.csv
+# raises DatasetError since the data/01_raw/test.csv/exact_load_version/test.csv
 # file does not exist
 reloaded = io.load("test_data_set")
 ```
````
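
The setup collapsed above these two hunks looks roughly like the following sketch (a reconstruction for readability, not the verbatim docs; assumes kedro-datasets' `pandas.CSVDataSet`):

```python
import pandas as pd

from kedro.io import DataCatalog, Version
from kedro_datasets.pandas import CSVDataSet

data1 = pd.DataFrame({"col1": [1, 2], "col2": [4, 5]})
data2 = pd.DataFrame({"col1": [6], "col2": [7]})

# Pin load and save to the same exact version string.
version = Version(load="my_exact_version", save="my_exact_version")
test_data_set = CSVDataSet(filepath="data/01_raw/test.csv", version=version)
io = DataCatalog({"test_data_set": test_data_set})

io.save("test_data_set", data1)  # writes data/01_raw/test.csv/my_exact_version/test.csv
reloaded = io.load("test_data_set")

io.save("test_data_set", data2)  # raises DatasetError: that versioned path already exists
```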
`docs/source/data/data_catalog.md` (5 changes: 3 additions & 2 deletions)

````diff
@@ -126,6 +126,7 @@ In the example above, the `catalog.yml` file contains references to credentials
 
 ### Dataset versioning
 
+
 Kedro enables dataset and ML model versioning through the `versioned` definition. For example:
 
 ```yaml
````
````diff
@@ -144,9 +145,9 @@ kedro run --load-version=cars:YYYY-MM-DDThh.mm.ss.sssZ
 ```
 where `--load-version` is dataset name and version timestamp separated by `:`.
 
-A dataset offers versioning support if it extends the [`AbstractVersionedDataSet`](/kedro.io.AbstractVersionedDataset) class to accept a version keyword argument as part of the constructor and adapt the `_save` and `_load` method to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively.
+A dataset offers versioning support if it extends the [`AbstractVersionedDataset`](/kedro.io.AbstractVersionedDataset) class to accept a version keyword argument as part of the constructor and adapt the `_save` and `_load` method to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively.
 
-To verify whether a dataset can undergo versioning, you should examine the dataset class code to inspect its inheritance [(you can find contributed datasets within the `kedro-datasets` repository)](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets/kedro_datasets). Check if the dataset class inherits from the `AbstractVersionedDataSet`. For instance, if you encounter a class like `CSVDataSet(AbstractVersionedDataSet[pd.DataFrame, pd.DataFrame])`, this indicates that the dataset is set up to support versioning.
+To verify whether a dataset can undergo versioning, you should examine the dataset class code to inspect its inheritance [(you can find contributed datasets within the `kedro-datasets` repository)](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets/kedro_datasets). Check if the dataset class inherits from the `AbstractVersionedDataset`. For instance, if you encounter a class like `CSVDataSet(AbstractVersionedDataset[pd.DataFrame, pd.DataFrame])`, this indicates that the dataset is set up to support versioning.
 
 ```{note}
 Note that HTTP(S) is a supported file system in the dataset implementations, but if you use it, you can't also use versioning.
````
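
The inheritance check described in that hunk can also be done programmatically; a quick sketch (assumes kedro-datasets is installed):

```python
from kedro.io import AbstractVersionedDataset
from kedro_datasets.pandas import CSVDataSet

# True: CSVDataSet extends AbstractVersionedDataset, so it supports `versioned: true`.
print(issubclass(CSVDataSet, AbstractVersionedDataset))
```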
6 changes: 3 additions & 3 deletions docs/source/data/data_catalog_yaml_examples.md
Original file line number Diff line number Diff line change
Expand Up @@ -397,12 +397,12 @@ for loading, so the first node outputs a `pyspark.sql.DataFrame`, while the seco

You can use the [`kedro catalog create` command to create a Data Catalog YAML configuration](../development/commands_reference.md#create-a-data-catalog-yaml-configuration-file).

This creates a `<conf_root>/<env>/catalog/<pipeline_name>.yml` configuration file with `MemoryDataSet` datasets for each dataset in a registered pipeline if it is missing from the `DataCatalog`.
This creates a `<conf_root>/<env>/catalog/<pipeline_name>.yml` configuration file with `MemoryDataset` datasets for each dataset in a registered pipeline if it is missing from the `DataCatalog`.

```yaml
# <conf_root>/<env>/catalog/<pipeline_name>.yml
rockets:
type: MemoryDataSet
type: MemoryDataset
scooters:
type: MemoryDataSet
type: MemoryDataset
```
`docs/source/data/how_to_create_a_custom_dataset.md` (16 changes: 8 additions & 8 deletions)

````diff
@@ -2,9 +2,9 @@
 
 [Kedro supports many datasets](/kedro_datasets) out of the box, but you may find that you need to create a custom dataset. For example, you may need to handle a proprietary data format or filesystem in your pipeline, or perhaps you have found a particular use case for a dataset that Kedro does not support. This tutorial explains how to create a custom dataset to read and save image data.
 
-## AbstractDataSet
+## AbstractDataset
 
-For contributors, if you would like to submit a new dataset, you must extend the [`AbstractDataSet` interface](/kedro.io.AbstractDataset) or [`AbstractVersionedDataSet` interface](/kedro.io.AbstractVersionedDataset) if you plan to support versioning. It requires subclasses to override the `_load` and `_save` and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataSet` implementation.
+For contributors, if you would like to submit a new dataset, you must extend the [`AbstractDataset` interface](/kedro.io.AbstractDataset) or [`AbstractVersionedDataset` interface](/kedro.io.AbstractVersionedDataset) if you plan to support versioning. It requires subclasses to override the `_load` and `_save` and provides `load` and `save` methods that enrich the corresponding private methods with uniform error handling. It also requires subclasses to override `_describe`, which is used in logging the internal information about the instances of your custom `AbstractDataset` implementation.
 
 
 ## Scenario
````
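
As a reference for what that contract looks like in practice, here is a deliberately simplified, local-filesystem-only sketch of the tutorial's `ImageDataSet` (the full version in the doc also handles fsspec-backed paths; Pillow is an assumed dependency):

```python
from pathlib import Path
from typing import Any

import numpy as np
from PIL import Image

from kedro.io import AbstractDataset


class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]):
    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _load(self) -> np.ndarray:
        # Delegate the actual reading to Pillow, mirroring how built-in
        # datasets delegate to pandas, Spark, etc.
        with Image.open(self._filepath) as image:
            return np.asarray(image)

    def _save(self, data: np.ndarray) -> None:
        Image.fromarray(data).save(self._filepath)

    def _describe(self) -> dict[str, Any]:
        # Used by Kedro when logging information about this dataset instance.
        return {"filepath": str(self._filepath)}
```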
````diff
@@ -267,19 +267,19 @@ class ImageDataSet(AbstractDataset[np.ndarray, np.ndarray]):
 ```
 </details>
 
-## Integration with `PartitionedDataSet`
+## Integration with `PartitionedDataset`
 
 Currently, the `ImageDataSet` only works with a single image, but this example needs to load all Pokemon images from the raw data directory for further processing.
 
-Kedro's [`PartitionedDataSet`](./partitioned_and_incremental_datasets.md) is a convenient way to load multiple separate data files of the same underlying dataset type into a directory.
+Kedro's [`PartitionedDataset`](/kedro.io.PartitionedDataset) is a convenient way to load multiple separate data files of the same underlying dataset type into a directory.
 
-To use `PartitionedDataSet` with `ImageDataSet` to load all Pokemon PNG images, add this to the data catalog YAML so that `PartitionedDataSet` loads all PNG files from the data directory using `ImageDataSet`:
+To use `PartitionedDataset` with `ImageDataSet` to load all Pokemon PNG images, add this to the data catalog YAML so that `PartitionedDataset` loads all PNG files from the data directory using `ImageDataSet`:
 
 ```yaml
 # in conf/base/catalog.yml
 
 pokemon:
-  type: PartitionedDataSet
+  type: PartitionedDataset
   dataset: kedro_pokemon.extras.datasets.image_dataset.ImageDataSet
   path: data/01_raw/pokemon-images-and-types/images/images
   filename_suffix: ".png"
````
````diff
@@ -305,11 +305,11 @@ $ ls -la data/01_raw/pokemon-images-and-types/images/images/*.png | wc -l
 ### How to implement versioning in your dataset
 
 ```{note}
-Versioning doesn't work with `PartitionedDataSet`. You can't use both of them at the same time.
+Versioning doesn't work with `PartitionedDataset`. You can't use both of them at the same time.
 ```
 
 To add versioning support to the new dataset we need to extend the
-[AbstractVersionedDataSet](/kedro.io.AbstractVersionedDataset) to:
+[AbstractVersionedDataset](/kedro.io.AbstractVersionedDataset) to:
 
 * Accept a `version` keyword argument as part of the constructor
 * Adapt the `_save` and `_load` method to use the versioned data path obtained from `_get_save_path` and `_get_load_path` respectively
````
`docs/source/data/kedro_dataset_factories.md` (2 changes: 1 addition & 1 deletion)

````diff
@@ -215,7 +215,7 @@ The matches are ranked according to the following criteria:
 
 ## How to override the default dataset creation with dataset factories
 
-You can use dataset factories to define a catch-all pattern which will overwrite the default [`MemoryDataSet`](/kedro.io.MemoryDataset) creation.
+You can use dataset factories to define a catch-all pattern which will overwrite the default [`MemoryDataset`](/kedro.io.MemoryDataset) creation.
 
 ```yaml
 "{default_dataset}":
````
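
The YAML body beneath the `"{default_dataset}"` key is collapsed in this view; a hypothetical catch-all entry in the dataset-factory style would look like this (the type and filepath are illustrative, not the verbatim docs):

```yaml
"{default_dataset}":
  type: pandas.CSVDataSet
  filepath: data/{default_dataset}.csv
```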