Merge pull request #105 from quantumblacklabs/release/0.14.3
Release 0.14.3
nakhan98 authored Jun 26, 2019
2 parents a1fc18a + 4fb3930 commit d080ead
Showing 160 changed files with 1,979 additions and 547 deletions.
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
## Notice

- [ ] I acknowledge and agree that, by checking this box and clicking Submit Pull Request:
- [ ] I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

- I submit this contribution under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0.txt) and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
- I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
28 changes: 14 additions & 14 deletions LICENSE.md
@@ -6,22 +6,22 @@ You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND
NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS
BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND
NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS
BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

The QuantumBlack Visual Analytics Limited (QuantumBlack) name and logo
(either separately or in combination, QuantumBlack Trademarks) are
trademarks of QuantumBlack. The License does not grant you any right or
license to the QuantumBlack Trademarks. You may not use the QuantumBlack
Trademarks or any confusingly similar mark as a trademark for your product,
or use the QuantumBlack Trademarks in any other manner that might cause
confusion in the marketplace, including but not limited to in advertising,
The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo
(either separately or in combination, "QuantumBlack Trademarks") are
trademarks of QuantumBlack. The License does not grant you any right or
license to the QuantumBlack Trademarks. You may not use the QuantumBlack
Trademarks or any confusingly similar mark as a trademark for your product,
or use the QuantumBlack Trademarks in any other manner that might cause
confusion in the marketplace, including but not limited to in advertising,
on websites, or on software.

See the License for the specific language governing permissions and
See the License for the specific language governing permissions and
limitations under the License.
3 changes: 2 additions & 1 deletion Makefile
@@ -18,7 +18,8 @@ lint:
pylint -j 0 --disable=unnecessary-pass kedro
pylint -j 0 --disable=missing-docstring,redefined-outer-name,no-self-use,invalid-name tests
pylint -j 0 --disable=missing-docstring,no-name-in-module features
flake8 kedro tests features --exclude kedro/template*
pylint -j 0 extras
flake8 kedro tests features extras --exclude kedro/template*

test:
pytest tests
9 changes: 5 additions & 4 deletions README.md
@@ -3,6 +3,7 @@
`develop` | `master`
----------|---------
[![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop) | [![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/master.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/master)
[![Build status](https://ci.appveyor.com/api/projects/status/2u74p5g8fdc45wwh/branch/develop?svg=true)](https://ci.appveyor.com/project/QuantumBlack/kedro/branch/develop) | [![Build status](https://ci.appveyor.com/api/projects/status/2u74p5g8fdc45wwh/branch/master?svg=true)](https://ci.appveyor.com/project/QuantumBlack/kedro/branch/master)

[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python Version](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue.svg)](https://pypi.org/project/kedro/)
@@ -13,7 +14,7 @@

# What is Kedro?

> The centre of your data pipeline.
> "The centre of your data pipeline."
Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned. We provide a standard approach so that you can:
- spend more time building your data pipeline,
@@ -58,9 +59,9 @@ For more detailed installation instructions, including how to setup Python virtu
### 4. Feature extensibility

- A plugin system that injects commands into the Kedro command line interface (CLI)
- (_coming soon_) List of officially supported plugins:
- Kedro-Airflow, making it easy to prototype your data pipeline in Kedro before deploying to [Airflow](https://github.com/apache/airflow), a workflow scheduler
- Kedro-Docker, a tool for packing and shipping Kedro projects within containers
- List of officially supported plugins:
- (_coming soon_) Kedro-Airflow, making it easy to prototype your data pipeline in Kedro before deploying to [Airflow](https://github.com/apache/airflow), a workflow scheduler
- [Kedro-Docker](https://github.com/quantumblacklabs/kedro-docker), a tool for packaging and shipping Kedro projects within containers
- Kedro can be deployed locally, on-premise and cloud (AWS, Azure and GCP) servers, or clusters (EMR, Azure HDinsight, GCP and Databricks)

![Kedro-Viz Pipeline Visualisation](https://raw.githubusercontent.com/quantumblacklabs/kedro/master/img/pipeline_visualisation.png)
27 changes: 25 additions & 2 deletions RELEASE.md
@@ -1,10 +1,33 @@
# Release 0.14.3

## Major features and improvements
* Tab completion for catalog datasets in `ipython` or `jupyter` sessions. (Thank you [@datajoely](https://github.com/datajoely) and [@WaylonWalker](https://github.com/WaylonWalker))
* Added support for transcoding, an ability to decouple loading/saving mechanisms of a dataset from its storage location, denoted by adding '@' to the dataset name.
* Datasets have a new `release` function that instructs them to free any cached data. The runners will call this when the dataset is no longer needed downstream.
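
To make the new `release` behaviour concrete, here is a minimal, Kedro-free sketch of a dataset that caches on load and frees its cache on release (`CachedDataSet` is an illustrative stand-in, not Kedro's actual `AbstractDataSet`):

```python
class CachedDataSet:
    """Illustrative stand-in for a dataset that caches loaded data."""

    def __init__(self, data):
        self._data = data
        self._cache = None

    def load(self):
        # Cache on first load so repeated loads are cheap.
        if self._cache is None:
            self._cache = self._data
        return self._cache

    def release(self):
        # Free any cached data; runners call this once no downstream
        # node needs the dataset any more.
        self._cache = None


ds = CachedDataSet([1, 2, 3])
assert ds.load() == [1, 2, 3]
ds.release()
assert ds._cache is None
```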

## Bug fixes and other changes
* Add support for pipeline nodes made up from partial functions.
* Expand user home directory `~` for TextLocalDataSet (see issue #19).
* Add a `short_name` property to `Node`s for a display-friendly (but not necessarily unique) name.
* Add Kedro project loader for IPython: `extras/kedro_project_loader.py`.
* Fix source file encoding issues with Python 3.5 on Windows.
* Fix local project source not having priority over the same source installed as a package, leading to local updates not being recognised.
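
For the partial-functions fix above, a sketch of the kind of callable that is now accepted (the `node(...)` usage in the comment is hypothetical and assumes Kedro's node API; the runnable part below is plain Python):

```python
from functools import partial

def add(increment, value):
    return increment + value

# A partial object has no __name__ of its own, which previously broke
# node naming; such callables can now be used as node functions, e.g.:
#   node(partial(add, 1), inputs="input_data", outputs="output_data")
add_one = partial(add, 1)

assert add_one(41) == 42
```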

## Breaking changes to the API
* Remove the `max_loads` argument from the `MemoryDataSet` constructor and from the `AbstractRunner.create_default_data_set` method.

## Thanks for supporting contributions
[Nikolaos Tsaousis](https://github.com/tsanikgr), [Ivan Danov](https://github.com/idanov), [Gordon Wrigley](https://github.com/tolomea), [Yetunde Dada](https://github.com/yetudada), [Kiyohito Kunii](https://github.com/921kiyo), [Lorena Balan](https://github.com/lorenabalan), [Richard Westenra](https://github.com/richardwestenra), [Dmitrii Deriabin](https://github.com/DmitryDeryabin), [Joel Schwarzmann](https://github.com/datajoely), [Alex Kalmikov](https://github.com/kalexqb)

# Release 0.14.2

## Major features and improvements
* Added Data Set transformer support in the form of AbstractTransformer and DataCatalog.add_transformer
* Added Data Set transformer support in the form of AbstractTransformer and DataCatalog.add_transformer.

## Breaking changes to the API
* Merged the ExistsMixin into AbstractDataSet
* Merged the `ExistsMixin` into `AbstractDataSet`.
* `Pipeline.node_dependencies` returns a dictionary keyed by node, with sets of parent nodes as values; `Pipeline` and `ParallelRunner` were refactored to make use of this for topological sort for node dependency resolution and running pipelines respectively.
* `Pipeline.grouped_nodes` returns a list of sets, rather than a list of lists.

## Thanks for supporting contributions

4 changes: 2 additions & 2 deletions docs/build-docs.sh
@@ -16,8 +16,8 @@
# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
# The QuantumBlack Visual Analytics Limited (QuantumBlack) name and logo
# (either separately or in combination, QuantumBlack Trademarks) are
# The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo
# (either separately or in combination, "QuantumBlack Trademarks") are
# trademarks of QuantumBlack. The License does not grant you any right or
# license to the QuantumBlack Trademarks. You may not use the QuantumBlack
# Trademarks or any confusingly similar mark as a trademark for your product,
28 changes: 28 additions & 0 deletions docs/source/02_getting_started/01_prerequisites.md
@@ -6,6 +6,34 @@ Kedro supports macOS, Linux and Windows (7 / 8 / 10 and Windows Server 2016+). I

In order to work effectively with Kedro projects, we highly recommend you download and install [Anaconda](https://www.anaconda.com/download/#macos) (Python 3.x version) and [Java](https://www.oracle.com/technetwork/java/javase/downloads/index.html) (if using PySpark).

### Build tools

On Unix-like operating systems, you will need to install a C compiler and related build tools for your platform. This is due to the inclusion of the [memory-profiler](https://pypi.org/project/memory-profiler/) library in our dependencies. If your operating system is not mentioned, please refer to its documentation.

#### macOS
To install Command Line Tools for Xcode, run the following from the terminal:

```bash
xcode-select --install
```

#### GNU/Linux

##### Debian/Ubuntu

The following command (run with root permissions) will install the `build-essential` metapackage for Debian-based distributions:

```bash
apt-get update && apt-get install build-essential
```

##### Red Hat Enterprise Linux / CentOS
The following command (run with root permissions) will install the "Development Tools" group of packages on RHEL/CentOS:

```bash
yum groupinstall 'Development Tools'
```

### Windows

You will require admin rights to complete the installation of the following tools on your machine:
8 changes: 4 additions & 4 deletions docs/source/03_tutorial/05_package_a_project.md
@@ -1,13 +1,12 @@
# Packaging a project

In this section, you will learn how to build your project documentation, as well as how to bundle your project into a Python package for handover.

In this section, you will learn how to build your project documentation, as well as how to bundle your project into a Python package for handover.

## Add documentation to your project

While Kedro documentation can be found by running `kedro docs` from the command line, project-specific documentation can be generated by running `kedro build-docs` in the project's root directory.
While Kedro documentation can be found by running `kedro docs` from the command line, project-specific documentation can be generated by running `kedro build-docs` in the project's root directory.

This will create documentation based on the code structure of your project. Documentation will also include the [`docstrings`](https://www.datacamp.com/community/tutorials/docstrings-python) defined in the project code. The resulting HTML files can be found in `docs/build/html/`.
This will create documentation based on the code structure of your project. Documentation will also include the [`docstrings`](https://www.datacamp.com/community/tutorials/docstrings-python) defined in the project code. The resulting HTML files can be found in `docs/build/html/`.

`kedro build-docs` uses the [Sphinx](https://www.sphinx-doc.org) framework to build your project documentation, so if you want to customise it, please refer to `docs/source/conf.py` and the [corresponding section](http://www.sphinx-doc.org/en/master/usage/configuration.html) of the Sphinx documentation.

@@ -16,6 +15,7 @@ This will create documentation based on the code structure of your project. Docu

You can package your project by running `kedro package` from the command line. This will create one `.egg` file and one `.whl` file within the `src/dist/` folder of your project, which are Python packaging formats. For further information about packaging for Python, documentation is provided [here](https://packaging.python.org/overview/).

You can also check out [Kedro-Docker](https://github.com/quantumblacklabs/kedro-docker), an officially supported Kedro plugin for packaging and shipping Kedro projects within [Docker](https://www.docker.com/) containers.

## What is next?

2 changes: 1 addition & 1 deletion docs/source/04_user_guide/01_setting_up_vscode.md
@@ -1,6 +1,6 @@
# Setting up Visual Studio Code

> *Note:* This documentation is based on `Kedro 0.14.2`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
> *Note:* This documentation is based on `Kedro 0.14.3`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
Start by opening a new project directory in VS Code and installing the Python plugin under **Tools and languages**:

2 changes: 1 addition & 1 deletion docs/source/04_user_guide/02_setting_up_pycharm.md
@@ -1,6 +1,6 @@
# Setting up PyCharm

> *Note:* This documentation is based on `Kedro 0.14.2`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
> *Note:* This documentation is based on `Kedro 0.14.3`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
This section will present a quick guide on how to configure [PyCharm](https://www.jetbrains.com/pycharm/) as a development environment for working on Kedro projects.

2 changes: 1 addition & 1 deletion docs/source/04_user_guide/03_configuration.md
@@ -1,6 +1,6 @@
# Configuration

> *Note:* This documentation is based on `Kedro 0.14.2`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
> *Note:* This documentation is based on `Kedro 0.14.3`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
This section contains detailed information about configuration. You may also want to consult the relevant API documentation on [kedro.config](/kedro.config.rst).

28 changes: 23 additions & 5 deletions docs/source/04_user_guide/04_data_catalog.md
@@ -1,6 +1,6 @@
# The Data Catalog

> *Note:* This documentation is based on `Kedro 0.14.2`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
> *Note:* This documentation is based on `Kedro 0.14.3`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
This section introduces `catalog.yml`, the project-shareable Data Catalog. The file is located in `conf/base` and is a registry of all data sources available for use by a project; it manages loading and saving of data.

@@ -175,6 +175,24 @@ airplanes:

In this example the default `csv` configuration is inserted into `airplanes` and then the `load_args` block is overridden. Normally that would replace the whole dictionary. In order to extend `load_args` the defaults for that block are then re-inserted.
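
A sketch of what such a configuration might look like, using YAML anchors and merge keys (dataset names and arguments here are illustrative, not taken from the collapsed example above):

```yaml
_csv: &csv
  type: CSVLocalDataSet
  load_args: &csv_load_args
    sep: ','
    index_col: 0

airplanes:
  <<: *csv
  filepath: data/01_raw/airplanes.csv
  load_args:
    <<: *csv_load_args   # re-insert the default load_args...
    sep: ';'             # ...then override a single key
```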


### Transcoding datasets

You may come across a situation where you would like to read the same file using two different dataset implementations. For instance, `parquet` files can be loaded not only via the `ParquetLocalDataSet` using `pandas`, but also directly by the `SparkDataSet`. To do this, you can define your `catalog.yml` as follows:

```yaml
mydata@pandas:
type: ParquetLocalDataSet
filepath: data/01_raw/data.parquet
mydata@spark:
type: kedro.contrib.io.pyspark.SparkDataSet
filepath: data/01_raw/data.parquet
```

In your pipeline, you may refer to either dataset as an input or an output; Kedro will ensure the dependencies point to the single dataset `mydata`, both when running the pipeline and in the visualisation.
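
Conceptually, names on either side of the `@` resolve to the same underlying dataset — a simplified sketch of the idea (not Kedro's actual implementation):

```python
def underlying_dataset_name(name):
    """Strip a transcoding suffix such as '@pandas' or '@spark'."""
    return name.split("@", 1)[0]

assert underlying_dataset_name("mydata@pandas") == "mydata"
assert underlying_dataset_name("mydata@spark") == "mydata"
assert underlying_dataset_name("plain_data") == "plain_data"
```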


### Transforming datasets

If you need to augment the loading and / or saving of one or more datasets you can use the transformer API. To do this create a subclass of `AbstractTransformer` that implements your changes and then apply it to your catalog with `DataCatalog.add_transformer`. For example to print the runtimes of load and save operations you could do this:
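
A minimal, self-contained sketch of that timing idea (the real base class is Kedro's `AbstractTransformer` and its exact method signatures may differ; the class below is an illustrative stand-in showing only the wrapping pattern):

```python
import time

class ProfileTimeTransformer:
    """Illustrative stand-in that times the load/save callables it wraps."""

    def load(self, data_set_name, load):
        start = time.time()
        data = load()                      # call the wrapped loader
        print(f"Loading {data_set_name} took {time.time() - start:.3f}s")
        return data

    def save(self, data_set_name, save, data):
        start = time.time()
        save(data)                         # call the wrapped saver
        print(f"Saving {data_set_name} took {time.time() - start:.3f}s")


transformer = ProfileTimeTransformer()
cars = transformer.load("cars", lambda: [1, 2, 3])
assert cars == [1, 2, 3]
```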
@@ -231,9 +249,9 @@ In a file like `catalog.py`, you can generate the Data Catalog. This will allow
io = DataCatalog({
'bikes': CSVLocalDataSet(filepath='../data/01_raw/bikes.csv'),
'cars': CSVLocalDataSet(filepath='../data/01_raw/cars.csv', load_args=dict(sep=',')), # additional arguments
'scooters': SQLTableDataSet(table_name="scooters", credentials=dict(con="sqlite:///kedro.db")),
'cars_table': SQLTableDataSet(table_name="cars", credentials=dict(con="sqlite:///kedro.db")),
'scooters_query': SQLQueryDataSet(sql="select * from cars where gear=4", credentials=dict(con="sqlite:///kedro.db")),
'trucks': ParquetLocalDataSet(filepath="trucks.parquet")
'ranked': ParquetLocalDataSet(filepath="ranked.parquet")
})
```

@@ -280,7 +298,7 @@ io.load('car_cache')

#### Saving data to a SQL database for querying

At this point we may want to put the data in a SQLite database to run queries on it. Let's use that to rank cars by their mpg.
At this point we may want to put the data in a SQLite database to run queries on it. Let's use that to rank scooters by their mpg.

```python
# This cleans up the database in case it exists at this point
@@ -291,7 +309,7 @@ except FileNotFoundError:
pass
io.save('cars_table', cars)
ranked = io.load('cars_query')[['brand', 'mpg']]
ranked = io.load('scooters_query')[['brand', 'mpg']]
```

#### Saving data in parquet
2 changes: 1 addition & 1 deletion docs/source/04_user_guide/05_nodes_and_pipelines.md
@@ -1,6 +1,6 @@
# Nodes and pipelines

> *Note:* This documentation is based on `Kedro 0.14.2`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
> *Note:* This documentation is based on `Kedro 0.14.3`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
In this section we introduce pipelines and nodes.

Relevant API documentation:
2 changes: 1 addition & 1 deletion docs/source/04_user_guide/06_logging.md
@@ -1,6 +1,6 @@
# Logging

> *Note:* This documentation is based on `Kedro 0.14.2`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
> *Note:* This documentation is based on `Kedro 0.14.3`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
Kedro uses, and facilitates the use of Python’s `logging` library, by providing a default logging configuration. This can be found in `conf/base/logging.yml` in every project generated using Kedro’s CLI `kedro new` command.

2 changes: 1 addition & 1 deletion docs/source/04_user_guide/07_advanced_io.md
@@ -1,6 +1,6 @@
# Advanced IO

> *Note:* This documentation is based on `Kedro 0.14.2`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
> *Note:* This documentation is based on `Kedro 0.14.3`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
In this tutorial, you will learn about advanced uses of the [Kedro IO](/kedro.io.rst) module and understand the underlying implementation.

2 changes: 1 addition & 1 deletion docs/source/04_user_guide/08_pyspark.md
@@ -1,6 +1,6 @@
# Working with PySpark

> *Note:* This documentation is based on `Kedro 0.14.2`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
> *Note:* This documentation is based on `Kedro 0.14.3`, if you spot anything that is incorrect then please create an [issue](https://github.com/quantumblacklabs/kedro/issues) or pull request.
In this tutorial we explain how to work with `PySpark` in a Kedro pipeline.
