The code in this repository demonstrates best practice when working with Kedro and PySpark. It contains a Kedro starter template with some initial configuration and an example pipeline, and originates from the Kedro documentation about how to work with PySpark.
While Spark allows you to specify many different configuration options, this starter uses /conf/base/spark.yml as a single configuration location.
This Kedro starter contains the initialisation code for SparkSession in the ProjectContext and takes its configuration from /conf/base/spark.yml. Modify this code if you want to further customise your SparkSession, e.g. to use YARN.
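As a rough illustration of the idea described above, the sketch below shows one way a SparkSession could be built from settings that have already been loaded from a file such as /conf/base/spark.yml. It is not the starter's actual code: the helper names (to_spark_conf_pairs, init_spark_session), the application name, and the example settings are assumptions made for this example, and the loading of the YAML file itself is left out.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical settings, standing in for the parsed contents of
# conf/base/spark.yml. The starter's real file may contain different keys.
EXAMPLE_SPARK_PARAMS: Dict[str, Any] = {
    "spark.driver.maxResultSize": "3g",
    "spark.scheduler.mode": "FAIR",
}


def to_spark_conf_pairs(params: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Turn parsed YAML settings into (key, value) string pairs.

    Booleans are lower-cased because Spark expects "true"/"false".
    """
    pairs = []
    for key, value in params.items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        pairs.append((str(key), text))
    return pairs


def init_spark_session(
    params: Dict[str, Any],
    app_name: str = "kedro-pyspark-example",  # assumed name, for illustration
    master: Optional[str] = None,
):
    """Create (or reuse) a SparkSession configured from the given settings."""
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    spark_conf = SparkConf().setAll(to_spark_conf_pairs(params))
    builder = SparkSession.builder.appName(app_name).config(conf=spark_conf)
    if master is not None:
        # For example "yarn" or "local[*]"; this is the kind of
        # customisation the paragraph above refers to.
        builder = builder.master(master)
    return builder.getOrCreate()
```

Under these assumptions, calling init_spark_session(EXAMPLE_SPARK_PARAMS, master="yarn") would request a session on YARN, provided the environment is already set up for a YARN cluster. Keeping every Spark option in the one settings dictionary mirrors the single-configuration-location approach of the starter.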