Connecting to S3 storage
To connect to S3 storage, the credential keys must be passed to the Spark pods. This could be done by hard-coding them into the application's manifest; however, to have more control over how and where the secrets are used, we have chosen to use Kubernetes secrets to store the keys.
Kubernetes secrets are a way to hold small amounts of sensitive data for use on a Kubernetes cluster. A Kubernetes user creates the secret, which is then stored on the cluster. Pods running on the cluster can be configured either to mount the secret so that its data is accessible as files within the pod, or to reference the data in the secret through environment variables. Access to secrets can be controlled through service accounts, and a secret is not accessible from outside the cluster without kubectl privileges under the correct service account. For more details on Kubernetes secrets, see the Kubernetes documentation.
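As an illustration of the two access patterns described above (all names here are hypothetical placeholders, not part of the Piezo setup), a pod could consume a secret either as environment variables or as a mounted volume:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo            # hypothetical pod name
spec:
  containers:
    - name: app
      image: busybox
      env:
        - name: MY_SETTING     # exposed as an environment variable
          valueFrom:
            secretKeyRef:
              name: my-secret  # hypothetical secret name
              key: someKey
      volumeMounts:
        - name: secret-vol     # exposed as files under /etc/secrets
          mountPath: /etc/secrets
          readOnly: true
  volumes:
    - name: secret-vol
      secret:
        secretName: my-secret
```

For connecting Spark pods to S3 we use the environment-variable approach, as described below.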
Before using a secret, it must first be loaded onto the Kubernetes cluster. For connecting to S3 storage we use the following secret definition:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: name-for-your-secrets
type: Opaque
data:
  accessKey: base64 encoded access key to s3 # echo -n "AKIAIOSFODNN7EXAMPLE" | base64
  secretKey: base64 encoded secret key to s3
```
Note: you can encode your keys using `echo -n "YOURKEY" | base64`.
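For example, encoding the example access key from the comment above, and decoding it again to check the value round-trips:

```shell
# Encode the (example) access key for the secret manifest.
# The -n flag matters: it stops echo appending a newline to the value.
echo -n "AKIAIOSFODNN7EXAMPLE" | base64
# → QUtJQUlPU0ZPRE5ON0VYQU1QTEU=

# Decode to verify the stored value
echo -n "QUtJQUlPU0ZPRE5ON0VYQU1QTEU=" | base64 -d
# → AKIAIOSFODNN7EXAMPLE
```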
Once the definition has been written, navigate to it in a terminal and run `kubectl apply -f name_of_secret_file.yaml`.
Our aim in using secrets is to have a secure way to access S3 storage when running a Spark job with the Piezo web app.
When running a Spark job that requires S3 storage, the driver and executor pods try to create a connection with the S3 interface and therefore require knowledge of the keys. When the connection is created, the Spark pods use the Hadoop configuration settings. In particular, they look for the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
The simplest way to use the data from our secret to form a connection is to set the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` within the Spark pods directly from the secret. This hides any knowledge of the secrets from the user. To set the environment variables, the following is required in the `spec.driver` and `spec.executor` sections of the manifest defining the Spark application:
```yaml
envSecretKeyRefs:
  AWS_ACCESS_KEY_ID:
    name: name_of_your_secret
    key: accessKey
  AWS_SECRET_ACCESS_KEY:
    name: name_of_your_secret
    key: secretKey
```
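As a sketch of where these entries sit, a SparkApplication manifest for the Kubernetes Spark operator would place the same block under both the driver and executor specs, roughly as follows (the apiVersion and names are placeholders and may differ for your operator version):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2  # may differ for your operator version
kind: SparkApplication
metadata:
  name: my-spark-app                      # placeholder application name
spec:
  driver:
    envSecretKeyRefs:
      AWS_ACCESS_KEY_ID:
        name: name_of_your_secret
        key: accessKey
      AWS_SECRET_ACCESS_KEY:
        name: name_of_your_secret
        key: secretKey
  executor:
    envSecretKeyRefs:
      AWS_ACCESS_KEY_ID:
        name: name_of_your_secret
        key: accessKey
      AWS_SECRET_ACCESS_KEY:
        name: name_of_your_secret
        key: secretKey
```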
In the Piezo web app this is all taken care of behind the scenes, and the user requires no knowledge of the keys or the secret to run their application.