Run Unsupported Runtimes on Cloud ML Engine using a Custom Container

This tutorial covers how to train a Keras model using the nightly build of TensorFlow via a custom container (docker image) on Cloud ML Engine. This lets you or your team test new versions of TensorFlow or other frameworks before Cloud ML Engine's training service supports that runtime. You will build a docker image that trains a Keras model on the sonar dataset from the UCI Machine Learning Repository. The model predicts whether a given sonar signal bounced off a metal cylinder or off a roughly cylindrical rock.

Citation: Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

How to run a TensorFlow nightly build using a custom container

Create your model
Create the docker image
Build the docker image
Test your docker image locally
Deploy the docker image to Cloud Container Registry
Submit your training job

Prerequisites

Before you jump in, let’s cover some of the different tools you’ll be using to get your container up and running on ML Engine.

Google Cloud Platform lets you build and host applications and websites, store data, and analyze data on Google's scalable infrastructure.

Cloud ML Engine is a managed service that enables you to easily build machine learning models that work on any type of data, of any size.

Cloud Container Registry is a single place for your team to manage Docker images, perform vulnerability analysis, and decide who can access what with fine-grained access control.

Google Cloud Storage (GCS) is unified object storage for developers and enterprises, covering live data serving, data analytics/ML, and data archiving.

Cloud SDK is a command line tool which allows you to interact with Google Cloud products. In order to run this tutorial, make sure that Cloud SDK is installed in the same environment as your Jupyter kernel.

docker is a containerization technology that allows developers to package their applications and dependencies easily so that they can be run anywhere.

Part 0: Setup

Create a project on GCP
Create a Google Cloud Storage Bucket
Enable Cloud Machine Learning Engine, Container Registry, and Compute Engine APIs
Install Cloud SDK
Install docker
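Once the Cloud SDK is installed and authenticated, part of the setup above can be done from the command line. The following is a minimal sketch, not the only way to do it. The `run` helper is hypothetical and only prints each command unless `DRY_RUN=0` is set, the placeholder values must be replaced with your own, and the three service names are assumed to be the API identifiers for Cloud ML Engine, Container Registry, and Compute Engine.

```shell
# Placeholders: replace with your own project and bucket (illustrative values).
PROJECT_ID="${PROJECT_ID:-YOUR_PROJECT_ID}"
BUCKET_ID="${BUCKET_ID:-YOUR_BUCKET_ID}"
REGION="${REGION:-us-central1}"

# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Point the Cloud SDK at your project.
run gcloud config set project "$PROJECT_ID"

# Enable the Cloud ML Engine, Container Registry, and Compute Engine APIs
# (service names are an assumption; confirm them in the API library).
run gcloud services enable ml.googleapis.com containerregistry.googleapis.com compute.googleapis.com

# Create the Cloud Storage bucket in the chosen region.
run gsutil mb -l "$REGION" "gs://$BUCKET_ID"
```

Review the printed commands first, then rerun with `DRY_RUN=0` to apply them.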

These variables will be needed for the following steps.

Replace these variables:

# PROJECT_ID: your project's id. Use the PROJECT_ID that matches your Google Cloud Platform project.
export PROJECT_ID=YOUR_PROJECT_ID

# BUCKET_ID: the name of the bucket you created above.
export BUCKET_ID=YOUR_BUCKET_ID

Additional variables:

# IMAGE_REPO_NAME: where the image will be stored on Cloud Container Registry
export IMAGE_REPO_NAME=sonar_tf_nightly_container

# IMAGE_TAG: an easily identifiable tag for your docker image
export IMAGE_TAG=sonar_tf

# IMAGE_URI: the complete URI location for Cloud Container Registry
export IMAGE_URI=gcr.io/$PROJECT_ID/$IMAGE_REPO_NAME:$IMAGE_TAG

# REGION: select a region from https://cloud.google.com/ml-engine/docs/regions
# or use the default 'us-central1'. The region is where the training job will run.
export REGION=us-central1

# JOB_NAME: the name of your job running on Cloud ML Engine.
export JOB_NAME=custom_container_tf_nightly_job_$(date +%Y%m%d_%H%M%S)
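Before moving on, it can help to confirm that none of the variables is empty or still holds a placeholder. The sketch below uses a hypothetical `check_vars` helper. The job-name rule it enforces (starts with a letter; only letters, digits, and underscores) is an assumption about Cloud ML Engine's naming requirements, so treat a failure as a hint rather than a verdict.

```shell
# Hypothetical helper: returns 0 if the variables look usable, 1 otherwise.
check_vars() {
  cv_status=0
  for cv_name in PROJECT_ID BUCKET_ID IMAGE_URI REGION JOB_NAME; do
    # Read the variable whose name is stored in cv_name.
    eval "cv_value=\${$cv_name:-}"
    case "$cv_value" in
      ""|*YOUR_*|BUCKET_ID)
        # Empty, or still one of the placeholders from the snippets above.
        echo "$cv_name is unset or still a placeholder: '$cv_value'"
        cv_status=1
        ;;
    esac
  done
  # Assumed naming rule: a letter first, then letters, digits, or underscores.
  if ! printf '%s' "${JOB_NAME:-}" | grep -Eq '^[A-Za-z][A-Za-z0-9_]*$'; then
    echo "JOB_NAME may not be a valid job name: '${JOB_NAME:-}'"
    cv_status=1
  fi
  return "$cv_status"
}

if check_vars; then
  echo "Variables look fine"
else
  echo "Fix the variables listed above before continuing"
fi
```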

Part 1: Create the model you want to train

Here we provide an example model.py that builds a Keras model to predict whether the given sonar signals are bouncing off a metal cylinder or off a cylindrical rock.

Open task.py to see exactly how the model is called during training.

data_utils.py downloads and loads the data, exports your trained model, and uploads the model to Google Cloud Storage.

The dataset for the model is hosted originally at the UCI Machine Learning Repository. We've hosted the sonar dataset in Cloud Storage for use with this sample.

Part 2: Create the docker Image

Open the Dockerfile to see how the docker image that will run on Cloud ML Engine is created.
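For orientation, a Dockerfile for this kind of trainer usually has the shape sketched below. This is an illustration and not the sample's actual Dockerfile: the base image tag, the extra pip packages, and the assumption that task.py is the entry point are all guesses based on the files described above. The snippet writes the sketch to a scratch file so that it never overwrites the real Dockerfile.

```shell
# Write the sketch to a scratch location, not over the sample's Dockerfile.
sketch="${TMPDIR:-/tmp}/Dockerfile.sketch"

cat > "$sketch" <<'EOF'
# Start from an image that ships the TensorFlow nightly build (assumed base image).
FROM tensorflow/tensorflow:nightly
WORKDIR /root
# Extra packages are an assumption; install whatever your code imports.
RUN pip install pandas google-cloud-storage
# Copy the trainer code into the image.
COPY model.py data_utils.py task.py ./
# Arguments given to "docker run" or to the training job are appended to this command.
ENTRYPOINT ["python", "task.py"]
EOF

echo "Wrote $sketch"
```

With an exec-form ENTRYPOINT like this one, flags placed after the image name are passed straight through to the training script.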

Part 3: Build the docker Image

docker build -f Dockerfile -t $IMAGE_URI ./

Part 4: Test your docker image locally

docker run $IMAGE_URI --epochs 1

If it ran successfully, the last line of output should be similar to: [0.71799388669786, 0.35714287].
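If you want to script this check, one option is to test only the shape of the last output line, since the exact numbers will differ between runs. The `looks_like_metrics` helper below is hypothetical and simply matches a bracketed pair of numbers.

```shell
# Hypothetical helper: succeed if the last line of stdin looks like "[number, number]".
looks_like_metrics() {
  tail -n 1 | grep -Eq '^\[[0-9.eE+-]+, *[0-9.eE+-]+\]$'
}

# Demonstration with the sample line from this tutorial. In practice, pipe in
# the container output instead:
#   docker run $IMAGE_URI --epochs 1 | looks_like_metrics
if printf '%s\n' '[0.71799388669786, 0.35714287]' | looks_like_metrics; then
  echo "Output ends with a pair of numbers"
else
  echo "Unexpected final line; inspect the full output"
fi
```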

Part 5: Deploy the docker image to Cloud Container Registry

Before pushing, make sure docker is configured to authenticate with Cloud Container Registry, as described in the Container Registry authentication documentation.
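If docker is not configured yet, one common route, assuming the Cloud SDK is installed and you are logged in, is to register gcloud as a docker credential helper. The `run` helper is hypothetical and prints the command unless `DRY_RUN=0` is set.

```shell
# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Register gcloud as a credential helper so docker can push to gcr.io.
# This normally needs to be done once per machine.
run gcloud auth configure-docker
```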

docker push $IMAGE_URI

Part 6: Submit your training job

Submit the training job to Cloud ML Engine using gcloud.

Note: You may need to install gcloud beta to submit the training job.

gcloud components install beta

gcloud beta ml-engine jobs submit training $JOB_NAME \
  --region $REGION \
  --master-image-uri $IMAGE_URI \
  --scale-tier BASIC \
  -- \
  --model-dir=$BUCKET_ID \
  --epochs=10
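After submitting, you can follow the job from the terminal. This is a small sketch that assumes `JOB_NAME` still holds the name used at submission (the fallback value is illustrative only). The hypothetical `run` helper prints the commands unless `DRY_RUN=0` is set.

```shell
# Illustrative fallback; normally JOB_NAME is already exported from Part 0.
JOB_NAME="${JOB_NAME:-custom_container_tf_nightly_job_example}"

# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Show the job's current state and configuration.
run gcloud ml-engine jobs describe "$JOB_NAME"

# Stream the job's log output to the terminal.
run gcloud ml-engine jobs stream-logs "$JOB_NAME"
```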

[Optional] Stackdriver Logging

You can view the logs for your training job:

Go to https://console.cloud.google.com/
Select "Logging" in the left-hand pane
Select the "Cloud ML Job" resource from the drop-down
In the filter-by-prefix box, enter the value of $JOB_NAME to view the logs
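The same logs can probably be reached from the command line with `gcloud logging read`. The sketch below only builds and prints the command. The resource type `ml_job` and the label `job_id` are assumptions about how Cloud ML Engine jobs appear in Logging, and `log_filter` is a hypothetical helper.

```shell
# Illustrative fallback; normally JOB_NAME is already exported from Part 0.
JOB_NAME="${JOB_NAME:-custom_container_tf_nightly_job_example}"

# Hypothetical helper: build a Logging filter for one training job.
# "ml_job" and "job_id" are assumed names; adjust if your logs differ.
log_filter() {
  printf 'resource.type="ml_job" AND resource.labels.job_id="%s"' "$1"
}

# Print the command to run; copy it into your shell to fetch the latest 20 entries.
echo "gcloud logging read '$(log_filter "$JOB_NAME")' --limit 20"
```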

[Optional] Verify Model File in GCS

View the contents of the destination model folder to verify that the model file has been uploaded to GCS.

Note: The model can take a few minutes to train and show up in GCS.

gsutil ls gs://$BUCKET_ID/sonar_*
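Because the model file may take a few minutes to appear, a small polling loop can save you from rerunning the listing by hand. The `wait_for` helper is a hypothetical sketch: it retries any command a fixed number of times with a pause between attempts.

```shell
# Hypothetical helper: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Retries COMMAND until it succeeds or the attempts run out.
wait_for() {
  wf_attempts=$1
  wf_delay=$2
  shift 2
  wf_i=1
  while [ "$wf_i" -le "$wf_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Pause only if another attempt remains.
    if [ "$wf_i" -lt "$wf_attempts" ]; then
      sleep "$wf_delay"
    fi
    wf_i=$((wf_i + 1))
  done
  return 1
}

# Demonstration with a command that always succeeds.
wait_for 3 0 true && echo "ready"

# Intended use (commented out so the snippet runs without gsutil):
#   wait_for 10 30 gsutil ls "gs://$BUCKET_ID/sonar_*" && echo "model uploaded"
```

Quoting the `gs://` pattern leaves the wildcard for gsutil to expand rather than the local shell.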
