Before using AI Platform with this tutorial, you should be familiar with machine learning and TensorFlow. To learn more, refer to Machine Learning Crash Course using TensorFlow APIs. For many more educational resources about machine learning, see Learn with Google AI.
Overview
This document provides an introductory, end-to-end walkthrough of training and prediction on AI Platform. You will walk through a sample that uses a census dataset to:
- Create a TensorFlow 1.14 training application and validate it locally.
- Run your training job on a single worker instance in the cloud.
- Run your training job as a distributed training job in the cloud.
- Optimize your hyperparameters by using hyperparameter tuning.
- Deploy a model to support prediction.
- Request an online prediction and see the response.
- Request a batch prediction.
What you will build
The sample builds a wide and deep model for predicting income category based on the United States Census Income Dataset. The two income categories (also known as labels) are:
- >50K—Greater than 50,000 dollars
- <=50K—Less than or equal to 50,000 dollars
Wide and deep models use deep neural nets (DNNs) to learn high-level abstractions about complex features or interactions between such features. These models then combine the outputs from the DNN with a linear regression performed on simpler features. This provides a balance between power and speed that is effective on many structured data problems.
You can read more about wide and deep models in the Google Research Blog post named Wide & Deep Learning: Better Together with TensorFlow.
The sample defines the model using TensorFlow's prebuilt DNNLinearCombinedClassifier class. The sample defines the data transformations particular to the census dataset, then assigns these (potentially) transformed features to either the DNN or the linear portion of the model.
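To make the wide-and-deep structure concrete, the following minimal sketch shows how such a model can be assembled with the TensorFlow 1.x Estimator API. The feature columns (age, education, occupation), bucket sizes, and layer sizes here are illustrative assumptions for this sketch, not the exact columns or architecture the sample defines.

# Minimal wide-and-deep sketch (TensorFlow 1.x Estimator API).
# The feature columns below are illustrative; the census sample defines its own.
import tensorflow as tf

# Simple numeric feature, usable by both the wide and deep parts.
age = tf.feature_column.numeric_column('age')

# Sparse categorical features for the wide (linear) part.
education = tf.feature_column.categorical_column_with_hash_bucket(
    'education', hash_bucket_size=100)
occupation = tf.feature_column.categorical_column_with_hash_bucket(
    'occupation', hash_bucket_size=100)

wide_columns = [
    age,
    education,
    occupation,
    # Crossed columns let the linear model memorize specific co-occurrences.
    tf.feature_column.crossed_column(['education', 'occupation'],
                                     hash_bucket_size=1000),
]

# Dense (embedded) versions of the same categorical features for the DNN part.
deep_columns = [
    age,
    tf.feature_column.embedding_column(education, dimension=8),
    tf.feature_column.embedding_column(occupation, dimension=8),
]

estimator = tf.estimator.DNNLinearCombinedClassifier(
    model_dir='output',
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 70, 50, 25])

The key design choice is simply which columns go in the linear list (memorization of sparse feature combinations) and which go in the DNN list (generalization through embeddings).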
Costs
This walkthrough uses billable components of Google Cloud, including:
- AI Platform for:
- Training
- Prediction
- Cloud Storage for:
- Storing input data for training
- Staging the training application package
- Writing training artifacts
- Storing input data files for batch prediction
Use the Pricing Calculator to generate a cost estimate based on your projected usage.
Set up and test your Cloud environment
Complete the following steps to set up a GCP account, activate the AI Platform API, and install and activate the Cloud SDK.
Set up your GCP project
- Sign in to your Google Account. If you don't already have one, sign up for a new account.
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- Enable the AI Platform Training & Prediction and Compute Engine APIs.
- Install and initialize the Cloud SDK.
Set up your environment
Choose one of the options below to set up your environment locally on macOS or in a remote environment on Cloud Shell.
For macOS users, we recommend that you set up your environment using the MACOS tab below. Cloud Shell, shown on the CLOUD SHELL tab, is available on macOS, Linux, and Windows. Cloud Shell provides a quick way to try AI Platform, but isn't suitable for ongoing development work.
macOS
- Check Python installation
  Confirm that you have Python installed and, if necessary, install it:
  python -V
- Check pip installation
  pip is Python's package manager, included with current versions of Python. Check if you already have pip installed by running pip --version. If not, see how to install pip.
  You can upgrade pip using the following command:
  pip install -U pip
  See the pip documentation for more details.
- Install virtualenv
  virtualenv is a tool to create isolated Python environments. Check if you already have virtualenv installed by running virtualenv --version. If not, install virtualenv:
  pip install --user --upgrade virtualenv
  To create an isolated development environment for this guide, create a new virtual environment in virtualenv. For example, the following commands create and activate an environment named aip-env:
  virtualenv aip-env
  source aip-env/bin/activate
- For the purposes of this tutorial, run the rest of the commands within your virtual environment.
  See more information about using virtualenv. To exit virtualenv, run deactivate.
Cloud Shell
- Open the Google Cloud Console.
- Click the Activate Google Cloud Shell button at the top of the console window.
  A Cloud Shell session opens inside a new frame at the bottom of the console and displays a command-line prompt. It can take a few seconds for the shell session to be initialized.
  Your Cloud Shell session is ready to use.
- Configure the gcloud command-line tool to use your selected project:
  gcloud config set project [selected-project-id]
  where [selected-project-id] is your project ID. (Omit the enclosing brackets.)
Python version support
The sample for this tutorial uses Python 2.7. AI Platform Training and AI Platform Prediction also support Python 3. See how to select a Python version when submitting a training job.
Download the code for this tutorial
Download the sample from the GitHub repository.
macOS
Download and extract the AI Platform sample zip file.
Open a terminal window and navigate to the directory that contains the extracted cloudml-samples-master directory.
Navigate to the cloudml-samples-master > census > estimator directory. The commands in this walkthrough must be run from the estimator directory:
cd cloudml-samples-master/census/estimator
Cloud Shell
Enter the following command to download the AI Platform sample zip file:
wget https://github.com/GoogleCloudPlatform/cloudml-samples/archive/master.zip
Unzip the file to extract the cloudml-samples-master directory:
unzip master.zip
Navigate to the cloudml-samples-master > census > estimator directory. The commands in this walkthrough must be run from the estimator directory:
cd cloudml-samples-master/census/estimator
Develop and validate your training application locally
Before you run your training application in the cloud, get it running locally. Local environments provide an efficient development and validation workflow so that you can iterate quickly. You also won't incur charges for cloud resources when debugging your application locally.
Get your training data
The relevant data files, adult.data and adult.test, are hosted in a public Cloud Storage bucket. For purposes of this sample, use the versions on Cloud Storage, which have undergone some trivial cleaning, instead of the original source data. See below for more information about the data.
You can read the data files directly from Cloud Storage or copy them to your local environment. For purposes of this sample you will download the samples for local training, and later upload them to your own Cloud Storage bucket for cloud training.
Download the data to a local file directory and set variables that point to the downloaded data files:
mkdir data
gsutil -m cp gs://cloud-samples-data/ai-platform/census/data/* data/
Set the TRAIN_DATA and EVAL_DATA variables to your local file paths. For example, the following commands set the variables to local paths:
TRAIN_DATA=$(pwd)/data/adult.data.csv
EVAL_DATA=$(pwd)/data/adult.test.csv
The data is stored in comma-separated value format, as shown by the following preview of the adult.data file:
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K
...
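If you want to explore the downloaded files before training, a small Python sketch like the following can load them locally (it assumes pandas is installed in your virtual environment). The column names listed here follow the standard UCI Adult dataset schema and are assumptions for illustration; the sample's trainer defines its own column list.

# Sketch: peek at the downloaded training data with pandas (assumed installed).
# The column names are assumptions based on the standard UCI Adult schema.
import pandas as pd

CSV_COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education_num',
    'marital_status', 'occupation', 'relationship', 'race', 'gender',
    'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',
    'income_bracket',
]

# The files have no header row, so supply the column names explicitly.
train_df = pd.read_csv('data/adult.data.csv', names=CSV_COLUMNS,
                       skipinitialspace=True)
print(train_df.head())
print(train_df['income_bracket'].value_counts())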
Install dependencies
macOS
With the virtual environment that you activated in a previous step, run the following command:
pip install tensorflow==1.14.*
Running this command installs TensorFlow 1.14, which is used in the tutorial.
Cloud Shell
Although TensorFlow is installed on Cloud Shell, you must run the following command to ensure you are using the same version of TensorFlow required by the sample:
pip install --user tensorflow==1.14.*
Running this command installs TensorFlow 1.14, which is used in the tutorial.
Run a local training job
A local training job loads your Python training program and starts a training process in an environment that's similar to that of a live AI Platform cloud training job.
Specify an output directory and set a MODEL_DIR variable. The following command sets MODEL_DIR to a value of output:
MODEL_DIR=output
It's a good practice to delete the contents of the output directory in case data remains from a previous training run. The following command deletes all data in the output directory:
rm -rf $MODEL_DIR/*
To run your training locally, run the following command:
gcloud ai-platform local train \
    --module-name trainer.task \
    --package-path trainer/ \
    --job-dir $MODEL_DIR \
    -- \
    --train-files $TRAIN_DATA \
    --eval-files $EVAL_DATA \
    --train-steps 1000 \
    --eval-steps 100
By default, verbose logging is turned off. You can enable it by setting the --verbosity flag to DEBUG. A later example shows you how to enable it.
Inspect the summary logs using TensorBoard
To see the evaluation results, you can use the visualization tool called TensorBoard. With TensorBoard, you can visualize your TensorFlow graph, plot quantitative metrics about the execution of your graph, and show additional data such as images that pass through the graph. TensorBoard is available as part of the TensorFlow installation.
Follow the steps below to launch TensorBoard and point it at the summary logs produced during training, both during and after execution.
macOS
Launch TensorBoard:
tensorboard --logdir=$MODEL_DIR
When you have started running TensorBoard, you can access it in your browser at http://localhost:6006
Cloud Shell
Launch TensorBoard:
tensorboard --logdir=$MODEL_DIR --port=8080
Select "Preview on port 8080" from the Web Preview menu at the top of the command line.
Click on Accuracy to see graphical representations of how accuracy changes as your job progresses.
You can shut down TensorBoard at any time by typing ctrl+c on the command line.
Run a local training job in distributed mode
You can test whether your model works with AI Platform's distributed execution environment by running a local training job using the --distributed flag.
Specify an output directory and set the MODEL_DIR variable again. The following command sets MODEL_DIR to a value of output-dist:
MODEL_DIR=output-dist
Delete the contents of the output directory in case data remains from a previous training run:
rm -rf $MODEL_DIR/*
Run the local train command using the --distributed option. Be sure to place the flag before the -- that separates the user arguments from the command-line arguments:
gcloud ai-platform local train \
    --module-name trainer.task \
    --package-path trainer/ \
    --job-dir $MODEL_DIR \
    --distributed \
    -- \
    --train-files $TRAIN_DATA \
    --eval-files $EVAL_DATA \
    --train-steps 1000 \
    --eval-steps 100
Inspect the output
Output files are written to the directory specified by --job-dir
, which was
set to output-dist
:
ls -R output-dist/
You should see output similar to this:
checkpoint
eval
events.out.tfevents.1488577094.<host-name>
export
graph.pbtxt
model.ckpt-1000.data-00000-of-00001
model.ckpt-1000.index
model.ckpt-1000.meta
model.ckpt-2.data-00000-of-00001
model.ckpt-2.index
model.ckpt-2.meta
output-dist//eval:
events.out.tfevents.<timestamp>.<host-name>
events.out.tfevents.<timestamp>.<host-name>
events.out.tfevents.<timestamp>.<host-name>
output-dist//export:
census
output-dist//export/census:
<timestamp>
output-dist//export/census/<timestamp>:
saved_model.pb
variables
...
Inspect the logs
Inspect the summary logs using TensorBoard the same way that you did for the single-instance training job, except that you must change the --logdir value to match the output directory name you used for distributed mode.
macOS
Launch TensorBoard:
tensorboard --logdir=$MODEL_DIR
When you have started running TensorBoard, you can access it in your browser at http://localhost:6006
Cloud Shell
Launch TensorBoard:
tensorboard --logdir=$MODEL_DIR --port=8080
Select "Preview on port 8080" from the Web Preview menu at the top of the command line.
Set up your Cloud Storage bucket
This section shows you how to create a new bucket. You can use an existing bucket, but it must be in the same region where you plan on running AI Platform jobs. Additionally, if it is not part of the project you are using to run AI Platform, you must explicitly grant access to the AI Platform service accounts.
- Specify a name for your new bucket. The name must be unique across all buckets in Cloud Storage.
  BUCKET_NAME="your_bucket_name"
  For example, use your project name with -aiplatform appended:
  PROJECT_ID=$(gcloud config list project --format "value(core.project)")
  BUCKET_NAME=${PROJECT_ID}-aiplatform
- Check the bucket name that you created.
  echo $BUCKET_NAME
- Select a region for your bucket and set a REGION environment variable.
  Use the same region where you plan on running AI Platform jobs. See the available regions for AI Platform services.
  For example, the following code creates REGION and sets it to us-central1:
  REGION=us-central1
- Create the new bucket:
  gsutil mb -l $REGION gs://$BUCKET_NAME
Upload the data files to your Cloud Storage bucket.
Use gsutil to copy the two files to your Cloud Storage bucket:
gsutil cp -r data gs://$BUCKET_NAME/data
Set the TRAIN_DATA and EVAL_DATA variables to point to the files:
TRAIN_DATA=gs://$BUCKET_NAME/data/adult.data.csv
EVAL_DATA=gs://$BUCKET_NAME/data/adult.test.csv
Use gsutil again to copy the JSON test file test.json to your Cloud Storage bucket:
gsutil cp ../test.json gs://$BUCKET_NAME/data/test.json
Set the TEST_JSON variable to point to that file:
TEST_JSON=gs://$BUCKET_NAME/data/test.json
Run a single-instance training job in the cloud
With a validated training job that runs in both single-instance and distributed mode, you're now ready to run a training job in the cloud. You'll start by requesting a single-instance training job.
Use the default BASIC
scale tier
to run a single-instance training job. The initial job request can take a few
minutes to start, but subsequent jobs run more quickly. This enables quick
iteration as you develop and validate your training job.
Select a name for the initial training run that distinguishes it from any subsequent training runs. For example, you can append a number to represent the iteration.
JOB_NAME=census_single_1
Specify a directory for output generated by AI Platform by setting an OUTPUT_PATH variable to include when requesting training and prediction jobs. The OUTPUT_PATH represents the fully qualified Cloud Storage location for model checkpoints, summaries, and exports. You can use the BUCKET_NAME variable you defined in a previous step.
It's a good practice to use the job name as the output directory. For example, the following OUTPUT_PATH points to a directory named census_single_1:
OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_NAME
Run the following command to submit a training job in the cloud that uses a single process. This time, set the --verbosity flag to DEBUG so that you can inspect the full logging output and retrieve accuracy, loss, and other metrics. The output also contains a number of other warning messages that you can ignore for the purposes of this sample.
gcloud ai-platform jobs submit training $JOB_NAME \
    --job-dir $OUTPUT_PATH \
    --runtime-version 1.14 \
    --python-version 2.7 \
    --module-name trainer.task \
    --package-path trainer/ \
    --region $REGION \
    -- \
    --train-files $TRAIN_DATA \
    --eval-files $EVAL_DATA \
    --train-steps 1000 \
    --eval-steps 100 \
    --verbosity DEBUG
You can monitor the progress of your training job by watching the command-line output or in AI Platform > Jobs on Google Cloud Console.
Inspect the output
In cloud training, outputs are written to Cloud Storage. In this sample, outputs are saved to OUTPUT_PATH; to list them, run:
gsutil ls -r $OUTPUT_PATH
The outputs should be similar to the outputs from training locally (above).
Inspect the Cloud Logging logs
Logs are a useful way to understand the behavior of your training code on the
cloud. When AI Platform runs a training job, it captures all
stdout
and stderr
streams and logging statements. These logs are stored in
Cloud Logging; they are visible both during and after execution.
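Any logging that your trainer emits through standard Python or TensorFlow logging ends up in these captured streams. As a minimal illustration (not code taken from the sample itself), a trainer module might log like this:

# Minimal illustration of trainer-side logging (not from the sample).
# Anything written this way is captured by AI Platform and surfaced in
# Cloud Logging alongside the stdout and stderr streams.
import logging

import tensorflow as tf

logging.basicConfig(level=logging.INFO)
tf.logging.set_verbosity(tf.logging.DEBUG)  # roughly what --verbosity DEBUG enables

logging.info('Starting training with train_steps=%d', 1000)
tf.logging.info('TensorFlow-level log message')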
The easiest way to find the logs for your job is to select your job in AI Platform > Jobs on Cloud Console, and then click "View logs".
If you leave "All logs" selected, you see all logs from all workers.
You can also select a specific task; master-replica-0
gives you an
overview of the job's execution from the master's perspective.
Because you selected verbose logging, you can inspect the full logging output.
Look for the term accuracy
in the logs for the AI Platform job.
If you want to view these logs in your terminal, you can do so from the command line with:
gcloud ai-platform jobs stream-logs $JOB_NAME
See all the options for gcloud ai-platform jobs stream-logs.
Inspect the summary logs using TensorBoard
You can inspect the behavior of your training job by launching TensorBoard and pointing it at the summary logs produced during training, both during and after execution.
Because the training program writes summaries directly to a Cloud Storage location, TensorBoard can read them automatically without manual copying of event files.
macOS
Launch TensorBoard:
tensorboard --logdir=$OUTPUT_PATH
When you have started running TensorBoard, you can access it in your browser at http://localhost:6006
Cloud Shell
Launch TensorBoard:
tensorboard --logdir=$OUTPUT_PATH --port=8080
Select "Preview on port 8080" from the Web Preview menu at the top of the command line.
Click on Accuracy to see graphical representations of how accuracy changes as your job progresses.
You can shut down TensorBoard at any time by typing ctrl+c on the command line.
Run distributed training in the cloud
To take advantage of Google's scalable infrastructure when running training jobs, configure your training job to run in distributed mode.
No code changes are necessary to run this model as a distributed process in AI Platform.
To run a distributed job, set
--scale-tier
to any
tier above basic. For more information about scale tiers, see the
scale tier documentation.
Select a name for your distributed training job that distinguishes it from other training jobs. For example, you could use dist to represent distributed and a number to represent the iteration:
JOB_NAME=census_dist_1
Specify OUTPUT_PATH to include the job name so that you don't inadvertently reuse checkpoints between jobs. You might have to redefine BUCKET_NAME if you've started a new command-line session since you last defined it. For example, the following OUTPUT_PATH points to a directory named census_dist_1:
OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_NAME
Run the following command to submit a training job in the cloud that uses multiple workers. Note that the job can take a few minutes to start.
Place --scale-tier before the -- that separates the user arguments from the command-line arguments. For example, the following command uses a scale tier of STANDARD_1:
gcloud ai-platform jobs submit training $JOB_NAME \
    --job-dir $OUTPUT_PATH \
    --runtime-version 1.14 \
    --python-version 2.7 \
    --module-name trainer.task \
    --package-path trainer/ \
    --region $REGION \
    --scale-tier STANDARD_1 \
    -- \
    --train-files $TRAIN_DATA \
    --eval-files $EVAL_DATA \
    --train-steps 1000 \
    --verbosity DEBUG \
    --eval-steps 100
You can monitor the progress of your job by watching the command-line output or in AI Platform > Jobs on Google Cloud Console.
Inspect the logs
Inspect the Cloud Logging logs and summary logs the same way that you did for the single-instance training job.
For Cloud Logging logs: Either select your job in AI Platform > Jobs on Cloud Console, and then click View logs or use the following command from your terminal:
gcloud ai-platform jobs stream-logs $JOB_NAME
For TensorBoard:
macOS
Launch TensorBoard:
tensorboard --logdir=$OUTPUT_PATH
When you have started running TensorBoard, you can access it in your browser at http://localhost:6006
Cloud Shell
Launch TensorBoard:
tensorboard --logdir=$OUTPUT_PATH --port=8080
Select "Preview on port 8080" from the Web Preview menu at the top of the command line.
Hyperparameter tuning
AI Platform offers hyperparameter tuning to help you maximize your model's predictive accuracy. The census sample stores the hyperparameter configuration settings in a YAML file named hptuning_config.yaml and includes the file in the training request using the --config flag.
Select a new job name and create a variable that references the configuration file:
HPTUNING_CONFIG=../hptuning_config.yaml
JOB_NAME=census_core_hptune_1
TRAIN_DATA=gs://$BUCKET_NAME/data/adult.data.csv
EVAL_DATA=gs://$BUCKET_NAME/data/adult.test.csv
Specify OUTPUT_PATH to include the job name so that you don't inadvertently reuse checkpoints between jobs. You might have to redefine BUCKET_NAME if you've started a new command-line session since you last defined it. For example, the following OUTPUT_PATH points to a directory named census_core_hptune_1:
OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_NAME
Run the following command to submit a training job that not only uses multiple workers but also uses hyperparameter tuning:
gcloud ai-platform jobs submit training $JOB_NAME \
    --stream-logs \
    --job-dir $OUTPUT_PATH \
    --runtime-version 1.14 \
    --python-version 2.7 \
    --config $HPTUNING_CONFIG \
    --module-name trainer.task \
    --package-path trainer/ \
    --region $REGION \
    --scale-tier STANDARD_1 \
    -- \
    --train-files $TRAIN_DATA \
    --eval-files $EVAL_DATA \
    --train-steps 1000 \
    --verbosity DEBUG \
    --eval-steps 100
For more information about hyperparameter tuning, see the hyperparameter tuning overview.
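Hyperparameter tuning works by launching each trial with the chosen hyperparameter values passed to your trainer as ordinary command-line arguments, so the trainer only needs to accept and use them. The flag names below are illustrative assumptions for this sketch, not the sample's exact argument list:

# Sketch: how a trainer can accept a tunable hyperparameter. The tuning
# service starts each trial with the chosen value as a command-line flag,
# for example --first-layer-size=128. Flag names here are illustrative.
import argparse

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--first-layer-size', type=int, default=100,
                        help='Number of units in the first DNN layer.')
    parser.add_argument('--job-dir', type=str, default='output',
                        help='Location to write checkpoints and exports.')
    return parser.parse_args()

if __name__ == '__main__':
    args = parse_args()
    print('Training with first_layer_size={}'.format(args.first_layer_size))

The YAML configuration file then lists each tunable parameter, its range, and the metric to optimize; the service picks values for each trial and compares the reported metric across trials.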
Deploy a model to support prediction
Choose a name for your model; this must start with a letter and contain only letters, numbers, and underscores. For example:
MODEL_NAME=census
Create an AI Platform model:
gcloud ai-platform models create $MODEL_NAME --regions=$REGION
Select the job output to use. The following sample uses the job named census_dist_1:
OUTPUT_PATH=gs://$BUCKET_NAME/census_dist_1
Look up the full path of your exported trained model binaries:
gsutil ls -r $OUTPUT_PATH/export
Find a directory named $OUTPUT_PATH/export/census/<timestamp>, copy this directory path (without the : at the end), and set the environment variable MODEL_BINARIES to its value. For example:
MODEL_BINARIES=gs://$BUCKET_NAME/census_dist_1/export/census/1487877383942/
where $BUCKET_NAME is your Cloud Storage bucket name and census_dist_1 is the output directory.
Run the following command to create a version v1:
gcloud ai-platform versions create v1 \
    --model $MODEL_NAME \
    --origin $MODEL_BINARIES \
    --runtime-version 1.14 \
    --python-version 2.7
You can get a list of your models using the models list
command.
gcloud ai-platform models list
Send an online prediction request to a deployed model
You can now send prediction requests to your model. For example, the following
command sends an online prediction request using a test.json
file that you
downloaded as part of the sample GitHub repository.
gcloud ai-platform predict \
--model $MODEL_NAME \
--version v1 \
--json-instances ../test.json
The response includes the probabilities of each label (>50K and <=50K)
based on the data entry in test.json
, thus indicating whether the
predicted income is greater than or less than 50,000 dollars.
The response looks like this:
CLASSES       PROBABILITIES
[u'0', u'1']  [0.9969545602798462, 0.0030454816296696663]
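If you prefer to call the service programmatically instead of using gcloud, a sketch like the following uses the Google API Python client to send the same kind of online prediction request. The project ID is a placeholder, and the sketch assumes that test.json contains one JSON instance per line whose keys match the inputs your exported model expects.

# Sketch: online prediction with the Google API Python client
# (pip install google-api-python-client). YOUR_PROJECT_ID is a placeholder.
import json

import googleapiclient.discovery

PROJECT_ID = 'YOUR_PROJECT_ID'
MODEL_NAME = 'census'
VERSION_NAME = 'v1'

# Read the same newline-delimited JSON instances that ../test.json contains.
with open('../test.json') as f:
    instances = [json.loads(line) for line in f if line.strip()]

service = googleapiclient.discovery.build('ml', 'v1')
name = 'projects/{}/models/{}/versions/{}'.format(
    PROJECT_ID, MODEL_NAME, VERSION_NAME)

response = service.projects().predict(
    name=name,
    body={'instances': instances},
).execute()

if 'error' in response:
    raise RuntimeError(response['error'])

for prediction in response['predictions']:
    print(prediction)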
Submit a batch prediction job
The batch prediction service is useful if you have a large amount of data and no latency requirements for receiving prediction results. Batch prediction uses the same input format as online prediction, but requires that the data be stored in Cloud Storage.
Set a name for the job.
JOB_NAME=census_prediction_1
Set the output path.
OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_NAME
Submit the prediction job.
gcloud ai-platform jobs submit prediction $JOB_NAME \
    --model $MODEL_NAME \
    --version v1 \
    --data-format text \
    --region $REGION \
    --input-paths $TEST_JSON \
    --output-path $OUTPUT_PATH/predictions
Unlike the previous commands, this one returns immediately. Check the progress of the job and wait for it to finish:
gcloud ai-platform jobs describe $JOB_NAME
You should see state: SUCCEEDED
once the job completes; this may take several
minutes. You can also see the job logs in your terminal using
gcloud ai-platform jobs stream-logs $JOB_NAME
Alternatively, you can check the progress in AI Platform > Jobs on Cloud Console.
After the job succeeds, you can:
Read the output summary.
gsutil cat $OUTPUT_PATH/predictions/prediction.results-00000-of-00001
You should see output similar to the following.
{"probabilities": [0.9962924122810364, 0.003707568161189556], "logits": [-5.593664646148682], "classes": 0, "logistic": [0.003707568161189556]}
List the other files that the job produced using the gsutil ls command:
gsutil ls -r $OUTPUT_PATH
Compared to online prediction, batch prediction:
- Is slower for this small number of instances (but is more suitable for large numbers of instances).
- May return output in a different order than the input (but the numeric index allows each output to be matched to its corresponding input instance; this is not necessary for online prediction since the outputs are returned in the same order as the original input instances).
After the predictions are available, the next step is usually to ingest these predictions into a database or data processing pipeline.
In this sample, you deployed the model before running the batch prediction, but it's possible to skip that step by specifying the model binaries URI when you submit the batch prediction job. One advantage of generating predictions from a model before deploying it is that you can evaluate the model's performance on different evaluation datasets to help you decide whether the model meets your criteria for deployment.
Cleaning up
If you've finished analyzing the output from your training and prediction runs, you can avoid incurring additional charges to your Google Cloud account by deleting the Cloud Storage directories used in this guide:
Open a terminal window (if not already open).
Use the gsutil rm command with the -r flag to delete the directory that contains your most recent job:
gsutil rm -r gs://$BUCKET_NAME/$JOB_NAME
If successful, the command returns a message similar to this:
Removing gs://my-awesome-bucket/just-a-folder/cloud-storage.logo.png#1456530077282000...
Removing gs://my-awesome-bucket/...
Repeat the command for any other directories that you created for this sample.
Alternatively, if you have no other data stored in the bucket, you can run the
gsutil rm -r
command on the bucket itself.
About the data
The Census Income Data Set that this sample uses for training is hosted by the UC Irvine Machine Learning Repository.
Census data courtesy of Lichman, M. (2013). UCI Machine Learning Repository http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://archive.ics.uci.edu/ml - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
What's next
You've now completed a walkthrough of an AI Platform sample that uses census data for training and prediction. You validated the training job locally, ran it in the cloud in both single-instance and distributed mode, used hyperparameter tuning to improve the model, and used the model to get online and batch predictions.
The following resources can help you continue learning about AI Platform.
- Learn the basics of AI Platform.
- See how to use GPUs and Cloud TPU nodes to train your models on AI Platform.
- Follow an example that trains and predicts using low-level TensorFlow.