This sample trains a model to predict a person's income level based on the Census Income Data Set. After you train and save the model locally, you deploy it to AI Platform Prediction and query it to get online predictions.
You can deploy and serve scikit-learn pipelines on AI Platform Prediction. The Pipeline module in scikit-learn enables you to apply multiple data transformations before training with an estimator. This encapsulates multiple steps in data processing and ensures that the same training data is used in each step.
This tutorial is also available on GitHub as a Jupyter notebook.
How to bring your model to AI Platform Prediction
You can bring your model to AI Platform Prediction to get predictions in five steps:
- Save your model to a file
- Upload the saved model to Cloud Storage
- Create a model resource on AI Platform Prediction
- Create a model version, linking your saved model
- Make an online prediction
Before you begin
Complete the following steps to set up a GCP account, activate the AI Platform Prediction API, and install and activate the Cloud SDK.
Set up your GCP project
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the AI Platform Training & Prediction and Compute Engine APIs.
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
gcloud init
Set up your environment
Choose one of the options below to set up your environment locally on macOS or in a remote environment on Cloud Shell.
For macOS users, we recommend that you set up your environment using the MACOS tab below. Cloud Shell, shown on the CLOUD SHELL tab, is available on macOS, Linux, and Windows. Cloud Shell provides a quick way to try AI Platform Prediction, but isn't suitable for ongoing development work.
macOS
- Check Python installation
Confirm that you have Python installed and, if necessary, install it.
python -V
- Check pip installation
pip is Python's package manager, included with current versions of Python. Check if you already have pip installed by running pip --version. If not, see how to install pip.
You can upgrade pip using the following command:
pip install -U pip
See the pip documentation for more details.
- Install virtualenv
virtualenv is a tool to create isolated Python environments. Check if you already have virtualenv installed by running virtualenv --version. If not, install virtualenv:
pip install --user --upgrade virtualenv
To create an isolated development environment for this guide, create a new virtual environment in virtualenv. For example, the following commands create and activate an environment named aip-env:
virtualenv aip-env
source aip-env/bin/activate
- For the purposes of this tutorial, run the rest of the commands within your virtual environment.
See more information about using virtualenv. To exit virtualenv, run deactivate.
Cloud Shell
- Open the Google Cloud console.
- Click the Activate Google Cloud Shell button at the top of the console window.
A Cloud Shell session opens inside a new frame at the bottom of the console and displays a command-line prompt. It can take a few seconds for the shell session to be initialized.
Your Cloud Shell session is ready to use.
- Configure the gcloud command-line tool to use your selected project.
gcloud config set project [selected-project-id]
where [selected-project-id] is your project ID. (Omit the enclosing brackets.)
Install frameworks
macOS
Within your virtual environment, run the following command to install the versions of scikit-learn and pandas used in AI Platform Prediction runtime version 2.11:
(aip-env)$ pip install scikit-learn==1.0.2 pandas==1.3.5
By providing version numbers in the preceding command, you ensure that the dependencies in your virtual environment match the dependencies in the runtime version. This helps prevent unexpected behavior when your code runs on AI Platform Prediction.
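One way to confirm that your environment matches the pinned versions is to compare the installed package versions against the pins before you train. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata). The helper name find_version_mismatches and the PINNED_VERSIONS dictionary are illustrative and are not part of any AI Platform Prediction tooling.

```python
from importlib import metadata

# Versions pinned in the pip command above (runtime version 2.11).
PINNED_VERSIONS = {"scikit-learn": "1.0.2", "pandas": "1.3.5"}


def find_version_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that differs.

    `found` is None when the package is not installed.
    `get_version` is injectable so the check can be exercised without
    touching the real environment.
    """
    mismatches = {}
    for package, expected in pins.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


# Report any package whose installed version differs from the pin.
for package, (expected, found) in find_version_mismatches(PINNED_VERSIONS).items():
    print("{}: expected {}, found {}".format(package, expected, found))
```

An empty result means the local versions match the pins, so a model exported locally should load with the same library versions on the service.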
For more details, installation options, and troubleshooting information, refer to the installation instructions for each framework.
Cloud Shell
Run the following command to install scikit-learn and pandas:
pip install --user scikit-learn pandas
For more details, installation options, and troubleshooting information, refer to the installation instructions for each framework.
Download the data
The Census Income Data Set that this sample uses for training is hosted by the UC Irvine Machine Learning Repository. See About the data for more information.
- Training file is adult.data
- Evaluation file is adult.test
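The training code later in this tutorial reads both files from a local census_data directory. The following is a minimal sketch of fetching them with the Python standard library. The base URL is an assumption about where the UC Irvine repository serves the files, so verify it before use; the helper names plan_downloads and download_all are illustrative.

```python
import os
import urllib.request

# Assumed location of the files on the UCI repository; verify before use.
BASE_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/"
# Directory that the training code in this tutorial reads from.
DATA_DIR = "census_data"
FILES = ("adult.data", "adult.test")


def plan_downloads(base_url=BASE_URL, data_dir=DATA_DIR, files=FILES):
    """Return a list of (source_url, local_path) pairs, one per file."""
    return [(base_url + name, os.path.join(data_dir, name)) for name in files]


def download_all(plan):
    """Download each file in the plan, creating the target directory if needed."""
    for url, path in plan:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        urllib.request.urlretrieve(url, path)


# Uncomment to fetch the files (requires network access):
# download_all(plan_downloads())
```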
Train and save a model
To train and save a model, complete the following steps:
- Load the data into a pandas DataFrame to prepare it for use with scikit-learn.
- Train a simple model in scikit-learn.
- Save the model to a file that can be uploaded to AI Platform Prediction.
If you already have a trained model to upload, see how to export your model.
Load and transform data
You can export Pipeline objects using joblib or pickle, similarly to how you export scikit-learn estimators. The following example uses Pipelines to convert individual categorical features to numerical values, combines them, and uses a RandomForestClassifier to train the model.
import joblib
import json
import numpy as np
import os
import pandas as pd
import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer
# Define the format of your input data, including unused columns.
# These are the columns from the census data files.
COLUMNS = (
    'age',
    'workclass',
    'fnlwgt',
    'education',
    'education-num',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'capital-gain',
    'capital-loss',
    'hours-per-week',
    'native-country',
    'income-level'
)
# Categorical columns are columns that need to be turned into a numerical value to be used by scikit-learn
CATEGORICAL_COLUMNS = (
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country'
)
# Load the training census dataset
with open('./census_data/adult.data', 'r') as train_data:
    raw_training_data = pd.read_csv(train_data, header=None, names=COLUMNS)
# Remove the column we are trying to predict ('income-level') from our features list
# Convert the DataFrame to a list of lists
train_features = raw_training_data.drop('income-level', axis=1).to_numpy().tolist()
# Create our training labels list
train_labels = (raw_training_data['income-level'] == ' >50K').to_numpy().tolist()
# Load the test census dataset
with open('./census_data/adult.test', 'r') as test_data:
    raw_testing_data = pd.read_csv(test_data, names=COLUMNS, skiprows=1)
# Remove the column we are trying to predict ('income-level') from our features list
# Convert the DataFrame to a list of lists
test_features = raw_testing_data.drop('income-level', axis=1).to_numpy().tolist()
# Create our testing labels list
test_labels = (raw_testing_data['income-level'] == ' >50K.').to_numpy().tolist()
# Since the census data set has categorical features, we need to convert
# them to numerical values. We'll use a list of pipelines to convert each
# categorical column and then use FeatureUnion to combine them before calling
# the RandomForestClassifier.
categorical_pipelines = []
# Each categorical column needs to be extracted individually and converted to a numerical value.
# To do this, each categorical column will use a pipeline that extracts one feature column via
# SelectKBest(k=1) and a LabelBinarizer() to convert the categorical value to a numerical one.
# A scores array (created below) will select and extract the feature column. The scores array is
# created by iterating over the COLUMNS and checking if it is a CATEGORICAL_COLUMN.
for i, col in enumerate(COLUMNS[:-1]):
    if col in CATEGORICAL_COLUMNS:
        # Create a scores array to get the individual categorical column.
        # Example:
        #  data = [39, 'State-gov', 77516, 'Bachelors', 13, 'Never-married', 'Adm-clerical',
        #          'Not-in-family', 'White', 'Male', 2174, 0, 40, 'United-States']
        #  scores = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        #
        # Returns: [['State-gov']]
        scores = []
        # Build the scores array
        for j in range(len(COLUMNS[:-1])):
            if i == j:  # This column is the categorical column we want to extract.
                scores.append(1)  # Set to 1 to select this column
            else:  # Every other column should be ignored.
                scores.append(0)
        skb = SelectKBest(k=1)
        skb.scores_ = scores
        # Convert the categorical column to a numerical value
        lbn = LabelBinarizer()
        r = skb.transform(train_features)
        lbn.fit(r)
        # Create the pipeline to extract the categorical feature
        categorical_pipelines.append(
            ('categorical-{}'.format(i), Pipeline([
                ('SKB-{}'.format(i), skb),
                ('LBN-{}'.format(i), lbn)])))
# Create pipeline to extract the numerical features
skb = SelectKBest(k=6)
# From COLUMNS use the features that are numerical
skb.scores_ = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0]
categorical_pipelines.append(('numerical', skb))
# Combine all the features using FeatureUnion
preprocess = FeatureUnion(categorical_pipelines)
# Create the classifier
classifier = RandomForestClassifier()
# Transform the features and fit them to the classifier
classifier.fit(preprocess.transform(train_features), train_labels)
# Create the overall model as a single pipeline
pipeline = Pipeline([
    ('union', preprocess),
    ('classifier', classifier)
])
Export your model
To export your model, you can use joblib or the Python pickle library:
joblib
import joblib
# Export the model to a file
joblib.dump(pipeline, 'model.joblib')
pickle
import pickle
# Export the model to a file
with open('model.pkl', 'wb') as model_file:
    pickle.dump(pipeline, model_file)
Model file naming requirements
The saved model file that you upload to Cloud Storage must be named either model.pkl or model.joblib, depending on which library you used. This restriction ensures that AI Platform Prediction uses the same pattern to reconstruct the model on import as was used during export.
Library used to export model | Correct model name |
---|---|
pickle | model.pkl |
joblib | model.joblib |
For future iterations of your model, organize your Cloud Storage bucket so that each new model has a dedicated directory.
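The naming rule and the one-directory-per-model layout can be combined in a small helper that builds the upload destination. The following is a minimal sketch; the function name model_object_uri and the example bucket and directory names are hypothetical.

```python
import posixpath

# Required file names from the table above, keyed by export library.
REQUIRED_MODEL_FILENAMES = {"pickle": "model.pkl", "joblib": "model.joblib"}


def model_object_uri(bucket_name, model_dir, library):
    """Build the Cloud Storage URI for a model exported with `library`.

    Each model gets its own directory (`model_dir`) inside the bucket, and
    the file name is fixed by the library that was used for the export.
    """
    if library not in REQUIRED_MODEL_FILENAMES:
        raise ValueError("Unsupported export library: {!r}".format(library))
    filename = REQUIRED_MODEL_FILENAMES[library]
    return "gs://" + posixpath.join(bucket_name, model_dir.strip("/"), filename)


# Hypothetical bucket and directory names, for illustration only.
print(model_object_uri("my-bucket", "census/v1", "joblib"))
```

The directory part of this URI (everything before the file name) is what you later supply as the model directory when you create a version.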
Store your model in Cloud Storage
For the purposes of this tutorial, it is easiest to use a dedicated Cloud Storage bucket in the same project you're using for AI Platform Prediction.
If you're using a bucket in a different project, you must ensure that your AI Platform Prediction service account can access your model in Cloud Storage. Without the appropriate permissions, your request to create an AI Platform Prediction model version fails. See more about granting permissions for storage.
Set up your Cloud Storage bucket
This section shows you how to create a new bucket. You can use an existing bucket, but it must be in the same region where you plan on running AI Platform jobs. Additionally, if it is not part of the project you are using to run AI Platform Prediction, you must explicitly grant access to the AI Platform Prediction service accounts.
- Specify a name for your new bucket. The name must be unique across all buckets in Cloud Storage.
BUCKET_NAME="YOUR_BUCKET_NAME"
For example, use your project name with -aiplatform appended:
PROJECT_ID=$(gcloud config list project --format "value(core.project)")
BUCKET_NAME=${PROJECT_ID}-aiplatform
- Check the bucket name that you created.
echo $BUCKET_NAME
- Select a region for your bucket and set a REGION environment variable.
Use the same region where you plan on running AI Platform Prediction jobs. See the available regions for AI Platform Prediction services.
For example, the following code creates REGION and sets it to us-central1:
REGION=us-central1
- Create the new bucket:
gcloud storage buckets create gs://$BUCKET_NAME --location=$REGION
Upload the exported model file to Cloud Storage
Run the following command to upload your saved model file to your bucket in Cloud Storage:
gcloud storage cp ./model.joblib gs://your_bucket_name/model.joblib
Format data for prediction
Before you send an online prediction request, you must format your test data to prepare it for use by the AI Platform Prediction prediction service. Make sure that the format of your input instances matches what your model expects.
gcloud
Create an input.json file with each input instance on a separate line. The following example uses the first ten data instances in the test_features list that was defined in previous steps.
[25, "Private", 226802, "11th", 7, "Never-married", "Machine-op-inspct", "Own-child", "Black", "Male", 0, 0, 40, "United-States"]
[38, "Private", 89814, "HS-grad", 9, "Married-civ-spouse", "Farming-fishing", "Husband", "White", "Male", 0, 0, 50, "United-States"]
[28, "Local-gov", 336951, "Assoc-acdm", 12, "Married-civ-spouse", "Protective-serv", "Husband", "White", "Male", 0, 0, 40, "United-States"]
[44, "Private", 160323, "Some-college", 10, "Married-civ-spouse", "Machine-op-inspct", "Husband", "Black", "Male", 7688, 0, 40, "United-States"]
[18, "?", 103497, "Some-college", 10, "Never-married", "?", "Own-child", "White", "Female", 0, 0, 30, "United-States"]
[34, "Private", 198693, "10th", 6, "Never-married", "Other-service", "Not-in-family", "White", "Male", 0, 0, 30, "United-States"]
[29, "?", 227026, "HS-grad", 9, "Never-married", "?", "Unmarried", "Black", "Male", 0, 0, 40, "United-States"]
[63, "Self-emp-not-inc", 104626, "Prof-school", 15, "Married-civ-spouse", "Prof-specialty", "Husband", "White", "Male", 3103, 0, 32, "United-States"]
[24, "Private", 369667, "Some-college", 10, "Never-married", "Other-service", "Unmarried", "White", "Female", 0, 0, 40, "United-States"]
[55, "Private", 104996, "7th-8th", 4, "Married-civ-spouse", "Craft-repair", "Husband", "White", "Male", 0, 0, 10, "United-States"]
Note that the format of input instances needs to match what your model expects. In this example, the Census model requires 14 features, so your input must be a matrix of shape (num_instances, 14).
REST API
Create an input.json file formatted with each input instance on a separate line. The following example uses the first ten data instances in the test_features list that was defined in previous steps.
{
"instances": [
[25, "Private", 226802, "11th", 7, "Never-married", "Machine-op-inspct", "Own-child", "Black", "Male", 0, 0, 40, "United-States"],
[38, "Private", 89814, "HS-grad", 9, "Married-civ-spouse", "Farming-fishing", "Husband", "White", "Male", 0, 0, 50, "United-States"],
[28, "Local-gov", 336951, "Assoc-acdm", 12, "Married-civ-spouse", "Protective-serv", "Husband", "White", "Male", 0, 0, 40, "United-States"],
[44, "Private", 160323, "Some-college", 10, "Married-civ-spouse", "Machine-op-inspct", "Husband", "Black", "Male", 7688, 0, 40, "United-States"],
[18, "?", 103497, "Some-college", 10, "Never-married", "?", "Own-child", "White", "Female", 0, 0, 30, "United-States"],
[34, "Private", 198693, "10th", 6, "Never-married", "Other-service", "Not-in-family", "White", "Male", 0, 0, 30, "United-States"],
[29, "?", 227026, "HS-grad", 9, "Never-married", "?", "Unmarried", "Black", "Male", 0, 0, 40, "United-States"],
[63, "Self-emp-not-inc", 104626, "Prof-school", 15, "Married-civ-spouse", "Prof-specialty", "Husband", "White", "Male", 3103, 0, 32, "United-States"],
[24, "Private", 369667, "Some-college", 10, "Never-married", "Other-service", "Unmarried", "White", "Female", 0, 0, 40, "United-States"],
[55, "Private", 104996, "7th-8th", 4, "Married-civ-spouse", "Craft-repair", "Husband", "White", "Male", 0, 0, 10, "United-States"]
]
}
Note that the format of input instances needs to match what your model expects. In this example, the Census model requires 14 features, so your input must be a matrix of shape (num_instances, 14).
See more information on formatting your input for online prediction.
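Both input formats can be generated from the same list of instances. The following is a minimal sketch using only the standard library; the helper names are illustrative, and the 14-feature check reflects the Census model used in this tutorial rather than a general requirement.

```python
import json

NUM_FEATURES = 14  # The Census model in this tutorial expects 14 features.


def check_instances(instances, num_features=NUM_FEATURES):
    """Raise ValueError if any instance does not have `num_features` values."""
    for index, instance in enumerate(instances):
        if len(instance) != num_features:
            raise ValueError(
                "Instance {} has {} values, expected {}".format(
                    index, len(instance), num_features))


def to_gcloud_format(instances):
    """One JSON array per line, the layout shown for the gcloud tool."""
    check_instances(instances)
    return "\n".join(json.dumps(instance) for instance in instances) + "\n"


def to_rest_format(instances):
    """A single JSON object with an "instances" key, for REST requests."""
    check_instances(instances)
    return json.dumps({"instances": instances}, indent=2)


# To write the file for the gcloud tool from the tutorial's test_features:
# with open('input.json', 'w') as input_file:
#     input_file.write(to_gcloud_format(test_features[:10]))
```

Validating the shape before writing the file surfaces a malformed instance locally instead of as a failed prediction request.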
Test your model with local predictions
You can use the gcloud ai-platform local predict command to test how your model serves predictions before you deploy it to AI Platform Prediction. The command uses dependencies in your local environment to perform prediction and returns results in the same format that gcloud ai-platform predict uses when it performs online predictions. Testing predictions locally can help you discover errors before you incur costs for online prediction requests.
For the --model-dir argument, specify a directory containing your exported machine learning model, either on your local machine or in Cloud Storage. For the --framework argument, specify tensorflow, scikit-learn, or xgboost. You cannot use the gcloud ai-platform local predict command with a custom prediction routine.
The following example shows how to perform local prediction:
gcloud ai-platform local predict --model-dir LOCAL_OR_CLOUD_STORAGE_PATH_TO_MODEL_DIRECTORY/ \
--json-instances LOCAL_PATH_TO_PREDICTION_INPUT.JSON \
--framework NAME_OF_FRAMEWORK
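If you drive local prediction from a script, you can assemble the same command as an argument list. The following is a minimal sketch that uses only the flags shown above; the function name local_predict_command and the ./model_dir/ path are hypothetical.

```python
import shlex

# Framework names accepted by the --framework argument, per the text above.
SUPPORTED_FRAMEWORKS = ("tensorflow", "scikit-learn", "xgboost")


def local_predict_command(model_dir, json_instances, framework):
    """Return the gcloud argument list for a local prediction run."""
    if framework not in SUPPORTED_FRAMEWORKS:
        raise ValueError(
            "framework must be one of {}".format(SUPPORTED_FRAMEWORKS))
    return [
        "gcloud", "ai-platform", "local", "predict",
        "--model-dir", model_dir,
        "--json-instances", json_instances,
        "--framework", framework,
    ]


# Hypothetical local model directory, for illustration only.
command = local_predict_command("./model_dir/", "input.json", "scikit-learn")
print(" ".join(shlex.quote(part) for part in command))
# To execute it, pass the list to subprocess.run(command, check=True).
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.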
Deploy models and versions
AI Platform Prediction organizes your trained models using model and version resources. An AI Platform Prediction model is a container for the versions of your machine learning model.
To deploy a model, you create a model resource in AI Platform Prediction, create a version of that model, then link the model version to the model file stored in Cloud Storage.
Create a model resource
AI Platform Prediction uses model resources to organize different versions of your model.
You must decide at this time whether you want model versions belonging to this model to use a regional endpoint or the global endpoint. In most cases, choose a regional endpoint. If you need functionality that is only available on legacy (MLS1) machine types, then use the global endpoint.
You must also decide at this time if you want model versions belonging to this model to export any logs when they serve predictions. The following examples do not enable logging. Learn how to enable logging.
console
Open the AI Platform Prediction Models page in the Google Cloud console:
Click the New Model button at the top of the Models page. This brings you to the Create model page.
Enter a unique name for your model in the Model name field.
When the Use regional endpoint checkbox is selected, AI Platform Prediction uses a regional endpoint. To use the global endpoint instead, clear the Use regional endpoint checkbox.
From the Region drop-down list, select a location for your prediction nodes. The available regions differ depending on whether you use a regional endpoint or the global endpoint.
Click Create.
Verify that you have returned to the Models page, and that your new model appears in the list.
gcloud
Regional endpoint
Run the following command:
gcloud ai-platform models create MODEL_NAME \
--region=REGION
Replace the following:
- MODEL_NAME: A name that you choose for your model.
- REGION: The region of the regional endpoint where you want prediction nodes to run. This must be a region that supports Compute Engine (N1) machine types.
If you don't specify the --region flag, then the gcloud CLI prompts you to select a regional endpoint (or to use us-central1 on the global endpoint).
Alternatively, you can set the ai_platform/region property to a specific region in order to make sure the gcloud CLI always uses the corresponding regional endpoint for AI Platform Prediction, even when you don't specify the --region flag. (This configuration doesn't apply to commands in the gcloud ai-platform operations command group.)
Global endpoint
Run the following command:
gcloud ai-platform models create MODEL_NAME \
--regions=REGION
Replace the following:
- MODEL_NAME: A name that you choose for your model.
- REGION: The region on the global endpoint where you want prediction nodes to run. This must be a region that supports legacy (MLS1) machine types.
If you don't specify the --regions flag, then the gcloud CLI prompts you to select a regional endpoint (or to use us-central1 on the global endpoint).
REST API
Regional endpoint
Format your request by placing the model object in the request body. At minimum, specify a name for your model by replacing MODEL_NAME in the following sample:
{ "name": "MODEL_NAME" }
Make a REST API call to the following URL, replacing PROJECT_ID with your Google Cloud project ID:
POST https://REGION-ml.googleapis.com/v1/projects/PROJECT_ID/models/
Replace the following:
REGION: The region of the regional endpoint to deploy your model to. This must be a region that supports Compute Engine (N1) machine types.
PROJECT_ID: Your Google Cloud project ID.
For example, you can make the following request using the curl command. This command authorizes the request using the credentials associated with your Google Cloud CLI installation.
curl -X POST -H "Content-Type: application/json" \
  -d '{"name": "MODEL_NAME"}' \
  -H "Authorization: Bearer `gcloud auth print-access-token`" \
  "https://REGION-ml.googleapis.com/v1/projects/PROJECT_ID/models"
The API returns a response similar to the following:
{
  "name": "projects/PROJECT_ID/models/MODEL_NAME",
  "regions": [
    "REGION"
  ]
}
Global endpoint
Format your request by placing the model object in the request body. At minimum, specify a name for your model by replacing MODEL_NAME in the following sample, and specify a region by replacing REGION with a region that supports legacy (MLS1) machine types:
{ "name": "MODEL_NAME", "regions": ["REGION"] }
Make a REST API call to the following URL, replacing PROJECT_ID with your Google Cloud project ID:
POST https://ml.googleapis.com/v1/projects/PROJECT_ID/models/
For example, you can make the following request using the curl command. This command authorizes the request using the credentials associated with your Google Cloud CLI installation.
curl -X POST -H "Content-Type: application/json" \
  -d '{"name": "MODEL_NAME", "regions": ["REGION"]}' \
  -H "Authorization: Bearer `gcloud auth print-access-token`" \
  "https://ml.googleapis.com/v1/projects/PROJECT_ID/models"
The API returns a response similar to the following:
{
  "name": "projects/PROJECT_ID/models/MODEL_NAME",
  "regions": [
    "REGION"
  ]
}
See the AI Platform Prediction model API for more details.
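The difference between the two endpoints comes down to the host name and whether the region appears in the request body. The following is a minimal sketch that builds, but does not send, the request with the standard library. The function names and parameters are illustrative, and the access token is assumed to come from gcloud auth print-access-token.

```python
import json
import urllib.request


def models_endpoint(project_id, region=None):
    """Return the models collection URL.

    With `region`, the regional endpoint is used; without it, the global one.
    """
    host = "{}-ml.googleapis.com".format(region) if region else "ml.googleapis.com"
    return "https://{}/v1/projects/{}/models".format(host, project_id)


def create_model_request(project_id, model_name, access_token,
                         region=None, global_region=None):
    """Build (but do not send) the POST request that creates a model resource."""
    body = {"name": model_name}
    if region is None:
        # On the global endpoint, the region goes in the request body.
        if global_region is None:
            raise ValueError("global_region is required on the global endpoint")
        body["regions"] = [global_region]
    return urllib.request.Request(
        models_endpoint(project_id, region),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer {}".format(access_token),
        },
        method="POST",
    )


# To send the request: urllib.request.urlopen(request)
```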
Create a model version
Now you are ready to create a model version with the trained model you previously uploaded to Cloud Storage. When you create a version, you can specify a number of parameters. The following list describes common parameters, some of which are required:
- name: must be unique within the AI Platform Prediction model.
- deploymentUri: the path to your model directory in Cloud Storage.
  - If you're deploying a TensorFlow model, this is a SavedModel directory.
  - If you're deploying a scikit-learn or XGBoost model, this is the directory containing your model.joblib, model.pkl, or model.bst file.
  - If you're deploying a custom prediction routine, this is the directory containing all your model artifacts. The total size of this directory must be 500 MB or less.
- framework: TENSORFLOW, SCIKIT_LEARN, or XGBOOST.
- runtimeVersion: a runtime version based on the dependencies your model needs. If you're deploying a scikit-learn model or an XGBoost model, this must be at least 1.4. If you plan to use the model version for batch prediction, then you must use runtime version 2.1 or earlier.
- pythonVersion: must be set to "3.5" (for runtime versions 1.4 through 1.14) or "3.7" (for runtime versions 1.15 and later) to be compatible with model files exported using Python 3. Can also be set to "2.7" if used with runtime version 1.15 or earlier.
- machineType (optional): the type of virtual machine that AI Platform Prediction uses for the nodes that serve predictions. Learn more about machine types. If not set, this defaults to n1-standard-2 on regional endpoints and mls1-c1-m2 on the global endpoint.
See more information about each of these parameters, as well as additional less common parameters, in the API reference for the version resource.
Additionally, if you created your model on a regional endpoint, make sure to also create the version on the same regional endpoint.
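The common parameters can be assembled into a version object and checked before you send a request. The following is a minimal sketch; the function name version_body is illustrative, the defaults mirror the runtime and Python versions used in this tutorial, and the checks cover only the rules stated above.

```python
FRAMEWORKS = ("TENSORFLOW", "SCIKIT_LEARN", "XGBOOST")
# File suffixes that suggest a model file was given instead of its directory.
MODEL_FILE_SUFFIXES = (".joblib", ".pkl", ".bst", ".pb")


def version_body(name, deployment_uri, framework, runtime_version="2.11",
                 python_version="3.7", machine_type=None):
    """Assemble a version object from the common parameters."""
    if framework not in FRAMEWORKS:
        raise ValueError("framework must be one of {}".format(FRAMEWORKS))
    if not deployment_uri.startswith("gs://"):
        raise ValueError("deploymentUri must be a Cloud Storage path")
    if deployment_uri.endswith(MODEL_FILE_SUFFIXES):
        raise ValueError("deploymentUri must be the model directory, not the file")
    body = {
        "name": name,
        "deploymentUri": deployment_uri,
        "framework": framework,
        "runtimeVersion": runtime_version,
        "pythonVersion": python_version,
    }
    if machine_type is not None:
        # Optional; the service picks a default when this is omitted.
        body["machineType"] = machine_type
    return body
```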
console
Open the AI Platform Prediction Models page in the Google Cloud console:
On the Models page, select the name of the model resource you would like to use to create your version. This brings you to the Model Details page.
Click the New Version button at the top of the Model Details page. This brings you to the Create version page.
Enter your version name in the Name field. Optionally, enter a description for your version in the Description field.
Enter the following information about how you trained your model in the corresponding dropdown boxes:
- Select the Python version you used to train your model.
- Select the Framework and Framework version.
- Select the ML runtime version. Learn more about AI Platform Prediction runtime versions.
Select a Machine type to run online prediction.
In the Model URI field, enter the Cloud Storage bucket location where you uploaded your model file. You may use the Browse button to find the correct path.
Make sure to specify the path to the directory containing the file, not the path to the model file itself. For example, use gs://your_bucket_name/model-dir/ instead of gs://your_bucket_name/model-dir/saved_model.pb or gs://your_bucket_name/model-dir/model.pkl.
Select a Scaling option for online prediction deployment:
If you select "Auto scaling", the optional Minimum number of nodes field displays. You can enter the minimum number of nodes to keep running at all times, when the service has scaled down.
If you select "Manual scaling", you must enter the Number of nodes you want to keep running at all times.
Learn how scaling options differ depending on machine type.
Learn more about pricing for prediction costs.
To finish creating your model version, click Save.
gcloud
Set environment variables to store the path to the Cloud Storage directory where your model binary is located, your model name, your version name and your framework choice.
When you create a version with the gcloud CLI, you may provide the framework name in capital letters with underscores (for example, SCIKIT_LEARN) or in lowercase letters with hyphens (for example, scikit-learn). Both options lead to identical behavior.
Replace [VALUES_IN_BRACKETS] with the appropriate values:
MODEL_DIR="gs://your_bucket_name/"
VERSION_NAME="[YOUR-VERSION-NAME]"
MODEL_NAME="[YOUR-MODEL-NAME]"
FRAMEWORK="[YOUR-FRAMEWORK_NAME]"
Create the version:
gcloud ai-platform versions create $VERSION_NAME \
  --model=$MODEL_NAME \
  --origin=$MODEL_DIR \
  --runtime-version=2.11 \
  --framework=$FRAMEWORK \
  --python-version=3.7 \
  --region=REGION \
  --machine-type=MACHINE_TYPE
Replace the following:
- REGION: The region of the regional endpoint on which you created the model. If you created the model on the global endpoint, omit the --region flag.
- MACHINE_TYPE: A machine type, determining the computing resources available to your prediction nodes.
Creating the version takes a few minutes. When it is ready, you should see the following output:
Creating version (this might take a few minutes)......done.
Get information about your new version:
gcloud ai-platform versions describe $VERSION_NAME \
  --model=$MODEL_NAME
You should see output similar to this:
createTime: '2018-02-28T16:30:45Z'
deploymentUri: gs://your_bucket_name
framework: [YOUR-FRAMEWORK-NAME]
machineType: mls1-c1-m2
name: projects/[YOUR-PROJECT-ID]/models/[YOUR-MODEL-NAME]/versions/[YOUR-VERSION-NAME]
pythonVersion: '3.7'
runtimeVersion: '2.11'
state: READY
REST API
Format your request body to contain the version object. This example specifies the version name, deploymentUri, runtimeVersion, framework, pythonVersion, and machineType. Replace [VALUES_IN_BRACKETS] with the appropriate values:
{
  "name": "[YOUR-VERSION-NAME]",
  "deploymentUri": "gs://your_bucket_name/",
  "runtimeVersion": "2.11",
  "framework": "[YOUR_FRAMEWORK_NAME]",
  "pythonVersion": "3.7",
  "machineType": "[YOUR_MACHINE_TYPE]"
}
Make your REST API call to the following path, replacing [VALUES_IN_BRACKETS] with the appropriate values:
POST https://REGION-ml.googleapis.com/v1/projects/[YOUR-PROJECT-ID]/models/[YOUR-MODEL-NAME]/versions
Replace REGION with the region of the regional endpoint where you created your model. If you created your model on the global endpoint, use ml.googleapis.com.
For example, you can make the following request using the curl command:
curl -X POST -H "Content-Type: application/json" \
  -d '{"name": "[YOUR-VERSION-NAME]", "deploymentUri": "gs://your_bucket_name/", "runtimeVersion": "2.11", "framework": "[YOUR_FRAMEWORK_NAME]", "pythonVersion": "3.7", "machineType": "[YOUR_MACHINE_TYPE]"}' \
  -H "Authorization: Bearer `gcloud auth print-access-token`" \
  "https://REGION-ml.googleapis.com/v1/projects/[YOUR-PROJECT-ID]/models/[YOUR-MODEL-NAME]/versions"
Creating the version takes a few minutes. When it is ready, you should see output similar to this:
{
  "name": "projects/[YOUR-PROJECT-ID]/operations/create_[YOUR-MODEL-NAME]_[YOUR-VERSION-NAME]-[TIMESTAMP]",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.ml.v1.OperationMetadata",
    "createTime": "2018-07-07T02:51:50Z",
    "operationType": "CREATE_VERSION",
    "modelName": "projects/[YOUR-PROJECT-ID]/models/[YOUR-MODEL-NAME]",
    "version": {
      "name": "projects/[YOUR-PROJECT-ID]/models/[YOUR-MODEL-NAME]/versions/[YOUR-VERSION-NAME]",
      "deploymentUri": "gs://your_bucket_name",
      "createTime": "2018-07-07T02:51:49Z",
      "runtimeVersion": "2.11",
      "framework": "[YOUR_FRAMEWORK_NAME]",
      "machineType": "[YOUR_MACHINE_TYPE]",
      "pythonVersion": "3.7"
    }
  }
}
Send online prediction request
After you have successfully created a version, AI Platform Prediction starts a new server that is ready to serve prediction requests.
This section demonstrates the following:
- How to test your model with gcloud by sending requests for smaller datasets.
- How to send larger requests for the full test dataset by using the Python client library, and view the first ten results.
gcloud
This section explains how to send a prediction request using the input.json file you created in a previous step.
Set environment variables for your model name, version name, and the name of your input file. Replace [VALUES_IN_BRACKETS] with the appropriate values:
MODEL_NAME="[YOUR-MODEL-NAME]"
VERSION_NAME="[YOUR-VERSION-NAME]"
INPUT_FILE="input.json"
Send the prediction request:
gcloud ai-platform predict --model $MODEL_NAME \
  --version $VERSION_NAME \
  --json-instances $INPUT_FILE
The prediction results return True if the person's income is predicted to be greater than $50,000 per year, and False otherwise. As an example, your first ten results may appear similar to the following:
[False, False, False, True, False, False, False, False, False, False]
REST API
This section explains how to send a prediction request using the input.json file you created in the previous step.
Send the prediction request(s):
curl -X POST -H "Content-Type: application/json" -d @input.json \
-H "Authorization: Bearer `gcloud auth print-access-token`" \
"https://ml.googleapis.com/v1/projects/${PROJECT_ID}/models/${MODEL_NAME}/versions/${VERSION_NAME}:predict"
The prediction results return True if the person's income is predicted to be greater than $50,000 per year, and False otherwise. The prediction results display in the console as a list of boolean values. As an example, your first ten results may appear similar to the following:
{"predictions": [false, false, false, true, false, false, false, false, false, false]}
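The request URL and the response handling can also be expressed in Python. The following is a minimal sketch using only the standard library. The function names are illustrative, and the regional form of the predict URL is an assumption extrapolated from the regional URLs used for creating models and versions; the global form matches the curl command above.

```python
import json


def predict_url(project_id, model_name, version_name, region=None):
    """Return the :predict URL for a model version.

    The regional host name is an assumption based on the other regional
    URLs in this tutorial; omit `region` for the global endpoint.
    """
    host = "{}-ml.googleapis.com".format(region) if region else "ml.googleapis.com"
    return "https://{}/v1/projects/{}/models/{}/versions/{}:predict".format(
        host, project_id, model_name, version_name)


def parse_predictions(response_text):
    """Return the list of predictions, or raise if the response holds an error."""
    payload = json.loads(response_text)
    if "error" in payload:
        raise RuntimeError(payload["error"])
    return payload["predictions"]


# Parse a response shaped like the example output above.
sample_response = '{"predictions": [false, false, false, true, false]}'
print(parse_predictions(sample_response))
```

JSON booleans are decoded to Python True and False, so the parsed list can be compared directly with the test labels.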
Python
This sample uses the Python client library to send prediction requests for the entire Census dataset, and prints out the first ten results. See more information about how to use the Python Client Library.
Replace [VALUES_IN_BRACKETS] with the appropriate values:
import googleapiclient.discovery
# Fill in your PROJECT_ID, VERSION_NAME and MODEL_NAME before running
# this code.
PROJECT_ID = '[YOUR PROJECT_ID HERE]'
VERSION_NAME = '[YOUR VERSION_NAME HERE]'
MODEL_NAME = '[YOUR MODEL_NAME HERE]'
service = googleapiclient.discovery.build('ml', 'v1')
name = 'projects/{}/models/{}'.format(PROJECT_ID, MODEL_NAME)
name += '/versions/{}'.format(VERSION_NAME)
# Due to the size of the data, it needs to be split in 2
first_half = test_features[:int(len(test_features)/2)]
second_half = test_features[int(len(test_features)/2):]
complete_results = []
for data in [first_half, second_half]:
    responses = service.projects().predict(
        name=name,
        body={'instances': data}
    ).execute()
    if 'error' in responses:
        print(responses['error'])
    else:
        complete_results.extend(responses['predictions'])
# Print the first 10 responses
for i, response in enumerate(complete_results[:10]):
    print('Prediction: {}\tLabel: {}'.format(response, test_labels[i]))
The prediction results return True if the person's income is predicted to be greater than $50,000 per year, and False otherwise. As an example, your first ten results may appear similar to the following:
Prediction: False Label: False
Prediction: False Label: False
Prediction: True Label: True
Prediction: True Label: True
Prediction: False Label: False
Prediction: False Label: False
Prediction: False Label: False
Prediction: True Label: True
Prediction: False Label: False
Prediction: False Label: False
See more information about each input parameter in the AI Platform Prediction API Predict Request details.
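The sample splits the test data into two halves because of its size. A more general approach is to send fixed-size batches and then compare all predictions with the labels. The following is a minimal sketch; the helper names batches and accuracy are illustrative, and a suitable batch size depends on the request size limits of the service, which this sketch does not assume.

```python
def batches(instances, batch_size):
    """Yield consecutive slices of `instances` with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(instances), batch_size):
        yield instances[start:start + batch_size]


def accuracy(predictions, labels):
    """Return the fraction of predictions that equal the corresponding label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one label is required")
    matches = sum(1 for predicted, label in zip(predictions, labels)
                  if predicted == label)
    return matches / len(labels)


# With the tutorial's variables, each batch would be sent as:
# service.projects().predict(name=name, body={'instances': batch}).execute()
```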
About the data
The Census Income Data Set that this sample uses for training is hosted by the UC Irvine Machine Learning Repository.
Census data courtesy of: Lichman, M. (2013). UCI Machine Learning Repository http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://archive.ics.uci.edu/ml - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
What's next
- Try out this tutorial as a Jupyter notebook on GitHub.
- See more sample scikit-learn notebooks on GitHub.