While you train models, deploy them, and get predictions, you need to manage resources on Google Cloud Platform. This page describes how to work with models, versions, and jobs.
Naming AI Platform Prediction resources
You must specify a name for every model, version, and job you create. The naming rules are the same for all three types of resources. Each name:
- May only contain letters, numbers, and underscores.
- Is case-sensitive.
- Must start with a letter.
- Must be no more than 128 characters long.
- Must be unique within its namespace (your project for models and jobs, the parent model for versions).
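You can check the character and length rules above locally with a single regular expression. The following Python sketch assumes that "letters" means the ASCII letters A to Z and a to z. The function name is hypothetical. Only the service can enforce uniqueness within a namespace, so this sketch does not check it.

```python
import re

# Documented rules: letters, numbers, and underscores only; must start with a
# letter; no more than 128 characters. ASCII-only letters are an assumption.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,127}")


def is_valid_resource_name(name: str) -> bool:
    """Return True if name satisfies the character, first-letter, and length rules."""
    # fullmatch requires the whole string to match, so 1 to 128 characters total.
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, census_wide_deep passes. A name that starts with a digit fails, and so does one that contains a hyphen.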
You should create names that are easy to distinguish in lists of resources, such as job logs. Here are some suggestions:
- Name all jobs for the same model using the model name and a job index (the timestamp when the job is created works well).
- Name your models so that they are easily identified by the dataset they use (census_wide_deep is usually better than my_new_model, for example).
- Versions are best if easily readable. Instead of using a timestamp or a similar unique value, we recommend using simple version designators like v1.
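The job-naming suggestion (the model name plus the job's creation timestamp) might look like the following in Python. The helper is hypothetical. The timestamp format is an assumption, chosen so that the result contains only letters, numbers, and underscores.

```python
from datetime import datetime, timezone
from typing import Optional


def make_job_name(model_name: str, created: Optional[datetime] = None) -> str:
    """Build a job name from a model name and a creation timestamp.

    If no timestamp is given, the current UTC time is used. The format
    YYYYMMDD_HHMMSS keeps the name within the allowed character set.
    """
    created = created or datetime.now(timezone.utc)
    return f"{model_name}_{created:%Y%m%d_%H%M%S}"
```

Passing datetime(2024, 1, 31, 14, 25) as created returns census_wide_deep_20240131_142500. Two jobs created in the same second would get the same name. If you submit jobs that quickly, append a counter to keep names unique.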
Managing models
Your model resources in AI Platform Prediction are logical containers for individual implementations of your machine learning model. They are the simplest resources to work with because they have no complex operations or additional resources to allocate and maintain.
The following table summarizes the model operations and lists the interfaces you can use to perform them:
Operation | Interfaces | Notes |
---|---|---|
create | projects.models.create; gcloud ai-platform models create; Console: Create Model on the AI Platform Prediction Models page | |
delete | projects.models.delete; gcloud ai-platform models delete; Console: Delete in the Models list, or on the Model details page | Deleting a model is a long-running operation. The model must have no versions associated with it before you can delete it. |
get | projects.models.get; gcloud ai-platform models describe; Console: Model details page (enter with a link from the Models list) | The information returned is described in the Model resource reference. |
list | projects.models.list; gcloud ai-platform models list; Console: AI Platform Prediction Models page | |
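The API methods in this table identify a model by its full resource name. The following Python sketch builds those names. The helper names are hypothetical. The commented example call assumes the google-api-python-client library and the "ml" v1 discovery service.

```python
def model_resource_name(project: str, model: str) -> str:
    """Full model name, for example projects/my_project/models/census_wide_deep.

    Used as `name` for projects.models.get and projects.models.delete, and as
    `parent` for projects.models.versions.create and versions.list.
    """
    return f"projects/{project}/models/{model}"


def version_resource_name(project: str, model: str, version: str) -> str:
    """Full version name, for example .../models/census_wide_deep/versions/v1."""
    return f"{model_resource_name(project, model)}/versions/{version}"


# Example call (assumes google-api-python-client and default credentials):
#   from googleapiclient import discovery
#   ml = discovery.build("ml", "v1")
#   model = ml.projects().models().get(
#       name=model_resource_name("my_project", "census_wide_deep")).execute()
```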
Managing versions
Your versions are specific iterations of your models. The core of a model version is a TensorFlow SavedModel.
The following table summarizes the version operations and lists the interfaces you can use to perform them:
Operation | Interfaces | Notes |
---|---|---|
create | projects.models.versions.create; gcloud ai-platform versions create; Console: Create Version on the Model details page (enter with a link from the Models list) | Creating a version deploys a SavedModel to AI Platform Prediction. Refer to the model deployment guide for more information. |
delete | projects.models.versions.delete; gcloud ai-platform versions delete; Console: Delete in the Versions list on the Model details page | Deleting a version is a long-running operation. You cannot delete the default version of a model unless it is the only version assigned to that model. |
get | projects.models.versions.get; gcloud ai-platform versions describe; Console: Version details page (enter with a link from the Versions list on the Model details page) | The information returned is described in the Version resource reference. |
list | projects.models.versions.list; gcloud ai-platform versions list; Console: Versions list on the Model details page | |
setDefault | projects.models.versions.setDefault; gcloud ai-platform versions set-default; Console: Set as default on the Versions list on the Model details page | The first version you create for a model becomes its default. After that, creating a version doesn't change the default, so this is the only way to assign a new default version. |
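Two constraints from the tables interact when you remove a model entirely. A model must have no versions before you can delete it, and its default version can only be deleted once it is the only version left. The following hypothetical Python helper orders the deletions to satisfy both rules. It assumes that each version is a dict with the name and isDefault fields, as returned by projects.models.versions.list.

```python
from typing import Dict, List


def plan_model_teardown(model_name: str, versions: List[Dict]) -> List[str]:
    """Return resource names in an order that satisfies the deletion rules.

    Non-default versions come first. The default version comes next, when it
    is the only version left. The model comes last, once it has no versions.
    """
    # isDefault may be omitted when false, so a missing key is treated as False.
    non_default = [v["name"] for v in versions if not v.get("isDefault", False)]
    default = [v["name"] for v in versions if v.get("isDefault", False)]
    return non_default + default + [model_name]
```

Each of these deletions is a long-running operation. Wait for one to finish before you issue the next; otherwise the default version or the model may still have siblings when its delete request arrives.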
Managing jobs
AI Platform Prediction supports two types of jobs: training and batch prediction. The details for each are different, but the basic operation is the same.
The following table summarizes the job operations and lists the interfaces you can use to perform them:
Operation | Interfaces | Notes |
---|---|---|
create | projects.jobs.create; Console: no console implementation | Creating a job is described in detail in the training and batch prediction guides. |
cancel | projects.jobs.cancel; Console: Cancel on the Job details page | Cancels a running job. |
get | projects.jobs.get; gcloud ai-platform jobs describe; Console: Job details page (enter with a link from the Jobs list) | The information returned is described in the Jobs resource reference. |
list | projects.jobs.list; Console: Jobs list | Only jobs created in the last 90 days are listed. |
Handling asynchronous operations
Most AI Platform Prediction resource management operations return promptly with a complete response. However, there are two kinds of asynchronous operations that you should understand: jobs and long-running operations.
When you start an asynchronous operation, you usually want to know when it completes. The way you get the status differs for jobs and long-running operations.
Getting the status of a job
You can use projects.jobs.get to get the status of a job. The same information is available through gcloud ai-platform jobs describe and on the Jobs page in the Google Cloud console. Regardless of how you get the status, the information is based on the members of the Job resource. The job is complete when Job.state in the response is equal to one of these values:
- SUCCEEDED
- FAILED
- CANCELLED
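The completion check above lends itself to a simple polling loop. The following Python sketch is one way to write it. The function names and the 60-second default interval are assumptions. The get_job argument is a hypothetical zero-argument callable that wraps projects.jobs.get and returns the Job resource as a dict.

```python
import time
from typing import Callable, Dict

# Job.state values that mean the job has finished, as listed on this page.
TERMINAL_JOB_STATES = frozenset({"SUCCEEDED", "FAILED", "CANCELLED"})


def is_job_complete(job: Dict) -> bool:
    """Return True if the Job resource has reached a terminal state."""
    return job.get("state") in TERMINAL_JOB_STATES


def wait_for_job(
    get_job: Callable[[], Dict],
    interval_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call get_job until the job is complete, then return the final Job."""
    while True:
        job = get_job()
        if is_job_complete(job):
            return job
        sleep(interval_s)
```

With google-api-python-client, get_job could be something like lambda: ml.projects().jobs().get(name="projects/my_project/jobs/my_job").execute(). Reaching a terminal state does not mean the job succeeded, so check job["state"] after the loop returns.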
Getting the status of a long-running operation
AI Platform Prediction has three long-running operations:
- Creating a version
- Deleting a model
- Deleting a version
Of the long-running operations, only creating a version is likely to take much time to complete. Deleting models and versions is typically accomplished in near real time.
If you create a version by using the Google Cloud CLI or the Google Cloud console, the interface automatically informs you when the operation is complete. If you create a version with the API, you can track the status of the operation yourself:
1. Get the service-assigned operation name from the Operation object in the response to your call to projects.models.versions.create. The key for the name value is "name".
2. Use projects.operations.get to periodically poll the status of the operation. Use the operation name from the first step to form a name string of the form 'projects/my_project/operations/operation_name'. The response message contains an Operation object.
3. Get the value for the "done" key. This is a Boolean indicator of operation completion. It is true if the operation is complete.
4. The Operation object includes one of two keys on completion:
   - The "response" key is present if the operation was successful. Its value should be google.protobuf.Empty, as none of the AI Platform Prediction long-running operations have response objects.
   - The "error" key is present if there was an error. Its value is a Status object.
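The following Python sketch follows these steps. The function names and the 10-second default interval are hypothetical. The get_operation argument is assumed to be a zero-argument callable that wraps projects.operations.get and returns the Operation as a dict. The name helper leaves the name unchanged if the service already returned a full "projects/..." name, because this sketch does not assume which form you receive.

```python
import time
from typing import Callable, Dict


def operation_full_name(project: str, operation_name: str) -> str:
    """Form 'projects/<project>/operations/<operation_name>' (step 2)."""
    if operation_name.startswith("projects/"):
        return operation_name  # Already a full name; use it as-is.
    return f"projects/{project}/operations/{operation_name}"


def wait_for_operation(
    get_operation: Callable[[], Dict],
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until "done" is true (step 3), then inspect the result (step 4).

    Returns the "response" value, which should be empty for these operations,
    or raises RuntimeError with the code and message from the "error" Status.
    """
    while True:
        op = get_operation()
        # "done" may be absent while the operation is still running.
        if op.get("done", False):
            if "error" in op:
                status = op["error"]
                raise RuntimeError(
                    f"operation failed: code={status.get('code')} "
                    f"message={status.get('message')}"
                )
            return op.get("response", {})
        sleep(interval_s)
```

With google-api-python-client, get_operation could be something like lambda: ml.projects().operations().get(name=operation_full_name("my_project", op_name)).execute(), where op_name is the "name" value from step 1.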
What's next
- Train a model.
- Learn about using labels to organize your resources.