REST Resource: projects.jobs

Resource: Job

Represents a training or prediction job.

JSON representation
{
  "jobId": string,
  "createTime": string,
  "startTime": string,
  "endTime": string,
  "state": enum(State),
  "errorMessage": string,

  // Union field input can be only one of the following:
  "trainingInput": {
    object(TrainingInput)
  },
  "predictionInput": {
    object(PredictionInput)
  },
  // End of list of possible types for union field input.

  // Union field output can be only one of the following:
  "trainingOutput": {
    object(TrainingOutput)
  },
  "predictionOutput": {
    object(PredictionOutput)
  },
  // End of list of possible types for union field output.
}
Fields
jobId

string

Required. The user-specified id of the job.

createTime

string (Timestamp format)

Output only. When the job was created.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".

startTime

string (Timestamp format)

Output only. When the job processing was started.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".

endTime

string (Timestamp format)

Output only. When the job processing was completed.

A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z".

state

enum(State)

Output only. The detailed state of a job.

errorMessage

string

Output only. The details of a failure or a cancellation.

Union field input. Required. Parameters to create a job. input can be only one of the following:
trainingInput

object(TrainingInput)

Input parameters to create a training job.

predictionInput

object(PredictionInput)

Input parameters to create a prediction job.

Union field output. Output only. The current result of the job. output can be only one of the following:
trainingOutput

object(TrainingOutput)

The current training job result.

predictionOutput

object(PredictionOutput)

The current prediction job result.

TrainingInput

Represents input parameters for a training job.

JSON representation
{
  "scaleTier": enum(ScaleTier),
  "masterType": string,
  "workerType": string,
  "parameterServerType": string,
  "workerCount": string,
  "parameterServerCount": string,
  "packageUris": [
    string
  ],
  "pythonModule": string,
  "args": [
    string
  ],
  "hyperparameters": {
    object(HyperparameterSpec)
  },
  "region": string,
  "jobDir": string,
  "runtimeVersion": string,
}
Fields
scaleTier

enum(ScaleTier)

Required. Specifies the machine types, the number of replicas for workers and parameter servers.

masterType

string

Optional. Specifies the type of virtual machine to use for your training job's master worker.

The following types are supported:

standard
A basic machine configuration suitable for training simple models with small to moderate datasets.
large_model
A machine with a lot of memory, specially suited for parameter servers when your model is large (having many hidden layers or layers with very large numbers of nodes).
complex_model_s
A machine suitable for the master and workers of the cluster when your model requires more computation than the standard machine can handle satisfactorily.
complex_model_m
A machine with roughly twice the number of cores and roughly double the memory of

complex_model_s.

complex_model_l
A machine with roughly twice the number of cores and roughly double the memory of complex_model_m

.

standard_gpu
A machine equivalent to

standard that also includes a single NVIDIA Tesla K80 GPU. See more about using GPUs for training your model.

complex_model_m_gpu
A machine equivalent to complex_model_m

that also includes four NVIDIA Tesla K80 GPUs.

complex_model_l_gpu
A machine equivalent to

complex_model_l that also includes eight NVIDIA Tesla K80 GPUs.

standard_p100
A machine equivalent to standard

that also includes a single NVIDIA Tesla P100 GPU. The availability of these GPUs is in the Alpha launch stage.

complex_model_m_p100
A machine equivalent to

complex_model_m

that also includes four NVIDIA Tesla P100 GPUs. The availability of these GPUs is in the Alpha launch stage.

You must set this value when scaleTier is set to CUSTOM.

workerType

string

Optional. Specifies the type of virtual machine to use for your training job's worker nodes.

The supported values are the same as those described in the entry for masterType.

This value must be present when scaleTier is set to CUSTOM and workerCount is greater than zero.

parameterServerType

string

Optional. Specifies the type of virtual machine to use for your training job's parameter server.

The supported values are the same as those described in the entry for masterType.

This value must be present when scaleTier is set to CUSTOM and parameterServerCount is greater than zero.

workerCount

string (int64 format)

Optional. The number of worker replicas to use for the training job. Each replica in the cluster will be of the type specified in workerType.

This value can only be used when scaleTier is set to CUSTOM. If you set this value, you must also set workerType.

parameterServerCount

string (int64 format)

Optional. The number of parameter server replicas to use for the training job. Each replica in the cluster will be of the type specified in parameterServerType.

This value can only be used when scaleTier is set to CUSTOM.If you set this value, you must also set parameterServerType.

packageUris[]

string

Required. The Google Cloud Storage location of the packages with the training program and any additional dependencies. The maximum number of package URIs is 100.

pythonModule

string

Required. The Python module name to run after installing the packages.

args[]

string

Optional. Command line arguments to pass to the program.

hyperparameters

object(HyperparameterSpec)

Optional. The set of Hyperparameters to tune.

region

string

Required. The Google Compute Engine region to run the training job in.

jobDir

string

Optional. A Google Cloud Storage path in which to store training outputs and other data needed for training. This path is passed to your TensorFlow program as the 'jobDir' command-line argument. The benefit of specifying this field is that Cloud ML validates the path for use in training.

runtimeVersion

string

Optional. The Google Cloud ML runtime version to use for training. If not set, Google Cloud ML will choose the latest stable version.

ScaleTier

A scale tier is an abstract representation of the resources Cloud ML will allocate to a training job. When selecting a scale tier for your training job, you should consider the size of your training dataset and the complexity of your model. As the tiers increase, virtual machines are added to handle your job, and the individual machines in the cluster generally have more memory and greater processing power than they do at lower tiers. The number of training units charged per hour of processing increases as tiers get more advanced. Refer to the pricing guide for more details. Note that in addition to incurring costs, your use of training resources is constrained by the quota policy.

Enums
BASIC A single worker instance. This tier is suitable for learning how to use Cloud ML, and for experimenting with new models using small datasets.
STANDARD_1 Many workers and a few parameter servers.
PREMIUM_1 A large number of workers with many parameter servers.
BASIC_GPU A single worker instance with a GPU.
BASIC_TPU A single worker instance with a Cloud TPU
CUSTOM

The CUSTOM tier is not a set tier, but rather enables you to use your own cluster specification. When you use this tier, set values to configure your processing cluster according to these guidelines:

  • You must set TrainingInput.masterType to specify the type of machine to use for your master node. This is the only required setting.

  • You may set TrainingInput.workerCount to specify the number of workers to use. If you specify one or more workers, you must also set TrainingInput.workerType to specify the type of machine to use for your worker nodes.

  • You may set TrainingInput.parameterServerCount to specify the number of parameter servers to use. If you specify one or more parameter servers, you must also set TrainingInput.parameterServerType to specify the type of machine to use for your parameter servers.

Note that all of your workers must use the same machine type, which can be different from your parameter server type and master type. Your parameter servers must likewise use the same machine type, which can be different from your worker type and master type.

HyperparameterSpec

Represents a set of hyperparameters to optimize.

JSON representation
{
  "goal": enum(GoalType),
  "params": [
    {
      object(ParameterSpec)
    }
  ],
  "maxTrials": number,
  "maxParallelTrials": number,
  "hyperparameterMetricTag": string,
}
Fields
goal

enum(GoalType)

Required. The type of goal to use for tuning. Available types are MAXIMIZE and MINIMIZE.

Defaults to MAXIMIZE.

params[]

object(ParameterSpec)

Required. The set of parameters to tune.

maxTrials

number

Optional. How many training trials should be attempted to optimize the specified hyperparameters.

Defaults to one.

maxParallelTrials

number

Optional. The number of training trials to run concurrently. You can reduce the time it takes to perform hyperparameter tuning by adding trials in parallel. However, each trail only benefits from the information gained in completed trials. That means that a trial does not get access to the results of trials running at the same time, which could reduce the quality of the overall optimization.

Each trial will use the same scale tier and machine types.

Defaults to one.

hyperparameterMetricTag

string

Optional. The Tensorflow summary tag name to use for optimizing trials. For current versions of Tensorflow, this tag name should exactly match what is shown in Tensorboard, including all scopes. For versions of Tensorflow prior to 0.12, this should be only the tag passed to tf.Summary. By default, "training/hptuning/metric" will be used.

GoalType

The available types of optimization goals.

Enums
GOAL_TYPE_UNSPECIFIED Goal Type will default to maximize.
MAXIMIZE Maximize the goal metric.
MINIMIZE Minimize the goal metric.

ParameterSpec

Represents a single hyperparameter to optimize.

JSON representation
{
  "parameterName": string,
  "type": enum(ParameterType),
  "minValue": number,
  "maxValue": number,
  "categoricalValues": [
    string
  ],
  "discreteValues": [
    number
  ],
  "scaleType": enum(ScaleType),
}
Fields
parameterName

string

Required. The parameter name must be unique amongst all ParameterConfigs in a HyperparameterSpec message. E.g., "learning_rate".

type

enum(ParameterType)

Required. The type of the parameter.

minValue

number

Required if type is DOUBLE or INTEGER. This field should be unset if type is CATEGORICAL. This value should be integers if type is INTEGER.

maxValue

number

Required if typeis DOUBLE or INTEGER. This field should be unset if type is CATEGORICAL. This value should be integers if type is INTEGER.

categoricalValues[]

string

Required if type is CATEGORICAL. The list of possible categories.

discreteValues[]

number

Required if type is DISCRETE. A list of feasible points. The list should be in strictly increasing order. For instance, this parameter might have possible settings of 1.5, 2.5, and 4.0. This list should not contain more than 1,000 values.

scaleType

enum(ScaleType)

Optional. How the parameter should be scaled to the hypercube. Leave unset for categorical parameters. Some kind of scaling is strongly recommended for real or integral parameters (e.g., UNIT_LINEAR_SCALE).

ParameterType

The type of the parameter.

Enums
PARAMETER_TYPE_UNSPECIFIED You must specify a valid type. Using this unspecified type will result in an error.
DOUBLE Type for real-valued parameters.
INTEGER Type for integral parameters.
CATEGORICAL The parameter is categorical, with a value chosen from the categories field.
DISCRETE The parameter is real valued, with a fixed set of feasible points. If type==DISCRETE, feasible_points must be provided, and {minValue, maxValue} will be ignored.

ScaleType

The type of scaling that should be applied to this parameter.

Enums
NONE By default, no scaling is applied.
UNIT_LINEAR_SCALE Scales the feasible space to (0, 1) linearly.
UNIT_LOG_SCALE Scales the feasible space logarithmically to (0, 1). The entire feasible space must be strictly positive.
UNIT_REVERSE_LOG_SCALE Scales the feasible space "reverse" logarithmically to (0, 1). The result is that values close to the top of the feasible space are spread out more than points near the bottom. The entire feasible space must be strictly positive.

PredictionInput

Represents input parameters for a prediction job.

JSON representation
{
  "dataFormat": enum(DataFormat),
  "inputPaths": [
    string
  ],
  "outputPath": string,
  "maxWorkerCount": string,
  "region": string,
  "runtimeVersion": string,
  "batchSize": string,

  // Union field model_version can be only one of the following:
  "modelName": string,
  "versionName": string,
  "uri": string,
  // End of list of possible types for union field model_version.
}
Fields
dataFormat

enum(DataFormat)

Required. The format of the input data files.

inputPaths[]

string

Required. The Google Cloud Storage location of the input data files. May contain wildcards.

outputPath

string

Required. The output Google Cloud Storage location.

maxWorkerCount

string (int64 format)

Optional. The maximum number of workers to be used for parallel processing. Defaults to 10 if not specified.

region

string

Required. The Google Compute Engine region to run the prediction job in.

runtimeVersion

string

Optional. The Google Cloud ML runtime version to use for this batch prediction. If not set, Google Cloud ML will pick the runtime version used during the versions.create request for this model version, or choose the latest stable version when model version information is not available such as when the model is specified by uri.

batchSize

string (int64 format)

Optional. Number of records per batch, defaults to 64. The service will buffer batchSize number of records in memory before invoking one Tensorflow prediction call internally. So take the record size and memory available into consideration when setting this parameter.

Union field model_version. Required. The model or the version to use for prediction. model_version can be only one of the following:
modelName

string

Use this field if you want to use the default version for the specified model. The string must use the following format:

"projects/<var>[YOUR_PROJECT]</var>/models/<var>[YOUR_MODEL]</var>"

versionName

string

Use this field if you want to specify a version of the model to use. The string is formatted the same way as model_version, with the addition of the version information:

"projects/<var>[YOUR_PROJECT]</var>/models/<var>YOUR_MODEL/versions/<var>[YOUR_VERSION]</var>"

uri

string

Use this field if you want to specify a Google Cloud Storage path for the model to use.

DataFormat

The format used to separate data instances in the source files.

Enums
DATA_FORMAT_UNSPECIFIED Unspecified format.
TEXT The source file is a text file with instances separated by the new-line character.
TF_RECORD The source file is a TFRecord file.
TF_RECORD_GZIP The source file is a GZIP-compressed TFRecord file.

State

Describes the job state.

Enums
STATE_UNSPECIFIED The job state is unspecified.
QUEUED The job has been just created and processing has not yet begun.
PREPARING The service is preparing to run the job.
RUNNING The job is in progress.
SUCCEEDED The job completed successfully.
FAILED The job failed. errorMessage should contain the details of the failure.
CANCELLING The job is being cancelled. errorMessage should describe the reason for the cancellation.
CANCELLED The job has been cancelled. errorMessage should describe the reason for the cancellation.

TrainingOutput

Represents results of a training job. Output only.

JSON representation
{
  "completedTrialCount": string,
  "trials": [
    {
      object(HyperparameterOutput)
    }
  ],
  "consumedMlUnits": number,
  "isHyperparameterTuningJob": boolean,
}
Fields
completedTrialCount

string (int64 format)

The number of hyperparameter tuning trials that completed successfully. Only set for hyperparameter tuning jobs.

trials[]

object(HyperparameterOutput)

Results for individual Hyperparameter trials. Only set for hyperparameter tuning jobs.

consumedMlUnits

number

The amount of ML units consumed by the job.

isHyperparameterTuningJob

boolean

Whether this job is a hyperparameter tuning job.

HyperparameterOutput

Represents the result of a single hyperparameter tuning trial from a training job. The TrainingOutput object that is returned on successful completion of a training job with hyperparameter tuning includes a list of HyperparameterOutput objects, one for each successful trial.

JSON representation
{
  "trialId": string,
  "hyperparameters": {
    string: string,
    ...
  },
  "finalMetric": {
    object(HyperparameterMetric)
  },
  "allMetrics": [
    {
      object(HyperparameterMetric)
    }
  ],
}
Fields
trialId

string

The trial id for these results.

hyperparameters

map (key: string, value: string)

The hyperparameters given to this trial.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

finalMetric

object(HyperparameterMetric)

The final objective metric seen for this trial.

allMetrics[]

object(HyperparameterMetric)

All recorded object metrics for this trial. This field is not currently populated.

HyperparameterMetric

An observed value of a metric.

JSON representation
{
  "trainingStep": string,
  "objectiveValue": number,
}
Fields
trainingStep

string (int64 format)

The global training step for this metric.

objectiveValue

number

The objective value at this training step.

PredictionOutput

Represents results of a prediction job.

JSON representation
{
  "outputPath": string,
  "predictionCount": string,
  "errorCount": string,
  "nodeHours": number,
}
Fields
outputPath

string

The output Google Cloud Storage location provided at the job creation time.

predictionCount

string (int64 format)

The number of generated predictions.

errorCount

string (int64 format)

The number of data instances which resulted in errors.

nodeHours

number

Node hours used by the batch prediction job.

Methods

cancel

Cancels a running job.

create

Creates a training or a batch prediction job.

get

Describes a job.

getIamPolicy

Gets the access control policy for a resource.

list

Lists the jobs in the project.

setIamPolicy

Sets the access control policy on the specified resource.

testIamPermissions

Returns permissions that a caller has on the specified resource.

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...

Cloud Machine Learning Engine (Cloud ML Engine)