- Resource: Job
- TrainingInput
- ScaleTier
- ReplicaConfig
- HyperparameterSpec
- GoalType
- ParameterSpec
- ParameterType
- ScaleType
- Algorithm
- PredictionInput
- DataFormat
- State
- TrainingOutput
- HyperparameterOutput
- HyperparameterMetric
- BuiltInAlgorithmOutput
- PredictionOutput
- Methods
Resource: Job
Represents a training or prediction job.
JSON representation

{ "jobId": string, "createTime": string, "startTime": string, "endTime": string, "state": enum (State), "errorMessage": string, "labels": { string: string, ... }, "etag": string, "trainingInput": { object (TrainingInput) }, "predictionInput": { object (PredictionInput) }, "trainingOutput": { object (TrainingOutput) }, "predictionOutput": { object (PredictionOutput) } }
Fields

| Field | Description |
|---|---|
| `jobId` | Required. The user-specified id of the job. |
| `createTime` | Output only. When the job was created. A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. |
| `startTime` | Output only. When the job processing was started. A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. |
| `endTime` | Output only. When the job processing was completed. A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. |
| `state` | Output only. The detailed state of a job. |
| `errorMessage` | Output only. The details of a failure or a cancellation. |
| `labels` | Optional. One or more labels that you can add to organize your jobs. Each label is a key-value pair, where both the key and the value are arbitrary strings that you supply. For more information, see the documentation on using labels. An object containing a list of "key": value pairs. |
| `etag` | A base64-encoded string. |

Union field `input`. Required. Parameters to create a job. `input` can be only one of the following:

| Field | Description |
|---|---|
| `trainingInput` | Input parameters to create a training job. |
| `predictionInput` | Input parameters to create a prediction job. |

Union field `output`. Output only. The current result of the job. `output` can be only one of the following:

| Field | Description |
|---|---|
| `trainingOutput` | The current training job result. |
| `predictionOutput` | The current prediction job result. |
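To make the resource shape concrete, here is a minimal sketch of creating a training Job with the Google API Python client (`google-api-python-client`); the project ID, bucket paths, job ID, and module name are placeholder assumptions, not values from this reference:

```python
from googleapiclient import discovery

# Build a client for the AI Platform Training and Prediction API (v1).
ml = discovery.build('ml', 'v1')

# A Job body: a user-specified jobId plus exactly one field of the `input` union.
job_body = {
    'jobId': 'my_training_job_001',          # placeholder id
    'labels': {'team': 'demo'},              # optional key-value labels
    'trainingInput': {
        'scaleTier': 'BASIC',
        'packageUris': ['gs://my-bucket/packages/trainer-0.1.tar.gz'],  # placeholder
        'pythonModule': 'trainer.task',                                 # placeholder
        'region': 'us-central1',
        'runtimeVersion': '1.4',
        'pythonVersion': '3.5',
    },
}

request = ml.projects().jobs().create(parent='projects/my-project', body=job_body)
job = request.execute()
print(job['jobId'], job['state'])   # state, timestamps, and output are output only
```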
TrainingInput
Represents input parameters for a training job. When using the gcloud command to submit your training job, you can specify the input parameters as command-line arguments and/or in a YAML configuration file referenced from the --config command-line argument. For details, see the guide to submitting a training job.
JSON representation

{ "scaleTier": enum (ScaleTier), "masterType": string, "masterConfig": { object (ReplicaConfig) }, "workerType": string, "workerConfig": { object (ReplicaConfig) }, "parameterServerType": string, "parameterServerConfig": { object (ReplicaConfig) }, "workerCount": string, "parameterServerCount": string, "packageUris": [ string ], "pythonModule": string, "args": [ string ], "hyperparameters": { object (HyperparameterSpec) }, "region": string, "jobDir": string, "runtimeVersion": string, "pythonVersion": string }
Fields

| Field | Description |
|---|---|
| `scaleTier` | Required. Specifies the machine types, the number of replicas for workers and parameter servers. |
| `masterType` | Optional. Specifies the type of virtual machine to use for your training job's master worker. You must specify this field when `scaleTier` is set to `CUSTOM`. You can use certain Compute Engine machine types directly in this field; learn more about using Compute Engine machine types. Alternatively, you can use one of the legacy machine types; learn more about using legacy machine types. Finally, if you want to use a TPU for training, specify `cloud_tpu` in this field. |
| `masterConfig` | Optional. The configuration for your master worker. You should only set `masterConfig.acceleratorConfig` if `masterType` is set to a Compute Engine machine type. Learn about restrictions on accelerator configurations for training. Set `masterConfig.imageUri` only if you build a custom image; only one of `masterConfig.imageUri` and `runtimeVersion` should be set. Learn more about configuring custom containers. |
| `workerType` | Optional. Specifies the type of virtual machine to use for your training job's worker nodes. The supported values are the same as those described in the entry for `masterType`. This value must be consistent with the category of machine type that `masterType` uses: both must be Compute Engine machine types or both must be legacy machine types. If you use `cloud_tpu` for this value, see the special configuration options for training with TPUs. This value must be present when `scaleTier` is set to `CUSTOM` and `workerCount` is greater than zero. |
| `workerConfig` | Optional. The configuration for workers. You should only set `workerConfig.acceleratorConfig` if `workerType` is set to a Compute Engine machine type. Set `workerConfig.imageUri` only if you build a custom image for your workers; only one of `workerConfig.imageUri` and `runtimeVersion` should be set. |
| `parameterServerType` | Optional. Specifies the type of virtual machine to use for your training job's parameter server. The supported values are the same as those described in the entry for `masterType`. This value must be consistent with the category of machine type that `masterType` uses: both must be Compute Engine machine types or both must be legacy machine types. This value must be present when `scaleTier` is set to `CUSTOM` and `parameterServerCount` is greater than zero. |
| `parameterServerConfig` | Optional. The configuration for parameter servers. You should only set `parameterServerConfig.acceleratorConfig` if `parameterServerType` is set to a Compute Engine machine type. Set `parameterServerConfig.imageUri` only if you build a custom image for your parameter servers; only one of `parameterServerConfig.imageUri` and `runtimeVersion` should be set. |
| `workerCount` | Optional. The number of worker replicas to use for the training job. Each replica in the cluster will be of the type specified in `workerType`. This value can only be used when `scaleTier` is set to `CUSTOM`; if you set it, you must also set `workerType`. The default value is zero. |
| `parameterServerCount` | Optional. The number of parameter server replicas to use for the training job. Each replica in the cluster will be of the type specified in `parameterServerType`. This value can only be used when `scaleTier` is set to `CUSTOM`; if you set it, you must also set `parameterServerType`. The default value is zero. |
| `packageUris[]` | Required. The Google Cloud Storage location of the packages with the training program and any additional dependencies. The maximum number of package URIs is 100. |
| `pythonModule` | Required. The Python module name to run after installing the packages. |
| `args[]` | Optional. Command-line arguments to pass to the program. |
| `hyperparameters` | Optional. The set of hyperparameters to tune. |
| `region` | Required. The Google Compute Engine region to run the training job in. See the available regions for AI Platform services. |
| `jobDir` | Optional. A Google Cloud Storage path in which to store training outputs and other data needed for training. This path is passed to your TensorFlow program as the `--job-dir` command-line argument. The benefit of specifying this field is that Cloud ML validates the path for use in training. |
| `runtimeVersion` | Optional. The AI Platform runtime version to use for training. If not set, AI Platform uses the default stable version, 1.0. For more information, see the runtime version list and how to manage runtime versions. |
| `pythonVersion` | Optional. The version of Python used in training. If not set, the default version is '2.7'. Python '3.5' is available when `runtimeVersion` is set to '1.4' and above. |
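As a quick illustration of how these fields combine, here is a hedged sketch of a `trainingInput` object using a predefined scale tier; the bucket, module, and argument values are assumptions, not values taken from this reference:

```python
# A TrainingInput object using a predefined scale tier.
training_input = {
    'scaleTier': 'BASIC_GPU',
    'packageUris': ['gs://my-bucket/packages/trainer-0.1.tar.gz'],
    'pythonModule': 'trainer.task',
    'args': ['--train-files', 'gs://my-bucket/data/train.csv',
             '--num-epochs', '10'],
    'region': 'us-central1',
    'jobDir': 'gs://my-bucket/output/job_001',  # passed to the program as --job-dir
    'runtimeVersion': '1.4',
    'pythonVersion': '3.5',                     # requires runtimeVersion 1.4 or later
}
```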
ScaleTier
A scale tier is an abstract representation of the resources Cloud ML will allocate to a training job. When selecting a scale tier for your training job, you should consider the size of your training dataset and the complexity of your model. As the tiers increase, virtual machines are added to handle your job, and the individual machines in the cluster generally have more memory and greater processing power than they do at lower tiers. The number of training units charged per hour of processing increases as tiers get more advanced. Refer to the pricing guide for more details. Note that in addition to incurring costs, your use of training resources is constrained by the quota policy.
| Enum | Description |
|---|---|
| `BASIC` | A single worker instance. This tier is suitable for learning how to use Cloud ML, and for experimenting with new models using small datasets. |
| `STANDARD_1` | Many workers and a few parameter servers. |
| `PREMIUM_1` | A large number of workers with many parameter servers. |
| `BASIC_GPU` | A single worker instance with a GPU. |
| `BASIC_TPU` | A single worker instance with a Cloud TPU. |
| `CUSTOM` | The CUSTOM tier is not a set tier, but rather enables you to use your own cluster specification. When you use this tier, set `masterType` to specify the machine type for your master node, and optionally set `workerCount` together with `workerType`, and `parameterServerCount` together with `parameterServerType`, to add workers and parameter servers to the cluster (see the sketch following this table). Note that all of your workers must use the same machine type, which can be different from your parameter server type and master type. Your parameter servers must likewise use the same machine type, which can be different from your worker type and master type. |
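The sketch below, under the same placeholder assumptions as the earlier examples, shows what a CUSTOM-tier `trainingInput` might look like; the legacy machine type names are illustrative, not an authoritative list:

```python
# A CUSTOM scale tier: machine types and replica counts are set explicitly.
custom_training_input = {
    'scaleTier': 'CUSTOM',
    'masterType': 'complex_model_m',        # required when scaleTier is CUSTOM
    'workerType': 'complex_model_m',        # all workers share one machine type
    'workerCount': '4',                     # int64 fields are JSON strings
    'parameterServerType': 'large_model',   # all parameter servers share one type
    'parameterServerCount': '2',
    'packageUris': ['gs://my-bucket/packages/trainer-0.1.tar.gz'],
    'pythonModule': 'trainer.task',
    'region': 'us-central1',
}
```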
ReplicaConfig
Represents the configuration for a replica in a cluster.
JSON representation

{ "acceleratorConfig": { object (AcceleratorConfig) }, "imageUri": string, "tpuTfVersion": string }
Fields

| Field | Description |
|---|---|
| `acceleratorConfig` | Represents the type and number of accelerators used by the replica. Learn about restrictions on accelerator configurations for training. |
| `imageUri` | The Docker image to run on the replica. This image must be in Container Registry. Learn more about configuring custom containers. |
| `tpuTfVersion` | The AI Platform runtime version that includes a TensorFlow version matching the one used in the custom container. This field is required if the replica is a TPU worker that uses a custom container; otherwise, do not specify this field. This must be a runtime version that currently supports training with TPUs. Note that the version of TensorFlow included in a runtime version may differ from the numbering of the runtime version itself, because it may have a different patch version. In this field, specify the runtime version whose included TensorFlow minor version matches the TensorFlow version used in your custom container. |
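For illustration, here is a sketch of a `masterConfig` carrying an accelerator. It assumes a Compute Engine machine type for the master and assumes `AcceleratorConfig` exposes `type` and `count` fields; the specific accelerator name and image path are placeholders:

```python
# A ReplicaConfig attached to the master worker.
training_input_with_replica_config = {
    'scaleTier': 'CUSTOM',
    'masterType': 'n1-standard-8',          # Compute Engine machine type
    'masterConfig': {
        'acceleratorConfig': {'count': '1', 'type': 'NVIDIA_TESLA_K80'},
        # 'imageUri': 'gcr.io/my-project/trainer:latest',  # only for custom containers;
        # do not set runtimeVersion together with imageUri
    },
    'packageUris': ['gs://my-bucket/packages/trainer-0.1.tar.gz'],
    'pythonModule': 'trainer.task',
    'region': 'us-central1',
}
```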
HyperparameterSpec
Represents a set of hyperparameters to optimize.
JSON representation

{ "goal": enum (GoalType), "params": [ { object (ParameterSpec) } ], "maxTrials": integer, "maxParallelTrials": integer, "maxFailedTrials": integer, "hyperparameterMetricTag": string, "resumePreviousJobId": string, "enableTrialEarlyStopping": boolean, "algorithm": enum (Algorithm) }
Fields

| Field | Description |
|---|---|
| `goal` | Required. The type of goal to use for tuning. Available types are `MAXIMIZE` and `MINIMIZE`. Defaults to `MAXIMIZE`. |
| `params[]` | Required. The set of parameters to tune. |
| `maxTrials` | Optional. How many training trials should be attempted to optimize the specified hyperparameters. Defaults to one. |
| `maxParallelTrials` | Optional. The number of training trials to run concurrently. You can reduce the time it takes to perform hyperparameter tuning by adding trials in parallel. However, each trial only benefits from the information gained in completed trials; a trial does not get access to the results of trials running at the same time, which could reduce the quality of the overall optimization. Each trial will use the same scale tier and machine types. Defaults to one. |
| `maxFailedTrials` | Optional. The number of failed trials that need to be seen before failing the hyperparameter tuning job. You can specify this field to override the default failing criteria for AI Platform hyperparameter tuning jobs. Defaults to zero, which means the service decides when a hyperparameter job should fail. |
| `hyperparameterMetricTag` | Optional. The TensorFlow summary tag name to use for optimizing trials. For current versions of TensorFlow, this tag name should exactly match what is shown in TensorBoard, including all scopes. For versions of TensorFlow prior to 0.12, this should be only the tag passed to tf.Summary. By default, "training/hptuning/metric" will be used. |
| `resumePreviousJobId` | Optional. The ID of a prior hyperparameter tuning job to continue from. The job ID is used to find the corresponding Vizier study GUID and resume that study. |
| `enableTrialEarlyStopping` | Optional. Indicates if the hyperparameter tuning job enables auto trial early stopping. |
| `algorithm` | Optional. The search algorithm specified for the hyperparameter tuning job. Uses the default AI Platform hyperparameter tuning algorithm if unspecified. |
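Putting these fields together, a tuning spec might look like the following sketch; the metric tag, trial counts, and the single parameter shown are placeholder assumptions (the parameter itself is described by ParameterSpec below):

```python
# A HyperparameterSpec to attach to TrainingInput.hyperparameters.
hyperparameters = {
    'goal': 'MAXIMIZE',
    'hyperparameterMetricTag': 'accuracy',   # placeholder summary tag reported by the trainer
    'maxTrials': 20,
    'maxParallelTrials': 2,
    'maxFailedTrials': 3,
    'enableTrialEarlyStopping': True,
    'params': [
        {
            'parameterName': 'learning_rate',
            'type': 'DOUBLE',
            'minValue': 0.0001,
            'maxValue': 0.1,
            'scaleType': 'UNIT_LOG_SCALE',
        },
    ],
}
```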
GoalType
The available types of optimization goals.
| Enum | Description |
|---|---|
| `GOAL_TYPE_UNSPECIFIED` | Goal type will default to maximize. |
| `MAXIMIZE` | Maximize the goal metric. |
| `MINIMIZE` | Minimize the goal metric. |
ParameterSpec
Represents a single hyperparameter to optimize.
JSON representation

{ "parameterName": string, "type": enum (ParameterType), "minValue": number, "maxValue": number, "categoricalValues": [ string ], "discreteValues": [ number ], "scaleType": enum (ScaleType) }
Fields

| Field | Description |
|---|---|
| `parameterName` | Required. The parameter name must be unique amongst all ParameterConfigs in a HyperparameterSpec message. E.g., "learning_rate". |
| `type` | Required. The type of the parameter. |
| `minValue` | Required if type is `DOUBLE` or `INTEGER`. This field should be unset if type is `CATEGORICAL`. |
| `maxValue` | Required if type is `DOUBLE` or `INTEGER`. This field should be unset if type is `CATEGORICAL`. |
| `categoricalValues[]` | Required if type is `CATEGORICAL`. The list of possible categories. |
| `discreteValues[]` | Required if type is `DISCRETE`. A list of feasible points. |
| `scaleType` | Optional. How the parameter should be scaled to the hypercube. Leave unset for categorical parameters. Some kind of scaling is strongly recommended for real or integral parameters (e.g., `UNIT_LINEAR_SCALE`). |
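The following sketch shows one ParameterSpec per parameter type; the parameter names, ranges, and category values are illustrative assumptions:

```python
# Example params[] entries, one for each ParameterType.
params = [
    {'parameterName': 'learning_rate', 'type': 'DOUBLE',
     'minValue': 0.0001, 'maxValue': 0.1, 'scaleType': 'UNIT_LOG_SCALE'},
    {'parameterName': 'hidden_units', 'type': 'INTEGER',
     'minValue': 32, 'maxValue': 512, 'scaleType': 'UNIT_LINEAR_SCALE'},
    {'parameterName': 'optimizer', 'type': 'CATEGORICAL',
     'categoricalValues': ['adam', 'sgd', 'ftrl']},
    {'parameterName': 'batch_size', 'type': 'DISCRETE',
     'discreteValues': [16, 32, 64, 128]},
]
```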
ParameterType
The type of the parameter.
| Enum | Description |
|---|---|
| `PARAMETER_TYPE_UNSPECIFIED` | You must specify a valid type. Using this unspecified type will result in an error. |
| `DOUBLE` | Type for real-valued parameters. |
| `INTEGER` | Type for integral parameters. |
| `CATEGORICAL` | The parameter is categorical, with a value chosen from the categories field. |
| `DISCRETE` | The parameter is real valued, with a fixed set of feasible points. If type is `DISCRETE`, the feasible points must be provided in `discreteValues[]`, and `minValue`/`maxValue` are ignored. |
ScaleType
The type of scaling that should be applied to this parameter.
| Enum | Description |
|---|---|
| `NONE` | By default, no scaling is applied. |
| `UNIT_LINEAR_SCALE` | Scales the feasible space to (0, 1) linearly. |
| `UNIT_LOG_SCALE` | Scales the feasible space logarithmically to (0, 1). The entire feasible space must be strictly positive. |
| `UNIT_REVERSE_LOG_SCALE` | Scales the feasible space "reverse" logarithmically to (0, 1). The result is that values close to the top of the feasible space are spread out more than points near the bottom. The entire feasible space must be strictly positive. |
Algorithm
The available search algorithms for hyperparameter tuning. Learn more about these algorithms.
| Enum | Description |
|---|---|
| `ALGORITHM_UNSPECIFIED` | The default algorithm used by the hyperparameter tuning service. This is a Bayesian optimization algorithm. |
| `GRID_SEARCH` | Simple grid search within the feasible space. To use grid search, all parameters must be `INTEGER`, `CATEGORICAL`, or `DISCRETE`. |
| `RANDOM_SEARCH` | Simple random search within the feasible space. |
PredictionInput
Represents input parameters for a prediction job.
JSON representation

{ "dataFormat": enum (DataFormat), "outputDataFormat": enum (DataFormat), "inputPaths": [ string ], "maxWorkerCount": string, "region": string, "runtimeVersion": string, "batchSize": string, "signatureName": string, "modelName": string, "versionName": string, "uri": string, "outputPath": string }
Fields

| Field | Description |
|---|---|
| `dataFormat` | Required. The format of the input data files. |
| `outputDataFormat` | Optional. Format of the output data files; defaults to JSON. |
| `inputPaths[]` | Required. The Cloud Storage location of the input data files. May contain wildcards. |
| `maxWorkerCount` | Optional. The maximum number of workers to be used for parallel processing. Defaults to 10 if not specified. |
| `region` | Required. The Google Compute Engine region to run the prediction job in. See the available regions for AI Platform services. |
| `runtimeVersion` | Optional. The AI Platform runtime version to use for this batch prediction. If not set, AI Platform will pick the runtime version used during the versions.create request for this model version, or choose the latest stable version when model version information is not available, such as when the model is specified by URI. |
| `batchSize` | Optional. Number of records per batch; defaults to 64. The service will buffer `batchSize` records in memory before invoking one TensorFlow prediction call internally, so take the record size and available memory into consideration when setting this parameter. |
| `signatureName` | Optional. The name of the signature defined in the SavedModel to use for this job. Please refer to SavedModel for information about how to use signatures. Defaults to `DEFAULT_SERVING_SIGNATURE_DEF_KEY`, which is "serving_default". |

Union field `model_version`. Required. The model or the version to use for prediction. `model_version` can be only one of the following:

| Field | Description |
|---|---|
| `modelName` | Use this field if you want to use the default version for the specified model. The string must use the following format: `"projects/YOUR_PROJECT/models/YOUR_MODEL"`. |
| `versionName` | Use this field if you want to specify a version of the model to use. The string is formatted the same way as `modelName`, with the addition of the version information: `"projects/YOUR_PROJECT/models/YOUR_MODEL/versions/YOUR_VERSION"`. |
| `uri` | Use this field if you want to specify a Google Cloud Storage path for the model to use. |

| Field | Description |
|---|---|
| `outputPath` | Required. The output Google Cloud Storage location. |
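To show how the union and the other fields fit together, here is a hedged sketch of a batch prediction Job body; the project, model, and Cloud Storage paths are placeholders:

```python
# A Job body whose `input` union is a PredictionInput.
prediction_job_body = {
    'jobId': 'my_batch_prediction_001',
    'predictionInput': {
        'dataFormat': 'JSON',
        'inputPaths': ['gs://my-bucket/instances/*.json'],   # may contain wildcards
        'outputPath': 'gs://my-bucket/predictions/',
        'region': 'us-central1',
        'maxWorkerCount': '20',
        'batchSize': '64',
        # Exactly one of the model_version union fields:
        'modelName': 'projects/my-project/models/my_model',
        # 'versionName': 'projects/my-project/models/my_model/versions/v2',
        # 'uri': 'gs://my-bucket/saved_model/',
    },
}
```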
DataFormat
The format used to separate data instances in the source and destination files.
| Enum | Description |
|---|---|
| `DATA_FORMAT_UNSPECIFIED` | Unspecified format. |
| `JSON` | Each line of the file is a JSON dictionary representing one record. |
| `TEXT` | Deprecated. Use `JSON` instead. |
| `TF_RECORD` | The source file is a TFRecord file. Currently available only for input data. |
| `TF_RECORD_GZIP` | The source file is a GZIP-compressed TFRecord file. Currently available only for input data. |
| `CSV` | Values are comma-separated rows, with keys in a separate file. Currently available only for output data. |
State
Describes the job state.
| Enum | Description |
|---|---|
| `STATE_UNSPECIFIED` | The job state is unspecified. |
| `QUEUED` | The job has just been created and processing has not yet begun. |
| `PREPARING` | The service is preparing to run the job. |
| `RUNNING` | The job is in progress. |
| `SUCCEEDED` | The job completed successfully. |
| `FAILED` | The job failed. `errorMessage` should contain the details of the failure. |
| `CANCELLING` | The job is being cancelled. `errorMessage` should describe the reason for the cancellation. |
| `CANCELLED` | The job has been cancelled. `errorMessage` should describe the reason for the cancellation. |
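Since QUEUED, PREPARING, and RUNNING eventually resolve to one of the terminal states, a client typically polls the job until it sees SUCCEEDED, FAILED, or CANCELLED. A minimal sketch, assuming the Google API Python client and placeholder project and job names:

```python
import time

from googleapiclient import discovery

ml = discovery.build('ml', 'v1')
job_name = 'projects/my-project/jobs/my_training_job_001'   # placeholder

TERMINAL_STATES = {'SUCCEEDED', 'FAILED', 'CANCELLED'}
while True:
    job = ml.projects().jobs().get(name=job_name).execute()
    if job['state'] in TERMINAL_STATES:
        break
    time.sleep(60)   # poll once a minute

if job['state'] != 'SUCCEEDED':
    # errorMessage carries the failure or cancellation details.
    print('Job ended in state %s: %s' % (job['state'], job.get('errorMessage', '')))
```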
TrainingOutput
Represents results of a training job. Output only.
JSON representation

{ "completedTrialCount": string, "trials": [ { object (HyperparameterOutput) } ], "consumedMLUnits": number, "isHyperparameterTuningJob": boolean, "isBuiltInAlgorithmJob": boolean, "builtInAlgorithmOutput": { object (BuiltInAlgorithmOutput) }, "hyperparameterMetricTag": string }
Fields

| Field | Description |
|---|---|
| `completedTrialCount` | The number of hyperparameter tuning trials that completed successfully. Only set for hyperparameter tuning jobs. |
| `trials[]` | Results for individual hyperparameter trials. Only set for hyperparameter tuning jobs. |
| `consumedMLUnits` | The amount of ML units consumed by the job. |
| `isHyperparameterTuningJob` | Whether this job is a hyperparameter tuning job. |
| `isBuiltInAlgorithmJob` | Whether this job is a built-in algorithm job. |
| `builtInAlgorithmOutput` | Details related to built-in algorithm jobs. Only set for built-in algorithm jobs. |
| `hyperparameterMetricTag` | The TensorFlow summary tag name used for optimizing hyperparameter tuning trials. See `HyperparameterSpec.hyperparameterMetricTag` for details. Only set for hyperparameter tuning jobs. |
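As a small usage sketch, continuing the polling example above, this is how a client might read `trainingOutput` from a completed job returned by `jobs.get`:

```python
# `job` is a Job resource returned by jobs.get after the job has finished.
output = job.get('trainingOutput', {})
print('ML units consumed:', output.get('consumedMLUnits'))
if output.get('isHyperparameterTuningJob'):
    print('Metric tag:', output.get('hyperparameterMetricTag'))
    print('Completed trials:', output.get('completedTrialCount'))
```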
HyperparameterOutput
Represents the result of a single hyperparameter tuning trial from a training job. The TrainingOutput object that is returned on successful completion of a training job with hyperparameter tuning includes a list of HyperparameterOutput objects, one for each successful trial.
JSON representation

{ "trialId": string, "hyperparameters": { string: string, ... }, "startTime": string, "endTime": string, "state": enum (State), "finalMetric": { object (HyperparameterMetric) }, "isTrialStoppedEarly": boolean, "allMetrics": [ { object (HyperparameterMetric) } ], "builtInAlgorithmOutput": { object (BuiltInAlgorithmOutput) } }
Fields

| Field | Description |
|---|---|
| `trialId` | The trial id for these results. |
| `hyperparameters` | The hyperparameters given to this trial. An object containing a list of "key": value pairs. |
| `startTime` | Output only. Start time for the trial. A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. |
| `endTime` | Output only. End time for the trial. A timestamp in RFC3339 UTC "Zulu" format, accurate to nanoseconds. |
| `state` | Output only. The detailed state of the trial. |
| `finalMetric` | The final objective metric seen for this trial. |
| `isTrialStoppedEarly` | True if the trial is stopped early. |
| `allMetrics[]` | All recorded objective metrics for this trial. This field is not currently populated. |
| `builtInAlgorithmOutput` | Details related to built-in algorithm jobs. Only set for trials of built-in algorithm jobs that have succeeded. |
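For example, a client can scan `trials[]` for the best result. The sketch below assumes the tuning goal was MAXIMIZE and considers only trials that report a `finalMetric`:

```python
# Pick the best trial from a hyperparameter tuning job's TrainingOutput.
trials = job.get('trainingOutput', {}).get('trials', [])
finished = [t for t in trials if 'finalMetric' in t]
if finished:
    best = max(finished, key=lambda t: float(t['finalMetric']['objectiveValue']))
    print('Best trial:', best['trialId'])
    print('Hyperparameters:', best['hyperparameters'])
    print('Objective value:', best['finalMetric']['objectiveValue'])
```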
HyperparameterMetric
An observed value of a metric.
JSON representation

{ "trainingStep": string, "objectiveValue": number }

Fields

| Field | Description |
|---|---|
| `trainingStep` | The global training step for this metric. |
| `objectiveValue` | The objective value at this training step. |
BuiltInAlgorithmOutput
Represents output related to a built-in algorithm Job.
JSON representation

{ "framework": string, "runtimeVersion": string, "pythonVersion": string, "modelPath": string }

Fields

| Field | Description |
|---|---|
| `framework` | Framework on which the built-in algorithm was trained. |
| `runtimeVersion` | AI Platform runtime version on which the built-in algorithm was trained. |
| `pythonVersion` | Python version on which the built-in algorithm was trained. |
| `modelPath` | The Cloud Storage path to the `model/` directory where the training job saves the trained model. |
PredictionOutput
Represents results of a prediction job.
JSON representation

{ "outputPath": string, "predictionCount": string, "errorCount": string, "nodeHours": number }

Fields

| Field | Description |
|---|---|
| `outputPath` | The output Google Cloud Storage location provided at the job creation time. |
| `predictionCount` | The number of generated predictions. |
| `errorCount` | The number of data instances which resulted in errors. |
| `nodeHours` | Node hours used by the batch prediction job. |
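A short usage sketch, again assuming a `job` resource returned by `jobs.get` for a completed batch prediction job:

```python
# Read the batch prediction results summary.
output = job.get('predictionOutput', {})
print('Results written to:', output.get('outputPath'))
print('Predictions: %s, errors: %s, node hours: %s' % (
    output.get('predictionCount'),
    output.get('errorCount'),
    output.get('nodeHours')))
```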
Methods

| Method | Description |
|---|---|
| `cancel` | Cancels a running job. |
| `create` | Creates a training or a batch prediction job. |
| `get` | Describes a job. |
| `getIamPolicy` | Gets the access control policy for a resource. |
| `list` | Lists the jobs in the project. |
| `patch` | Updates a specific job resource. |
| `setIamPolicy` | Sets the access control policy on the specified resource. |
| `testIamPermissions` | Returns permissions that a caller has on the specified resource. |
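As a final usage sketch for the methods above, listing the jobs in a project with the Google API Python client (the project ID is a placeholder):

```python
from googleapiclient import discovery

ml = discovery.build('ml', 'v1')
response = ml.projects().jobs().list(parent='projects/my-project').execute()
for job in response.get('jobs', []):
    print(job['jobId'], job['state'])
```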