Class HyperparameterTuningJob (1.45.0)

HyperparameterTuningJob(
    display_name: str,
    custom_job: google.cloud.aiplatform.jobs.CustomJob,
    metric_spec: typing.Dict[str, str],
    parameter_spec: typing.Dict[
        str, google.cloud.aiplatform.hyperparameter_tuning._ParameterSpec
    ],
    max_trial_count: int,
    parallel_trial_count: int,
    max_failed_trial_count: int = 0,
    search_algorithm: typing.Optional[str] = None,
    measurement_selection: typing.Optional[str] = "best",
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
    labels: typing.Optional[typing.Dict[str, str]] = None,
    encryption_spec_key_name: typing.Optional[str] = None,
)

Vertex AI Hyperparameter Tuning Job.

Properties

create_time

Time this resource was created.

display_name

Display name of this resource.

encryption_spec

Customer-managed encryption key options for this Vertex AI resource.

If this is set, then all resources created by this Vertex AI resource will be encrypted with the provided encryption key.

end_time

Time when the Job resource entered the JOB_STATE_SUCCEEDED, JOB_STATE_FAILED, or JOB_STATE_CANCELLED state.

error

Detailed error info for this Job resource. Only populated when the Job's state is JOB_STATE_FAILED or JOB_STATE_CANCELLED.

gca_resource

The underlying resource proto representation.

labels

User-defined labels containing metadata about this resource.

Read more about labels at https://goo.gl/xmQnxf

name

Name of this resource.

network

The full name of the Google Compute Engine network to which this HyperparameterTuningJob should be peered.

Takes the format projects/{project}/global/networks/{network}. Where {project} is a project number, as in 12345, and {network} is a network name.

Private services access must already be configured for the network. If left unspecified, the HyperparameterTuningJob is not peered with any network.
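As a minimal sketch of the format described above, the helpers below build and check a network name. The function names are hypothetical and not part of the SDK, and the check only covers the documented shape (a numeric project and a non-empty network segment), not whether the network exists or has private services access configured.

```python
import re

# Documented shape: projects/{project}/global/networks/{network},
# where {project} is a project number. Allowing any non-slash characters
# in the network segment is an assumption made for this illustration.
_NETWORK_RE = re.compile(r"^projects/\d+/global/networks/[^/]+$")


def network_resource_name(project_number: int, network: str) -> str:
    """Build the full network name from a project number and a network name."""
    return f"projects/{project_number}/global/networks/{network}"


def is_valid_network_name(name: str) -> bool:
    """Return True if name matches the documented network format."""
    return _NETWORK_RE.match(name) is not None


print(network_resource_name(12345, "myVPC"))
```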

preview

Exposes features available in preview for this class.

resource_name

Fully qualified resource name.

start_time

Time when the Job resource entered the JOB_STATE_RUNNING state for the first time.

state

Fetch Job again and return the current JobState.

Returns
Type Description
state (job_state.JobState) Enum that describes the state of a Vertex AI job.

update_time

Time this resource was last updated.

web_access_uris

Fetch the runnable job again and return the latest web access uris.

Returns
Type Description
(Dict[str, Union[str, Dict[str, str]]]) Web access uris of the runnable job.

Methods

HyperparameterTuningJob

HyperparameterTuningJob(
    display_name: str,
    custom_job: google.cloud.aiplatform.jobs.CustomJob,
    metric_spec: typing.Dict[str, str],
    parameter_spec: typing.Dict[
        str, google.cloud.aiplatform.hyperparameter_tuning._ParameterSpec
    ],
    max_trial_count: int,
    parallel_trial_count: int,
    max_failed_trial_count: int = 0,
    search_algorithm: typing.Optional[str] = None,
    measurement_selection: typing.Optional[str] = "best",
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
    labels: typing.Optional[typing.Dict[str, str]] = None,
    encryption_spec_key_name: typing.Optional[str] = None,
)

Configures a HyperparameterTuningJob.

Example usage:

from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

# container_image_uri must be set to the URI of your training container image.
worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-4",
            "accelerator_type": "NVIDIA_TESLA_K80",
            "accelerator_count": 1,
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": container_image_uri,
            "command": [],
            "args": [],
        },
    }
]

# The worker pool specs of this CustomJob are reused for every trial.
custom_job = aiplatform.CustomJob(
    display_name='my_job',
    worker_pool_specs=worker_pool_specs,
    labels={'my_key': 'my_value'},
)

hp_job = aiplatform.HyperparameterTuningJob(
    display_name='hp-test',
    custom_job=custom_job,
    metric_spec={
        'loss': 'minimize',
    },
    parameter_spec={
        'lr': hpt.DoubleParameterSpec(min=0.001, max=0.1, scale='log'),
        'units': hpt.IntegerParameterSpec(min=4, max=128, scale='linear'),
        'activation': hpt.CategoricalParameterSpec(values=['relu', 'selu']),
        'batch_size': hpt.DiscreteParameterSpec(values=[128, 256], scale='linear'),
    },
    max_trial_count=128,
    parallel_trial_count=8,
    labels={'my_key': 'my_value'},
)

hp_job.run()

print(hp_job.trials)
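Each key in parameter_spec is passed to the training job as a command-line keyword argument, so the training code has to parse those arguments. The following is a minimal sketch of that parsing for the four parameters used in the example. The helper name parse_trial_args is hypothetical, and the exact string form of the arguments (for example `--batch_size=128` versus `--batch_size=128.0`) is an assumption, which is why integer-valued parameters are parsed leniently.

```python
import argparse


def _lenient_int(text: str) -> int:
    """Parse '128' or '128.0' as the integer 128."""
    return int(float(text))


def parse_trial_args(argv=None):
    """Parse the hyperparameters that one trial receives on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, required=True)
    parser.add_argument("--units", type=_lenient_int, required=True)
    parser.add_argument("--activation", type=str, required=True)
    parser.add_argument("--batch_size", type=_lenient_int, required=True)
    return parser.parse_args(argv)


# Simulated command line for a single trial.
args = parse_trial_args(
    ["--lr=0.01", "--units=64", "--activation=relu", "--batch_size=128.0"]
)
print(args.lr, args.units, args.activation, args.batch_size)
```

In a real training script, the call would be `parse_trial_args()` with no arguments so that the values come from `sys.argv`.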

For more information on using hyperparameter tuning please visit: https://cloud.google.com/ai-platform-unified/docs/training/using-hyperparameter-tuning
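The search_algorithm parameter documented below mentions comparing the trial count against the number of points in the feasible space. As a rough sketch, that count can be estimated before choosing max_trial_count. The plain-dict description of the search space and the function name are hypothetical stand-ins for the SDK spec objects, and the count assumes that every integer in an integer range is one grid point.

```python
from math import prod


def grid_point_count(space):
    """Count grid points for a space of integer, categorical and discrete parameters.

    space maps a parameter name to (kind, payload), where payload is a
    (min, max) tuple for "integer" and a list of values otherwise.
    """
    counts = []
    for name, (kind, payload) in space.items():
        if kind == "integer":
            low, high = payload
            counts.append(high - low + 1)
        elif kind in ("categorical", "discrete"):
            counts.append(len(payload))
        else:
            # Grid search does not accept continuous (double) parameters.
            raise ValueError(f"{name}: {kind!r} parameters cannot be used with grid search")
    return prod(counts)


# The non-continuous parameters from the example above.
space = {
    "units": ("integer", (4, 128)),
    "activation": ("categorical", ["relu", "selu"]),
    "batch_size": ("discrete", [128, 256]),
}
print(grid_point_count(space))  # 125 * 2 * 2 = 500
```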

Parameters
Name Description
display_name str

Required. The user-defined name of the HyperparameterTuningJob. The name can be up to 128 characters long and can consist of any UTF-8 characters.

custom_job aiplatform.CustomJob

Required. Configured CustomJob. The worker pool spec from this custom job applies to the CustomJobs created in all the trials.

metric_spec typing.Dict[str, str]

Required. Dictionary representing metrics to optimize. The dictionary key is the metric_id, which is reported by your training job, and the dictionary value is the optimization goal of the metric ('minimize' or 'maximize'). Example: metric_spec = {'loss': 'minimize', 'accuracy': 'maximize'}

parameter_spec Dict[str, hyperparameter_tuning._ParameterSpec]

Required. Dictionary representing parameters to optimize. The dictionary key is the parameter_id, which is passed into your training job as a command-line keyword argument, and the dictionary value is the parameter specification of that parameter. Example:

from google.cloud.aiplatform import hyperparameter_tuning as hpt

parameter_spec = {
    'decay': hpt.DoubleParameterSpec(min=1e-7, max=1, scale='linear'),
    'learning_rate': hpt.DoubleParameterSpec(min=1e-7, max=1, scale='linear'),
    'batch_size': hpt.DiscreteParameterSpec(values=[4, 8, 16, 32, 64, 128], scale='linear'),
}

Supported parameter specifications can be found under aiplatform.hyperparameter_tuning. The following parameter specifications are currently supported: DoubleParameterSpec, IntegerParameterSpec, CategoricalParameterSpec, DiscreteParameterSpec.

max_trial_count int

Required. The desired total number of Trials.

parallel_trial_count int

Required. The desired number of Trials to run in parallel.

max_failed_trial_count int

Optional. The number of failed Trials that need to be seen before failing the HyperparameterTuningJob. If set to 0, Vertex AI decides how many Trials must fail before the whole job fails.

search_algorithm str

The search algorithm specified for the Study. Accepts one of the following:

None - If you do not specify an algorithm, your job uses the default Vertex AI algorithm. The default algorithm applies Bayesian optimization to arrive at the optimal solution with a more effective search over the parameter space.

'grid' - A simple grid search within the feasible space. This option is particularly useful if you want to specify a quantity of trials that is greater than the number of points in the feasible space. In such cases, if you do not specify a grid search, the Vertex AI default algorithm may generate duplicate suggestions. To use grid search, all parameter specs must be of type IntegerParameterSpec, CategoricalParameterSpec, or DiscreteParameterSpec.

'random' - A simple random search within the feasible space.

measurement_selection str

This indicates which measurement to use if/when the service automatically selects the final measurement from previously reported intermediate measurements. Accepts: 'best', 'last'.

Choose this based on two considerations:

A) Do you expect your measurements to monotonically improve? If so, choose 'last'. On the other hand, if you're in a situation where your system can "over-train" and you expect the performance to get better for a while but then start declining, choose 'best'.

B) Are your measurements significantly noisy and/or irreproducible? If so, 'best' will tend to be over-optimistic, and it may be better to choose 'last'.

If both or neither of (A) and (B) apply, it doesn't matter which selection type is chosen.

project str

Optional. Project to run the HyperparameterTuningJob in. Overrides project set in aiplatform.init.

location str

Optional. Location to run the HyperparameterTuningJob in. Overrides location set in aiplatform.init.

credentials auth_credentials.Credentials

Optional. Custom credentials to use when calling the HyperparameterTuning service. Overrides credentials set in aiplatform.init.

labels Dict[str, str]

Optional. The labels with user-defined metadata to organize HyperparameterTuningJobs. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.

encryption_spec_key_name str

Optional. Customer-managed encryption key options for a HyperparameterTuningJob. If this is set, then all resources created by the HyperparameterTuningJob will be encrypted with the provided encryption key.
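To make the 'best' and 'last' options of measurement_selection concrete, the sketch below applies both rules to a list of intermediate measurements. It only illustrates the semantics described in the table above. It is not the service's implementation, and the function name is hypothetical.

```python
def select_final_measurement(values, goal, selection="best"):
    """Pick the final measurement from intermediate values.

    values: intermediate metric values in the order they were reported.
    goal: 'minimize' or 'maximize', as given in metric_spec.
    selection: 'best' or 'last'.
    """
    if not values:
        raise ValueError("at least one measurement is required")
    if goal not in ("minimize", "maximize"):
        raise ValueError(f"unknown goal: {goal!r}")
    if selection == "last":
        return values[-1]
    if selection == "best":
        return min(values) if goal == "minimize" else max(values)
    raise ValueError(f"unknown selection: {selection!r}")


# A loss curve that improves and then degrades ("over-training").
overtrained = [0.9, 0.5, 0.3, 0.4, 0.6]
print(select_final_measurement(overtrained, "minimize", "best"))  # 0.3
print(select_final_measurement(overtrained, "minimize", "last"))  # 0.6
```

For a curve that improves monotonically, such as [0.9, 0.5, 0.3], both options return the same value, which matches consideration (A).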

cancel

cancel() -> None

Cancels this Job.

Success of cancellation is not guaranteed. Use Job.state property to verify if cancellation was successful.
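Because cancellation is not guaranteed, one possible pattern is to call cancel() and then poll the state property until the job reaches a terminal state. The helper below is a sketch under that assumption. Its name and the polling parameters are hypothetical, and it only relies on the job object exposing cancel() and state.

```python
import time

# Terminal states, as listed in the end_time property description.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
}


def cancel_and_confirm(job, poll_seconds=10.0, max_polls=30):
    """Request cancellation and report whether the job ended up cancelled."""
    job.cancel()
    for _ in range(max_polls):
        state = job.state  # each access fetches the job again
        state_name = getattr(state, "name", str(state))
        if state_name in TERMINAL_STATES:
            return state_name == "JOB_STATE_CANCELLED"
        time.sleep(poll_seconds)
    # The job did not reach a terminal state within the polling budget.
    return False
```

A return value of False means either that the job finished in another terminal state before the cancellation took effect, or that it was still running when polling stopped.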

delete

delete(sync: bool = True) -> None

Deletes this Vertex AI resource. WARNING: This deletion is permanent.

Parameter
Name Description
sync bool

Whether to execute this deletion synchronously. If False, this method will be executed in a concurrent Future, and any downstream object will be immediately returned and synced when the Future has completed.

done

done() -> bool

Method indicating whether a job has completed.

get

get(
    resource_name: str,
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
) -> google.cloud.aiplatform.jobs._RunnableJob

Get a Vertex AI Job for the given resource_name.

Parameters
Name Description
resource_name str

Required. A fully-qualified resource name or ID.

project str

Optional. Project to retrieve the job from. If not set, project set in aiplatform.init will be used.

location str

Optional. Location to retrieve the job from. If not set, location set in aiplatform.init will be used.

credentials auth_credentials.Credentials

Custom credentials to use to retrieve this job. Overrides credentials set in aiplatform.init.

list

list(
    filter: typing.Optional[str] = None,
    order_by: typing.Optional[str] = None,
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
) -> typing.List[google.cloud.aiplatform.base.VertexAiResourceNoun]

List all instances of this Job Resource.

Example Usage:

aiplatform.HyperparameterTuningJob.list(
    filter='state="JOB_STATE_SUCCEEDED" AND display_name="my_job"',
)

Parameters
Name Description
filter str

Optional. An expression for filtering the results of the request. For field names both snake_case and camelCase are supported.

order_by str

Optional. A comma-separated list of fields to order by, sorted in ascending order. Use "desc" after a field name for descending. Supported fields: display_name, create_time, update_time

project str

Optional. Project to retrieve list from. If not set, project set in aiplatform.init will be used.

location str

Optional. Location to retrieve list from. If not set, location set in aiplatform.init will be used.

credentials auth_credentials.Credentials

Optional. Custom credentials to use to retrieve list. Overrides credentials set in aiplatform.init.
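A filter expression of the kind shown in the example usage can be assembled from keyword arguments. The helper below is a small sketch. Its name is hypothetical, it only covers equality conditions joined with AND, and it does not escape double quotes inside values.

```python
def build_filter(**fields):
    """Join field="value" equality conditions with AND, in the order given."""
    return " AND ".join(f'{key}="{value}"' for key, value in fields.items())


job_filter = build_filter(state="JOB_STATE_SUCCEEDED", display_name="my_job")
print(job_filter)

# With the SDK installed and aiplatform.init() already called, the string
# could then be passed to list(), for example:
#   aiplatform.HyperparameterTuningJob.list(
#       filter=job_filter,
#       order_by="create_time desc",
#   )
```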

run

run(
    service_account: typing.Optional[str] = None,
    network: typing.Optional[str] = None,
    timeout: typing.Optional[int] = None,
    restart_job_on_worker_restart: bool = False,
    enable_web_access: bool = False,
    tensorboard: typing.Optional[str] = None,
    sync: bool = True,
    create_request_timeout: typing.Optional[float] = None,
    disable_retries: bool = False,
) -> None

Run this configured HyperparameterTuningJob.

Parameters
Name Description
service_account str

Optional. Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.

network str

Optional. The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If left unspecified, the network set in aiplatform.init will be used. Otherwise, the job is not peered with any network.

timeout int

Optional. The maximum job running time in seconds. The default is 7 days.

restart_job_on_worker_restart bool

Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.

enable_web_access bool

Whether you want Vertex AI to enable interactive shell access to training containers. https://cloud.google.com/vertex-ai/docs/training/monitor-debug-interactive-shell

tensorboard str

Optional. The name of a Vertex AI Tensorboard resource to which this CustomJob will upload Tensorboard logs. Format: projects/{project}/locations/{location}/tensorboards/{tensorboard}

The training script should write Tensorboard logs to the directory given by the following Vertex AI environment variable: AIP_TENSORBOARD_LOG_DIR

service_account is required when tensorboard is provided. For more information on configuring your service account please visit: https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-training

sync bool

Whether to execute this method synchronously. If False, this method will unblock and it will be executed in a concurrent Future.

create_request_timeout float

Optional. The timeout for the create request in seconds.

disable_retries bool

Indicates if the job should retry for internal errors after the job starts running. If True, overrides restart_job_on_worker_restart to False.
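Two of the parameters above interact with others: tensorboard requires service_account, and disable_retries=True overrides restart_job_on_worker_restart to False. The sketch below checks both rules on the client side before calling run(). The helper name is hypothetical, and the SDK may perform its own validation independently of this.

```python
def build_run_kwargs(
    service_account=None,
    tensorboard=None,
    restart_job_on_worker_restart=False,
    disable_retries=False,
    sync=True,
):
    """Return keyword arguments for run() after applying the documented rules."""
    if tensorboard and not service_account:
        raise ValueError("service_account is required when tensorboard is provided")
    if disable_retries:
        # Documented behavior: disabling retries also disables worker restarts.
        restart_job_on_worker_restart = False
    return {
        "service_account": service_account,
        "tensorboard": tensorboard,
        "restart_job_on_worker_restart": restart_job_on_worker_restart,
        "disable_retries": disable_retries,
        "sync": sync,
    }


kwargs = build_run_kwargs(restart_job_on_worker_restart=True, disable_retries=True)
print(kwargs["restart_job_on_worker_restart"])  # False
```

The resulting dictionary could then be passed as `hp_job.run(**kwargs)`, where hp_job is a configured HyperparameterTuningJob.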

to_dict

to_dict() -> typing.Dict[str, typing.Any]

Returns the resource proto as a dictionary.

wait

wait()

Helper method that blocks until all futures are complete.

wait_for_completion

wait_for_completion() -> None

Waits for job to complete.

Exceptions
Type Description
RuntimeError If the job failed or was cancelled.
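Since wait_for_completion() raises RuntimeError for failed or cancelled jobs, callers that want to continue after a failure need to catch it. The following is a minimal sketch of that handling. The helper name is hypothetical, and it only assumes that the job object exposes wait_for_completion().

```python
def wait_and_report(job):
    """Block until the job finishes and return (succeeded, message)."""
    try:
        job.wait_for_completion()
    except RuntimeError as exc:
        # Raised when the job failed or was cancelled.
        return False, str(exc)
    return True, ""
```

A caller might use it as `succeeded, message = wait_and_report(hp_job)` and inspect `hp_job.error` for details when succeeded is False.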

wait_for_resource_creation

wait_for_resource_creation() -> None

Waits until resource has been created.