Class HyperparameterTuningJob (1.122.0)

HyperparameterTuningJob(
    display_name: str,
    custom_job: google.cloud.aiplatform.jobs.CustomJob,
    metric_spec: typing.Dict[str, str],
    parameter_spec: typing.Dict[
        str, google.cloud.aiplatform.hyperparameter_tuning._ParameterSpec
    ],
    max_trial_count: int,
    parallel_trial_count: int,
    max_failed_trial_count: int = 0,
    search_algorithm: typing.Optional[str] = None,
    measurement_selection: typing.Optional[str] = "best",
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
    labels: typing.Optional[typing.Dict[str, str]] = None,
    encryption_spec_key_name: typing.Optional[str] = None,
)

Vertex AI Hyperparameter Tuning Job.

Properties

create_time

Time this resource was created.

display_name

Display name of this resource.

encryption_spec

Customer-managed encryption key options for this Vertex AI resource.

If this is set, then all resources created by this Vertex AI resource will be encrypted with the provided encryption key.

end_time

Time when the Job resource entered the JOB_STATE_SUCCEEDED, JOB_STATE_FAILED, or JOB_STATE_CANCELLED state.

error

Detailed error info for this Job resource. Only populated when the Job's state is JOB_STATE_FAILED or JOB_STATE_CANCELLED.

gca_resource

The underlying resource proto representation.

labels

User-defined labels containing metadata about this resource.

Read more about labels at https://goo.gl/xmQnxf

name

Name of this resource.

network

The full name of the Google Compute Engine network to which this HyperparameterTuningJob should be peered.

Takes the format projects/{project}/global/networks/{network}, where {project} is a project number, as in 12345, and {network} is a network name.

Private services access must already be configured for the network. If left unspecified, the HyperparameterTuningJob is not peered with any network.
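As a quick illustration of the format described above, the following sketch checks a network string locally before it is handed to a job. The helper name is_valid_network_name and the regular expression are assumptions made for this example and are not part of the SDK. The pattern only checks that the project segment is numeric and that the network segment is non-empty; the service may apply stricter rules.

```python
import re

# Hypothetical helper, not part of google-cloud-aiplatform.
# Mirrors the documented format projects/{project}/global/networks/{network},
# where {project} is a project number.
_NETWORK_RE = re.compile(r"projects/(?P<project>\d+)/global/networks/(?P<network>[^/]+)")


def is_valid_network_name(name: str) -> bool:
    """Return True if `name` matches the documented network name format."""
    return _NETWORK_RE.fullmatch(name) is not None


print(is_valid_network_name("projects/12345/global/networks/myVPC"))
print(is_valid_network_name("projects/my-project/global/networks/myVPC"))
```

The second call uses a project ID instead of a project number, so the check rejects it.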

preview

Exposes features available in preview for this class.

resource_name

Fully qualified resource name.

start_time

Time when the Job resource entered the JOB_STATE_RUNNING state for the first time.

state

Fetch Job again and return the current JobState.

Returns
Type	Description
`state (job_state.JobState)`	Enum that describes the state of a Vertex AI job.

update_time

Time this resource was last updated.

web_access_uris

Fetch the runnable job again and return the latest web access uris.

Returns
Type	Description
`(Dict[str, Union[str, Dict[str, str]]])`	Web access uris of the runnable job.

Methods

HyperparameterTuningJob

HyperparameterTuningJob(
    display_name: str,
    custom_job: google.cloud.aiplatform.jobs.CustomJob,
    metric_spec: typing.Dict[str, str],
    parameter_spec: typing.Dict[
        str, google.cloud.aiplatform.hyperparameter_tuning._ParameterSpec
    ],
    max_trial_count: int,
    parallel_trial_count: int,
    max_failed_trial_count: int = 0,
    search_algorithm: typing.Optional[str] = None,
    measurement_selection: typing.Optional[str] = "best",
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
    labels: typing.Optional[typing.Dict[str, str]] = None,
    encryption_spec_key_name: typing.Optional[str] = None,
)

Configures a HyperparameterTuning Job.

Example usage:

from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

# Placeholder: replace with the URI of your own training container image.
container_image_uri = "gcr.io/PROJECT_ID/IMAGE:TAG"

# One worker pool; the same spec is applied to the CustomJob of every trial.
worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-4",
            "accelerator_type": "NVIDIA_TESLA_K80",
            "accelerator_count": 1,
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": container_image_uri,
            "command": [],
            "args": [],
        },
    }
]

custom_job = aiplatform.CustomJob(
    display_name='my_job',
    worker_pool_specs=worker_pool_specs,
    labels={'my_key': 'my_value'},
)

hp_job = aiplatform.HyperparameterTuningJob(
    display_name='hp-test',
    custom_job=custom_job,
    # Metric reported by the training job and its optimization goal.
    metric_spec={
        'loss': 'minimize',
    },
    # Search space; each key is passed to the training job as a command-line argument.
    parameter_spec={
        'lr': hpt.DoubleParameterSpec(min=0.001, max=0.1, scale='log'),
        'units': hpt.IntegerParameterSpec(min=4, max=128, scale='linear'),
        'activation': hpt.CategoricalParameterSpec(values=['relu', 'selu']),
        'batch_size': hpt.DiscreteParameterSpec(values=[128, 256], scale='linear'),
    },
    max_trial_count=128,
    parallel_trial_count=8,
    labels={'my_key': 'my_value'},
)

# Blocks until the tuning job finishes (sync=True by default).
hp_job.run()

print(hp_job.trials)

For more information on using hyperparameter tuning please visit: https://cloud.google.com/ai-platform-unified/docs/training/using-hyperparameter-tuning

Parameters
Name	Description
`display_name`	`str` Required. The user-defined name of the HyperparameterTuningJob. The name can be up to 128 characters long and can consist of any UTF-8 characters.
`custom_job`	`aiplatform.CustomJob` Required. Configured CustomJob. The worker pool spec from this custom job applies to the CustomJobs created in all the trials. A persistent_resource_id can be specified on the custom job to be used when running this Hyperparameter Tuning job.
`parameter_spec`	`Dict[str, hyperparameter_tuning._ParameterSpec]` Required. Dictionary representing parameters to optimize. The dictionary key is the parameter_id, which is passed into your training job as a command-line keyword argument, and the dictionary value is the specification of that parameter. Example: from google.cloud.aiplatform import hyperparameter_tuning as hpt parameter_spec={ 'decay': hpt.DoubleParameterSpec(min=1e-7, max=1, scale='linear'), 'learning_rate': hpt.DoubleParameterSpec(min=1e-7, max=1, scale='linear'), 'batch_size': hpt.DiscreteParameterSpec(values=[4, 8, 16, 32, 64, 128], scale='linear') } Supported parameter specifications can be found in aiplatform.hyperparameter_tuning. These parameter specifications are currently supported: DoubleParameterSpec, IntegerParameterSpec, CategoricalParameterSpec, DiscreteParameterSpec
`max_trial_count`	`int` Required. The desired total number of Trials.
`parallel_trial_count`	`int` Required. The desired number of Trials to run in parallel.
`max_failed_trial_count`	`int` Optional. The number of failed Trials that need to be seen before failing the HyperparameterTuningJob. If set to 0, Vertex AI decides how many Trials must fail before the whole job fails.
`search_algorithm`	`str` The search algorithm specified for the Study. Accepts one of the following: `None` - If you do not specify an algorithm, your job uses the default Vertex AI algorithm. The default algorithm applies Bayesian optimization to arrive at the optimal solution with a more effective search over the parameter space. 'grid' - A simple grid search within the feasible space. This option is particularly useful if you want to specify a quantity of trials that is greater than the number of points in the feasible space. In such cases, if you do not specify a grid search, the Vertex AI default algorithm may generate duplicate suggestions. To use grid search, all parameter specs must be of type `IntegerParameterSpec`, `CategoricalParameterSpec`, or `DiscreteParameterSpec`. 'random' - A simple random search within the feasible space.
`measurement_selection`	`str` This indicates which measurement to use if/when the service automatically selects the final measurement from previously reported intermediate measurements. Accepts: 'best', 'last' Choose this based on two considerations: A) Do you expect your measurements to monotonically improve? If so, choose 'last'. On the other hand, if you're in a situation where your system can "over-train" and you expect the performance to get better for a while but then start declining, choose 'best'. B) Are your measurements significantly noisy and/or irreproducible? If so, 'best' will tend to be over-optimistic, and it may be better to choose 'last'. If both or neither of (A) and (B) apply, it doesn't matter which selection type is chosen.
`project`	`str` Optional. Project to run the HyperparameterTuningJob in. Overrides project set in aiplatform.init.
`location`	`str` Optional. Location to run the HyperparameterTuningJob in. Overrides location set in aiplatform.init.
`credentials`	`auth_credentials.Credentials` Optional. Custom credentials to use to call the HyperparameterTuning service. Overrides credentials set in aiplatform.init.
`labels`	`Dict[str, str]` Optional. The labels with user-defined metadata to organize HyperparameterTuningJobs. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
`encryption_spec_key_name`	`str` Optional. Customer-managed encryption key options for a HyperparameterTuningJob. If this is set, then all resources created by the HyperparameterTuningJob will be encrypted with the provided encryption key.
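To make the difference between the two measurement_selection values concrete, here is a small local sketch. The function select_final_measurement is hypothetical and written only for this illustration; it mimics the documented meaning of 'best' and 'last' on a list of intermediate values and is not how the service is implemented.

```python
from typing import List


def select_final_measurement(
    values: List[float], goal: str, measurement_selection: str = "best"
) -> float:
    """Pick one final value from the intermediate measurements of a trial.

    Illustrative only: 'last' returns the most recent value, 'best' returns
    the minimum or maximum depending on the optimization goal.
    """
    if not values:
        raise ValueError("at least one measurement is required")
    if measurement_selection == "last":
        return values[-1]
    if measurement_selection == "best":
        if goal == "minimize":
            return min(values)
        if goal == "maximize":
            return max(values)
        raise ValueError(f"unknown goal: {goal!r}")
    raise ValueError(f"unknown measurement_selection: {measurement_selection!r}")


# A loss curve that improves and then degrades, as in the over-training case.
losses = [0.9, 0.5, 0.3, 0.4, 0.6]
print(select_final_measurement(losses, "minimize", "best"))  # 0.3
print(select_final_measurement(losses, "minimize", "last"))  # 0.6
```

With a curve like this one, 'best' reports the value from before the model started to over-train, while 'last' reports the degraded final value.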

cancel

cancel() -> None

Cancels this Job.

Success of cancellation is not guaranteed. Use Job.state property to verify if cancellation was successful.
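Because cancellation is not guaranteed, a caller would typically poll the state after requesting it. The sketch below shows one way to do that against any object exposing cancel() and state. The helper cancel_and_confirm, its timeout and polling parameters, and the stand-in object are assumptions made for this example; the state names are the terminal states listed for end_time.

```python
import time
from types import SimpleNamespace
from typing import Any

# Terminal states named in this reference for end_time.
TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def cancel_and_confirm(job: Any, timeout_s: float = 60.0, poll_s: float = 5.0) -> bool:
    """Request cancellation, then poll `job.state` until a terminal state or timeout.

    Returns True only if the job ended in JOB_STATE_CANCELLED.
    """
    job.cancel()
    deadline = time.monotonic() + timeout_s
    while True:
        state = job.state
        # Enum states expose `.name`; fall back to str() for plain strings.
        state_name = getattr(state, "name", str(state))
        if state_name in TERMINAL_STATES:
            return state_name == "JOB_STATE_CANCELLED"
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Stand-in object so the sketch runs without Google Cloud access.
fake_job = SimpleNamespace(cancel=lambda: None, state="JOB_STATE_CANCELLED")
print(cancel_and_confirm(fake_job, timeout_s=1.0, poll_s=0.0))
```

A job that finishes on its own before the cancellation takes effect ends in JOB_STATE_SUCCEEDED or JOB_STATE_FAILED, in which case the helper returns False.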

delete

delete(sync: bool = True) -> None

Deletes this Vertex AI resource. WARNING: This deletion is permanent.

done

done() -> bool

Method indicating whether a job has completed.

get

get(
    resource_name: str,
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
) -> google.cloud.aiplatform.jobs._RunnableJob

Get a Vertex AI Job for the given resource_name.

Parameters
Name	Description
`resource_name`	`str` Required. A fully-qualified resource name or ID.
`project`	`str` Optional. Project to retrieve the job from. If not set, project set in aiplatform.init will be used.
`location`	`str` Optional. Location to retrieve the job from. If not set, location set in aiplatform.init will be used.
`credentials`	`auth_credentials.Credentials` Optional. Custom credentials to use to retrieve this job. Overrides credentials set in aiplatform.init.

list

list(
    filter: typing.Optional[str] = None,
    order_by: typing.Optional[str] = None,
    project: typing.Optional[str] = None,
    location: typing.Optional[str] = None,
    credentials: typing.Optional[google.auth.credentials.Credentials] = None,
) -> typing.List[google.cloud.aiplatform.base.VertexAiResourceNoun]

List all instances of this Job Resource.

Example Usage:

aiplatform.HyperparameterTuningJob.list(
    filter='state="JOB_STATE_SUCCEEDED" AND display_name="my_job"',
)

Parameters
Name	Description
`filter`	`str` Optional. An expression for filtering the results of the request. For field names both snake_case and camelCase are supported.
`order_by`	`str` Optional. A comma-separated list of fields to order by, sorted in ascending order. Use "desc" after a field name for descending. Supported fields: `display_name`, `create_time`, `update_time`
`project`	`str` Optional. Project to retrieve list from. If not set, project set in aiplatform.init will be used.
`location`	`str` Optional. Location to retrieve list from. If not set, location set in aiplatform.init will be used.
`credentials`	`auth_credentials.Credentials` Optional. Custom credentials to use to retrieve list. Overrides credentials set in aiplatform.init.
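The filter and order_by arguments are plain strings, so it can help to assemble them programmatically. The following sketch builds both from Python values. The helpers build_filter and build_order_by are hypothetical and not part of the SDK; they cover only equality clauses joined with AND and the three ordering fields listed above, and they do not escape quotes inside values.

```python
from typing import Sequence, Tuple

# Ordering fields listed as supported in this reference.
SUPPORTED_ORDER_FIELDS = ("display_name", "create_time", "update_time")


def build_filter(**fields: str) -> str:
    """Join field="value" equality clauses with AND, in the order given."""
    return " AND ".join(f'{name}="{value}"' for name, value in fields.items())


def build_order_by(fields: Sequence[Tuple[str, bool]]) -> str:
    """Build a comma-separated order_by string from (field, descending) pairs."""
    parts = []
    for name, descending in fields:
        if name not in SUPPORTED_ORDER_FIELDS:
            raise ValueError(f"unsupported order_by field: {name}")
        parts.append(f"{name} desc" if descending else name)
    return ",".join(parts)


print(build_filter(state="JOB_STATE_SUCCEEDED", display_name="my_job"))
print(build_order_by([("create_time", True), ("display_name", False)]))
```

The resulting strings could then be passed as the filter and order_by arguments of list.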

run

run(
    service_account: typing.Optional[str] = None,
    network: typing.Optional[str] = None,
    timeout: typing.Optional[int] = None,
    restart_job_on_worker_restart: bool = False,
    enable_web_access: bool = False,
    tensorboard: typing.Optional[str] = None,
    sync: bool = True,
    create_request_timeout: typing.Optional[float] = None,
    disable_retries: bool = False,
    scheduling_strategy: typing.Optional[
        google.cloud.aiplatform_v1.types.custom_job.Scheduling.Strategy
    ] = None,
    max_wait_duration: typing.Optional[int] = None,
) -> None

Run this configured HyperparameterTuningJob.

Parameters
Name	Description
`service_account`	`str` Optional. Specifies the service account for workload run-as account. Users submitting jobs must have act-as permission on this run-as account.
`network`	`str` Optional. The full name of the Compute Engine network to which the job should be peered. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If left unspecified, the network set in aiplatform.init will be used. Otherwise, the job is not peered with any network.
`timeout`	`int` Optional. The maximum job running time in seconds. The default is 7 days.
`restart_job_on_worker_restart`	`bool` Restarts the entire CustomJob if a worker gets restarted. This feature can be used by distributed training jobs that are not resilient to workers leaving and joining a job.
`enable_web_access`	`bool` Whether you want Vertex AI to enable interactive shell access to training containers. https://cloud.google.com/vertex-ai/docs/training/monitor-debug-interactive-shell
`tensorboard`	`str` Optional. The name of a Vertex AI Tensorboard resource to which this CustomJob will upload Tensorboard logs. Format: `projects/{project}/locations/{location}/tensorboards/{tensorboard}`. The training script should write Tensorboard logs to the directory given by the following Vertex AI environment variable: AIP_TENSORBOARD_LOG_DIR. `service_account` is required when `tensorboard` is provided. For more information on configuring your service account please visit: https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-training
`sync`	`bool` Whether to execute this method synchronously. If False, this method will unblock and it will be executed in a concurrent Future.
`create_request_timeout`	`float` Optional. The timeout for the create request in seconds.
`disable_retries`	`bool` Indicates if the job should retry for internal errors after the job starts running. If True, overrides `restart_job_on_worker_restart` to False.
`scheduling_strategy`	`gca_custom_job_compat.Scheduling.Strategy` Optional. Indicates the job scheduling strategy.
`max_wait_duration`	`int` This is the maximum duration that a job will wait for the requested resources to be provisioned in seconds. If set to 0, the job will wait indefinitely. The default is 1 day.
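Several run parameters interact: timeout and max_wait_duration are expressed in seconds, and tensorboard requires service_account. The sketch below collects these rules in one place by building a keyword-argument dictionary. The helper build_run_kwargs is an assumption made for this example and is not part of the SDK; the check it performs is a local convenience, not a substitute for validation by the service.

```python
from datetime import timedelta
from typing import Any, Dict, Optional


def build_run_kwargs(
    timeout: Optional[timedelta] = None,
    max_wait: Optional[timedelta] = None,
    tensorboard: Optional[str] = None,
    service_account: Optional[str] = None,
) -> Dict[str, Any]:
    """Translate durations to whole seconds and enforce the tensorboard rule."""
    if tensorboard is not None and service_account is None:
        raise ValueError("service_account is required when tensorboard is provided")
    kwargs: Dict[str, Any] = {}
    if timeout is not None:
        kwargs["timeout"] = int(timeout.total_seconds())
    if max_wait is not None:
        kwargs["max_wait_duration"] = int(max_wait.total_seconds())
    if tensorboard is not None:
        kwargs["tensorboard"] = tensorboard
    if service_account is not None:
        kwargs["service_account"] = service_account
    return kwargs


kwargs = build_run_kwargs(timeout=timedelta(days=2), max_wait=timedelta(hours=6))
print(kwargs)  # {'timeout': 172800, 'max_wait_duration': 21600}
```

The dictionary could then be unpacked into the call, for example hp_job.run(**kwargs), leaving every parameter that was not set at its default.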

to_dict

to_dict() -> typing.Dict[str, typing.Any]

Returns the resource proto as a dictionary.

wait

wait()

Helper method that blocks until all futures are complete.

wait_for_completion

wait_for_completion() -> None

Waits for job to complete.

Exceptions
Type	Description
`RuntimeError`	If the job failed or was cancelled.

wait_for_resource_creation

wait_for_resource_creation() -> None

Waits until resource has been created.