This page shows you how to train an AutoML classification model from an image dataset using either the Google Cloud console or the Vertex AI API.
Train an AutoML model
Google Cloud console
In the Google Cloud console, in the Vertex AI section, go to the Datasets page.
Click the name of the dataset you want to use to train your model to open its details page.
Click Train new model.
For the training method, select AutoML.
Click Continue.
Enter a name for the model.
If you want to manually set how your training data is split, expand Advanced options and select a data split option. Learn more.
Click Start Training.
Model training can take many hours, depending on the size and complexity of your data and your training budget, if you specified one. You can close this tab and return to it later. You will receive an email when your model has completed training.
API
Single-label classification
REST
Before using any of the request data, make the following replacements:
- LOCATION: Region where the dataset is located and the model is created. For example, us-central1.
- PROJECT: Your project ID.
- TRAININGPIPELINE_DISPLAYNAME: Required. A display name for the trainingPipeline.
- DATASET_ID: The ID number for the dataset to use for training.
- fractionSplit: Optional. One of several possible ML use split options for your data. For fractionSplit, values must sum to 1. For example:
  {"trainingFraction": "0.7","validationFraction": "0.15","testFraction": "0.15"}
- MODEL_DISPLAYNAME*: A display name for the model uploaded (created) by the TrainingPipeline.
- MODEL_DESCRIPTION*: A description for the model.
- modelToUpload.labels*: Any set of key-value pairs to organize your models. For example:
  - "env": "prod"
  - "tier": "backend"
- MODELTYPE†: The type of Cloud-hosted model to train. The options are:
  - CLOUD (default)
- NODE_HOUR_BUDGET†: The actual training cost will be equal to or less than this value. For Cloud models, the budget must be 8,000 to 800,000 milli node hours (inclusive). The default value is 192,000, which represents one day in wall time, assuming 8 nodes are used.
- PROJECT_NUMBER: Your project's automatically generated project number.

* The description in the schema file that you specify in trainingTaskDefinition describes the use of this field.
† The schema file that you specify in trainingTaskDefinition declares and describes this field.
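The budget arithmetic in the NODE_HOUR_BUDGET description can be checked with a short Python sketch. The helper names below are hypothetical and are not part of the Vertex AI API; the range, the default, and the 8-node assumption are taken from the description above.

```python
# Illustrative helpers only; these names are not part of any Google library.
MIN_BUDGET = 8_000        # lower bound for Cloud models, in milli node hours
MAX_BUDGET = 800_000      # upper bound for Cloud models, in milli node hours
DEFAULT_BUDGET = 192_000  # default value stated in the description


def check_budget(milli_node_hours: int) -> int:
    """Return the budget unchanged if it lies inside the documented range."""
    if not MIN_BUDGET <= milli_node_hours <= MAX_BUDGET:
        raise ValueError(
            f"budget must be between {MIN_BUDGET} and {MAX_BUDGET} "
            f"milli node hours, got {milli_node_hours}"
        )
    return milli_node_hours


def wall_clock_hours(milli_node_hours: int, nodes: int = 8) -> float:
    """Convert a budget to approximate wall-clock hours for a node count."""
    node_hours = milli_node_hours / 1000  # 1,000 milli node hours = 1 node hour
    return node_hours / nodes


print(wall_clock_hours(check_budget(DEFAULT_BUDGET)))  # 24.0
```

With the default of 192,000 milli node hours, this gives 192 node hours, or 24 hours of wall time on 8 nodes.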
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines
Request JSON body:
{
  "displayName": "TRAININGPIPELINE_DISPLAYNAME",
  "inputDataConfig": {
    "datasetId": "DATASET_ID",
    "fractionSplit": {
      "trainingFraction": "DECIMAL",
      "validationFraction": "DECIMAL",
      "testFraction": "DECIMAL"
    }
  },
  "modelToUpload": {
    "displayName": "MODEL_DISPLAYNAME",
    "description": "MODEL_DESCRIPTION",
    "labels": {
      "KEY": "VALUE"
    }
  },
  "trainingTaskDefinition": "gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_image_classification_1.0.0.yaml",
  "trainingTaskInputs": {
    "multiLabel": "false",
    "modelType": ["MODELTYPE"],
    "budgetMilliNodeHours": NODE_HOUR_BUDGET
  }
}
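If you prefer to generate the request body rather than edit it by hand, the following Python sketch builds the same structure using only the standard library. The function name, its parameters, and the sample values (my-pipeline, 1234567890, my-model) are hypothetical; the field names and the schema URI mirror the request body above, including the quoted multiLabel value and the bracketed modelType value.

```python
import json

# Schema URI copied from the request body on this page.
SCHEMA_URI = (
    "gs://google-cloud-aiplatform/schema/trainingjob/definition/"
    "automl_image_classification_1.0.0.yaml"
)


def build_training_pipeline_body(
    pipeline_name,
    dataset_id,
    model_name,
    *,
    multi_label=False,
    model_type="CLOUD",
    budget_milli_node_hours=192_000,
    fractions=None,
    description="",
    labels=None,
):
    """Assemble a trainingPipelines request body as a plain dict."""
    input_data_config = {"datasetId": dataset_id}
    if fractions is not None:
        training, validation, test = fractions
        # The example on this page writes the fractions as strings.
        input_data_config["fractionSplit"] = {
            "trainingFraction": str(training),
            "validationFraction": str(validation),
            "testFraction": str(test),
        }
    return {
        "displayName": pipeline_name,
        "inputDataConfig": input_data_config,
        "modelToUpload": {
            "displayName": model_name,
            "description": description,
            "labels": labels or {},
        },
        "trainingTaskDefinition": SCHEMA_URI,
        "trainingTaskInputs": {
            # Mirrors the page: multiLabel is written as a quoted string.
            "multiLabel": "true" if multi_label else "false",
            "modelType": [model_type],
            "budgetMilliNodeHours": budget_milli_node_hours,
        },
    }


# Hypothetical values; redirect the output to request.json to use it with curl.
body = build_training_pipeline_body(
    "my-pipeline",
    "1234567890",
    "my-model",
    fractions=(0.7, 0.15, 0.15),
    labels={"env": "prod"},
)
print(json.dumps(body, indent=2))
```

Passing multi_label=True produces the multi-label variant of the same body.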
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines" | Select-Object -Expand Content
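The same request can also be assembled with the Python standard library. This is a minimal sketch, not an official sample: the function names are hypothetical, the URL and headers are the ones used in the curl and PowerShell commands, and the token comes from the same gcloud command. The sending step is left commented out because it needs credentials and network access.

```python
import json
import subprocess
import urllib.request


def pipelines_url(project: str, location: str) -> str:
    """Build the trainingPipelines endpoint shown on this page."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/trainingPipelines"
    )


def build_request(project: str, location: str, body: dict, token: str):
    """Create (but do not send) the POST request for a training pipeline."""
    return urllib.request.Request(
        pipelines_url(project, location),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def gcloud_access_token() -> str:
    """Fetch a token the same way the curl command does (needs the gcloud CLI)."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# To send the request, uncomment the lines below and replace the
# hypothetical project ID and region with your own values:
# with open("request.json", encoding="utf-8") as f:
#     request_body = json.load(f)
# req = build_request("my-project", "us-central1", request_body, gcloud_access_token())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```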
The response contains information about specifications as well as the TRAININGPIPELINE_ID.
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Multi-label classification
REST
Before using any of the request data, make the following replacements:
- LOCATION: Region where the dataset is located and the model is created. For example, us-central1.
- PROJECT: Your project ID.
- TRAININGPIPELINE_DISPLAYNAME: Required. A display name for the trainingPipeline.
- DATASET_ID: The ID number for the dataset to use for training.
- fractionSplit: Optional. One of several possible ML use split options for your data. For fractionSplit, values must sum to 1. For example:
  {"trainingFraction": "0.7","validationFraction": "0.15","testFraction": "0.15"}
- MODEL_DISPLAYNAME*: A display name for the model uploaded (created) by the TrainingPipeline.
- MODEL_DESCRIPTION*: A description for the model.
- modelToUpload.labels*: Any set of key-value pairs to organize your models. For example:
  - "env": "prod"
  - "tier": "backend"
- MODELTYPE†: The type of Cloud-hosted model to train. The options are:
  - CLOUD (default)
- NODE_HOUR_BUDGET†: The actual training cost will be equal to or less than this value. For Cloud models, the budget must be 8,000 to 800,000 milli node hours (inclusive). The default value is 192,000, which represents one day in wall time, assuming 8 nodes are used.
- PROJECT_NUMBER: Your project's automatically generated project number.

* The description in the schema file that you specify in trainingTaskDefinition describes the use of this field.
† The schema file that you specify in trainingTaskDefinition declares and describes this field.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines
Request JSON body:
{
  "displayName": "TRAININGPIPELINE_DISPLAYNAME",
  "inputDataConfig": {
    "datasetId": "DATASET_ID",
    "fractionSplit": {
      "trainingFraction": "DECIMAL",
      "validationFraction": "DECIMAL",
      "testFraction": "DECIMAL"
    }
  },
  "modelToUpload": {
    "displayName": "MODEL_DISPLAYNAME",
    "description": "MODEL_DESCRIPTION",
    "labels": {
      "KEY": "VALUE"
    }
  },
  "trainingTaskDefinition": "gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_image_classification_1.0.0.yaml",
  "trainingTaskInputs": {
    "multiLabel": "true",
    "modelType": ["MODELTYPE"],
    "budgetMilliNodeHours": NODE_HOUR_BUDGET
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/trainingPipelines" | Select-Object -Expand Content
The response contains information about specifications as well as the TRAININGPIPELINE_ID.
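To pull the TRAININGPIPELINE_ID out of a response, a sketch like the following may help. It assumes that the response carries a name field in the usual resource-name form, projects/PROJECT_NUMBER/locations/LOCATION/trainingPipelines/TRAININGPIPELINE_ID; this page does not show a sample response, so treat that shape as an assumption and check it against your own output. The function name and the sample values are hypothetical.

```python
def pipeline_id_from_name(resource_name: str) -> str:
    """Extract the trailing pipeline ID from a trainingPipelines resource name."""
    parts = resource_name.split("/")
    expected = ("projects", "locations", "trainingPipelines")
    # A valid name alternates collection names and IDs: 6 segments in total.
    if len(parts) != 6 or tuple(parts[0::2]) != expected or not all(parts):
        raise ValueError(f"unexpected resource name: {resource_name!r}")
    return parts[5]


# Made-up response fragment for illustration only.
sample_response = {
    "name": "projects/123456789/locations/us-central1/trainingPipelines/987654321"
}
print(pipeline_id_from_name(sample_response["name"]))  # 987654321
```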
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Control the data split using REST
You can control how your training data is split between the training, validation, and test sets. When using the Vertex AI API, use the Split object to determine your data split. The Split object can be included in the InputDataConfig object (the inputDataConfig field of the request body) as one of several object types, each of which provides a different way to split the training data. You can select one method only.
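The two split methods covered in this section, fractionSplit and filterSplit, can be sketched in Python as plain dictionaries. The helper names are hypothetical and the code only builds and validates the JSON fragments; it enforces the rules stated in this section (fractions add up to 1.0, and only one split method is selected).

```python
import math

# Label key used in the filterSplit example on this page.
ML_USE_LABEL = "labels.aiplatform.googleapis.com/ml_use"


def fraction_split(training: float, validation: float, test: float) -> dict:
    """Build a fractionSplit fragment; all three fractions must sum to 1.0."""
    fractions = {
        "trainingFraction": training,
        "validationFraction": validation,
        "testFraction": test,
    }
    for key, value in fractions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} must be between 0 and 1, got {value}")
    # isclose tolerates floating-point rounding such as 0.7 + 0.15 + 0.15.
    if not math.isclose(sum(fractions.values()), 1.0, abs_tol=1e-9):
        raise ValueError("fractions must add up to 1.0")
    return {"fractionSplit": fractions}


def filter_split_by_ml_use() -> dict:
    """Build a filterSplit fragment that selects items by their ml_use label."""
    return {
        "filterSplit": {
            "trainingFilter": f"{ML_USE_LABEL}=training",
            "validationFilter": f"{ML_USE_LABEL}=validation",
            "testFilter": f"{ML_USE_LABEL}=test",
        }
    }


def input_data_config(dataset_id: str, *splits: dict) -> dict:
    """Combine a dataset ID with at most one split method."""
    if len(splits) > 1:
        raise ValueError("select one split method only")
    config = {"datasetId": dataset_id}
    for split in splits:
        config.update(split)
    return config


# Hypothetical dataset ID, for illustration only.
print(input_data_config("1234567890", fraction_split(0.7, 0.15, 0.15)))
print(input_data_config("1234567890", filter_split_by_ml_use()))
```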
- FractionSplit:
  - TRAINING_FRACTION: The fraction of the training data to be used for the training set.
  - VALIDATION_FRACTION: The fraction of the training data to be used for the validation set. Not used for video data.
  - TEST_FRACTION: The fraction of the training data to be used for the test set.
  If any of the fractions are specified, all must be specified. The fractions must add up to 1.0. The default values for the fractions differ depending on your data type. Learn more.
  "fractionSplit": {
    "trainingFraction": TRAINING_FRACTION,
    "validationFraction": VALIDATION_FRACTION,
    "testFraction": TEST_FRACTION
  },
- FilterSplit:
  - TRAINING_FILTER: Data items that match this filter are used for the training set.
  - VALIDATION_FILTER: Data items that match this filter are used for the validation set. Must be "-" for video data.
  - TEST_FILTER: Data items that match this filter are used for the test set.
  These filters can be used with the ml_use label, or with any labels you apply to your data. Learn more about using the ml_use label and other labels to filter your data.
  The following example shows how to use the filterSplit object with the ml_use label, with the validation set included:
  "filterSplit": {
    "trainingFilter": "labels.aiplatform.googleapis.com/ml_use=training",
    "validationFilter": "labels.aiplatform.googleapis.com/ml_use=validation",
    "testFilter": "labels.aiplatform.googleapis.com/ml_use=test"
  }
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-09-11 UTC.