Hello custom training: Train a custom image classification model

This page shows you how to run a TensorFlow Keras training application on Vertex AI. The training application trains an image classification model that classifies flowers by type.

This tutorial has several pages:

  1. Setting up your project and environment.

  2. Training a custom image classification model.

  3. Serving predictions from a custom image classification model.

  4. Cleaning up your project.

Each page assumes that you have already performed the instructions from the previous pages of the tutorial.

The rest of this document assumes that you are using the same Cloud Shell environment that you created when following the first page of this tutorial. If your original Cloud Shell session is no longer open, you can return to the environment by doing the following:

  1. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

  2. In the Cloud Shell session, run the following command:

    cd hello-custom-sample
    

Run a custom training pipeline

This section describes how to use the training package that you uploaded to Cloud Storage to run a Vertex AI custom training pipeline. A Python sketch of the equivalent pipeline follows the steps below.

  1. In the Google Cloud console, in the Vertex AI section, go to the Training pipelines page.

    Go to Training pipelines

  2. Click Create to open the Train new model pane.

  3. On the Choose training method step, do the following:

    1. In the Dataset drop-down list, select No managed dataset. This particular training application loads data from the TensorFlow Datasets library rather than a managed Vertex AI dataset.

    2. Ensure that Custom training (advanced) is selected.

    Click Continue.

  4. On the Model details step, in the Name field, enter hello_custom. Click Continue.

  5. On the Training container step, provide Vertex AI with information it needs to use the training package that you uploaded to Cloud Storage:

    1. Select Prebuilt container.

    2. In the Model framework drop-down list, select TensorFlow.

    3. In the Model framework version drop-down list, select 2.3.

    4. In the Package location field, enter cloud-samples-data/ai-platform/hello-custom/hello-custom-sample-v1.tar.gz.

    5. In the Python module field, enter trainer.task. trainer is the name of the Python package in your tarball, and task.py contains your training code. Therefore, trainer.task is the name of the module that you want Vertex AI to run.

    6. In the Model output directory field, click Browse. Do the following in the Select folder pane:

      1. Navigate to your Cloud Storage bucket.

      2. Click Create new folder.

      3. Name the new folder output. Then click Create.

      4. Click Select.

      Confirm that the field has the value gs://BUCKET_NAME/output, where BUCKET_NAME is the name of your Cloud Storage bucket.

      This value gets passed to Vertex AI in the baseOutputDirectory API field, which sets several environment variables that your training application can access when it runs.

      For example, when you set this field to gs://BUCKET_NAME/output, Vertex AI sets the AIP_MODEL_DIR environment variable to gs://BUCKET_NAME/output/model. At the end of training, Vertex AI uses any artifacts in the AIP_MODEL_DIR directory to create a model resource.

      Learn more about the environment variables set by this field.

    Click Continue.

  6. On the optional Hyperparameters step, make sure that the Enable hyperparameter tuning checkbox is cleared. This tutorial does not use hyperparameter tuning. Click Continue.

  7. On the Compute and pricing step, allocate resources for the custom training job:

    1. In the Region drop-down list, select us-central1 (Iowa).

    2. In the Machine type drop-down list, select n1-standard-4 from the Standard section.

    Do not add any accelerators or worker pools for this tutorial. Click Continue.

  8. On the Prediction container step, provide Vertex AI with information it needs to serve predictions:

    1. Select Prebuilt container.

    2. In the Prebuilt container settings section, do the following:

      1. In the Model framework drop-down list, select TensorFlow.

      2. In the Model framework version drop-down list, select 2.3.

      3. In the Accelerator type drop-down list, select None.

      4. Confirm that the Model directory field has the value gs://BUCKET_NAME/output, where BUCKET_NAME is the name of your Cloud Storage bucket. This matches the Model output directory value that you provided in a previous step.

    3. Leave the fields in the Predict schemata section blank.

  9. Click Start training to start the custom training pipeline.
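
If you prefer working in code, the same pipeline can be expressed with the Vertex AI SDK for Python. The following is a minimal sketch rather than part of this tutorial's console flow: it assumes a PROJECT_ID placeholder, your BUCKET_NAME bucket, and that the container URIs shown correspond to the TensorFlow 2.3 prebuilt training and prediction containers selected above.

    from google.cloud import aiplatform

    # Initialize the SDK; PROJECT_ID and BUCKET_NAME are placeholders for
    # your project ID and Cloud Storage bucket name.
    aiplatform.init(
        project="PROJECT_ID",
        location="us-central1",
        staging_bucket="gs://BUCKET_NAME",
    )

    # Mirrors the "Training container" and "Prediction container" steps.
    # Note the gs:// prefix on the package URI, unlike the console field.
    job = aiplatform.CustomPythonPackageTrainingJob(
        display_name="hello_custom",
        python_package_gcs_uri=(
            "gs://cloud-samples-data/ai-platform/hello-custom/"
            "hello-custom-sample-v1.tar.gz"
        ),
        python_module_name="trainer.task",
        container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-3:latest",
        model_serving_container_image_uri=(
            "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-3:latest"
        ),
    )

    # Mirrors the "Compute and pricing" step; base_output_dir populates the
    # baseOutputDirectory API field described above.
    model = job.run(
        model_display_name="hello_custom",
        machine_type="n1-standard-4",
        replica_count=1,
        base_output_dir="gs://BUCKET_NAME/output",
    )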

You can now view your new training pipeline, which is named hello_custom, on the Training page. (You might need to refresh the page.) The training pipeline does two main things:

  1. The training pipeline creates a custom job resource named hello_custom-custom-job. After a few moments, you can view this resource on the Custom jobs page of the Training section:

    Go to Custom jobs

    The custom job runs the training application using the computing resources that you specified in this section.

  2. After the custom job completes, the training pipeline finds the artifacts that your training application creates in the output/model/ directory of your Cloud Storage bucket. It uses these artifacts to create a model resource.
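
Because the model resource is built from those artifacts, it can help to see the pattern the training code follows. The sample's trainer.task module already does the equivalent of this; the sketch below is purely illustrative, and the one-layer model in it is a hypothetical placeholder rather than the tutorial's flower classifier.

    import os

    import tensorflow as tf

    # Vertex AI sets AIP_MODEL_DIR from the baseOutputDirectory API field;
    # with the value above, it is gs://BUCKET_NAME/output/model.
    model_dir = os.environ["AIP_MODEL_DIR"]

    # Hypothetical placeholder model; the sample trains a flower classifier.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")

    # Export in SavedModel format. Vertex AI imports the artifacts in this
    # directory when it creates the model resource.
    model.save(model_dir)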

Monitor training

To view training logs, do the following:

  1. In the Google Cloud console, in the Vertex AI section, go to the Custom jobs page.

    Go to Custom jobs

  2. To view details for the CustomJob that you just created, click hello_custom-custom-job in the list.

  3. On the job details page, click View logs.
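
If you prefer to check on the job from code, the following is a minimal sketch using the Vertex AI SDK for Python, assuming a PROJECT_ID placeholder and the default job name created above:

    from google.cloud import aiplatform

    aiplatform.init(project="PROJECT_ID", location="us-central1")

    # Look up the custom job by its display name and print its state
    # (for example, JOB_STATE_RUNNING or JOB_STATE_SUCCEEDED).
    jobs = aiplatform.CustomJob.list(
        filter='display_name="hello_custom-custom-job"'
    )
    for job in jobs:
        print(job.display_name, job.state)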

View your trained model

When the custom training pipeline completes, you can find the trained model in the Google Cloud console, in the Vertex AI section, on the Models page.

Go to Models

The model has the name hello_custom.
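
As a hedged alternative to browsing the console, you can look up the model resource with the Vertex AI SDK for Python (PROJECT_ID is a placeholder):

    from google.cloud import aiplatform

    aiplatform.init(project="PROJECT_ID", location="us-central1")

    # Find the model that the training pipeline uploaded.
    models = aiplatform.Model.list(filter='display_name="hello_custom"')
    for model in models:
        print(model.display_name, model.resource_name)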

What's next

Follow the next page of this tutorial to serve predictions from your trained ML model.