Train a model using Vertex AI and the Python SDK

This tutorial is a start-to-finish guide that shows you how to use the Vertex AI SDK for Python to create a custom-trained model. You run code in a Jupyter notebook that uses a Docker container to train and create the model. The tutorial is for data scientists who are new to Vertex AI and familiar with notebooks, Python, and the machine learning (ML) workflow.

You start by using the Google Cloud console to create the project that contains your work. In your project, you use Vertex AI Workbench to create a Jupyter notebook. The notebook environment is where you run code that downloads and prepares a dataset, then uses the dataset to create and train a model. At the end of the tutorial, the trained model generates predictions.

The goal of this tutorial is to walk you through every step required to create predictions, and to do so in less than an hour. The dataset is relatively small so that your model doesn't take long to train. When you're done, you can apply what you learn to larger datasets. In general, a larger dataset tends to produce more accurate predictions.

Tutorial steps

  1. Prerequisites - Create your Google Cloud account and project.

  2. Create a Jupyter notebook - Create and prepare a Jupyter notebook and its environment. You use the notebook to run code that creates your dataset, creates and trains your model, and generates your predictions.

  3. Create a dataset - Download a publicly available BigQuery dataset, then use it to create a Vertex AI tabular dataset. The dataset contains the data you use to train your model.

  4. Create a training script - Create a Python script that you pass to your training job. The script runs when the training job trains and creates your model.

  5. Train a model - Use your tabular dataset to train and deploy a model. You use the model to create your predictions.

  6. Make predictions - Use your model to create predictions. This section also walks you through deleting resources you create while running this tutorial so you don't incur unnecessary charges.

What you accomplish

This tutorial walks you through how to use the Vertex AI SDK for Python to do the following:

  • Create a Cloud Storage bucket to store a dataset
  • Preprocess data for training
  • Use the processed data to create a dataset in BigQuery
  • Use the BigQuery dataset to create a Vertex AI tabular dataset
  • Create and train a custom-trained model
  • Deploy the custom-trained model to an endpoint
  • Generate a prediction
  • Undeploy the model
  • Delete all resources created in the tutorial so you don't incur further charges
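
The steps above can be sketched in a single function. This is a minimal outline, not the tutorial's own code: `TutorialConfig`, `bq_uri`, `run_workflow`, and the display names are hypothetical, and every configuration value (project, bucket, table, script path, container image URIs) is a placeholder you supply. The sketch also assumes that the Cloud Storage bucket already exists and that the prepared data is already in a BigQuery table.

```python
from dataclasses import dataclass


@dataclass
class TutorialConfig:
    """Hypothetical settings holder; replace every value with your own."""
    project_id: str
    location: str
    staging_bucket: str   # "gs://..." URI of a bucket assumed to exist already
    bq_table: str         # "project.dataset.table" that holds the prepared data
    script_path: str      # local path to your training script
    training_image: str   # URI of the training container image
    serving_image: str    # URI of the prediction container image


def bq_uri(table: str) -> str:
    """Return the table reference in bq:// URI form."""
    return table if table.startswith("bq://") else f"bq://{table}"


def run_workflow(cfg: TutorialConfig, instances: list) -> list:
    """Create a dataset, train, deploy, predict once, then clean up."""
    # Imported lazily so this module loads even where the SDK is not installed.
    from google.cloud import aiplatform

    aiplatform.init(
        project=cfg.project_id,
        location=cfg.location,
        staging_bucket=cfg.staging_bucket,
    )

    # Create a Vertex AI tabular dataset from the BigQuery table.
    dataset = aiplatform.TabularDataset.create(
        display_name="tutorial-dataset",
        bq_source=bq_uri(cfg.bq_table),
    )

    # Define the custom training job that runs your script in a container.
    job = aiplatform.CustomTrainingJob(
        display_name="tutorial-training-job",
        script_path=cfg.script_path,
        container_uri=cfg.training_image,
        model_serving_container_image_uri=cfg.serving_image,
    )

    # run() blocks until training finishes and returns the trained model.
    model = job.run(
        dataset=dataset,
        model_display_name="tutorial-model",
        bigquery_destination=f"bq://{cfg.project_id}",
    )

    # Deploy the model to an endpoint; the machine type is an assumption.
    endpoint = model.deploy(machine_type="n1-standard-4")
    try:
        predictions = endpoint.predict(instances=instances).predictions
    finally:
        # Remove the Vertex AI resources once a prediction has been attempted.
        endpoint.undeploy_all()
        endpoint.delete()
        model.delete()
        job.delete()
        dataset.delete()
    return predictions
```

The `finally` block only covers the Vertex AI resources that this function creates. The Cloud Storage bucket and any BigQuery data are left in place and need to be deleted separately.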

Billable resources used

This tutorial uses billable resources associated with the Vertex AI, BigQuery, and Cloud Storage Google Cloud services. If you're new to Google Cloud, you might be able to use one or more of these services at no cost. Google Cloud offers $300 in free credits to new customers, and Cloud Storage and BigQuery have free tiers. For more information, see the pricing documentation for each service.

To prevent further charges, the final step of this tutorial walks you through removing all billable Google Cloud resources you created.