Before you can perform custom training with a prebuilt container, you must create a Python source distribution that contains your training application and upload it to a Cloud Storage bucket that your Google Cloud project can access.
Alternatives to creating a source distribution
This guide walks through manually creating a source distribution and uploading it to Cloud Storage. Before you follow the guide, consider the following alternative workflows, which might be more convenient for some cases:
- If you want to train using code on your local computer and reduce the amount of manual packaging work as much as possible, then we recommend that you use the Google Cloud CLI's autopackaging feature. This feature lets you build a Docker container image, push it to Artifact Registry, and create a CustomJob resource based on the container image, all with a single command. Learn more in the guide to creating a CustomJob. To use autopackaging, you must install Docker on your local computer. This option only lets you create a CustomJob, not a TrainingPipeline or HyperparameterTuningJob resource. (Learn about the differences between custom training resources.)
- To further customize your container image and to run your code in a container locally before running it on Vertex AI, you can use the gcloud CLI's local-run command to containerize your code and run it locally. Then you can manually push the image to Artifact Registry. To use the local-run command, you must install Docker on your local computer.
- If you can write your training code in a single Python script, then you can use the Vertex AI SDK for Python's CustomJob class to create a custom job, or its CustomTrainingJob class to create a custom TrainingPipeline. Your training code is automatically packaged as a source distribution and uploaded to Cloud Storage.
- For the most flexibility, you can manually create a custom container image and push it to Artifact Registry.
If none of the preceding options fit your use case, or if you prefer to manually package your training application as a source distribution, follow the rest of this guide.
Before you begin
Before preparing your training application to run in the cloud, complete the following steps:
Develop your training application using a machine learning (ML) framework available in one of Vertex AI's prebuilt containers for training. Make sure that your training application meets the training code requirements.
If you are writing the training application from scratch, we recommend that you organize your code according to the application structure described in a following section of this document.
Create a Cloud Storage bucket in the same Google Cloud project where you plan to use Vertex AI. You will store your training application in this bucket. (While it is possible to use a bucket in a different Google Cloud project, this requires additional configuration outside the scope of this guide.)
For the best performance, ensure that the Cloud Storage bucket is in the location where you plan to use Vertex AI.
Know all of the Python libraries that your training application depends on, whether they're custom dependencies or freely available through PyPI.
Application structure
When you perform custom training using a prebuilt container, you must specify your training code according to the following requirements:
- Provide the code as one or more Python source distributions. If you use the Vertex AI API to start custom training, specify these in the packageUris field.
- Create a module in one of these source distributions that acts as the entrypoint for training. If you use the Vertex AI API to start custom training, specify this in the pythonModule field.
As long as you meet these requirements, you can structure your training application in any way you like. However, we recommend that you build a single Python source distribution by organizing your code in the following structure (which is frequently used in Vertex AI samples):
- Use a main project directory that contains your setup.py file. See the following section for guidance about this file's contents.
- Within the main project directory, create a subdirectory named trainer/ that serves as the main package for your training code.
- Within trainer/, create a module named task.py that serves as the entrypoint for your training code.
- To support trainer/task.py, create any additional Python modules that you want in the trainer/ package, and create any additional subdirectories of additional code that you want in the main project directory.
- Create an __init__.py file in each subdirectory to make it a package.
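Following these recommendations, a minimal project might look like the following tree (setup.py, trainer/__init__.py, and trainer/task.py come from the steps above; util.py stands in for any optional helper module you add):

```
trainer_project/
├── setup.py
└── trainer/
    ├── __init__.py
    ├── task.py
    └── util.py
```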
The rest of this guide assumes that your code is organized according to this structure.
Create a source distribution
Building Python source distributions is an expansive topic that is largely beyond the scope of this documentation. For convenience, this section provides an overview of using Setuptools to build a source distribution to use with Vertex AI. There are other libraries you can use to do the same thing.
Create a setup.py file that tells Setuptools how to create the source distribution. A basic setup.py includes the following:

- Import statements for setuptools.find_packages and setuptools.setup.
- A call to setuptools.setup with (at a minimum) these parameters set:
  - name set to the name of your source distribution.
  - version set to the version number of this build of your source distribution.
  - install_requires set to a list of dependencies that are required by your application, with version requirements, like 'docutils>=0.3'.
  - packages set to find_packages(). This tells Setuptools to include all subdirectories of the parent directory that contain an __init__.py file as packages.
  - include_package_data set to True.
The following example shows a basic setup.py file for a training application:

from setuptools import find_packages
from setuptools import setup

setup(
    name='trainer',
    version='0.1',
    packages=find_packages(),
    include_package_data=True,
    description='My training application.'
)
Run the following command to create a source distribution, dist/trainer-0.1.tar.gz:

python setup.py sdist --formats=gztar
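As a sanity check, you can reproduce the packaging flow end to end in a scratch directory. The demo/ path and the empty module files below are placeholders for your real project. Note that recent Setuptools versions print a deprecation warning for invoking setup.py directly; the build still succeeds, but the PyPA build tool (python -m build --sdist) is the currently recommended front end:

```shell
# Create a throwaway project matching the recommended structure.
mkdir -p demo/trainer
printf "from setuptools import find_packages, setup\nsetup(name='trainer', version='0.1', packages=find_packages())\n" > demo/setup.py
touch demo/trainer/__init__.py demo/trainer/task.py

# Build the source distribution; the tarball lands in demo/dist/.
(cd demo && python3 setup.py sdist --formats=gztar)
ls demo/dist/
```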
Python application dependencies
Dependencies are packages that you import in your code. Your application might have many dependencies that it needs in order to work.
For each replica in your custom training job, your code runs in a container with many common Python dependencies already installed. Check the dependencies included in the prebuilt container that you plan to use for training and note any of your dependencies that are not already installed. You only need to complete the following steps for dependencies that are not already installed in the prebuilt container.
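If you have shell or notebook access to the container image, or to a local environment that mirrors it, one quick way to check whether a dependency is preinstalled is importlib.metadata from the Python standard library. The package names below are only examples:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package_name):
    """Return the installed version of a distribution package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# setuptools is present in virtually every Python environment;
# the second name stands in for a dependency your application needs.
print(installed_version("setuptools"))
print(installed_version("some-pypi-package"))
```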
There are two types of dependencies that you might need to add:

- Standard dependencies, which are common distribution packages available on PyPI
- Custom dependencies, such as packages that you developed yourself or that are internal to your organization

The following sections describe the procedure for each type.
Standard (PyPI) dependencies
You can specify your application's standard dependencies as part of its setup.py script. Vertex AI uses pip to install your training application on the replicas that it allocates for your job. The pip install command looks for configured dependencies and installs them.

The following example shows a setup.py similar to the one from a previous section. However, this setup.py tells Vertex AI to install some_PyPI_package when it installs the training application:
from setuptools import find_packages
from setuptools import setup
REQUIRED_PACKAGES = ['some_PyPI_package>=1.0']
setup(
name='trainer',
version='0.1',
install_requires=REQUIRED_PACKAGES,
packages=find_packages(),
include_package_data=True,
description='My training application.'
)
Custom dependencies
You can specify your application's custom dependencies by passing their paths as part of your job configuration. You need the URI of the source distribution of each dependency. The custom dependencies must be in a Cloud Storage location. Vertex AI uses pip install to install custom dependencies, so they can have standard dependencies of their own in their setup.py scripts.

Each URI you include is the path to a source distribution, formatted as a tarball (.tar.gz) or as a wheel (.whl). Vertex AI installs each dependency using pip install on each replica that it allocates for your training job.
If you use the Vertex AI API to start custom training, specify the Cloud Storage URIs to these dependencies along with your training application in the packageUris field.
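For example, a CustomJob request body that installs the training application plus one custom dependency might include a pythonPackageSpec like the following sketch. The bucket name, file names, and executor image URI are placeholders:

```json
{
  "displayName": "my-training-job",
  "jobSpec": {
    "workerPoolSpecs": [
      {
        "machineSpec": { "machineType": "n1-standard-4" },
        "replicaCount": 1,
        "pythonPackageSpec": {
          "executorImageUri": "PREBUILT_CONTAINER_IMAGE_URI",
          "packageUris": [
            "gs://my-bucket/trainer-0.1.tar.gz",
            "gs://my-bucket/my-custom-dep-0.2.tar.gz"
          ],
          "pythonModule": "trainer.task"
        }
      }
    ]
  }
}
```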
Python modules
Your application can contain multiple modules (Python files). You must identify the module that contains your application entry point. The training service runs that module by invoking Python, just as you would run it locally.
For example, if you follow the recommended structure from a preceding section, your main module is task.py. Because it's inside an import package (a directory with an __init__.py file) named trainer, the fully qualified name of this module is trainer.task. So if you use the Vertex AI API to start custom training, set the pythonModule field to trainer.task.
Refer to the Python guide to packages for more information about modules.
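A minimal trainer/task.py entrypoint might look like the following sketch. The flag names and the training body are illustrative, not part of any Vertex AI contract:

```python
# trainer/task.py: hypothetical entrypoint for the training application.
import argparse

def get_args(argv=None):
    """Parse command-line flags passed to the training job."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-dir", default="/tmp/model",
                        help="Where to write the trained model.")
    parser.add_argument("--epochs", type=int, default=3)
    return parser.parse_args(argv)

def train(args):
    # Replace with real framework code (TensorFlow, PyTorch, and so on).
    print(f"Training for {args.epochs} epochs; saving to {args.model_dir}")

if __name__ == "__main__":
    train(get_args())
```

Locally, you would run this as python -m trainer.task from the main project directory, which mirrors how the training service resolves the trainer.task module name.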
Upload your source distribution to Cloud Storage
You can use the gcloud CLI to upload your source distribution and any custom dependencies to a Cloud Storage bucket. For example:
gcloud storage cp dist/trainer-0.1.tar.gz CLOUD_STORAGE_DIRECTORY
Replace CLOUD_STORAGE_DIRECTORY with the URI (beginning with gs:// and ending with /) of a Cloud Storage directory in a bucket that your Google Cloud project can access.
To learn about other ways to upload your source distribution to Cloud Storage, read Uploading objects in the Cloud Storage documentation.
What's next
- Learn about additional training code requirements for custom training.
- Learn how to create a custom training job or a custom training pipeline that uses your training application.