Create a Python training application for a pre-built container

Before you can perform custom training with a pre-built container, you must create a Python source distribution that contains your training application and upload it to a Cloud Storage bucket that your Google Cloud project can access.

Alternatives to creating a source distribution

This guide walks through manually creating a source distribution and uploading it to Cloud Storage. Before you follow the guide, consider the following alternative workflows, which might be more convenient in some cases:

  • If you want to train using code on your local computer and reduce the amount of manual packaging work as much as possible, then we recommend that you use the gcloud command-line tool's autopackaging feature. This feature lets you build a Docker container image, push it to Container Registry, and create a CustomJob resource based on the container image, all with a single command. Learn more in the guide to creating a CustomJob.

    To use autopackaging, you must install Docker on your local computer. This option only lets you create a CustomJob, not a TrainingPipeline or HyperparameterTuningJob resource. (Learn about the differences between custom training resources.)

  • To further customize your container image and to run your code in a container locally before running it on Vertex AI, you can use the gcloud tool's local-run command to containerize your code and run it locally. Then you can manually push the image to Artifact Registry or Container Registry.

    To use the local-run command, you must install Docker on your local computer.

  • If you can write your training code in a single Python script, then you can use the Vertex AI SDK for Python's CustomTrainingJob class to automatically package your code as a source distribution and upload it to Cloud Storage, then create a custom TrainingPipeline.

    This option only lets you create a TrainingPipeline, not a CustomJob or HyperparameterTuningJob. (Learn about the differences between custom training resources.)

  • For the most flexibility, you can manually create a custom container image and push it to Artifact Registry or Container Registry.

If none of the preceding options fit your use case, or if you prefer to manually package your training application as a source distribution, follow the rest of this guide.

Before you begin

Before preparing your training application to run in the cloud, complete the following steps:

  1. Develop your training application using a machine learning (ML) framework available in one of Vertex AI's pre-built containers for training. Make sure that your training application meets the training code requirements.

    If you are writing the training application from scratch, we recommend that you organize your code according to the application structure described in a following section of this document.

  2. Create a Cloud Storage bucket in the same Google Cloud project where you plan to use Vertex AI. You will store your training application in this bucket. (While it is possible to use a bucket in a different Google Cloud project, doing so requires additional configuration that is outside the scope of this guide.)

    For the best performance, ensure that the Cloud Storage bucket is in the location where you plan to use Vertex AI.

  3. Know all of the Python libraries that your training application depends on, whether they're custom dependencies or freely available through PyPI.

Application structure

When you perform custom training using a pre-built container, you must specify your training code according to the following requirements:

  • Provide the code as one or more Python source distributions.

    If you use the Vertex AI API to start custom training, specify these in the packageUris field.

  • Create a module in one of these source distributions that acts as the entrypoint for training.

    If you use the Vertex AI API to start custom training, specify this in the moduleName field.

As long as you meet these requirements, you can structure your training application in any way you like. However, we recommend that you build a single Python source distribution by organizing your code in the following structure (which is frequently used in Vertex AI samples):

  • Use a main project directory, containing your setup.py file. See the following section for guidance about this file's contents.

  • Within the main project directory, create a subdirectory named trainer/ that serves as the main package for your training code.

  • Within trainer/, create a module named task.py that serves as the entrypoint for your training code.

  • To support trainer/task.py, create any additional Python modules that you want in the trainer/ package, and create any additional subdirectories in the main project directory for other code that you need.

  • Create an __init__.py file in each subdirectory to make it a package.

The rest of this guide assumes that your code is organized according to this structure.
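For illustration, the following is a minimal sketch of what a trainer/task.py entrypoint might look like. The argument names (--epochs, --model-dir) and the training step are placeholders, not part of any Vertex AI requirement; replace them with whatever your framework needs.

```python
# Hypothetical minimal trainer/task.py. The flags and the "training"
# step below are placeholders chosen for this sketch.
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description='Training entrypoint')
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--model-dir', default='/tmp/model',
                        help='Where to write the trained model')
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Real training code would go here; this sketch only echoes the config.
    return {'epochs': args.epochs, 'model_dir': args.model_dir}


if __name__ == '__main__':
    main()
```

Because the training service runs the module by name (for example, trainer.task), any command-line arguments that you configure for your job reach this script exactly as if you had run `python -m trainer.task` locally.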

Create a source distribution

Building Python source distributions is an expansive topic that is largely beyond the scope of this documentation. For convenience, this section provides an overview of using Setuptools to build a source distribution to use with Vertex AI. There are other libraries you can use to do the same thing.

  1. Create a setup.py file that tells Setuptools how to create the source distribution. A basic setup.py includes the following:

    • Import statements for setuptools.find_packages and setuptools.setup.

    • A call to setuptools.setup with (at a minimum) these parameters set:

      • name set to the name of your source distribution.

      • version set to the version number of this build of your source distribution.

      • install_requires set to a list of dependencies that are required by your application, with version requirements, like 'docutils>=0.3'.

      • packages set to find_packages(). This tells Setuptools to include all subdirectories of the parent directory that contain an __init__.py file as packages.

      • include_package_data set to True.

    The following example shows a basic setup.py file for a training application:

    from setuptools import find_packages
    from setuptools import setup
    
    setup(
        name='trainer',
        version='0.1',
        packages=find_packages(),
        include_package_data=True,
        description='My training application.'
    )
    
  2. Run the following command to create a source distribution, dist/trainer-0.1.tar.gz:

    python setup.py sdist --formats=gztar
    

Python application dependencies

Dependencies are the packages that you import in your code. Your application may require many dependencies in order to run.

For each replica in your custom training job, your code runs in a container with many common Python dependencies already installed. Check the dependencies included in the pre-built container that you plan to use for training and note any of your dependencies that are not already installed. You only need to complete the following steps for dependencies that are not already installed in the pre-built container.
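If you are unsure whether a dependency is already present, one option is to query the installed distributions at runtime with the standard library's importlib.metadata. This is a generic Python sketch, not specific to Vertex AI:

```python
# Check whether a distribution package is installed, and at what version.
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of dist_name, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Running a small check like this inside the container that you plan to use tells you which of your dependencies still need to be declared.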

There are two types of dependencies that you might need to add:

  • Standard dependencies: common distribution packages available on PyPI.
  • Custom dependencies: packages that you developed yourself or that are internal to your organization.

The sections below describe the procedure for each type.

Standard (PyPI) dependencies

You can specify your application's standard dependencies as part of its setup.py script. Vertex AI uses pip to install your training application on the replicas that it allocates for your job. The pip install command looks for configured dependencies and installs them.

The following example shows a setup.py similar to the one from a previous section. However, this setup.py tells Vertex AI to install some_PyPI_package when it installs the training application:

from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES = ['some_PyPI_package>=1.0']

setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='My training application.'
)

Custom dependencies

You can specify your application's custom dependencies by passing their paths as part of your job configuration. You need the URI of the source distribution of each dependency. The custom dependencies must be in a Cloud Storage location. Vertex AI uses pip install to install custom dependencies, so they can have standard dependencies of their own in their setup.py scripts.

Each URI you include is the path to a source distribution, formatted as a tarball (.tar.gz) or as a wheel (.whl). Vertex AI installs each dependency using pip install on each replica that it allocates for your training job.

If you use the Vertex AI API to start custom training, specify the Cloud Storage URIs to these dependencies along with your training application in the packageUris field.

Python modules

Your application can contain multiple modules (Python files). You must identify the module that contains your application entry point. The training service runs that module by invoking Python, just as you would run it locally.

For example, if you follow the recommended structure from a preceding section, your main module is task.py. Since it's inside an import package (directory with an __init__.py file) named trainer, the fully qualified name of this module is trainer.task. So if you use the Vertex AI API to start custom training, set the moduleName field to trainer.task.

Refer to the Python guide to packages for more information about modules.
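Putting the pieces together, a worker pool's pythonPackageSpec in a CustomJob request references both your packages and the entrypoint module. The following Python dict sketches what that fragment of the request body might look like; the gs:// URIs, the executor image URI, and the args are placeholders for your own values:

```python
# Hypothetical fragment of a CustomJob request body. The Cloud Storage
# URIs, executor image, and args are placeholders for your own values.
python_package_spec = {
    'executorImageUri':
        'us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest',
    'packageUris': [
        'gs://your-bucket/trainer-0.1.tar.gz',      # your application
        'gs://your-bucket/custom_dep-1.0.tar.gz',   # a custom dependency
    ],
    'pythonModule': 'trainer.task',
    'args': ['--epochs', '10'],
}
```

The packageUris list carries the training application and any custom dependencies, and pythonModule names the entrypoint that the service invokes.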

Upload your source distribution to Cloud Storage

You can use the gsutil tool to upload your source distribution and any custom dependencies to a Cloud Storage bucket. For example:

gsutil cp dist/trainer-0.1.tar.gz CLOUD_STORAGE_DIRECTORY

Replace CLOUD_STORAGE_DIRECTORY with the URI (beginning with gs:// and ending with /) of a Cloud Storage directory in a bucket that your Google Cloud project can access.

To learn about other ways to upload your source distribution to Cloud Storage, read Uploading objects in the Cloud Storage documentation.

What's next