Packaging a Training Application

Before you can run your trainer application with Cloud Machine Learning Engine, your code and any dependencies must be placed in a Google Cloud Storage location that your Google Cloud Platform project can access. This page shows you the steps to package your code and stage it in the cloud. You can find a detailed description of Cloud ML Engine's packaging requirements on the training concepts page.

Before you begin

Packaging your trainer code is one step in the model training process. You should have completed the following steps before you move your application to the cloud:

  1. Configure your development environment.

  2. Develop your trainer application with TensorFlow.

In addition, you'll get the best results if you:

  • Know all of the Python libraries that your trainer depends on, whether custom or freely available through PyPI.

  • Test your trainer locally; training with Cloud ML Engine incurs charges to your account for the resources used.
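
For a quick local test, the gcloud tool can run your trainer outside the cloud before you incur any charges. This is a minimal sketch that assumes the recommended project structure described later on this page; the trainer arguments are placeholders:

gcloud ml-engine local train \
    --module-name trainer.task \
    --package-path trainer/ \
    -- \
    --trainer_arg_1 value_1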

Packaging and uploading your code and dependencies

How you get your training application to its destination Google Cloud Storage location depends on these factors:

  • Will you use the gcloud tool (recommended) or code your own solution?

  • Do you need to create your package manually?

  • Do you have additional dependencies that aren't included in the Cloud ML Engine runtime that you are using?

Gather required information

You need to gather the following information to package your trainer:

Package path
If you are using the gcloud command-line tool to package your trainer, you must include the local path to your trainer source code. Refer to the recommended trainer project structure for more details.
Job directory
This is the root directory for your job's output. It must be a Cloud Storage path to a location that your project has write access to.
Dependency paths
If you have custom dependencies, you need the URI of the package for each one. If you use the gcloud tool to run your training job, you can specify local directories and the tool stages them in the cloud for you. If you run training jobs using the Cloud ML Engine API directly, you must stage your dependency packages in a Cloud Storage location yourself and then use their paths there.
Module name
The name of your trainer's main module. This is the Python file that you run to start your trainer. The name must use the namespace dot notation for your package. If you use the recommended trainer project structure, your module name is trainer.task.
Staging bucket
The Google Cloud Storage location where your trainer is staged so that the training service can copy it to the training instances that run your job. If you package and stage your trainer yourself, you just need to copy your trainer package here and remember the path for when you start the training job. If you use the gcloud tool to package your trainer, you specify this value and the tool copies your package for you. In the gcloud case, you can omit this value if you specify a job directory; the tool then uses the job's output directory for staging.

The simplest way to package your trainer and upload it along with its dependencies is to use the gcloud tool:

  • As part of your gcloud ml-engine jobs submit training command:

    1. Set the --package-path flag to the path to the root directory of your trainer application.

    2. Set the --module-name flag to the name of your application's main module using your package's namespace dot notation (for example, in the recommended case of your main module being .../my_application/trainer/task.py, the module name is trainer.task).

    3. Set the --staging-bucket flag to the Cloud Storage location that you want the tool to use to stage your training and dependency packages.

It can be helpful to define your configuration values as environment variables:

TRAINER_PACKAGE_PATH="/path/to/your/application/sources"
MAIN_TRAINER_MODULE="trainer.task"
PACKAGE_STAGING_PATH="gs://your/chosen/staging/path"

The example training job submission command below packages a trainer application using the environment variables just defined, along with the following values, which are explained more fully in the how-to page about starting training jobs:

Job name

The name for your job, which must be unique within your project. A common approach to ensuring meaningful, unique job names is to append the current date and time to the model name. For example, in BASH:

now=$(date +"%Y%m%d_%H%M%S")
JOB_NAME="census_$now"
Job directory

The Cloud Storage location that you want to use for your training outputs. Remember to use a bucket in the same region where you run the job.

JOB_DIR="gs://your/chosen/job/output/path"

Here is the example command:

gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $JOB_DIR  \
    --package-path $TRAINER_PACKAGE_PATH \
    --module-name $MAIN_TRAINER_MODULE \
    --region us-central1 \
    -- \
    --trainer_arg_1 value_1 \
    ...
    --trainer_arg_n value_n
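
After the command returns, you can confirm that the job was created and see its state (a quick check; starting and monitoring jobs are covered more fully in the how-to page about training jobs):

gcloud ml-engine jobs describe $JOB_NAME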

To use the gcloud tool to upload your existing package

If you build your package yourself, you can upload it with the gcloud tool:

  • As part of your gcloud ml-engine jobs submit training command:

    1. Set the --packages argument to the path to your packaged trainer application.

    2. Set the --module-name argument to the name of your application's main module using your package's namespace dot notation.

This example command shows you how to use a zipped tarball package (called trainer-0.0.1.tar.gz here) that is in the same directory where you run the command. The main function is in a module called task.py:

gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $JOB_DIR \
    --packages trainer-0.0.1.tar.gz \
    --module-name $MAIN_TRAINER_MODULE \
    --region us-central1 \
    -- \
    --trainer_arg_1 value_1 \
    ...
    --trainer_arg_n value_n

To use the gcloud tool with an existing package already in the cloud

If you build your package yourself and have already uploaded it to a Cloud Storage location, you can reference it with gcloud:

  • As part of your gcloud ml-engine jobs submit training command:

    1. Set the --packages argument to the Cloud Storage path (a gs:// URI) of your packaged trainer application.

    2. Set the --module-name argument to the name of your application's main module using your package's namespace dot notation.

This example command shows you how to use a zipped tarball package (called trainer-0.0.1.tar.gz here) that you have already staged in a Cloud Storage location. The main function is in a module called task.py:

gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $JOB_DIR \
    --packages gs://your/chosen/staging/path/trainer-0.0.1.tar.gz \
    --module-name $MAIN_TRAINER_MODULE \
    --region us-central1 \
    -- \
    --trainer_arg_1 value_1 \
    ...
    --trainer_arg_n value_n

Working with dependencies

Your trainer may have many dependencies (packages that you import in your code) that you need to make it work. Your training job runs on training instances (specially-configured virtual machines) that have many common Python packages already installed. Check the packages included in the runtime version that you use for training and note any of your dependencies that are not already installed.

Extra dependencies come in two types: common Python packages available on PyPI, and custom packages (such as packages that you developed yourself, or those internal to an organization). There is a different procedure for each type.

To include additional PyPI dependencies

If your trainer relies on common Python packages that aren't part of the training instance image, you can declare them as dependencies of your trainer package. Cloud ML Engine uses pip to install your package, and pip installs the dependencies declared in your setup.py file along with it.

Create a file called setup.py in the root directory of your trainer application (one directory up from your trainer directory if you follow the recommended pattern). Enter the following script, inserting your own values:

from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES = ['some_PyPI_package>=1.0']

setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='My trainer application package.'
)

If you are using the gcloud command-line tool to submit your training job, it automatically uses your setup.py file to build the package. If you submit the training job without using the tool, you must run this script yourself with the following command:

python setup.py sdist
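
Running sdist writes the package to a dist subdirectory. With the name and version from the example setup.py above, the output file is dist/trainer-0.1.tar.gz, which you can then pass to the --packages flag (a minimal sketch reusing the environment variables defined earlier):

gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $JOB_DIR \
    --packages dist/trainer-0.1.tar.gz \
    --module-name $MAIN_TRAINER_MODULE \
    --region us-central1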

See the section about manually packaging your trainer on this page for more information.

To include custom dependencies with your package

If your application uses libraries that aren't included in the training instance for the runtime version you are using, you can upload them along with your application package:

  • As part of your gcloud ml-engine jobs submit training command:

    1. Specify your training application either as a path to the directory where your source code is stored or as the path to a built package.

    2. Set the --packages argument to include each dependency in a comma-separated list.

This example training job submit command uses a path to the application's sources and includes packaged dependencies named dep1.tar.gz and dep2.whl (one each of the supported package types):

gcloud ml-engine jobs submit training $JOB_NAME \
    --staging-bucket $PACKAGE_STAGING_PATH \
    --package-path /Users/mlguy/models/faces/trainer \
    --module-name $MAIN_TRAINER_MODULE \
    --packages dep1.tar.gz,dep2.whl \
    --region us-central1 \
    -- \
    --trainer_arg_1 value_1 \
    ...
    --trainer_arg_n value_n

This example training job submit command uses a built training application package and includes packaged dependencies named dep1.tar.gz and dep2.whl (one each of the supported package types):

gcloud ml-engine jobs submit training $JOB_NAME \
    --staging-bucket $PACKAGE_STAGING_PATH \
    --module-name $MAIN_TRAINER_MODULE \
    --packages trainer-0.0.1.tar.gz,dep1.tar.gz,dep2.whl \
    --region us-central1 \
    -- \
    --trainer_arg_1 value_1 \
    ...
    --trainer_arg_n value_n

Building your trainer package manually

Packaging Python code is an expansive topic that is largely beyond the scope of this documentation. For convenience, this section provides an overview of using Setuptools, though there are other libraries you can use to do the same thing.

Packaging steps

  1. In each directory of your application package, include a file named __init__.py, which may be empty or may contain code that runs when that package (any module in that directory) is imported.

  2. In the root directory of your package, include a Setuptools file named setup.py that contains:

    1. Import statements for setuptools.find_packages and setuptools.setup.

    2. A call to setuptools.setup with (at a minimum) these parameters set:

      • name set to the name of your package namespace.

      • version set to the version number of this build of your package.

      • install_requires set to a list of packages that are required by your application, with version requirements, like 'docutils>=0.3'.

      • packages set to find_packages().

      • include_package_data set to True.

  3. Run python setup.py sdist to create your package.

You can structure your training application however you like. However, the following structure is commonly used in Cloud ML Engine samples, and having your project's organization be similar to the samples can make them easier to follow.

  • Use a main project directory, containing your setup.py file.

  • Use a subdirectory named trainer to store your main application module.

  • Name your main trainer application module task.py.

  • Make whatever other subdirectories you need in your main project directory to implement your application.

  • Create an __init__.py file in every subdirectory. These files are used by Setuptools to identify directories with code to package, and may be empty.

In the samples, the trainer directory usually contains two source files in addition to task.py: model.py and util.py. The breakdown of code is:

  • task.py contains the trainer logic that manages the job.

  • model.py contains the TensorFlow graph code—the logic of the model.

  • util.py, if present, contains code to run the trainer.

Figure: Recommended structure of a training application project
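
A plain-text sketch of this layout (the directory and file names follow the conventions above; my_application is a placeholder project name, and util.py is optional):

my_application/
    setup.py
    trainer/
        __init__.py
        task.py
        model.py
        util.py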

If you use the gcloud tool to package your application, you don't need to create a setup.py or any __init__.py files. When you run gcloud ml-engine jobs submit training, you can set the --package-path argument to the path of your main project directory, or you can run the tool from that directory and omit the argument altogether.
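
For example, assuming the layout sketched above, you could run the tool from the main project directory and let it package the application for you (a minimal sketch reusing the environment variables defined earlier):

cd /path/to/my_application
gcloud ml-engine jobs submit training $JOB_NAME \
    --job-dir $JOB_DIR \
    --module-name trainer.task \
    --region us-central1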

To manually upload packages

You can upload your packages manually if you have a reason to. The most common reason is calling the Cloud ML Engine API directly to start your training job. The easiest way to manually upload your package and any custom dependencies to your Google Cloud Storage bucket is to use the gsutil tool:

gsutil cp /local/path/to/package.tar.gz gs://bucket/path/

However, if you can use the command line for this operation, it is simpler to let gcloud ml-engine jobs submit training upload your packages as part of setting up a training job. If you can't use the command line, you can use the Google Cloud Storage client library to upload your packages programmatically.

Remember that you can still use the gcloud tool to run training with a training application package that is already in the cloud.

