Packaging a Training Application

Before you can run your training application with Cloud Machine Learning Engine, you must upload your code and any dependencies into a Cloud Storage bucket that your Google Cloud Platform project can access. This page shows you how to package and stage your application in the cloud.

You'll get the best results if you test your training application locally before uploading it to the cloud. Training with Cloud ML Engine incurs charges to your account for the resources used.

Before you begin

Before you can move your training application to the cloud, you must complete the following steps:

  1. Configure your development environment, as described in the getting-started guide.
  2. Follow the guide to setting up a Cloud Storage bucket where you can store your training application's data and files.
  3. Know all of the Python libraries that your training application depends on, whether they're custom packages or freely available through PyPI.

This document discusses the following factors that influence how you package your application and upload it to Cloud Storage:

  • Using the gcloud tool (recommended) or coding your own solution.
  • Building your package manually if necessary.
  • Including additional dependencies that aren't installed by the Cloud ML Engine runtime that you're using.

    Packaging and uploading with the gcloud tool

    The simplest way to package your application and upload it along with its dependencies is to use the gcloud tool. You use the same command (gcloud ml-engine jobs submit training) to package and upload the application and to submit your first training job.

    For convenience, define your configuration values as environment variables. The following variables contain values used for staging your application package:

    TRAINER_PACKAGE_PATH="/path/to/your/application/sources"
    MAIN_TRAINER_MODULE="trainer.task"
    PACKAGE_STAGING_PATH="gs://your/chosen/staging/path"
    

    In addition, the following variables define values used when running the job:

    now=$(date +"%Y%m%d_%H%M%S")
    JOB_NAME="your_name_$now"
    JOB_DIR="gs://your/chosen/job/output/path"
    REGION="us-east1"
    

    The following example shows a gcloud ml-engine jobs submit training command that packages an application and submits the training job:

    gcloud ml-engine jobs submit training $JOB_NAME \
        --staging-bucket $PACKAGE_STAGING_PATH \
        --job-dir $JOB_DIR  \
        --package-path $TRAINER_PACKAGE_PATH \
        --module-name $MAIN_TRAINER_MODULE \
        --region $REGION \
        -- \
        --user_first_arg=first_arg_value \
        --user_second_arg=second_arg_value
    
    • --staging-bucket specifies the Cloud Storage location where you want to stage your training and dependency packages. Your GCP project must have access to this Cloud Storage bucket, and the bucket should be in the same region where you run the job. See the available regions for Cloud ML Engine services. If you don't specify a staging bucket, Cloud ML Engine stages your packages in the location specified with the --job-dir flag.

    • --job-dir specifies the Cloud Storage location that you want to use for your training job's output files. Your GCP project must have access to this Cloud Storage bucket, and the bucket should be in the same region where you run the job. See the available regions for Cloud ML Engine services.

    • --package-path specifies the local path to the root directory of your application. Refer to the recommended project structure.

    • --module-name specifies the name of your application's main module, using your package's namespace dot notation. This is the Python file that you run to start your application. For example, if your main module is .../my_application/trainer/task.py (see the recommended project structure), then the module name is trainer.task.

    • If you specify an option both in your configuration file (config.yaml, passed with the --config flag) and as a command-line flag, the value on the command line overrides the value in the configuration file. A sketch of such a file appears after this list.
    • The empty -- flag marks the end of the gcloud-specific flags and the start of the USER_ARGS that you want to pass to your application.
    • Flags specific to Cloud ML Engine, such as --module-name, --runtime-version, and --job-dir, must come before the empty -- flag. The Cloud ML Engine service interprets these flags.
    • The --job-dir flag, if specified, must come before the empty -- flag, because Cloud ML Engine uses --job-dir to validate the path.
    • Your application must handle the --job-dir flag too, if specified. Even though the flag comes before the empty --, --job-dir is also passed to your application as a command-line flag.
    • You can define as many USER_ARGS as you need. Cloud ML Engine passes --user_first_arg, --user_second_arg, and so on, through to your application.
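
    If you use a configuration file, you pass it to gcloud ml-engine jobs submit training with the --config flag. The file is a YAML representation of the Job resource's trainingInput object; the sketch below is a minimal example, and the field values are placeholders rather than recommendations:

    trainingInput:
      scaleTier: STANDARD_1
      runtimeVersion: "1.8"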

    You can find out more about the job-submission flags in the guide to running a training job.

    Working with dependencies

    Dependencies are packages that you import in your code. Your application may have many dependencies that it needs in order to work.

    When you run a training job on Cloud ML Engine, the job runs on training instances (specially-configured virtual machines) that have many common Python packages already installed. Check the packages included in the runtime version that you use for training, and note any of your dependencies that are not already installed.

    There are two types of dependencies that you may need to add:

    • Standard dependencies, which are common Python packages available on PyPI.
    • Custom packages, such as packages that you developed yourself or those internal to an organization.

    The sections below describe the procedure for each type.

    Adding standard (PyPI) dependencies

    You can specify your package's standard dependencies as part of its setup.py script. Cloud ML Engine uses pip to install your package on the training instances that it allocates for your job. The pip install command looks for configured dependencies and installs them.

    Create a file called setup.py in the root directory of your application (one directory up from your trainer directory if you follow the recommended pattern).

    Enter the following script in setup.py, inserting your own values:

    from setuptools import find_packages
    from setuptools import setup
    
    REQUIRED_PACKAGES = ['some_PyPI_package>=1.0']
    
    setup(
        name='trainer',
        version='0.1',
        install_requires=REQUIRED_PACKAGES,
        packages=find_packages(),
        include_package_data=True,
        description='My training application package.'
    )
    

    If you are using the gcloud command-line tool to submit your training job, it automatically uses your setup.py file to make the package.

    If you submit the training job without using gcloud, use the following command to run the script:

    python setup.py sdist
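
    By default, sdist writes the package archive to a dist subdirectory, naming the file from the name and version values in setup.py. With the example setup.py above, the output is:

    dist/trainer-0.1.tar.gz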
    

    For more information, see the section about manually packaging your training application.

    Adding custom dependencies

    You can specify your application's custom dependencies by passing their paths as part of your job configuration. You need the URI of each dependency's package, and each package must be in a Cloud Storage location. Cloud ML Engine uses pip install to install custom dependencies, so they can have standard dependencies of their own in their setup.py scripts.

    If you use the gcloud tool to run your training job, you can specify local directories and the tool will stage them in the cloud for you. Run the gcloud ml-engine jobs submit training command:

    • Set the --package-path flag to specify your training application, either as a path to the directory where your source code is stored or as the path to a built package.

    • Set the --packages flag to include the dependencies in a comma-separated list.

    Each URI you include is the path to a package, formatted as a zipped tarball (*.tar.gz) or as a wheel (*.whl). Cloud ML Engine installs each package using pip install on every virtual machine it allocates for your training job.

    The example below specifies packaged dependencies named dep1.tar.gz and dep2.whl (one each of the supported package types) along with a path to the application's sources:

    gcloud ml-engine jobs submit training $JOB_NAME \
        --staging-bucket $PACKAGE_STAGING_PATH \
        --package-path /Users/mlguy/models/faces/trainer \
        --module-name $MAIN_TRAINER_MODULE \
        --packages dep1.tar.gz,dep2.whl \
        --region us-central1 \
        -- \
        --user_first_arg=first_arg_value \
        --user_second_arg=second_arg_value
    

    Similarly, the example below specifies packaged dependencies named dep1.tar.gz and dep2.whl (one each of the supported package types), but with a built training application:

    gcloud ml-engine jobs submit training $JOB_NAME \
        --staging-bucket $PACKAGE_STAGING_PATH \
        --module-name $MAIN_TRAINER_MODULE \
        --packages trainer-0.0.1.tar.gz,dep1.tar.gz,dep2.whl \
        --region us-central1 \
        -- \
        --user_first_arg=first_arg_value \
        --user_second_arg=second_arg_value
    

    If you run training jobs using the Cloud ML Engine API directly, you must stage your dependency packages in a Cloud Storage location yourself and then use the paths to the packages in that location.
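
    If you call the API from Python, you can use the Google API client library. The following is a minimal sketch: the project ID, bucket paths, job name, and region are placeholders, and packageUris lists your staged trainer package along with any custom dependency packages:

    from googleapiclient import discovery

    # Build a client for the Cloud ML Engine (ml) v1 REST API.
    ml = discovery.build('ml', 'v1')

    job_spec = {
        'jobId': 'your_job_name',
        'trainingInput': {
            # Staged trainer package plus any custom dependency packages.
            'packageUris': [
                'gs://your-bucket/staging/trainer-0.0.1.tar.gz',
                'gs://your-bucket/staging/dep1.tar.gz',
            ],
            'pythonModule': 'trainer.task',
            'region': 'us-east1',
        },
    }

    # Submit the job under your project.
    response = ml.projects().jobs().create(
        parent='projects/your-project-id', body=job_spec).execute()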

    Building your package manually

    Packaging Python code is an expansive topic that is largely beyond the scope of this documentation. For convenience, this section provides an overview of using Setuptools to build your package. There are other libraries you can use to do the same thing.

    Follow these steps to build your package manually:

    1. In each directory of your application package, include a file named __init__.py, which may be empty or may contain code that runs when that package (any module in that directory) is imported.

    2. In the root directory of your package (one directory up from your trainer directory if you follow the recommended pattern), include the Setuptools file named setup.py that includes:

      • Import statements for setuptools.find_packages and setuptools.setup.

      • A call to setuptools.setup with (at a minimum) these parameters set:

        • name set to the name of your package namespace.

        • version set to the version number of this build of your package.

        • install_requires set to a list of packages that are required by your application, with version requirements, like 'docutils>=0.3'.

        • packages set to find_packages().

        • include_package_data set to True.

    3. Run python setup.py sdist to create your package.

    Recommended project structure

    You can structure your training application in any way you like. However, the following structure is commonly used in Cloud ML Engine samples, and organizing your project similarly can make the samples easier to follow.

    • Use a main project directory, containing your setup.py file.

    • Use a subdirectory named trainer to store your main application module.

    • Name your main application module task.py.

    • Create whatever other subdirectories you need in your main project directory to implement your application.

    • Create an __init__.py file in every subdirectory. These files are used by Setuptools to identify directories with code to package, and may be empty.

    In the Cloud ML Engine samples, the trainer directory usually contains the following source files:

    • task.py contains the application logic that manages the training job.

    • model.py contains the TensorFlow graph code—the logic of the model.

    • util.py, if present, contains code to run the training application.

    (Figure: recommended structure of a training application project)
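
    Following these recommendations, a project looks like the sketch below, where my_application stands in for your main project directory, and util.py and extra subdirectories are optional:

    my_application/
        setup.py
        trainer/
            __init__.py
            task.py
            model.py
            util.py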

    If you use the gcloud tool to package your application, you don't need to create a setup.py or any __init__.py files. When you run gcloud ml-engine jobs submit training, you can set the --package-path flag to the path of your main project directory, or you can run the tool from that directory and omit the flag altogether.

    Python modules

    Your application package can contain multiple modules (Python files). You must identify the module that contains your application entry point. The training service runs that module by invoking Python, just as you would run it locally.

    When you make your application into a Python package, you create a namespace. For example, if you create a package named trainer, and your main module is called task.py, you specify that package with the name trainer.task. So, when running gcloud ml-engine jobs submit training, set the --module-name flag to trainer.task.
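
    For example, you can verify the module name by running your application locally from your main project directory, invoking the module the same way the training service does (the user flags are placeholders):

    python -m trainer.task --user_first_arg=first_arg_value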

    Refer to the Python guide to packages for more information about modules.

    Using the gcloud tool to upload an existing package

    If you build your package yourself, you can upload it with the gcloud tool. Run the gcloud ml-engine jobs submit training command:

    • Set the --packages flag to the path to your packaged application.

    • Set the --module-name flag to the name of your application's main module, using your package's namespace dot notation. This is the Python file that you run to start your application. For example, if your main module is .../my_application/trainer/task.py (see the recommended project structure), then the module name is trainer.task.

    The example below shows you how to use a zipped tarball package (called trainer-0.0.1.tar.gz here) that is in the same directory where you run the command. The main function is in a module called task.py:

    gcloud ml-engine jobs submit training $JOB_NAME \
        --staging-bucket $PACKAGE_STAGING_PATH \
        --job-dir $JOB_DIR \
        --packages trainer-0.0.1.tar.gz \
        --module-name $MAIN_TRAINER_MODULE \
        --region us-central1 \
        -- \
        --user_first_arg=first_arg_value \
        --user_second_arg=second_arg_value
    

    Using the gcloud tool to use an existing package already in the cloud

    If you build your package yourself and upload it to a Cloud Storage location in advance, you can point gcloud to it instead of uploading it again. Run the gcloud ml-engine jobs submit training command:

    • Set the --packages flag to the path to your packaged application.

    • Set the --module-name flag to the name of your application's main module, using your package's namespace dot notation. This is the Python file that you run to start your application. For example, if your main module is .../my_application/trainer/task.py (see the recommended project structure), then the module name is trainer.task.

    The example below shows you how to use a zipped tarball package that is in a Cloud Storage bucket:

    gcloud ml-engine jobs submit training $JOB_NAME \
        --job-dir $JOB_DIR \
        --packages $PATH_TO_PACKAGED_TRAINER \
        --module-name $MAIN_TRAINER_MODULE \
        --region us-central1 \
        -- \
        --user_first_arg=first_arg_value \
        --user_second_arg=second_arg_value
    

    Where $PATH_TO_PACKAGED_TRAINER is an environment variable that represents the path to an existing package already in the cloud. For example, the path could point to the following Cloud Storage location, containing a zipped tarball package called trainer-0.0.1.tar.gz:

    PATH_TO_PACKAGED_TRAINER=gs://$CLOUD_STORAGE_BUCKET_NAME/trainer-0.0.1.tar.gz
    

    Uploading packages manually

    You can upload your packages manually if you have a reason to. The most common reason is that you want to call the Cloud ML Engine API directly to start your training job. The easiest way to manually upload your package and any custom dependencies to your Cloud Storage bucket is to use the gsutil tool:

    gsutil cp /local/path/to/package.tar.gz  gs://bucket/path/
    

    However, if you can use the command line for this operation, you should just use gcloud ml-engine jobs submit training to upload your packages as part of setting up a training job. If you can't use the command line, you can use the Cloud Storage client library to upload programmatically.
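
    A minimal sketch using the google-cloud-storage Python library follows; the bucket name and paths are placeholders:

    from google.cloud import storage

    # Upload a local package archive to a staging path in your bucket.
    client = storage.Client()
    bucket = client.bucket('your-bucket-name')
    blob = bucket.blob('staging/package.tar.gz')
    blob.upload_from_filename('/local/path/to/package.tar.gz')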
