AI Platform Training uses images to configure the VMs that service your training and prediction requests in the cloud. These images contain the base operating system, core technology packages, pip packages (Python libraries), and operating system packages. Images are upgraded periodically to include new improvements and features. AI Platform Training versioning enables you to select the right configuration to work with your model.
Important notes about versioning
- You should always test your training jobs and models thoroughly when switching to a new runtime version, regardless of whether it's a major or minor update.
AI Platform Training supports each runtime version for 12 months after its release. After the 12-month period, you can no longer create training jobs, batch prediction jobs, or model versions that use the runtime version.
Twenty-four months after the release of the runtime version, AI Platform Prediction deletes all model versions that use the runtime version.
Learn more about the timeline of availability for runtime versions.
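The 12-month and 24-month milestones described above can be estimated from a release date. The following is a minimal sketch, not an official calculation: the helper names `add_months` and `support_milestones` are hypothetical, the release date is made up for illustration, and it assumes that "months" means calendar months. The published timeline remains the authoritative source for actual dates.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month
    (for example, January 31 plus one month gives the end of February).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def support_milestones(release: date) -> dict:
    """Estimate the two milestones from the notes above (assumed calendar months)."""
    return {
        # After this date, new jobs and model versions can no longer use the runtime version.
        "creation_ends": add_months(release, 12),
        # After this date, model versions that use the runtime version are deleted.
        "model_versions_deleted": add_months(release, 24),
    }


# Hypothetical release date, used only for illustration.
milestones = support_milestones(date(2020, 1, 31))
print(milestones["creation_ends"], milestones["model_versions_deleted"])
```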
Understanding version numbers
The images that AI Platform Training uses correspond to the AI Platform Training runtime version. The runtime version uses the following format:
major_version.minor_version
Major and minor versions
New major and minor versions are created periodically to incorporate one or more of the following:
- Releases for:
  - Operating system
  - Supported machine learning frameworks
- Changes or updates to AI Platform Training functionality.
A new major version may include breaking changes that require updates to code written against previous versions. A new minor version should not include breaking changes, and should be backward-compatible with all variations of the same major version.
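Because a runtime version is two integers joined by a dot rather than a decimal number, comparing parsed tuples is safer than comparing floats or raw strings. The following is a minimal sketch; the helper names `parse_runtime_version` and `is_minor_upgrade` are hypothetical and are not part of any AI Platform Training API.

```python
from typing import Tuple


def parse_runtime_version(version: str) -> Tuple[int, int]:
    """Split a 'major_version.minor_version' string into a tuple of integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def is_minor_upgrade(current: str, target: str) -> bool:
    """Return True when `target` has the same major version and a higher minor version."""
    current_major, current_minor = parse_runtime_version(current)
    target_major, target_minor = parse_runtime_version(target)
    return target_major == current_major and target_minor > current_minor


# Tuple comparison treats 2.11 as later than 2.9; float comparison would not.
print(parse_runtime_version("2.11") > parse_runtime_version("2.9"))
print(is_minor_upgrade("1.14", "1.15"))
print(is_minor_upgrade("1.15", "2.1"))
```

A check like `is_minor_upgrade` only tells you which kind of update you are making. As noted earlier, testing is still needed for both major and minor updates.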
Selecting runtime versions
Make sure to select the runtime version that supports the latest versions of your machine learning framework and other packages you are using.
The earliest AI Platform Training runtime version that provides support for scikit-learn and XGBoost is version 1.13.
You can see the details of each version in the AI Platform Training version list.
Setting the runtime version
Make sure to set the runtime version when you submit a training job request:
gcloud
Use the --runtime-version flag when you run the gcloud ai-platform jobs submit training command.

    gcloud ai-platform jobs submit training my_job \
      --module-name trainer.task \
      --job-dir gs://my/training/job/directory \
      --package-path /path/to/my/project/trainer \
      --region us-central1 \
      --runtime-version 2.11 \
      --python-version 3.7
Python
Set the runtimeVersion when you define your training job request:

    training_inputs = {
        'scaleTier': 'BASIC',
        'packageUris': ['gs://my/trainer/path/package-0.0.0.tar.gz'],
        'pythonModule': 'trainer.task',
        'args': ['--arg1', 'value1', '--arg2', 'value2'],
        'region': 'us-central1',
        'jobDir': 'gs://my/training/job/directory',
        'runtimeVersion': '2.11',
        'pythonVersion': '3.7',
    }

    # my_job_name is a string that you define elsewhere.
    job_spec = {'jobId': my_job_name, 'trainingInput': training_inputs}
See more details about submitting a training job in the TrainingInput API.
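One detail worth making explicit is that the request carries both versions as strings. A float such as 2.10 would silently become 2.1. The following sketch wraps the request body from this section in a function that enforces this; the function name `build_job_spec` and its parameters are hypothetical, and the sketch only builds the dictionary without submitting anything.

```python
def build_job_spec(job_id, package_uris, python_module, job_dir,
                   runtime_version, python_version,
                   region="us-central1", scale_tier="BASIC", args=None):
    """Build a job request body with explicit runtime and Python versions."""
    for name, value in (("runtime_version", runtime_version),
                        ("python_version", python_version)):
        # Reject floats and other types so that '2.10' cannot turn into 2.1.
        if not isinstance(value, str):
            raise TypeError(f"{name} must be a string such as '2.11', got {value!r}")

    training_inputs = {
        "scaleTier": scale_tier,
        "packageUris": list(package_uris),
        "pythonModule": python_module,
        "args": list(args or []),
        "region": region,
        "jobDir": job_dir,
        "runtimeVersion": runtime_version,
        "pythonVersion": python_version,
    }
    return {"jobId": job_id, "trainingInput": training_inputs}


# Example values taken from this section; the job ID is a placeholder.
job_spec = build_job_spec(
    job_id="my_job",
    package_uris=["gs://my/trainer/path/package-0.0.0.tar.gz"],
    python_module="trainer.task",
    job_dir="gs://my/training/job/directory",
    runtime_version="2.11",
    python_version="3.7",
    args=["--arg1", "value1", "--arg2", "value2"],
)
print(job_spec["trainingInput"]["runtimeVersion"])
```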
Setting the Python version
Python 3.7 is available in runtime version 1.15 and later.
Older Python versions are available for certain runtime versions:
Python 3.5 is available when you use AI Platform Training runtime version 1.13 through 1.14.
Python 2.7 is available in runtime versions 1.15 and earlier.
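The availability rules above can be written as a small lookup. This is a sketch that encodes only the three statements in this section; the function name `supported_python_versions` is hypothetical, and the version list remains the authoritative source for any specific runtime version.

```python
def supported_python_versions(runtime_version: str) -> list:
    """Return the Python versions this section lists for a runtime version."""
    major, minor = (int(part) for part in runtime_version.split("."))
    version = (major, minor)

    python_versions = []
    if version <= (1, 15):
        # Python 2.7: runtime versions 1.15 and earlier.
        python_versions.append("2.7")
    if (1, 13) <= version <= (1, 14):
        # Python 3.5: runtime versions 1.13 through 1.14.
        python_versions.append("3.5")
    if version >= (1, 15):
        # Python 3.7: runtime version 1.15 and later.
        python_versions.append("3.7")
    return python_versions


for runtime in ("1.13", "1.15", "2.11"):
    print(runtime, supported_python_versions(runtime))
```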
The following example shows how to specify Python 3.7 for training. You can specify Python 3.5 or Python 2.7 in a similar way.
gcloud
To use Python 3.7 for training, specify --python-version 3.7 and use runtime version 1.15 or later:

    gcloud ai-platform jobs submit training my_job \
      --module-name trainer.task \
      --job-dir gs://my/training/job/directory \
      --package-path /path/to/my/project/trainer \
      --python-version 3.7 \
      --region us-central1 \
      --runtime-version 2.11
Python
To use Python 3.7 for training, set the runtimeVersion to version '1.15' or later and set pythonVersion to '3.7':

    training_inputs = {
        'scaleTier': 'BASIC',
        'packageUris': ['gs://my/trainer/path/package-0.0.0.tar.gz'],
        'pythonModule': 'trainer.task',
        'args': ['--arg1', 'value1', '--arg2', 'value2'],
        'region': 'us-central1',
        'jobDir': 'gs://my/training/job/directory',
        'runtimeVersion': '2.11',
        'pythonVersion': '3.7',
    }

    # my_job_name is a string that you define elsewhere.
    job_spec = {'jobId': my_job_name, 'trainingInput': training_inputs}
See more details about submitting a training job in the TrainingInput API.
Using custom packages
There are three ways for you to change the packages on your training instances:
- building a custom container that pre-installs your dependencies on an image
- specifying PyPI packages as dependencies to your trainer package
- manually uploading package files (tarballs) and including their paths as training input
Building a custom container
Instead of using a runtime version, you can build a Docker container to include your dependencies. Learn more about how to use custom containers.
    # Specifies base image and tag
    FROM image:tag
    WORKDIR /root

    # Installs additional packages
    RUN pip install pkg1 pkg2 pkg3

    # Downloads training data
    RUN curl https://example-url/path-to-data/data-filename --output /root/data-filename

    # Copies the trainer code to the docker image.
    COPY your-path-to/model.py /root/model.py
    COPY your-path-to/task.py /root/task.py

    # Sets up the entry point to invoke the trainer.
    ENTRYPOINT ["python", "task.py"]
Including PyPI package dependencies
You can specify PyPI packages and their versions as dependencies of your trainer package using the standard setuptools process:
- In the top-level directory of your trainer application, include a setup.py file.
- When you call setuptools.setup in setup.py, pass a list of dependencies and optionally their versions as the install_requires parameter. This example setup.py file demonstrates the procedure:

    from setuptools import find_packages
    from setuptools import setup

    REQUIRED_PACKAGES = ['some_PyPI_package>=1.5', 'another_package==2.6']

    setup(
        name='trainer',
        version='0.1',
        install_requires=REQUIRED_PACKAGES,
        packages=find_packages(),
        include_package_data=True,
        description='Generic example trainer package with dependencies.')
AI Platform Training forces reinstallation of packages, so you can override packages that are part of the runtime version's image with newer or older versions.
Uploading your own package files
You can include extra package files as part of your training job request. You upload the packages to Cloud Storage and specify a list of packages to be installed on each training instance. AI Platform Training installs all packages with pip. Packages designed for other package managers are not supported.
gcloud
Use the --packages flag when you run the gcloud ai-platform jobs submit training command. Set the value to a comma-separated list of the paths to all additional packages. Note that the list must not contain whitespace between entries.

    gcloud ai-platform jobs submit training my_job \
      --staging-bucket gs://my-bucket \
      --package-path /path/to/my/project/trainer \
      --module-name trainer.task \
      --runtime-version 2.11 \
      --python-version 3.7 \
      --packages dep1.tar.gz,dep2.whl
Python
Add all additional packages to the list you use for the value of packageUris in the TrainingInput object.

    training_inputs = {
        'scaleTier': 'BASIC',
        'packageUris': ['gs://my/trainer/path/package-0.0.0.tar.gz',
                        'gs://my/dependencies/path/dep1.tar.gz',
                        'gs://my/dependencies/path/dep2.whl'],
        'pythonModule': 'trainer.task',
        'args': ['--arg1', 'value1', '--arg2', 'value2'],
        'region': 'us-central1',
        'jobDir': 'gs://my/training/job/directory',
        'runtimeVersion': '2.11',
        'pythonVersion': '3.7',
    }

    # my_job_name is a string that you define elsewhere.
    job_spec = {'jobId': my_job_name, 'trainingInput': training_inputs}
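If you generate the gcloud command from a script, the no-whitespace rule for the --packages value is easy to break by accident. The following is a minimal sketch of a helper that joins package paths into that value and rejects entries that would corrupt the list. The function name `packages_flag_value` is hypothetical, and rejecting commas inside a path is an added assumption, made because a comma would split one entry into two.

```python
def packages_flag_value(paths):
    """Join package paths into a comma-separated value with no whitespace."""
    cleaned = []
    for path in paths:
        if not path:
            raise ValueError("package path must not be empty")
        if any(ch.isspace() for ch in path):
            raise ValueError(f"package path contains whitespace: {path!r}")
        if "," in path:
            # Assumption: a comma inside a path would be read as a separator.
            raise ValueError(f"package path contains a comma: {path!r}")
        cleaned.append(path)
    return ",".join(cleaned)


# File names taken from the gcloud example in this section.
value = packages_flag_value(["dep1.tar.gz", "dep2.whl"])
print("--packages", value)
```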
Specifying custom versions of TensorFlow for training
You can train with a version of TensorFlow that is more recent than the one in the latest supported AI Platform Training runtime version. This is possible for training only, not for prediction.
To use a version of TensorFlow that is not yet supported as a full AI Platform Training runtime version, include it as a custom dependency for your trainer using one of the following approaches:
- Specify the TensorFlow version in your setup.py file as a PyPI dependency. Include it in your list of required packages as follows:

    REQUIRED_PACKAGES = ['tensorflow>=2.11']

- Build a TensorFlow binary from sources, making sure to follow the instructions for TensorFlow with CPU support only. This process yields a pip package (.whl file) that you can include in your training job request by adding it to your list of packages.
Building a TensorFlow binary to include as a custom package is a more complex approach, but the advantage is that you can use the most recent TensorFlow updates when training your model.
What's next
- Review the list of supported runtime versions.