Projects, models, versions, and jobs

Many terms in machine learning (ML) mean different things in different contexts. This section defines some terms as they're used in this documentation.

Projects, models, versions, and jobs

Project
Your project is your Google Cloud project. It is the logical container for your deployed models and jobs. Each project that you use to develop AI Platform Training solutions must have AI Platform Training enabled. Your Google account can have access to multiple Google Cloud projects.
Model
In ML, a model represents the solution to a problem that you're trying to solve. In other words, it's the recipe for predicting a value from data. In AI Platform Training, a model is a logical container for individual versions of that solution. For example, let's say the problem you want to solve is predicting the sale price of houses given a set of data about previous sales. You create a model in AI Platform Training called housing_prices, and you try multiple machine learning techniques to solve the problem. At each stage, you can deploy versions of that model. Each version can be completely different from the others, but you can organize them under the same model if that suits your workflow.
Trained model
A trained model includes the state of your computational model and its settings after training.
Saved model
Most machine learning frameworks can serialize the information representing your trained model and save it to a file as a saved model, which you can deploy for prediction in the cloud.
Model version
A model version, or just version, is an instance of a machine learning solution stored in the AI Platform Training model service. You make a version by passing a serialized trained model (as a saved model) to the service. When you make a version, you can also provide custom code (beta) for handling predictions.
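The model-as-container relationship described above can be illustrated with a minimal sketch. This is plain Python with hypothetical class and method names, not the AI Platform Training API; it only shows how versions group under a model:

```python
# Minimal illustration of the model/version hierarchy described above.
# The names here (ModelRegistry, create_version, ...) are hypothetical,
# not AI Platform Training API calls.

class ModelRegistry:
    """A model is a logical container; versions hold the actual artifacts."""

    def __init__(self):
        self._models = {}  # model name -> {version name -> saved-model path}

    def create_model(self, name):
        self._models[name] = {}

    def create_version(self, model, version, saved_model_path):
        # A version is created from a serialized (saved) trained model.
        self._models[model][version] = saved_model_path

    def versions(self, model):
        return sorted(self._models[model])


registry = ModelRegistry()
registry.create_model("housing_prices")
registry.create_version("housing_prices", "v1_linear", "gs://my-bucket/linear/")
registry.create_version("housing_prices", "v2_boosted_trees", "gs://my-bucket/trees/")
print(registry.versions("housing_prices"))  # both versions live under one model
```

Each version can use a completely different technique (here, a linear model and boosted trees), yet both are organized under the single housing_prices model.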
Job
You interact with the services of AI Platform Training by initiating requests and jobs. Requests are regular web API requests that return a response as quickly as possible. Jobs are long-running operations that are processed asynchronously. AI Platform Training offers training jobs and batch prediction jobs. You submit a request to start the job and get a quick response that verifies the job status. Then you can request status periodically to track your job's progress.
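The request-then-poll pattern for jobs can be sketched generically. In this sketch, get_job_state stands in for a real status request to the service, and the state names are illustrative, not the service's actual job states:

```python
import time

# Generic sketch of the request-then-poll pattern for long-running jobs.
# get_job_state stands in for a real status request; the state names are
# illustrative only.

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_job(get_job_state, poll_interval=0.0):
    """Poll a long-running job until it reaches a terminal state."""
    observed = []
    while True:
        state = get_job_state()
        observed.append(state)
        if state in TERMINAL_STATES:
            return observed
        time.sleep(poll_interval)  # wait before requesting status again


# Simulate a job that is queued, runs for a while, then succeeds.
responses = iter(["QUEUED", "RUNNING", "RUNNING", "SUCCEEDED"])
history = wait_for_job(lambda: next(responses))
print(history)  # ['QUEUED', 'RUNNING', 'RUNNING', 'SUCCEEDED']
```

In practice you would use a much longer poll interval, since training and batch prediction jobs can run for hours.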

Packaging, staging, exporting, and deploying models

You move models and data around, especially between your local environment and Cloud Storage, and between Cloud Storage and the AI Platform Training services. This documentation uses the following terms to mean specific operations in the process.

Package
You package your training application so that the training service can install it on each training instance. Packaging turns the application into a standard Python distribution package. When deploying custom code for prediction (beta), you also package the code for handling predictions.
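A standard Python distribution package is typically defined by a setup.py file. The following is a minimal sketch, not a prescribed layout; the package name and dependency list are placeholders:

```python
# setup.py -- a minimal packaging sketch; names and versions are placeholders.
from setuptools import find_packages, setup

setup(
    name="trainer",                     # hypothetical package name
    version="0.1.0",
    packages=find_packages(),           # picks up e.g. a trainer/ directory
    install_requires=["scikit-learn"],  # placeholder dependency list
    description="Training application package (illustrative).",
)
```

Running `python setup.py sdist` builds a source distribution (a .tar.gz file) that you can then stage in Cloud Storage.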
Stage
You stage your training application package in a Cloud Storage bucket that your project can access, which enables the training service to copy the package to all of the training instances. Similarly, you stage a saved model trained elsewhere in a Cloud Storage bucket that your project can access, which enables the online prediction service to access the model and deploy it. If you deploy custom code for prediction (beta), you additionally stage the custom code package in Cloud Storage so that the online prediction service can access it during deployment.
Export
In the context of machine learning models, this documentation uses export to mean the process of serializing your computational model and settings to a file. The result of exporting is the saved model that you stage and deploy.
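As a generic illustration of exporting, the sketch below serializes a trivial stand-in "model" with Python's pickle module. Real frameworks use their own saved-model formats (for example, TensorFlow's SavedModel directory), but the idea is the same: write the model's state and settings to a file that can later be restored:

```python
import os
import pickle
import tempfile

# Illustrative only: a trivial "trained model" whose state is just some
# learned coefficients and training settings. Real frameworks have their
# own saved-model formats; the principle -- serialize state to a file -- is
# the same.

model_state = {
    "coefficients": [104.5, 3.2],  # pretend these were learned in training
    "settings": {"learning_rate": 0.01, "epochs": 100},
}

export_dir = tempfile.mkdtemp()
export_path = os.path.join(export_dir, "model.pkl")

# Export: serialize the model's state and settings to a file.
with open(export_path, "wb") as f:
    pickle.dump(model_state, f)

# The exported file plays the role of the saved model you would stage
# in Cloud Storage and deploy. Restoring it recovers the same state.
with open(export_path, "rb") as f:
    restored = pickle.load(f)

print(restored == model_state)  # True
```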
Deploy
You deploy a model version when you create a version resource. You specify an exported model (a saved model directory) and a model resource to assign the version to, and AI Platform Training hosts the version so that you can send predictions to it. If you deploy custom code for prediction (beta), you also provide a custom code package during deployment.

What's next