Google Cloud Platform
Prediction API

Hello Prediction!

This page gives a quick example of using the Prediction API that you can set up and run in 15 minutes. After trying out this example, you can read the full documentation to learn how to use it for your own specific needs.

  1. Prerequisites
  2. Scenario
  3. Upload training data
  4. Train the system
  5. Send queries
  6. Next steps


Prerequisites

This tutorial assumes the following:


Scenario

Imagine that your company receives emails requesting help in several different languages, and you want to route each email to someone with the appropriate language skills. The problem here is to detect whether a given phrase is English, Spanish, or French.

To do this, you must create some training data to train the Prediction service. This training data consists of several text entries, each labeled "English," "Spanish," or "French." After training the system on this data, you will be able to submit arbitrary words or phrases in any of those languages, and the Prediction service will categorize your data as being closest to one of them.

The high-level steps are as follows:

  1. Upload training data. You'll begin by downloading an example training data file that includes English, Spanish, and French language examples, and then uploading the file to your Google Cloud Storage account.
  2. Train the system. Next, you'll tell the Prediction API to load your training data from Google Cloud Storage and analyze the data. You'll also learn how to check the status of a training session.
  3. Send queries. After the training process is finished, you can send queries containing phrases in English, Spanish, or French, and the Prediction service will respond with its best guesses for the language of each phrase.

Upload training data

Begin by uploading an example training data file to Google Cloud Storage:

  1. Download the training data file (language_id.txt).

    This file contains English, French, and Spanish training data entries. The training data comprises a set of text snippets formatted as CSV in two columns. The first column is the language of the snippet, and the second column is the content.

  2. In the Google Developers Console, go to the Cloud Storage browser.
  3. Click Create bucket to create a new bucket, or select an existing bucket.
  4. On your bucket page, click Upload files and upload language_id.txt.
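
Before uploading, it can help to check that a training file follows the two-column layout described in step 1. The following is a minimal sketch, not part of the Prediction API: the sample rows are hypothetical stand-ins for the contents of language_id.txt, and read_training_rows is an illustrative helper name.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in the two-column layout described above:
# the first column is the language label, the second is the text snippet.
SAMPLE = '''"English","Where is the nearest train station?"
"Spanish","¿Dónde está la estación de tren más cercana?"
"French","Où est la gare la plus proche ?"
'''


def read_training_rows(stream):
    """Yield (label, text) pairs, rejecting rows that do not have two columns."""
    for line_number, row in enumerate(csv.reader(stream), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 columns, got {len(row)}"
            )
        label, text = row
        yield label, text


rows = list(read_training_rows(io.StringIO(SAMPLE)))
label_counts = Counter(label for label, _ in rows)
print(label_counts)
```

To check a real file, pass an open file object (for example, open("language_id.txt", newline="", encoding="utf-8")) instead of the in-memory sample.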

Train the system

Next, train the system against the example training data by using the Prediction API. In this tutorial, you'll use the Google APIs Explorer to make API calls. In your own applications, you should use one of the Prediction API client libraries.

To train the system:

  1. In your web browser, open the API Explorer page for the Prediction API.
  2. Set Authorize requests using OAuth 2.0 to On.
  3. Select the trainedmodels.insert method.
  4. Set project to be your Developers Console project ID.
  5. Click the Request body box. A dropdown menu appears.
  6. Select and set the following properties:
    • id: An identifier for your training model, to be used for training and query requests. The identifier must be from 1 to 255 characters long, and can be any mix of lowercase letters (a-z), digits (0-9), and dashes and underscores (_-). For example: languageidentifier
    • storageDataLocation: The Cloud Storage path to your training file (<your_bucket>/language_id.txt).
  7. Click Execute to call the method and start training your model. The request and response appear at the bottom of the page.
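
Outside the API Explorer, the same request body can be assembled in code. The sketch below only builds and validates the body; it does not call the service. The function name build_insert_body and the bucket name mybucket are assumptions for illustration, and the ID check simply mirrors the rule stated in step 6.

```python
import json
import re

# ID rule as stated in the steps above: 1 to 255 characters drawn from
# lowercase letters, digits, dashes, and underscores.
MODEL_ID_PATTERN = re.compile(r"[a-z0-9_-]{1,255}")


def build_insert_body(model_id, bucket, object_name):
    """Return a JSON-serializable request body for trainedmodels.insert."""
    if not MODEL_ID_PATTERN.fullmatch(model_id):
        raise ValueError(f"invalid model id: {model_id!r}")
    return {
        "id": model_id,
        # Cloud Storage path in the <your_bucket>/<file> form used above.
        "storageDataLocation": f"{bucket}/{object_name}",
    }


insert_body = build_insert_body("languageidentifier", "mybucket", "language_id.txt")
print(json.dumps(insert_body, indent=2))
```

Validating the identifier locally catches a malformed ID before a request is sent.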

Training is asynchronous: while the insert request will receive an immediate response, you'll still need to query the Prediction service to check the progress of the training session. In this tutorial, the example training data set is small, so training should take less than a minute.

To check training progress:

  1. On the API Explorer page for the Prediction API, select the trainedmodels.get method.
  2. Set project to your project ID.
  3. Set id to the training model ID you used in your insert request.
  4. Click Execute to call the method. The response appears at the bottom of the page.

Examine the response for the trainingStatus property. The call will return an HTTP 200 OK while training is in progress, with a trainingStatus value of "RUNNING". When the call returns a trainedmodels resource with a trainingStatus value of "DONE", training is finished, and you can start sending queries.
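
In an application, this status check is usually wrapped in a polling loop. The following is a rough sketch under stated assumptions: fetch_model is a hypothetical callable standing in for whatever performs the trainedmodels.get call and returns the parsed response, and the polling interval and retry budget are arbitrary choices rather than documented values.

```python
import time


def wait_for_training(fetch_model, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll until trainingStatus is "DONE" and return the final resource.

    fetch_model is a hypothetical zero-argument callable that performs the
    trainedmodels.get call and returns the parsed JSON response as a dict.
    """
    for _ in range(max_polls):
        model = fetch_model()
        status = model.get("trainingStatus")
        if status == "DONE":
            return model
        if status != "RUNNING":
            # Only RUNNING and DONE are described here; treat anything
            # else as a failure instead of polling indefinitely.
            raise RuntimeError(f"unexpected trainingStatus: {status!r}")
        sleep(poll_seconds)
    raise TimeoutError("training did not finish within the polling budget")


# Stand-in for real API calls: two RUNNING responses, then DONE.
_responses = iter([
    {"trainingStatus": "RUNNING"},
    {"trainingStatus": "RUNNING"},
    {"id": "languageidentifier", "trainingStatus": "DONE"},
])
final = wait_for_training(lambda: next(_responses), sleep=lambda seconds: None)
print(final["trainingStatus"])  # prints DONE
```

In a real application, fetch_model would wrap the trainedmodels.get request made through one of the client libraries.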

Here's an example trainedmodels.get response for a model that has finished training:

  {
    "kind": "prediction#training",
    "id": "languageidentifier",
    "storageDataLocation": "mybucket/language_id.txt",
    "selfLink": "<your_project>/trainedmodels/languageidentifier",
    "created": "2013-04-10T21:54:08.840Z",
    "trainingComplete": "2013-04-10T21:54:11.504Z",
    "modelInfo": {
      "numberInstances": "420",
      "modelType": "classification",
      "numberLabels": "3",
      "classificationAccuracy": "0.95"
    },
    "trainingStatus": "DONE"
  }

Send queries

Now you're ready to send queries to your model. Queries are always in the format of a single row of training data, minus the first column. As described earlier, your training data had two columns: a language label, and a phrase in that language. As such, your query should consist of a phrase in a language that you want to identify. When you submit the query, the Prediction service replies with its best guess at the language of your phrase.

To send a query:

  1. Select the trainedmodels.predict method.
  2. Set project to be your project ID.
  3. Set id to be the trained model ID you created earlier.
  4. Click the Request body box and select input from the dropdown menu.
  5. Click the brackets under input and select csvInstance from the dropdown menu.
  6. Click the plus sign, and then enter a text string in English, French, or Spanish—for example, Muy Bueno.

  7. Click Execute to call the method.
  8. In the Response section, examine the response for the outputLabel property. This contains the API's best guess for the language of the string.
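
The request body that these steps assemble in the API Explorer nests the query text under input and csvInstance. As a small sketch (build_predict_body is an illustrative helper name, not part of any client library), the same structure can be produced like this:

```python
import json


def build_predict_body(*features):
    """Wrap the feature columns of one row in the input.csvInstance structure."""
    if not features:
        raise ValueError("at least one feature value is required")
    return {"input": {"csvInstance": list(features)}}


# The training data has one feature column (the phrase), so the query
# carries a single value: the row minus its label column.
predict_body = build_predict_body("Muy Bueno")
print(json.dumps(predict_body))  # prints {"input": {"csvInstance": ["Muy Bueno"]}}
```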

The response for "Muy Bueno" should look similar to the following:

 {
  "kind": "prediction#output",
  "id": "languageidentifier",
  "selfLink": "<your_project>/trainedmodels/languageidentifier/predict",
  "outputLabel": "Spanish",
  "outputMulti": [
   {
    "label": "English",
    "score": "0.000000"
   },
   {
    "label": "French",
    "score": "0.000000"
   },
   {
    "label": "Spanish",
    "score": "1.000000"
   }
  ]
 }

All score values are relative to each other. The scores of all available labels can be found under outputMulti, and the label with the highest score—here, Spanish—is available as outputLabel. You can read more about the scoring algorithm in the predict method property description for outputMulti[].score.
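
To illustrate the relationship between outputMulti and outputLabel, the sketch below picks the highest-scoring label from a response shaped like the example above. The dictionary is hand-written for illustration, and top_label is a hypothetical helper. Note that the scores in the example are strings, so they are converted to floats before comparison.

```python
# Response shaped like the example above; scores arrive as strings.
response = {
    "outputLabel": "Spanish",
    "outputMulti": [
        {"label": "English", "score": "0.000000"},
        {"label": "French", "score": "0.000000"},
        {"label": "Spanish", "score": "1.000000"},
    ],
}


def top_label(prediction):
    """Return the label with the highest score in outputMulti."""
    best = max(prediction["outputMulti"], key=lambda entry: float(entry["score"]))
    return best["label"]


print(top_label(response))  # prints Spanish
```

Reading outputMulti directly is useful when an application needs the runner-up labels or wants to apply its own threshold to the relative scores.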

Next steps

Learn more about the Google Prediction API: