Building an image search application that uses the Cloud Vision API and AutoML Vision

This tutorial explores how to use the Vision API and AutoML Vision label detection to power your image search and classification application. When combined with other Google Cloud services, these services make it easy for you to:

  • Search within images for detected objects and scenes.
  • Classify images into different categories based on detected image labels.
  • Use image labels and categories as search facets.

For a broader discussion on how to use the Vision API to enhance search applications, see An image search application that uses the Vision API and AutoML Vision.

Using label detection to make images searchable

The Vision API's label detection feature identifies broad sets of objects across thousands of object categories. Detected objects are returned as labels, where each label consists of a string value (such as "dog" or "cat") and an associated confidence score. The confidence score represents the Vision API's certainty that the label is accurate.

AutoML Vision is ideal for custom image classification with user-provided, labeled training sets. If the Vision API label detection doesn't return appropriate labels for your categorization task, we recommend training a custom image model using AutoML Vision.

By combining label detection with a search index, you can make images searchable in new ways.


This tutorial demonstrates how to use Google Cloud to build and deploy a basic image search application enhanced by the Vision API. You learn how to:

  • Create a Cloud Storage bucket to store uploaded images.
  • Configure Pub/Sub notifications, which are triggered when a new image is added to the image storage bucket.
  • Deploy services to App Engine to deliver frontend and backend services, including integration with the Vision API.
  • Deploy a basic image category prediction service using AI Platform.
  • Test the image search application using sample images.


This tutorial uses billable components of Google Cloud, including Cloud Storage, Pub/Sub, App Engine, the Vision API, and AI Platform.

Usage exceeding the Google Cloud Free Tier is charged according to the latest pricing. Following the instructions in this tutorial is estimated to cost less than $1.00.

Before you begin

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the Cloud Console, on the project selector page, select or create a Cloud project.

    Go to the project selector page

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

  4. Enable the Cloud Vision, Cloud Pub/Sub, and Cloud Machine Learning APIs.

    Enable the APIs

If you don't have Cloud SDK installed, use Cloud Shell to deploy the application. Cloud Shell provides a shell environment with the Cloud SDK preinstalled.

Deploying the sample image search application

The sample image search application explores a range of functionality based on label detection. The sample application is divided into two parts:

  • Base search application

    Create a simple search application using Cloud Storage buckets, Pub/Sub notifications, and App Engine. The base search application demonstrates image search using keyword and faceted search, and shows how to use the Vision API to classify images by mapping their labels to specific categories.

  • Category prediction service

    Use AI Platform to extend the base search application with a prediction model. This model uses the semantic relatedness of image labels to create different categories for classification.

Label detection has a broad understanding of objects and scenes across diverse subject matters. However, your requirements might include categories that the Vision API doesn't detect. In this case, we recommend using AutoML Vision to train a custom image model with your specific categories. You can extend the sample search application to request custom label detection by using your own model or models. This is explained under Extending the sample application to use AutoML Vision later in this tutorial.

Overview of the search application

The base search application shows how to use Pub/Sub notifications to automatically trigger Vision label detection when new images arrive in a Cloud Storage bucket. By adding detected labels to the App Engine Search API, you enable users to search for images with a keyword search. In addition to keyword search, the Search API also provides a faceted search feature, which exposes returned Vision labels as search facets.
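The notification flow described above can be sketched as follows. This is a minimal illustration, not the sample application's actual handler: the function name is an assumption, while the envelope fields and the OBJECT_FINALIZE event type follow the Pub/Sub push and Cloud Storage notification formats.

```python
import base64
import json

def extract_image_uri(request_body):
    """Parse a Pub/Sub push message carrying a Cloud Storage
    notification, and return the URI of the newly added image.
    (Hypothetical helper; the sample app's handler differs.)"""
    envelope = json.loads(request_body)
    message = envelope["message"]
    # Only new objects should trigger label detection; ignore
    # deletions and metadata updates.
    if message.get("attributes", {}).get("eventType") != "OBJECT_FINALIZE":
        return None
    payload = json.loads(base64.b64decode(message["data"]))
    return "gs://{}/{}".format(payload["bucket"], payload["name"])

# Example push message for an image uploaded to the bucket.
notification = {"bucket": "my-image-bucket", "name": "photos/dog.jpg"}
body = json.dumps({"message": {
    "attributes": {"eventType": "OBJECT_FINALIZE"},
    "data": base64.b64encode(json.dumps(notification).encode()).decode(),
}})
print(extract_image_uri(body))  # gs://my-image-bucket/photos/dog.jpg
```

Once the backend has the image URI, it can request label detection and add the returned labels to the search index.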

Label detection returns the most relevant labels as determined by the service, rather than specific categories you've defined. For example, your goal might be to classify images of housing interiors using the Bathroom, Kitchen, and Bedroom categories. In addition to exposing labels as search facets, the base search application demonstrates how to use these labels to classify images into predetermined categories using fixed label-to-category mappings. This classification method involves creating a label dictionary for each category, which maps specific Vision labels to one or more desired categories. For example, consider the following label-to-category mappings for Kitchen, Bedroom, and Bathroom:

Category   Label 1    Label 2      Label 3   Label 4      Label 5
Kitchen    kitchen    dishwasher   oven      sink         stove
Bathroom   bathroom   shower       toilet    tile         plumbing
Bedroom    bedroom    bed          pillow    nightstand   mattress

For a given image, if the Vision API returns the labels "bed," "mattress," and "shower" with confidence scores 0.9, 0.7, and 0.3, the search application categorizes the image under Bedroom because this category's confidence score is the highest score calculated for the returned labels.

Category   Label-to-category mapping                                             Label confidence score   Total category confidence score
Bedroom    'bed' ∈ {'bedroom', 'bed', 'pillow', 'nightstand', 'mattress'}        0.9                      1.6
           'mattress' ∈ {'bedroom', 'bed', 'pillow', 'nightstand', 'mattress'}   0.7
Bathroom   'shower' ∈ {'bathroom', 'shower', 'toilet', 'tile', 'plumbing'}       0.3                      0.3
Kitchen    No matches                                                            -                        -
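The scoring in the table above can be sketched as follows. This is a minimal illustration; the dictionary layout and function name are assumptions, not the sample application's actual code.

```python
# Fixed label-to-category mappings, taken from the tables above.
CATEGORY_LABELS = {
    "Kitchen": {"kitchen", "dishwasher", "oven", "sink", "stove"},
    "Bathroom": {"bathroom", "shower", "toilet", "tile", "plumbing"},
    "Bedroom": {"bedroom", "bed", "pillow", "nightstand", "mattress"},
}

def categorize(labels):
    """Sum the confidence scores of labels that map to each category,
    then return the category with the highest total score."""
    scores = {}
    for category, vocabulary in CATEGORY_LABELS.items():
        total = sum(score for label, score in labels if label in vocabulary)
        if total > 0:
            scores[category] = total
    return max(scores, key=scores.get) if scores else None

# Labels returned by the Vision API for the example image.
labels = [("bed", 0.9), ("mattress", 0.7), ("shower", 0.3)]
print(categorize(labels))  # Bedroom (1.6 beats Bathroom's 0.3)
```

An image whose labels match none of the dictionaries is left uncategorized, which is the limitation the category prediction service later in this tutorial addresses.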

Deploy the search application

In the following steps, you:

  • Use the Node package manager (npm) to install the Polymer library required by the web frontend.
  • Create a Cloud Storage bucket to store images.
  • Create Pub/Sub topics and subscriptions for storage notifications.
  • Deploy the search application to App Engine.

Before following these steps, make sure you have completed the prerequisites in the Before you begin section.

  1. Open Cloud Shell.

    Go to Cloud Shell

  2. Use npm to install the Polymer library, which the base search application requires.

    npm install -g polymer-cli
  3. Use npm to install Bower, which you use to install the required Polymer web components.

    npm install -g bower
  4. Define your environment variables, which you use in later commands. In addition to PROJECT, the commands that follow also use REGION, GCS_IMAGE_BUCKET, PUBSUB_TOPIC, PUBSUB_SUBSCRIPTION, and PUSH_ENDPOINT; set each of these to a value appropriate for your project.

    PROJECT=$(gcloud config list project --format "value(core.project)")
  5. Create a regional Cloud Storage bucket to store images.

    gsutil mb -c regional -l ${REGION} -p ${PROJECT} gs://${GCS_IMAGE_BUCKET}
  6. Create a Pub/Sub topic.

    gcloud pubsub topics create ${PUBSUB_TOPIC} --project=${PROJECT}
  7. Create a Pub/Sub subscription.

    gcloud pubsub subscriptions create ${PUBSUB_SUBSCRIPTION} \
        --topic=${PUBSUB_TOPIC} \
        --push-endpoint=${PUSH_ENDPOINT}
  8. Clone the tutorial sample code from GitHub.

    cd ${HOME}
    git clone
  9. To install dependencies for the App Engine web frontend, complete the following steps:

    1. Copy the Polymer web application to the frontend directory.

      cp -rv solutions-vision-search/third_party/polymer-starter/. solutions-vision-search/gae/frontend
    2. Install the required Polymer web components.

      cd solutions-vision-search/gae/frontend
      bower install
    3. Build the Polymer web application.

      polymer build
  10. Use pip to install dependencies for the App Engine backend service.

    cd ${HOME}/solutions-vision-search
    pip install -r gae/backend/requirements.txt -t gae/backend/lib
  11. Deploy the backend and frontend services to App Engine.

    gcloud app deploy \
        gae/frontend/app.yaml \
        gae/backend/app.yaml \
        gae/dispatch.yaml \
        --project ${PROJECT}
  12. Apply a notification configuration to the Cloud Storage bucket to send notifications to the backend service when new images arrive.

    gsutil notification create -f json -t projects/${PROJECT}/topics/${PUBSUB_TOPIC} gs://${GCS_IMAGE_BUCKET}
  13. After the App Engine services are deployed, copy test images to your Cloud Storage bucket.

    gsutil -m cp -v ${HOME}/solutions-vision-search/sample-images/*.jpg gs://${GCS_IMAGE_BUCKET}
  14. Enter the following commands to open your web browser and view the image search user interface:

    gcloud config set project ${PROJECT}
    gcloud app browse -s default

    In the Search Facets pane, you can use the facet values displayed under Image Label to refine the detected labels results. The values under Mapped Category allow refinement by categories generated using fixed label-to-category mappings.

    (Screenshot: search facets showing label-to-category mappings)

Deploying the category prediction service

Classifying images using fixed label-to-category mappings is straightforward, but it's often difficult to anticipate all the potential Vision labels returned for images belonging to a specific category. Sometimes the returned labels won't exactly match the values that your application anticipates, but those labels are semantically related.

Overview of the category prediction service

Consider the following use case: you want to categorize images into Bathroom, Kitchen, and Bedroom. Depending on the image contents, the Vision API might return detected labels such as "bathtub," "cooking," and "bedding." The following table illustrates how an image whose detected label is "cooking" fails to match the Kitchen category because the label-to-category mapping doesn't include this word.

Category   Label-to-category mapping                                        Category match
Kitchen    'cooking' ∈ {'kitchen', 'dishwasher', 'oven', 'sink', 'stove'}   No

To solve this problem, you can take a more complicated but far more effective approach. This approach uses word embeddings to calculate a similarity score between image labels and predetermined categories. To accomplish this, this tutorial uses pretrained word embeddings from GloVe to convert labels (words) into real-number vectors. These individual word vectors are used to calculate per-image and per-category vectors. The semantic similarity between an image and a category is conveyed as a relationship between vectors, in this case, their cosine similarity.

In this simplistic case, the image vector consists of a single word, "cooking," converted into a real-number vector. The category vector is the sum of the word vectors for the labels that make up the Kitchen category: "kitchen," "dishwasher," "oven," "sink," and "stove." The following table illustrates how the similarity score for the Kitchen category is calculated as cosine_similarity(image_vector, category_vector).

Category   Label-to-category similarity                                                               Cosine similarity
Kitchen    cosine_similarity([-0.32068, 0.26405, -1.071, ...], [-0.553273, 0.34074, -3.8644, ...])    0.772

When you associate an image with the category that has the highest cosine similarity score, each image is classified even if the returned Vision labels don't precisely match a predetermined list of category words. For more information about how image and category vectors are calculated, see An image search application that uses the Vision API and AutoML Vision.
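The calculation above can be sketched as follows. The three-dimensional word vectors are toy values invented for illustration; real GloVe embeddings have 50 to 300 dimensions and different values.

```python
import math

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|)"""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional word vectors; these values are invented for
# illustration and are not actual GloVe embeddings.
embeddings = {
    "cooking":    [0.9, 0.1, 0.0],
    "kitchen":    [0.8, 0.2, 0.1],
    "dishwasher": [0.7, 0.3, 0.0],
    "oven":       [0.9, 0.2, 0.1],
    "sink":       [0.6, 0.4, 0.2],
    "stove":      [0.8, 0.1, 0.0],
    "bed":        [0.1, 0.9, 0.3],
}

def vector_sum(words):
    """Element-wise sum of the word vectors for a list of words."""
    return [sum(dims) for dims in zip(*(embeddings[w] for w in words))]

image_vector = vector_sum(["cooking"])
kitchen_vector = vector_sum(["kitchen", "dishwasher", "oven", "sink", "stove"])
bedroom_vector = vector_sum(["bed"])

# "cooking" is far closer to the Kitchen category than to Bedroom.
print(round(cosine_similarity(image_vector, kitchen_vector), 3))  # ~0.976
print(round(cosine_similarity(image_vector, bedroom_vector), 3))  # ~0.208
```

Classifying the image under the category with the highest similarity score assigns it to Kitchen even though "cooking" appears in no category's fixed label list.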

Deploy the category prediction service

In this section, you create a TensorFlow model that converts words (labels) into GloVe word vectors and computes the similarity between image and category vectors. After you deploy the model to AI Platform, you update the search application so that it calls AI Platform online prediction to predict categories for uploaded images.

In Cloud Shell, complete the following steps to deploy the category prediction service:

  1. Define your environment variables, which you use in later commands. In addition to PROJECT, the commands that follow also use MODEL_REGION, GCS_MODEL_BUCKET, GCS_STAGING_BUCKET, and MODEL_NAME; set each of these to a value appropriate for your project.

    PROJECT=$(gcloud config list project --format "value(core.project)")
  2. Create a regional Cloud Storage bucket to store the prediction model.

    gsutil mb -c regional -l ${MODEL_REGION} -p ${PROJECT} gs://${GCS_MODEL_BUCKET}
  3. Create a regional Cloud Storage bucket for staging the training code.

    gsutil mb -c regional -l ${MODEL_REGION} -p ${PROJECT} gs://${GCS_STAGING_BUCKET}
  4. Submit the training job to AI Platform.

    cd ${HOME}/solutions-vision-search
    gcloud ml-engine jobs submit training my_job \
        --module-name=trainer.task \
        --package-path=categorizer/trainer \
        --runtime-version=1.2 \
        --project=${PROJECT} \
        --region=us-central1 \
        --staging-bucket=gs://${GCS_STAGING_BUCKET} \
        -- \
        --gcs_output_path=gs://${GCS_MODEL_BUCKET}

    The training job typically completes in less than 10 minutes. You can inspect the job progress by streaming the log output.

    gcloud ml-engine jobs stream-logs my_job

    When the job is complete, the logging output looks similar to the following:

    INFO  2017-12-08 14:06:19 +1100    master-replica-0    Task completed successfully.
    INFO  2017-12-08 14:10:45 +1100    service             Job completed successfully.
  5. Create the prediction model in AI Platform.

    gcloud ml-engine models create $MODEL_NAME --regions=${MODEL_REGION} --project=${PROJECT}
  6. Create a new version of the prediction model called v1.

    gcloud ml-engine versions create v1 \
        --model=$MODEL_NAME \
        --origin=gs://${GCS_MODEL_BUCKET}/model \
        --runtime-version=1.2
  7. Update the App Engine backend service configuration to call the category prediction model when new images are added.

    On Linux:

    sed -i "s/USE_CATEGORY_PREDICTOR: false/USE_CATEGORY_PREDICTOR: true/" gae/backend/app.yaml

    On macOS:

    sed -i '' "s/USE_CATEGORY_PREDICTOR: false/USE_CATEGORY_PREDICTOR: true/" gae/backend/app.yaml
  8. Deploy the backend service to App Engine.

    gcloud app deploy gae/backend/app.yaml --project ${PROJECT}
  9. Delete and reload the sample images to trigger Pub/Sub notifications and process the images with the category prediction service.

    gsutil -m rm gs://${GCS_IMAGE_BUCKET}/*.jpg
    gsutil -m cp -v ${HOME}/solutions-vision-search/sample-images/*.jpg gs://${GCS_IMAGE_BUCKET}
  10. Navigate to the deployed application in your web browser.

    gcloud config set project ${PROJECT}
    gcloud app browse -s default

Testing the search app

Launch the deployed application. If everything is functioning properly, the Search Facets pane lists additional facet values in the Most Similar Category section.

(Screenshot: the Most Similar Category section in the Search Facets pane)

The Most Similar Category section contains the predetermined categories and the number of images associated with the categories as a result of AI Platform matching.

You can view the differences between Mapped and Most Similar categories by comparing the images returned for each category value, such as animals. In this example, the category prediction service successfully categorized five images that the fixed label-to-category mappings missed: two images in nature, two in vehicles, and one in animals. These five images returned labels that failed to precisely match the fixed category labels, but were semantically similar enough for the AI Platform model to classify them accurately.

Extending the sample application to use AutoML Vision

AutoML Vision provides a simple graphical user interface (GUI) for training, evaluating, improving, and deploying models based on your own data. If the Vision API's label detection doesn't return suitable labels for your intended application, we recommend training a custom image model with user-defined labels. After training completes, you use the AutoML Vision API to query your model.

AutoML Vision custom label detection can be used to further extend the base search application in different ways. Some examples include:

  • Extending faceted search. You can extend faceted search by adding user-defined labels from your AutoML Vision model or models. This approach is ideal if images can have multiple labels that are relevant, rather than a specific category. To extend the sample application, you can combine custom image labels with Vision labels (for example, under a single Image Labels facet), or present the custom labels as a separate search facet.

    Presenting custom labels as a separate search facet is ideal if your AutoML Vision model returns specialized labels for a particular subject or topic area. For example, a specialized model that detects and returns "dandelions", "daisies", and "tulips" as possible labels could be separated into a "flower" search facet. In practice, this approach is similar to creating a "flower" category with custom image labels from your model as possible category values.

  • Using custom labels as image categories. AutoML Vision custom label detection can be used for specialized image classification when the Vision API labels are not appropriate. In this scenario, the image labels in your training dataset and the categories required for search are the same. Because the custom labels in your model are user-defined and known to your application in advance, it's not necessary to map different labels to specific categories.
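The first extension, combining custom labels with Vision labels or exposing them as their own facet, can be sketched as follows. The function and facet names here are assumptions for illustration, not the sample application's actual code.

```python
def build_facets(vision_labels, automl_labels, custom_facet=None):
    """Combine Vision API labels with custom AutoML Vision labels.

    If custom_facet is given, expose the custom labels under their own
    search facet; otherwise merge them into the image labels facet."""
    facets = {"image_labels": list(vision_labels)}
    if custom_facet:
        facets[custom_facet] = list(automl_labels)
    else:
        facets["image_labels"].extend(automl_labels)
    return facets

# A specialized flower model's labels exposed as a separate facet.
print(build_facets(["garden", "plant"], ["tulips", "daisies"],
                   custom_facet="flower"))
```

Presenting the custom labels under a "flower" facet keeps the specialized vocabulary separate from the general-purpose Vision labels, matching the approach described above.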

Cleaning up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, delete the resources you created after you've finished. The following steps describe how to delete the project.

  1. In the Cloud Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project you want to delete and click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next