Detect Multiple Objects

The Cloud Vision API can detect and extract multiple objects in an image with Object Localization.

Object localization identifies multiple objects in an image and provides a LocalizedObjectAnnotation for each object in the image. Each LocalizedObjectAnnotation identifies information about the object, the position of the object, and rectangular bounds for the region of the image that contains the object.

Object localization identifies both significant and less-prominent objects in an image.

Object information is returned in English only. The Cloud Translation API can translate English labels into a variety of other languages.
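For example, a minimal sketch of translating returned object names with the Cloud Translation API client library might look like the following. This is not part of the Vision sample; it assumes the google-cloud-translate package is installed and that application default credentials are configured.

# A sketch of translating object names returned by the Vision API.
# Assumes the google-cloud-translate package and default credentials.
from google.cloud import translate_v2 as translate

def translate_object_names(names, target_language='es'):
    """Translate a list of English object names into the target language."""
    client = translate.Client()
    results = client.translate(names, target_language=target_language)
    # Each result is a dict that includes the 'translatedText' field.
    return [result['translatedText'] for result in results]

# Example usage: translate_object_names(['Bicycle wheel', 'Door'], target_language='fr')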

Sample image

For example, the API might return the following information and bounding location data for the objects in the image above:

Name | mid | Score | Bounds (normalized vertices)
Bicycle wheel | /m/01bqk0 | 0.89648587 | (0.32076266, 0.78941387), (0.43812272, 0.78941387), (0.43812272, 0.97331065), (0.32076266, 0.97331065)
Bicycle | /m/0199g | 0.886761 | (0.312, 0.6616471), (0.638353, 0.6616471), (0.638353, 0.9705882), (0.312, 0.9705882)
Bicycle wheel | /m/01bqk0 | 0.6345275 | (0.5125398, 0.760708), (0.6256646, 0.760708), (0.6256646, 0.94601655), (0.5125398, 0.94601655)
Picture frame | /m/06z37_ | 0.6207608 | (0.79177403, 0.16160682), (0.97047985, 0.16160682), (0.97047985, 0.31348917), (0.79177403, 0.31348917)
Tire | /m/0h9mv | 0.55886006 | (0.32076266, 0.78941387), (0.43812272, 0.78941387), (0.43812272, 0.97331065), (0.32076266, 0.97331065)
Door | /m/02dgv | 0.5160098 | (0.77569866, 0.37104446), (0.9412425, 0.37104446), (0.9412425, 0.81507325), (0.77569866, 0.81507325)

mid contains a machine-generated identifier (MID) corresponding to a label's Google Knowledge Graph entry. For information on inspecting mid values, see the Google Knowledge Graph Search API documentation.
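A minimal sketch of looking up a returned mid value with the Knowledge Graph Search API is shown below. It is not part of the Vision sample; it assumes an API key with the Knowledge Graph Search API enabled and the requests package.

# A sketch of inspecting a mid value via the Knowledge Graph Search API.
# Assumes an API key (YOUR_API_KEY) and the requests package.
import requests

def lookup_mid(mid, api_key):
    """Print the Knowledge Graph entry for a single MID, such as '/m/0199g'."""
    response = requests.get(
        'https://kgsearch.googleapis.com/v1/entities:search',
        params={'ids': mid, 'key': api_key, 'limit': 1})
    response.raise_for_status()
    for element in response.json().get('itemListElement', []):
        result = element['result']
        print(result.get('name'), '-', result.get('description', ''))

# Example usage: lookup_mid('/m/0199g', 'YOUR_API_KEY')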

Object Localization Example

You can obtain object localization for an image by passing the image's file path or public URL to the Cloud Vision API.

Command-line

To make an object detection and localization request, make a POST request to the following endpoint:

POST https://vision.googleapis.com/v1p3beta1/images:annotate

Specify OBJECT_LOCALIZATION as the value of the features.type field.

Images can be passed in one of three ways: as a base64-encoded string, as a Google Cloud Storage URI, or as a publicly accessible HTTPS or HTTP URL. See Making requests for more information.
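The curl example below passes a publicly accessible image URL. For reference, a minimal sketch of building the base64-encoded form of the request body in Python (assuming a local file named image.jpg) might look like this:

# A sketch of building the base64-encoded form of the request body.
# Assumes a local file named image.jpg; not part of the curl example below.
import base64
import json

with open('image.jpg', 'rb') as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode('utf-8')

request_body = {
    'requests': [{
        'image': {'content': encoded_image},
        'features': [{'type': 'OBJECT_LOCALIZATION'}],
    }]
}

# The resulting JSON can be POSTed to the images:annotate endpoint shown above.
print(json.dumps(request_body)[:80])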

See the AnnotateImageRequest reference documentation for more information on configuring the request body.

curl -X POST \
     -H "Authorization: Bearer "$(gcloud auth application-default print-access-token)  \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
      'requests': [
        {
          'image': {
            'source': {
              'imageUri': 'https://cloud.google.com/vision/docs/images/bicycle_example.jpg'
            }
          },
          'features': [
            {
              'type': 'OBJECT_LOCALIZATION'
            }
          ]
        }
      ]
    }" "https://vision.googleapis.com/v1p3beta1/images:annotate"

Java

For more on installing and creating a Vision API client, refer to Vision API Client Libraries.

/**
 * Detects localized objects in the specified local image.
 *
 * @param filePath The path to the file to perform localized object detection on.
 * @param out A {@link PrintStream} to write detected objects to.
 * @throws Exception on errors while closing the client.
 * @throws IOException on Input/Output errors.
 */
public static void detectLocalizedObjects(String filePath, PrintStream out)
    throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  ByteString imgBytes = ByteString.readFrom(new FileInputStream(filePath));

  Image img = Image.newBuilder().setContent(imgBytes).build();
  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder()
          .addFeatures(Feature.newBuilder().setType(Type.OBJECT_LOCALIZATION))
          .setImage(img)
          .build();
  requests.add(request);

  // Perform the request
  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();

    // Display the results
    for (AnnotateImageResponse res : responses) {
      for (LocalizedObjectAnnotation entity : res.getLocalizedObjectAnnotationsList()) {
        out.format("Object name: %s\n", entity.getName());
        out.format("Confidence: %s\n", entity.getScore());
        out.format("Normalized Vertices:\n");
        entity
            .getBoundingPoly()
            .getNormalizedVerticesList()
            .forEach(vertex -> out.format("- (%s, %s)\n", vertex.getX(), vertex.getY()));
      }
    }
  }
}

Node.js

For more on installing and creating a Vision API client, refer to Vision API Client Libraries.

// Imports the Google Cloud client libraries
const vision = require('@google-cloud/vision').v1p3beta1;
const fs = require('fs');

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const fileName = `/path/to/localImage.png`;

const request = {
  image: {content: fs.readFileSync(fileName)},
};

client
  .objectLocalization(request)
  .then(results => {
    const objects = results[0].localizedObjectAnnotations;
    objects.forEach(object => {
      console.log(`Name: ${object.name}`);
      console.log(`Confidence: ${object.score}`);
      const vertices = object.boundingPoly.normalizedVertices;
      vertices.forEach(v => console.log(`x: ${v.x}, y:${v.y}`));
    });
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

Python

Before you can run the code in this example, you must install the Beta Client Libraries and the Pillow (Python Imaging Library) package.

In your Python code, import the Beta Client Libraries and the Pillow modules as shown here (the Pillow imports are used for drawing the returned bounding boxes; see the sketch after the example below):

from google.cloud import vision_v1p3beta1 as vision
from PIL import Image
from PIL import ImageDraw

The following example shows how to call the images:annotate method and request object localization for a local image file.

def localize_objects(path):
    """Localize objects in the local image.

    Args:
    path: The path to the local file.
    """
    from google.cloud import vision_v1p3beta1 as vision
    client = vision.ImageAnnotatorClient()

    with open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)

    objects = client.object_localization(
        image=image).localized_object_annotations

    print('Number of objects found: {}'.format(len(objects)))
    for object_ in objects:
        print('\n{} (confidence: {})'.format(object_.name, object_.score))
        print('Normalized bounding polygon vertices: ')
        for vertex in object_.bounding_poly.normalized_vertices:
            print(' - ({}, {})'.format(vertex.x, vertex.y))
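The localize_objects function above only prints the results; the Pillow imports can be used to render them. A minimal sketch of a hypothetical draw_boxes helper that draws the returned bounding polygons onto the local image might look like this:

# A hypothetical helper (not part of the official sample) that draws the
# normalized bounding polygons returned by object localization on the image.
from PIL import Image, ImageDraw

def draw_boxes(path, objects, out_path='objects.jpg'):
    """Draw each object's bounding polygon on the image and save the result.

    'objects' is the list returned by client.object_localization(...).
    """
    image = Image.open(path)
    draw = ImageDraw.Draw(image)
    width, height = image.size

    for object_ in objects:
        # Normalized vertices are in the range [0, 1]; scale to pixel coordinates.
        box = [(vertex.x * width, vertex.y * height)
               for vertex in object_.bounding_poly.normalized_vertices]
        draw.polygon(box, outline='red')
        # Label the polygon near its first vertex.
        draw.text(box[0], object_.name, fill='red')

    image.save(out_path)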

You also have the option of specifying the Google Cloud Storage URI of the image you want to analyze.

Command-line

To make an object detection and localization request, make a POST request to the following endpoint:

POST https://vision.googleapis.com/v1p3beta1/images:annotate

Specify OBJECT_LOCALIZATION as the value of the features.type field.

Images can be passed in one of three ways: as a base64-encoded string, as a Google Cloud Storage URI, or as a publicly accessible HTTPS or HTTP URL. See Making requests for more information.

See the AnnotateImageRequest reference documentation for more information on configuring the request body.

curl -X POST \
     -H "Authorization: Bearer "$(gcloud auth application-default print-access-token)  \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
      'requests': [
        {
          'image': {
            'source': {
              'imageUri': 'gs://YOUR_BUCKET_NAME/YOUR_FILE_NAME'
            }
          },
          'features': [
            {
              'type': 'OBJECT_LOCALIZATION'
            }
          ]
        }
      ]
    }" "https://vision.googleapis.com/v1p3beta1/images:annotate"

Java

For more on installing and creating a Vision API client, refer to Vision API Client Libraries.

/**
 * Detects localized objects in a remote image on Google Cloud Storage.
 *
 * @param gcsPath The path to the remote file on Google Cloud Storage to detect localized objects
 *     on.
 * @param out A {@link PrintStream} to write detected objects to.
 * @throws Exception on errors while closing the client.
 * @throws IOException on Input/Output errors.
 */
public static void detectLocalizedObjectsGcs(String gcsPath, PrintStream out)
    throws Exception, IOException {
  List<AnnotateImageRequest> requests = new ArrayList<>();

  ImageSource imgSource = ImageSource.newBuilder().setGcsImageUri(gcsPath).build();
  Image img = Image.newBuilder().setSource(imgSource).build();

  AnnotateImageRequest request =
      AnnotateImageRequest.newBuilder()
          .addFeatures(Feature.newBuilder().setType(Type.OBJECT_LOCALIZATION))
          .setImage(img)
          .build();
  requests.add(request);

  // Perform the request
  try (ImageAnnotatorClient client = ImageAnnotatorClient.create()) {
    BatchAnnotateImagesResponse response = client.batchAnnotateImages(requests);
    List<AnnotateImageResponse> responses = response.getResponsesList();

    // Display the results
    for (AnnotateImageResponse res : responses) {
      for (LocalizedObjectAnnotation entity : res.getLocalizedObjectAnnotationsList()) {
        out.format("Object name: %s\n", entity.getName());
        out.format("Confidence: %s\n", entity.getScore());
        out.format("Normalized Vertices:\n");
        entity
            .getBoundingPoly()
            .getNormalizedVerticesList()
            .forEach(vertex -> out.format("- (%s, %s)\n", vertex.getX(), vertex.getY()));
      }
    }
  }
}

Node.js

For more on installing and creating a Vision API client, refer to Vision API Client Libraries.

// Imports the Google Cloud client libraries
const vision = require('@google-cloud/vision').v1p3beta1;

// Creates a client
const client = new vision.ImageAnnotatorClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const uri = `gs://bucket/bucketImage.png`;

client
  .objectLocalization(uri)
  .then(results => {
    const objects = results[0].localizedObjectAnnotations;
    objects.forEach(object => {
      console.log(`Name: ${object.name}`);
      console.log(`Confidence: ${object.score}`);
      const vertices = object.boundingPoly.normalizedVertices;
      vertices.forEach(v => console.log(`x: ${v.x}, y:${v.y}`));
    });
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

Python

Before you can run the code in this example, you must install the Beta Client Libraries and the Pillow (Python Imaging Library) package.

In your Python code, import the Beta Client Libraries and the Pillow modules as shown here:

from google.cloud import vision_v1p3beta1 as vision
from PIL import Image
from PIL import ImageDraw

The following example shows how to call the images:annotate method and request object localization for an image stored in Google Cloud Storage.

def localize_objects_uri(uri):
    """Localize objects in the image on Google Cloud Storage

    Args:
    uri: The path to the file in Google Cloud Storage (gs://...)
    """
    from google.cloud import vision_v1p3beta1 as vision
    client = vision.ImageAnnotatorClient()

    image = vision.types.Image()
    image.source.image_uri = uri

    objects = client.object_localization(
        image=image).localized_object_annotations

    print('Number of objects found: {}'.format(len(objects)))
    for object_ in objects:
        print('\n{} (confidence: {})'.format(object_.name, object_.score))
        print('Normalized bounding polygon vertices: ')
        for vertex in object_.bounding_poly.normalized_vertices:
            print(' - ({}, {})'.format(vertex.x, vertex.y))

Object Localization Response

The JSON response from an Object Localization request is in the following format:

{
  "responses": [
    {
      "localizedObjectAnnotations": [
        {
          "mid": "/m/01bqk0",
          "name": "Bicycle wheel",
          "score": 0.89648587,
          "boundingPoly": {
            "normalizedVertices": [
              {
                "x": 0.32076266,
                "y": 0.78941387
              },
              {
                "x": 0.43812272,
                "y": 0.78941387
              },
              {
                "x": 0.43812272,
                "y": 0.97331065
              },
              {
                "x": 0.32076266,
                "y": 0.97331065
              }
            ]
          }
        },
        {
          "mid": "/m/0199g",
          "name": "Bicycle",
          "score": 0.886761,
          "boundingPoly": {
            "normalizedVertices": [
              {
                "x": 0.312,
                "y": 0.6616471
              },
              {
                "x": 0.638353,
                "y": 0.6616471
              },
              {
                "x": 0.638353,
                "y": 0.9705882
              },
              {
                "x": 0.312,
                "y": 0.9705882
              }
            ]
          }
        },
        {
          "mid": "/m/01bqk0",
          "name": "Bicycle wheel",
          "score": 0.6345275,
          "boundingPoly": {
            "normalizedVertices": [
              {
                "x": 0.5125398,
                "y": 0.760708
              },
              {
                "x": 0.6256646,
                "y": 0.760708
              },
              {
                "x": 0.6256646,
                "y": 0.94601655
              },
              {
                "x": 0.5125398,
                "y": 0.94601655
              }
            ]
          }
        },
        {
          "mid": "/m/06z37_",
          "name": "Picture frame",
          "score": 0.6207608,
          "boundingPoly": {
            "normalizedVertices": [
              {
                "x": 0.79177403,
                "y": 0.16160682
              },
              {
                "x": 0.97047985,
                "y": 0.16160682
              },
              {
                "x": 0.97047985,
                "y": 0.31348917
              },
              {
                "x": 0.79177403,
                "y": 0.31348917
              }
            ]
          }
        },
        {
          "mid": "/m/0h9mv",
          "name": "Tire",
          "score": 0.55886006,
          "boundingPoly": {
            "normalizedVertices": [
              {
                "x": 0.32076266,
                "y": 0.78941387
              },
              {
                "x": 0.43812272,
                "y": 0.78941387
              },
              {
                "x": 0.43812272,
                "y": 0.97331065
              },
              {
                "x": 0.32076266,
                "y": 0.97331065
              }
            ]
          }
        },
        {
          "mid": "/m/02dgv",
          "name": "Door",
          "score": 0.5160098,
          "boundingPoly": {
            "normalizedVertices": [
              {
                "x": 0.77569866,
                "y": 0.37104446
              },
              {
                "x": 0.9412425,
                "y": 0.37104446
              },
              {
                "x": 0.9412425,
                "y": 0.81507325
              },
              {
                "x": 0.77569866,
                "y": 0.81507325
              }
            ]
          }
        }
      ]
    }
  ]
}