Crop Hints Tutorial

Audience

The goal of this tutorial is to help you develop applications using the Google Cloud Vision API Crop Hints feature. It assumes you are familiar with basic programming constructs and techniques. However, even if you are a beginning programmer, you should be able to follow along and run this tutorial without difficulty, then use the Cloud Vision API reference documentation to create basic applications.

This tutorial steps through a Vision API application, showing you how to make a call to the Vision API to use its Crop Hints feature.

Prerequisites

Python

Overview

This tutorial walks you through a basic Vision API application that uses a Crop Hints request. You can provide the image to be processed either through a Google Cloud Storage URI (Cloud Storage bucket location) or embedded in the request. A successful Crop Hints response returns the coordinates for a bounding box cropped around the dominant object or face in the image.
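
The two ways of supplying the image correspond to two shapes of the image field in the request JSON. The following is a minimal sketch, assuming the v1 REST field names content and source.gcsImageUri; the helper names and the bucket path are hypothetical and used only for illustration.

```python
import base64


def image_from_bytes(raw_bytes):
    """Embed the image in the request as base64 text."""
    return {'content': base64.b64encode(raw_bytes).decode('UTF-8')}


def image_from_gcs(gcs_uri):
    """Reference an image stored in Cloud Storage instead of embedding it."""
    if not gcs_uri.startswith('gs://'):
        raise ValueError('expected a gs:// URI, got %r' % gcs_uri)
    return {'source': {'gcsImageUri': gcs_uri}}


embedded = image_from_bytes(b'\x89PNG')            # placeholder bytes
remote = image_from_gcs('gs://my-bucket/cat.jpg')  # hypothetical bucket
print(embedded)
print(remote)
```

The rest of this tutorial uses the embedded form, reading the bytes from a local file.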

Code listing

As you read the code, we recommend that you follow along by referring to the Cloud Vision API Python reference.

"""
Crop Hints application.
Run the script on a local image to request a crop hint and display the
image cropped to that hint, e.g.:
    ./crop_hints.py <image_file>
"""
import argparse
import base64
import json

from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
from PIL import Image, ImageDraw

def main(image_file):
    """Run a CROP_HINTS request on a single image"""

    credentials = GoogleCredentials.get_application_default()
    service = discovery.build('vision', 'v1', credentials=credentials)

    with open(image_file, 'rb') as image:
        image_content = base64.b64encode(image.read())
        service_request = service.images().annotate(body={
            'requests': [{
                'image': {
                    'content': image_content.decode('UTF-8')
                },
                'features': [{
                    'type': 'CROP_HINTS'
                }],
                'imageContext': {
                    'cropHintsParams': {
                        'aspectRatios': [
                            1.77
                        ]
                    }
                }
            }]
        })

        response = service_request.execute()
        print(json.dumps(response))

        vertices = (response['responses'][0]
                    ['cropHintsAnnotation']['cropHints'][0]
                    ['boundingPoly']['vertices'])
        # Outline the hint on the original image, then show the cropped result.
        drawHint(image_file, vertices)
        cropToHint(image_file, vertices)

def hasOrZero(the_dict, key):
    if key in the_dict:
        return the_dict[key]
    else:
        return 0

def drawHint(image_file, v):
    """Draws a border around the crop hint using the vertices in the list."""
    im = Image.open(image_file)
    draw = ImageDraw.Draw(im)

    draw.line([hasOrZero(v[0], 'x'), hasOrZero(v[0], 'y'),
              hasOrZero(v[1], 'x'), hasOrZero(v[1], 'y')], fill='red', width=3)
    draw.line([hasOrZero(v[1], 'x'), hasOrZero(v[1], 'y'),
              hasOrZero(v[2], 'x'), hasOrZero(v[2], 'y')], fill='red', width=3)
    draw.line([hasOrZero(v[2], 'x'), hasOrZero(v[2], 'y'),
              hasOrZero(v[3], 'x'), hasOrZero(v[3], 'y')], fill='red', width=3)
    draw.line([hasOrZero(v[3], 'x'), hasOrZero(v[3], 'y'),
              hasOrZero(v[0], 'x'), hasOrZero(v[0], 'y')], fill='red', width=3)

    del draw
    im.show()

def cropToHint(image_file, v):
    """Crops the image using the hint vertices in the list."""
    print(json.dumps(v))
    im = Image.open(image_file)
    im2 = im.crop((hasOrZero(v[0], 'x'), hasOrZero(v[0], 'y'),
            hasOrZero(v[2], 'x') - 1, hasOrZero(v[2], 'y') - 1))
    im2.show()

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('image_file', help='The image to crop.')
    args = parser.parse_args()
    main(args.image_file)

A closer look

Importing libraries

import argparse
import base64
import json

from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
from PIL import Image, ImageDraw

We import standard libraries:

  • argparse to allow the application to accept input filenames as arguments
  • base64 to encode the binary image data as text that can be embedded in the JSON request
  • json to print the response

Other imports:

  • The discovery module within the googleapiclient library builds a client object for the API we want to call.
  • The GoogleCredentials class within the oauth2client.client library handles authentication to the service.
  • The Image and ImageDraw modules from the Python Imaging Library (PIL) open the input image, draw a bounding box on it, and crop it.
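
To see why base64 is needed, note that JSON cannot carry raw bytes directly. The sketch below is illustrative only (the four bytes are the usual start of a JPEG file, not a complete image): it encodes bytes to text, places that text in a JSON document, and recovers the original bytes.

```python
import base64
import json

raw = b'\xff\xd8\xff\xe0'  # first bytes of a typical JPEG file

# b64encode returns bytes; decode to str so json can serialize it.
encoded = base64.b64encode(raw).decode('UTF-8')
payload = json.dumps({'image': {'content': encoded}})

# A receiver can reverse the encoding to recover the original bytes.
decoded = base64.b64decode(json.loads(payload)['image']['content'])
print(encoded)
print(decoded == raw)
```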

Running the application

def main(image_file):
    """Run a CROP_HINTS request on a single image"""
    ...
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('image_file', help='The image to crop.')
    args = parser.parse_args()
    main(args.image_file)

Here, we simply parse the passed-in argument that specifies the local image filename, and pass it to the main() function.
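
As a small illustration of the argument handling, the sketch below wraps the parser in a hypothetical parse_args() helper (not part of the tutorial script) so that it can be exercised with an explicit argument list instead of the real command line.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None makes argparse read sys.argv."""
    parser = argparse.ArgumentParser(
        description='Request a crop hint for a local image.')
    parser.add_argument('image_file', help='The image to crop.')
    return parser.parse_args(argv)


# Equivalent to running: python crop_hints.py cat.jpg
args = parse_args(['cat.jpg'])
print(args.image_file)
```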

Authenticating to the API

    credentials = GoogleCredentials.get_application_default()
    service = discovery.build('vision', 'v1', credentials=credentials)

Before communicating with the Vision API service, you must authenticate your service using previously acquired credentials. Within an application, the simplest way to obtain credentials is to use Application Default Credentials (ADC). We obtain the Application Default Credentials using the get_application_default() method. By default, this method will attempt to obtain credentials from the GOOGLE_APPLICATION_CREDENTIALS environment variable, which should be set to point to your service account's JSON key file (see Set Up a Service Account for more information).
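
If authentication fails, a quick first check is whether the environment variable is set and points to a file that exists. The helper below is a hypothetical diagnostic sketch, not part of the tutorial script; it only inspects the variable and does not validate the key itself. An unset variable is not necessarily an error, since ADC can also find credentials in other places depending on the environment.

```python
import os


def credentials_file_status(environ=None):
    """Describe how GOOGLE_APPLICATION_CREDENTIALS is currently set."""
    environ = os.environ if environ is None else environ
    path = environ.get('GOOGLE_APPLICATION_CREDENTIALS')
    if not path:
        return 'unset'
    if not os.path.isfile(path):
        return 'missing file'
    return 'ok'


print(credentials_file_status())
```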

We then call discovery.build() to construct a client for version v1 of the Vision API. The resulting service object exposes the images().annotate() method that we use to send the request.

Constructing and sending the request

        service_request = service.images().annotate(body={
            'requests': [{
                'image': {
                    'content': image_content.decode('UTF-8')
                },
                'features': [{
                    'type': 'CROP_HINTS'
                }],
                'imageContext': {
                    'cropHintsParams': {
                        'aspectRatios': [
                            1.77
                        ]
                    }
                }
            }]
        })
        response = service_request.execute()

Now that our Vision API service is ready, we can construct a request to the service. Requests to the Google Cloud Vision API are provided as JSON objects. See the Vision API Reference for complete information on the structure of a request.

This code snippet performs the following tasks:

  1. Constructs the JSON for a POST request to the images().annotate() method.
  2. Injects the local image file, base64-encoded, into the request.
  3. Indicates that our annotate method should perform CROP_HINTS, with a requested width-to-height aspect ratio of 1.77.
  4. Sends the request and stores the API result in response.
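
The request body can also be built separately from the call that sends it, which makes it easy to inspect. The sketch below uses a hypothetical build_crop_hints_body() helper and placeholder bytes in place of a real image. Because aspectRatios is a list, we assume here that more than one ratio can be requested at once.

```python
import base64
import json


def build_crop_hints_body(raw_bytes, aspect_ratios=(1.77,)):
    """Return the JSON body for an images().annotate() CROP_HINTS call."""
    return {
        'requests': [{
            'image': {
                'content': base64.b64encode(raw_bytes).decode('UTF-8')
            },
            'features': [{'type': 'CROP_HINTS'}],
            'imageContext': {
                'cropHintsParams': {'aspectRatios': list(aspect_ratios)}
            },
        }]
    }


# Placeholder bytes; a real call would pass the contents of the image file.
body = build_crop_hints_body(b'fake image bytes',
                             aspect_ratios=[1.0, 16 / 9.0])
print(json.dumps(body, indent=2))
```

The resulting dictionary has the same shape as the body passed to service.images().annotate() in the tutorial code.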

Printing the response and displaying the CropHints bounding box

        print json.dumps(response)

        drawHint(image_file, response['responses'][0]
                 ['cropHintsAnnotation']['cropHints'][0]
                 ['boundingPoly']['vertices'])

Once the operation has completed successfully, the API response contains the bounding box coordinates of one or more cropHints. The drawHint function draws lines around the bounding box of the first crop hint, then displays the image.
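
The chained lookups above assume that the request succeeded. The following sketch shows a slightly more defensive way to pull out the box; first_crop_box() is a hypothetical helper, and the check for an error entry assumes that a failed request reports its status under an error key in the per-image response. Coordinates equal to zero are omitted from the JSON, so missing x or y values default to 0.

```python
def first_crop_box(response):
    """Return (left, top, right, bottom) for the first crop hint."""
    result = response['responses'][0]
    if 'error' in result:
        raise RuntimeError(result['error'].get('message', 'unknown error'))
    vertices = (result['cropHintsAnnotation']['cropHints'][0]
                ['boundingPoly']['vertices'])
    # Zero-valued coordinates are left out of the JSON, so default to 0.
    xs = [v.get('x', 0) for v in vertices]
    ys = [v.get('y', 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


sample = {'responses': [{'cropHintsAnnotation': {'cropHints': [{
    'boundingPoly': {'vertices': [
        {'y': 336}, {'x': 1100, 'y': 336},
        {'x': 1100, 'y': 967}, {'y': 967}]},
    'confidence': 0.79999995,
    'importanceFraction': 0.69}]}}]}

print(first_crop_box(sample))
```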

Running the application

To run the application, download this cat.jpg file (you may need to right-click the link), then pass the location of the downloaded file on your local machine to the tutorial application (crop_hints.py).

Here is the Python command, followed by console output, which displays the JSON cropHintsAnnotation response. This response includes the coordinates of the cropHints bounding box. We requested a crop area with a 1.77 width-to-height aspect ratio, and the returned crop rectangle has its top-left corner at (0, 336) and its bottom-right corner at (1100, 967). Coordinates with a value of 0 are omitted from the JSON.

python crop_hints.py cat.jpg
{
 "responses": [
  {
   "cropHintsAnnotation": {
    "cropHints": [
     {
      "boundingPoly": {
       "vertices": [
        {
         "y": 336
        },
        {
         "x": 1100,
         "y": 336
        },
        {
         "x": 1100,
         "y": 967
        },
        {
         "y": 967
        }
       ]
      },
      "confidence": 0.79999995,
      "importanceFraction": 0.69
     }
    ]
   }
  }
 ]
}
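
As a quick check on this output, we can compute the aspect ratio of the returned box from its vertices. The box_aspect_ratio() helper below is hypothetical and written only for this illustration. The box is 1100 pixels wide and 631 pixels high, which gives a ratio of about 1.74, close to but not exactly the requested 1.77.

```python
def box_aspect_ratio(vertices):
    """Return width / height of the box described by the vertices."""
    # Missing coordinates mean 0, as in the response above.
    xs = [v.get('x', 0) for v in vertices]
    ys = [v.get('y', 0) for v in vertices]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return width / float(height)


vertices = [{'y': 336}, {'x': 1100, 'y': 336},
            {'x': 1100, 'y': 967}, {'y': 967}]
ratio = box_aspect_ratio(vertices)
print(round(ratio, 2))
```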

And here is the cropped image.

Congratulations! You've run the Google Cloud Vision Crop Hints API to return the optimized bounding box coordinates around the dominant object detected in the image!
