Label Detection Tutorial

Audience

This tutorial is designed to let you quickly start exploring and developing applications with the Cloud Video Intelligence API. It assumes familiarity with basic programming, though even without much programming knowledge you should be able to follow along. Having walked through this tutorial, you should be able to use the Reference documentation to create your own basic applications.

This tutorial steps through a Video Intelligence API application using Python code. The purpose here is not to explain the Python client libraries, but to explain how to make calls to the Video Intelligence API. Applications in Java and Node.js follow essentially the same pattern.

If you're looking for a code-only example or an example in another language, check out the companion how-to guide.

Prerequisites

This tutorial has several prerequisites:

  • Familiarity with basic Python programming
  • A Google Cloud project with the Video Intelligence API enabled
  • A service account and environment set up for Application Default Credentials, as described in the Quickstart
  • The Cloud Video Intelligence client library for Python installed
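
If you have not yet installed the client library, it can typically be installed with pip (this assumes a standard Python environment):

# Install the Cloud Video Intelligence client library for Python.
pip install google-cloud-videointelligence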

Annotating a video using label detection

This tutorial walks you through a basic Video Intelligence API application, using a LABEL_DETECTION request. A LABEL_DETECTION request annotates a video with labels (or "tags") that are selected based on the image content. For example, a video of a train at a crossing may produce labels such as "train", "transportation", "railroad crossing", etc.

We'll show the entire code first. Note that we have removed most comments from this code in order to show you how brief it is. We'll provide more comments as we walk through the code.

import argparse

from google.cloud import videointelligence


def analyze_labels(path):
    """ Detects labels given a GCS path. """
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [videointelligence.enums.Feature.LABEL_DETECTION]
    operation = video_client.annotate_video(path, features=features)
    print('\nProcessing video for label annotations:')

    result = operation.result(timeout=90)
    print('\nFinished processing.')

    segment_labels = result.annotation_results[0].segment_label_annotations
    for segment_label in segment_labels:
        print('Video label description: {}'.format(
            segment_label.entity.description))
        for category_entity in segment_label.category_entities:
            print('\tLabel category description: {}'.format(
                category_entity.description))

        for i, segment in enumerate(segment_label.segments):
            start_time = (segment.segment.start_time_offset.seconds +
                          segment.segment.start_time_offset.nanos / 1e9)
            end_time = (segment.segment.end_time_offset.seconds +
                        segment.segment.end_time_offset.nanos / 1e9)
            positions = '{}s to {}s'.format(start_time, end_time)
            confidence = segment.confidence
            print('\tSegment {}: {}'.format(i, positions))
            print('\tConfidence: {}'.format(confidence))
        print('\n')


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('path', help='GCS file path for label detection.')
    args = parser.parse_args()

    analyze_labels(args.path)

This simple application performs the following tasks:

  • Imports the libraries necessary to run the application
  • Takes the Google Cloud Storage URI of a video file as an argument and passes it to the analyze_labels() function
  • Gets credentials to run the Video Intelligence API service
  • Creates a video annotation request to send to the video service
  • Sends the request and returns a long-running operation
  • Waits on the long-running operation until the video is processed and results are available
  • Parses the response from the service and displays it to the user

We'll go over these steps in more detail below.

Importing libraries

import argparse

from google.cloud import videointelligence

We import the standard argparse library so that the application can accept the input filename as a command-line argument.

To use the Cloud Video Intelligence API, we also import the videointelligence module from the google.cloud package. Its enums class holds the enumeration of the API features we can request, such as LABEL_DETECTION.

Running the application

parser = argparse.ArgumentParser(
    description=__doc__,
    formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument('path', help='GCS file path for label detection.')
args = parser.parse_args()

analyze_labels(args.path)

Here, we simply parse the passed argument for the Google Cloud Storage URI of the video and pass it to the analyze_labels() function.

Authenticating to the API

Before communicating with the Video Intelligence API service, you need to authenticate your service using previously acquired credentials. Within an application, the simplest way to obtain credentials is to use Application Default Credentials (ADC). By default, ADC will attempt to obtain credentials from the GOOGLE_APPLICATION_CREDENTIALS environment variable, which should be set to point to your service account's JSON key file. (You should have set up your service account and environment to use ADC in the Quickstart.)
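
For example, on Linux or macOS you might point ADC at your key file like this (the path below is a placeholder for your own key file):

# Point Application Default Credentials at your service account key file.
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json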

Constructing the request

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.enums.Feature.LABEL_DETECTION]
operation = video_client.annotate_video(path, features=features)

Now that our Video Intelligence API client is ready, we can construct a request to the service. Requests to the Cloud Video Intelligence API are provided as JSON objects; the client library builds that JSON for us (a sketch of the request body follows the list below). See the Video Intelligence API Reference for complete information on the specific structure of such a request.

This code snippet performs the following tasks:

  1. Constructs the JSON for a POST request to the service via the annotate_video() method.
  2. Injects the Google Cloud Storage location of the video filename we passed into the request.
  3. Indicates that the annotate method should perform LABEL_DETECTION.
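
For reference, the underlying JSON request body that this code builds for us looks roughly like the following sketch (using the sample video URI from this tutorial):

{
  "inputUri": "gs://cloud-ml-sandbox/video/chicago.mp4",
  "features": ["LABEL_DETECTION"]
}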

Checking the operation

result = operation.result(timeout=90)
print('\nFinished processing.')

The annotate_video() call returns a long-running operation. Rather than constructing a polling loop ourselves, we block on operation.result(), which waits (up to the 90-second timeout we specify) until the operation is done and then returns the response, which we can parse.
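
If you prefer explicit polling over blocking, the returned operation object also exposes a done() method. The following is a minimal sketch, assuming the operation object from the code above; the 15-second polling interval is an arbitrary choice:

import time

# Poll the long-running operation until it reports completion.
while not operation.done():
    print('Operation processing ...')
    time.sleep(15)  # arbitrary polling interval

result = operation.result()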

Parsing the response

segment_labels = result.annotation_results[0].segment_label_annotations
for segment_label in segment_labels:
    print('Video label description: {}'.format(
        segment_label.entity.description))
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))

    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')

Once the operation has completed, the result contains an AnnotateVideoResponse, which consists of a list of annotationResults, one for each video sent in the request. Because we sent only one video in the request, we take the segment_label_annotations of the first result and loop through all of its labels. For the purposes of this tutorial, we display only these video-level (segment) annotations. Each segment label annotation includes a description (segment_label.entity.description), a list of entity categories (category_entity.description), and the segments in which the label occurs, given as start and end time offsets from the beginning of the video, along with a confidence score. A (partial) JSON representation of the completed operation looks like this:

{
   "name":"us-west1.12089999971048628582",
   "metadata":{
      "@type":"type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress",
      "annotationProgress":[
         {
            "inputUri":"/cloud-ml-sandbox/video/chicago.mp4",
            "updateTime":"2017-01-31T01:49:52.498015Z",
            "startTime":"2017-01-31T01:49:43.056481Z"
         }
      ]
   },
   "done": true,
   "response":{
      "@type":"type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
      "annotationResults":[
         {
            "inputUri":"/cloud-ml-sandbox/video/chicago.mp4",
            "segmentLabelAnnotations": [
              {
                "entity": {
                  "entityId": "/m/01yrx",
                  "languageCode": "en-US"
                },
                "segments": [
                  {
                    "segment": {
                      "startTimeOffset": "0s",
                      "endTimeOffset": "14.833664s"
                    },
                    "confidence": 0.98509187
                  }
                ]
              },
               ...
            ]
         }
      ]
   }
}

Because we sent only one video in the request, we take the first annotation result and print the description of each of its labels.
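
The API can also return shot-level and frame-level labels, depending on the label detection configuration in the request. As a sketch, shot-level labels (when present) can be read from the same result through the shot_label_annotations field:

# Shot-level labels, read analogously to the segment-level loop above.
shot_labels = result.annotation_results[0].shot_label_annotations
for shot_label in shot_labels:
    print('Shot label description: {}'.format(
        shot_label.entity.description))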

Running your application

To run your application, simply pass it the Google Cloud Storage URI of a video:

$ python label_det.py gs://cloud-ml-sandbox/video/chicago.mp4
Processing video for label annotations:

Finished processing.

Video label description: urban area
        Label category description: city
        Segment 0: 0.0s to 38.752016s
        Confidence: 0.946980476379

Video label description: traffic
        Segment 0: 0.0s to 38.752016s
        Confidence: 0.94105899334

Video label description: vehicle
        Segment 0: 0.0s to 38.752016s
        Confidence: 0.919958174229
...
 

Congratulations! You've performed an annotation task using the Cloud Video Intelligence API!
