Track objects

Object tracking detects multiple objects in an input video. To make an object tracking request, call the annotate method and specify OBJECT_TRACKING in the features field.

An object tracking request annotates a video with labels (tags) for entities that were detected in the video or in the specified video segments. For example, a video of vehicles crossing a traffic light might produce labels such as "car", "truck", "bicycle", "tire", "traffic light", "window", and so on. Each label can include multiple bounding boxes, and each bounding box has an associated time segment with a timestamp that indicates the offset from the beginning of the video. The annotation also contains additional entity information, such as an entity ID that you can use to find more information about that entity in the Google Knowledge Graph Search API.
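
The entityId values are Knowledge Graph machine IDs (MIDs), such as /m/0k4j for "car" in the sample response below. As a minimal sketch of such a lookup (assuming the Python requests library and an API key you have created in your own project), you could query the Knowledge Graph Search API like this:

# Sketch: look up an entityId from an object-tracking annotation in the
# Knowledge Graph Search API. API_KEY is a hypothetical placeholder for
# an API key created in your project.
import requests

API_KEY = 'YOUR_API_KEY'
entity_id = '/m/0k4j'  # entityId taken from an object-tracking annotation

response = requests.get(
    'https://kgsearch.googleapis.com/v1/entities:search',
    params={'ids': entity_id, 'key': API_KEY, 'limit': 1})
for element in response.json().get('itemListElement', []):
    result = element['result']
    print(result.get('name'), '-', result.get('description'))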

Object tracking vs. label detection

Object tracking differs from label detection in that label detection provides labels without bounding boxes, whereas object tracking detects the presence of individual, boxable objects in a given video and provides a bounding box for each one.
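
For example, the following sketch (using the same older Python client library as the Python samples below, with a placeholder gs:// URI) requests both features for one video and shows where each kind of result appears:

# Sketch: request LABEL_DETECTION and OBJECT_TRACKING together and compare
# the results. The gcs_uri value is a placeholder.
from google.cloud import videointelligence_v1p2beta1 as videointelligence

gcs_uri = 'gs://your-bucket/your-video.mp4'
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [
    videointelligence.enums.Feature.LABEL_DETECTION,
    videointelligence.enums.Feature.OBJECT_TRACKING,
]
operation = video_client.annotate_video(input_uri=gcs_uri, features=features)
result = operation.result(timeout=300).annotation_results[0]

# Label detection: labels only, no bounding boxes.
for label in result.segment_label_annotations:
    print('Label: {}'.format(label.entity.description))

# Object tracking: one annotation per tracked object, each with a series
# of per-frame normalized bounding boxes.
for tracked_object in result.object_annotations:
    print('Object: {} ({} frames with boxes)'.format(
        tracked_object.entity.description, len(tracked_object.frames)))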

Request object tracking for a video in Google Cloud Storage

The following samples demonstrate object tracking for a file located in Google Cloud Storage.

Protocol

See the videos:annotate API endpoint for complete details.

To perform object tracking, make a POST request to the v1/videos:annotate endpoint.

This example uses the gcloud auth application-default print-access-token command to obtain an access token for a service account that was set up for the project using the Google Cloud Platform Cloud SDK. For instructions on installing the Cloud SDK and setting up a project with a service account, see the quickstart.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
      'input_uri': 'gs://cloud-ml-sandbox/video/chicago.mp4',
      'features': ['OBJECT_TRACKING'],
    }" "https://videointelligence.googleapis.com/v1/videos:annotate"

A successful annotation request to Video Intelligence returns a response with a single name field:

{
  "name": "us-west1.12088456132466233141"
}

This name represents a long-running operation, which can be queried using the v1.operations API.

To retrieve the operation results, replace your-operation-name in the following command with the value of name returned by your previous request:

curl -X GET -H "Content-Type: application/json" \
-H "Authorization: Bearer  $(gcloud auth application-default print-access-token)" \
"https://videointelligence.googleapis.com/v1/operations/your-operation-name"

When the operation has completed, the response returns done: true, and you should receive a list of object tracking annotations.
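
If the operation has not finished yet, you can repeat the GET request until done is true. A minimal polling sketch in Python (assuming the requests library; the operation name and access token are placeholders taken from the previous steps):

# Sketch: poll the long-running operation until it completes.
import time
import requests

operation_name = 'us-west1.12088456132466233141'  # "name" from the annotate response
access_token = 'YOUR_ACCESS_TOKEN'  # e.g. from gcloud auth application-default print-access-token

url = 'https://videointelligence.googleapis.com/v1/operations/' + operation_name
headers = {'Authorization': 'Bearer ' + access_token}

while True:
    operation = requests.get(url, headers=headers).json()
    if operation.get('done'):
        break
    time.sleep(10)  # wait before polling again

annotations = operation['response']['annotationResults'][0]['objectAnnotations']
print('Tracked objects: {}'.format(len(annotations)))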

Object tracking annotations are returned as an objectAnnotations list.

{
  "name": "us-west1.13724933311138859628",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress",
    "annotationProgress": [
      {
        "inputUri": "/cloud-ml-sandbox/video/chicago.mp4",
        "progressPercent": 100,
        "startTime": "2018-06-21T16:56:46.755199Z",
        "updateTime": "2018-06-21T16:59:17.911197Z"
      }
    ]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
    "annotationResults": [
      {
        "inputUri": "/cloud-ml-sandbox/video/chicago.mp4",
        "objectAnnotations": [
          {
            "entity": {
              "entityId": "/m/0k4j",
              "description": "car",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.2672763,
                  "top": 0.5677657,
                  "right": 0.4388713,
                  "bottom": 0.7623171
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.26920167,
                  "top": 0.5659805,
                  "right": 0.44331276,
                  "bottom": 0.76780635
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.83573246,
                  "top": 0.6645812,
                  "right": 1,
                  "bottom": 0.99865407
                },
                "timeOffset": "2.311402s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "2.311402s"
            },
            "confidence": 0.99488896
          },
        ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.010383379,
                  "right": 0.21914443,
                  "bottom": 0.5591795
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.009684974,
                  "right": 0.22915152,
                  "bottom": 0.56070584
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.008624528,
                  "right": 0.22723165,
                  "bottom": 0.56158626
                },
                "timeOffset": "0.401983s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "0.401983s"
            },
            "confidence": 0.33914912
          },
       ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.79324204,
                  "top": 0.0006896425,
                  "right": 0.99659824,
                  "bottom": 0.5324423
                },
                "timeOffset": "37.585421s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.78935236,
                  "top": 0.0011992548,
                  "right": 0.99659824,
                  "bottom": 0.5374946
                },
                "timeOffset": "37.685917s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.79404694,
                  "right": 0.99659824,
                  "bottom": 0.5280966
                },
                "timeOffset": "38.590379s"
              }
            ],
            "segment": {
              "startTimeOffset": "37.585421s",
              "endTimeOffset": "38.590379s"
            },
            "confidence": 0.3415429
          }
        ]
      }
    ]
  }
}
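
The normalizedBoundingBox coordinates are relative values in the range [0, 1] (a coordinate missing from the JSON can generally be treated as 0). To draw a box, multiply them by the frame dimensions. A minimal sketch, using the first frame from the response above and a placeholder resolution:

# Sketch: convert a normalizedBoundingBox into pixel coordinates.
frame_width, frame_height = 1280, 720  # placeholder resolution of your video

normalized_box = {
    'left': 0.2672763,
    'top': 0.5677657,
    'right': 0.4388713,
    'bottom': 0.7623171,
}

left_px = int(normalized_box['left'] * frame_width)
top_px = int(normalized_box['top'] * frame_height)
right_px = int(normalized_box['right'] * frame_width)
bottom_px = int(normalized_box['bottom'] * frame_height)

print('Pixel box: ({}, {}) to ({}, {})'.format(left_px, top_px, right_px, bottom_px))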

C#

public static object TrackObjectGcs(string gcsUri)
{
    var client = VideoIntelligenceServiceClient.Create();
    var request = new AnnotateVideoRequest
    {
        InputUri = gcsUri,
        Features = { Feature.ObjectTracking },
        // It is recommended to use location_id as 'us-east1' for the
        // best latency due to different types of processors used in
        // this region and others.
        LocationId = "us-east1"
    };

    Console.WriteLine("\nProcessing video for object annotations.");
    var op = client.AnnotateVideo(request).PollUntilCompleted();

    Console.WriteLine("\nFinished processing.\n");

    // Retrieve first result because a single video was processed.
    var objectAnnotations = op.Result.AnnotationResults[0]
                              .ObjectAnnotations;

    // Get only the first annotation for demo purposes
    var objAnnotation = objectAnnotations[0];

    Console.WriteLine(
        $"Entity description: {objAnnotation.Entity.Description}");

    if (objAnnotation.Entity.EntityId != null)
    {
        Console.WriteLine(
            $"Entity id: {objAnnotation.Entity.EntityId}");
    }

    Console.Write($"Segment: ");
    Console.WriteLine(
        String.Format("{0}s to {1}s",
                      objAnnotation.Segment.StartTimeOffset.Seconds +
                      objAnnotation.Segment.StartTimeOffset.Nanos / 1e9,
                      objAnnotation.Segment.EndTimeOffset.Seconds +
                      objAnnotation.Segment.EndTimeOffset.Nanos / 1e9));

    Console.WriteLine($"Confidence: {objAnnotation.Confidence}");

    // Here we print only the bounding box of the first frame in this segment
    var frame = objAnnotation.Frames[0];
    var box = frame.NormalizedBoundingBox;
    Console.WriteLine(
        String.Format("Time offset of the first frame: {0}s",
                      frame.TimeOffset.Seconds +
                      frame.TimeOffset.Nanos / 1e9));
    Console.WriteLine("Bounding box positions:");
    Console.WriteLine($"\tleft   : {box.Left}");
    Console.WriteLine($"\ttop    : {box.Top}");
    Console.WriteLine($"\tright  : {box.Right}");
    Console.WriteLine($"\tbottom : {box.Bottom}");

    return 0;
}

Go

import (
	"fmt"
	"io"
	"log"

	"context"

	"github.com/golang/protobuf/ptypes"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "google.golang.org/genproto/googleapis/cloud/videointelligence/v1"
)

// objectTrackingGCS analyzes a video and extracts entities with their bounding boxes.
func objectTrackingGCS(w io.Writer, gcsURI string) error {
	// gcsURI := "gs://cloud-samples-data/video/cat.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		log.Fatalf("Failed to create client: %v", err)
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputUri: gcsURI,
		Features: []videopb.Feature{
			videopb.Feature_OBJECT_TRACKING,
		},
	})
	if err != nil {
		log.Fatalf("Failed to start annotation job: %v", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		log.Fatalf("Failed to annotate: %v", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.ObjectAnnotations {
		fmt.Fprintf(w, "Description: %q\n", annotation.Entity.GetDescription())
		if len(annotation.Entity.EntityId) > 0 {
			fmt.Fprintf(w, "\tEntity ID: %q\n", annotation.Entity.GetEntityId())
		}

		segment := annotation.GetSegment()
		start, _ := ptypes.Duration(segment.GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", annotation.GetConfidence())

		// Here we print only the bounding box of the first frame in this segment.
		frame := annotation.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		box := frame.GetNormalizedBoundingBox()
		fmt.Fprintf(w, "\tBounding box position:\n")
		fmt.Fprintf(w, "\t\tleft  : %f\n", box.GetLeft())
		fmt.Fprintf(w, "\t\ttop   : %f\n", box.GetTop())
		fmt.Fprintf(w, "\t\tright : %f\n", box.GetRight())
		fmt.Fprintf(w, "\t\tbottom: %f\n", box.GetBottom())
	}

	return nil
}

Java

/**
 * Track objects in a video.
 *
 * @param gcsUri the path to the video file to analyze.
 */
public static VideoAnnotationResults trackObjectsGcs(String gcsUri) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Create the request
    AnnotateVideoRequest request = AnnotateVideoRequest.newBuilder()
        .setInputUri(gcsUri)
        .addFeatures(Feature.OBJECT_TRACKING)
        .setLocationId("us-east1")
        .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    ObjectTrackingAnnotation annotation = results.getObjectAnnotations(0);
    System.out.println("Confidence: " + annotation.getConfidence());

    if (annotation.hasEntity()) {
      Entity entity = annotation.getEntity();
      System.out.println("Entity description: " + entity.getDescription());
      System.out.println("Entity id:: " + entity.getEntityId());
    }

    if (annotation.hasSegment()) {
      VideoSegment videoSegment = annotation.getSegment();
      Duration startTimeOffset = videoSegment.getStartTimeOffset();
      Duration endTimeOffset = videoSegment.getEndTimeOffset();
      // Display the segment time in seconds, 1e9 converts nanos to seconds
      System.out.println(String.format(
          "Segment: %.2fs to %.2fs",
          startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9,
          endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));
    }

    // Here we print only the bounding box of the first frame in this segment.
    ObjectTrackingFrame frame = annotation.getFrames(0);
    // Display the offset time in seconds, 1e9 converts nanos to seconds
    Duration timeOffset = frame.getTimeOffset();
    System.out.println(String.format(
        "Time offset of the first frame: %.2fs",
        timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the bounding box of the detected object
    NormalizedBoundingBox normalizedBoundingBox = frame.getNormalizedBoundingBox();
    System.out.println("Bounding box position:");
    System.out.println("\tleft: " + normalizedBoundingBox.getLeft());
    System.out.println("\ttop: " + normalizedBoundingBox.getTop());
    System.out.println("\tright: " + normalizedBoundingBox.getRight());
    System.out.println("\tbottom: " + normalizedBoundingBox.getBottom());
    return results;
  }
}

Node.js

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence').v1p2beta1;

// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

const request = {
  inputUri: gcsUri,
  features: ['OBJECT_TRACKING'],
  //recommended to use us-east1 for the best latency due to different types of processors used in this region and others
  locationId: 'us-east1',
};
// Detects objects in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();
//Gets annotations for video
const annotations = results[0].annotationResults[0];
const objects = annotations.objectAnnotations;
objects.forEach(object => {
  console.log(`Entity description:  ${object.entity.description}`);
  console.log(`Entity id: ${object.entity.entityId}`);
  const time = object.segment;
  if (time.startTimeOffset.seconds === undefined) {
    time.startTimeOffset.seconds = 0;
  }
  if (time.startTimeOffset.nanos === undefined) {
    time.startTimeOffset.nanos = 0;
  }
  if (time.endTimeOffset.seconds === undefined) {
    time.endTimeOffset.seconds = 0;
  }
  if (time.endTimeOffset.nanos === undefined) {
    time.endTimeOffset.nanos = 0;
  }
  console.log(
    `Segment: ${time.startTimeOffset.seconds}` +
      `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s to ${
        time.endTimeOffset.seconds
      }.` +
      `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Confidence: ${object.confidence}`);
  const frame = object.frames[0];
  const box = frame.normalizedBoundingBox;
  const timeOffset = frame.timeOffset;
  if (timeOffset.seconds === undefined) {
    timeOffset.seconds = 0;
  }
  if (timeOffset.nanos === undefined) {
    timeOffset.nanos = 0;
  }
  console.log(
    `Time offset for the first frame: ${timeOffset.seconds}` +
      `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Bounding box position:`);
  console.log(`\tleft   :${box.left}`);
  console.log(`\ttop    :${box.top}`);
  console.log(`\tright  :${box.right}`);
  console.log(`\tbottom :${box.bottom}`);
});

Python

"""Object Tracking."""
from google.cloud import videointelligence_v1p2beta1 as videointelligence

# gcs_uri = 'gs://your-bucket/your-video.mp4'

# It is recommended to use location_id as 'us-east1' for the best latency
# due to different types of processors used in this region and others.
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.enums.Feature.OBJECT_TRACKING]
operation = video_client.annotate_video(
    input_uri=gcs_uri, features=features, location_id='us-east1')
print('\nProcessing video for object annotations.')

result = operation.result(timeout=300)
print('\nFinished processing.\n')

# The first result is retrieved because a single video was processed.
object_annotations = result.annotation_results[0].object_annotations

# Get only the first annotation for demo purposes.
object_annotation = object_annotations[0]
print('Entity description: {}'.format(
    object_annotation.entity.description))
if object_annotation.entity.entity_id:
    print('Entity id: {}'.format(object_annotation.entity.entity_id))

print('Segment: {}s to {}s'.format(
    object_annotation.segment.start_time_offset.seconds +
    object_annotation.segment.start_time_offset.nanos / 1e9,
    object_annotation.segment.end_time_offset.seconds +
    object_annotation.segment.end_time_offset.nanos / 1e9))

print('Confidence: {}'.format(object_annotation.confidence))

# Here we print only the bounding box of the first frame in this segment
frame = object_annotation.frames[0]
box = frame.normalized_bounding_box
print('Time offset of the first frame: {}s'.format(
    frame.time_offset.seconds + frame.time_offset.nanos / 1e9))
print('Bounding box position:')
print('\tleft  : {}'.format(box.left))
print('\ttop   : {}'.format(box.top))
print('\tright : {}'.format(box.right))
print('\tbottom: {}'.format(box.bottom))
print('\n')

Ruby

# path = "Path to a video file on Google Cloud Storage: gs://bucket/video.mp4"

require "google/cloud/video_intelligence"

video = Google::Cloud::VideoIntelligence.new

# Register a callback during the method call
operation = video.annotate_video input_uri: path, features: [:OBJECT_TRACKING] do |operation|
  raise operation.results.message? if operation.error?
  puts "Finished Processing."

  object_annotations = operation.results.annotation_results.first.object_annotations

  object_annotations.each do |object_annotation|
    puts "Entity description: #{object_annotation.entity.description}"
    puts "Entity id: #{object_annotation.entity.entity_id}" if object_annotation.entity.entity_id

    object_segment = object_annotation.segment
    start_time = (object_segment.start_time_offset.seconds +
                   object_segment.start_time_offset.nanos / 1e9)
    end_time =   (object_segment.end_time_offset.seconds +
                   object_segment.end_time_offset.nanos / 1e9)
    puts "Segment: #{start_time}s to #{end_time}s"

    puts "Confidence: #{object_annotation.confidence}"

    # Print information about the first frame of the segment.
    frame = object_annotation.frames.first
    box = frame.normalized_bounding_box

    time_offset = (frame.time_offset.seconds +
                    frame.time_offset.nanos / 1e9)
    puts "Time offset for the first frame: #{time_offset}s"

    puts "Bounding box position:"
    puts "\tleft  : #{box.left}"
    puts "\ttop   : #{box.top}"
    puts "\tright : #{box.right}"
    puts "\tbottom: #{box.bottom}\n"
  end
end

puts "Processing video for object tracking:"
operation.wait_until_done!

Request object tracking for a video from a local file

The following samples demonstrate object tracking for a file stored locally.

Protocol

See the videos:annotate API endpoint for complete details.

To perform object tracking, first base64-encode the video, then include the encoded content in the input_content field of a POST request to the v1/videos:annotate endpoint. For more information about base64 encoding, see Base64 encoding.
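
A minimal sketch of producing the input_content value in Python (the file path is a placeholder):

# Sketch: base64-encode a local video file so it can be sent in the
# input_content field of the JSON request.
import base64

with open('./my-video.mp4', 'rb') as video_file:
    input_content = base64.b64encode(video_file.read()).decode('utf-8')

# input_content now holds the string to place in the "input_content" field.
print(input_content[:60] + '...')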

This example uses the gcloud auth application-default print-access-token command to obtain an access token for a service account that was set up for the project using the Google Cloud Platform Cloud SDK. For instructions on installing the Cloud SDK and setting up a project with a service account, see the quickstart.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
      'input_content': 'UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA...',
      'features': ['OBJECT_TRACKING'],
    }" "https://videointelligence.googleapis.com/v1/videos:annotate"

A successful annotation request to Video Intelligence returns a response with a single name field:

{
  "name": "us-west1.12088456132466233141"
}

This name represents a long-running operation, which can be queried using the v1.operations API.

To retrieve the operation results, replace your-operation-name in the following command with the value of name returned by your previous request:

curl -X GET -H "Content-Type: application/json" \
-H "Authorization: Bearer  $(gcloud auth application-default print-access-token)" \
"https://videointelligence.googleapis.com/v1/operations/your-operation-name"

Object tracking annotations are returned as an objectAnnotations list.

{
  "name": "us-west1.13724933311138859628",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress",
    "annotationProgress": [
      {
        "inputUri": "/cloud-ml-sandbox/video/chicago.mp4",
        "progressPercent": 100,
        "startTime": "2018-06-21T16:56:46.755199Z",
        "updateTime": "2018-06-21T16:59:17.911197Z"
      }
    ]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
    "annotationResults": [
      {
        "inputUri": "/cloud-ml-sandbox/video/chicago.mp4",
        "objectAnnotations": [
          {
            "entity": {
              "entityId": "/m/0k4j",
              "description": "car",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.2672763,
                  "top": 0.5677657,
                  "right": 0.4388713,
                  "bottom": 0.7623171
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.26920167,
                  "top": 0.5659805,
                  "right": 0.44331276,
                  "bottom": 0.76780635
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.83573246,
                  "top": 0.6645812,
                  "right": 1,
                  "bottom": 0.99865407
                },
                "timeOffset": "2.311402s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "2.311402s"
            },
            "confidence": 0.99488896
          },
        ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.010383379,
                  "right": 0.21914443,
                  "bottom": 0.5591795
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.009684974,
                  "right": 0.22915152,
                  "bottom": 0.56070584
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.008624528,
                  "right": 0.22723165,
                  "bottom": 0.56158626
                },
                "timeOffset": "0.401983s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "0.401983s"
            },
            "confidence": 0.33914912
          },
       ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.79324204,
                  "top": 0.0006896425,
                  "right": 0.99659824,
                  "bottom": 0.5324423
                },
                "timeOffset": "37.585421s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.78935236,
                  "top": 0.0011992548,
                  "right": 0.99659824,
                  "bottom": 0.5374946
                },
                "timeOffset": "37.685917s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.79404694,
                  "right": 0.99659824,
                  "bottom": 0.5280966
                },
                "timeOffset": "38.590379s"
              }
            ],
            "segment": {
              "startTimeOffset": "37.585421s",
              "endTimeOffset": "38.590379s"
            },
            "confidence": 0.3415429
          }
        ]
      }
    ]
  }
}

C#

public static object TrackObject(string filePath)
{
    var client = VideoIntelligenceServiceClient.Create();
    var request = new AnnotateVideoRequest
    {
        InputContent = Google.Protobuf.ByteString.CopyFrom(File.ReadAllBytes(filePath)),
        Features = { Feature.ObjectTracking },
        // It is recommended to use location_id as 'us-east1' for the
        // best latency due to different types of processors used in
        // this region and others.
        LocationId = "us-east1"
    };

    Console.WriteLine("\nProcessing video for object annotations.");
    var op = client.AnnotateVideo(request).PollUntilCompleted();

    Console.WriteLine("\nFinished processing.\n");

    // Retrieve first result because a single video was processed.
    var objectAnnotations = op.Result.AnnotationResults[0]
                              .ObjectAnnotations;

    // Get only the first annotation for demo purposes
    var objAnnotation = objectAnnotations[0];

    Console.WriteLine(
        $"Entity description: {objAnnotation.Entity.Description}");

    if (objAnnotation.Entity.EntityId != null)
    {
        Console.WriteLine(
            $"Entity id: {objAnnotation.Entity.EntityId}");
    }

    Console.Write($"Segment: ");
    Console.WriteLine(
        String.Format("{0}s to {1}s",
                      objAnnotation.Segment.StartTimeOffset.Seconds +
                      objAnnotation.Segment.StartTimeOffset.Nanos / 1e9,
                      objAnnotation.Segment.EndTimeOffset.Seconds +
                      objAnnotation.Segment.EndTimeOffset.Nanos / 1e9));

    Console.WriteLine($"Confidence: {objAnnotation.Confidence}");

    // Here we print only the bounding box of the first frame in this segment
    var frame = objAnnotation.Frames[0];
    var box = frame.NormalizedBoundingBox;
    Console.WriteLine(
        String.Format("Time offset of the first frame: {0}s",
                      frame.TimeOffset.Seconds +
                      frame.TimeOffset.Nanos / 1e9));
    Console.WriteLine("Bounding box positions:");
    Console.WriteLine($"\tleft   : {box.Left}");
    Console.WriteLine($"\ttop    : {box.Top}");
    Console.WriteLine($"\tright  : {box.Right}");
    Console.WriteLine($"\tbottom : {box.Bottom}");

    return 0;
}

Go

import (
	"fmt"
	"io"
	"io/ioutil"
	"log"

	"context"

	"github.com/golang/protobuf/ptypes"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "google.golang.org/genproto/googleapis/cloud/videointelligence/v1"
)

// objectTracking analyzes a video and extracts entities with their bounding boxes.
func objectTracking(w io.Writer, filename string) error {
	// filename := "resources/cat.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		log.Fatalf("Failed to create client: %v", err)
	}

	fileBytes, err := ioutil.ReadFile(filename)
	if err != nil {
		return err
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputContent: fileBytes,
		Features: []videopb.Feature{
			videopb.Feature_OBJECT_TRACKING,
		},
	})
	if err != nil {
		log.Fatalf("Failed to start annotation job: %v", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		log.Fatalf("Failed to annotate: %v", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.ObjectAnnotations {
		fmt.Fprintf(w, "Description: %q\n", annotation.Entity.GetDescription())
		if len(annotation.Entity.EntityId) > 0 {
			fmt.Fprintf(w, "\tEntity ID: %q\n", annotation.Entity.GetEntityId())
		}

		segment := annotation.GetSegment()
		start, _ := ptypes.Duration(segment.GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", annotation.GetConfidence())

		// Here we print only the bounding box of the first frame in this segment.
		frame := annotation.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		box := frame.GetNormalizedBoundingBox()
		fmt.Fprintf(w, "\tBounding box position:\n")
		fmt.Fprintf(w, "\t\tleft  : %f\n", box.GetLeft())
		fmt.Fprintf(w, "\t\ttop   : %f\n", box.GetTop())
		fmt.Fprintf(w, "\t\tright : %f\n", box.GetRight())
		fmt.Fprintf(w, "\t\tbottom: %f\n", box.GetBottom())
	}

	return nil
}

Java

/**
 * Track objects in a video.
 *
 * @param filePath the path to the video file to analyze.
 */
public static VideoAnnotationResults trackObjects(String filePath) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Read file
    Path path = Paths.get(filePath);
    byte[] data = Files.readAllBytes(path);

    // Create the request
    AnnotateVideoRequest request = AnnotateVideoRequest.newBuilder()
        .setInputContent(ByteString.copyFrom(data))
        .addFeatures(Feature.OBJECT_TRACKING)
        .setLocationId("us-east1")
        .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    ObjectTrackingAnnotation annotation = results.getObjectAnnotations(0);
    System.out.println("Confidence: " + annotation.getConfidence());

    if (annotation.hasEntity()) {
      Entity entity = annotation.getEntity();
      System.out.println("Entity description: " + entity.getDescription());
      System.out.println("Entity id:: " + entity.getEntityId());
    }

    if (annotation.hasSegment()) {
      VideoSegment videoSegment = annotation.getSegment();
      Duration startTimeOffset = videoSegment.getStartTimeOffset();
      Duration endTimeOffset = videoSegment.getEndTimeOffset();
      // Display the segment time in seconds, 1e9 converts nanos to seconds
      System.out.println(String.format(
          "Segment: %.2fs to %.2fs",
          startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9,
          endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));
    }

    // Here we print only the bounding box of the first frame in this segment.
    ObjectTrackingFrame frame = annotation.getFrames(0);
    // Display the offset time in seconds, 1e9 converts nanos to seconds
    Duration timeOffset = frame.getTimeOffset();
    System.out.println(String.format(
        "Time offset of the first frame: %.2fs",
        timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the bounding box of the detected object
    NormalizedBoundingBox normalizedBoundingBox = frame.getNormalizedBoundingBox();
    System.out.println("Bounding box position:");
    System.out.println("\tleft: " + normalizedBoundingBox.getLeft());
    System.out.println("\ttop: " + normalizedBoundingBox.getTop());
    System.out.println("\tright: " + normalizedBoundingBox.getRight());
    System.out.println("\tbottom: " + normalizedBoundingBox.getBottom());
    return results;
  }
}

Node.js

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence').v1p2beta1;
const fs = require('fs');
const util = require('util');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();
/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const path = 'Local file to analyze, e.g. ./my-file.mp4';

// Reads a local video file and converts it to base64
const file = await util.promisify(fs.readFile)(path);
const inputContent = file.toString('base64');

const request = {
  inputContent: inputContent,
  features: ['OBJECT_TRACKING'],
  //recommended to use us-east1 for the best latency due to different types of processors used in this region and others
  locationId: 'us-east1',
};
// Detects objects in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();
//Gets annotations for video
const annotations = results[0].annotationResults[0];
const objects = annotations.objectAnnotations;
objects.forEach(object => {
  console.log(`Entity description:  ${object.entity.description}`);
  console.log(`Entity id: ${object.entity.entityId}`);
  const time = object.segment;
  if (time.startTimeOffset.seconds === undefined) {
    time.startTimeOffset.seconds = 0;
  }
  if (time.startTimeOffset.nanos === undefined) {
    time.startTimeOffset.nanos = 0;
  }
  if (time.endTimeOffset.seconds === undefined) {
    time.endTimeOffset.seconds = 0;
  }
  if (time.endTimeOffset.nanos === undefined) {
    time.endTimeOffset.nanos = 0;
  }
  console.log(
    `Segment: ${time.startTimeOffset.seconds}` +
      `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s to ${
        time.endTimeOffset.seconds
      }.` +
      `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Confidence: ${object.confidence}`);
  const frame = object.frames[0];
  const box = frame.normalizedBoundingBox;
  const timeOffset = frame.timeOffset;
  if (timeOffset.seconds === undefined) {
    timeOffset.seconds = 0;
  }
  if (timeOffset.nanos === undefined) {
    timeOffset.nanos = 0;
  }
  console.log(
    `Time offset for the first frame: ${timeOffset.seconds}` +
      `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Bounding box position:`);
  console.log(`\tleft   :${box.left}`);
  console.log(`\ttop    :${box.top}`);
  console.log(`\tright  :${box.right}`);
  console.log(`\tbottom :${box.bottom}`);
});

Python

"""Object Tracking."""
from google.cloud import videointelligence_v1p2beta1 as videointelligence

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.enums.Feature.OBJECT_TRACKING]

with io.open(path, 'rb') as file:
    input_content = file.read()

# It is recommended to use location_id as 'us-east1' for the best latency
# due to different types of processors used in this region and others.
operation = video_client.annotate_video(
    input_content=input_content, features=features, location_id='us-east1')
print('\nProcessing video for object annotations.')

result = operation.result(timeout=300)
print('\nFinished processing.\n')

# The first result is retrieved because a single video was processed.
object_annotations = result.annotation_results[0].object_annotations

# Get only the first annotation for demo purposes.
object_annotation = object_annotations[0]
print('Entity description: {}'.format(
    object_annotation.entity.description))
if object_annotation.entity.entity_id:
    print('Entity id: {}'.format(object_annotation.entity.entity_id))

print('Segment: {}s to {}s'.format(
    object_annotation.segment.start_time_offset.seconds +
    object_annotation.segment.start_time_offset.nanos / 1e9,
    object_annotation.segment.end_time_offset.seconds +
    object_annotation.segment.end_time_offset.nanos / 1e9))

print('Confidence: {}'.format(object_annotation.confidence))

# Here we print only the bounding box of the first frame in this segment
frame = object_annotation.frames[0]
box = frame.normalized_bounding_box
print('Time offset of the first frame: {}s'.format(
    frame.time_offset.seconds + frame.time_offset.nanos / 1e9))
print('Bounding box position:')
print('\tleft  : {}'.format(box.left))
print('\ttop   : {}'.format(box.top))
print('\tright : {}'.format(box.right))
print('\tbottom: {}'.format(box.bottom))
print('\n')

Ruby

# "Path to a local video file: path/to/file.mp4"

require "google/cloud/video_intelligence"

video = Google::Cloud::VideoIntelligence.new

video_contents = File.binread path

# Register a callback during the method call
operation = video.annotate_video input_content: video_contents, features: [:OBJECT_TRACKING] do |operation|
  raise operation.results.message? if operation.error?
  puts "Finished Processing."

  object_annotations = operation.results.annotation_results.first.object_annotations

  object_annotations.each do |object_annotation|
    puts "Entity description: #{object_annotation.entity.description}"
    puts "Entity id: #{object_annotation.entity.entity_id}" if object_annotation.entity.entity_id

    object_segment = object_annotation.segment
    start_time = (object_segment.start_time_offset.seconds +
                   object_segment.start_time_offset.nanos / 1e9)
    end_time =   (object_segment.end_time_offset.seconds +
                   object_segment.end_time_offset.nanos / 1e9)
    puts "Segment: #{start_time}s to #{end_time}s"

    puts "Confidence: #{object_annotation.confidence}"

    # Print information about the first frame of the segment.
    frame = object_annotation.frames.first
    box = frame.normalized_bounding_box

    time_offset = (frame.time_offset.seconds +
                    frame.time_offset.nanos / 1e9)
    puts "Time offset for the first frame: #{time_offset}s"

    puts "Bounding box position:"
    puts "\tleft  : #{box.left}"
    puts "\ttop   : #{box.top}"
    puts "\tright : #{box.right}"
    puts "\tbottom: #{box.bottom}\n"
  end
end

puts "Processing video for object tracking:"
operation.wait_until_done!