Object tracking

Object tracking tracks multiple objects detected in an input video. To make an object tracking request, call the annotate method and specify OBJECT_TRACKING in the features field.

An object tracking request annotates a video with labels (tags) for the entities detected in the video or video segments you provide. For example, a video of vehicles passing through a traffic signal might produce labels such as "car", "truck", "bike", "tires", "lights", and "window". Each label has a series of bounding boxes, and each bounding box has an associated time segment containing a time offset (timestamp) that indicates the offset from the beginning of the video. The annotation also contains additional entity information; for example, the entity ID it contains can be used to look up more details about the entity in the Google Knowledge Graph Search API.
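For example, an entity ID such as /m/0k4j ("car" in the sample response below) can be looked up with the Knowledge Graph Search API. The following is a minimal sketch using curl; it assumes you have an API key (shown here as the placeholder API_KEY) with the Knowledge Graph Search API enabled:

curl "https://kgsearch.googleapis.com/v1/entities:search?ids=/m/0k4j&key=API_KEY&limit=1"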

Object tracking versus label detection

Object tracking differs from label detection. Label detection provides labels without bounding boxes, whereas object tracking detects the individual objects in a given video that can be boxed separately and returns a bounding box for each. A combined request is sketched below.
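Because both are features of the same annotate method, a single request can ask for both kinds of annotation if needed. The following is a sketch modeled on the curl example shown later on this page; the input URI is the same sample video used there:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
      'input_uri': 'gs://cloud-ml-sandbox/video/chicago.mp4',
      'features': ['LABEL_DETECTION', 'OBJECT_TRACKING'],
    }" "https://videointelligence.googleapis.com/v1/videos:annotate"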

Requesting object tracking for a video on Google Cloud Storage

The following samples demonstrate object tracking on a file located in Google Cloud Storage.

Protocol

For complete details, see the videos:annotate API endpoint.

To perform object tracking, make a POST request to the v1/videos:annotate endpoint.

This example uses the gcloud auth application-default print-access-token command to obtain an access token for a service account set up for the project using the Google Cloud Platform Cloud SDK. For instructions on installing the Cloud SDK and setting up a project with a service account, see the Quickstart.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
      'input_uri': 'gs://cloud-ml-sandbox/video/chicago.mp4',
      'features': ['OBJECT_TRACKING'],
    }" "https://videointelligence.googleapis.com/v1/videos:annotate"

If the Video Intelligence annotation request is successful, a response containing a name field is returned, as shown below:

{
  "name": "us-west1.12088456132466233141"
}

This name represents a long-running operation, which you can query by using the v1.operations API.

To retrieve the result of the operation, replace your-operation-name in the following command with the value of the name field from the previous result:

curl -X GET -H "Content-Type: application/json" \
-H "Authorization: Bearer  $(gcloud auth application-default print-access-token)" \
"https://videointelligence.googleapis.com/v1/operations/your-operation-name"

When the operation has finished, the response reports done: true and returns a list of object tracking annotations.

Object tracking annotations are returned as an objectAnnotations list.

{
  "name": "us-west1.13724933311138859628",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress",
    "annotationProgress": [
      {
        "inputUri": "/cloud-ml-sandbox/video/chicago.mp4",
        "progressPercent": 100,
        "startTime": "2018-06-21T16:56:46.755199Z",
        "updateTime": "2018-06-21T16:59:17.911197Z"
      }
    ]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
    "annotationResults": [
      {
        "inputUri": "/cloud-ml-sandbox/video/chicago.mp4",
        "objectAnnotations": [
          {
            "entity": {
              "entityId": "/m/0k4j",
              "description": "car",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.2672763,
                  "top": 0.5677657,
                  "right": 0.4388713,
                  "bottom": 0.7623171
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.26920167,
                  "top": 0.5659805,
                  "right": 0.44331276,
                  "bottom": 0.76780635
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.83573246,
                  "top": 0.6645812,
                  "right": 1,
                  "bottom": 0.99865407
                },
                "timeOffset": "2.311402s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "2.311402s"
            },
            "confidence": 0.99488896
          },
        ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.010383379,
                  "right": 0.21914443,
                  "bottom": 0.5591795
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.009684974,
                  "right": 0.22915152,
                  "bottom": 0.56070584
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.008624528,
                  "right": 0.22723165,
                  "bottom": 0.56158626
                },
                "timeOffset": "0.401983s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "0.401983s"
            },
            "confidence": 0.33914912
          },
       ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.79324204,
                  "top": 0.0006896425,
                  "right": 0.99659824,
                  "bottom": 0.5324423
                },
                "timeOffset": "37.585421s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.78935236,
                  "top": 0.0011992548,
                  "right": 0.99659824,
                  "bottom": 0.5374946
                },
                "timeOffset": "37.685917s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.79404694,
                  "right": 0.99659824,
                  "bottom": 0.5280966
                },
                "timeOffset": "38.590379s"
              }
            ],
            "segment": {
              "startTimeOffset": "37.585421s",
              "endTimeOffset": "38.590379s"
            },
            "confidence": 0.3415429
          }
        ]
      }
    ]
  }
}

C#

public static object TrackObjectGcs(string gcsUri)
{
    var client = VideoIntelligenceServiceClient.Create();
    var request = new AnnotateVideoRequest
    {
        InputUri = gcsUri,
        Features = { Feature.ObjectTracking },
        // It is recommended to use location_id as 'us-east1' for the
        // best latency due to different types of processors used in
        // this region and others.
        LocationId = "us-east1"
    };

    Console.WriteLine("\nProcessing video for object annotations.");
    var op = client.AnnotateVideo(request).PollUntilCompleted();

    Console.WriteLine("\nFinished processing.\n");

    // Retrieve first result because a single video was processed.
    var objectAnnotations = op.Result.AnnotationResults[0]
                              .ObjectAnnotations;

    // Get only the first annotation for demo purposes
    var objAnnotation = objectAnnotations[0];

    Console.WriteLine(
        $"Entity description: {objAnnotation.Entity.Description}");

    if (objAnnotation.Entity.EntityId != null)
    {
        Console.WriteLine(
            $"Entity id: {objAnnotation.Entity.EntityId}");
    }

    Console.Write($"Segment: ");
    Console.WriteLine(
        String.Format("{0}s to {1}s",
                      objAnnotation.Segment.StartTimeOffset.Seconds +
                      objAnnotation.Segment.StartTimeOffset.Nanos / 1e9,
                      objAnnotation.Segment.EndTimeOffset.Seconds +
                      objAnnotation.Segment.EndTimeOffset.Nanos / 1e9));

    Console.WriteLine($"Confidence: {objAnnotation.Confidence}");

    // Here we print only the bounding box of the first frame in this segment
    var frame = objAnnotation.Frames[0];
    var box = frame.NormalizedBoundingBox;
    Console.WriteLine(
        String.Format("Time offset of the first frame: {0}s",
                      frame.TimeOffset.Seconds +
                      frame.TimeOffset.Nanos / 1e9));
    Console.WriteLine("Bounding box positions:");
    Console.WriteLine($"\tleft   : {box.Left}");
    Console.WriteLine($"\ttop    : {box.Top}");
    Console.WriteLine($"\tright  : {box.Right}");
    Console.WriteLine($"\tbottom : {box.Bottom}");

    return 0;
}

Go

import (
	"fmt"
	"io"
	"log"

	"context"

	"github.com/golang/protobuf/ptypes"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "google.golang.org/genproto/googleapis/cloud/videointelligence/v1"
)

// objectTrackingGCS analyzes a video and extracts entities with their bounding boxes.
func objectTrackingGCS(w io.Writer, gcsURI string) error {
	// gcsURI := "gs://cloud-samples-data/video/cat.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		log.Fatalf("Failed to create client: %v", err)
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputUri: gcsURI,
		Features: []videopb.Feature{
			videopb.Feature_OBJECT_TRACKING,
		},
	})
	if err != nil {
		log.Fatalf("Failed to start annotation job: %v", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		log.Fatalf("Failed to annotate: %v", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.ObjectAnnotations {
		fmt.Fprintf(w, "Description: %q\n", annotation.Entity.GetDescription())
		if len(annotation.Entity.EntityId) > 0 {
			fmt.Fprintf(w, "\tEntity ID: %q\n", annotation.Entity.GetEntityId())
		}

		segment := annotation.GetSegment()
		start, _ := ptypes.Duration(segment.GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", annotation.GetConfidence())

		// Here we print only the bounding box of the first frame in this segment.
		frame := annotation.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		box := frame.GetNormalizedBoundingBox()
		fmt.Fprintf(w, "\tBounding box position:\n")
		fmt.Fprintf(w, "\t\tleft  : %f\n", box.GetLeft())
		fmt.Fprintf(w, "\t\ttop   : %f\n", box.GetTop())
		fmt.Fprintf(w, "\t\tright : %f\n", box.GetRight())
		fmt.Fprintf(w, "\t\tbottom: %f\n", box.GetBottom())
	}

	return nil
}

Java

/**
 * Track objects in a video.
 *
 * @param gcsUri the path to the video file to analyze.
 */
public static VideoAnnotationResults trackObjectsGcs(String gcsUri) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Create the request
    AnnotateVideoRequest request = AnnotateVideoRequest.newBuilder()
        .setInputUri(gcsUri)
        .addFeatures(Feature.OBJECT_TRACKING)
        .setLocationId("us-east1")
        .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    ObjectTrackingAnnotation annotation = results.getObjectAnnotations(0);
    System.out.println("Confidence: " + annotation.getConfidence());

    if (annotation.hasEntity()) {
      Entity entity = annotation.getEntity();
      System.out.println("Entity description: " + entity.getDescription());
      System.out.println("Entity id:: " + entity.getEntityId());
    }

    if (annotation.hasSegment()) {
      VideoSegment videoSegment = annotation.getSegment();
      Duration startTimeOffset = videoSegment.getStartTimeOffset();
      Duration endTimeOffset = videoSegment.getEndTimeOffset();
      // Display the segment time in seconds, 1e9 converts nanos to seconds
      System.out.println(String.format(
          "Segment: %.2fs to %.2fs",
          startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9,
          endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));
    }

    // Here we print only the bounding box of the first frame in this segment.
    ObjectTrackingFrame frame = annotation.getFrames(0);
    // Display the offset time in seconds, 1e9 converts nanos to seconds
    Duration timeOffset = frame.getTimeOffset();
    System.out.println(String.format(
        "Time offset of the first frame: %.2fs",
        timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the bounding box of the detected object
    NormalizedBoundingBox normalizedBoundingBox = frame.getNormalizedBoundingBox();
    System.out.println("Bounding box position:");
    System.out.println("\tleft: " + normalizedBoundingBox.getLeft());
    System.out.println("\ttop: " + normalizedBoundingBox.getTop());
    System.out.println("\tright: " + normalizedBoundingBox.getRight());
    System.out.println("\tbottom: " + normalizedBoundingBox.getBottom());
    return results;
  }
}

Node.js

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence').v1p2beta1;

// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

const request = {
  inputUri: gcsUri,
  features: ['OBJECT_TRACKING'],
  //recommended to use us-east1 for the best latency due to different types of processors used in this region and others
  locationId: 'us-east1',
};
// Detects objects in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();
//Gets annotations for video
const annotations = results[0].annotationResults[0];
const objects = annotations.objectAnnotations;
objects.forEach(object => {
  console.log(`Entity description:  ${object.entity.description}`);
  console.log(`Entity id: ${object.entity.entityId}`);
  const time = object.segment;
  if (time.startTimeOffset.seconds === undefined) {
    time.startTimeOffset.seconds = 0;
  }
  if (time.startTimeOffset.nanos === undefined) {
    time.startTimeOffset.nanos = 0;
  }
  if (time.endTimeOffset.seconds === undefined) {
    time.endTimeOffset.seconds = 0;
  }
  if (time.endTimeOffset.nanos === undefined) {
    time.endTimeOffset.nanos = 0;
  }
  console.log(
    `Segment: ${time.startTimeOffset.seconds}` +
      `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s to ${
        time.endTimeOffset.seconds
      }.` +
      `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Confidence: ${object.confidence}`);
  const frame = object.frames[0];
  const box = frame.normalizedBoundingBox;
  const timeOffset = frame.timeOffset;
  if (timeOffset.seconds === undefined) {
    timeOffset.seconds = 0;
  }
  if (timeOffset.nanos === undefined) {
    timeOffset.nanos = 0;
  }
  console.log(
    `Time offset for the first frame: ${timeOffset.seconds}` +
      `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Bounding box position:`);
  console.log(`\tleft   :${box.left}`);
  console.log(`\ttop    :${box.top}`);
  console.log(`\tright  :${box.right}`);
  console.log(`\tbottom :${box.bottom}`);
});

Python

"""Object Tracking."""
from google.cloud import videointelligence_v1p2beta1 as videointelligence

# gcs_uri = 'gs://cloud-samples-data/video/cat.mp4'

# It is recommended to use location_id as 'us-east1' for the best latency
# due to different types of processors used in this region and others.
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.enums.Feature.OBJECT_TRACKING]
operation = video_client.annotate_video(
    input_uri=gcs_uri, features=features, location_id='us-east1')
print('\nProcessing video for object annotations.')

result = operation.result(timeout=300)
print('\nFinished processing.\n')

# The first result is retrieved because a single video was processed.
object_annotations = result.annotation_results[0].object_annotations

# Get only the first annotation for demo purposes.
object_annotation = object_annotations[0]
print('Entity description: {}'.format(
    object_annotation.entity.description))
if object_annotation.entity.entity_id:
    print('Entity id: {}'.format(object_annotation.entity.entity_id))

print('Segment: {}s to {}s'.format(
    object_annotation.segment.start_time_offset.seconds +
    object_annotation.segment.start_time_offset.nanos / 1e9,
    object_annotation.segment.end_time_offset.seconds +
    object_annotation.segment.end_time_offset.nanos / 1e9))

print('Confidence: {}'.format(object_annotation.confidence))

# Here we print only the bounding box of the first frame in this segment
frame = object_annotation.frames[0]
box = frame.normalized_bounding_box
print('Time offset of the first frame: {}s'.format(
    frame.time_offset.seconds + frame.time_offset.nanos / 1e9))
print('Bounding box position:')
print('\tleft  : {}'.format(box.left))
print('\ttop   : {}'.format(box.top))
print('\tright : {}'.format(box.right))
print('\tbottom: {}'.format(box.bottom))
print('\n')

Ruby

# path = "Path to a video file on Google Cloud Storage: gs://bucket/video.mp4"

require "google/cloud/video_intelligence"

video = Google::Cloud::VideoIntelligence.new

# Register a callback during the method call
operation = video.annotate_video input_uri: path, features: [:OBJECT_TRACKING] do |operation|
  raise operation.error.message if operation.error?
  puts "Finished Processing."

  object_annotations = operation.results.annotation_results.first.object_annotations

  object_annotations.each do |object_annotation|
    puts "Entity description: #{object_annotation.entity.description}"
    puts "Entity id: #{object_annotation.entity.entity_id}" if object_annotation.entity.entity_id

    object_segment = object_annotation.segment
    start_time = (object_segment.start_time_offset.seconds +
                   object_segment.start_time_offset.nanos / 1e9)
    end_time =   (object_segment.end_time_offset.seconds +
                   object_segment.end_time_offset.nanos / 1e9)
    puts "Segment: #{start_time}s to #{end_time}s"

    puts "Confidence: #{object_annotation.confidence}"

    # Print information about the first frame of the segment.
    frame = object_annotation.frames.first
    box = frame.normalized_bounding_box

    time_offset = (frame.time_offset.seconds +
                    frame.time_offset.nanos / 1e9)
    puts "Time offset for the first frame: #{time_offset}s"

    puts "Bounding box position:"
    puts "\tleft  : #{box.left}"
    puts "\ttop   : #{box.top}"
    puts "\tright : #{box.right}"
    puts "\tbottom: #{box.bottom}\n"
  end
end

puts "Processing video for object tracking:"
operation.wait_until_done!

Requesting object tracking for a video from a local file

The following samples demonstrate object tracking on a file stored locally.

Protocol

For complete details, see the videos:annotate API endpoint.

To perform object tracking, base64-encode the video and include the encoded output in the input_content field of a POST request to the v1/videos:annotate endpoint. For more information on base64 encoding, see Base64 Encoding.
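For example, on a Linux system you could produce the encoded string with the base64 command, as sketched below; flags vary between platforms, and the encoded output can be very large for long videos:

base64 -w 0 input.mp4 > input_encoded.txt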

This example uses the gcloud auth application-default print-access-token command to obtain an access token for a service account set up for the project using the Google Cloud Platform Cloud SDK. For instructions on installing the Cloud SDK and setting up a project with a service account, see the Quickstart.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
      'input_content': 'UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA...',
      'features': ['OBJECT_TRACKING'],
    }" "https://videointelligence.googleapis.com/v1/videos:annotate"

If the Video Intelligence annotation request is successful, a response containing a name field is returned, as shown below:

{
  "name": "us-west1.12088456132466233141"
}

This name represents a long-running operation, which you can query by using the v1.operations API.

To retrieve the result of the operation, replace your-operation-name in the following command with the value of the name field from the previous result:

curl -X GET -H "Content-Type: application/json" \
-H "Authorization: Bearer  $(gcloud auth application-default print-access-token)" \
"https://videointelligence.googleapis.com/v1/operations/your-operation-name"

Object tracking annotations are returned as an objectAnnotations list.

{
  "name": "us-west1.13724933311138859628",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress",
    "annotationProgress": [
      {
        "inputUri": "/cloud-ml-sandbox/video/chicago.mp4",
        "progressPercent": 100,
        "startTime": "2018-06-21T16:56:46.755199Z",
        "updateTime": "2018-06-21T16:59:17.911197Z"
      }
    ]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
    "annotationResults": [
      {
        "inputUri": "/cloud-ml-sandbox/video/chicago.mp4",
        "objectAnnotations": [
          {
            "entity": {
              "entityId": "/m/0k4j",
              "description": "car",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.2672763,
                  "top": 0.5677657,
                  "right": 0.4388713,
                  "bottom": 0.7623171
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.26920167,
                  "top": 0.5659805,
                  "right": 0.44331276,
                  "bottom": 0.76780635
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.83573246,
                  "top": 0.6645812,
                  "right": 1,
                  "bottom": 0.99865407
                },
                "timeOffset": "2.311402s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "2.311402s"
            },
            "confidence": 0.99488896
          },
        ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.010383379,
                  "right": 0.21914443,
                  "bottom": 0.5591795
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.009684974,
                  "right": 0.22915152,
                  "bottom": 0.56070584
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.008624528,
                  "right": 0.22723165,
                  "bottom": 0.56158626
                },
                "timeOffset": "0.401983s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "0.401983s"
            },
            "confidence": 0.33914912
          },
       ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.79324204,
                  "top": 0.0006896425,
                  "right": 0.99659824,
                  "bottom": 0.5324423
                },
                "timeOffset": "37.585421s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.78935236,
                  "top": 0.0011992548,
                  "right": 0.99659824,
                  "bottom": 0.5374946
                },
                "timeOffset": "37.685917s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.79404694,
                  "right": 0.99659824,
                  "bottom": 0.5280966
                },
                "timeOffset": "38.590379s"
              }
            ],
            "segment": {
              "startTimeOffset": "37.585421s",
              "endTimeOffset": "38.590379s"
            },
            "confidence": 0.3415429
          }
        ]
      }
    ]
  }
}

C#

public static object TrackObject(string filePath)
{
    var client = VideoIntelligenceServiceClient.Create();
    var request = new AnnotateVideoRequest
    {
        InputContent = Google.Protobuf.ByteString.CopyFrom(File.ReadAllBytes(filePath)),
        Features = { Feature.ObjectTracking },
        // It is recommended to use location_id as 'us-east1' for the
        // best latency due to different types of processors used in
        // this region and others.
        LocationId = "us-east1"
    };

    Console.WriteLine("\nProcessing video for object annotations.");
    var op = client.AnnotateVideo(request).PollUntilCompleted();

    Console.WriteLine("\nFinished processing.\n");

    // Retrieve first result because a single video was processed.
    var objectAnnotations = op.Result.AnnotationResults[0]
                              .ObjectAnnotations;

    // Get only the first annotation for demo purposes
    var objAnnotation = objectAnnotations[0];

    Console.WriteLine(
        $"Entity description: {objAnnotation.Entity.Description}");

    if (objAnnotation.Entity.EntityId != null)
    {
        Console.WriteLine(
            $"Entity id: {objAnnotation.Entity.EntityId}");
    }

    Console.Write($"Segment: ");
    Console.WriteLine(
        String.Format("{0}s to {1}s",
                      objAnnotation.Segment.StartTimeOffset.Seconds +
                      objAnnotation.Segment.StartTimeOffset.Nanos / 1e9,
                      objAnnotation.Segment.EndTimeOffset.Seconds +
                      objAnnotation.Segment.EndTimeOffset.Nanos / 1e9));

    Console.WriteLine($"Confidence: {objAnnotation.Confidence}");

    // Here we print only the bounding box of the first frame in this segment
    var frame = objAnnotation.Frames[0];
    var box = frame.NormalizedBoundingBox;
    Console.WriteLine(
        String.Format("Time offset of the first frame: {0}s",
                      frame.TimeOffset.Seconds +
                      frame.TimeOffset.Nanos / 1e9));
    Console.WriteLine("Bounding box positions:");
    Console.WriteLine($"\tleft   : {box.Left}");
    Console.WriteLine($"\ttop    : {box.Top}");
    Console.WriteLine($"\tright  : {box.Right}");
    Console.WriteLine($"\tbottom : {box.Bottom}");

    return 0;
}

Go

import (
	"fmt"
	"io"
	"io/ioutil"
	"log"

	"context"

	"github.com/golang/protobuf/ptypes"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "google.golang.org/genproto/googleapis/cloud/videointelligence/v1"
)

// objectTracking analyzes a video and extracts entities with their bounding boxes.
func objectTracking(w io.Writer, filename string) error {
	// filename := "resources/cat.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		log.Fatalf("Failed to create client: %v", err)
	}

	fileBytes, err := ioutil.ReadFile(filename)
	if err != nil {
		return err
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputContent: fileBytes,
		Features: []videopb.Feature{
			videopb.Feature_OBJECT_TRACKING,
		},
	})
	if err != nil {
		log.Fatalf("Failed to start annotation job: %v", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		log.Fatalf("Failed to annotate: %v", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.ObjectAnnotations {
		fmt.Fprintf(w, "Description: %q\n", annotation.Entity.GetDescription())
		if len(annotation.Entity.EntityId) > 0 {
			fmt.Fprintf(w, "\tEntity ID: %q\n", annotation.Entity.GetEntityId())
		}

		segment := annotation.GetSegment()
		start, _ := ptypes.Duration(segment.GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", annotation.GetConfidence())

		// Here we print only the bounding box of the first frame in this segment.
		frame := annotation.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		box := frame.GetNormalizedBoundingBox()
		fmt.Fprintf(w, "\tBounding box position:\n")
		fmt.Fprintf(w, "\t\tleft  : %f\n", box.GetLeft())
		fmt.Fprintf(w, "\t\ttop   : %f\n", box.GetTop())
		fmt.Fprintf(w, "\t\tright : %f\n", box.GetRight())
		fmt.Fprintf(w, "\t\tbottom: %f\n", box.GetBottom())
	}

	return nil
}

Java

/**
 * Track objects in a video.
 *
 * @param filePath the path to the video file to analyze.
 */
public static VideoAnnotationResults trackObjects(String filePath) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Read file
    Path path = Paths.get(filePath);
    byte[] data = Files.readAllBytes(path);

    // Create the request
    AnnotateVideoRequest request = AnnotateVideoRequest.newBuilder()
        .setInputContent(ByteString.copyFrom(data))
        .addFeatures(Feature.OBJECT_TRACKING)
        .setLocationId("us-east1")
        .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    ObjectTrackingAnnotation annotation = results.getObjectAnnotations(0);
    System.out.println("Confidence: " + annotation.getConfidence());

    if (annotation.hasEntity()) {
      Entity entity = annotation.getEntity();
      System.out.println("Entity description: " + entity.getDescription());
      System.out.println("Entity id:: " + entity.getEntityId());
    }

    if (annotation.hasSegment()) {
      VideoSegment videoSegment = annotation.getSegment();
      Duration startTimeOffset = videoSegment.getStartTimeOffset();
      Duration endTimeOffset = videoSegment.getEndTimeOffset();
      // Display the segment time in seconds, 1e9 converts nanos to seconds
      System.out.println(String.format(
          "Segment: %.2fs to %.2fs",
          startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9,
          endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));
    }

    // Here we print only the bounding box of the first frame in this segment.
    ObjectTrackingFrame frame = annotation.getFrames(0);
    // Display the offset time in seconds, 1e9 converts nanos to seconds
    Duration timeOffset = frame.getTimeOffset();
    System.out.println(String.format(
        "Time offset of the first frame: %.2fs",
        timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the bounding box of the detected object
    NormalizedBoundingBox normalizedBoundingBox = frame.getNormalizedBoundingBox();
    System.out.println("Bounding box position:");
    System.out.println("\tleft: " + normalizedBoundingBox.getLeft());
    System.out.println("\ttop: " + normalizedBoundingBox.getTop());
    System.out.println("\tright: " + normalizedBoundingBox.getRight());
    System.out.println("\tbottom: " + normalizedBoundingBox.getBottom());
    return results;
  }
}

Node.js

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence').v1p2beta1;
const fs = require('fs');
const util = require('util');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();
/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const path = 'Local file to analyze, e.g. ./my-file.mp4';

// Reads a local video file and converts it to base64
const file = await util.promisify(fs.readFile)(path);
const inputContent = file.toString('base64');

const request = {
  inputContent: inputContent,
  features: ['OBJECT_TRACKING'],
  //recommended to use us-east1 for the best latency due to different types of processors used in this region and others
  locationId: 'us-east1',
};
// Detects objects in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();
//Gets annotations for video
const annotations = results[0].annotationResults[0];
const objects = annotations.objectAnnotations;
objects.forEach(object => {
  console.log(`Entity description:  ${object.entity.description}`);
  console.log(`Entity id: ${object.entity.entityId}`);
  const time = object.segment;
  if (time.startTimeOffset.seconds === undefined) {
    time.startTimeOffset.seconds = 0;
  }
  if (time.startTimeOffset.nanos === undefined) {
    time.startTimeOffset.nanos = 0;
  }
  if (time.endTimeOffset.seconds === undefined) {
    time.endTimeOffset.seconds = 0;
  }
  if (time.endTimeOffset.nanos === undefined) {
    time.endTimeOffset.nanos = 0;
  }
  console.log(
    `Segment: ${time.startTimeOffset.seconds}` +
      `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s to ${
        time.endTimeOffset.seconds
      }.` +
      `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Confidence: ${object.confidence}`);
  const frame = object.frames[0];
  const box = frame.normalizedBoundingBox;
  const timeOffset = frame.timeOffset;
  if (timeOffset.seconds === undefined) {
    timeOffset.seconds = 0;
  }
  if (timeOffset.nanos === undefined) {
    timeOffset.nanos = 0;
  }
  console.log(
    `Time offset for the first frame: ${timeOffset.seconds}` +
      `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Bounding box position:`);
  console.log(`\tleft   :${box.left}`);
  console.log(`\ttop    :${box.top}`);
  console.log(`\tright  :${box.right}`);
  console.log(`\tbottom :${box.bottom}`);
});

Python

"""Object Tracking."""
import io

from google.cloud import videointelligence_v1p2beta1 as videointelligence

# path = 'resources/cat.mp4'

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.enums.Feature.OBJECT_TRACKING]

with io.open(path, 'rb') as file:
    input_content = file.read()

# It is recommended to use location_id as 'us-east1' for the best latency
# due to different types of processors used in this region and others.
operation = video_client.annotate_video(
    input_content=input_content, features=features, location_id='us-east1')
print('\nProcessing video for object annotations.')

result = operation.result(timeout=300)
print('\nFinished processing.\n')

# The first result is retrieved because a single video was processed.
object_annotations = result.annotation_results[0].object_annotations

# Get only the first annotation for demo purposes.
object_annotation = object_annotations[0]
print('Entity description: {}'.format(
    object_annotation.entity.description))
if object_annotation.entity.entity_id:
    print('Entity id: {}'.format(object_annotation.entity.entity_id))

print('Segment: {}s to {}s'.format(
    object_annotation.segment.start_time_offset.seconds +
    object_annotation.segment.start_time_offset.nanos / 1e9,
    object_annotation.segment.end_time_offset.seconds +
    object_annotation.segment.end_time_offset.nanos / 1e9))

print('Confidence: {}'.format(object_annotation.confidence))

# Here we print only the bounding box of the first frame in this segment
frame = object_annotation.frames[0]
box = frame.normalized_bounding_box
print('Time offset of the first frame: {}s'.format(
    frame.time_offset.seconds + frame.time_offset.nanos / 1e9))
print('Bounding box position:')
print('\tleft  : {}'.format(box.left))
print('\ttop   : {}'.format(box.top))
print('\tright : {}'.format(box.right))
print('\tbottom: {}'.format(box.bottom))
print('\n')

Ruby

# "Path to a local video file: path/to/file.mp4"

require "google/cloud/video_intelligence"

video = Google::Cloud::VideoIntelligence.new

video_contents = File.binread path

# Register a callback during the method call
operation = video.annotate_video input_content: video_contents, features: [:OBJECT_TRACKING] do |operation|
  raise operation.error.message if operation.error?
  puts "Finished Processing."

  object_annotations = operation.results.annotation_results.first.object_annotations

  object_annotations.each do |object_annotation|
    puts "Entity description: #{object_annotation.entity.description}"
    puts "Entity id: #{object_annotation.entity.entity_id}" if object_annotation.entity.entity_id

    object_segment = object_annotation.segment
    start_time = (object_segment.start_time_offset.seconds +
                   object_segment.start_time_offset.nanos / 1e9)
    end_time =   (object_segment.end_time_offset.seconds +
                   object_segment.end_time_offset.nanos / 1e9)
    puts "Segment: #{start_time}s to #{end_time}s"

    puts "Confidence: #{object_annotation.confidence}"

    # Print information about the first frame of the segment.
    frame = object_annotation.frames.first
    box = frame.normalized_bounding_box

    time_offset = (frame.time_offset.seconds +
                    frame.time_offset.nanos / 1e9)
    puts "Time offset for the first frame: #{time_offset}s"

    puts "Bounding box position:"
    puts "\tleft  : #{box.left}"
    puts "\ttop   : #{box.top}"
    puts "\tright : #{box.right}"
    puts "\tbottom: #{box.bottom}\n"
  end
end

puts "Processing video for object tracking:"
operation.wait_until_done!
