動画の対ラベル分析

Video Intelligence API は、LABEL_DETECTION 機能を使用して動画映像に表示されるエンティティを識別できます。この機能は、物体、場所、活動、動物の種類、商品などを識別できます。

分析は以下のように区分化できます。

フレームレベル:
各エンティティがフレーム内で識別され、ラベル付けされます（1 秒あたり 1 フレームのサンプリング）。
ショットレベル:
ショットはすべてのセグメント（または動画）内で自動的に検出されます。その後、各ショット内でエンティティが識別され、ラベル付けされます。
セグメントレベル:
ユーザーが選択した動画セグメントを分析用に指定できます。このためには、アノテーションのために開始と終了の時間オフセットを指定します（VideoSegment を参照）。その後、各セグメント内でエンティティが識別され、ラベル付けされます。セグメントが指定されていない場合は、動画全体が 1 つのセグメントとして扱われます。

ローカルファイルにアノテーションを付ける

この例では、ローカルファイルに対して動画分析を実行し、ラベルを検出します。

詳細については、Python のチュートリアルをご覧ください。

REST

プロセスリクエストを送信する

POST リクエストを videos:annotate メソッドに送信する方法を以下に示します。LabelDetectionMode を構成して、ショットレベルのアノテーションやフレームレベルのアノテーションを構成できます。SHOT_AND_FRAME_MODE を使用することをおすすめします。この例では、Google Cloud CLI を使用するプロジェクト用に設定されたサービスアカウントのアクセストークンを使用します。Google Cloud CLI のインストール、サービスアカウントでのプロジェクトの設定、アクセストークンの取得を行う手順については、Video Intelligence のクイックスタートをご覧ください。

リクエストのデータを使用する前に、次のように置き換えます。

BASE64_ENCODED_CONTENT: 動画を Base64 エンコードデータとして提供します。データを Base64 に変換する方法の手順をご覧ください。
PROJECT_NUMBER: Google Cloud プロジェクトの数値識別子。

HTTP メソッドと URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

リクエストの本文（JSON）:

{
  "inputContent": "BASE64_ENCODED_CONTENT",
  "features": ["LABEL_DETECTION"],
}

リクエストを送信するには、次のいずれかのオプションを展開します。

curl（Linux、macOS、Cloud Shell）

注: 次のコマンドは、gcloud init または gcloud auth login を実行して、ユーザーアカウントで gcloud CLI にログインしているか、Cloud Shell を使用して自動的に gcloud CLI にログインしていることを前提としています。gcloud auth list を実行すると、現在アクティブなアカウントを確認できます。

リクエスト本文を request.json という名前のファイルに保存して、次のコマンドを実行します。

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell（Windows）

注: 次のコマンドは、gcloud init または gcloud auth login を実行して、ご自分のユーザーアカウントで gcloud CLI にログインしていることを前提としています。gcloud auth list を実行すると、現在アクティブなアカウントを確認できます。

リクエスト本文を request.json という名前のファイルに保存して、次のコマンドを実行します。

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

次のような JSON レスポンスが返されます。

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

リクエストが成功すると、Video Intelligence はオペレーションの名前を返します。

結果を取得する

リクエストの結果を取得するには、GET リクエストを projects.locations.operations リソースに送信します。このようなリクエストを送信する方法は次のとおりです。

リクエストのデータを使用する前に、次のように置き換えます。

OPERATION_NAME: Video Intelligence API によって返されるオペレーションの名前。オペレーション名の形式は projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID です。
PROJECT_NUMBER: Google Cloud プロジェクトの数値識別子。

HTTP メソッドと URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

リクエストを送信するには、次のいずれかのオプションを展開します。

curl（Linux、macOS、Cloud Shell）

次のコマンドを実行します。

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell（Windows）

次のコマンドを実行します。

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

次のような JSON レスポンスが返されます。

レスポンス

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress",
    "annotationProgress": [
      {
        "progressPercent": 100,
        "startTime": "2019-03-12T19:36:09.110351Z",
        "updateTime": "2019-03-12T19:36:17.519069Z"
      }
    ]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
    "annotationResults": [
      {
        "segmentLabelAnnotations": [
          {
            "entity": {
              "entityId": "/m/01prls",
              "description": "land vehicle",
              "languageCode": "en-US"
            },
            "categoryEntities": [
              {
                "entityId": "/m/07yv9",
                "description": "vehicle",
                "languageCode": "en-US"
              }
            ],
            "segments": [
              {
                "segment": {
                  "startTimeOffset": "0s",
                  "endTimeOffset": "38.757872s"
                },
                "confidence": 0.6614419
              }
            ]
          },
          {
            "entity": {
              "entityId": "/m/039jbq",
              "description": "urban area",
              "languageCode": "en-US"
            },
            "categoryEntities": [
              {
                "entityId": "/m/01n32",
                "description": "city",
                "languageCode": "en-US"
              }
            ],
            "segments": [
              {
                "segment": {
                  "startTimeOffset": "0s",
                  "endTimeOffset": "38.757872s"
                },
                "confidence": 0.92337775
              }
            ]
          },
          ...

Go


func label(w io.Writer, file string) error {
	ctx := context.Background()
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	fileBytes, err := os.ReadFile(file)
	if err != nil {
		return err
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		Features: []videopb.Feature{
			videopb.Feature_LABEL_DETECTION,
		},
		InputContent: fileBytes,
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	printLabels := func(labels []*videopb.LabelAnnotation) {
		for _, label := range labels {
			fmt.Fprintf(w, "\tDescription: %s\n", label.Entity.Description)
			for _, category := range label.CategoryEntities {
				fmt.Fprintf(w, "\t\tCategory: %s\n", category.Description)
			}
			for _, segment := range label.Segments {
				start, _ := ptypes.Duration(segment.Segment.StartTimeOffset)
				end, _ := ptypes.Duration(segment.Segment.EndTimeOffset)
				fmt.Fprintf(w, "\t\tSegment: %s to %s\n", start, end)
			}
		}
	}

	// A single video was processed. Get the first result.
	result := resp.AnnotationResults[0]

	fmt.Fprintln(w, "SegmentLabelAnnotations:")
	printLabels(result.SegmentLabelAnnotations)
	fmt.Fprintln(w, "ShotLabelAnnotations:")
	printLabels(result.ShotLabelAnnotations)
	fmt.Fprintln(w, "FrameLabelAnnotations:")
	printLabels(result.FrameLabelAnnotations)

	return nil
}

Java

// Instantiate a com.google.cloud.videointelligence.v1.VideoIntelligenceServiceClient
try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
  // Read file and encode into Base64
  Path path = Paths.get(filePath);
  byte[] data = Files.readAllBytes(path);

  AnnotateVideoRequest request =
      AnnotateVideoRequest.newBuilder()
          .setInputContent(ByteString.copyFrom(data))
          .addFeatures(Feature.LABEL_DETECTION)
          .build();
  // Create an operation that will contain the response when the operation completes.
  OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> response =
      client.annotateVideoAsync(request);

  System.out.println("Waiting for operation to complete...");
  for (VideoAnnotationResults results : response.get().getAnnotationResultsList()) {
    // process video / segment level label annotations
    System.out.println("Locations: ");
    for (LabelAnnotation labelAnnotation : results.getSegmentLabelAnnotationsList()) {
      System.out.println("Video label: " + labelAnnotation.getEntity().getDescription());
      // categories
      for (Entity categoryEntity : labelAnnotation.getCategoryEntitiesList()) {
        System.out.println("Video label category: " + categoryEntity.getDescription());
      }
      // segments
      for (LabelSegment segment : labelAnnotation.getSegmentsList()) {
        double startTime =
            segment.getSegment().getStartTimeOffset().getSeconds()
                + segment.getSegment().getStartTimeOffset().getNanos() / 1e9;
        double endTime =
            segment.getSegment().getEndTimeOffset().getSeconds()
                + segment.getSegment().getEndTimeOffset().getNanos() / 1e9;
        System.out.printf("Segment location: %.3f:%.2f\n", startTime, endTime);
        System.out.println("Confidence: " + segment.getConfidence());
      }
    }

    // process shot label annotations
    for (LabelAnnotation labelAnnotation : results.getShotLabelAnnotationsList()) {
      System.out.println("Shot label: " + labelAnnotation.getEntity().getDescription());
      // categories
      for (Entity categoryEntity : labelAnnotation.getCategoryEntitiesList()) {
        System.out.println("Shot label category: " + categoryEntity.getDescription());
      }
      // segments
      for (LabelSegment segment : labelAnnotation.getSegmentsList()) {
        double startTime =
            segment.getSegment().getStartTimeOffset().getSeconds()
                + segment.getSegment().getStartTimeOffset().getNanos() / 1e9;
        double endTime =
            segment.getSegment().getEndTimeOffset().getSeconds()
                + segment.getSegment().getEndTimeOffset().getNanos() / 1e9;
        System.out.printf("Segment location: %.3f:%.2f\n", startTime, endTime);
        System.out.println("Confidence: " + segment.getConfidence());
      }
    }

    // process frame label annotations
    for (LabelAnnotation labelAnnotation : results.getFrameLabelAnnotationsList()) {
      System.out.println("Frame label: " + labelAnnotation.getEntity().getDescription());
      // categories
      for (Entity categoryEntity : labelAnnotation.getCategoryEntitiesList()) {
        System.out.println("Frame label category: " + categoryEntity.getDescription());
      }
      // segments
      for (LabelSegment segment : labelAnnotation.getSegmentsList()) {
        double startTime =
            segment.getSegment().getStartTimeOffset().getSeconds()
                + segment.getSegment().getStartTimeOffset().getNanos() / 1e9;
        double endTime =
            segment.getSegment().getEndTimeOffset().getSeconds()
                + segment.getSegment().getEndTimeOffset().getNanos() / 1e9;
        System.out.printf("Segment location: %.3f:%.2f\n", startTime, endTime);
        System.out.println("Confidence: " + segment.getConfidence());
      }
    }
  }
}

Node.js

// Imports the Google Cloud Video Intelligence library + Node's fs library
const video = require('@google-cloud/video-intelligence').v1;
const fs = require('fs');
const util = require('util');

// Creates a client
const client = new video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const path = 'Local file to analyze, e.g. ./my-file.mp4';

// Reads a local video file and converts it to base64
const readFile = util.promisify(fs.readFile);
const file = await readFile(path);
const inputContent = file.toString('base64');

// Constructs request
const request = {
  inputContent: inputContent,
  features: ['LABEL_DETECTION'],
};

// Detects labels in a video
const [operation] = await client.annotateVideo(request);
console.log('Waiting for operation to complete...');
const [operationResult] = await operation.promise();
// Gets annotations for video
const annotations = operationResult.annotationResults[0];

const labels = annotations.segmentLabelAnnotations;
labels.forEach(label => {
  console.log(`Label ${label.entity.description} occurs at:`);
  label.segments.forEach(segment => {
    const time = segment.segment;
    if (time.startTimeOffset.seconds === undefined) {
      time.startTimeOffset.seconds = 0;
    }
    if (time.startTimeOffset.nanos === undefined) {
      time.startTimeOffset.nanos = 0;
    }
    if (time.endTimeOffset.seconds === undefined) {
      time.endTimeOffset.seconds = 0;
    }
    if (time.endTimeOffset.nanos === undefined) {
      time.endTimeOffset.nanos = 0;
    }
    console.log(
      `\tStart: ${time.startTimeOffset.seconds}` +
        `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s`
    );
    console.log(
      `\tEnd: ${time.endTimeOffset.seconds}.` +
        `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
    );
    console.log(`\tConfidence: ${segment.confidence}`);
  });
});

Python

Python 用 Video Intelligence API クライアントライブラリのインストールと使用方法の詳細については、Video Intelligence API クライアントライブラリをご覧ください。

"""Detect labels given a file path."""
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.LABEL_DETECTION]

with io.open(path, "rb") as movie:
    input_content = movie.read()

operation = video_client.annotate_video(
    request={"features": features, "input_content": input_content}
)
print("\nProcessing video for label annotations:")

result = operation.result(timeout=90)
print("\nFinished processing.")

# Process video/segment level label annotations
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print("Video label description: {}".format(segment_label.entity.description))
    for category_entity in segment_label.category_entities:
        print(
            "\tLabel category description: {}".format(category_entity.description)
        )

    for i, segment in enumerate(segment_label.segments):
        start_time = (
            segment.segment.start_time_offset.seconds
            + segment.segment.start_time_offset.microseconds / 1e6
        )
        end_time = (
            segment.segment.end_time_offset.seconds
            + segment.segment.end_time_offset.microseconds / 1e6
        )
        positions = "{}s to {}s".format(start_time, end_time)
        confidence = segment.confidence
        print("\tSegment {}: {}".format(i, positions))
        print("\tConfidence: {}".format(confidence))
    print("\n")

# Process shot level label annotations
shot_labels = result.annotation_results[0].shot_label_annotations
for i, shot_label in enumerate(shot_labels):
    print("Shot label description: {}".format(shot_label.entity.description))
    for category_entity in shot_label.category_entities:
        print(
            "\tLabel category description: {}".format(category_entity.description)
        )

    for i, shot in enumerate(shot_label.segments):
        start_time = (
            shot.segment.start_time_offset.seconds
            + shot.segment.start_time_offset.microseconds / 1e6
        )
        end_time = (
            shot.segment.end_time_offset.seconds
            + shot.segment.end_time_offset.microseconds / 1e6
        )
        positions = "{}s to {}s".format(start_time, end_time)
        confidence = shot.confidence
        print("\tSegment {}: {}".format(i, positions))
        print("\tConfidence: {}".format(confidence))
    print("\n")

# Process frame level label annotations
frame_labels = result.annotation_results[0].frame_label_annotations
for i, frame_label in enumerate(frame_labels):
    print("Frame label description: {}".format(frame_label.entity.description))
    for category_entity in frame_label.category_entities:
        print(
            "\tLabel category description: {}".format(category_entity.description)
        )

    # Each frame_label_annotation has many frames,
    # here we print information only about the first frame.
    frame = frame_label.frames[0]
    time_offset = frame.time_offset.seconds + frame.time_offset.microseconds / 1e6
    print("\tFirst frame time offset: {}s".format(time_offset))
    print("\tFirst frame confidence: {}".format(frame.confidence))
    print("\n")

その他の言語

C#: クライアントライブラリページの C# の設定手順を実行してから、.NET の Video Intelligence のリファレンスドキュメントをご覧ください。

PHP: クライアントライブラリページの PHP の設定手順を実行してから、PHP の Video Intelligence のリファレンスドキュメントをご覧ください。

Ruby: クライアントライブラリページの Ruby の設定手順を実行してから、Ruby の Video Intelligence のリファレンスドキュメントをご覧ください。

Cloud Storage 上のファイルにアノテーションを付ける

Cloud Storage にあるファイルのラベルに対して動画分析を実行する例を次に示します。

REST

プロセスリクエストを送信する

POST リクエストを annotate メソッドに送信する方法を以下に示します。この例では、Google Cloud CLI を使用するプロジェクト用に設定されたサービスアカウントのアクセストークンを使用します。Google Cloud CLI のインストール、サービスアカウントでのプロジェクトの設定、アクセストークンの取得を行う手順については、Video Intelligence のクイックスタートをご覧ください。

リクエストのデータを使用する前に、次のように置き換えます。

INPUT_URI: アノテーションを付けるファイルを含む Cloud Storage バケット（ファイル名を含む）。gs:// で始まる必要があります。
PROJECT_NUMBER: Google Cloud プロジェクトの数値識別子。

HTTP メソッドと URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

リクエストの本文（JSON）:

{
  "inputUri": "INPUT_URI",
  "features": ["LABEL_DETECTION"],
}

リクエストを送信するには、次のいずれかのオプションを展開します。

curl（Linux、macOS、Cloud Shell）

リクエスト本文を request.json という名前のファイルに保存して、次のコマンドを実行します。

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell（Windows）

リクエスト本文を request.json という名前のファイルに保存して、次のコマンドを実行します。

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

次のような JSON レスポンスが返されます。

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

リクエストが成功すると、Video Intelligence はオペレーションの名前を返します。

結果を取得する

リクエストのデータを使用する前に、次のように置き換えます。

OPERATION_NAME: Video Intelligence API によって返されるオペレーションの名前。オペレーション名の形式は projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID です。
PROJECT_NUMBER: Google Cloud プロジェクトの数値識別子。

HTTP メソッドと URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

リクエストを送信するには、次のいずれかのオプションを展開します。

curl（Linux、macOS、Cloud Shell）

次のコマンドを実行します。

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell（Windows）

次のコマンドを実行します。

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

次のような JSON レスポンスが返されます。

レスポンス

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress",
    "annotationProgress": [
      {
        "inputUri": "INPUT_URI",
        "progressPercent": 100,
        "startTime": "2019-11-13T19:25:56.206335Z",
        "updateTime": "2019-11-13T19:26:16.215615Z"
      }
    ]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
    "annotationResults": [
      {
        "inputUri": "INPUT_URI",
        "segmentLabelAnnotations": [
          {
            "entity": {
              "entityId": "/m/01vkl",
              "description": "circle",
              "languageCode": "en-US"
            },
            "categoryEntities": [
              {
                "entityId": "/m/016nqd",
                "description": "shape",
                "languageCode": "en-US"
              }
            ],
            "segments": [
              {
                "segment": {
                  "startTimeOffset": "0s",
                  "endTimeOffset": "16.416666s"
                },
                "confidence": 0.36535457
              }
            ]
          },
          ...
        ]
     }
    ]
  }
}

アノテーションの結果をダウンロードする

アノテーションを、送信元バケットから送信先バケットにコピーします（ファイルとオブジェクトのコピーをご覧ください）。

gcloud storage cp gcs_uri gs://my-bucket

注: 出力 GCS URI がユーザーによって指定された場合、アノテーションはその GCS URI に格納されます。

Go


func labelURI(w io.Writer, file string) error {
	ctx := context.Background()
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		Features: []videopb.Feature{
			videopb.Feature_LABEL_DETECTION,
		},
		InputUri: file,
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	printLabels := func(labels []*videopb.LabelAnnotation) {
		for _, label := range labels {
			fmt.Fprintf(w, "\tDescription: %s\n", label.Entity.Description)
			for _, category := range label.CategoryEntities {
				fmt.Fprintf(w, "\t\tCategory: %s\n", category.Description)
			}
			for _, segment := range label.Segments {
				start, _ := ptypes.Duration(segment.Segment.StartTimeOffset)
				end, _ := ptypes.Duration(segment.Segment.EndTimeOffset)
				fmt.Fprintf(w, "\t\tSegment: %s to %s\n", start, end)
			}
		}
	}

	// A single video was processed. Get the first result.
	result := resp.AnnotationResults[0]

	fmt.Fprintln(w, "SegmentLabelAnnotations:")
	printLabels(result.SegmentLabelAnnotations)
	fmt.Fprintln(w, "ShotLabelAnnotations:")
	printLabels(result.ShotLabelAnnotations)
	fmt.Fprintln(w, "FrameLabelAnnotations:")
	printLabels(result.FrameLabelAnnotations)

	return nil
}

Java

// Instantiate a com.google.cloud.videointelligence.v1.VideoIntelligenceServiceClient
try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
  // Provide path to file hosted on GCS as "gs://bucket-name/..."
  AnnotateVideoRequest request =
      AnnotateVideoRequest.newBuilder()
          .setInputUri(gcsUri)
          .addFeatures(Feature.LABEL_DETECTION)
          .build();
  // Create an operation that will contain the response when the operation completes.
  OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> response =
      client.annotateVideoAsync(request);

  System.out.println("Waiting for operation to complete...");
  for (VideoAnnotationResults results : response.get().getAnnotationResultsList()) {
    // process video / segment level label annotations
    System.out.println("Locations: ");
    for (LabelAnnotation labelAnnotation : results.getSegmentLabelAnnotationsList()) {
      System.out.println("Video label: " + labelAnnotation.getEntity().getDescription());
      // categories
      for (Entity categoryEntity : labelAnnotation.getCategoryEntitiesList()) {
        System.out.println("Video label category: " + categoryEntity.getDescription());
      }
      // segments
      for (LabelSegment segment : labelAnnotation.getSegmentsList()) {
        double startTime =
            segment.getSegment().getStartTimeOffset().getSeconds()
                + segment.getSegment().getStartTimeOffset().getNanos() / 1e9;
        double endTime =
            segment.getSegment().getEndTimeOffset().getSeconds()
                + segment.getSegment().getEndTimeOffset().getNanos() / 1e9;
        System.out.printf("Segment location: %.3f:%.3f\n", startTime, endTime);
        System.out.println("Confidence: " + segment.getConfidence());
      }
    }

    // process shot label annotations
    for (LabelAnnotation labelAnnotation : results.getShotLabelAnnotationsList()) {
      System.out.println("Shot label: " + labelAnnotation.getEntity().getDescription());
      // categories
      for (Entity categoryEntity : labelAnnotation.getCategoryEntitiesList()) {
        System.out.println("Shot label category: " + categoryEntity.getDescription());
      }
      // segments
      for (LabelSegment segment : labelAnnotation.getSegmentsList()) {
        double startTime =
            segment.getSegment().getStartTimeOffset().getSeconds()
                + segment.getSegment().getStartTimeOffset().getNanos() / 1e9;
        double endTime =
            segment.getSegment().getEndTimeOffset().getSeconds()
                + segment.getSegment().getEndTimeOffset().getNanos() / 1e9;
        System.out.printf("Segment location: %.3f:%.3f\n", startTime, endTime);
        System.out.println("Confidence: " + segment.getConfidence());
      }
    }

    // process frame label annotations
    for (LabelAnnotation labelAnnotation : results.getFrameLabelAnnotationsList()) {
      System.out.println("Frame label: " + labelAnnotation.getEntity().getDescription());
      // categories
      for (Entity categoryEntity : labelAnnotation.getCategoryEntitiesList()) {
        System.out.println("Frame label category: " + categoryEntity.getDescription());
      }
      // segments
      for (LabelSegment segment : labelAnnotation.getSegmentsList()) {
        double startTime =
            segment.getSegment().getStartTimeOffset().getSeconds()
                + segment.getSegment().getStartTimeOffset().getNanos() / 1e9;
        double endTime =
            segment.getSegment().getEndTimeOffset().getSeconds()
                + segment.getSegment().getEndTimeOffset().getNanos() / 1e9;
        System.out.printf("Segment location: %.3f:%.2f\n", startTime, endTime);
        System.out.println("Confidence: " + segment.getConfidence());
      }
    }
  }
}

Node.js

// Imports the Google Cloud Video Intelligence library
const video = require('@google-cloud/video-intelligence').v1;

// Creates a client
const client = new video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

const request = {
  inputUri: gcsUri,
  features: ['LABEL_DETECTION'],
};

// Detects labels in a video
const [operation] = await client.annotateVideo(request);
console.log('Waiting for operation to complete...');
const [operationResult] = await operation.promise();

// Gets annotations for video
const annotations = operationResult.annotationResults[0];

const labels = annotations.segmentLabelAnnotations;
labels.forEach(label => {
  console.log(`Label ${label.entity.description} occurs at:`);
  label.segments.forEach(segment => {
    const time = segment.segment;
    if (time.startTimeOffset.seconds === undefined) {
      time.startTimeOffset.seconds = 0;
    }
    if (time.startTimeOffset.nanos === undefined) {
      time.startTimeOffset.nanos = 0;
    }
    if (time.endTimeOffset.seconds === undefined) {
      time.endTimeOffset.seconds = 0;
    }
    if (time.endTimeOffset.nanos === undefined) {
      time.endTimeOffset.nanos = 0;
    }
    console.log(
      `\tStart: ${time.startTimeOffset.seconds}` +
        `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s`
    );
    console.log(
      `\tEnd: ${time.endTimeOffset.seconds}.` +
        `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
    );
    console.log(`\tConfidence: ${segment.confidence}`);
  });
});

Python

"""Detects labels given a GCS path."""
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.LABEL_DETECTION]

mode = videointelligence.LabelDetectionMode.SHOT_AND_FRAME_MODE
config = videointelligence.LabelDetectionConfig(label_detection_mode=mode)
context = videointelligence.VideoContext(label_detection_config=config)

operation = video_client.annotate_video(
    request={"features": features, "input_uri": path, "video_context": context}
)
print("\nProcessing video for label annotations:")

result = operation.result(timeout=180)
print("\nFinished processing.")

# Process video/segment level label annotations
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print("Video label description: {}".format(segment_label.entity.description))
    for category_entity in segment_label.category_entities:
        print(
            "\tLabel category description: {}".format(category_entity.description)
        )

    for i, segment in enumerate(segment_label.segments):
        start_time = (
            segment.segment.start_time_offset.seconds
            + segment.segment.start_time_offset.microseconds / 1e6
        )
        end_time = (
            segment.segment.end_time_offset.seconds
            + segment.segment.end_time_offset.microseconds / 1e6
        )
        positions = "{}s to {}s".format(start_time, end_time)
        confidence = segment.confidence
        print("\tSegment {}: {}".format(i, positions))
        print("\tConfidence: {}".format(confidence))
    print("\n")

# Process shot level label annotations
shot_labels = result.annotation_results[0].shot_label_annotations
for i, shot_label in enumerate(shot_labels):
    print("Shot label description: {}".format(shot_label.entity.description))
    for category_entity in shot_label.category_entities:
        print(
            "\tLabel category description: {}".format(category_entity.description)
        )

    for i, shot in enumerate(shot_label.segments):
        start_time = (
            shot.segment.start_time_offset.seconds
            + shot.segment.start_time_offset.microseconds / 1e6
        )
        end_time = (
            shot.segment.end_time_offset.seconds
            + shot.segment.end_time_offset.microseconds / 1e6
        )
        positions = "{}s to {}s".format(start_time, end_time)
        confidence = shot.confidence
        print("\tSegment {}: {}".format(i, positions))
        print("\tConfidence: {}".format(confidence))
    print("\n")

# Process frame level label annotations
frame_labels = result.annotation_results[0].frame_label_annotations
for i, frame_label in enumerate(frame_labels):
    print("Frame label description: {}".format(frame_label.entity.description))
    for category_entity in frame_label.category_entities:
        print(
            "\tLabel category description: {}".format(category_entity.description)
        )

    # Each frame_label_annotation has many frames,
    # here we print information only about the first frame.
    frame = frame_label.frames[0]
    time_offset = frame.time_offset.seconds + frame.time_offset.microseconds / 1e6
    print("\tFirst frame time offset: {}s".format(time_offset))
    print("\tFirst frame confidence: {}".format(frame.confidence))
    print("\n")

その他の言語

C#: クライアントライブラリページの C# の設定手順を実行してから、.NET の Video Intelligence のリファレンスドキュメントをご覧ください。

PHP: クライアントライブラリページの PHP の設定手順を実行してから、PHP の Video Intelligence のリファレンスドキュメントをご覧ください。

Ruby: クライアントライブラリページの Ruby の設定手順を実行してから、Ruby の Video Intelligence のリファレンスドキュメントをご覧ください。

動画の対ラベル分析

ローカル ファイルにアノテーションを付ける

REST

プロセス リクエストを送信する

curl（Linux、macOS、Cloud Shell）

PowerShell（Windows）

結果を取得する

curl（Linux、macOS、Cloud Shell）

PowerShell（Windows）

レスポンス

Go

Java

Node.js

Python

その他の言語

Cloud Storage 上のファイルにアノテーションを付ける

REST

プロセス リクエストを送信する

curl（Linux、macOS、Cloud Shell）

PowerShell（Windows）

結果を取得する

curl（Linux、macOS、Cloud Shell）

PowerShell（Windows）

レスポンス

アノテーションの結果をダウンロードする

Go

Java

Node.js

Python

その他の言語

ローカルファイルにアノテーションを付ける

プロセスリクエストを送信する

プロセスリクエストを送信する