Rastrear objetos

O rastreamento de objetos acompanha objetos detectados em um vídeo de entrada. Para fazer uma solicitação de rastreamento de objetos, chame o método annotate e especifique OBJECT_TRACKING no campo features.

Uma solicitação de rastreamento de objeto anota o vídeo com os rótulos apropriados de entidades e locais espaciais detectados em um vídeo ou segmentos de vídeo. Por exemplo, um vídeo de veículos atravessando um sinal de trânsito pode produzir rótulos como "carro", "caminhão", "bicicleta", "pneus", "luzes", "janela" e assim por diante. Cada rótulo pode incluir uma série de caixas delimitadoras, com cada caixa delimitadora tendo um segmento de tempo associado que contém uma diferença de horário que indica a diferença de duração desde o início do vídeo. A anotação também contém outras informações da entidade, inclusive um ID da entidade que pode ser usado para saber mais sobre a entidade na API Google Knowledge Graph Search (em inglês).

Rastreamento de objetos vs. detecção de rótulos

O rastreamento de objetos é diferente da detecção de rótulos. A detecção de rótulos fornece rótulos sem caixas delimitadoras, enquanto o rastreamento de objetos fornece os rótulos dos objetos individuais presentes em um determinado vídeo com a caixa delimitadora de cada instância de objeto em cada etapa de tempo.

Várias instâncias do mesmo tipo de objeto são atribuídas a diferentes instâncias da mensagem ObjectTrackingAnnotation, em que todas as ocorrências de uma determinada faixa de objeto são mantidas na própria instância de ObjectTrackingAnnotation. Por exemplo, se houver um carro vermelho e um carro azul por cinco segundos em um vídeo, a solicitação de rastreamento retornará duas instâncias de ObjectTrackingAnnotation. A primeira instância conterá a localização de um dos dois carros, por exemplo, o carro vermelho, enquanto a segunda conterá os locais do outro.

Solicitar rastreamento de objetos para um vídeo no Cloud Storage

As amostras a seguir demonstram o rastreamento de objetos em um arquivo localizado no Cloud Storage.

REST

Enviar a solicitação de processo

Veja a seguir como enviar uma solicitação POST para o método annotate. Nele, é usado o token de acesso de uma conta de serviço configurada para o projeto usando a Google Cloud CLI. Para instruções sobre como instalar a Google Cloud CLI, configurar um projeto com uma conta de serviço e conseguir um token de acesso, consulte o guia de início rápido do Video Intelligence.

Antes de usar os dados da solicitação, faça as substituições a seguir:

INPUT_URI: STORAGE_URI
Exemplo:
"inputUri": "gs://cloud-videointelligence-demo/assistant.mp4",
PROJECT_NUMBER: o identificador numérico do projeto do Google Cloud

Método HTTP e URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

Corpo JSON da solicitação:

{
  "inputUri": "STORAGE_URI",
  "features": ["OBJECT_TRACKING"]
}

Para enviar a solicitação, expanda uma destas opções:

curl (Linux, macOS ou Cloud Shell)

Observação: o comando a seguir pressupõe que você fez login na CLI gcloud com sua conta de usuário executando gcloud init ou gcloud auth login, ou usando o Cloud Shell, que faz login automaticamente na CLI gcloud. É possível verificar a conta ativa atual executando gcloud auth list.

Salve o corpo da solicitação em um arquivo com o nome request.json e execute o comando a seguir:

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "x-goog-user-project: PROJECT_NUMBER" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

Salve o corpo da solicitação em um arquivo com o nome request.json e execute o comando a seguir:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

Você receberá uma resposta JSON semelhante a esta:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

Se a solicitação for bem-sucedida, a API Video Intelligence retornará o name da sua operação. O exemplo acima mostra um exemplo dessa resposta, em que PROJECT_NUMBER é o número do projeto e OPERATION_ID é o ID da operação de longa duração criado para a solicitação.

Ver os resultados

Para receber os resultados da solicitação, envie um GET usando o nome da operação retornado da chamada para videos:annotate, conforme mostrado no exemplo a seguir.

Antes de usar os dados da solicitação, faça as substituições a seguir:

OPERATION_NAME: o nome da operação, conforme retornado pela API Video Intelligence. O nome da operação tem o formato projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID.
PROJECT_NUMBER: o identificador numérico do projeto do Google Cloud

Método HTTP e URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

Para enviar a solicitação, expanda uma destas opções:

curl (Linux, macOS ou Cloud Shell)

execute o seguinte comando:

curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "x-goog-user-project: PROJECT_NUMBER" \
    "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

execute o seguinte comando:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

Você receberá uma resposta JSON semelhante a esta:

Resposta

// Object tracking annotations are returned as a objectAnnotations list.
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress",
    "annotationProgress": [
      {
        "inputUri": "/cloud-ml-sandbox/video/chicago.mp4",
        "progressPercent": 100,
        "startTime": "2019-12-21T16:56:46.755199Z",
        "updateTime": "2019-12-21T16:59:17.911197Z"
      }
    ]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
    "annotationResults": [
      {
        "inputUri": "/cloud-ml-sandbox/video/chicago.mp4",
        "objectAnnotations": [
          {
            "entity": {
              "entityId": "/m/0k4j",
              "description": "car",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.2672763,
                  "top": 0.5677657,
                  "right": 0.4388713,
                  "bottom": 0.7623171
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.26920167,
                  "top": 0.5659805,
                  "right": 0.44331276,
                  "bottom": 0.76780635
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.83573246,
                  "top": 0.6645812,
                  "right": 1,
                  "bottom": 0.99865407
                },
                "timeOffset": "2.311402s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "2.311402s"
            },
            "confidence": 0.99488896
          },
        ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.010383379,
                  "right": 0.21914443,
                  "bottom": 0.5591795
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.009684974,
                  "right": 0.22915152,
                  "bottom": 0.56070584
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.008624528,
                  "right": 0.22723165,
                  "bottom": 0.56158626
                },
                "timeOffset": "0.401983s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "0.401983s"
            },
            "confidence": 0.33914912
          },
       ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.79324204,
                  "top": 0.0006896425,
                  "right": 0.99659824,
                  "bottom": 0.5324423
                },
                "timeOffset": "37.585421s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.78935236,
                  "top": 0.0011992548,
                  "right": 0.99659824,
                  "bottom": 0.5374946
                },
                "timeOffset": "37.685917s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.79404694,
                  "right": 0.99659824,
                  "bottom": 0.5280966
                },
                "timeOffset": "38.590379s"
              }
            ],
            "segment": {
              "startTimeOffset": "37.585421s",
              "endTimeOffset": "38.590379s"
            },
            "confidence": 0.3415429
          }
        ]
      }
    ]
  }
}

Fazer o download dos resultados da anotação

Copie a anotação da origem e a cole no bucket de destino: consulte Copiar arquivos e objetos

gsutil cp gcs_uri gs://my-bucket

Observação: se o URI de saída do GCS for fornecido pelo usuário, a anotação será armazenada nesse URI.

Go


import (
	"context"
	"fmt"
	"io"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// objectTrackingGCS analyzes a video and extracts entities with their bounding boxes.
func objectTrackingGCS(w io.Writer, gcsURI string) error {
	// gcsURI := "gs://cloud-samples-data/video/cat.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputUri: gcsURI,
		Features: []videopb.Feature{
			videopb.Feature_OBJECT_TRACKING,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.ObjectAnnotations {
		fmt.Fprintf(w, "Description: %q\n", annotation.Entity.GetDescription())
		if len(annotation.Entity.EntityId) > 0 {
			fmt.Fprintf(w, "\tEntity ID: %q\n", annotation.Entity.GetEntityId())
		}

		segment := annotation.GetSegment()
		start, _ := ptypes.Duration(segment.GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", annotation.GetConfidence())

		// Here we print only the bounding box of the first frame in this segment.
		frame := annotation.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		box := frame.GetNormalizedBoundingBox()
		fmt.Fprintf(w, "\tBounding box position:\n")
		fmt.Fprintf(w, "\t\tleft  : %f\n", box.GetLeft())
		fmt.Fprintf(w, "\t\ttop   : %f\n", box.GetTop())
		fmt.Fprintf(w, "\t\tright : %f\n", box.GetRight())
		fmt.Fprintf(w, "\t\tbottom: %f\n", box.GetBottom())
	}

	return nil
}

Java

/**
 * Track objects in a video.
 *
 * @param gcsUri the path to the video file to analyze.
 */
public static VideoAnnotationResults trackObjectsGcs(String gcsUri) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputUri(gcsUri)
            .addFeatures(Feature.OBJECT_TRACKING)
            .setLocationId("us-east1")
            .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(450, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    ObjectTrackingAnnotation annotation = results.getObjectAnnotations(0);
    System.out.println("Confidence: " + annotation.getConfidence());

    if (annotation.hasEntity()) {
      Entity entity = annotation.getEntity();
      System.out.println("Entity description: " + entity.getDescription());
      System.out.println("Entity id:: " + entity.getEntityId());
    }

    if (annotation.hasSegment()) {
      VideoSegment videoSegment = annotation.getSegment();
      Duration startTimeOffset = videoSegment.getStartTimeOffset();
      Duration endTimeOffset = videoSegment.getEndTimeOffset();
      // Display the segment time in seconds, 1e9 converts nanos to seconds
      System.out.println(
          String.format(
              "Segment: %.2fs to %.2fs",
              startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9,
              endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));
    }

    // Here we print only the bounding box of the first frame in this segment.
    ObjectTrackingFrame frame = annotation.getFrames(0);
    // Display the offset time in seconds, 1e9 converts nanos to seconds
    Duration timeOffset = frame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset of the first frame: %.2fs",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the bounding box of the detected object
    NormalizedBoundingBox normalizedBoundingBox = frame.getNormalizedBoundingBox();
    System.out.println("Bounding box position:");
    System.out.println("\tleft: " + normalizedBoundingBox.getLeft());
    System.out.println("\ttop: " + normalizedBoundingBox.getTop());
    System.out.println("\tright: " + normalizedBoundingBox.getRight());
    System.out.println("\tbottom: " + normalizedBoundingBox.getBottom());
    return results;
  }
}

Node.js

Para autenticar na Video Intelligence, configure o Application Default Credentials. Para mais informações, consulte Configurar a autenticação para um ambiente de desenvolvimento local.

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence');

// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

const request = {
  inputUri: gcsUri,
  features: ['OBJECT_TRACKING'],
  //recommended to use us-east1 for the best latency due to different types of processors used in this region and others
  locationId: 'us-east1',
};
// Detects objects in a video
const [operation] = await video.annotateVideo(request);
const results = await operation.promise();
console.log('Waiting for operation to complete...');
//Gets annotations for video
const annotations = results[0].annotationResults[0];
const objects = annotations.objectAnnotations;
objects.forEach(object => {
  console.log(`Entity description:  ${object.entity.description}`);
  console.log(`Entity id: ${object.entity.entityId}`);
  const time = object.segment;
  console.log(
    `Segment: ${time.startTimeOffset.seconds || 0}` +
      `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s to ${
        time.endTimeOffset.seconds || 0
      }.` +
      `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Confidence: ${object.confidence}`);
  const frame = object.frames[0];
  const box = frame.normalizedBoundingBox;
  const timeOffset = frame.timeOffset;
  console.log(
    `Time offset for the first frame: ${timeOffset.seconds || 0}` +
      `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log('Bounding box position:');
  console.log(` left   :${box.left}`);
  console.log(` top    :${box.top}`);
  console.log(` right  :${box.right}`);
  console.log(` bottom :${box.bottom}`);
});

Python

"""Object tracking in a video stored on GCS."""
from google.cloud import videointelligence

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.OBJECT_TRACKING]
operation = video_client.annotate_video(
    request={"features": features, "input_uri": gcs_uri}
)
print("\nProcessing video for object annotations.")

result = operation.result(timeout=500)
print("\nFinished processing.\n")

# The first result is retrieved because a single video was processed.
object_annotations = result.annotation_results[0].object_annotations

for object_annotation in object_annotations:
    print("Entity description: {}".format(object_annotation.entity.description))
    if object_annotation.entity.entity_id:
        print("Entity id: {}".format(object_annotation.entity.entity_id))

    print(
        "Segment: {}s to {}s".format(
            object_annotation.segment.start_time_offset.seconds
            + object_annotation.segment.start_time_offset.microseconds / 1e6,
            object_annotation.segment.end_time_offset.seconds
            + object_annotation.segment.end_time_offset.microseconds / 1e6,
        )
    )

    print("Confidence: {}".format(object_annotation.confidence))

    # Here we print only the bounding box of the first frame in the segment
    frame = object_annotation.frames[0]
    box = frame.normalized_bounding_box
    print(
        "Time offset of the first frame: {}s".format(
            frame.time_offset.seconds + frame.time_offset.microseconds / 1e6
        )
    )
    print("Bounding box position:")
    print("\tleft  : {}".format(box.left))
    print("\ttop   : {}".format(box.top))
    print("\tright : {}".format(box.right))
    print("\tbottom: {}".format(box.bottom))
    print("\n")

Outras linguagens

C#: Siga as Instruções de configuração do C# na página das bibliotecas de cliente e acesse a Documentação de referência do Video Intelligence para .NET.

PHP: Siga as Instruções de configuração do PHP na página das bibliotecas de cliente e acesse a Documentação de referência do Video Intelligence para PHP.

Ruby: Siga as Instruções de configuração do Ruby na página das bibliotecas de cliente e acesse a Documentação de referência do Video Intelligence para Ruby.

Solicitar rastreamento de objetos para vídeo de um arquivo local

As amostras a seguir demonstram o rastreamento de objetos em um arquivo armazenado localmente.

REST

Enviar a solicitação de processo

Para realizar a anotação em um arquivo de vídeo local, codifique em base64 o conteúdo do arquivo de vídeo. Inclua o conteúdo codificado em base64 no campo inputContent da solicitação. Para informações sobre como codificar o conteúdo de um arquivo de vídeo em base64, consulte Codificação em Base64.

Veja a seguir como enviar uma solicitação POST para o método videos:annotate. Nele, é usado o token de acesso de uma conta de serviço configurada para o projeto usando a Google Cloud CLI. Para instruções sobre como instalar a Google Cloud CLI, configurar um projeto com uma conta de serviço e conseguir um token de acesso, consulte o guia de início rápido do Video Intelligence.

Antes de usar os dados da solicitação, faça as substituições a seguir:

inputContent: BASE64_ENCODED_CONTENT
Por exemplo: "UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA..."
PROJECT_NUMBER: o identificador numérico do projeto do Google Cloud

Método HTTP e URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

Corpo JSON da solicitação:

{
  "inputContent": "BASE64_ENCODED_CONTENT",
  "features": ["OBJECT_TRACKING"]
}

Para enviar a solicitação, expanda uma destas opções:

curl (Linux, macOS ou Cloud Shell)

Salve o corpo da solicitação em um arquivo com o nome request.json e execute o comando a seguir:

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "x-goog-user-project: PROJECT_NUMBER" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

Salve o corpo da solicitação em um arquivo com o nome request.json e execute o comando a seguir:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

Você receberá uma resposta JSON semelhante a esta:

Resposta

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

Se a solicitação for bem-sucedida, a Video Intelligence retornará o name para sua operação. Veja a seguir um exemplo dessa resposta, em que PROJECT_NUMBER é o número do projeto e OPERATION_ID é o ID da operação de longa duração criada para a solicitação.

Ver os resultados

Para receber os resultados da solicitação, envie um GET usando o nome da operação retornado da chamada para videos:annotate, conforme mostrado no exemplo a seguir.

Antes de usar os dados da solicitação, faça as substituições a seguir:

OPERATION_NAME: o nome da operação, conforme retornado pela API Video Intelligence. O nome da operação tem o formato projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID.
PROJECT_NUMBER: o identificador numérico do projeto do Google Cloud

Método HTTP e URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

Para enviar a solicitação, expanda uma destas opções:

curl (Linux, macOS ou Cloud Shell)

execute o seguinte comando:

curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "x-goog-user-project: PROJECT_NUMBER" \
    "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

execute o seguinte comando:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

Você receberá uma resposta JSON semelhante a esta:

Resposta

// Object tracking annotations are returned as a objectAnnotations list.
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress",
    "annotationProgress": [
      {
        "inputContent": "UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA...",
        "progressPercent": 100,
        "startTime": "2018-06-21T16:56:46.755199Z",
        "updateTime": "2018-06-21T16:59:17.911197Z"
      }
    ]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
    "annotationResults": [
      {
        "inputContent": "/cloud-ml-sandbox/video/chicago.mp4",
        "objectAnnotations": [
          {
            "entity": {
              "entityId": "/m/0k4j",
              "description": "car",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.2672763,
                  "top": 0.5677657,
                  "right": 0.4388713,
                  "bottom": 0.7623171
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.26920167,
                  "top": 0.5659805,
                  "right": 0.44331276,
                  "bottom": 0.76780635
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.83573246,
                  "top": 0.6645812,
                  "right": 1,
                  "bottom": 0.99865407
                },
                "timeOffset": "2.311402s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "2.311402s"
            },
            "confidence": 0.99488896
          },
        ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.010383379,
                  "right": 0.21914443,
                  "bottom": 0.5591795
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.009684974,
                  "right": 0.22915152,
                  "bottom": 0.56070584
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.008624528,
                  "right": 0.22723165,
                  "bottom": 0.56158626
                },
                "timeOffset": "0.401983s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "0.401983s"
            },
            "confidence": 0.33914912
          },
       ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.79324204,
                  "top": 0.0006896425,
                  "right": 0.99659824,
                  "bottom": 0.5324423
                },
                "timeOffset": "37.585421s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.78935236,
                  "top": 0.0011992548,
                  "right": 0.99659824,
                  "bottom": 0.5374946
                },
                "timeOffset": "37.685917s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.79404694,
                  "right": 0.99659824,
                  "bottom": 0.5280966
                },
                "timeOffset": "38.590379s"
              }
            ],
            "segment": {
              "startTimeOffset": "37.585421s",
              "endTimeOffset": "38.590379s"
            },
            "confidence": 0.3415429
          }
        ]
      }
    ]
  }
}

Go


import (
	"context"
	"fmt"
	"io"
	"io/ioutil"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// objectTracking analyzes a video and extracts entities with their bounding boxes.
func objectTracking(w io.Writer, filename string) error {
	// filename := "../testdata/cat.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	fileBytes, err := ioutil.ReadFile(filename)
	if err != nil {
		return err
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputContent: fileBytes,
		Features: []videopb.Feature{
			videopb.Feature_OBJECT_TRACKING,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.ObjectAnnotations {
		fmt.Fprintf(w, "Description: %q\n", annotation.Entity.GetDescription())
		if len(annotation.Entity.EntityId) > 0 {
			fmt.Fprintf(w, "\tEntity ID: %q\n", annotation.Entity.GetEntityId())
		}

		segment := annotation.GetSegment()
		start, _ := ptypes.Duration(segment.GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", annotation.GetConfidence())

		// Here we print only the bounding box of the first frame in this segment.
		frame := annotation.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		box := frame.GetNormalizedBoundingBox()
		fmt.Fprintf(w, "\tBounding box position:\n")
		fmt.Fprintf(w, "\t\tleft  : %f\n", box.GetLeft())
		fmt.Fprintf(w, "\t\ttop   : %f\n", box.GetTop())
		fmt.Fprintf(w, "\t\tright : %f\n", box.GetRight())
		fmt.Fprintf(w, "\t\tbottom: %f\n", box.GetBottom())
	}

	return nil
}

Java

/**
 * Track objects in a video.
 *
 * @param filePath the path to the video file to analyze.
 */
public static VideoAnnotationResults trackObjects(String filePath) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Read file
    Path path = Paths.get(filePath);
    byte[] data = Files.readAllBytes(path);

    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputContent(ByteString.copyFrom(data))
            .addFeatures(Feature.OBJECT_TRACKING)
            .setLocationId("us-east1")
            .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(450, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    ObjectTrackingAnnotation annotation = results.getObjectAnnotations(0);
    System.out.println("Confidence: " + annotation.getConfidence());

    if (annotation.hasEntity()) {
      Entity entity = annotation.getEntity();
      System.out.println("Entity description: " + entity.getDescription());
      System.out.println("Entity id:: " + entity.getEntityId());
    }

    if (annotation.hasSegment()) {
      VideoSegment videoSegment = annotation.getSegment();
      Duration startTimeOffset = videoSegment.getStartTimeOffset();
      Duration endTimeOffset = videoSegment.getEndTimeOffset();
      // Display the segment time in seconds, 1e9 converts nanos to seconds
      System.out.println(
          String.format(
              "Segment: %.2fs to %.2fs",
              startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9,
              endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));
    }

    // Here we print only the bounding box of the first frame in this segment.
    ObjectTrackingFrame frame = annotation.getFrames(0);
    // Display the offset time in seconds, 1e9 converts nanos to seconds
    Duration timeOffset = frame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset of the first frame: %.2fs",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the bounding box of the detected object
    NormalizedBoundingBox normalizedBoundingBox = frame.getNormalizedBoundingBox();
    System.out.println("Bounding box position:");
    System.out.println("\tleft: " + normalizedBoundingBox.getLeft());
    System.out.println("\ttop: " + normalizedBoundingBox.getTop());
    System.out.println("\tright: " + normalizedBoundingBox.getRight());
    System.out.println("\tbottom: " + normalizedBoundingBox.getBottom());
    return results;
  }
}

Node.js

Para autenticar na Video Intelligence, configure o Application Default Credentials. Para mais informações, consulte Configurar a autenticação para um ambiente de desenvolvimento local.

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence');
const fs = require('fs');
const util = require('util');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();
/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const path = 'Local file to analyze, e.g. ./my-file.mp4';

// Reads a local video file and converts it to base64
const file = await util.promisify(fs.readFile)(path);
const inputContent = file.toString('base64');

const request = {
  inputContent: inputContent,
  features: ['OBJECT_TRACKING'],
  //recommended to use us-east1 for the best latency due to different types of processors used in this region and others
  locationId: 'us-east1',
};
// Detects objects in a video
const [operation] = await video.annotateVideo(request);
const results = await operation.promise();
console.log('Waiting for operation to complete...');
//Gets annotations for video
const annotations = results[0].annotationResults[0];
const objects = annotations.objectAnnotations;
objects.forEach(object => {
  console.log(`Entity description:  ${object.entity.description}`);
  console.log(`Entity id: ${object.entity.entityId}`);
  const time = object.segment;
  console.log(
    `Segment: ${time.startTimeOffset.seconds || 0}` +
      `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s to ${
        time.endTimeOffset.seconds || 0
      }.` +
      `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Confidence: ${object.confidence}`);
  const frame = object.frames[0];
  const box = frame.normalizedBoundingBox;
  const timeOffset = frame.timeOffset;
  console.log(
    `Time offset for the first frame: ${timeOffset.seconds || 0}` +
      `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log('Bounding box position:');
  console.log(` left   :${box.left}`);
  console.log(` top    :${box.top}`);
  console.log(` right  :${box.right}`);
  console.log(` bottom :${box.bottom}`);
});

Python

"""Object tracking in a local video."""
from google.cloud import videointelligence

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.OBJECT_TRACKING]

with io.open(path, "rb") as file:
    input_content = file.read()

operation = video_client.annotate_video(
    request={"features": features, "input_content": input_content}
)
print("\nProcessing video for object annotations.")

result = operation.result(timeout=500)
print("\nFinished processing.\n")

# The first result is retrieved because a single video was processed.
object_annotations = result.annotation_results[0].object_annotations

# Get only the first annotation for demo purposes.
object_annotation = object_annotations[0]
print("Entity description: {}".format(object_annotation.entity.description))
if object_annotation.entity.entity_id:
    print("Entity id: {}".format(object_annotation.entity.entity_id))

print(
    "Segment: {}s to {}s".format(
        object_annotation.segment.start_time_offset.seconds
        + object_annotation.segment.start_time_offset.microseconds / 1e6,
        object_annotation.segment.end_time_offset.seconds
        + object_annotation.segment.end_time_offset.microseconds / 1e6,
    )
)

print("Confidence: {}".format(object_annotation.confidence))

# Here we print only the bounding box of the first frame in this segment
frame = object_annotation.frames[0]
box = frame.normalized_bounding_box
print(
    "Time offset of the first frame: {}s".format(
        frame.time_offset.seconds + frame.time_offset.microseconds / 1e6
    )
)
print("Bounding box position:")
print("\tleft  : {}".format(box.left))
print("\ttop   : {}".format(box.top))
print("\tright : {}".format(box.right))
print("\tbottom: {}".format(box.bottom))
print("\n")

Outras linguagens

C#: Siga as Instruções de configuração do C# na página das bibliotecas de cliente e acesse a Documentação de referência do Video Intelligence para .NET.

PHP: Siga as Instruções de configuração do PHP na página das bibliotecas de cliente e acesse a Documentação de referência do Video Intelligence para PHP.

Ruby: Siga as Instruções de configuração do Ruby na página das bibliotecas de cliente e acesse a Documentação de referência do Video Intelligence para Ruby.