Track objects

Object tracking tracks objects detected in an input video. To make an object tracking request, call the annotate method and specify OBJECT_TRACKING in the features field.

For entities and spatial locations that are detected in a video or video segments, an object tracking request annotates the video with the appropriate labels for these entities and spatial locations. For example, a video of vehicles crossing a traffic signal might produce labels such as “car”, “truck”, “bike”, “brakes”, “lights”, “window”, and so on. Each label can include a series of bounding boxes, each with an associated time segment containing a time offset that indicates the offset from the beginning of the video. The annotation also contains additional information about the entity, including an entity ID that you can use to find more information about the entity in the Google Knowledge Graph Search API.
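To make the bounding boxes and time offsets concrete, the following minimal Python sketch converts one frame entry, in the JSON shape used by the REST response shown later on this page, into pixel coordinates and a time in seconds. The helper names (parse_offset, to_pixels) and the 1280x720 frame size are illustrative assumptions, not part of the API. The sketch also assumes that a box edge missing from the JSON means 0, since the sample response omits some zero-valued fields.

```python
def parse_offset(offset: str) -> float:
    """Convert a duration string such as '2.311402s' to seconds."""
    if not offset.endswith("s"):
        raise ValueError(f"unexpected duration format: {offset!r}")
    return float(offset[:-1])


def to_pixels(box: dict, width: int, height: int) -> tuple:
    """Scale a normalized box (values in [0, 1]) to pixel coordinates."""
    # Assumption: an edge that is absent from the JSON is treated as 0.
    left = round(box.get("left", 0.0) * width)
    top = round(box.get("top", 0.0) * height)
    right = round(box.get("right", 0.0) * width)
    bottom = round(box.get("bottom", 0.0) * height)
    return left, top, right, bottom


# One frame entry copied from the sample response on this page.
frame = {
    "normalizedBoundingBox": {
        "left": 0.2672763,
        "top": 0.5677657,
        "right": 0.4388713,
        "bottom": 0.7623171,
    },
    "timeOffset": "0s",
}

# The 1280x720 frame size is a hypothetical value chosen for illustration.
pixel_box = to_pixels(frame["normalizedBoundingBox"], width=1280, height=720)
seconds = parse_offset(frame["timeOffset"])
print(pixel_box, seconds)
```

Because the coordinates are normalized, the same annotation can be drawn on any rendition of the video by passing that rendition's width and height.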

Object tracking vs. label detection

Object tracking differs from label detection. Label detection provides labels without bounding boxes, whereas object tracking provides labels for the individual objects present in a given video, along with the bounding box of each object instance at each time step.

Multiple instances of the same object type are assigned to different instances of the ObjectTrackingAnnotation message, where all occurrences of a given object track are kept in their own ObjectTrackingAnnotation instance. For example, if a red car and a blue car appear for 5 seconds in a video, the tracking request should return two ObjectTrackingAnnotation instances. The first instance contains the locations of one of the two cars, for example the red car, and the second contains the locations of the other car.
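As a rough illustration of the one-track-per-instance behavior, the following Python sketch groups a list of annotations by entity description so that several tracks of the same object type stay distinguishable. The annotation dictionaries are made-up data in the JSON shape of the REST response, and tracks_by_description is a hypothetical helper, not an API call.

```python
from collections import defaultdict


def tracks_by_description(annotations: list) -> dict:
    """Map each entity description to the indexes of its separate tracks."""
    grouped = defaultdict(list)
    for index, annotation in enumerate(annotations):
        description = annotation.get("entity", {}).get("description", "")
        grouped[description].append(index)
    return dict(grouped)


# Made-up data: two cars (for example, a red one and a blue one) plus a building.
annotations = [
    {"entity": {"entityId": "/m/0k4j", "description": "car"},
     "segment": {"startTimeOffset": "0s", "endTimeOffset": "5s"}},
    {"entity": {"entityId": "/m/0k4j", "description": "car"},
     "segment": {"startTimeOffset": "0s", "endTimeOffset": "5s"}},
    {"entity": {"entityId": "/m/0cgh4", "description": "building"},
     "segment": {"startTimeOffset": "0s", "endTimeOffset": "5s"}},
]

grouped = tracks_by_description(annotations)
for description, indexes in grouped.items():
    print(f"{description}: {len(indexes)} track(s) at indexes {indexes}")
```

Both cars share an entity ID and description, so the annotation's position in the list (or its own frames) is what tells the two tracks apart.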

Request object tracking for a video on Cloud Storage

The following samples demonstrate object tracking on a file located in Cloud Storage.

REST

Send the process request

The following shows how to send a POST request to the annotate method. The example uses the access token for a service account set up for the project using the Google Cloud CLI. For instructions on installing the Google Cloud CLI, setting up a project with a service account, and obtaining an access token, see the Video Intelligence quickstart.

Before using any of the request data, make the following replacements:

STORAGE_URI: the Cloud Storage URI of the video to annotate, supplied as the value of the inputUri field.
For example:
"inputUri": "gs://cloud-videointelligence-demo/assistant.mp4",
PROJECT_NUMBER: the numeric identifier for your Google Cloud project

HTTP method and URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

Request JSON body:

{
  "inputUri": "STORAGE_URI",
  "features": ["OBJECT_TRACKING"]
}
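If you prefer to generate the request body rather than write it by hand, a minimal Python sketch such as the following could produce the same JSON. The function name build_request_body is a hypothetical helper, and the gs:// prefix check is a convenience assumption of this sketch, not server-side validation.

```python
import json


def build_request_body(storage_uri: str) -> dict:
    """Build the videos:annotate body for a video stored in Cloud Storage."""
    # Cloud Storage URIs have the form gs://bucket-id/object-id.
    if not storage_uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI starting with gs://")
    return {"inputUri": storage_uri, "features": ["OBJECT_TRACKING"]}


body = build_request_body("gs://cloud-videointelligence-demo/assistant.mp4")
request_json = json.dumps(body, indent=2)
print(request_json)
```

Writing request_json to a file named request.json gives the file that the commands in this section read.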

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. To check the currently active account, run gcloud auth list.

Save the request body in a file named request.json, and run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. To check the currently active account, run gcloud auth list.

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

If the request is successful, the Video Intelligence API returns the name for your operation. The example above shows such a response, where PROJECT_NUMBER is the number of your project and OPERATION_ID is the ID of the long-running operation created for the request.
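The operation name doubles as the path of the URL used to fetch results. As a small sketch, the following Python code splits a name of the documented format into its parts and builds that URL. The helper names and the sample values (123456789, us-east1, 987654321) are hypothetical.

```python
import re

# Pattern for projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID.
OPERATION_NAME_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/operations/(?P<operation>[^/]+)$"
)


def parse_operation_name(name: str) -> dict:
    """Split an operation name into project, location, and operation ID."""
    match = OPERATION_NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected operation name: {name!r}")
    return match.groupdict()


def operation_url(name: str) -> str:
    """Build the URL for the GET request that retrieves the operation."""
    return f"https://videointelligence.googleapis.com/v1/{name}"


# Hypothetical operation name used only for illustration.
name = "projects/123456789/locations/us-east1/operations/987654321"
parts = parse_operation_name(name)
url = operation_url(name)
print(parts)
print(url)
```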

Get the results

To get the results of your request, send a GET request using the operation name returned from the call to videos:annotate, as shown in the following example.

Before using any of the request data, make the following replacements:

OPERATION_NAME: the name of the operation as returned by the Video Intelligence API. The operation name has the format projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID.
PROJECT_NUMBER: the numeric identifier for your Google Cloud project

HTTP method and URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content
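Because annotation runs as a long-running operation, the GET request may need to be repeated until the returned operation contains "done": true. The following Python sketch shows one way to structure such a polling loop. The function wait_for_operation, its parameters, and the fake responses are hypothetical. The fetch callable stands in for whatever code performs the authenticated GET request and returns the parsed JSON.

```python
import time
from typing import Callable


def wait_for_operation(
    fetch: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the operation reports done, then return it."""
    for attempt in range(max_attempts):
        operation = fetch()
        if operation.get("done"):
            # A finished operation carries either a response or an error.
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"operation not done after {max_attempts} attempts")


# Fake responses standing in for two successive GET requests.
fake_responses = iter([
    {"name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"},
    {
        "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID",
        "done": True,
        "response": {"annotationResults": []},
    },
])

# The sleep function is replaced so that the demonstration does not wait.
result = wait_for_operation(lambda: next(fake_responses), sleep=lambda _: None)
print(result["done"])
```

Passing the fetch and sleep functions as arguments keeps the loop independent of any particular HTTP library.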

You should receive a JSON response similar to the following:

Response

// Object tracking annotations are returned as an objectAnnotations list.
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress",
    "annotationProgress": [
      {
        "inputUri": "/cloud-ml-sandbox/video/chicago.mp4",
        "progressPercent": 100,
        "startTime": "2019-12-21T16:56:46.755199Z",
        "updateTime": "2019-12-21T16:59:17.911197Z"
      }
    ]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
    "annotationResults": [
      {
        "inputUri": "/cloud-ml-sandbox/video/chicago.mp4",
        "objectAnnotations": [
          {
            "entity": {
              "entityId": "/m/0k4j",
              "description": "car",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.2672763,
                  "top": 0.5677657,
                  "right": 0.4388713,
                  "bottom": 0.7623171
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.26920167,
                  "top": 0.5659805,
                  "right": 0.44331276,
                  "bottom": 0.76780635
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.83573246,
                  "top": 0.6645812,
                  "right": 1,
                  "bottom": 0.99865407
                },
                "timeOffset": "2.311402s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "2.311402s"
            },
            "confidence": 0.99488896
          },
        ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.010383379,
                  "right": 0.21914443,
                  "bottom": 0.5591795
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.009684974,
                  "right": 0.22915152,
                  "bottom": 0.56070584
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.008624528,
                  "right": 0.22723165,
                  "bottom": 0.56158626
                },
                "timeOffset": "0.401983s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "0.401983s"
            },
            "confidence": 0.33914912
          },
       ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.79324204,
                  "top": 0.0006896425,
                  "right": 0.99659824,
                  "bottom": 0.5324423
                },
                "timeOffset": "37.585421s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.78935236,
                  "top": 0.0011992548,
                  "right": 0.99659824,
                  "bottom": 0.5374946
                },
                "timeOffset": "37.685917s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.79404694,
                  "right": 0.99659824,
                  "bottom": 0.5280966
                },
                "timeOffset": "38.590379s"
              }
            ],
            "segment": {
              "startTimeOffset": "37.585421s",
              "endTimeOffset": "38.590379s"
            },
            "confidence": 0.3415429
          }
        ]
      }
    ]
  }
}
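A response like the one above can be reduced to one summary line per track. The following Python sketch assumes the operation has already been parsed from JSON into a dictionary. The function summarize_tracks is a hypothetical helper, and the sample input is a trimmed copy of the response above that keeps only two frames per track.

```python
def parse_offset(offset: str) -> float:
    """Convert a duration string such as '2.311402s' to seconds."""
    return float(offset.rstrip("s"))


def summarize_tracks(operation: dict) -> list:
    """Return one summary dictionary per object track in a finished operation."""
    if not operation.get("done"):
        raise ValueError("operation has not finished yet")
    summaries = []
    for result in operation["response"]["annotationResults"]:
        for annotation in result.get("objectAnnotations", []):
            segment = annotation["segment"]
            start = parse_offset(segment["startTimeOffset"])
            end = parse_offset(segment["endTimeOffset"])
            summaries.append({
                "description": annotation["entity"]["description"],
                "confidence": annotation["confidence"],
                "duration": round(end - start, 6),
                "frames": len(annotation["frames"]),
            })
    return summaries


# Trimmed copy of the sample response: bounding boxes and most frames omitted.
operation = {
    "done": True,
    "response": {
        "annotationResults": [{
            "objectAnnotations": [
                {
                    "entity": {"entityId": "/m/0k4j", "description": "car"},
                    "frames": [{"timeOffset": "0s"}, {"timeOffset": "2.311402s"}],
                    "segment": {"startTimeOffset": "0s", "endTimeOffset": "2.311402s"},
                    "confidence": 0.99488896,
                },
                {
                    "entity": {"entityId": "/m/0cgh4", "description": "building"},
                    "frames": [{"timeOffset": "37.585421s"}, {"timeOffset": "38.590379s"}],
                    "segment": {"startTimeOffset": "37.585421s", "endTimeOffset": "38.590379s"},
                    "confidence": 0.3415429,
                },
            ]
        }]
    },
}

summaries = summarize_tracks(operation)
for summary in summaries:
    print(summary)
```

A summary of this kind makes it easy to drop low-confidence tracks, such as the two building tracks in the sample response, before further processing.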

Download annotation results

Copy the annotation from the source to the destination bucket (see Copy files and objects):

gcloud storage cp gcs_uri gs://my-bucket

Note: If you provide an output Cloud Storage URI in the request, the annotation is stored at that Cloud Storage URI.

Go


import (
	"context"
	"fmt"
	"io"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// objectTrackingGCS analyzes a video and extracts entities with their bounding boxes.
func objectTrackingGCS(w io.Writer, gcsURI string) error {
	// gcsURI := "gs://cloud-samples-data/video/cat.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputUri: gcsURI,
		Features: []videopb.Feature{
			videopb.Feature_OBJECT_TRACKING,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.ObjectAnnotations {
		fmt.Fprintf(w, "Description: %q\n", annotation.Entity.GetDescription())
		if len(annotation.Entity.EntityId) > 0 {
			fmt.Fprintf(w, "\tEntity ID: %q\n", annotation.Entity.GetEntityId())
		}

		segment := annotation.GetSegment()
		start, _ := ptypes.Duration(segment.GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", annotation.GetConfidence())

		// Here we print only the bounding box of the first frame in this segment.
		frame := annotation.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		box := frame.GetNormalizedBoundingBox()
		fmt.Fprintf(w, "\tBounding box position:\n")
		fmt.Fprintf(w, "\t\tleft  : %f\n", box.GetLeft())
		fmt.Fprintf(w, "\t\ttop   : %f\n", box.GetTop())
		fmt.Fprintf(w, "\t\tright : %f\n", box.GetRight())
		fmt.Fprintf(w, "\t\tbottom: %f\n", box.GetBottom())
	}

	return nil
}

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.videointelligence.v1.AnnotateVideoProgress;
import com.google.cloud.videointelligence.v1.AnnotateVideoRequest;
import com.google.cloud.videointelligence.v1.AnnotateVideoResponse;
import com.google.cloud.videointelligence.v1.Entity;
import com.google.cloud.videointelligence.v1.Feature;
import com.google.cloud.videointelligence.v1.NormalizedBoundingBox;
import com.google.cloud.videointelligence.v1.ObjectTrackingAnnotation;
import com.google.cloud.videointelligence.v1.ObjectTrackingFrame;
import com.google.cloud.videointelligence.v1.VideoAnnotationResults;
import com.google.cloud.videointelligence.v1.VideoIntelligenceServiceClient;
import com.google.cloud.videointelligence.v1.VideoSegment;
import com.google.protobuf.Duration;
import java.util.concurrent.TimeUnit;

/**
 * Track objects in a video.
 *
 * @param gcsUri the path to the video file to analyze.
 */
public static VideoAnnotationResults trackObjectsGcs(String gcsUri) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputUri(gcsUri)
            .addFeatures(Feature.OBJECT_TRACKING)
            .setLocationId("us-east1")
            .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(450, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    ObjectTrackingAnnotation annotation = results.getObjectAnnotations(0);
    System.out.println("Confidence: " + annotation.getConfidence());

    if (annotation.hasEntity()) {
      Entity entity = annotation.getEntity();
      System.out.println("Entity description: " + entity.getDescription());
      System.out.println("Entity id: " + entity.getEntityId());
    }

    if (annotation.hasSegment()) {
      VideoSegment videoSegment = annotation.getSegment();
      Duration startTimeOffset = videoSegment.getStartTimeOffset();
      Duration endTimeOffset = videoSegment.getEndTimeOffset();
      // Display the segment time in seconds, 1e9 converts nanos to seconds
      System.out.println(
          String.format(
              "Segment: %.2fs to %.2fs",
              startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9,
              endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));
    }

    // Here we print only the bounding box of the first frame in this segment.
    ObjectTrackingFrame frame = annotation.getFrames(0);
    // Display the offset time in seconds, 1e9 converts nanos to seconds
    Duration timeOffset = frame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset of the first frame: %.2fs",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the bounding box of the detected object
    NormalizedBoundingBox normalizedBoundingBox = frame.getNormalizedBoundingBox();
    System.out.println("Bounding box position:");
    System.out.println("\tleft: " + normalizedBoundingBox.getLeft());
    System.out.println("\ttop: " + normalizedBoundingBox.getTop());
    System.out.println("\tright: " + normalizedBoundingBox.getRight());
    System.out.println("\tbottom: " + normalizedBoundingBox.getBottom());
    return results;
  }
}

Node.js

To authenticate to Video Intelligence, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence');

// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

const request = {
  inputUri: gcsUri,
  features: ['OBJECT_TRACKING'],
  //recommended to use us-east1 for the best latency due to different types of processors used in this region and others
  locationId: 'us-east1',
};
// Detects objects in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();
// Gets annotations for video
const annotations = results[0].annotationResults[0];
const objects = annotations.objectAnnotations;
objects.forEach(object => {
  console.log(`Entity description:  ${object.entity.description}`);
  console.log(`Entity id: ${object.entity.entityId}`);
  const time = object.segment;
  console.log(
    `Segment: ${time.startTimeOffset.seconds || 0}` +
      `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s to ${
        time.endTimeOffset.seconds || 0
      }.` +
      `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log(`Confidence: ${object.confidence}`);
  const frame = object.frames[0];
  const box = frame.normalizedBoundingBox;
  const timeOffset = frame.timeOffset;
  console.log(
    `Time offset for the first frame: ${timeOffset.seconds || 0}` +
      `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
  );
  console.log('Bounding box position:');
  console.log(` left   :${box.left}`);
  console.log(` top    :${box.top}`);
  console.log(` right  :${box.right}`);
  console.log(` bottom :${box.bottom}`);
});

Python

"""Object tracking in a video stored on GCS."""
from google.cloud import videointelligence

# TODO(developer): Set gcs_uri before running the sample.
# gcs_uri = "gs://my-bucket/my-video.mp4"

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.OBJECT_TRACKING]
operation = video_client.annotate_video(
    request={"features": features, "input_uri": gcs_uri}
)
print("\nProcessing video for object annotations.")

result = operation.result(timeout=500)
print("\nFinished processing.\n")

# The first result is retrieved because a single video was processed.
object_annotations = result.annotation_results[0].object_annotations

for object_annotation in object_annotations:
    print("Entity description: {}".format(object_annotation.entity.description))
    if object_annotation.entity.entity_id:
        print("Entity id: {}".format(object_annotation.entity.entity_id))

    print(
        "Segment: {}s to {}s".format(
            object_annotation.segment.start_time_offset.seconds
            + object_annotation.segment.start_time_offset.microseconds / 1e6,
            object_annotation.segment.end_time_offset.seconds
            + object_annotation.segment.end_time_offset.microseconds / 1e6,
        )
    )

    print("Confidence: {}".format(object_annotation.confidence))

    # Here we print only the bounding box of the first frame in the segment
    frame = object_annotation.frames[0]
    box = frame.normalized_bounding_box
    print(
        "Time offset of the first frame: {}s".format(
            frame.time_offset.seconds + frame.time_offset.microseconds / 1e6
        )
    )
    print("Bounding box position:")
    print("\tleft  : {}".format(box.left))
    print("\ttop   : {}".format(box.top))
    print("\tright : {}".format(box.right))
    print("\tbottom: {}".format(box.bottom))
    print("\n")

Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for Ruby.

Request object tracking for a video from a local file

The following samples demonstrate object tracking on a file stored locally.

REST

Send the process request

To annotate a local video file, base64-encode the contents of the video file. Include the base64-encoded contents in the inputContent field of the request. For information on how to base64-encode the contents of a video file, see Base64 encoding.
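As one possible way to prepare such a request, the following Python sketch base64-encodes raw bytes and places them in the inputContent field. The function build_local_request_body is a hypothetical helper, and the four sample bytes stand in for the contents of a real video file.

```python
import base64
import json


def build_local_request_body(video_bytes: bytes) -> dict:
    """Build the videos:annotate body for video content sent inline."""
    # JSON cannot carry raw bytes, so the content is sent as base64 text.
    encoded = base64.b64encode(video_bytes).decode("ascii")
    return {"inputContent": encoded, "features": ["OBJECT_TRACKING"]}


# Stand-in for real file contents, which could be read with
# pathlib.Path("my-video.mp4").read_bytes() (hypothetical file name).
sample_bytes = b"RIFF"

body = build_local_request_body(sample_bytes)
print(json.dumps(body, indent=2))
```

The whole file is encoded into a single string, so the request body grows to roughly four thirds of the video's size.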

The following shows how to send a POST request to the videos:annotate method. The example uses the access token for a service account set up for the project using the Google Cloud CLI. For instructions on installing the Google Cloud CLI, setting up a project with a service account, and obtaining an access token, see the Video Intelligence quickstart.

Before using any of the request data, make the following replacements:

BASE64_ENCODED_CONTENT: the base64-encoded contents of the video file, supplied as the value of the inputContent field.
For example: "UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA..."
PROJECT_NUMBER: the numeric identifier for your Google Cloud project

HTTP method and URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

Request JSON body:

{
  "inputContent": "BASE64_ENCODED_CONTENT",
  "features": ["OBJECT_TRACKING"]
}

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and run the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

If the request is successful, Video Intelligence returns the name for your operation. The example above shows such a response, where PROJECT_NUMBER is the number of your project and OPERATION_ID is the ID of the long-running operation created for the request.

Get the results

To get the results of your request, send a GET request using the operation name returned from the call to videos:annotate, as shown in the following example.

Before using any of the request data, make the following replacements:

OPERATION_NAME: the name of the operation as returned by the Video Intelligence API. The operation name has the format projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID.
PROJECT_NUMBER: the numeric identifier for your Google Cloud project

HTTP method and URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

// Object tracking annotations are returned as an objectAnnotations list.
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoProgress",
    "annotationProgress": [
      {
        "inputContent": "UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA...",
        "progressPercent": 100,
        "startTime": "2018-06-21T16:56:46.755199Z",
        "updateTime": "2018-06-21T16:59:17.911197Z"
      }
    ]
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.videointelligence.v1.AnnotateVideoResponse",
    "annotationResults": [
      {
        "inputContent": "/cloud-ml-sandbox/video/chicago.mp4",
        "objectAnnotations": [
          {
            "entity": {
              "entityId": "/m/0k4j",
              "description": "car",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.2672763,
                  "top": 0.5677657,
                  "right": 0.4388713,
                  "bottom": 0.7623171
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.26920167,
                  "top": 0.5659805,
                  "right": 0.44331276,
                  "bottom": 0.76780635
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.83573246,
                  "top": 0.6645812,
                  "right": 1,
                  "bottom": 0.99865407
                },
                "timeOffset": "2.311402s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "2.311402s"
            },
            "confidence": 0.99488896
          },
        ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.010383379,
                  "right": 0.21914443,
                  "bottom": 0.5591795
                },
                "timeOffset": "0s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.009684974,
                  "right": 0.22915152,
                  "bottom": 0.56070584
                },
                "timeOffset": "0.100495s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.12340179,
                  "top": 0.008624528,
                  "right": 0.22723165,
                  "bottom": 0.56158626
                },
                "timeOffset": "0.401983s"
              }
            ],
            "segment": {
              "startTimeOffset": "0s",
              "endTimeOffset": "0.401983s"
            },
            "confidence": 0.33914912
          },
       ...
          {
            "entity": {
              "entityId": "/m/0cgh4",
              "description": "building",
              "languageCode": "en-US"
            },
            "frames": [
              {
                "normalizedBoundingBox": {
                  "left": 0.79324204,
                  "top": 0.0006896425,
                  "right": 0.99659824,
                  "bottom": 0.5324423
                },
                "timeOffset": "37.585421s"
              },
              {
                "normalizedBoundingBox": {
                  "left": 0.78935236,
                  "top": 0.0011992548,
                  "right": 0.99659824,
                  "bottom": 0.5374946
                },
                "timeOffset": "37.685917s"
              },
           ...
              {
                "normalizedBoundingBox": {
                  "left": 0.79404694,
                  "right": 0.99659824,
                  "bottom": 0.5280966
                },
                "timeOffset": "38.590379s"
              }
            ],
            "segment": {
              "startTimeOffset": "37.585421s",
              "endTimeOffset": "38.590379s"
            },
            "confidence": 0.3415429
          }
        ]
      }
    ]
  }
}

Go


import (
	"context"
	"fmt"
	"io"
	"os"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// objectTracking analyzes a video and extracts entities with their bounding boxes.
func objectTracking(w io.Writer, filename string) error {
	// filename := "../testdata/cat.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	fileBytes, err := os.ReadFile(filename)
	if err != nil {
		return err
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputContent: fileBytes,
		Features: []videopb.Feature{
			videopb.Feature_OBJECT_TRACKING,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.ObjectAnnotations {
		fmt.Fprintf(w, "Description: %q\n", annotation.Entity.GetDescription())
		if len(annotation.Entity.EntityId) > 0 {
			fmt.Fprintf(w, "\tEntity ID: %q\n", annotation.Entity.GetEntityId())
		}

		segment := annotation.GetSegment()
		start, _ := ptypes.Duration(segment.GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", annotation.GetConfidence())

		// Here we print only the bounding box of the first frame in this segment.
		frame := annotation.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		box := frame.GetNormalizedBoundingBox()
		fmt.Fprintf(w, "\tBounding box position:\n")
		fmt.Fprintf(w, "\t\tleft  : %f\n", box.GetLeft())
		fmt.Fprintf(w, "\t\ttop   : %f\n", box.GetTop())
		fmt.Fprintf(w, "\t\tright : %f\n", box.GetRight())
		fmt.Fprintf(w, "\t\tbottom: %f\n", box.GetBottom())
	}

	return nil
}

Java

/**
 * Track objects in a video.
 *
 * @param filePath the path to the video file to analyze.
 */
public static VideoAnnotationResults trackObjects(String filePath) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Read file
    Path path = Paths.get(filePath);
    byte[] data = Files.readAllBytes(path);

    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputContent(ByteString.copyFrom(data))
            .addFeatures(Feature.OBJECT_TRACKING)
            .setLocationId("us-east1")
            .build();

    // Asynchronously perform object tracking on the video.
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(450, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    ObjectTrackingAnnotation annotation = results.getObjectAnnotations(0);
    System.out.println("Confidence: " + annotation.getConfidence());

    if (annotation.hasEntity()) {
      Entity entity = annotation.getEntity();
      System.out.println("Entity description: " + entity.getDescription());
      System.out.println("Entity id: " + entity.getEntityId());
    }

    if (annotation.hasSegment()) {
      VideoSegment videoSegment = annotation.getSegment();
      Duration startTimeOffset = videoSegment.getStartTimeOffset();
      Duration endTimeOffset = videoSegment.getEndTimeOffset();
      // Display the segment time in seconds, 1e9 converts nanos to seconds
      System.out.println(
          String.format(
              "Segment: %.2fs to %.2fs",
              startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9,
              endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));
    }

    // Here we print only the bounding box of the first frame in this segment.
    ObjectTrackingFrame frame = annotation.getFrames(0);
    // Display the offset time in seconds, 1e9 converts nanos to seconds
    Duration timeOffset = frame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset of the first frame: %.2fs",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the bounding box of the detected object
    NormalizedBoundingBox normalizedBoundingBox = frame.getNormalizedBoundingBox();
    System.out.println("Bounding box position:");
    System.out.println("\tleft: " + normalizedBoundingBox.getLeft());
    System.out.println("\ttop: " + normalizedBoundingBox.getTop());
    System.out.println("\tright: " + normalizedBoundingBox.getRight());
    System.out.println("\tbottom: " + normalizedBoundingBox.getBottom());
    return results;
  }
}

Node.js

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence');
const fs = require('fs');
const util = require('util');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();
/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const path = 'Local file to analyze, e.g. ./my-file.mp4';

// Reads a local video file and converts it to base64
const file = await util.promisify(fs.readFile)(path);
const inputContent = file.toString('base64');

const request = {
  inputContent: inputContent,
  features: ['OBJECT_TRACKING'],
  // us-east1 is recommended for the best latency, because this region uses
  // different processor types than other regions.
  locationId: 'us-east1',
};
// Detects objects in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();
// Gets annotations for the video
const annotations = results[0].annotationResults[0];
const objects = annotations.objectAnnotations;
objects.forEach(object => {
  console.log(`Entity description: ${object.entity.description}`);
  console.log(`Entity id: ${object.entity.entityId}`);
  const time = object.segment;
  // Convert each offset to fractional seconds; seconds and nanos may be unset.
  const segmentStart =
    Number(time.startTimeOffset.seconds || 0) +
    (time.startTimeOffset.nanos || 0) / 1e9;
  const segmentEnd =
    Number(time.endTimeOffset.seconds || 0) +
    (time.endTimeOffset.nanos || 0) / 1e9;
  console.log(
    `Segment: ${segmentStart.toFixed(3)}s to ${segmentEnd.toFixed(3)}s`
  );
  console.log(`Confidence: ${object.confidence}`);
  const frame = object.frames[0];
  const box = frame.normalizedBoundingBox;
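  const timeOffset = frame.timeOffset;
  // Convert the offset to fractional seconds; seconds and nanos may be unset.
  const firstFrameSeconds =
    Number(timeOffset.seconds || 0) + (timeOffset.nanos || 0) / 1e9;
  console.log(
    `Time offset for the first frame: ${firstFrameSeconds.toFixed(3)}s`
  );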
  console.log('Bounding box position:');
  console.log(` left   :${box.left}`);
  console.log(` top    :${box.top}`);
  console.log(` right  :${box.right}`);
  console.log(` bottom :${box.bottom}`);
});

Python

"""Object tracking in a local video."""
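import io

from google.cloud import videointelligence

# TODO(developer): Uncomment the following line before running the sample.
# path = "Local file to analyze, e.g. ./my-file.mp4"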

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.OBJECT_TRACKING]

with io.open(path, "rb") as file:
    input_content = file.read()

operation = video_client.annotate_video(
    request={"features": features, "input_content": input_content}
)
print("\nProcessing video for object annotations.")

result = operation.result(timeout=500)
print("\nFinished processing.\n")

# The first result is retrieved because a single video was processed.
object_annotations = result.annotation_results[0].object_annotations

# Get only the first annotation for demo purposes.
object_annotation = object_annotations[0]
print("Entity description: {}".format(object_annotation.entity.description))
if object_annotation.entity.entity_id:
    print("Entity id: {}".format(object_annotation.entity.entity_id))

print(
    "Segment: {}s to {}s".format(
        object_annotation.segment.start_time_offset.seconds
        + object_annotation.segment.start_time_offset.microseconds / 1e6,
        object_annotation.segment.end_time_offset.seconds
        + object_annotation.segment.end_time_offset.microseconds / 1e6,
    )
)

print("Confidence: {}".format(object_annotation.confidence))

# Here we print only the bounding box of the first frame in this segment
frame = object_annotation.frames[0]
box = frame.normalized_bounding_box
print(
    "Time offset of the first frame: {}s".format(
        frame.time_offset.seconds + frame.time_offset.microseconds / 1e6
    )
)
print("Bounding box position:")
print("\tleft  : {}".format(box.left))
print("\ttop   : {}".format(box.top))
print("\tright : {}".format(box.right))
print("\tbottom: {}".format(box.bottom))
print("\n")
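
The samples above print each bounding box as normalized values in the range [0, 1], relative to the frame. To draw a box on a frame you usually need pixel coordinates instead. The following is a minimal sketch of that conversion, assuming you obtain the frame width and height yourself. NormalizedBox and to_pixel_box are hypothetical names introduced for illustration and are not part of the client library; NormalizedBox only mimics the left, top, right, and bottom fields printed above.

```python
from dataclasses import dataclass


@dataclass
class NormalizedBox:
    """Hypothetical stand-in exposing the same four fields as the API's box."""

    left: float
    top: float
    right: float
    bottom: float


def to_pixel_box(box, frame_width, frame_height):
    """Convert a normalized box to (left, top, right, bottom) in pixels.

    Assumes each coordinate is a fraction of the frame size. Values are
    clamped to [0, 1] first so a slightly out-of-range box stays in frame.
    """

    def clamp(value):
        return min(max(value, 0.0), 1.0)

    return (
        round(clamp(box.left) * frame_width),
        round(clamp(box.top) * frame_height),
        round(clamp(box.right) * frame_width),
        round(clamp(box.bottom) * frame_height),
    )


if __name__ == "__main__":
    # Example values only; a real box would come from
    # frame.normalized_bounding_box in the sample above.
    example = NormalizedBox(left=0.25, top=0.5, right=0.75, bottom=1.0)
    print(to_pixel_box(example, 1920, 1080))
```

Because to_pixel_box reads only the four attributes, the box object from the Python sample above could be passed to it directly in place of NormalizedBox.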

Additional languages

C#: Follow the C# setup instructions on the Client libraries page, then visit the Video Intelligence reference documentation for .NET.

PHP: Follow the PHP setup instructions on the Client libraries page, then visit the Video Intelligence reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the Client libraries page, then visit the Video Intelligence reference documentation for Ruby.