Detect people

The following code sample demonstrates how to detect people in a video file using Video Intelligence API.

Video Intelligence can detect the presence of humans in a video file and track individuals across a video or video segment.

Person detection from a file in Cloud Storage

The following demonstrates how to send an annotation request to Video Intelligence with the person detection feature.

REST & CMD LINE

Send video annotation request

The following shows how to send a POST request to the videos:annotate method. The example uses the access token for a service account set up for the project using the Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the Video Intelligence API Quickstart. See also PersonDetectionConfig.

Before using any of the request data below, make the following replacements:

  • INPUT_URI: a Cloud Storage bucket that contains the file you want to annotate, including the file name. Must start with gs://.
    For example:
    "inputUri": "gs://cloud-samples-data/video/googlework_short.mp4"

HTTP method and URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

Request JSON body:

{
  "inputUri": "INPUT_URI",
  "features": ["PERSON_DETECTION"],
  "videoContext": {
    "personDetectionConfig": {
      "includeBoundingBoxes": true,
      "includePoseLandmarks": true,
      "includeAttributes": true
     }
  }
}

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

If the response is successful, the Video Intelligence API returns the name for your operation. The above shows an example of such a response, where:

  • PROJECT_NUMBER: the number of your project
  • LOCATION_ID: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, asia-east1. If no region is specified, a region will be determined based on video file location.
  • OPERATION_ID: the ID of the long running operation created for the request and provided in the response when you started the operation, for example 12345...

Get annotation results

To retrieve the result of the operation, make a GET request, using the operation name returned from the call to videos:annotate, as shown in the following example.

Before using any of the request data below, make the following replacements:

  • OPERATION_NAME: the name of the operation as returned by Video Intelligence API. The operation name has the format projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID

HTTP method and URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

Shot detection annotations are returned as a shotAnnotations list. Note: The done field is only returned when its value is True. It's not included in responses for which the operation has not completed.

Download annotation results

Copy the annotation from the source to the destination bucket: (see Copy files and objects)

gsutil cp gcs_uri gs://my-bucket

Note: If the output gcs uri is provided by the user, then the annotation is stored in that gcs uri.

Java


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.videointelligence.v1.AnnotateVideoProgress;
import com.google.cloud.videointelligence.v1.AnnotateVideoRequest;
import com.google.cloud.videointelligence.v1.AnnotateVideoResponse;
import com.google.cloud.videointelligence.v1.DetectedAttribute;
import com.google.cloud.videointelligence.v1.DetectedLandmark;
import com.google.cloud.videointelligence.v1.Feature;
import com.google.cloud.videointelligence.v1.PersonDetectionAnnotation;
import com.google.cloud.videointelligence.v1.PersonDetectionConfig;
import com.google.cloud.videointelligence.v1.TimestampedObject;
import com.google.cloud.videointelligence.v1.Track;
import com.google.cloud.videointelligence.v1.VideoAnnotationResults;
import com.google.cloud.videointelligence.v1.VideoContext;
import com.google.cloud.videointelligence.v1.VideoIntelligenceServiceClient;
import com.google.cloud.videointelligence.v1.VideoSegment;

public class DetectPersonGcs {

  public static void detectPersonGcs() throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String gcsUri = "gs://cloud-samples-data/video/googlework_short.mp4";
    detectPersonGcs(gcsUri);
  }

  // Detects people in a video stored in Google Cloud Storage using
  // the Cloud Video Intelligence API.
  public static void detectPersonGcs(String gcsUri) throws Exception {
    try (VideoIntelligenceServiceClient videoIntelligenceServiceClient =
        VideoIntelligenceServiceClient.create()) {
      // Reads a local video file and converts it to base64.

      PersonDetectionConfig personDetectionConfig =
          PersonDetectionConfig.newBuilder()
              // Must set includeBoundingBoxes to true to get poses and attributes.
              .setIncludeBoundingBoxes(true)
              .setIncludePoseLandmarks(true)
              .setIncludeAttributes(true)
              .build();
      VideoContext videoContext =
          VideoContext.newBuilder().setPersonDetectionConfig(personDetectionConfig).build();

      AnnotateVideoRequest request =
          AnnotateVideoRequest.newBuilder()
              .setInputUri(gcsUri)
              .addFeatures(Feature.PERSON_DETECTION)
              .setVideoContext(videoContext)
              .build();

      // Detects people in a video
      OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
          videoIntelligenceServiceClient.annotateVideoAsync(request);

      System.out.println("Waiting for operation to complete...");
      AnnotateVideoResponse response = future.get();
      // Get the first response, since we sent only one video.
      VideoAnnotationResults annotationResult = response.getAnnotationResultsList().get(0);

      // Annotations for list of people detected, tracked and recognized in video.
      for (PersonDetectionAnnotation personDetectionAnnotation :
          annotationResult.getPersonDetectionAnnotationsList()) {
        System.out.print("Person detected:\n");
        for (Track track : personDetectionAnnotation.getTracksList()) {
          VideoSegment segment = track.getSegment();
          System.out.printf(
              "\tStart: %d.%.0fs\n",
              segment.getStartTimeOffset().getSeconds(),
              segment.getStartTimeOffset().getNanos() / 1e6);
          System.out.printf(
              "\tEnd: %d.%.0fs\n",
              segment.getEndTimeOffset().getSeconds(), segment.getEndTimeOffset().getNanos() / 1e6);

          // Each segment includes timestamped objects that include characteristic--e.g. clothes,
          // posture of the person detected.
          TimestampedObject firstTimestampedObject = track.getTimestampedObjects(0);

          // Attributes include unique pieces of clothing, poses (i.e., body landmarks)
          // of the person detected.
          for (DetectedAttribute attribute : firstTimestampedObject.getAttributesList()) {
            System.out.printf(
                "\tAttribute: %s; Value: %s\n", attribute.getName(), attribute.getValue());
          }

          // Landmarks in person detection include body parts.
          for (DetectedLandmark attribute : firstTimestampedObject.getLandmarksList()) {
            System.out.printf(
                "\tLandmark: %s; Vertex: %f, %f\n",
                attribute.getName(), attribute.getPoint().getX(), attribute.getPoint().getY());
          }
        }
      }
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

// Imports the Google Cloud Video Intelligence library + Node's fs library
const Video = require('@google-cloud/video-intelligence').v1;

// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

async function detectPersonGCS() {
  const request = {
    inputUri: gcsUri,
    features: ['PERSON_DETECTION'],
    videoContext: {
      personDetectionConfig: {
        // Must set includeBoundingBoxes to true to get poses and attributes.
        includeBoundingBoxes: true,
        includePoseLandmarks: true,
        includeAttributes: true,
      },
    },
  };
  // Detects faces in a video
  // We get the first result because we only process 1 video
  const [operation] = await video.annotateVideo(request);
  const results = await operation.promise();
  console.log('Waiting for operation to complete...');

  // Gets annotations for video
  const personAnnotations =
    results[0].annotationResults[0].personDetectionAnnotations;

  for (const {tracks} of personAnnotations) {
    console.log('Person detected:');

    for (const {segment, timestampedObjects} of tracks) {
      if (segment.startTimeOffset.seconds === undefined) {
        segment.startTimeOffset.seconds = 0;
      }
      if (segment.startTimeOffset.nanos === undefined) {
        segment.startTimeOffset.nanos = 0;
      }
      if (segment.endTimeOffset.seconds === undefined) {
        segment.endTimeOffset.seconds = 0;
      }
      if (segment.endTimeOffset.nanos === undefined) {
        segment.endTimeOffset.nanos = 0;
      }
      console.log(
        `\tStart: ${segment.startTimeOffset.seconds}` +
          `.${(segment.startTimeOffset.nanos / 1e6).toFixed(0)}s`
      );
      console.log(
        `\tEnd: ${segment.endTimeOffset.seconds}.` +
          `${(segment.endTimeOffset.nanos / 1e6).toFixed(0)}s`
      );

      // Each segment includes timestamped objects that
      // include characteristic--e.g. clothes, posture
      // of the person detected.
      const [firstTimestampedObject] = timestampedObjects;

      // Attributes include unique pieces of clothing, poses (i.e., body
      // landmarks) of the person detected.
      for (const {name, value} of firstTimestampedObject.attributes) {
        console.log(`\tAttribute: ${name}; Value: ${value}`);
      }

      // Landmarks in person detection include body parts.
      for (const {name, point} of firstTimestampedObject.landmarks) {
        console.log(`\tLandmark: ${name}; Vertex: ${point.x}, ${point.y}`);
      }
    }
  }
}

detectPersonGCS();

Python

from google.cloud import videointelligence_v1 as videointelligence


def detect_person(gcs_uri="gs://YOUR_BUCKET_ID/path/to/your/video.mp4"):
    """Detects people in a video."""

    client = videointelligence.VideoIntelligenceServiceClient()

    # Configure the request
    config = videointelligence.types.PersonDetectionConfig(
        include_bounding_boxes=True,
        include_attributes=True,
        include_pose_landmarks=True,
    )
    context = videointelligence.types.VideoContext(person_detection_config=config)

    # Start the asynchronous request
    operation = client.annotate_video(
        request={
            "features": [videointelligence.Feature.PERSON_DETECTION],
            "input_uri": gcs_uri,
            "video_context": context,
        }
    )

    print("\nProcessing video for person detection annotations.")
    result = operation.result(timeout=300)

    print("\nFinished processing.\n")

    # Retrieve the first result, because a single video was processed.
    annotation_result = result.annotation_results[0]

    for annotation in annotation_result.person_detection_annotations:
        print("Person detected:")
        for track in annotation.tracks:
            print(
                "Segment: {}s to {}s".format(
                    track.segment.start_time_offset.seconds
                    + track.segment.start_time_offset.microseconds / 1e6,
                    track.segment.end_time_offset.seconds
                    + track.segment.end_time_offset.microseconds / 1e6,
                )
            )

            # Each segment includes timestamped objects that include
            # characteristics - -e.g.clothes, posture of the person detected.
            # Grab the first timestamped object
            timestamped_object = track.timestamped_objects[0]
            box = timestamped_object.normalized_bounding_box
            print("Bounding box:")
            print("\tleft  : {}".format(box.left))
            print("\ttop   : {}".format(box.top))
            print("\tright : {}".format(box.right))
            print("\tbottom: {}".format(box.bottom))

            # Attributes include unique pieces of clothing,
            # poses, or hair color.
            print("Attributes:")
            for attribute in timestamped_object.attributes:
                print(
                    "\t{}:{} {}".format(
                        attribute.name, attribute.value, attribute.confidence
                    )
                )

            # Landmarks in person detection include body parts such as
            # left_shoulder, right_ear, and right_ankle
            print("Landmarks:")
            for landmark in timestamped_object.landmarks:
                print(
                    "\t{}: {} (x={}, y={})".format(
                        landmark.name,
                        landmark.confidence,
                        landmark.point.x,  # Normalized vertex
                        landmark.point.y,  # Normalized vertex
                    )
                )

Additional languages

C#: Please follow the C# setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for .NET.

PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for PHP.

Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for Ruby.

Person detection from a local file

The following example uses person detection to find entities in a video from a video file uploaded from your local machine.

REST & CMD LINE

Send the process request

To perform person detection on a local video file, base64-encode the contents of the video file. For information on how to base64-encode the contents of a video file, see Base64 Encoding. Then, make a POST request to the videos:annotate method. Include the base64-encoded contents in the inputContent field of the request and specify the PERSON_DETECTION feature.

The following shows an example of a POST request using curl. The example uses the access token for a service account set up for the project using the Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the Video Intelligence API Quickstart

Before using any of the request data below, make the following replacements:

  • inputContent: Local video file in binary format
    For example: 'AAAAGGZ0eXBtcDQyAAAAAGlzb21tcDQyAAGVYW1vb3YAAABsbXZoZAAAAADWvhlR1r4ZUQABX5ABCOxo AAEAAAEAAAAAAA4...'

HTTP method and URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

Request JSON body:

{
  "inputUri": "Local video file in binary format",
  "features": ["PERSON_DETECTION"],
  "videoContext": {
    "personDetectionConfig": {
      "includeBoundingBoxes": true,
      "includePoseLandmarks": true,
      "includeAttributes": true
     }
  }
}

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

If the request is successful, Video Intelligence the name for your operation. The above shows an example of such a response, where project-number is the number of your project and operation-id is the ID of the long-running operation created for the request.

{ "name": "us-west1.17122464255125931980" }

Get the results

To retrieve the result of the operation, make a GET request to the operations endpoint and specify the name of your operation.

Before using any of the request data below, make the following replacements:

  • oOPERATION_NAME: the name of the operation as returned by Video Intelligence API. The operation name has the format projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID

HTTP method and URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

Java


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.videointelligence.v1.AnnotateVideoProgress;
import com.google.cloud.videointelligence.v1.AnnotateVideoRequest;
import com.google.cloud.videointelligence.v1.AnnotateVideoResponse;
import com.google.cloud.videointelligence.v1.DetectedAttribute;
import com.google.cloud.videointelligence.v1.DetectedLandmark;
import com.google.cloud.videointelligence.v1.Feature;
import com.google.cloud.videointelligence.v1.PersonDetectionAnnotation;
import com.google.cloud.videointelligence.v1.PersonDetectionConfig;
import com.google.cloud.videointelligence.v1.TimestampedObject;
import com.google.cloud.videointelligence.v1.Track;
import com.google.cloud.videointelligence.v1.VideoAnnotationResults;
import com.google.cloud.videointelligence.v1.VideoContext;
import com.google.cloud.videointelligence.v1.VideoIntelligenceServiceClient;
import com.google.cloud.videointelligence.v1.VideoSegment;
import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DetectPerson {

  public static void detectPerson() throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String localFilePath = "resources/googlework_short.mp4";
    detectPerson(localFilePath);
  }

  // Detects people in a video stored in a local file using the Cloud Video Intelligence API.
  public static void detectPerson(String localFilePath) throws Exception {
    try (VideoIntelligenceServiceClient videoIntelligenceServiceClient =
        VideoIntelligenceServiceClient.create()) {
      // Reads a local video file and converts it to base64.
      Path path = Paths.get(localFilePath);
      byte[] data = Files.readAllBytes(path);
      ByteString inputContent = ByteString.copyFrom(data);

      PersonDetectionConfig personDetectionConfig =
          PersonDetectionConfig.newBuilder()
              // Must set includeBoundingBoxes to true to get poses and attributes.
              .setIncludeBoundingBoxes(true)
              .setIncludePoseLandmarks(true)
              .setIncludeAttributes(true)
              .build();
      VideoContext videoContext =
          VideoContext.newBuilder().setPersonDetectionConfig(personDetectionConfig).build();

      AnnotateVideoRequest request =
          AnnotateVideoRequest.newBuilder()
              .setInputContent(inputContent)
              .addFeatures(Feature.PERSON_DETECTION)
              .setVideoContext(videoContext)
              .build();

      // Detects people in a video
      // We get the first result because only one video is processed.
      OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
          videoIntelligenceServiceClient.annotateVideoAsync(request);

      System.out.println("Waiting for operation to complete...");
      AnnotateVideoResponse response = future.get();

      // Gets annotations for video
      VideoAnnotationResults annotationResult = response.getAnnotationResultsList().get(0);

      // Annotations for list of people detected, tracked and recognized in video.
      for (PersonDetectionAnnotation personDetectionAnnotation :
          annotationResult.getPersonDetectionAnnotationsList()) {
        System.out.print("Person detected:\n");
        for (Track track : personDetectionAnnotation.getTracksList()) {
          VideoSegment segment = track.getSegment();
          System.out.printf(
              "\tStart: %d.%.0fs\n",
              segment.getStartTimeOffset().getSeconds(),
              segment.getStartTimeOffset().getNanos() / 1e6);
          System.out.printf(
              "\tEnd: %d.%.0fs\n",
              segment.getEndTimeOffset().getSeconds(), segment.getEndTimeOffset().getNanos() / 1e6);

          // Each segment includes timestamped objects that include characteristic--e.g. clothes,
          // posture of the person detected.
          TimestampedObject firstTimestampedObject = track.getTimestampedObjects(0);

          // Attributes include unique pieces of clothing, poses (i.e., body landmarks)
          // of the person detected.
          for (DetectedAttribute attribute : firstTimestampedObject.getAttributesList()) {
            System.out.printf(
                "\tAttribute: %s; Value: %s\n", attribute.getName(), attribute.getValue());
          }

          // Landmarks in person detection include body parts.
          for (DetectedLandmark attribute : firstTimestampedObject.getLandmarksList()) {
            System.out.printf(
                "\tLandmark: %s; Vertex: %f, %f\n",
                attribute.getName(), attribute.getPoint().getX(), attribute.getPoint().getY());
          }
        }
      }
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

// Imports the Google Cloud Video Intelligence library + Node's fs library
const Video = require('@google-cloud/video-intelligence').v1;
const fs = require('fs');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const path = 'Local file to analyze, e.g. ./my-file.mp4';

// Reads a local video file and converts it to base64
const file = fs.readFileSync(path);
const inputContent = file.toString('base64');

async function detectPerson() {
  const request = {
    inputContent: inputContent,
    features: ['PERSON_DETECTION'],
    videoContext: {
      personDetectionConfig: {
        // Must set includeBoundingBoxes to true to get poses and attributes.
        includeBoundingBoxes: true,
        includePoseLandmarks: true,
        includeAttributes: true,
      },
    },
  };
  // Detects faces in a video
  // We get the first result because we only process 1 video
  const [operation] = await video.annotateVideo(request);
  const results = await operation.promise();
  console.log('Waiting for operation to complete...');

  // Gets annotations for video
  const personAnnotations =
    results[0].annotationResults[0].personDetectionAnnotations;

  for (const {tracks} of personAnnotations) {
    console.log('Person detected:');

    for (const {segment, timestampedObjects} of tracks) {
      if (segment.startTimeOffset.seconds === undefined) {
        segment.startTimeOffset.seconds = 0;
      }
      if (segment.startTimeOffset.nanos === undefined) {
        segment.startTimeOffset.nanos = 0;
      }
      if (segment.endTimeOffset.seconds === undefined) {
        segment.endTimeOffset.seconds = 0;
      }
      if (segment.endTimeOffset.nanos === undefined) {
        segment.endTimeOffset.nanos = 0;
      }
      console.log(
        `\tStart: ${segment.startTimeOffset.seconds}` +
          `.${(segment.startTimeOffset.nanos / 1e6).toFixed(0)}s`
      );
      console.log(
        `\tEnd: ${segment.endTimeOffset.seconds}.` +
          `${(segment.endTimeOffset.nanos / 1e6).toFixed(0)}s`
      );

      // Each segment includes timestamped objects that
      // include characteristic--e.g. clothes, posture
      // of the person detected.
      const [firstTimestampedObject] = timestampedObjects;

      // Attributes include unique pieces of clothing, poses (i.e., body
      // landmarks) of the person detected.
      for (const {name, value} of firstTimestampedObject.attributes) {
        console.log(`\tAttribute: ${name}; Value: ${value}`);
      }

      // Landmarks in person detection include body parts.
      for (const {name, point} of firstTimestampedObject.landmarks) {
        console.log(`\tLandmark: ${name}; Vertex: ${point.x}, ${point.y}`);
      }
    }
  }
}

detectPerson();

Python

import io

from google.cloud import videointelligence_v1 as videointelligence


def detect_person(local_file_path="path/to/your/video-file.mp4"):
    """Detects people in a video from a local file."""

    client = videointelligence.VideoIntelligenceServiceClient()

    with io.open(local_file_path, "rb") as f:
        input_content = f.read()

    # Configure the request
    config = videointelligence.types.PersonDetectionConfig(
        include_bounding_boxes=True,
        include_attributes=True,
        include_pose_landmarks=True,
    )
    context = videointelligence.types.VideoContext(person_detection_config=config)

    # Start the asynchronous request
    operation = client.annotate_video(
        request={
            "features": [videointelligence.Feature.PERSON_DETECTION],
            "input_content": input_content,
            "video_context": context,
        }
    )

    print("\nProcessing video for person detection annotations.")
    result = operation.result(timeout=300)

    print("\nFinished processing.\n")

    # Retrieve the first result, because a single video was processed.
    annotation_result = result.annotation_results[0]

    for annotation in annotation_result.person_detection_annotations:
        print("Person detected:")
        for track in annotation.tracks:
            print(
                "Segment: {}s to {}s".format(
                    track.segment.start_time_offset.seconds
                    + track.segment.start_time_offset.microseconds / 1e6,
                    track.segment.end_time_offset.seconds
                    + track.segment.end_time_offset.microseconds / 1e6,
                )
            )

            # Each segment includes timestamped objects that include
            # characteristic - -e.g.clothes, posture of the person detected.
            # Grab the first timestamped object
            timestamped_object = track.timestamped_objects[0]
            box = timestamped_object.normalized_bounding_box
            print("Bounding box:")
            print("\tleft  : {}".format(box.left))
            print("\ttop   : {}".format(box.top))
            print("\tright : {}".format(box.right))
            print("\tbottom: {}".format(box.bottom))

            # Attributes include unique pieces of clothing,
            # poses, or hair color.
            print("Attributes:")
            for attribute in timestamped_object.attributes:
                print(
                    "\t{}:{} {}".format(
                        attribute.name, attribute.value, attribute.confidence
                    )
                )

            # Landmarks in person detection include body parts such as
            # left_shoulder, right_ear, and right_ankle
            print("Landmarks:")
            for landmark in timestamped_object.landmarks:
                print(
                    "\t{}: {} (x={}, y={})".format(
                        landmark.name,
                        landmark.confidence,
                        landmark.point.x,  # Normalized vertex
                        landmark.point.y,  # Normalized vertex
                    )
                )

Additional languages

C#: Please follow the C# setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for .NET.

PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for PHP.

Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for Ruby.