Recognize text

Text detection performs Optical Character Recognition (OCR), which detects and extracts text within an input video.

Text detection is available for all the languages supported by the Cloud Vision API.

Request text detection for a video on Cloud Storage

The following samples demonstrate text detection on a file located in Cloud Storage.

REST

Send a video annotation request

The following shows how to send a POST request to the videos:annotate method. The example uses the Google Cloud CLI to create an access token. For instructions on installing the gcloud CLI, see the Video Intelligence API quickstart.

Before using any of the request data, make the following replacements:

INPUT_URI: a Cloud Storage bucket that contains the file you want to annotate, including the file name. Must start with gs://.
For example, "inputUri": "gs://cloud-videointelligence-demo/assistant.mp4",
LANGUAGE_CODE: [Optional] For example, "en-US"
PROJECT_NUMBER: the numeric identifier for your Google Cloud project

HTTP method and URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

Request JSON body:

{
  "inputUri": "INPUT_URI",
  "features": ["TEXT_DETECTION"],
  "videoContext": {
    "textDetectionConfig": {
      "languageHints": ["LANGUAGE_CODE"]
    }
  }
}
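
The request body above can also be generated programmatically. The following is a minimal Python sketch, not part of the official samples: the function name build_text_detection_request is hypothetical, and it assumes that the videoContext block can be left out when no language hint is supplied, since LANGUAGE_CODE is marked optional.

```python
import json


def build_text_detection_request(input_uri, language_code=None):
    """Return the JSON body for a videos:annotate text detection request."""
    # The documentation states that INPUT_URI must start with gs://.
    if not input_uri.startswith("gs://"):
        raise ValueError("input_uri must start with gs://")
    body = {"inputUri": input_uri, "features": ["TEXT_DETECTION"]}
    # Assumption: languageHints is optional, so videoContext is only added
    # when a language code is given.
    if language_code:
        body["videoContext"] = {
            "textDetectionConfig": {"languageHints": [language_code]}
        }
    return body


if __name__ == "__main__":
    example = build_text_detection_request(
        "gs://cloud-videointelligence-demo/assistant.mp4", "en-US"
    )
    # Print the body; save this output as request.json to use it with curl.
    print(json.dumps(example, indent=2))
```

Saving the printed output as request.json gives a file that can be passed to the curl or PowerShell commands in this section.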

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content
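
For readers who prefer to script the call, the same POST request can be assembled with Python's standard library. This is only a sketch under stated assumptions: build_annotate_request is a hypothetical helper, the access token is assumed to have been obtained separately (for example from gcloud auth print-access-token), and the request is built but not sent.

```python
import json
import urllib.request

ANNOTATE_URL = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_annotate_request(access_token, project_number, body):
    """Build (but do not send) the POST request used by the curl command."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        ANNOTATE_URL,
        data=data,
        method="POST",
        headers={
            # Same three headers as the curl and PowerShell examples.
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": str(project_number),
            "Content-Type": "application/json; charset=utf-8",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it; error handling and retries are left out of this sketch.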

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

If the request is successful, the Video Intelligence API returns the name of your operation. The example above shows such a response, with the following placeholders:

PROJECT_NUMBER: the number of your project.
LOCATION_ID: the cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, and asia-east1. If no region is specified, a region is determined based on the video file location.
OPERATION_ID: the ID of the long-running operation created for the request, provided in the response when you started the operation, for example 12345...
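
The three parts of the operation name can be pulled apart with a small parser. The sketch below is illustrative only: parse_operation_name is a hypothetical helper, and it assumes the name always follows the projects/.../locations/.../operations/... layout shown above.

```python
import re

# Pattern for the operation name format shown in the response above.
_OPERATION_NAME = re.compile(
    r"^projects/(?P<project_number>[^/]+)"
    r"/locations/(?P<location_id>[^/]+)"
    r"/operations/(?P<operation_id>[^/]+)$"
)


def parse_operation_name(name):
    """Split an operation name into project number, location and operation ID."""
    match = _OPERATION_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected operation name: {name!r}")
    return match.groupdict()
```

For example, parse_operation_name("projects/123/locations/us-west1/operations/456") returns a dictionary with the keys project_number, location_id, and operation_id.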

Get annotation results

To retrieve the result of the operation, make a GET request using the operation name returned from the call to videos:annotate, as shown in the following example.

Before using any of the request data, make the following replacements:

OPERATION_NAME: the name of the operation as returned by the Video Intelligence API. The operation name has the format projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID.
PROJECT_NUMBER: the numeric identifier for your Google Cloud project

HTTP method and URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

"textAnnotations": [
  {
    "text": "Hair Salon",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "0.833333s",
          "endTimeOffset": "2.291666s"
        },
        "confidence": 0.99438506,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.7015625,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.64166665
                },
                {
                  "x": 0.7015625,
                  "y": 0.64166665
                }
              ]
            },
            "timeOffset": "0.833333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.041666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.250s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6319444
                },
                {
                  "x": 0.70234376,
                  "y": 0.6319444
                }
              ]
            },
            "timeOffset": "1.458333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.666666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.875s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.083333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.291666s"
          }
        ]
      }
    ]
  },
  {
    "text": "\"Sure, give me one second.\"",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "10.625s",
          "endTimeOffset": "13.333333s"
        },
        "confidence": 0.98716676,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.60859376,
                  "y": 0.59583336
                },
                {
                  "x": 0.8952959,
                  "y": 0.5903528
                },
                {
                  "x": 0.89560676,
                  "y": 0.6417387
                },
                {
                  "x": 0.60890454,
                  "y": 0.64721924
                }
              ]
            },
            "timeOffset": "10.625s"
          },
  ...

    ]
  }

Text detection annotations are returned as a textAnnotations list. Note: The done field is only returned when its value is true. It is not included in responses for which the operation has not completed.
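
To make the shape of the textAnnotations list concrete, here is a hedged Python sketch that walks the list and extracts the text, time range, and confidence of the first segment of each annotation. The helper names parse_offset and summarize_text_annotations are hypothetical, and the code assumes offsets are always strings ending in "s", as in the sample response.

```python
def parse_offset(offset):
    """Convert an offset string such as "0.833333s" to seconds as a float."""
    if not offset.endswith("s"):
        raise ValueError(f"unexpected offset format: {offset!r}")
    return float(offset[:-1])


def summarize_text_annotations(text_annotations):
    """Return (text, start, end, confidence) for each annotation's first segment."""
    rows = []
    for annotation in text_annotations:
        first = annotation["segments"][0]
        segment = first["segment"]
        rows.append(
            (
                annotation["text"],
                parse_offset(segment["startTimeOffset"]),
                parse_offset(segment["endTimeOffset"]),
                first["confidence"],
            )
        )
    return rows


if __name__ == "__main__":
    # Values copied from the first annotation in the sample response.
    sample = [
        {
            "text": "Hair Salon",
            "segments": [
                {
                    "segment": {
                        "startTimeOffset": "0.833333s",
                        "endTimeOffset": "2.291666s",
                    },
                    "confidence": 0.99438506,
                    "frames": [],
                }
            ],
        }
    ]
    for row in summarize_text_annotations(sample):
        print(row)
```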

Download annotation results

Copy the annotation from the source to the destination bucket (see Copy files and objects):

gcloud storage cp gcs_uri gs://my-bucket

Note: If the output GCS URI is provided by the user, the annotation is stored in that URI.

Go


import (
	"context"
	"fmt"
	"io"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// textDetectionGCS analyzes a video stored in Cloud Storage and extracts the text that appears in its frames.
func textDetectionGCS(w io.Writer, gcsURI string) error {
	// gcsURI := "gs://python-docs-samples-tests/video/googlework_short.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputUri: gcsURI,
		Features: []videopb.Feature{
			videopb.Feature_TEXT_DETECTION,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.TextAnnotations {
		fmt.Fprintf(w, "Text: %q\n", annotation.GetText())

		// Get the first text segment.
		segment := annotation.GetSegments()[0]
		start, _ := ptypes.Duration(segment.GetSegment().GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetSegment().GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", segment.GetConfidence())

		// Show the result for the first frame in this segment.
		frame := segment.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		fmt.Fprintf(w, "\tRotated bounding box vertices:\n")
		for _, vertex := range frame.GetRotatedBoundingBox().GetVertices() {
			fmt.Fprintf(w, "\t\tVertex x=%f, y=%f\n", vertex.GetX(), vertex.GetY())
		}
	}

	return nil
}

Java

To authenticate to Video Intelligence, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * Detect Text in a video.
 *
 * @param gcsUri the path to the video file to analyze.
 */
public static VideoAnnotationResults detectTextGcs(String gcsUri) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputUri(gcsUri)
            .addFeatures(Feature.TEXT_DETECTION)
            .build();

    // asynchronously perform text detection on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    TextAnnotation annotation = results.getTextAnnotations(0);
    System.out.println("Text: " + annotation.getText());

    // Get the first text segment.
    TextSegment textSegment = annotation.getSegments(0);
    System.out.println("Confidence: " + textSegment.getConfidence());
    // For the text segment, display its time offset
    VideoSegment videoSegment = textSegment.getSegment();
    Duration startTimeOffset = videoSegment.getStartTimeOffset();
    Duration endTimeOffset = videoSegment.getEndTimeOffset();
    // Display the offset times in seconds, 1e9 is part of the formula to convert nanos to seconds
    System.out.println(
        String.format(
            "Start time: %.2f", startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9));
    System.out.println(
        String.format(
            "End time: %.2f", endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));

    // Show the first result for the first frame in the segment.
    TextFrame textFrame = textSegment.getFrames(0);
    Duration timeOffset = textFrame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset for the first frame: %.2f",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the rotated bounding box for where the text is on the frame.
    System.out.println("Rotated Bounding Box Vertices:");
    List<NormalizedVertex> vertices = textFrame.getRotatedBoundingBox().getVerticesList();
    for (NormalizedVertex normalizedVertex : vertices) {
      System.out.println(
          String.format(
              "\tVertex.x: %.2f, Vertex.y: %.2f",
              normalizedVertex.getX(), normalizedVertex.getY()));
    }
    return results;
  }
}

Node.js

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

const request = {
  inputUri: gcsUri,
  features: ['TEXT_DETECTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
// Log before awaiting, so the message appears while the operation is still running
console.log('Waiting for operation to complete...');
const results = await operation.promise();
// Gets annotations for video
const textAnnotations = results[0].annotationResults[0].textAnnotations;
textAnnotations.forEach(textAnnotation => {
  console.log(`Text ${textAnnotation.text} occurs at:`);
  textAnnotation.segments.forEach(segment => {
    const time = segment.segment;
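    // Zero-pad the milliseconds so that 41 ms prints as .041 rather than .41
    console.log(
      ` Start: ${time.startTimeOffset.seconds || 0}.${String(
        Math.floor((time.startTimeOffset.nanos || 0) / 1e6)
      ).padStart(3, '0')}s`
    );
    console.log(
      ` End: ${time.endTimeOffset.seconds || 0}.${String(
        Math.floor((time.endTimeOffset.nanos || 0) / 1e6)
      ).padStart(3, '0')}s`
    );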
    console.log(` Confidence: ${segment.confidence}`);
    segment.frames.forEach(frame => {
      const timeOffset = frame.timeOffset;
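      // Zero-pad the milliseconds so that 41 ms prints as .041 rather than .41
      console.log(
        `Time offset for the frame: ${timeOffset.seconds || 0}` +
          `.${String(Math.floor((timeOffset.nanos || 0) / 1e6)).padStart(3, '0')}s`
      );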
      console.log('Rotated Bounding Box Vertices:');
      frame.rotatedBoundingBox.vertices.forEach(vertex => {
        console.log(`Vertex.x:${vertex.x}, Vertex.y:${vertex.y}`);
      });
    });
  });
});

Python

"""Detect text in a video stored on GCS."""
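from google.cloud import videointelligence

# TODO(developer): Set input_uri before running the sample.
# input_uri = "gs://my-bucket/my-video.mp4"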

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.TEXT_DETECTION]

operation = video_client.annotate_video(
    request={"features": features, "input_uri": input_uri}
)

print("\nProcessing video for text detection.")
result = operation.result(timeout=600)

# The first result is retrieved because a single video was processed.
annotation_result = result.annotation_results[0]

for text_annotation in annotation_result.text_annotations:
    print("\nText: {}".format(text_annotation.text))

    # Get the first text segment
    text_segment = text_annotation.segments[0]
    start_time = text_segment.segment.start_time_offset
    end_time = text_segment.segment.end_time_offset
    print(
        "start_time: {}, end_time: {}".format(
            start_time.seconds + start_time.microseconds * 1e-6,
            end_time.seconds + end_time.microseconds * 1e-6,
        )
    )

    print("Confidence: {}".format(text_segment.confidence))

    # Show the result for the first frame in this segment.
    frame = text_segment.frames[0]
    time_offset = frame.time_offset
    print(
        "Time offset for the first frame: {}".format(
            time_offset.seconds + time_offset.microseconds * 1e-6
        )
    )
    print("Rotated Bounding Box Vertices:")
    for vertex in frame.rotated_bounding_box.vertices:
        print("\tVertex.x: {}, Vertex.y: {}".format(vertex.x, vertex.y))
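
The vertex coordinates printed by the samples are normalized to the range 0 to 1 relative to the frame, so drawing a box requires scaling them by the frame size. The following Python sketch shows one way to do that; to_pixel_box is a hypothetical helper, and the frame width and height are assumed to be known from the video itself, because the annotation response shown here does not carry them.

```python
def to_pixel_box(points, frame_width, frame_height):
    """Scale normalized (x, y) pairs to integer pixel coordinates."""
    return [
        (round(x * frame_width), round(y * frame_height))
        for x, y in points
    ]


if __name__ == "__main__":
    # Vertices of the first "Hair Salon" frame in the sample response,
    # scaled to an assumed 1280x720 frame.
    points = [
        (0.7015625, 0.59583336),
        (0.7984375, 0.59583336),
        (0.7984375, 0.64166665),
        (0.7015625, 0.64166665),
    ]
    print(to_pixel_box(points, 1280, 720))
```

With the Python client, the input pairs could be built as [(v.x, v.y) for v in frame.rotated_bounding_box.vertices].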

Additional languages

C#: Please follow the C# setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for .NET.

PHP: Please follow the PHP setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for PHP.

Ruby: Please follow the Ruby setup instructions on the client libraries page and then visit the Video Intelligence reference documentation for Ruby.

Request text detection for video from a local file

The following samples demonstrate text detection on a file stored locally.

REST

Send a video annotation request

To perform annotation on a local video file, base64-encode the contents of the video file. Include the base64-encoded contents in the inputContent field of the request. For information on how to base64-encode the contents of a video file, see Base64 encoding.

The following shows how to send a POST request to the videos:annotate method. The example uses the Google Cloud CLI to create an access token. For instructions on installing the Google Cloud CLI, see the Video Intelligence API quickstart.

Before using any of the request data, make the following replacements:

"inputContent": BASE64_ENCODED_CONTENT
For example:
"UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA..."
LANGUAGE_CODE: [Optional] For example, "en-US"
PROJECT_NUMBER: the numeric identifier for your Google Cloud project

HTTP method and URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

Request JSON body:

{
  "inputContent": "BASE64_ENCODED_CONTENT",
  "features": ["TEXT_DETECTION"],
  "videoContext": {
    "textDetectionConfig": {
      "languageHints": ["LANGUAGE_CODE"]
    }
  }
}
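
The base64 step can be done with Python's standard library. The sketch below builds the request body shown above from raw video bytes; build_local_request is a hypothetical helper, and, as an assumption, videoContext is only included when a language hint is given because LANGUAGE_CODE is optional.

```python
import base64


def build_local_request(video_bytes, language_code=None):
    """Return a videos:annotate body with the video inlined as base64 text."""
    body = {
        # b64encode returns bytes; JSON needs a string, so decode as ASCII.
        "inputContent": base64.b64encode(video_bytes).decode("ascii"),
        "features": ["TEXT_DETECTION"],
    }
    # Assumption: the language hint is optional and can simply be omitted.
    if language_code:
        body["videoContext"] = {
            "textDetectionConfig": {"languageHints": [language_code]}
        }
    return body
```

In practice the bytes would come from reading the video file in binary mode, for example open(path, "rb").read(), where path is whatever local file you want to annotate.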

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

If the request is successful, the Video Intelligence API returns the name of your operation. The example above shows such a response, where PROJECT_NUMBER is the number of your project and OPERATION_ID is the ID of the long-running operation created for the request.

OPERATION_ID: provided in the response when you started the operation, for example 12345...

Get annotation results

To retrieve the result of the operation, make a GET request using the operation name returned from the call to videos:annotate, as shown in the following example.

Before using any of the request data, make the following replacements:

OPERATION_NAME: the name of the operation as returned by the Video Intelligence API. The operation name has the format projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID.
PROJECT_NUMBER: the numeric identifier for your Google Cloud project

HTTP method and URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

"textAnnotations": [
  {
    "text": "Hair Salon",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "0.833333s",
          "endTimeOffset": "2.291666s"
        },
        "confidence": 0.99438506,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.7015625,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.64166665
                },
                {
                  "x": 0.7015625,
                  "y": 0.64166665
                }
              ]
            },
            "timeOffset": "0.833333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.041666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.250s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6319444
                },
                {
                  "x": 0.70234376,
                  "y": 0.6319444
                }
              ]
            },
            "timeOffset": "1.458333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.666666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.875s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.083333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.291666s"
          }
        ]
      }
    ]
  },
  {
    "text": "\"Sure, give me one second.\"",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "10.625s",
          "endTimeOffset": "13.333333s"
        },
        "confidence": 0.98716676,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.60859376,
                  "y": 0.59583336
                },
                {
                  "x": 0.8952959,
                  "y": 0.5903528
                },
                {
                  "x": 0.89560676,
                  "y": 0.6417387
                },
                {
                  "x": 0.60890454,
                  "y": 0.64721924
                }
              ]
            },
            "timeOffset": "10.625s"
          },
  ...

    ]
}

Text detection annotations are returned as a textAnnotations list. Note: The done field is only returned when its value is true. It is not included in responses for which the operation has not completed.
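
Because done is absent until the operation finishes, a client that polls the GET endpoint has to treat a missing field as "not finished yet". The following Python sketch shows that logic in isolation. It is an illustration under assumptions: wait_for_operation is a hypothetical helper, fetch_operation stands for any caller-supplied function that performs the GET request and returns the parsed JSON as a dictionary, and the polling interval and retry budget are arbitrary.

```python
import time


def wait_for_operation(fetch_operation, poll_seconds=5.0, max_polls=60,
                       sleep=time.sleep):
    """Poll until the operation reports done, then return the final dict."""
    for _ in range(max_polls):
        operation = fetch_operation()
        # done is omitted while the operation is running, so default to False.
        if operation.get("done", False):
            return operation
        sleep(poll_seconds)
    raise TimeoutError("operation did not finish within the polling budget")
```

Injecting the fetch and sleep functions keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.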

Go


import (
	"context"
	"fmt"
	"io"
	"os"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// textDetection analyzes a local video file and extracts the text that appears in its frames.
func textDetection(w io.Writer, filename string) error {
	// filename := "../testdata/googlework_short.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	fileBytes, err := os.ReadFile(filename)
	if err != nil {
		return fmt.Errorf("os.ReadFile: %w", err)
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputContent: fileBytes,
		Features: []videopb.Feature{
			videopb.Feature_TEXT_DETECTION,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.TextAnnotations {
		fmt.Fprintf(w, "Text: %q\n", annotation.GetText())

		// Get the first text segment.
		segment := annotation.GetSegments()[0]
		start, _ := ptypes.Duration(segment.GetSegment().GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetSegment().GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", segment.GetConfidence())

		// Show the result for the first frame in this segment.
		frame := segment.GetFrames()[0]
		// Use float64 so that nanosecond values are not rounded before the division.
		seconds := float64(frame.GetTimeOffset().GetSeconds())
		nanos := float64(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		fmt.Fprintf(w, "\tRotated bounding box vertices:\n")
		for _, vertex := range frame.GetRotatedBoundingBox().GetVertices() {
			fmt.Fprintf(w, "\t\tVertex x=%f, y=%f\n", vertex.GetX(), vertex.GetY())
		}
	}

	return nil
}

Java

/**
 * Detect text in a video.
 *
 * @param filePath the path to the video file to analyze.
 */
public static VideoAnnotationResults detectText(String filePath) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Read file
    Path path = Paths.get(filePath);
    byte[] data = Files.readAllBytes(path);

    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputContent(ByteString.copyFrom(data))
            .addFeatures(Feature.TEXT_DETECTION)
            .build();

    // Asynchronously perform text detection on the video.
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    TextAnnotation annotation = results.getTextAnnotations(0);
    System.out.println("Text: " + annotation.getText());

    // Get the first text segment.
    TextSegment textSegment = annotation.getSegments(0);
    System.out.println("Confidence: " + textSegment.getConfidence());
    // For the text segment, display its time offsets.
    VideoSegment videoSegment = textSegment.getSegment();
    Duration startTimeOffset = videoSegment.getStartTimeOffset();
    Duration endTimeOffset = videoSegment.getEndTimeOffset();
    // Display the offset times in seconds, 1e9 is part of the formula to convert nanos to seconds
    System.out.println(
        String.format(
            "Start time: %.2f", startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9));
    System.out.println(
        String.format(
            "End time: %.2f", endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));

    // Show the first result for the first frame in the segment.
    TextFrame textFrame = textSegment.getFrames(0);
    Duration timeOffset = textFrame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset for the first frame: %.2f",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the rotated bounding box for where the text is on the frame.
    System.out.println("Rotated Bounding Box Vertices:");
    List<NormalizedVertex> vertices = textFrame.getRotatedBoundingBox().getVerticesList();
    for (NormalizedVertex normalizedVertex : vertices) {
      System.out.println(
          String.format(
              "\tVertex.x: %.2f, Vertex.y: %.2f",
              normalizedVertex.getX(), normalizedVertex.getY()));
    }
    return results;
  }
}

Node.js

// Imports the Google Cloud Video Intelligence library and Node's fs and util modules
const Video = require('@google-cloud/video-intelligence');
const fs = require('fs');
const util = require('util');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const path = 'Local file to analyze, e.g. ./my-file.mp4';

// Reads a local video file and converts it to base64
const file = await util.promisify(fs.readFile)(path);
const inputContent = file.toString('base64');

const request = {
  inputContent: inputContent,
  features: ['TEXT_DETECTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();

// Gets annotations for video
const textAnnotations = results[0].annotationResults[0].textAnnotations;
textAnnotations.forEach(textAnnotation => {
  console.log(`Text ${textAnnotation.text} occurs at:`);
  textAnnotation.segments.forEach(segment => {
    const time = segment.segment;
    if (time.startTimeOffset.seconds === undefined) {
      time.startTimeOffset.seconds = 0;
    }
    if (time.startTimeOffset.nanos === undefined) {
      time.startTimeOffset.nanos = 0;
    }
    if (time.endTimeOffset.seconds === undefined) {
      time.endTimeOffset.seconds = 0;
    }
    if (time.endTimeOffset.nanos === undefined) {
      time.endTimeOffset.nanos = 0;
    }
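    // Zero-pad the milliseconds so that 50 ms prints as .050 rather than .50
    console.log(
      `\tStart: ${time.startTimeOffset.seconds || 0}` +
        `.${String(Math.floor(time.startTimeOffset.nanos / 1e6)).padStart(3, '0')}s`
    );
    console.log(
      `\tEnd: ${time.endTimeOffset.seconds || 0}.` +
        `${String(Math.floor(time.endTimeOffset.nanos / 1e6)).padStart(3, '0')}s`
    );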
    console.log(`\tConfidence: ${segment.confidence}`);
    segment.frames.forEach(frame => {
      const timeOffset = frame.timeOffset;
      // Treat a missing nanos value as 0 and zero-pad the milliseconds
      console.log(
        `Time offset for the frame: ${timeOffset.seconds || 0}` +
          `.${String(Math.floor((timeOffset.nanos || 0) / 1e6)).padStart(3, '0')}s`
      );
      console.log('Rotated Bounding Box Vertices:');
      frame.rotatedBoundingBox.vertices.forEach(vertex => {
        console.log(`Vertex.x:${vertex.x}, Vertex.y:${vertex.y}`);
      });
    });
  });
});

Python

import io

from google.cloud import videointelligence

def video_detect_text(path):
    """Detect text in a local video."""
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [videointelligence.Feature.TEXT_DETECTION]
    video_context = videointelligence.VideoContext()

    with io.open(path, "rb") as file:
        input_content = file.read()

    operation = video_client.annotate_video(
        request={
            "features": features,
            "input_content": input_content,
            "video_context": video_context,
        }
    )

    print("\nProcessing video for text detection.")
    result = operation.result(timeout=300)

    # The first result is retrieved because a single video was processed.
    annotation_result = result.annotation_results[0]

    for text_annotation in annotation_result.text_annotations:
        print("\nText: {}".format(text_annotation.text))

        # Get the first text segment
        text_segment = text_annotation.segments[0]
        start_time = text_segment.segment.start_time_offset
        end_time = text_segment.segment.end_time_offset
        print(
            "start_time: {}, end_time: {}".format(
                start_time.seconds + start_time.microseconds * 1e-6,
                end_time.seconds + end_time.microseconds * 1e-6,
            )
        )

        print("Confidence: {}".format(text_segment.confidence))

        # Show the result for the first frame in this segment.
        frame = text_segment.frames[0]
        time_offset = frame.time_offset
        print(
            "Time offset for the first frame: {}".format(
                time_offset.seconds + time_offset.microseconds * 1e-6
            )
        )
        print("Rotated Bounding Box Vertices:")
        for vertex in frame.rotated_bounding_box.vertices:
            print("\tVertex.x: {}, Vertex.y: {}".format(vertex.x, vertex.y))
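
The vertex coordinates printed by the samples above are normalized (the Java sample reads them as NormalizedVertex values), so drawing a box on a frame requires scaling them by the frame size. A minimal sketch, assuming coordinates are fractions of the frame width and height; the to_pixels helper and the 1920x1080 frame size are hypothetical, and the clamping step is a defensive assumption rather than documented behavior:

```python
from typing import Iterable, List, Tuple


def to_pixels(
    vertices: Iterable[Tuple[float, float]],
    frame_width: int,
    frame_height: int,
) -> List[Tuple[int, int]]:
    """Scale normalized (x, y) vertices to integer pixel coordinates."""
    pixels = []
    for x, y in vertices:
        # Assumption: clamp to [0, 1] in case a rotated box extends
        # slightly past the frame edge.
        x = min(max(x, 0.0), 1.0)
        y = min(max(y, 0.0), 1.0)
        pixels.append((round(x * frame_width), round(y * frame_height)))
    return pixels


# Usage with made-up vertices and a hypothetical 1920x1080 frame.
box = [(0.5, 0.25), (0.75, 0.25), (0.75, 0.5), (0.5, 0.5)]
print(to_pixels(box, 1920, 1080))
```

With the client library, the input could be built from a frame as [(v.x, v.y) for v in frame.rotated_bounding_box.vertices].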

Additional languages

C#: Follow the C# setup instructions on the client libraries page, then see the Video Intelligence reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, then see the Video Intelligence reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, then see the Video Intelligence reference documentation for Ruby.
