Questa pagina è stata tradotta dall'API Cloud Translation.

Riconoscimento del testo

La funzionalità Rilevamento del testo esegue il riconoscimento ottico dei caratteri (OCR), che rileva e estrae il testo all'interno di un video di input.

Il rilevamento del testo è disponibile per tutte le lingue supportate dall'API Cloud Vision.

Richiedere il rilevamento del testo per un video su Cloud Storage

I seguenti esempi dimostrano il rilevamento del testo su un file che si trova in di archiviazione ideale in Cloud Storage.

REST

Inviare una richiesta di annotazione video

Di seguito è riportato un esempio di come inviare una richiesta POST al metodo videos:annotate. L'esempio utilizza Google Cloud CLI per creare un token di accesso. Per istruzioni sull'installazione di gcloud CLI, consulta Guida rapida dell'API Video Intelligence.

Prima di utilizzare i dati della richiesta, apporta le seguenti sostituzioni:

INPUT_URI: un bucket Cloud Storage contenente il file da annotare, incluso il nome del file. Deve iniziano con gs://.
Ad esempio: "inputUri": "gs://cloud-videointelligence-demo/assistant.mp4",
LANGUAGE_CODE: [facoltativo] ad esempio "en-US"
PROJECT_NUMBER: l'identificatore numerico del tuo progetto Google Cloud

Metodo HTTP e URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

Corpo JSON della richiesta:

{
  "inputUri": "INPUT_URI",
  "features": ["TEXT_DETECTION"],
  "videoContext": {
    "textDetectionConfig": {
      "languageHints": ["LANGUAGE_CODE"]
    }
  }
}

Per inviare la richiesta, espandi una di queste opzioni:

curl (Linux, macOS o Cloud Shell)

Nota: Il comando seguente presuppone che tu abbia eseguito l'accesso a l'interfaccia a riga di comando gcloud con il tuo account utente eseguendo gcloud init o gcloud auth login oppure utilizzando Cloud Shell, che ti consente di accedere automaticamente all'interfaccia a riga di comando gcloud di Google. Puoi controllare l'account attualmente attivo eseguendo gcloud auth list

Salva il corpo della richiesta in un file denominato request.json. ed esegui questo comando:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

Nota: il comando seguente presuppone che tu abbia eseguito l'accesso alla CLI gcloud con il tuo account utente eseguendo gcloud init o gcloud auth login . Puoi controllare l'account attualmente attivo eseguendo gcloud auth list.

Salva il corpo della richiesta in un file denominato request.json. ed esegui questo comando:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

Dovresti ricevere una risposta JSON simile alla seguente:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

Se la risposta ha esito positivo, l'API Video Intelligence restituisce name per operativa. Di seguito è riportato un esempio di questa risposta, in cui: project-number è il numero del progetto e operation-id è l'ID dell'operazione di lunga durata creata per la richiesta.

PROJECT_NUMBER: il numero del tuo progetto
LOCATION_ID: la regione Cloud in cui deve essere eseguita l'annotazione posto. Le regioni cloud supportate sono: us-east1, us-west1, europe-west1, asia-east1. Se non viene specificata alcuna regione, verrà determinata una regione in base alla posizione del file video.
OPERATION_ID: l'ID dell'operazione a lunga esecuzione creata della richiesta e fornito nella risposta quando hai avviato operativa, ad esempio 12345...

Ottieni i risultati delle annotazioni

Per recuperare il risultato dell'operazione, effettua una richiesta GET, utilizzando il nome dell'operazione restituito dalla chiamata a videos:annotate, come mostrato nell'esempio seguente.

Prima di utilizzare i dati della richiesta, apporta le seguenti sostituzioni:

OPERATION_NAME: il nome dell'operazione come restituiti dall'API Video Intelligence. Il nome dell'operazione ha il formato projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID
PROJECT_NUMBER: l'identificatore numerico del tuo progetto Google Cloud

Metodo HTTP e URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

Per inviare la richiesta, espandi una delle seguenti opzioni:

curl (Linux, macOS o Cloud Shell)

Nota: il seguente comando presuppone che tu abbia eseguito l'accesso all'interfaccia a riga di comando gcloud con il tuo account utente eseguendo gcloud init o gcloud auth login oppure utilizzando Cloud Shell, che ti consente di accedere automaticamente all'interfaccia a riga di comando gcloud. Puoi controllare l'account attualmente attivo eseguendo gcloud auth list.

Esegui questo comando:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

Esegui questo comando:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

Dovresti ricevere una risposta JSON simile alla seguente:

Risposta

"textAnnotations": [
  {
    "text": "Hair Salon",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "0.833333s",
          "endTimeOffset": "2.291666s"
        },
        "confidence": 0.99438506,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.7015625,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.64166665
                },
                {
                  "x": 0.7015625,
                  "y": 0.64166665
                }
              ]
            },
            "timeOffset": "0.833333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.041666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.250s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6319444
                },
                {
                  "x": 0.70234376,
                  "y": 0.6319444
                }
              ]
            },
            "timeOffset": "1.458333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.666666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.875s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.083333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.291666s"
          }
        ]
      }
    ]
  },
  {
    "text": "\"Sure, give me one second.\"",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "10.625s",
          "endTimeOffset": "13.333333s"
        },
        "confidence": 0.98716676,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.60859376,
                  "y": 0.59583336
                },
                {
                  "x": 0.8952959,
                  "y": 0.5903528
                },
                {
                  "x": 0.89560676,
                  "y": 0.6417387
                },
                {
                  "x": 0.60890454,
                  "y": 0.64721924
                }
              ]
            },
            "timeOffset": "10.625s"
          },
  ...

    ]
  }

Le annotazioni per il rilevamento del testo vengono restituite come elenco textAnnotations. Nota: il campo done viene restituito solo quando il relativo valore è True. Non è incluso nelle risposte per le quali l'operazione non è stata completata.

Scarica i risultati delle annotazioni

Copia l'annotazione dall'origine al bucket di destinazione: (vedi Copiare file e oggetti)

gcloud storage cp gcs_uri gs://my-bucket

Nota: se l'URI GCS di output viene fornito dall'utente, l'annotazione viene archiviata in quell'URI GCS.

Go


import (
	"context"
	"fmt"
	"io"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// textDetectionGCS analyzes a video and extracts the text from the video's audio.
func textDetectionGCS(w io.Writer, gcsURI string) error {
	// gcsURI := "gs://python-docs-samples-tests/video/googlework_short.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputUri: gcsURI,
		Features: []videopb.Feature{
			videopb.Feature_TEXT_DETECTION,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.TextAnnotations {
		fmt.Fprintf(w, "Text: %q\n", annotation.GetText())

		// Get the first text segment.
		segment := annotation.GetSegments()[0]
		start, _ := ptypes.Duration(segment.GetSegment().GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetSegment().GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", segment.GetConfidence())

		// Show the result for the first frame in this segment.
		frame := segment.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		fmt.Fprintf(w, "\tRotated bounding box vertices:\n")
		for _, vertex := range frame.GetRotatedBoundingBox().GetVertices() {
			fmt.Fprintf(w, "\t\tVertex x=%f, y=%f\n", vertex.GetX(), vertex.GetY())
		}
	}

	return nil
}

Java

Per autenticarti a Video Intelligence, configura Credenziali predefinite dell'applicazione. Per ulteriori informazioni, vedi Configura l'autenticazione per un ambiente di sviluppo locale.

/**
 * Detect Text in a video.
 *
 * @param gcsUri the path to the video file to analyze.
 */
public static VideoAnnotationResults detectTextGcs(String gcsUri) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputUri(gcsUri)
            .addFeatures(Feature.TEXT_DETECTION)
            .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    TextAnnotation annotation = results.getTextAnnotations(0);
    System.out.println("Text: " + annotation.getText());

    // Get the first text segment.
    TextSegment textSegment = annotation.getSegments(0);
    System.out.println("Confidence: " + textSegment.getConfidence());
    // For the text segment display it's time offset
    VideoSegment videoSegment = textSegment.getSegment();
    Duration startTimeOffset = videoSegment.getStartTimeOffset();
    Duration endTimeOffset = videoSegment.getEndTimeOffset();
    // Display the offset times in seconds, 1e9 is part of the formula to convert nanos to seconds
    System.out.println(
        String.format(
            "Start time: %.2f", startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9));
    System.out.println(
        String.format(
            "End time: %.2f", endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));

    // Show the first result for the first frame in the segment.
    TextFrame textFrame = textSegment.getFrames(0);
    Duration timeOffset = textFrame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset for the first frame: %.2f",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the rotated bounding box for where the text is on the frame.
    System.out.println("Rotated Bounding Box Vertices:");
    List<NormalizedVertex> vertices = textFrame.getRotatedBoundingBox().getVerticesList();
    for (NormalizedVertex normalizedVertex : vertices) {
      System.out.println(
          String.format(
              "\tVertex.x: %.2f, Vertex.y: %.2f",
              normalizedVertex.getX(), normalizedVertex.getY()));
    }
    return results;
  }
}

Node.js

Per autenticarti a Video Intelligence, configura le credenziali predefinite dell'applicazione. Per maggiori informazioni, consulta Configurare l'autenticazione per un ambiente di sviluppo locale.

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

const request = {
  inputUri: gcsUri,
  features: ['TEXT_DETECTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
const results = await operation.promise();
console.log('Waiting for operation to complete...');
// Gets annotations for video
const textAnnotations = results[0].annotationResults[0].textAnnotations;
textAnnotations.forEach(textAnnotation => {
  console.log(`Text ${textAnnotation.text} occurs at:`);
  textAnnotation.segments.forEach(segment => {
    const time = segment.segment;
    console.log(
      ` Start: ${time.startTimeOffset.seconds || 0}.${(
        time.startTimeOffset.nanos / 1e6
      ).toFixed(0)}s`
    );
    console.log(
      ` End: ${time.endTimeOffset.seconds || 0}.${(
        time.endTimeOffset.nanos / 1e6
      ).toFixed(0)}s`
    );
    console.log(` Confidence: ${segment.confidence}`);
    segment.frames.forEach(frame => {
      const timeOffset = frame.timeOffset;
      console.log(
        `Time offset for the frame: ${timeOffset.seconds || 0}` +
          `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
      );
      console.log('Rotated Bounding Box Vertices:');
      frame.rotatedBoundingBox.vertices.forEach(vertex => {
        console.log(`Vertex.x:${vertex.x}, Vertex.y:${vertex.y}`);
      });
    });
  });
});

Python

Per autenticarti a Video Intelligence, configura Credenziali predefinite dell'applicazione. Per ulteriori informazioni, vedi Configura l'autenticazione per un ambiente di sviluppo locale.

"""Detect text in a video stored on GCS."""
from google.cloud import videointelligence

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.TEXT_DETECTION]

operation = video_client.annotate_video(
    request={"features": features, "input_uri": input_uri}
)

print("\nProcessing video for text detection.")
result = operation.result(timeout=600)

# The first result is retrieved because a single video was processed.
annotation_result = result.annotation_results[0]

for text_annotation in annotation_result.text_annotations:
    print("\nText: {}".format(text_annotation.text))

    # Get the first text segment
    text_segment = text_annotation.segments[0]
    start_time = text_segment.segment.start_time_offset
    end_time = text_segment.segment.end_time_offset
    print(
        "start_time: {}, end_time: {}".format(
            start_time.seconds + start_time.microseconds * 1e-6,
            end_time.seconds + end_time.microseconds * 1e-6,
        )
    )

    print("Confidence: {}".format(text_segment.confidence))

    # Show the result for the first frame in this segment.
    frame = text_segment.frames[0]
    time_offset = frame.time_offset
    print(
        "Time offset for the first frame: {}".format(
            time_offset.seconds + time_offset.microseconds * 1e-6
        )
    )
    print("Rotated Bounding Box Vertices:")
    for vertex in frame.rotated_bounding_box.vertices:
        print("\tVertex.x: {}, Vertex.y: {}".format(vertex.x, vertex.y))

Linguaggi aggiuntivi

C#: segui le istruzioni di configurazione per C# riportate nella pagina delle librerie client e consulta la documentazione di riferimento di Video Intelligence per .NET.

PHP Segui le Istruzioni per la configurazione dei file PHP Nella pagina delle librerie client e poi visita Documentazione di riferimento di Video Intelligence per PHP.

Ruby: Segui le Istruzioni per la configurazione di Ruby Nella pagina delle librerie client e poi visita Documentazione di riferimento di Video Intelligence per Ruby.

Richiedere il rilevamento del testo per il video da un file locale

Gli esempi seguenti dimostrano il rilevamento del testo su un file archiviato localmente.

REST

Inviare una richiesta di annotazione video

Per eseguire l'annotazione su un file video locale, assicurati di codificare in base64 i contenuti del file video. Includi i contenuti codificati in base64 nel campo inputContent della richiesta. Per informazioni su come per codificare il contenuto di un file video in base64, consulta Codifica Base64.

Di seguito è riportato un esempio di come inviare una richiesta POST al metodo videos:annotate. L'esempio utilizza Google Cloud CLI per creare un token di accesso. Per istruzioni su come installare Google Cloud CLI, consulta la guida rapida all'API Video Intelligence

Prima di utilizzare i dati della richiesta, effettua le seguenti sostituzioni:

"inputContent": BASE64_ENCODED_CONTENT
Ad esempio:
"UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA..."
LANGUAGE_CODE: [Facoltativo] ad esempio "it-IT"
PROJECT_NUMBER: l'identificatore numerico del tuo progetto Google Cloud

Metodo HTTP e URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

Corpo JSON della richiesta:

{
  "inputContent": "BASE64_ENCODED_CONTENT",
  "features": ["TEXT_DETECTION"],
  "videoContext": {
    "textDetectionConfig": {
      "languageHints": ["LANGUAGE_CODE"]
    }
  }
}

Per inviare la richiesta, espandi una di queste opzioni:

curl (Linux, macOS o Cloud Shell)

Salva il corpo della richiesta in un file denominato request.json. ed esegui questo comando:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

Salva il corpo della richiesta in un file denominato request.json. ed esegui questo comando:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

Dovresti ricevere una risposta JSON simile alla seguente:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

Se la risposta è positiva, l'API Video Intelligence restituisce il name della tua operazione. Quanto sopra mostra un esempio di questa risposta, in cui project-number è il nome del progetto, mentre operation-id è l'ID un'operazione a lunga esecuzione creata per la richiesta.

OPERATION_ID: fornito nella risposta quando hai avviato la operativa, ad esempio 12345...

Recuperare i risultati dell'annotazione

Per recuperare il risultato dell'operazione, effettua una richiesta GET, utilizzando il nome dell'operazione restituito dalla chiamata a videos:annotate, come mostrato nell'esempio seguente.

Prima di utilizzare i dati della richiesta, apporta le seguenti sostituzioni:

PROJECT_NUMBER: l'identificatore numerico del tuo progetto Google Cloud

Metodo HTTP e URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

Per inviare la richiesta, espandi una delle seguenti opzioni:

curl (Linux, macOS o Cloud Shell)

Esegui questo comando:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

Esegui questo comando:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

Dovresti ricevere una risposta JSON simile alla seguente:

Risposta

"textAnnotations": [
  {
    "text": "Hair Salon",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "0.833333s",
          "endTimeOffset": "2.291666s"
        },
        "confidence": 0.99438506,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.7015625,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.64166665
                },
                {
                  "x": 0.7015625,
                  "y": 0.64166665
                }
              ]
            },
            "timeOffset": "0.833333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.041666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.250s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6319444
                },
                {
                  "x": 0.70234376,
                  "y": 0.6319444
                }
              ]
            },
            "timeOffset": "1.458333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.666666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.875s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.083333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.291666s"
          }
        ]
      }
    ]
  },
  {
    "text": "\"Sure, give me one second.\"",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "10.625s",
          "endTimeOffset": "13.333333s"
        },
        "confidence": 0.98716676,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.60859376,
                  "y": 0.59583336
                },
                {
                  "x": 0.8952959,
                  "y": 0.5903528
                },
                {
                  "x": 0.89560676,
                  "y": 0.6417387
                },
                {
                  "x": 0.60890454,
                  "y": 0.64721924
                }
              ]
            },
            "timeOffset": "10.625s"
          },
  ...

    ]
}

Le annotazioni per il rilevamento del testo vengono restituite come elenco textAnnotations. Nota: il campo done viene restituito solo quando il valore è True. Non è incluso nelle risposte per cui l'operazione non è stata completata.

Go


import (
	"context"
	"fmt"
	"io"
	"io/ioutil"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// textDetection analyzes a video and extracts the text from the video's audio.
func textDetection(w io.Writer, filename string) error {
	// filename := "../testdata/googlework_short.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	fileBytes, err := ioutil.ReadFile(filename)
	if err != nil {
		return fmt.Errorf("ioutil.ReadFile: %w", err)
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputContent: fileBytes,
		Features: []videopb.Feature{
			videopb.Feature_TEXT_DETECTION,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.TextAnnotations {
		fmt.Fprintf(w, "Text: %q\n", annotation.GetText())

		// Get the first text segment.
		segment := annotation.GetSegments()[0]
		start, _ := ptypes.Duration(segment.GetSegment().GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetSegment().GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", segment.GetConfidence())

		// Show the result for the first frame in this segment.
		frame := segment.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		fmt.Fprintf(w, "\tRotated bounding box vertices:\n")
		for _, vertex := range frame.GetRotatedBoundingBox().GetVertices() {
			fmt.Fprintf(w, "\t\tVertex x=%f, y=%f\n", vertex.GetX(), vertex.GetY())
		}
	}

	return nil
}

Java

/**
 * Detect text in a video.
 *
 * @param filePath the path to the video file to analyze.
 */
public static VideoAnnotationResults detectText(String filePath) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Read file
    Path path = Paths.get(filePath);
    byte[] data = Files.readAllBytes(path);

    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputContent(ByteString.copyFrom(data))
            .addFeatures(Feature.TEXT_DETECTION)
            .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    TextAnnotation annotation = results.getTextAnnotations(0);
    System.out.println("Text: " + annotation.getText());

    // Get the first text segment.
    TextSegment textSegment = annotation.getSegments(0);
    System.out.println("Confidence: " + textSegment.getConfidence());
    // For the text segment display it's time offset
    VideoSegment videoSegment = textSegment.getSegment();
    Duration startTimeOffset = videoSegment.getStartTimeOffset();
    Duration endTimeOffset = videoSegment.getEndTimeOffset();
    // Display the offset times in seconds, 1e9 is part of the formula to convert nanos to seconds
    System.out.println(
        String.format(
            "Start time: %.2f", startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9));
    System.out.println(
        String.format(
            "End time: %.2f", endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));

    // Show the first result for the first frame in the segment.
    TextFrame textFrame = textSegment.getFrames(0);
    Duration timeOffset = textFrame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset for the first frame: %.2f",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the rotated bounding box for where the text is on the frame.
    System.out.println("Rotated Bounding Box Vertices:");
    List<NormalizedVertex> vertices = textFrame.getRotatedBoundingBox().getVerticesList();
    for (NormalizedVertex normalizedVertex : vertices) {
      System.out.println(
          String.format(
              "\tVertex.x: %.2f, Vertex.y: %.2f",
              normalizedVertex.getX(), normalizedVertex.getY()));
    }
    return results;
  }
}

Node.js

Per autenticarti a Video Intelligence, configura le credenziali predefinite dell'applicazione. Per maggiori informazioni, consulta Configurare l'autenticazione per un ambiente di sviluppo locale.

// Imports the Google Cloud Video Intelligence library + Node's fs library
const Video = require('@google-cloud/video-intelligence');
const fs = require('fs');
const util = require('util');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const path = 'Local file to analyze, e.g. ./my-file.mp4';

// Reads a local video file and converts it to base64
const file = await util.promisify(fs.readFile)(path);
const inputContent = file.toString('base64');

const request = {
  inputContent: inputContent,
  features: ['TEXT_DETECTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
const results = await operation.promise();
console.log('Waiting for operation to complete...');

// Gets annotations for video
const textAnnotations = results[0].annotationResults[0].textAnnotations;
textAnnotations.forEach(textAnnotation => {
  console.log(`Text ${textAnnotation.text} occurs at:`);
  textAnnotation.segments.forEach(segment => {
    const time = segment.segment;
    if (time.startTimeOffset.seconds === undefined) {
      time.startTimeOffset.seconds = 0;
    }
    if (time.startTimeOffset.nanos === undefined) {
      time.startTimeOffset.nanos = 0;
    }
    if (time.endTimeOffset.seconds === undefined) {
      time.endTimeOffset.seconds = 0;
    }
    if (time.endTimeOffset.nanos === undefined) {
      time.endTimeOffset.nanos = 0;
    }
    console.log(
      `\tStart: ${time.startTimeOffset.seconds || 0}` +
        `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s`
    );
    console.log(
      `\tEnd: ${time.endTimeOffset.seconds || 0}.` +
        `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
    );
    console.log(`\tConfidence: ${segment.confidence}`);
    segment.frames.forEach(frame => {
      const timeOffset = frame.timeOffset;
      console.log(
        `Time offset for the frame: ${timeOffset.seconds || 0}` +
          `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
      );
      console.log('Rotated Bounding Box Vertices:');
      frame.rotatedBoundingBox.vertices.forEach(vertex => {
        console.log(`Vertex.x:${vertex.x}, Vertex.y:${vertex.y}`);
      });
    });
  });
});

Python

import io

from google.cloud import videointelligence

def video_detect_text(path):
    """Detect text in a local video."""
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [videointelligence.Feature.TEXT_DETECTION]
    video_context = videointelligence.VideoContext()

    with io.open(path, "rb") as file:
        input_content = file.read()

    operation = video_client.annotate_video(
        request={
            "features": features,
            "input_content": input_content,
            "video_context": video_context,
        }
    )

    print("\nProcessing video for text detection.")
    result = operation.result(timeout=300)

    # The first result is retrieved because a single video was processed.
    annotation_result = result.annotation_results[0]

    for text_annotation in annotation_result.text_annotations:
        print("\nText: {}".format(text_annotation.text))

        # Get the first text segment
        text_segment = text_annotation.segments[0]
        start_time = text_segment.segment.start_time_offset
        end_time = text_segment.segment.end_time_offset
        print(
            "start_time: {}, end_time: {}".format(
                start_time.seconds + start_time.microseconds * 1e-6,
                end_time.seconds + end_time.microseconds * 1e-6,
            )
        )

        print("Confidence: {}".format(text_segment.confidence))

        # Show the result for the first frame in this segment.
        frame = text_segment.frames[0]
        time_offset = frame.time_offset
        print(
            "Time offset for the first frame: {}".format(
                time_offset.seconds + time_offset.microseconds * 1e-6
            )
        )
        print("Rotated Bounding Box Vertices:")
        for vertex in frame.rotated_bounding_box.vertices:
            print("\tVertex.x: {}, Vertex.y: {}".format(vertex.x, vertex.y))

Linguaggi aggiuntivi

C#: segui le istruzioni di configurazione per C# nella pagina delle librerie client e poi consulta la documentazione di riferimento di Video Intelligence per .NET.

PHP: segui le istruzioni di configurazione di PHP riportate nella pagina delle librerie client e consulta la documentazione di riferimento di Video Intelligence per PHP.

Ruby: segui le istruzioni di configurazione di Ruby nella pagina delle librerie client e poi consulta la documentazione di riferimento di Video Intelligence per Ruby.