Diese Seite wurde von der Cloud Translation API übersetzt.

Text erkennen

Bei der Texterkennung wird eine Optische Zeichenerkennung (OCR) durchgeführt, die Text in einem Eingabevideo erkennt und extrahiert.

Die Texterkennung ist für alle Sprachen verfügbar, die von der Cloud Vision API unterstützt werden.

Texterkennung für ein Video in Google Cloud Storage anfordern

Die folgenden Beispiele zeigen die Texterkennung für eine Datei in Cloud Storage.

REST

Anfrage zur Annotation eines Videos senden

Im Folgenden wird gezeigt, wie eine POST-Anfrage an die Methode videos:annotate gesendet wird. In diesem Beispiel wird die Google Cloud CLI verwendet, um ein Zugriffstoken zu erstellen. Eine Anleitung zur Installation der gcloud CLI finden Sie in der Kurzanleitung zur Video Intelligence API.

Ersetzen Sie diese Werte in den folgenden Anfragedaten:

INPUT_URI: Ein Cloud Storage-Bucket, der die Datei enthält, die Sie annotieren möchten, einschließlich des Dateinamens. Muss mit gs:// beginnen.
Beispiel: "inputUri": "gs://cloud-videointelligence-demo/assistant.mp4",
LANGUAGE_CODE: [Optional] Beispiel: "en-US"
PROJECT_NUMBER: Die numerische Kennung für Ihr Google Cloud-Projekt

HTTP-Methode und URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

JSON-Text anfordern:

{
  "inputUri": "INPUT_URI",
  "features": ["TEXT_DETECTION"],
  "videoContext": {
    "textDetectionConfig": {
      "languageHints": ["LANGUAGE_CODE"]
    }
  }
}

Wenn Sie die Anfrage senden möchten, maximieren Sie eine der folgenden Optionen:

curl (Linux, macOS oder Cloud Shell)

Hinweis: Der folgende Befehl setzt voraus, dass Sie sich mit Ihrem Nutzerkonto bei der gcloud-Befehlszeile angemeldet haben. Dazu haben Sie gcloud init oder gcloud auth login ausgeführt oder die Cloud Shell genutzt, die Sie automatisch bei der gcloud-Befehlszeile anmeldet. Um herauszufinden, welches Konto gerade aktiv ist, führen Sie gcloud auth list aus.

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

Hinweis: Der folgende Befehl setzt voraus, dass Sie sich mit Ihrem Nutzerkonto bei der gcloud-Befehlszeile angemeldet haben. Dazu führen Sie gcloud init oder gcloud auth login aus. Um herauszufinden, welches Konto gerade aktiv ist, führen Sie gcloud auth list aus.

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

Sie sollten eine JSON-Antwort ähnlich wie diese erhalten:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

Wenn die Antwort erfolgreich ist, gibt die Video Intelligence API den name für Ihren Vorgang zurück. Das Beispiel oben zeigt eine solche Antwort, wobei project-number die Nummer Ihres Projekts und operation-id die ID des lang andauernden Vorgangs ist, der für die Anfrage erstellt wurde.

PROJECT_NUMBER: Die Nummer Ihres Projekts
LOCATION_ID: Die Cloud-Region, in der die Annotation stattfinden soll. Unterstützte Cloud-Regionen sind: us-east1, us-west1, europe-west1, asia-east1. Wenn keine Region angegeben ist, wird eine Region basierend auf dem Speicherort der Videodatei festgelegt.
OPERATION_ID: Die ID des lang andauernden Vorgangs, der für die Anfrage erstellt und in der Antwort beim Start des Vorgangs angegeben wurde, z. B. 12345...

Ruft Annotationsergebnisse ab

Um das Ergebnis des Vorgangs abzurufen, führen Sie eine GET-Anfrage mithilfe des Vorgangsaufrufs, der vom Aufruf an Videos:Annotieren zurückgegeben wurde, wie im folgenden Beispiel gezeigt.

Ersetzen Sie diese Werte in den folgenden Anfragedaten:

OPERATION_NAME: Der von der Video Intelligence API zurückgegebene Name des Vorgangs. Der Vorgangsname hat das Format projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID.
PROJECT_NUMBER: Die numerische Kennung für Ihr Google Cloud-Projekt

HTTP-Methode und URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

Wenn Sie die Anfrage senden möchten, maximieren Sie eine der folgenden Optionen:

curl (Linux, macOS oder Cloud Shell)

Führen Sie folgenden Befehl aus:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

Führen Sie folgenden Befehl aus:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

Sie sollten eine JSON-Antwort ähnlich wie diese erhalten:

Antwort

"textAnnotations": [
  {
    "text": "Hair Salon",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "0.833333s",
          "endTimeOffset": "2.291666s"
        },
        "confidence": 0.99438506,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.7015625,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.64166665
                },
                {
                  "x": 0.7015625,
                  "y": 0.64166665
                }
              ]
            },
            "timeOffset": "0.833333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.041666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.250s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6319444
                },
                {
                  "x": 0.70234376,
                  "y": 0.6319444
                }
              ]
            },
            "timeOffset": "1.458333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.666666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.875s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.083333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.291666s"
          }
        ]
      }
    ]
  },
  {
    "text": "\"Sure, give me one second.\"",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "10.625s",
          "endTimeOffset": "13.333333s"
        },
        "confidence": 0.98716676,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.60859376,
                  "y": 0.59583336
                },
                {
                  "x": 0.8952959,
                  "y": 0.5903528
                },
                {
                  "x": 0.89560676,
                  "y": 0.6417387
                },
                {
                  "x": 0.60890454,
                  "y": 0.64721924
                }
              ]
            },
            "timeOffset": "10.625s"
          },
  ...

    ]
  }

Anmerkungen der Texterkennung werden in der Liste textAnnotations zurückgegeben. Hinweis: Das Feld done wird nur zurückgegeben, wenn sein Wert True ist. Es ist nicht in Antworten enthalten, für die der Vorgang nicht abgeschlossen wurde.

Annotationsergebnisse herunterladen

Kopieren Sie die Annotation aus der Quelle in den Ziel-Bucket (siehe Dateien und Objekte kopieren)

gcloud storage cp gcs_uri gs://my-bucket

Hinweis: Wenn der Nutzer den Ausgabe-gcs-URI vom Nutzer bereitstellt, wird die Annotation in diesem gcs-uri gespeichert.

Go


import (
	"context"
	"fmt"
	"io"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// textDetectionGCS analyzes a video and extracts the text from the video's audio.
func textDetectionGCS(w io.Writer, gcsURI string) error {
	// gcsURI := "gs://python-docs-samples-tests/video/googlework_short.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputUri: gcsURI,
		Features: []videopb.Feature{
			videopb.Feature_TEXT_DETECTION,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.TextAnnotations {
		fmt.Fprintf(w, "Text: %q\n", annotation.GetText())

		// Get the first text segment.
		segment := annotation.GetSegments()[0]
		start, _ := ptypes.Duration(segment.GetSegment().GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetSegment().GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", segment.GetConfidence())

		// Show the result for the first frame in this segment.
		frame := segment.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		fmt.Fprintf(w, "\tRotated bounding box vertices:\n")
		for _, vertex := range frame.GetRotatedBoundingBox().GetVertices() {
			fmt.Fprintf(w, "\t\tVertex x=%f, y=%f\n", vertex.GetX(), vertex.GetY())
		}
	}

	return nil
}

Java

Richten Sie zur Authentifizierung bei Video Intelligence die Standardanmeldedaten für Anwendungen ein. Weitere Informationen finden Sie unter Authentifizierung für eine lokale Entwicklungsumgebung einrichten.

/**
 * Detect Text in a video.
 *
 * @param gcsUri the path to the video file to analyze.
 */
public static VideoAnnotationResults detectTextGcs(String gcsUri) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputUri(gcsUri)
            .addFeatures(Feature.TEXT_DETECTION)
            .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    TextAnnotation annotation = results.getTextAnnotations(0);
    System.out.println("Text: " + annotation.getText());

    // Get the first text segment.
    TextSegment textSegment = annotation.getSegments(0);
    System.out.println("Confidence: " + textSegment.getConfidence());
    // For the text segment display it's time offset
    VideoSegment videoSegment = textSegment.getSegment();
    Duration startTimeOffset = videoSegment.getStartTimeOffset();
    Duration endTimeOffset = videoSegment.getEndTimeOffset();
    // Display the offset times in seconds, 1e9 is part of the formula to convert nanos to seconds
    System.out.println(
        String.format(
            "Start time: %.2f", startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9));
    System.out.println(
        String.format(
            "End time: %.2f", endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));

    // Show the first result for the first frame in the segment.
    TextFrame textFrame = textSegment.getFrames(0);
    Duration timeOffset = textFrame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset for the first frame: %.2f",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the rotated bounding box for where the text is on the frame.
    System.out.println("Rotated Bounding Box Vertices:");
    List<NormalizedVertex> vertices = textFrame.getRotatedBoundingBox().getVerticesList();
    for (NormalizedVertex normalizedVertex : vertices) {
      System.out.println(
          String.format(
              "\tVertex.x: %.2f, Vertex.y: %.2f",
              normalizedVertex.getX(), normalizedVertex.getY()));
    }
    return results;
  }
}

Node.js

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

const request = {
  inputUri: gcsUri,
  features: ['TEXT_DETECTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
const results = await operation.promise();
console.log('Waiting for operation to complete...');
// Gets annotations for video
const textAnnotations = results[0].annotationResults[0].textAnnotations;
textAnnotations.forEach(textAnnotation => {
  console.log(`Text ${textAnnotation.text} occurs at:`);
  textAnnotation.segments.forEach(segment => {
    const time = segment.segment;
    console.log(
      ` Start: ${time.startTimeOffset.seconds || 0}.${(
        time.startTimeOffset.nanos / 1e6
      ).toFixed(0)}s`
    );
    console.log(
      ` End: ${time.endTimeOffset.seconds || 0}.${(
        time.endTimeOffset.nanos / 1e6
      ).toFixed(0)}s`
    );
    console.log(` Confidence: ${segment.confidence}`);
    segment.frames.forEach(frame => {
      const timeOffset = frame.timeOffset;
      console.log(
        `Time offset for the frame: ${timeOffset.seconds || 0}` +
          `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
      );
      console.log('Rotated Bounding Box Vertices:');
      frame.rotatedBoundingBox.vertices.forEach(vertex => {
        console.log(`Vertex.x:${vertex.x}, Vertex.y:${vertex.y}`);
      });
    });
  });
});

Python

"""Detect text in a video stored on GCS."""
from google.cloud import videointelligence

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.TEXT_DETECTION]

operation = video_client.annotate_video(
    request={"features": features, "input_uri": input_uri}
)

print("\nProcessing video for text detection.")
result = operation.result(timeout=600)

# The first result is retrieved because a single video was processed.
annotation_result = result.annotation_results[0]

for text_annotation in annotation_result.text_annotations:
    print("\nText: {}".format(text_annotation.text))

    # Get the first text segment
    text_segment = text_annotation.segments[0]
    start_time = text_segment.segment.start_time_offset
    end_time = text_segment.segment.end_time_offset
    print(
        "start_time: {}, end_time: {}".format(
            start_time.seconds + start_time.microseconds * 1e-6,
            end_time.seconds + end_time.microseconds * 1e-6,
        )
    )

    print("Confidence: {}".format(text_segment.confidence))

    # Show the result for the first frame in this segment.
    frame = text_segment.frames[0]
    time_offset = frame.time_offset
    print(
        "Time offset for the first frame: {}".format(
            time_offset.seconds + time_offset.microseconds * 1e-6
        )
    )
    print("Rotated Bounding Box Vertices:")
    for vertex in frame.rotated_bounding_box.vertices:
        print("\tVertex.x: {}, Vertex.y: {}".format(vertex.x, vertex.y))

Weitere Sprachen

C#: Folgen Sie der Anleitung zur Einrichtung von C# auf der Seite der Clientbibliotheken und rufen Sie dann die Video Intelligence-Referenzdokumentation für .NET auf.

PHP: Folgen Sie der Anleitung zur Einrichtung von PHP auf der Seite der Clientbibliotheken und rufen Sie dann die Video Intelligence-Referenzdokumentation für PHP auf.

Ruby: Folgen Sie der Anleitung zur Einrichtung von Ruby auf der Seite der Clientbibliotheken und rufen Sie dann die Video Intelligence-Referenzdokumentation für Ruby auf.

Texterkennung für ein Video aus einer lokalen Datei anfordern

Die folgenden Beispiele zeigen die Texterkennung für eine lokal gespeicherte Datei.

REST

Anfrage zur Annotation eines Videos senden

Wenn Sie in einer lokalen Videodatei Annotationen erstellen möchten, codieren Sie den Inhalt der Videodatei mit Base64. Fügen Sie den Base64-codierten Inhalt in das Feld inputContent der Anfrage ein. Informationen zum Base64-Codieren des Inhalts einer Videodatei finden Sie unter Base64-Codierung.

Das folgende Beispiel zeigt, wie Sie eine POST-Anfrage an die Methode videos:annotate senden. In diesem Beispiel wird die Google Cloud CLI verwendet, um ein Zugriffstoken zu erstellen. Eine Anleitung zur Installation der Google Cloud CLI finden Sie in der Kurzanleitung zur Video Intelligence API.

Ersetzen Sie diese Werte in den folgenden Anfragedaten:

"inputContent": BASE64_ENCODED_CONTENT
Beispiel:
"UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA..."
LANGUAGE_CODE: [Optional] Beispiel: "en-US"
PROJECT_NUMBER: Die numerische Kennung für Ihr Google Cloud-Projekt

HTTP-Methode und URL:

POST https://videointelligence.googleapis.com/v1/videos:annotate

JSON-Text anfordern:

{
  "inputContent": "BASE64_ENCODED_CONTENT",
  "features": ["TEXT_DETECTION"],
  "videoContext": {
    "textDetectionConfig": {
      "languageHints": ["LANGUAGE_CODE"]
    }
  }
}

Wenn Sie die Anfrage senden möchten, maximieren Sie eine der folgenden Optionen:

curl (Linux, macOS oder Cloud Shell)

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

Speichern Sie den Anfragetext in einer Datei mit dem Namen request.json und führen Sie den folgenden Befehl aus:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

Sie sollten eine JSON-Antwort ähnlich wie diese erhalten:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

Wenn die Antwort erfolgreich ist, gibt die Video Intelligence API den name Ihres Vorgang zurück. Das Beispiel oben zeigt eine solche Antwort, wobei project-number der Name Ihres Projekts und operation-id die ID des lang andauernden Vorgangs ist, der für die Anfrage erstellt wurde.

OPERATION_ID: Wird in der Antwort beim Start des Vorgangs angegeben, z. B. 12345...

Ruft Annotationsergebnisse ab

Um das Ergebnis des Vorgangs abzurufen, führen Sie eine GET-Anfrage mithilfe des Vorgangsaufrufs, der vom Aufruf an Videos:Annotieren zurückgegeben wurde, wie im folgenden Beispiel gezeigt.

Ersetzen Sie diese Werte in den folgenden Anfragedaten:

PROJECT_NUMBER: Die numerische Kennung für Ihr Google Cloud-Projekt

HTTP-Methode und URL:

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

Wenn Sie die Anfrage senden möchten, maximieren Sie eine der folgenden Optionen:

curl (Linux, macOS oder Cloud Shell)

Führen Sie folgenden Befehl aus:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

Führen Sie folgenden Befehl aus:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

Sie sollten eine JSON-Antwort ähnlich wie diese erhalten:

Antwort

"textAnnotations": [
  {
    "text": "Hair Salon",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "0.833333s",
          "endTimeOffset": "2.291666s"
        },
        "confidence": 0.99438506,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.7015625,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.64166665
                },
                {
                  "x": 0.7015625,
                  "y": 0.64166665
                }
              ]
            },
            "timeOffset": "0.833333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.041666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.250s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6319444
                },
                {
                  "x": 0.70234376,
                  "y": 0.6319444
                }
              ]
            },
            "timeOffset": "1.458333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.666666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.875s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.083333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.291666s"
          }
        ]
      }
    ]
  },
  {
    "text": "\"Sure, give me one second.\"",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "10.625s",
          "endTimeOffset": "13.333333s"
        },
        "confidence": 0.98716676,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.60859376,
                  "y": 0.59583336
                },
                {
                  "x": 0.8952959,
                  "y": 0.5903528
                },
                {
                  "x": 0.89560676,
                  "y": 0.6417387
                },
                {
                  "x": 0.60890454,
                  "y": 0.64721924
                }
              ]
            },
            "timeOffset": "10.625s"
          },
  ...

    ]
}

Annotationen der Texterkennung werden in der Liste textAnnotations zurückgegeben. Hinweis: Das Feld done wird nur zurückgegeben, wenn sein Wert True ist. Es ist nicht in Antworten enthalten, für die der Vorgang nicht abgeschlossen wurde.

Go


import (
	"context"
	"fmt"
	"io"
	"os"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// textDetection analyzes a video and extracts the text from the video's audio.
func textDetection(w io.Writer, filename string) error {
	// filename := "../testdata/googlework_short.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	fileBytes, err := os.ReadFile(filename)
	if err != nil {
		return fmt.Errorf("os.ReadFile: %w", err)
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputContent: fileBytes,
		Features: []videopb.Feature{
			videopb.Feature_TEXT_DETECTION,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.TextAnnotations {
		fmt.Fprintf(w, "Text: %q\n", annotation.GetText())

		// Get the first text segment.
		segment := annotation.GetSegments()[0]
		start, _ := ptypes.Duration(segment.GetSegment().GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetSegment().GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", segment.GetConfidence())

		// Show the result for the first frame in this segment.
		frame := segment.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		fmt.Fprintf(w, "\tRotated bounding box vertices:\n")
		for _, vertex := range frame.GetRotatedBoundingBox().GetVertices() {
			fmt.Fprintf(w, "\t\tVertex x=%f, y=%f\n", vertex.GetX(), vertex.GetY())
		}
	}

	return nil
}

Java

/**
 * Detect text in a video.
 *
 * @param filePath the path to the video file to analyze.
 */
public static VideoAnnotationResults detectText(String filePath) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Read file
    Path path = Paths.get(filePath);
    byte[] data = Files.readAllBytes(path);

    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputContent(ByteString.copyFrom(data))
            .addFeatures(Feature.TEXT_DETECTION)
            .build();

    // asynchronously perform object tracking on videos
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    TextAnnotation annotation = results.getTextAnnotations(0);
    System.out.println("Text: " + annotation.getText());

    // Get the first text segment.
    TextSegment textSegment = annotation.getSegments(0);
    System.out.println("Confidence: " + textSegment.getConfidence());
    // For the text segment display it's time offset
    VideoSegment videoSegment = textSegment.getSegment();
    Duration startTimeOffset = videoSegment.getStartTimeOffset();
    Duration endTimeOffset = videoSegment.getEndTimeOffset();
    // Display the offset times in seconds, 1e9 is part of the formula to convert nanos to seconds
    System.out.println(
        String.format(
            "Start time: %.2f", startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9));
    System.out.println(
        String.format(
            "End time: %.2f", endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));

    // Show the first result for the first frame in the segment.
    TextFrame textFrame = textSegment.getFrames(0);
    Duration timeOffset = textFrame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset for the first frame: %.2f",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the rotated bounding box for where the text is on the frame.
    System.out.println("Rotated Bounding Box Vertices:");
    List<NormalizedVertex> vertices = textFrame.getRotatedBoundingBox().getVerticesList();
    for (NormalizedVertex normalizedVertex : vertices) {
      System.out.println(
          String.format(
              "\tVertex.x: %.2f, Vertex.y: %.2f",
              normalizedVertex.getX(), normalizedVertex.getY()));
    }
    return results;
  }
}

Node.js

// Imports the Google Cloud Video Intelligence library + Node's fs library
const Video = require('@google-cloud/video-intelligence');
const fs = require('fs');
const util = require('util');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const path = 'Local file to analyze, e.g. ./my-file.mp4';

// Reads a local video file and converts it to base64
const file = await util.promisify(fs.readFile)(path);
const inputContent = file.toString('base64');

const request = {
  inputContent: inputContent,
  features: ['TEXT_DETECTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
const results = await operation.promise();
console.log('Waiting for operation to complete...');

// Gets annotations for video
const textAnnotations = results[0].annotationResults[0].textAnnotations;
textAnnotations.forEach(textAnnotation => {
  console.log(`Text ${textAnnotation.text} occurs at:`);
  textAnnotation.segments.forEach(segment => {
    const time = segment.segment;
    if (time.startTimeOffset.seconds === undefined) {
      time.startTimeOffset.seconds = 0;
    }
    if (time.startTimeOffset.nanos === undefined) {
      time.startTimeOffset.nanos = 0;
    }
    if (time.endTimeOffset.seconds === undefined) {
      time.endTimeOffset.seconds = 0;
    }
    if (time.endTimeOffset.nanos === undefined) {
      time.endTimeOffset.nanos = 0;
    }
    console.log(
      `\tStart: ${time.startTimeOffset.seconds || 0}` +
        `.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s`
    );
    console.log(
      `\tEnd: ${time.endTimeOffset.seconds || 0}.` +
        `${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
    );
    console.log(`\tConfidence: ${segment.confidence}`);
    segment.frames.forEach(frame => {
      const timeOffset = frame.timeOffset;
      console.log(
        `Time offset for the frame: ${timeOffset.seconds || 0}` +
          `.${(timeOffset.nanos / 1e6).toFixed(0)}s`
      );
      console.log('Rotated Bounding Box Vertices:');
      frame.rotatedBoundingBox.vertices.forEach(vertex => {
        console.log(`Vertex.x:${vertex.x}, Vertex.y:${vertex.y}`);
      });
    });
  });
});

Python

import io

from google.cloud import videointelligence

def video_detect_text(path):
    """Detect text in a local video."""
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [videointelligence.Feature.TEXT_DETECTION]
    video_context = videointelligence.VideoContext()

    with io.open(path, "rb") as file:
        input_content = file.read()

    operation = video_client.annotate_video(
        request={
            "features": features,
            "input_content": input_content,
            "video_context": video_context,
        }
    )

    print("\nProcessing video for text detection.")
    result = operation.result(timeout=300)

    # The first result is retrieved because a single video was processed.
    annotation_result = result.annotation_results[0]

    for text_annotation in annotation_result.text_annotations:
        print("\nText: {}".format(text_annotation.text))

        # Get the first text segment
        text_segment = text_annotation.segments[0]
        start_time = text_segment.segment.start_time_offset
        end_time = text_segment.segment.end_time_offset
        print(
            "start_time: {}, end_time: {}".format(
                start_time.seconds + start_time.microseconds * 1e-6,
                end_time.seconds + end_time.microseconds * 1e-6,
            )
        )

        print("Confidence: {}".format(text_segment.confidence))

        # Show the result for the first frame in this segment.
        frame = text_segment.frames[0]
        time_offset = frame.time_offset
        print(
            "Time offset for the first frame: {}".format(
                time_offset.seconds + time_offset.microseconds * 1e-6
            )
        )
        print("Rotated Bounding Box Vertices:")
        for vertex in frame.rotated_bounding_box.vertices:
            print("\tVertex.x: {}, Vertex.y: {}".format(vertex.x, vertex.y))

Weitere Sprachen

C#: Folgen Sie der Anleitung zur Einrichtung von C# auf der Seite der Clientbibliotheken und rufen Sie dann die Video Intelligence-Referenzdokumentation für .NET auf.

PHP: Folgen Sie der Anleitung zur Einrichtung von PHP auf der Seite der Clientbibliotheken und rufen Sie dann die Video Intelligence-Referenzdokumentation für PHP auf.

Ruby: Folgen Sie der Anleitung zur Einrichtung von Ruby auf der Seite der Clientbibliotheken und rufen Sie dann die Video Intelligence-Referenzdokumentation für Ruby auf.