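Recognize text

Text detection performs optical character recognition (OCR), which detects and extracts text from an input video.

Text detection is available for all the languages supported by the Cloud Vision API.

Request text detection for a video on Cloud Storage

The following samples demonstrate text detection on a file located in Cloud Storage.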

REST

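Send video annotation request

The following shows how to send a POST request to the videos:annotate method. The example uses the Google Cloud CLI to create an access token. For instructions on installing the gcloud CLI, see the Video Intelligence API quickstart.

Before using any of the request data, make the following replacements:

INPUT_URI: the Cloud Storage URI of the file that you want to annotate, including the bucket and the file name. Must start with gs://.
For example: "inputUri": "gs://cloud-videointelligence-demo/assistant.mp4",
LANGUAGE_CODE: [Optional] a language hint, for example "en-US"
PROJECT_NUMBER: the numeric identifier for your Google Cloud project

HTTP method and URL: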

POST https://videointelligence.googleapis.com/v1/videos:annotate

Request JSON body:

{
  "inputUri": "INPUT_URI",
  "features": ["TEXT_DETECTION"],
  "videoContext": {
    "textDetectionConfig": {
      "languageHints": ["LANGUAGE_CODE"]
    }
  }
}
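
The request body above can also be generated programmatically. The following is a minimal Python sketch, not part of the official samples: the function name build_text_detection_request is hypothetical, and it assumes (as the [Optional] marker on LANGUAGE_CODE suggests) that videoContext can be left out when no language hint is given.

```python
import json


def build_text_detection_request(input_uri, language_hints=None):
    """Return the videos:annotate request body as a plain dict."""
    # The input URI must point at Cloud Storage.
    if not input_uri.startswith("gs://"):
        raise ValueError("input_uri must start with gs://")
    body = {"inputUri": input_uri, "features": ["TEXT_DETECTION"]}
    # languageHints is optional, so videoContext is only added when hints exist.
    if language_hints:
        body["videoContext"] = {
            "textDetectionConfig": {"languageHints": list(language_hints)}
        }
    return body


body = build_text_detection_request(
    "gs://cloud-videointelligence-demo/assistant.mp4", ["en-US"]
)
print(json.dumps(body, indent=2))
```

Writing the printed JSON to request.json gives a file that the commands in this section can send.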

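To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: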

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

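Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: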

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content
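
For readers who prefer to stay in Python, the same POST request can be assembled with the standard library. This is a sketch under stated assumptions rather than an official sample: build_annotate_request is a hypothetical helper, and the access token is assumed to come from running gcloud auth print-access-token separately.

```python
import json
import urllib.request

ANNOTATE_URL = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_annotate_request(body, access_token, project_number):
    """Build (but do not send) the POST request used by the curl example."""
    return urllib.request.Request(
        ANNOTATE_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-goog-user-project": project_number,
            "Content-Type": "application/json; charset=utf-8",
        },
    )


# To send the request and decode the JSON response:
#   with urllib.request.urlopen(request) as response:
#       operation = json.load(response)
```

Building the request separately from sending it keeps the headers and body easy to inspect before any network call is made.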

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

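If the request is successful, the Video Intelligence API returns the name of your operation. The response above shows an example, where:

PROJECT_NUMBER: the number of your project
LOCATION_ID: the Cloud region where annotation takes place. Supported cloud regions are: us-east1, us-west1, europe-west1, asia-east1. If no region is specified, a region is determined based on the video file location.
OPERATION_ID: the ID of the long-running operation created for the request and provided in the response when you started the operation, for example 12345...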
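
The three parts of the operation name can be pulled apart with a regular expression. The sketch below is illustrative only: parse_operation_name is a hypothetical helper, and it assumes the name always follows the projects/.../locations/.../operations/... layout shown above.

```python
import re

_OPERATION_NAME = re.compile(
    r"^projects/(?P<project_number>[^/]+)"
    r"/locations/(?P<location_id>[^/]+)"
    r"/operations/(?P<operation_id>[^/]+)$"
)


def parse_operation_name(name):
    """Split an operation name into its project, location, and operation IDs."""
    match = _OPERATION_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected operation name: {name!r}")
    return match.groupdict()


# The values below are made-up placeholders.
print(parse_operation_name("projects/123/locations/us-west1/operations/456"))
```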

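Get annotation results

To retrieve the result of the operation, make a GET request using the operation name returned from the call to videos:annotate, as shown in the following example.

Before using any of the request data, make the following replacements:

OPERATION_NAME: the name of the operation as returned by the Video Intelligence API. The operation name has the format projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID
PROJECT_NUMBER: the numeric identifier for your Google Cloud project

HTTP method and URL: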

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

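Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command: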

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

"textAnnotations": [
  {
    "text": "Hair Salon",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "0.833333s",
          "endTimeOffset": "2.291666s"
        },
        "confidence": 0.99438506,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.7015625,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.64166665
                },
                {
                  "x": 0.7015625,
                  "y": 0.64166665
                }
              ]
            },
            "timeOffset": "0.833333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.041666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.250s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6319444
                },
                {
                  "x": 0.70234376,
                  "y": 0.6319444
                }
              ]
            },
            "timeOffset": "1.458333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.666666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.875s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.083333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.291666s"
          }
        ]
      }
    ]
  },
  {
    "text": "\"Sure, give me one second.\"",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "10.625s",
          "endTimeOffset": "13.333333s"
        },
        "confidence": 0.98716676,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.60859376,
                  "y": 0.59583336
                },
                {
                  "x": 0.8952959,
                  "y": 0.5903528
                },
                {
                  "x": 0.89560676,
                  "y": 0.6417387
                },
                {
                  "x": 0.60890454,
                  "y": 0.64721924
                }
              ]
            },
            "timeOffset": "10.625s"
          },
  ...

    ]
  }

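Text detection annotations are returned as a textAnnotations list. Note: The done field is only returned when its value is True. It is not included in responses for which the operation has not completed.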
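
Because the done field is absent until the operation completes, a polling loop has to treat a missing field as "not done yet". The following Python sketch shows one way to do that. It is an assumption-laden illustration: wait_for_operation and its fetch_operation argument are hypothetical names, fetch_operation stands for whatever code performs the GET request and returns the decoded JSON, and the error check assumes the usual long-running operation layout.

```python
import time


def wait_for_operation(fetch_operation, poll_seconds=5.0, max_polls=60):
    """Poll until the operation reports done, then return its JSON dict."""
    for attempt in range(max_polls):
        operation = fetch_operation()
        # "done" is missing while the operation is running, so default to False.
        if operation.get("done", False):
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        # Sleep between polls, but not after the final attempt.
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    raise TimeoutError("operation did not complete in time")
```

Passing the fetch step in as a function keeps the loop independent of how the HTTP request is made.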

Download annotation results

Copy the annotation from the source to the destination bucket (see Copy files and objects):

gcloud storage cp gcs_uri gs://my-bucket

Note: If you provide an output Cloud Storage URI, the annotation is stored at that URI.

Go


import (
	"context"
	"fmt"
	"io"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// textDetectionGCS analyzes a video stored in Cloud Storage and extracts the text visible in its frames.
func textDetectionGCS(w io.Writer, gcsURI string) error {
	// gcsURI := "gs://python-docs-samples-tests/video/googlework_short.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputUri: gcsURI,
		Features: []videopb.Feature{
			videopb.Feature_TEXT_DETECTION,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.TextAnnotations {
		fmt.Fprintf(w, "Text: %q\n", annotation.GetText())

		// Get the first text segment.
		segment := annotation.GetSegments()[0]
		start, _ := ptypes.Duration(segment.GetSegment().GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetSegment().GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", segment.GetConfidence())

		// Show the result for the first frame in this segment.
		frame := segment.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		fmt.Fprintf(w, "\tRotated bounding box vertices:\n")
		for _, vertex := range frame.GetRotatedBoundingBox().GetVertices() {
			fmt.Fprintf(w, "\t\tVertex x=%f, y=%f\n", vertex.GetX(), vertex.GetY())
		}
	}

	return nil
}

Java

To authenticate to Video Intelligence, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.videointelligence.v1.AnnotateVideoProgress;
import com.google.cloud.videointelligence.v1.AnnotateVideoRequest;
import com.google.cloud.videointelligence.v1.AnnotateVideoResponse;
import com.google.cloud.videointelligence.v1.Feature;
import com.google.cloud.videointelligence.v1.NormalizedVertex;
import com.google.cloud.videointelligence.v1.TextAnnotation;
import com.google.cloud.videointelligence.v1.TextFrame;
import com.google.cloud.videointelligence.v1.TextSegment;
import com.google.cloud.videointelligence.v1.VideoAnnotationResults;
import com.google.cloud.videointelligence.v1.VideoIntelligenceServiceClient;
import com.google.cloud.videointelligence.v1.VideoSegment;
import com.google.protobuf.Duration;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Detect Text in a video.
 *
 * @param gcsUri the path to the video file to analyze.
 */
public static VideoAnnotationResults detectTextGcs(String gcsUri) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputUri(gcsUri)
            .addFeatures(Feature.TEXT_DETECTION)
            .build();

    // Asynchronously perform text detection on the video.
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    TextAnnotation annotation = results.getTextAnnotations(0);
    System.out.println("Text: " + annotation.getText());

    // Get the first text segment.
    TextSegment textSegment = annotation.getSegments(0);
    System.out.println("Confidence: " + textSegment.getConfidence());
    // For the text segment, display its time offsets.
    VideoSegment videoSegment = textSegment.getSegment();
    Duration startTimeOffset = videoSegment.getStartTimeOffset();
    Duration endTimeOffset = videoSegment.getEndTimeOffset();
    // Display the offset times in seconds, 1e9 is part of the formula to convert nanos to seconds
    System.out.println(
        String.format(
            "Start time: %.2f", startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9));
    System.out.println(
        String.format(
            "End time: %.2f", endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));

    // Show the first result for the first frame in the segment.
    TextFrame textFrame = textSegment.getFrames(0);
    Duration timeOffset = textFrame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset for the first frame: %.2f",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the rotated bounding box for where the text is on the frame.
    System.out.println("Rotated Bounding Box Vertices:");
    List<NormalizedVertex> vertices = textFrame.getRotatedBoundingBox().getVerticesList();
    for (NormalizedVertex normalizedVertex : vertices) {
      System.out.println(
          String.format(
              "\tVertex.x: %.2f, Vertex.y: %.2f",
              normalizedVertex.getX(), normalizedVertex.getY()));
    }
    return results;
  }
}

Node.js

To authenticate to Video Intelligence, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';

const request = {
  inputUri: gcsUri,
  features: ['TEXT_DETECTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();
// Gets annotations for video
const textAnnotations = results[0].annotationResults[0].textAnnotations;
textAnnotations.forEach(textAnnotation => {
  console.log(`Text ${textAnnotation.text} occurs at:`);
  textAnnotation.segments.forEach(segment => {
    const time = segment.segment;
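    // Zero-pad the millisecond part so that 41 ms prints as ".041", not ".41".
    console.log(
      ` Start: ${time.startTimeOffset.seconds || 0}.${String(
        Math.floor((time.startTimeOffset.nanos || 0) / 1e6)
      ).padStart(3, '0')}s`
    );
    console.log(
      ` End: ${time.endTimeOffset.seconds || 0}.${String(
        Math.floor((time.endTimeOffset.nanos || 0) / 1e6)
      ).padStart(3, '0')}s`
    );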
    console.log(` Confidence: ${segment.confidence}`);
    segment.frames.forEach(frame => {
      const timeOffset = frame.timeOffset;
      console.log(
        `Time offset for the frame: ${timeOffset.seconds || 0}` +
          `.${String(Math.floor((timeOffset.nanos || 0) / 1e6)).padStart(3, '0')}s`
      );
      console.log('Rotated Bounding Box Vertices:');
      frame.rotatedBoundingBox.vertices.forEach(vertex => {
        console.log(`Vertex.x:${vertex.x}, Vertex.y:${vertex.y}`);
      });
    });
  });
});

Python

To authenticate to Video Intelligence, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

"""Detect text in a video stored on GCS."""
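from google.cloud import videointelligence

# TODO(developer): Set input_uri before running the sample, for example:
# input_uri = "gs://my-bucket/my-video.mp4"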

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.TEXT_DETECTION]

operation = video_client.annotate_video(
    request={"features": features, "input_uri": input_uri}
)

print("\nProcessing video for text detection.")
result = operation.result(timeout=600)

# The first result is retrieved because a single video was processed.
annotation_result = result.annotation_results[0]

for text_annotation in annotation_result.text_annotations:
    print("\nText: {}".format(text_annotation.text))

    # Get the first text segment
    text_segment = text_annotation.segments[0]
    start_time = text_segment.segment.start_time_offset
    end_time = text_segment.segment.end_time_offset
    print(
        "start_time: {}, end_time: {}".format(
            start_time.seconds + start_time.microseconds * 1e-6,
            end_time.seconds + end_time.microseconds * 1e-6,
        )
    )

    print("Confidence: {}".format(text_segment.confidence))

    # Show the result for the first frame in this segment.
    frame = text_segment.frames[0]
    time_offset = frame.time_offset
    print(
        "Time offset for the first frame: {}".format(
            time_offset.seconds + time_offset.microseconds * 1e-6
        )
    )
    print("Rotated Bounding Box Vertices:")
    for vertex in frame.rotated_bounding_box.vertices:
        print("\tVertex.x: {}, Vertex.y: {}".format(vertex.x, vertex.y))
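
The sample above reads .seconds and .microseconds from each time offset, which matches the attributes of Python's datetime.timedelta. Assuming the offsets are timedelta values (an assumption based on that attribute access, not something this page states), total_seconds() gives the same fractional value in one call. The helper name offset_seconds is hypothetical.

```python
from datetime import timedelta


def offset_seconds(offset):
    """Return a timedelta time offset as fractional seconds."""
    return offset.total_seconds()


# Stand-in value; in the sample this would be
# text_segment.segment.start_time_offset.
start_time = timedelta(seconds=10, microseconds=625000)
print(f"start_time: {offset_seconds(start_time)}")
```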

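Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for Ruby.

Request text detection for a video from a local file

The following samples demonstrate text detection on a file stored locally.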

REST

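Send video annotation request

To annotate a local video file, base64-encode the contents of the video file and include the encoded contents in the inputContent field of the request. For information on how to base64-encode the contents of a video file, see Base64 encoding.

The following shows how to send a POST request to the videos:annotate method. The example uses the Google Cloud CLI to create an access token. For instructions on installing the Google Cloud CLI, see the Video Intelligence API quickstart.

Before using any of the request data, make the following replacements:

"inputContent": BASE64_ENCODED_CONTENT
For example:
"UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA..."
LANGUAGE_CODE: [Optional] a language hint, for example "en-US"
PROJECT_NUMBER: the numeric identifier for your Google Cloud project

HTTP method and URL: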

POST https://videointelligence.googleapis.com/v1/videos:annotate

Request JSON body:

{
  "inputContent": "BASE64_ENCODED_CONTENT",
  "features": ["TEXT_DETECTION"],
  "videoContext": {
    "textDetectionConfig": {
      "languageHints": ["LANGUAGE_CODE"]
    }
  }
}
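
The base64 step can be done with Python's standard library. The sketch below is illustrative only: build_local_request is a hypothetical helper, my-video.mp4 is a placeholder path, and videoContext is assumed to be optional because the language hint is marked [Optional].

```python
import base64


def build_local_request(video_bytes, language_hints=None):
    """Return a videos:annotate request body with inline video content."""
    body = {
        # inputContent carries the base64-encoded bytes as ASCII text.
        "inputContent": base64.b64encode(video_bytes).decode("ascii"),
        "features": ["TEXT_DETECTION"],
    }
    if language_hints:
        body["videoContext"] = {
            "textDetectionConfig": {"languageHints": list(language_hints)}
        }
    return body


# Usage (requires "import json"; my-video.mp4 is a placeholder path):
#   with open("my-video.mp4", "rb") as f:
#       body = build_local_request(f.read(), ["en-US"])
#   with open("request.json", "w") as f:
#       json.dump(body, f)
```

The whole file is read into memory and grows by roughly a third when encoded, so this approach suits short clips better than long recordings.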

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://videointelligence.googleapis.com/v1/videos:annotate"

PowerShell (Windows)

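Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: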

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID"
}

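If the request is successful, the Video Intelligence API returns the name of your operation. The response above shows an example, where PROJECT_NUMBER is the number of your project and:

OPERATION_ID: the ID of the long-running operation created for the request and provided in the response when you started the operation, for example 12345...

Get annotation results

To retrieve the result of the operation, make a GET request using the operation name returned from the call to videos:annotate, as shown in the following example.

Before using any of the request data, make the following replacements:

OPERATION_NAME: the name of the operation as returned by the Video Intelligence API. The operation name has the format projects/PROJECT_NUMBER/locations/LOCATION_ID/operations/OPERATION_ID
PROJECT_NUMBER: the numeric identifier for your Google Cloud project

HTTP method and URL: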

GET https://videointelligence.googleapis.com/v1/OPERATION_NAME

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_NUMBER" \
     "https://videointelligence.googleapis.com/v1/OPERATION_NAME"

PowerShell (Windows)

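Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command: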

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_NUMBER" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://videointelligence.googleapis.com/v1/OPERATION_NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

Response

"textAnnotations": [
  {
    "text": "Hair Salon",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "0.833333s",
          "endTimeOffset": "2.291666s"
        },
        "confidence": 0.99438506,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.7015625,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.59583336
                },
                {
                  "x": 0.7984375,
                  "y": 0.64166665
                },
                {
                  "x": 0.7015625,
                  "y": 0.64166665
                }
              ]
            },
            "timeOffset": "0.833333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.041666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.250s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6319444
                },
                {
                  "x": 0.70234376,
                  "y": 0.6319444
                }
              ]
            },
            "timeOffset": "1.458333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.666666s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "1.875s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.083333s"
          },
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.70234376,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6
                },
                {
                  "x": 0.7992188,
                  "y": 0.6333333
                },
                {
                  "x": 0.70234376,
                  "y": 0.6333333
                }
              ]
            },
            "timeOffset": "2.291666s"
          }
        ]
      }
    ]
  },
  {
    "text": "\"Sure, give me one second.\"",
    "segments": [
      {
        "segment": {
          "startTimeOffset": "10.625s",
          "endTimeOffset": "13.333333s"
        },
        "confidence": 0.98716676,
        "frames": [
          {
            "rotatedBoundingBox": {
              "vertices": [
                {
                  "x": 0.60859376,
                  "y": 0.59583336
                },
                {
                  "x": 0.8952959,
                  "y": 0.5903528
                },
                {
                  "x": 0.89560676,
                  "y": 0.6417387
                },
                {
                  "x": 0.60890454,
                  "y": 0.64721924
                }
              ]
            },
            "timeOffset": "10.625s"
          },
  ...

    ]
}

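Text detection annotations are returned as a textAnnotations list. Note: The done field is only returned when its value is True. It is not included in responses for which the operation has not completed.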
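
Once the JSON response is decoded, the textAnnotations list can be flattened into one row per detected segment. The following Python sketch assumes the field names shown in the response above and that durations arrive as strings ending in "s". The helper names duration_to_seconds and summarize_text_annotations are hypothetical, and the function expects the object that directly contains textAnnotations.

```python
def duration_to_seconds(value):
    """Convert a JSON duration such as "0.833333s" to float seconds."""
    if not value.endswith("s"):
        raise ValueError(f"unexpected duration: {value!r}")
    return float(value[:-1])


def summarize_text_annotations(annotation_result):
    """Return (text, start, end, confidence) for every detected segment."""
    rows = []
    for annotation in annotation_result.get("textAnnotations", []):
        for segment in annotation.get("segments", []):
            span = segment["segment"]
            rows.append(
                (
                    annotation["text"],
                    duration_to_seconds(span.get("startTimeOffset", "0s")),
                    duration_to_seconds(span.get("endTimeOffset", "0s")),
                    segment.get("confidence", 0.0),
                )
            )
    return rows
```

Each row holds the text, its start and end offsets in seconds, and the confidence, which is usually enough for building captions or a search index.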

Go


import (
	"context"
	"fmt"
	"io"
	"os"

	video "cloud.google.com/go/videointelligence/apiv1"
	videopb "cloud.google.com/go/videointelligence/apiv1/videointelligencepb"
	"github.com/golang/protobuf/ptypes"
)

// textDetection analyzes a local video file and extracts the text visible in its frames.
func textDetection(w io.Writer, filename string) error {
	// filename := "../testdata/googlework_short.mp4"

	ctx := context.Background()

	// Creates a client.
	client, err := video.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("video.NewClient: %w", err)
	}
	defer client.Close()

	fileBytes, err := os.ReadFile(filename)
	if err != nil {
		return fmt.Errorf("os.ReadFile: %w", err)
	}

	op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
		InputContent: fileBytes,
		Features: []videopb.Feature{
			videopb.Feature_TEXT_DETECTION,
		},
	})
	if err != nil {
		return fmt.Errorf("AnnotateVideo: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("Wait: %w", err)
	}

	// Only one video was processed, so get the first result.
	result := resp.GetAnnotationResults()[0]

	for _, annotation := range result.TextAnnotations {
		fmt.Fprintf(w, "Text: %q\n", annotation.GetText())

		// Get the first text segment.
		segment := annotation.GetSegments()[0]
		start, _ := ptypes.Duration(segment.GetSegment().GetStartTimeOffset())
		end, _ := ptypes.Duration(segment.GetSegment().GetEndTimeOffset())
		fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)

		fmt.Fprintf(w, "\tConfidence: %f\n", segment.GetConfidence())

		// Show the result for the first frame in this segment.
		frame := segment.GetFrames()[0]
		seconds := float32(frame.GetTimeOffset().GetSeconds())
		nanos := float32(frame.GetTimeOffset().GetNanos())
		fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)

		fmt.Fprintf(w, "\tRotated bounding box vertices:\n")
		for _, vertex := range frame.GetRotatedBoundingBox().GetVertices() {
			fmt.Fprintf(w, "\t\tVertex x=%f, y=%f\n", vertex.GetX(), vertex.GetY())
		}
	}

	return nil
}

Java

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.videointelligence.v1.AnnotateVideoProgress;
import com.google.cloud.videointelligence.v1.AnnotateVideoRequest;
import com.google.cloud.videointelligence.v1.AnnotateVideoResponse;
import com.google.cloud.videointelligence.v1.Feature;
import com.google.cloud.videointelligence.v1.NormalizedVertex;
import com.google.cloud.videointelligence.v1.TextAnnotation;
import com.google.cloud.videointelligence.v1.TextFrame;
import com.google.cloud.videointelligence.v1.TextSegment;
import com.google.cloud.videointelligence.v1.VideoAnnotationResults;
import com.google.cloud.videointelligence.v1.VideoIntelligenceServiceClient;
import com.google.cloud.videointelligence.v1.VideoSegment;
import com.google.protobuf.ByteString;
import com.google.protobuf.Duration;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Detect text in a video.
 *
 * @param filePath the path to the video file to analyze.
 */
public static VideoAnnotationResults detectText(String filePath) throws Exception {
  try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    // Read file
    Path path = Paths.get(filePath);
    byte[] data = Files.readAllBytes(path);

    // Create the request
    AnnotateVideoRequest request =
        AnnotateVideoRequest.newBuilder()
            .setInputContent(ByteString.copyFrom(data))
            .addFeatures(Feature.TEXT_DETECTION)
            .build();

    // Asynchronously perform text detection on the video.
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
        client.annotateVideoAsync(request);

    System.out.println("Waiting for operation to complete...");
    // The first result is retrieved because a single video was processed.
    AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
    VideoAnnotationResults results = response.getAnnotationResults(0);

    // Get only the first annotation for demo purposes.
    TextAnnotation annotation = results.getTextAnnotations(0);
    System.out.println("Text: " + annotation.getText());

    // Get the first text segment.
    TextSegment textSegment = annotation.getSegments(0);
    System.out.println("Confidence: " + textSegment.getConfidence());
    // For the text segment, display its time offsets.
    VideoSegment videoSegment = textSegment.getSegment();
    Duration startTimeOffset = videoSegment.getStartTimeOffset();
    Duration endTimeOffset = videoSegment.getEndTimeOffset();
    // Display the offset times in seconds, 1e9 is part of the formula to convert nanos to seconds
    System.out.println(
        String.format(
            "Start time: %.2f", startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9));
    System.out.println(
        String.format(
            "End time: %.2f", endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));

    // Show the first result for the first frame in the segment.
    TextFrame textFrame = textSegment.getFrames(0);
    Duration timeOffset = textFrame.getTimeOffset();
    System.out.println(
        String.format(
            "Time offset for the first frame: %.2f",
            timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));

    // Display the rotated bounding box for where the text is on the frame.
    System.out.println("Rotated Bounding Box Vertices:");
    List<NormalizedVertex> vertices = textFrame.getRotatedBoundingBox().getVerticesList();
    for (NormalizedVertex normalizedVertex : vertices) {
      System.out.println(
          String.format(
              "\tVertex.x: %.2f, Vertex.y: %.2f",
              normalizedVertex.getX(), normalizedVertex.getY()));
    }
    return results;
  }
}

Node.js

To authenticate to Video Intelligence, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Imports the Google Cloud Video Intelligence library + Node's fs library
const Video = require('@google-cloud/video-intelligence');
const fs = require('fs');
const util = require('util');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();

/**
 * TODO(developer): Uncomment the following line before running the sample.
 */
// const path = 'Local file to analyze, e.g. ./my-file.mp4';

// Reads a local video file and converts it to base64
const file = await util.promisify(fs.readFile)(path);
const inputContent = file.toString('base64');

const request = {
  inputContent: inputContent,
  features: ['TEXT_DETECTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();

// Gets annotations for video
const textAnnotations = results[0].annotationResults[0].textAnnotations;
textAnnotations.forEach(textAnnotation => {
  console.log(`Text ${textAnnotation.text} occurs at:`);
  textAnnotation.segments.forEach(segment => {
    const time = segment.segment;
    if (time.startTimeOffset.seconds === undefined) {
      time.startTimeOffset.seconds = 0;
    }
    if (time.startTimeOffset.nanos === undefined) {
      time.startTimeOffset.nanos = 0;
    }
    if (time.endTimeOffset.seconds === undefined) {
      time.endTimeOffset.seconds = 0;
    }
    if (time.endTimeOffset.nanos === undefined) {
      time.endTimeOffset.nanos = 0;
    }
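    // Zero-pad the millisecond part so that 41 ms prints as ".041", not ".41".
    console.log(
      `\tStart: ${time.startTimeOffset.seconds || 0}` +
        `.${String(Math.floor(time.startTimeOffset.nanos / 1e6)).padStart(3, '0')}s`
    );
    console.log(
      `\tEnd: ${time.endTimeOffset.seconds || 0}.` +
        `${String(Math.floor(time.endTimeOffset.nanos / 1e6)).padStart(3, '0')}s`
    );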
    console.log(`\tConfidence: ${segment.confidence}`);
    segment.frames.forEach(frame => {
      const timeOffset = frame.timeOffset;
      console.log(
        `Time offset for the frame: ${timeOffset.seconds || 0}` +
          `.${String(Math.floor((timeOffset.nanos || 0) / 1e6)).padStart(3, '0')}s`
      );
      console.log('Rotated Bounding Box Vertices:');
      frame.rotatedBoundingBox.vertices.forEach(vertex => {
        console.log(`Vertex.x:${vertex.x}, Vertex.y:${vertex.y}`);
      });
    });
  });
});

Python

import io

from google.cloud import videointelligence

def video_detect_text(path):
    """Detect text in a local video."""
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [videointelligence.Feature.TEXT_DETECTION]
    video_context = videointelligence.VideoContext()

    with io.open(path, "rb") as file:
        input_content = file.read()

    operation = video_client.annotate_video(
        request={
            "features": features,
            "input_content": input_content,
            "video_context": video_context,
        }
    )

    print("\nProcessing video for text detection.")
    result = operation.result(timeout=300)

    # The first result is retrieved because a single video was processed.
    annotation_result = result.annotation_results[0]

    for text_annotation in annotation_result.text_annotations:
        print("\nText: {}".format(text_annotation.text))

        # Get the first text segment
        text_segment = text_annotation.segments[0]
        start_time = text_segment.segment.start_time_offset
        end_time = text_segment.segment.end_time_offset
        print(
            "start_time: {}, end_time: {}".format(
                start_time.seconds + start_time.microseconds * 1e-6,
                end_time.seconds + end_time.microseconds * 1e-6,
            )
        )

        print("Confidence: {}".format(text_segment.confidence))

        # Show the result for the first frame in this segment.
        frame = text_segment.frames[0]
        time_offset = frame.time_offset
        print(
            "Time offset for the first frame: {}".format(
                time_offset.seconds + time_offset.microseconds * 1e-6
            )
        )
        print("Rotated Bounding Box Vertices:")
        for vertex in frame.rotated_bounding_box.vertices:
            print("\tVertex.x: {}, Vertex.y: {}".format(vertex.x, vertex.y))
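
The vertex coordinates in the sample responses all lie between 0 and 1, and the Java sample reads them as NormalizedVertex values, which suggests they are fractions of the frame width and height. Under that assumption, the following sketch scales them to pixel coordinates. The helper name to_pixels and the 1280x720 frame size are hypothetical.

```python
def to_pixels(vertices, frame_width, frame_height):
    """Scale normalized (x, y) vertices to integer pixel coordinates."""
    return [
        (round(x * frame_width), round(y * frame_height))
        for x, y in vertices
    ]


# With the sample above, the input would be built as:
#   [(v.x, v.y) for v in frame.rotated_bounding_box.vertices]
corners = to_pixels([(0.5, 0.25), (1.0, 1.0)], 1280, 720)
print(corners)
```

The box is rotated, so its four corners should be drawn as a polygon rather than reduced to an axis-aligned rectangle.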

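Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the Video Intelligence reference documentation for Ruby.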