Text Detection performs Optical Character Recognition (OCR) to detect and extract text from an input video.
Text detection is available for all the languages supported by the Cloud Vision API.
Request text detection for a video on Google Cloud Storage
The following samples demonstrate text detection on a file located in Cloud Storage.
REST & CMD LINE
Send the video annotation request
The following shows how to send a POST request to the videos:annotate method. The example uses the access token for a service account set up for the project using the Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the Video Intelligence API Quickstart.
Before using any of the request data below, make the following replacements:
- input-uri: the Cloud Storage bucket, including the file name, that contains the file you want to annotate. Must start with gs://. For example: "inputUri": "gs://cloud-videointelligence-demo/assistant.mp4"
- language-code: [optional] for example, "en-US"
HTTP method and URL:
POST https://videointelligence.googleapis.com/v1/videos:annotate
Request JSON body:
{
  "inputUri": "input-uri",
  "features": ["TEXT_DETECTION"],
  "videoContext": {
    "textDetectionConfig": {
      "languageHints": ["language-code"]
    }
  }
}
To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://videointelligence.googleapis.com/v1/videos:annotate
PowerShell (Windows)
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/project-number/locations/location-id/operations/operation-id" }
If the response is successful, the Video Intelligence API returns the name of your operation. The example above shows such a response, where project-number is the number of your project and operation-id is the ID of the long-running operation created for the request.
- project-number: the number of your project
- location-id: the Cloud region where annotation should take place. Supported cloud regions are: us-east1, us-west1, europe-west1, asia-east1. If no region is specified, a region is determined based on the video file location (see the example request after this list for pinning a region).
- operation-id: the ID of the long-running operation created for the request, provided in the response when you started the operation, for example 12345...
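If you want annotation to run in a specific region instead of letting the service choose one, the request body also accepts an optional locationId field set to one of the regions listed above; for example (the region value is illustrative only):
{
  "inputUri": "input-uri",
  "features": ["TEXT_DETECTION"],
  "locationId": "us-east1"
}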
Get the annotation results
To retrieve the result of the operation, make a GET request using the operation name returned from the call to videos:annotate, as shown in the following example.
Before using any of the request data below, make the following replacements:
- operation-name: the name of the operation as returned by the Video Intelligence API. The operation name has the format projects/project-number/locations/location-id/operations/operation-id
HTTP method and URL:
GET https://videointelligence.googleapis.com/v1/operation-name
To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Execute the following command:
curl -X GET \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
https://videointelligence.googleapis.com/v1/operation-name
PowerShell (Windows)
Execute the following command:
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://videointelligence.googleapis.com/v1/operation-name" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
Response
"textAnnotations": [ { "text": "Hair Salon", "segments": [ { "segment": { "startTimeOffset": "0.833333s", "endTimeOffset": "2.291666s" }, "confidence": 0.99438506, "frames": [ { "rotatedBoundingBox": { "vertices": [ { "x": 0.7015625, "y": 0.59583336 }, { "x": 0.7984375, "y": 0.59583336 }, { "x": 0.7984375, "y": 0.64166665 }, { "x": 0.7015625, "y": 0.64166665 } ] }, "timeOffset": "0.833333s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6333333 }, { "x": 0.70234376, "y": 0.6333333 } ] }, "timeOffset": "1.041666s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6333333 }, { "x": 0.70234376, "y": 0.6333333 } ] }, "timeOffset": "1.250s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6319444 }, { "x": 0.70234376, "y": 0.6319444 } ] }, "timeOffset": "1.458333s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6333333 }, { "x": 0.70234376, "y": 0.6333333 } ] }, "timeOffset": "1.666666s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6333333 }, { "x": 0.70234376, "y": 0.6333333 } ] }, "timeOffset": "1.875s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6333333 }, { "x": 0.70234376, "y": 0.6333333 } ] }, "timeOffset": "2.083333s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6333333 }, { "x": 0.70234376, "y": 0.6333333 } ] }, "timeOffset": "2.291666s" } ] } ] }, { "text": "\"Sure, give me one second.\"", "segments": [ { "segment": { "startTimeOffset": "10.625s", "endTimeOffset": "13.333333s" }, "confidence": 0.98716676, "frames": [ { "rotatedBoundingBox": { "vertices": [ { "x": 0.60859376, "y": 0.59583336 }, { "x": 0.8952959, "y": 0.5903528 }, { "x": 0.89560676, "y": 0.6417387 }, { "x": 0.60890454, "y": 0.64721924 } ] }, "timeOffset": "10.625s" }, ... ] }
Text detection annotations are returned as a textAnnotations list. Note: The done field is returned only if its value is True; it is not included in responses for which the operation has not completed.
C#
public static object DetectTextGcs(string gcsUri)
{
var client = VideoIntelligenceServiceClient.Create();
var request = new AnnotateVideoRequest
{
InputUri = gcsUri,
Features = { Feature.TextDetection },
};
Console.WriteLine("\nProcessing video for text detection.");
var op = client.AnnotateVideo(request).PollUntilCompleted();
// Retrieve the first result because only one video was processed.
var annotationResults = op.Result.AnnotationResults[0];
// Get only the first result.
var textAnnotation = annotationResults.TextAnnotations[0];
Console.WriteLine($"\nText: {textAnnotation.Text}");
// Get the first text segment.
var textSegment = textAnnotation.Segments[0];
var startTime = textSegment.Segment.StartTimeOffset;
var endTime = textSegment.Segment.EndTimeOffset;
Console.Write(
$"Start time: {startTime.Seconds + startTime.Nanos / 1e9 }, ");
Console.WriteLine(
$"End time: {endTime.Seconds + endTime.Nanos / 1e9 }");
Console.WriteLine($"Confidence: {textSegment.Confidence}");
// Show the result for the first frame in this segment.
var frame = textSegment.Frames[0];
var timeOffset = frame.TimeOffset;
Console.Write("Time offset for the first frame: ");
Console.WriteLine(timeOffset.Seconds + timeOffset.Nanos / 1e9);
Console.WriteLine("Rotated Bounding Box Vertices:");
foreach (var vertex in frame.RotatedBoundingBox.Vertices)
{
Console.WriteLine(
$"\tVertex x: {vertex.X}, Vertex.y: {vertex.Y}");
}
return 0;
}
Go
import (
"context"
"fmt"
"io"
video "cloud.google.com/go/videointelligence/apiv1"
"github.com/golang/protobuf/ptypes"
videopb "google.golang.org/genproto/googleapis/cloud/videointelligence/v1"
)
// textDetectionGCS analyzes a video stored in Cloud Storage and extracts the text it detects on screen.
func textDetectionGCS(w io.Writer, gcsURI string) error {
// gcsURI := "gs://python-docs-samples-tests/video/googlework_short.mp4"
ctx := context.Background()
// Creates a client.
client, err := video.NewClient(ctx)
if err != nil {
return fmt.Errorf("video.NewClient: %v", err)
}
op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
InputUri: gcsURI,
Features: []videopb.Feature{
videopb.Feature_TEXT_DETECTION,
},
})
if err != nil {
return fmt.Errorf("AnnotateVideo: %v", err)
}
resp, err := op.Wait(ctx)
if err != nil {
return fmt.Errorf("Wait: %v", err)
}
// Only one video was processed, so get the first result.
result := resp.GetAnnotationResults()[0]
for _, annotation := range result.TextAnnotations {
fmt.Fprintf(w, "Text: %q\n", annotation.GetText())
// Get the first text segment.
segment := annotation.GetSegments()[0]
start, _ := ptypes.Duration(segment.GetSegment().GetStartTimeOffset())
end, _ := ptypes.Duration(segment.GetSegment().GetEndTimeOffset())
fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)
fmt.Fprintf(w, "\tConfidence: %f\n", segment.GetConfidence())
// Show the result for the first frame in this segment.
frame := segment.GetFrames()[0]
seconds := float32(frame.GetTimeOffset().GetSeconds())
nanos := float32(frame.GetTimeOffset().GetNanos())
fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)
fmt.Fprintf(w, "\tRotated bounding box vertices:\n")
for _, vertex := range frame.GetRotatedBoundingBox().GetVertices() {
fmt.Fprintf(w, "\t\tVertex x=%f, y=%f\n", vertex.GetX(), vertex.GetY())
}
}
return nil
}
Java
/**
* Detect Text in a video.
*
* @param gcsUri the path to the video file to analyze.
*/
public static VideoAnnotationResults detectTextGcs(String gcsUri) throws Exception {
try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
// Create the request
AnnotateVideoRequest request =
AnnotateVideoRequest.newBuilder()
.setInputUri(gcsUri)
.addFeatures(Feature.TEXT_DETECTION)
.build();
// asynchronously perform text detection on the video
OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
client.annotateVideoAsync(request);
System.out.println("Waiting for operation to complete...");
// The first result is retrieved because a single video was processed.
AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
VideoAnnotationResults results = response.getAnnotationResults(0);
// Get only the first annotation for demo purposes.
TextAnnotation annotation = results.getTextAnnotations(0);
System.out.println("Text: " + annotation.getText());
// Get the first text segment.
TextSegment textSegment = annotation.getSegments(0);
System.out.println("Confidence: " + textSegment.getConfidence());
// For the text segment, display its time offset
VideoSegment videoSegment = textSegment.getSegment();
Duration startTimeOffset = videoSegment.getStartTimeOffset();
Duration endTimeOffset = videoSegment.getEndTimeOffset();
// Display the offset times in seconds, 1e9 is part of the formula to convert nanos to seconds
System.out.println(
String.format(
"Start time: %.2f", startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9));
System.out.println(
String.format(
"End time: %.2f", endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));
// Show the first result for the first frame in the segment.
TextFrame textFrame = textSegment.getFrames(0);
Duration timeOffset = textFrame.getTimeOffset();
System.out.println(
String.format(
"Time offset for the first frame: %.2f",
timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));
// Display the rotated bounding box for where the text is on the frame.
System.out.println("Rotated Bounding Box Vertices:");
List<NormalizedVertex> vertices = textFrame.getRotatedBoundingBox().getVerticesList();
for (NormalizedVertex normalizedVertex : vertices) {
System.out.println(
String.format(
"\tVertex.x: %.2f, Vertex.y: %.2f",
normalizedVertex.getX(), normalizedVertex.getY()));
}
return results;
}
}
Node.js
// Imports the Google Cloud Video Intelligence library
const Video = require('@google-cloud/video-intelligence');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();
/**
* TODO(developer): Uncomment the following line before running the sample.
*/
// const gcsUri = 'GCS URI of the video to analyze, e.g. gs://my-bucket/my-video.mp4';
const request = {
inputUri: gcsUri,
features: ['TEXT_DETECTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();
// Gets annotations for video
const textAnnotations = results[0].annotationResults[0].textAnnotations;
textAnnotations.forEach(textAnnotation => {
console.log(`Text ${textAnnotation.text} occurs at:`);
textAnnotation.segments.forEach(segment => {
const time = segment.segment;
console.log(
` Start: ${time.startTimeOffset.seconds || 0}.${(
time.startTimeOffset.nanos / 1e6
).toFixed(0)}s`
);
console.log(
` End: ${time.endTimeOffset.seconds || 0}.${(
time.endTimeOffset.nanos / 1e6
).toFixed(0)}s`
);
console.log(` Confidence: ${segment.confidence}`);
segment.frames.forEach(frame => {
const timeOffset = frame.timeOffset;
console.log(
`Time offset for the frame: ${timeOffset.seconds || 0}` +
`.${(timeOffset.nanos / 1e6).toFixed(0)}s`
);
console.log('Rotated Bounding Box Vertices:');
frame.rotatedBoundingBox.vertices.forEach(vertex => {
console.log(`Vertex.x:${vertex.x}, Vertex.y:${vertex.y}`);
});
});
});
});
PHP
use Google\Cloud\VideoIntelligence\V1\VideoIntelligenceServiceClient;
use Google\Cloud\VideoIntelligence\V1\Feature;
/** Uncomment and populate these variables in your code */
// $uri = 'The cloud storage object to analyze (gs://your-bucket-name/your-object-name)';
// $options = [];
# Instantiate a client.
$video = new VideoIntelligenceServiceClient();
# Execute a request.
$operation = $video->annotateVideo([
'inputUri' => $uri,
'features' => [Feature::TEXT_DETECTION]
]);
# Wait for the request to complete.
$operation->pollUntilComplete($options);
# Print the results.
if ($operation->operationSucceeded()) {
$results = $operation->getResult()->getAnnotationResults()[0];
# Process video/segment level text annotations
foreach ($results->getTextAnnotations() as $text) {
printf('Video text description: %s' . PHP_EOL, $text->getText());
foreach ($text->getSegments() as $segment) {
$start = $segment->getSegment()->getStartTimeOffset();
$end = $segment->getSegment()->getEndTimeOffset();
printf(' Segment: %ss to %ss' . PHP_EOL,
$start->getSeconds() + $start->getNanos()/1000000000.0,
$end->getSeconds() + $end->getNanos()/1000000000.0);
printf(' Confidence: %f' . PHP_EOL, $segment->getConfidence());
}
}
print(PHP_EOL);
} else {
print_r($operation->getError());
}
Python
"""Detect text in a video stored on GCS."""
from google.cloud import videointelligence

# input_uri = "gs://your-bucket/your-video.mp4"  # TODO(developer): set before running
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.TEXT_DETECTION]
operation = video_client.annotate_video(
request={"features": features, "input_uri": input_uri}
)
print("\nProcessing video for text detection.")
result = operation.result(timeout=600)
# The first result is retrieved because a single video was processed.
annotation_result = result.annotation_results[0]
for text_annotation in annotation_result.text_annotations:
print("\nText: {}".format(text_annotation.text))
# Get the first text segment
text_segment = text_annotation.segments[0]
start_time = text_segment.segment.start_time_offset
end_time = text_segment.segment.end_time_offset
print(
"start_time: {}, end_time: {}".format(
start_time.seconds + start_time.microseconds * 1e-6,
end_time.seconds + end_time.microseconds * 1e-6,
)
)
print("Confidence: {}".format(text_segment.confidence))
# Show the result for the first frame in this segment.
frame = text_segment.frames[0]
time_offset = frame.time_offset
print(
"Time offset for the first frame: {}".format(
time_offset.seconds + time_offset.microseconds * 1e-6
)
)
print("Rotated Bounding Box Vertices:")
for vertex in frame.rotated_bounding_box.vertices:
print("\tVertex.x: {}, Vertex.y: {}".format(vertex.x, vertex.y))
Ruby
# path = "Path to a video file on Google Cloud Storage: gs://bucket/video.mp4"
require "google/cloud/video_intelligence"
video = Google::Cloud::VideoIntelligence.video_intelligence_service
# Start the asynchronous video annotation request
operation = video.annotate_video features: [:TEXT_DETECTION], input_uri: path
puts "Processing video for text detection:"
operation.wait_until_done!
raise operation.results.message? if operation.error?
puts "Finished Processing."
text_annotations = operation.results.annotation_results.first.text_annotations
print_text_annotations text_annotations
Request text detection for video from a local file
The following samples demonstrate text detection on a file stored locally.
REST & CMD LINE
Send the video annotation request
To perform annotation on a local video file, base64-encode the contents of the video file and include the base64-encoded content in the inputContent field of the request. For information on how to base64-encode the contents of a video file, see Base64 Encoding.
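As a concrete illustration of that step, the following Python sketch (not part of the official samples) reads a local file, base64-encodes it, and writes a request.json matching the request body shown below; the file name is a placeholder.
# Minimal sketch: base64-encode a local video and build the videos:annotate
# request body that uses inputContent instead of inputUri.
import base64
import json

with open("my-video.mp4", "rb") as f:  # placeholder local file
    encoded = base64.b64encode(f.read()).decode("utf-8")

request_body = {
    "inputContent": encoded,
    "features": ["TEXT_DETECTION"],
    "videoContext": {"textDetectionConfig": {"languageHints": ["en-US"]}},
}

with open("request.json", "w") as f:
    json.dump(request_body, f)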
The following shows how to send a POST request to the videos:annotate method. The example uses the access token for a service account set up for the project using the Cloud SDK. For instructions on installing the Cloud SDK, setting up a project with a service account, and obtaining an access token, see the Video Intelligence API Quickstart.
Before using any of the request data below, make the following replacements:
- "inputContent": base-64-encoded-content
For example: "UklGRg41AwBBVkkgTElTVAwBAABoZHJsYXZpaDgAAAA1ggAAxPMBAAAAAAAQCAA..."
- language-code: [optional] for example, "en-US"
HTTP method and URL:
POST https://videointelligence.googleapis.com/v1/videos:annotate
Request JSON body:
{
  "inputContent": "base-64-encoded-content",
  "features": ["TEXT_DETECTION"],
  "videoContext": {
    "textDetectionConfig": {
      "languageHints": ["language-code"]
    }
  }
}
To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://videointelligence.googleapis.com/v1/videos:annotate
PowerShell (Windows)
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://videointelligence.googleapis.com/v1/videos:annotate" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/project-number/locations/location-id/operations/operation-id" }
If the response is successful, the Video Intelligence API returns the name of your operation. The example above shows such a response, where project-number is the number of your project and operation-id is the ID of the long-running operation created for the request.
- operation-id: the ID of the long-running operation created for the request, provided in the response when you started the operation, for example 12345...
Get the annotation results
To retrieve the result of the operation, make a GET request using the operation name returned from the call to videos:annotate, as shown in the following example.
HTTP method and URL:
GET https://videointelligence.googleapis.com/v1/operation-name
To send your request, expand one of these options:
curl (Linux, macOS, or Cloud Shell)
Execute the following command:
curl -X GET \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
https://videointelligence.googleapis.com/v1/operation-name
PowerShell (Windows)
Execute the following command:
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://videointelligence.googleapis.com/v1/operation-name" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
Response
"textAnnotations": [ { "text": "Hair Salon", "segments": [ { "segment": { "startTimeOffset": "0.833333s", "endTimeOffset": "2.291666s" }, "confidence": 0.99438506, "frames": [ { "rotatedBoundingBox": { "vertices": [ { "x": 0.7015625, "y": 0.59583336 }, { "x": 0.7984375, "y": 0.59583336 }, { "x": 0.7984375, "y": 0.64166665 }, { "x": 0.7015625, "y": 0.64166665 } ] }, "timeOffset": "0.833333s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6333333 }, { "x": 0.70234376, "y": 0.6333333 } ] }, "timeOffset": "1.041666s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6333333 }, { "x": 0.70234376, "y": 0.6333333 } ] }, "timeOffset": "1.250s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6319444 }, { "x": 0.70234376, "y": 0.6319444 } ] }, "timeOffset": "1.458333s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6333333 }, { "x": 0.70234376, "y": 0.6333333 } ] }, "timeOffset": "1.666666s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6333333 }, { "x": 0.70234376, "y": 0.6333333 } ] }, "timeOffset": "1.875s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6333333 }, { "x": 0.70234376, "y": 0.6333333 } ] }, "timeOffset": "2.083333s" }, { "rotatedBoundingBox": { "vertices": [ { "x": 0.70234376, "y": 0.6 }, { "x": 0.7992188, "y": 0.6 }, { "x": 0.7992188, "y": 0.6333333 }, { "x": 0.70234376, "y": 0.6333333 } ] }, "timeOffset": "2.291666s" } ] } ] }, { "text": "\"Sure, give me one second.\"", "segments": [ { "segment": { "startTimeOffset": "10.625s", "endTimeOffset": "13.333333s" }, "confidence": 0.98716676, "frames": [ { "rotatedBoundingBox": { "vertices": [ { "x": 0.60859376, "y": 0.59583336 }, { "x": 0.8952959, "y": 0.5903528 }, { "x": 0.89560676, "y": 0.6417387 }, { "x": 0.60890454, "y": 0.64721924 } ] }, "timeOffset": "10.625s" }, ... ] }
Text detection annotations are returned as a textAnnotations list. Note: The done field is returned only if its value is True; it is not included in responses for which the operation has not completed.
C#
public static object DetectText(string filePath)
{
var client = VideoIntelligenceServiceClient.Create();
var request = new AnnotateVideoRequest
{
InputContent = Google.Protobuf.ByteString.CopyFrom(File.ReadAllBytes(filePath)),
Features = { Feature.TextDetection },
};
Console.WriteLine("\nProcessing video for text detection.");
var op = client.AnnotateVideo(request).PollUntilCompleted();
// Retrieve the first result because only one video was processed.
var annotationResults = op.Result.AnnotationResults[0];
// Get only the first result.
var textAnnotation = annotationResults.TextAnnotations[0];
Console.WriteLine($"\nText: {textAnnotation.Text}");
// Get the first text segment.
var textSegment = textAnnotation.Segments[0];
var startTime = textSegment.Segment.StartTimeOffset;
var endTime = textSegment.Segment.EndTimeOffset;
Console.Write(
$"Start time: {startTime.Seconds + startTime.Nanos / 1e9 }, ");
Console.WriteLine(
$"End time: {endTime.Seconds + endTime.Nanos / 1e9 }");
Console.WriteLine($"Confidence: {textSegment.Confidence}");
// Show the result for the first frame in this segment.
var frame = textSegment.Frames[0];
var timeOffset = frame.TimeOffset;
Console.Write("Time offset for the first frame: ");
Console.WriteLine(timeOffset.Seconds + timeOffset.Nanos / 1e9);
Console.WriteLine("Rotated Bounding Box Vertices:");
foreach (var vertex in frame.RotatedBoundingBox.Vertices)
{
Console.WriteLine(
$"\tVertex x: {vertex.X}, Vertex.y: {vertex.Y}");
}
return 0;
}
Go
import (
"context"
"fmt"
"io"
"io/ioutil"
video "cloud.google.com/go/videointelligence/apiv1"
"github.com/golang/protobuf/ptypes"
videopb "google.golang.org/genproto/googleapis/cloud/videointelligence/v1"
)
// textDetection analyzes a local video file and extracts the text it detects on screen.
func textDetection(w io.Writer, filename string) error {
// filename := "../testdata/googlework_short.mp4"
ctx := context.Background()
// Creates a client.
client, err := video.NewClient(ctx)
if err != nil {
return fmt.Errorf("video.NewClient: %v", err)
}
fileBytes, err := ioutil.ReadFile(filename)
if err != nil {
return fmt.Errorf("ioutil.ReadFile: %v", err)
}
op, err := client.AnnotateVideo(ctx, &videopb.AnnotateVideoRequest{
InputContent: fileBytes,
Features: []videopb.Feature{
videopb.Feature_TEXT_DETECTION,
},
})
if err != nil {
return fmt.Errorf("AnnotateVideo: %v", err)
}
resp, err := op.Wait(ctx)
if err != nil {
return fmt.Errorf("Wait: %v", err)
}
// Only one video was processed, so get the first result.
result := resp.GetAnnotationResults()[0]
for _, annotation := range result.TextAnnotations {
fmt.Fprintf(w, "Text: %q\n", annotation.GetText())
// Get the first text segment.
segment := annotation.GetSegments()[0]
start, _ := ptypes.Duration(segment.GetSegment().GetStartTimeOffset())
end, _ := ptypes.Duration(segment.GetSegment().GetEndTimeOffset())
fmt.Fprintf(w, "\tSegment: %v to %v\n", start, end)
fmt.Fprintf(w, "\tConfidence: %f\n", segment.GetConfidence())
// Show the result for the first frame in this segment.
frame := segment.GetFrames()[0]
seconds := float32(frame.GetTimeOffset().GetSeconds())
nanos := float32(frame.GetTimeOffset().GetNanos())
fmt.Fprintf(w, "\tTime offset of the first frame: %fs\n", seconds+nanos/1e9)
fmt.Fprintf(w, "\tRotated bounding box vertices:\n")
for _, vertex := range frame.GetRotatedBoundingBox().GetVertices() {
fmt.Fprintf(w, "\t\tVertex x=%f, y=%f\n", vertex.GetX(), vertex.GetY())
}
}
return nil
}
Java
/**
* Detect text in a video.
*
* @param filePath the path to the video file to analyze.
*/
public static VideoAnnotationResults detectText(String filePath) throws Exception {
try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
// Read file
Path path = Paths.get(filePath);
byte[] data = Files.readAllBytes(path);
// Create the request
AnnotateVideoRequest request =
AnnotateVideoRequest.newBuilder()
.setInputContent(ByteString.copyFrom(data))
.addFeatures(Feature.TEXT_DETECTION)
.build();
// asynchronously perform text detection on the video
OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> future =
client.annotateVideoAsync(request);
System.out.println("Waiting for operation to complete...");
// The first result is retrieved because a single video was processed.
AnnotateVideoResponse response = future.get(300, TimeUnit.SECONDS);
VideoAnnotationResults results = response.getAnnotationResults(0);
// Get only the first annotation for demo purposes.
TextAnnotation annotation = results.getTextAnnotations(0);
System.out.println("Text: " + annotation.getText());
// Get the first text segment.
TextSegment textSegment = annotation.getSegments(0);
System.out.println("Confidence: " + textSegment.getConfidence());
// For the text segment, display its time offset
VideoSegment videoSegment = textSegment.getSegment();
Duration startTimeOffset = videoSegment.getStartTimeOffset();
Duration endTimeOffset = videoSegment.getEndTimeOffset();
// Display the offset times in seconds, 1e9 is part of the formula to convert nanos to seconds
System.out.println(
String.format(
"Start time: %.2f", startTimeOffset.getSeconds() + startTimeOffset.getNanos() / 1e9));
System.out.println(
String.format(
"End time: %.2f", endTimeOffset.getSeconds() + endTimeOffset.getNanos() / 1e9));
// Show the first result for the first frame in the segment.
TextFrame textFrame = textSegment.getFrames(0);
Duration timeOffset = textFrame.getTimeOffset();
System.out.println(
String.format(
"Time offset for the first frame: %.2f",
timeOffset.getSeconds() + timeOffset.getNanos() / 1e9));
// Display the rotated bounding box for where the text is on the frame.
System.out.println("Rotated Bounding Box Vertices:");
List<NormalizedVertex> vertices = textFrame.getRotatedBoundingBox().getVerticesList();
for (NormalizedVertex normalizedVertex : vertices) {
System.out.println(
String.format(
"\tVertex.x: %.2f, Vertex.y: %.2f",
normalizedVertex.getX(), normalizedVertex.getY()));
}
return results;
}
}
Node.js
// Imports the Google Cloud Video Intelligence library + Node's fs and util libraries
const Video = require('@google-cloud/video-intelligence');
const fs = require('fs');
const util = require('util');
// Creates a client
const video = new Video.VideoIntelligenceServiceClient();
/**
* TODO(developer): Uncomment the following line before running the sample.
*/
// const path = 'Local file to analyze, e.g. ./my-file.mp4';
// Reads a local video file and converts it to base64
const file = await util.promisify(fs.readFile)(path);
const inputContent = file.toString('base64');
const request = {
inputContent: inputContent,
features: ['TEXT_DETECTION'],
};
// Detects text in a video
const [operation] = await video.annotateVideo(request);
console.log('Waiting for operation to complete...');
const results = await operation.promise();
// Gets annotations for video
const textAnnotations = results[0].annotationResults[0].textAnnotations;
textAnnotations.forEach(textAnnotation => {
console.log(`Text ${textAnnotation.text} occurs at:`);
textAnnotation.segments.forEach(segment => {
const time = segment.segment;
if (time.startTimeOffset.seconds === undefined) {
time.startTimeOffset.seconds = 0;
}
if (time.startTimeOffset.nanos === undefined) {
time.startTimeOffset.nanos = 0;
}
if (time.endTimeOffset.seconds === undefined) {
time.endTimeOffset.seconds = 0;
}
if (time.endTimeOffset.nanos === undefined) {
time.endTimeOffset.nanos = 0;
}
console.log(
`\tStart: ${time.startTimeOffset.seconds || 0}` +
`.${(time.startTimeOffset.nanos / 1e6).toFixed(0)}s`
);
console.log(
`\tEnd: ${time.endTimeOffset.seconds || 0}.` +
`${(time.endTimeOffset.nanos / 1e6).toFixed(0)}s`
);
console.log(`\tConfidence: ${segment.confidence}`);
segment.frames.forEach(frame => {
const timeOffset = frame.timeOffset;
console.log(
`Time offset for the frame: ${timeOffset.seconds || 0}` +
`.${(timeOffset.nanos / 1e6).toFixed(0)}s`
);
console.log('Rotated Bounding Box Vertices:');
frame.rotatedBoundingBox.vertices.forEach(vertex => {
console.log(`Vertex.x:${vertex.x}, Vertex.y:${vertex.y}`);
});
});
});
});
PHP
use Google\Cloud\VideoIntelligence\V1\VideoIntelligenceServiceClient;
use Google\Cloud\VideoIntelligence\V1\Feature;
/** Uncomment and populate these variables in your code */
// $path = 'File path to a video file to analyze';
// $options = [];
# Instantiate a client.
$video = new VideoIntelligenceServiceClient();
# Read the local video file
$inputContent = file_get_contents($path);
# Execute a request.
$operation = $video->annotateVideo([
'inputContent' => $inputContent,
'features' => [Feature::TEXT_DETECTION]
]);
# Wait for the request to complete.
$operation->pollUntilComplete($options);
# Print the results.
if ($operation->operationSucceeded()) {
$results = $operation->getResult()->getAnnotationResults()[0];
# Process video/segment level text annotations
foreach ($results->getTextAnnotations() as $text) {
printf('Video text description: %s' . PHP_EOL, $text->getText());
foreach ($text->getSegments() as $segment) {
$start = $segment->getSegment()->getStartTimeOffset();
$end = $segment->getSegment()->getEndTimeOffset();
printf(' Segment: %ss to %ss' . PHP_EOL,
$start->getSeconds() + $start->getNanos()/1000000000.0,
$end->getSeconds() + $end->getNanos()/1000000000.0);
printf(' Confidence: %f' . PHP_EOL, $segment->getConfidence());
}
}
print(PHP_EOL);
} else {
print_r($operation->getError());
}
Python
"""Detect text in a local video."""
import io

from google.cloud import videointelligence

# path = "path/to/your/video.mp4"  # TODO(developer): set before running
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.TEXT_DETECTION]
video_context = videointelligence.VideoContext()
with io.open(path, "rb") as file:
input_content = file.read()
operation = video_client.annotate_video(
request={
"features": features,
"input_content": input_content,
"video_context": video_context,
}
)
print("\nProcessing video for text detection.")
result = operation.result(timeout=300)
# The first result is retrieved because a single video was processed.
annotation_result = result.annotation_results[0]
for text_annotation in annotation_result.text_annotations:
print("\nText: {}".format(text_annotation.text))
# Get the first text segment
text_segment = text_annotation.segments[0]
start_time = text_segment.segment.start_time_offset
end_time = text_segment.segment.end_time_offset
print(
"start_time: {}, end_time: {}".format(
start_time.seconds + start_time.microseconds * 1e-6,
end_time.seconds + end_time.microseconds * 1e-6,
)
)
print("Confidence: {}".format(text_segment.confidence))
# Show the result for the first frame in this segment.
frame = text_segment.frames[0]
time_offset = frame.time_offset
print(
"Time offset for the first frame: {}".format(
time_offset.seconds + time_offset.microseconds * 1e-6
)
)
print("Rotated Bounding Box Vertices:")
for vertex in frame.rotated_bounding_box.vertices:
print("\tVertex.x: {}, Vertex.y: {}".format(vertex.x, vertex.y))
Ruby
# "Path to a local video file: path/to/file.mp4"
require "google/cloud/video_intelligence"
video = Google::Cloud::VideoIntelligence.video_intelligence_service
video_contents = File.binread path
# Start the asynchronous video annotation request
operation = video.annotate_video features: [:TEXT_DETECTION], input_content: video_contents
puts "Processing video for text detection:"
operation.wait_until_done!
raise operation.results.message? if operation.error?
puts "Finished Processing."
text_annotations = operation.results.annotation_results.first.text_annotations
print_text_annotations text_annotations