- JSON representation
- VideoAnnotationResults
- LabelAnnotation
- Entity
- LabelSegment
- VideoSegment
- LabelFrame
- ExplicitContentAnnotation
- ExplicitContentFrame
- SpeechTranscription
- SpeechRecognitionAlternative
- WordInfo
Video annotation response. Included in the `response` field of the `Operation` returned by the `operations.get` call of the `google::longrunning::Operations` service.
JSON representation

    {
      "annotationResults": [
        {
          object(VideoAnnotationResults)
        }
      ]
    }

| Fields | |
|---|---|
| `annotationResults[]` | Annotation results for all videos specified in `AnnotateVideoRequest`. |
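For orientation, here is a minimal sketch (not an official client sample) of reading the annotation results out of an already-decoded `Operation` JSON object returned by `operations.get`. The operation name and result values below are placeholders, and the sketch assumes the operation has completed.

```python
# Sketch: reading annotation results from a completed long-running Operation.
# `operation` stands in for the decoded JSON of an operations.get response;
# all values below are illustrative placeholders, not real API output.
operation = {
    "name": "projects/example/locations/us-east1/operations/123",  # placeholder
    "done": True,
    "response": {
        "annotationResults": [
            {"inputUri": "/example-bucket/example.mp4"}  # placeholder result
        ]
    },
}

if not operation.get("done"):
    print("Operation still running")
elif "error" in operation:
    # An Operation-level error applies to the request as a whole.
    print("Operation failed:", operation["error"])
else:
    # The video annotation response lives in the Operation's `response` field.
    results = operation["response"].get("annotationResults", [])
    print(f"Received results for {len(results)} video(s)")
```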
VideoAnnotationResults
Annotation results for a single video.
JSON representation

    {
      "inputUri": string,
      "segmentLabelAnnotations": [
        {
          object(LabelAnnotation)
        }
      ],
      "shotLabelAnnotations": [
        {
          object(LabelAnnotation)
        }
      ],
      "frameLabelAnnotations": [
        {
          object(LabelAnnotation)
        }
      ],
      "shotAnnotations": [
        {
          object(VideoSegment)
        }
      ],
      "explicitAnnotation": {
        object(ExplicitContentAnnotation)
      },
      "speechTranscriptions": [
        {
          object(SpeechTranscription)
        }
      ],
      "error": {
        object(Status)
      }
    }

| Fields | |
|---|---|
| `inputUri` | Video file location in Google Cloud Storage. |
| `segmentLabelAnnotations[]` | Label annotations on video level or user-specified segment level. There is exactly one element for each unique label. |
| `shotLabelAnnotations[]` | Label annotations on shot level. There is exactly one element for each unique label. |
| `frameLabelAnnotations[]` | Label annotations on frame level. There is exactly one element for each unique label. |
| `shotAnnotations[]` | Shot annotations. Each shot is represented as a video segment. |
| `explicitAnnotation` | Explicit content annotation. |
| `speechTranscriptions[]` | Speech transcriptions. |
| `error` | If set, indicates an error. Note that for a single `AnnotateVideoRequest` some videos may succeed and some may fail. |
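As a sketch of how these fields fit together, the loop below walks a decoded `annotationResults` list and reads the per-video fields described above. `results` is assumed to be the list extracted from the operation response; nothing here is an official sample.

```python
# Sketch: iterating per-video results, assuming `results` is the decoded
# annotationResults list from the operation response.
def summarize_results(results):
    for video in results:
        uri = video.get("inputUri", "<unknown input>")
        # A per-video error means this particular video failed, even if
        # other videos in the same request succeeded.
        if "error" in video:
            print(f"{uri}: failed with {video['error']}")
            continue
        print(f"{uri}:")
        print("  segment labels:", len(video.get("segmentLabelAnnotations", [])))
        print("  shot labels:   ", len(video.get("shotLabelAnnotations", [])))
        print("  frame labels:  ", len(video.get("frameLabelAnnotations", [])))
        print("  shots:         ", len(video.get("shotAnnotations", [])))
        print("  transcriptions:", len(video.get("speechTranscriptions", [])))
        if "explicitAnnotation" in video:
            frames = video["explicitAnnotation"].get("frames", [])
            print("  explicit frames:", len(frames))
```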
LabelAnnotation
Label annotation.
JSON representation

    {
      "entity": {
        object(Entity)
      },
      "categoryEntities": [
        {
          object(Entity)
        }
      ],
      "segments": [
        {
          object(LabelSegment)
        }
      ],
      "frames": [
        {
          object(LabelFrame)
        }
      ]
    }

| Fields | |
|---|---|
| `entity` | Detected entity. |
| `categoryEntities[]` | Common categories for the detected entity. For example, when the label is `Terrier`, the category is likely `dog`. In some cases there might be more than one category, e.g. `Terrier` could also be a `pet`. |
| `segments[]` | All video segments where a label was detected. |
| `frames[]` | All video frames where a label was detected. |
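The sketch below shows one plausible way to print a `LabelAnnotation`, using only the fields listed above. The `label` argument is assumed to be a single element of `segmentLabelAnnotations` (or the shot or frame variants).

```python
# Sketch: printing one LabelAnnotation dict (e.g. an element of
# segmentLabelAnnotations). Field names follow the JSON representation above.
def print_label(label):
    entity = label.get("entity", {})
    print("label:", entity.get("description"))
    categories = [c.get("description") for c in label.get("categoryEntities", [])]
    if categories:
        print("  categories:", ", ".join(categories))
    for segment in label.get("segments", []):
        video_segment = segment.get("segment", {})
        print("  segment",
              video_segment.get("startTimeOffset"), "-",
              video_segment.get("endTimeOffset"),
              "confidence", segment.get("confidence"))
    for frame in label.get("frames", []):
        print("  frame at", frame.get("timeOffset"),
              "confidence", frame.get("confidence"))
```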
Entity
Detected entity from video analysis.
JSON representation

    { "entityId": string, "description": string, "languageCode": string }

| Fields | |
|---|---|
| `entityId` | Opaque entity ID. Some IDs may be available in the Google Knowledge Graph Search API. |
| `description` | Textual description, e.g. `Fixed-gear bicycle`. |
| `languageCode` | Language code for `description` in BCP-47 format. |
LabelSegment
Video segment level annotation results for label detection.
JSON representation

    {
      "segment": {
        object(VideoSegment)
      },
      "confidence": number
    }

| Fields | |
|---|---|
| `segment` | Video segment where a label was detected. |
| `confidence` | Confidence that the label is accurate. Range: [0, 1]. |
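Since `confidence` is a number in [0, 1], a common pattern is to keep only segments above some threshold. The helper below is an illustrative sketch; the 0.7 cut-off is arbitrary, not an API recommendation.

```python
# Sketch: keep only label segments whose confidence clears a threshold.
# The 0.7 default is an arbitrary example.
def confident_segments(label, threshold=0.7):
    return [
        seg for seg in label.get("segments", [])
        if seg.get("confidence", 0.0) >= threshold
    ]
```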
VideoSegment
Video segment.
JSON representation

    { "startTimeOffset": string, "endTimeOffset": string }

| Fields | |
|---|---|
| `startTimeOffset` | Time-offset, relative to the beginning of the video, corresponding to the start of the segment (inclusive). A duration in seconds with up to nine fractional digits, terminated by 's'. Example: `"3.5s"`. |
| `endTimeOffset` | Time-offset, relative to the beginning of the video, corresponding to the end of the segment (inclusive). A duration in seconds with up to nine fractional digits, terminated by 's'. Example: `"3.5s"`. |
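Because the offsets are encoded as strings such as `"3.5s"`, code usually converts them to numbers before doing arithmetic. The helper below is a small sketch of that conversion; it assumes the documented `<seconds>s` format and does not handle malformed input.

```python
# Sketch: convert a duration string like "3.5s" into float seconds.
# Assumes the documented format: decimal seconds terminated by 's'.
def duration_to_seconds(duration: str) -> float:
    return float(duration.rstrip("s"))

def segment_length_seconds(segment: dict) -> float:
    # `segment` is a VideoSegment dict with startTimeOffset/endTimeOffset.
    return (duration_to_seconds(segment["endTimeOffset"])
            - duration_to_seconds(segment["startTimeOffset"]))

# Illustrative values only.
print(segment_length_seconds({"startTimeOffset": "1.0s", "endTimeOffset": "3.5s"}))  # 2.5
```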
LabelFrame
Video frame level annotation results for label detection.
JSON representation

    { "timeOffset": string, "confidence": number }

| Fields | |
|---|---|
| `timeOffset` | Time-offset, relative to the beginning of the video, corresponding to the video frame for this location. A duration in seconds with up to nine fractional digits, terminated by 's'. Example: `"3.5s"`. |
| `confidence` | Confidence that the label is accurate. Range: [0, 1]. |
ExplicitContentAnnotation
Explicit content annotation (based on per-frame visual signals only). If no explicit content has been detected in a frame, no annotations are present for that frame.
JSON representation

    {
      "frames": [
        {
          object(ExplicitContentFrame)
        }
      ]
    }

| Fields | |
|---|---|
| `frames[]` | All video frames where explicit content was detected. |
ExplicitContentFrame
Video frame level annotation results for explicit content.
JSON representation

    {
      "timeOffset": string,
      "pornographyLikelihood": enum(Likelihood)
    }

| Fields | |
|---|---|
| `timeOffset` | Time-offset, relative to the beginning of the video, corresponding to the video frame for this location. A duration in seconds with up to nine fractional digits, terminated by 's'. Example: `"3.5s"`. |
| `pornographyLikelihood` | Likelihood of the pornography content. |
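Assuming `pornographyLikelihood` uses the standard Cloud `Likelihood` enum names (from `VERY_UNLIKELY` through `VERY_LIKELY`), a typical consumer filters frames by likelihood. The snippet below is a sketch under that assumption, and the chosen cut-off is purely illustrative.

```python
# Sketch: collect time offsets of frames flagged as likely explicit content.
# Assumes pornographyLikelihood uses the standard Likelihood enum names.
FLAGGED = {"LIKELY", "VERY_LIKELY"}  # illustrative threshold

def flagged_frame_offsets(explicit_annotation: dict):
    return [
        frame["timeOffset"]
        for frame in explicit_annotation.get("frames", [])
        if frame.get("pornographyLikelihood") in FLAGGED
    ]
```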
SpeechTranscription
A speech recognition result corresponding to a portion of the audio.
JSON representation

    {
      "alternatives": [
        {
          object(SpeechRecognitionAlternative)
        }
      ]
    }

| Fields | |
|---|---|
| `alternatives[]` | Output only. May contain one or more recognition hypotheses (up to the maximum specified in `maxAlternatives`). These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable. |
SpeechRecognitionAlternative
Alternative hypotheses (a.k.a. n-best list).
JSON representation

    {
      "transcript": string,
      "confidence": number,
      "words": [
        {
          object(WordInfo)
        }
      ]
    }

| Fields | |
|---|---|
| `transcript` | Output only. Transcript text representing the words that the user spoke. |
| `confidence` | Output only. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is typically provided only for the top hypothesis, and only for `is_final=true` results. |
| `words[]` | Output only. A list of word-specific information for each recognized word. |
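Since alternatives are ordered with the most probable first, consumers usually read the first element. The sketch below prints the top transcript of each `SpeechTranscription`, using only the fields documented here.

```python
# Sketch: print the top (first) alternative of each speech transcription.
# `transcriptions` is assumed to be the speechTranscriptions list of one video.
def print_top_transcripts(transcriptions):
    for transcription in transcriptions:
        alternatives = transcription.get("alternatives", [])
        if not alternatives:
            continue
        top = alternatives[0]  # alternatives are ordered most probable first
        print(top.get("transcript", ""),
              "(confidence:", top.get("confidence", 0.0), ")")
```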
WordInfo
Word-specific information for recognized words. Word information is only included in the response when certain request parameters are set, such as `enable_word_time_offsets`.
JSON representation

    { "startTime": string, "endTime": string, "word": string }

| Fields | |
|---|---|
| `startTime` | Output only. Time offset relative to the beginning of the audio, corresponding to the start of the spoken word. This field is only set if `enable_word_time_offsets=true` and only in the top hypothesis. A duration in seconds with up to nine fractional digits, terminated by 's'. Example: `"3.5s"`. |
| `endTime` | Output only. Time offset relative to the beginning of the audio, corresponding to the end of the spoken word. This field is only set if `enable_word_time_offsets=true` and only in the top hypothesis. A duration in seconds with up to nine fractional digits, terminated by 's'. Example: `"3.5s"`. |
| `word` | Output only. The word corresponding to this set of information. |
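When word time offsets are requested, the `words[]` list can drive simple time-aligned output. The sketch below emits one line per word with its start and end in seconds, assuming each word dict carries the three fields above in the documented `"<seconds>s"` format; the input shown is made up for illustration.

```python
# Sketch: print time-aligned words from a SpeechRecognitionAlternative's
# words[] list. Assumes startTime/endTime use the documented "<seconds>s" form.
def print_word_timings(words):
    for info in words:
        start = float(info["startTime"].rstrip("s"))
        end = float(info["endTime"].rstrip("s"))
        print(f"{start:7.3f}s - {end:7.3f}s  {info['word']}")

# Illustrative input only.
print_word_timings([
    {"startTime": "0s", "endTime": "0.4s", "word": "hello"},
    {"startTime": "0.4s", "endTime": "0.9s", "word": "world"},
])
```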