AnnotateVideoResponse

Video annotation response. Included in the response field of the Operation returned by the operations.get call of the google::longrunning::Operations service.

JSON representation
{ "annotationResults": [ { object(`VideoAnnotationResults`) } ] }

Fields
`annotationResults[]`	`object(VideoAnnotationResults)` Annotation results for all videos specified in `AnnotateVideoRequest`.
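
As a minimal sketch of how a client might read this field, the snippet below pulls `annotationResults` out of an `Operation` that has already been fetched and decoded from JSON. The helper name `annotation_results` and the sample payload (including the operation name and URI) are hypothetical; `done` and `response` are standard fields of a long-running `Operation`.

```python
import json


def annotation_results(operation: dict) -> list:
    """Return the annotationResults list of a finished Operation, else []."""
    # An Operation carries a response only once it is done.
    if not operation.get("done"):
        return []
    response = operation.get("response", {})
    return response.get("annotationResults", [])


# Hypothetical payload, shaped like the JSON representation above.
operation = json.loads("""
{
  "name": "operations/example",
  "done": true,
  "response": {
    "annotationResults": [
      {"inputUri": "/example-bucket/video.mp4"}
    ]
  }
}
""")

results = annotation_results(operation)
for result in results:
    print(result.get("inputUri"))
```

Returning an empty list for an unfinished operation keeps the calling loop simple, although a real client would likely poll again instead.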

VideoAnnotationResults

Annotation results for a single video.

JSON representation
{ "inputUri": string, "segmentLabelAnnotations": [ { object(`LabelAnnotation`) } ], "shotLabelAnnotations": [ { object(`LabelAnnotation`) } ], "frameLabelAnnotations": [ { object(`LabelAnnotation`) } ], "shotAnnotations": [ { object(`VideoSegment`) } ], "explicitAnnotation": { object(`ExplicitContentAnnotation`) }, "speechTranscriptions": [ { object(`SpeechTranscription`) } ], "error": { object(`Status`) } }

Fields
`inputUri`	`string` Video file location in Google Cloud Storage.
`segmentLabelAnnotations[]`	`object(LabelAnnotation)` Label annotations on video level or user specified segment level. There is exactly one element for each unique label.
`shotLabelAnnotations[]`	`object(LabelAnnotation)` Label annotations on shot level. There is exactly one element for each unique label.
`frameLabelAnnotations[]`	`object(LabelAnnotation)` Label annotations on frame level. There is exactly one element for each unique label.
`shotAnnotations[]`	`object(VideoSegment)` Shot annotations. Each shot is represented as a video segment.
`explicitAnnotation`	`object(ExplicitContentAnnotation)` Explicit content annotation.
`speechTranscriptions[]`	`object(SpeechTranscription)` Speech transcription.
`error`	`object(Status)` If set, indicates an error. Note that for a single `AnnotateVideoRequest` some videos may succeed and some may fail.
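
Because a single request can mix successful and failed videos, a client usually has to check `error` per result. The following is a sketch only: the helper `split_by_error` and the sample data are hypothetical, and the `code` and `message` keys assume the usual shape of a `Status` object.

```python
def split_by_error(annotation_results: list) -> tuple:
    """Partition per-video results into (succeeded, failed) lists."""
    succeeded, failed = [], []
    for result in annotation_results:
        # `error` is only present (and non-empty) when annotation failed.
        if result.get("error"):
            failed.append(result)
        else:
            succeeded.append(result)
    return succeeded, failed


# Hypothetical results for two videos from one request.
sample = [
    {"inputUri": "/example-bucket/a.mp4", "shotAnnotations": []},
    {"inputUri": "/example-bucket/b.mp4",
     "error": {"code": 3, "message": "unsupported format"}},
]

succeeded, failed = split_by_error(sample)
for result in failed:
    print(result["inputUri"], result["error"].get("message"))
```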

LabelAnnotation

Label annotation.

JSON representation
{ "entity": { object(`Entity`) }, "categoryEntities": [ { object(`Entity`) } ], "segments": [ { object(`LabelSegment`) } ], "frames": [ { object(`LabelFrame`) } ] }

Fields
`entity`	`object(Entity)` Detected entity.
`categoryEntities[]`	`object(Entity)` Common categories for the detected entity. For example, when the label is `Terrier` the category is likely `dog`. In some cases there may be more than one category; for example, `Terrier` could also be a `pet`.
`segments[]`	`object(LabelSegment)` All video segments where a label was detected.
`frames[]`	`object(LabelFrame)` All video frames where a label was detected.
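
To illustrate how the nested pieces fit together, this sketch flattens a list of `LabelAnnotation` objects into one row per detected segment. The function name `label_rows`, the tuple layout, and the sample values are illustrative assumptions, not part of the API.

```python
def label_rows(label_annotations: list) -> list:
    """Flatten labels into (description, categories, start, end, confidence)."""
    rows = []
    for label in label_annotations:
        description = label.get("entity", {}).get("description", "")
        categories = [
            category.get("description", "")
            for category in label.get("categoryEntities", [])
        ]
        for label_segment in label.get("segments", []):
            span = label_segment.get("segment", {})
            rows.append((
                description,
                categories,
                span.get("startTimeOffset"),
                span.get("endTimeOffset"),
                label_segment.get("confidence", 0.0),
            ))
    return rows


# Hypothetical annotation mirroring the Terrier example above.
sample_labels = [{
    "entity": {"description": "Terrier", "languageCode": "en-US"},
    "categoryEntities": [{"description": "dog"}, {"description": "pet"}],
    "segments": [{
        "segment": {"startTimeOffset": "0s", "endTimeOffset": "3.5s"},
        "confidence": 0.9,
    }],
}]

rows = label_rows(sample_labels)
```

Frame-level labels can be handled the same way by iterating over `frames` instead of `segments`.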

Entity

Detected entity from video analysis.

JSON representation
{ "entityId": string, "description": string, "languageCode": string }

Fields
`entityId`	`string` Opaque entity ID. Some IDs may be available in Google Knowledge Graph Search API.
`description`	`string` Textual description, e.g. `Fixed-gear bicycle`.
`languageCode`	`string` Language code for `description` in BCP-47 format.

LabelSegment

Video segment level annotation results for label detection.

JSON representation
{ "segment": { object(`VideoSegment`) }, "confidence": number }

Fields
`segment`	`object(VideoSegment)` Video segment where a label was detected.
`confidence`	`number` Confidence that the label is accurate. Range: [0, 1].

VideoSegment

Video segment.

JSON representation
{ "startTimeOffset": string, "endTimeOffset": string }

Fields
`startTimeOffset`	`string (Duration format)` Time-offset, relative to the beginning of the video, corresponding to the start of the segment (inclusive). A duration in seconds with up to nine fractional digits, terminated by '`s`'. Example: `"3.5s"`.
`endTimeOffset`	`string (Duration format)` Time-offset, relative to the beginning of the video, corresponding to the end of the segment (inclusive). A duration in seconds with up to nine fractional digits, terminated by '`s`'. Example: `"3.5s"`.
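
Since the offsets arrive as strings such as `"3.5s"`, they need parsing before any arithmetic. A minimal sketch follows; `parse_duration` and `segment_length` are hypothetical helper names, and `Decimal` is used only so that up to nine fractional digits are kept exactly.

```python
from decimal import Decimal


def parse_duration(value: str) -> Decimal:
    """Convert a Duration-format string such as "3.5s" to seconds."""
    if not value.endswith("s"):
        raise ValueError(f"not a Duration-format string: {value!r}")
    # Drop the trailing 's' and keep the decimal digits exactly.
    return Decimal(value[:-1])


def segment_length(segment: dict) -> Decimal:
    """Length of a VideoSegment in seconds (end offset minus start offset)."""
    start = parse_duration(segment["startTimeOffset"])
    end = parse_duration(segment["endTimeOffset"])
    return end - start


length = segment_length({"startTimeOffset": "1.25s", "endTimeOffset": "3.5s"})
```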

LabelFrame

Video frame level annotation results for label detection.

JSON representation
{ "timeOffset": string, "confidence": number }

Fields
`timeOffset`	`string (Duration format)` Time-offset, relative to the beginning of the video, corresponding to the video frame for this location. A duration in seconds with up to nine fractional digits, terminated by '`s`'. Example: `"3.5s"`.
`confidence`	`number` Confidence that the label is accurate. Range: [0, 1].

ExplicitContentAnnotation

Explicit content annotation (based on per-frame visual signals only). If no explicit content has been detected in a frame, no annotations are present for that frame.

JSON representation
{ "frames": [ { object(`ExplicitContentFrame`) } ] }

Fields
`frames[]`	`object(ExplicitContentFrame)` All video frames where explicit content was detected.

ExplicitContentFrame

Video frame level annotation results for explicit content.

JSON representation
{ "timeOffset": string, "pornographyLikelihood": enum(`Likelihood`) }

Fields
`timeOffset`	`string (Duration format)` Time-offset, relative to the beginning of the video, corresponding to the video frame for this location. A duration in seconds with up to nine fractional digits, terminated by '`s`'. Example: `"3.5s"`.
`pornographyLikelihood`	`enum(Likelihood)` Likelihood of pornographic content.
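
One way to act on these frames is to keep only those at or above a chosen likelihood. The sketch below assumes the `Likelihood` enum is serialized as the strings listed in `LIKELIHOOD_ORDER`, from least to most likely; those values are not defined on this page, so verify them against the `Likelihood` reference. The helper `flagged_frames` and its default threshold are hypothetical.

```python
# Assumed enum names, ordered from least to most likely (not defined on this page).
LIKELIHOOD_ORDER = [
    "LIKELIHOOD_UNSPECIFIED",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]


def flagged_frames(annotation: dict, threshold: str = "LIKELY") -> list:
    """Return frames whose likelihood is at or above the threshold."""
    minimum = LIKELIHOOD_ORDER.index(threshold)
    flagged = []
    for frame in annotation.get("frames", []):
        value = frame.get("pornographyLikelihood", "LIKELIHOOD_UNSPECIFIED")
        if LIKELIHOOD_ORDER.index(value) >= minimum:
            flagged.append(frame)
    return flagged


# Hypothetical ExplicitContentAnnotation.
sample_annotation = {"frames": [
    {"timeOffset": "0.5s", "pornographyLikelihood": "VERY_UNLIKELY"},
    {"timeOffset": "1.5s", "pornographyLikelihood": "LIKELY"},
    {"timeOffset": "2.5s", "pornographyLikelihood": "VERY_LIKELY"},
]}

flagged = flagged_frames(sample_annotation)
```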

SpeechTranscription

A speech recognition result corresponding to a portion of the audio.

JSON representation
{ "alternatives": [ { object(`SpeechRecognitionAlternative`) } ] }

Fields
`alternatives[]`	`object(SpeechRecognitionAlternative)` Output only. May contain one or more recognition hypotheses (up to the maximum specified in `maxAlternatives`). These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer.

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

JSON representation
{ "transcript": string, "confidence": number, "words": [ { object(`WordInfo`) } ] }

Fields
`transcript`	`string` Output only. Transcript text representing the words that the user spoke.
`confidence`	`number` Output only. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is typically provided only for the top hypothesis, and only for `is_final=true` results. Clients should not rely on the `confidence` field as it is not guaranteed to be accurate or consistent. The default of 0.0 is a sentinel value indicating `confidence` was not set.
`words[]`	`object(WordInfo)` Output only. A list of word-specific information for each recognized word.
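
The ordering of alternatives and the 0.0 sentinel described above can be handled together. This is a sketch under those two documented behaviors only; the function name `best_transcript` and the sample data are hypothetical.

```python
from typing import Optional, Tuple


def best_transcript(transcription: dict) -> Optional[Tuple[str, Optional[float]]]:
    """Return (transcript, confidence) for the top alternative, if any.

    Confidence is None when the value is the 0.0 sentinel, i.e. not set.
    """
    alternatives = transcription.get("alternatives", [])
    if not alternatives:
        return None
    top = alternatives[0]  # the first alternative is the most probable
    confidence = top.get("confidence", 0.0)
    return top.get("transcript", ""), (confidence if confidence != 0.0 else None)


# Hypothetical SpeechTranscription with two hypotheses.
sample_transcription = {"alternatives": [
    {"transcript": "hello world", "confidence": 0.92},
    {"transcript": "hollow world"},
]}

best = best_transcript(sample_transcription)
```

Mapping the sentinel to `None` makes it harder to mistake "not set" for "zero confidence" further downstream.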

WordInfo

Word-specific information for recognized words. Word information is only included in the response when certain request parameters are set, such as enable_word_time_offsets.

JSON representation
{ "startTime": string, "endTime": string, "word": string }

Fields
`startTime`	`string (Duration format)` Output only. Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This field is only set if `enable_word_time_offsets=true` and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary. A duration in seconds with up to nine fractional digits, terminated by '`s`'. Example: `"3.5s"`.
`endTime`	`string (Duration format)` Output only. Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word. This field is only set if `enable_word_time_offsets=true` and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary. A duration in seconds with up to nine fractional digits, terminated by '`s`'. Example: `"3.5s"`.
`word`	`string` Output only. The word corresponding to this set of information.
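
A short sketch of turning `words[]` into a timeline is shown below. Because the time fields are only set under the conditions described above, the hypothetical helper `word_timeline` returns `None` for any missing offset instead of failing; the sample words are made up.

```python
from typing import Optional


def _seconds(value: Optional[str]) -> Optional[float]:
    """Convert a Duration-format string such as "3.5s" to float seconds."""
    if value is None:
        return None  # offset not provided for this word
    if not value.endswith("s"):
        raise ValueError(f"not a Duration-format string: {value!r}")
    return float(value[:-1])


def word_timeline(words: list) -> list:
    """Return (word, start_seconds, end_seconds) for each WordInfo."""
    return [
        (info.get("word", ""),
         _seconds(info.get("startTime")),
         _seconds(info.get("endTime")))
        for info in words
    ]


# Hypothetical WordInfo entries; the second has no time offsets.
timeline = word_timeline([
    {"startTime": "0.4s", "endTime": "0.9s", "word": "hello"},
    {"word": "world"},
])
```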
