AnnotateVideoResponse

Video annotation response. Included in the response field of the Operation returned by the operations.get call of the google::longrunning::Operations service.

JSON representation
{
  "annotationResults": [
    {
      object(VideoAnnotationResults)
    }
  ]
}
Fields
annotationResults[]

object(VideoAnnotationResults)

Annotation results for all videos specified in AnnotateVideoRequest.
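
A completed operation carries these annotation results inside its response field. The Python sketch below shows one way to pull them out of a stored operations.get JSON body; the operation.json filename and helper name are illustrative, not part of the API.

import json

def get_annotation_results(operation: dict) -> list:
    """Return the AnnotateVideoResponse's annotationResults, or [] if the operation is not done."""
    if not operation.get("done"):
        return []
    return operation.get("response", {}).get("annotationResults", [])

with open("operation.json") as f:  # hypothetical file holding an operations.get body
    results = get_annotation_results(json.load(f))
print(f"{len(results)} video(s) annotated")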

VideoAnnotationResults

Annotation results for a single video.

JSON representation
{
  "inputUri": string,
  "segmentLabelAnnotations": [
    {
      object(LabelAnnotation)
    }
  ],
  "shotLabelAnnotations": [
    {
      object(LabelAnnotation)
    }
  ],
  "frameLabelAnnotations": [
    {
      object(LabelAnnotation)
    }
  ],
  "shotAnnotations": [
    {
      object(VideoSegment)
    }
  ],
  "explicitAnnotation": {
    object(ExplicitContentAnnotation)
  },
  "speechTranscriptions": [
    {
      object(SpeechTranscription)
    }
  ],
  "error": {
    object(Status)
  }
}
Fields
inputUri

string

Video file location in Google Cloud Storage.

segmentLabelAnnotations[]

object(LabelAnnotation)

Label annotations on the video level or user-specified segment level. There is exactly one element for each unique label.

shotLabelAnnotations[]

object(LabelAnnotation)

Label annotations on shot level. There is exactly one element for each unique label.

frameLabelAnnotations[]

object(LabelAnnotation)

Label annotations on frame level. There is exactly one element for each unique label.

shotAnnotations[]

object(VideoSegment)

Shot annotations. Each shot is represented as a video segment.

explicitAnnotation

object(ExplicitContentAnnotation)

Explicit content annotation.

speechTranscriptions[]

object(SpeechTranscription)

Speech transcription.

error

object(Status)

If set, indicates an error. Note that for a single AnnotateVideoRequest some videos may succeed and some may fail.
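
Because some videos in a single request may succeed while others fail, a consumer typically checks each VideoAnnotationResults entry for its own error. A minimal sketch, assuming results is the annotationResults list from the previous example and that error follows the usual google.rpc.Status shape (code, message):

def summarize(results: list) -> None:
    for result in results:
        uri = result.get("inputUri", "<unknown>")
        if "error" in result:
            status = result["error"]
            print(f"{uri}: failed ({status.get('code')}: {status.get('message')})")
        else:
            print(f"{uri}: {len(result.get('segmentLabelAnnotations', []))} segment labels, "
                  f"{len(result.get('shotAnnotations', []))} shots")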

LabelAnnotation

Label annotation.

JSON representation
{
  "entity": {
    object(Entity)
  },
  "categoryEntities": [
    {
      object(Entity)
    }
  ],
  "segments": [
    {
      object(LabelSegment)
    }
  ],
  "frames": [
    {
      object(LabelFrame)
    }
  ]
}
Fields
entity

object(Entity)

Detected entity.

categoryEntities[]

object(Entity)

Common categories for the detected entity. For example, when the label is Terrier, the category is likely dog. In some cases there may be more than one category; for example, Terrier could also be a pet.

segments[]

object(LabelSegment)

All video segments where a label was detected.

frames[]

object(LabelFrame)

All video frames where a label was detected.
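
As a sketch of how these fields fit together, the snippet below walks one LabelAnnotation dict shaped like the JSON representation above, printing the detected entity, its categories, and each segment with its confidence.

def print_label(label: dict) -> None:
    name = label.get("entity", {}).get("description", "<no description>")
    categories = [c.get("description", "") for c in label.get("categoryEntities", [])]
    print(f"label: {name}  categories: {', '.join(categories) or 'none'}")
    for seg in label.get("segments", []):
        s = seg.get("segment", {})
        print(f"  {s.get('startTimeOffset')} - {s.get('endTimeOffset')}"
              f" (confidence {seg.get('confidence', 0):.2f})")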

Entity

Detected entity from video analysis.

JSON representation
{
  "entityId": string,
  "description": string,
  "languageCode": string
}
Fields
entityId

string

Opaque entity ID. Some IDs may be available in the Google Knowledge Graph Search API.

description

string

Textual description, e.g. Fixed-gear bicycle.

languageCode

string

Language code for description in BCP-47 format.
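
When an entityId is present (often an opaque /m/… Knowledge Graph MID), it can sometimes be resolved through the Knowledge Graph Search API. The sketch below only builds the request URL for that API's entities:search method; the API key is a placeholder you would supply.

from urllib.parse import urlencode

def kg_lookup_url(entity_id: str, api_key: str) -> str:
    query = urlencode({"ids": entity_id, "key": api_key, "limit": 1})
    return f"https://kgsearch.googleapis.com/v1/entities:search?{query}"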

LabelSegment

Video segment level annotation results for label detection.

JSON representation
{
  "segment": {
    object(VideoSegment)
  },
  "confidence": number
}
Fields
segment

object(VideoSegment)

Video segment where a label was detected.

confidence

number

Confidence that the label is accurate. Range: [0, 1].

VideoSegment

Video segment.

JSON representation
{
  "startTimeOffset": string,
  "endTimeOffset": string
}
Fields
startTimeOffset

string (Duration format)

Time-offset, relative to the beginning of the video, corresponding to the start of the segment (inclusive).

A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s".

endTimeOffset

string (Duration format)

Time-offset, relative to the beginning of the video, corresponding to the end of the segment (inclusive).

A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s".
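
A small helper makes these Duration strings easier to work with. The sketch below assumes only the format described above: a decimal number of seconds terminated by 's', e.g. "3.5s".

def duration_to_seconds(duration: str) -> float:
    if not duration.endswith("s"):
        raise ValueError(f"unexpected duration format: {duration!r}")
    return float(duration[:-1])

def segment_length(segment: dict) -> float:
    """Length of a VideoSegment in seconds (endTimeOffset - startTimeOffset)."""
    return (duration_to_seconds(segment["endTimeOffset"])
            - duration_to_seconds(segment["startTimeOffset"]))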

LabelFrame

Video frame level annotation results for label detection.

JSON representation
{
  "timeOffset": string,
  "confidence": number
}
Fields
timeOffset

string (Duration format)

Time-offset, relative to the beginning of the video, corresponding to the video frame for this location.

A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s".

confidence

number

Confidence that the label is accurate. Range: [0, 1].
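
Frame-level labels are usually filtered by confidence before use. A sketch, reusing the duration_to_seconds helper from the VideoSegment section (an assumption of this example, not an API call):

def confident_frames(label: dict, threshold: float = 0.8) -> list:
    """(seconds, confidence) pairs for frames at or above the threshold."""
    return [
        (duration_to_seconds(frame["timeOffset"]), frame["confidence"])
        for frame in label.get("frames", [])
        if frame.get("confidence", 0.0) >= threshold
    ]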

ExplicitContentAnnotation

Explicit content annotation (based on per-frame visual signals only). If no explicit content has been detected in a frame, no annotations are present for that frame.

JSON representation
{
  "frames": [
    {
      object(ExplicitContentFrame)
    }
  ]
}
Fields
frames[]

object(ExplicitContentFrame)

All video frames where explicit content was detected.

ExplicitContentFrame

Video frame level annotation results for explicit content.

JSON representation
{
  "timeOffset": string,
  "pornographyLikelihood": enum(Likelihood)
}
Fields
timeOffset

string (Duration format)

Time-offset, relative to the beginning of the video, corresponding to the video frame for this location.

A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s".

pornographyLikelihood

enum(Likelihood)

Likelihood of pornographic content.
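
A common consumption pattern is to flag frames whose likelihood is at or above some level. A sketch, assuming the Likelihood values appear in the JSON as the usual enum strings ("VERY_UNLIKELY" through "VERY_LIKELY"):

FLAGGED = {"POSSIBLE", "LIKELY", "VERY_LIKELY"}

def explicit_frame_offsets(annotation: dict) -> list:
    """Time offsets (Duration strings) of frames flagged as potentially explicit."""
    return [
        frame["timeOffset"]
        for frame in annotation.get("frames", [])
        if frame.get("pornographyLikelihood") in FLAGGED
    ]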

SpeechTranscription

A speech recognition result corresponding to a portion of the audio.

JSON representation
{
  "alternatives": [
    {
      object(SpeechRecognitionAlternative)
    }
  ]
}
Fields
alternatives[]

object(SpeechRecognitionAlternative)

Output only. May contain one or more recognition hypotheses (up to the maximum specified in maxAlternatives). These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer.
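
Since alternatives are ranked most-probable first, the top hypothesis for each transcription is simply the first element when any are present. A minimal sketch over a VideoAnnotationResults dict:

def best_transcripts(result: dict) -> list:
    transcripts = []
    for transcription in result.get("speechTranscriptions", []):
        alternatives = transcription.get("alternatives", [])
        if alternatives:
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts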

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

JSON representation
{
  "transcript": string,
  "confidence": number,
  "words": [
    {
      object(WordInfo)
    }
  ]
}
Fields
transcript

string

Output only. Transcript text representing the words that the user spoke.

confidence

number

Output only. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is typically provided only for the top hypothesis, and only for is_final=true results. Clients should not rely on the confidence field as it is not guaranteed to be accurate or consistent. The default of 0.0 is a sentinel value indicating confidence was not set.

words[]

object(WordInfo)

Output only. A list of word-specific information for each recognized word.

WordInfo

Word-specific information for recognized words. Word information is only included in the response when certain request parameters are set, such as enable_word_time_offsets.

JSON representation
{
  "startTime": string,
  "endTime": string,
  "word": string
}
Fields
startTime

string (Duration format)

Output only. Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This field is only set if enable_word_time_offsets=true and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.

A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s".

endTime

string (Duration format)

Output only. Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word. This field is only set if enable_word_time_offsets=true and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.

A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s".

word

string

Output only. The word corresponding to this set of information.
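
For illustration, word-level timings can be collected from a SpeechRecognitionAlternative when enable_word_time_offsets was requested; this sketch reuses the duration_to_seconds helper assumed earlier.

def word_timings(alternative: dict) -> list:
    """(word, start_seconds, end_seconds) tuples for words that carry offsets."""
    return [
        (info["word"],
         duration_to_seconds(info["startTime"]),
         duration_to_seconds(info["endTime"]))
        for info in alternative.get("words", [])
        if "startTime" in info and "endTime" in info
    ]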