- JSON representation
 - VideoAnnotationResults
 - LabelAnnotation
 - Entity
 - LabelSegment
 - VideoSegment
 - LabelFrame
 - ExplicitContentAnnotation
 - ExplicitContentFrame
 - SpeechTranscription
 - SpeechRecognitionAlternative
 - WordInfo
 
Video annotation response. Included in the response field of the Operation returned by the operations.get call of the google::longrunning::Operations service.
JSON representation

```
{
  "annotationResults": [
    {
      object(VideoAnnotationResults)
    }
  ]
}
```

| Fields | |
|---|---|
| annotationResults[] | Annotation results for all videos specified in AnnotateVideoRequest. |
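A response like this is usually consumed after polling the long-running operation until `done` is true. A minimal sketch, assuming the Operation JSON has already been fetched and parsed into a Python dict (the payload below is hand-written for illustration, not real API output):

```python
# Extract per-video results from a parsed Operation carrying an
# AnnotateVideoResponse. Sample payload is illustrative only.
operation = {
    "done": True,
    "response": {
        "annotationResults": [
            {"inputUri": "/bucket/video.mp4", "segmentLabelAnnotations": []}
        ]
    },
}

def get_annotation_results(operation):
    """Return the list of VideoAnnotationResults, or [] if not done yet."""
    if not operation.get("done"):
        return []
    return operation.get("response", {}).get("annotationResults", [])

results = get_annotation_results(operation)
uris = [r["inputUri"] for r in results]
```

Until the operation completes, `response` is absent, so the helper returns an empty list rather than raising.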
VideoAnnotationResults
Annotation results for a single video.
JSON representation

```
{
  "inputUri": string,
  "segmentLabelAnnotations": [
    {
      object(LabelAnnotation)
    }
  ],
  "shotLabelAnnotations": [
    {
      object(LabelAnnotation)
    }
  ],
  "frameLabelAnnotations": [
    {
      object(LabelAnnotation)
    }
  ],
  "shotAnnotations": [
    {
      object(VideoSegment)
    }
  ],
  "explicitAnnotation": {
    object(ExplicitContentAnnotation)
  },
  "speechTranscriptions": [
    {
      object(SpeechTranscription)
    }
  ],
  "error": {
    object(Status)
  }
}
```

| Fields | |
|---|---|
| inputUri | Video file location in Google Cloud Storage. |
| segmentLabelAnnotations[] | Label annotations on video level or user-specified segment level. There is exactly one element for each unique label. |
| shotLabelAnnotations[] | Label annotations on shot level. There is exactly one element for each unique label. |
| frameLabelAnnotations[] | Label annotations on frame level. There is exactly one element for each unique label. |
| shotAnnotations[] | Shot annotations. Each shot is represented as a video segment. |
| explicitAnnotation | Explicit content annotation. |
| speechTranscriptions[] | Speech transcription. |
| error | If set, indicates an error. Note that for a single AnnotateVideoRequest, some videos may succeed and some may fail. |
LabelAnnotation
Label annotation.
JSON representation

```
{
  "entity": {
    object(Entity)
  },
  "categoryEntities": [
    {
      object(Entity)
    }
  ],
  "segments": [
    {
      object(LabelSegment)
    }
  ],
  "frames": [
    {
      object(LabelFrame)
    }
  ]
}
```

| Fields | |
|---|---|
| entity | Detected entity. |
| categoryEntities[] | Common categories for the detected entity. For example, when the label is "Terrier", the category is likely "dog". In some cases there may be more than one category, e.g. "Terrier" could also be a "pet". |
| segments[] | All video segments where a label was detected. |
| frames[] | All video frames where a label was detected. |
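To see how these messages nest, here is a sketch that flattens one LabelAnnotation into (label, start, end, confidence) rows, assuming the annotation has been parsed into a Python dict (the sample data is illustrative):

```python
# Flatten a LabelAnnotation dict into rows of
# (label description, startTimeOffset, endTimeOffset, confidence).
# The sample annotation is hand-written for illustration.
annotation = {
    "entity": {"entityId": "/m/01yrx", "description": "cat",
               "languageCode": "en-US"},
    "categoryEntities": [{"entityId": "/m/068hy", "description": "pet",
                          "languageCode": "en-US"}],
    "segments": [
        {"segment": {"startTimeOffset": "0s", "endTimeOffset": "14.833664s"},
         "confidence": 0.98}
    ],
}

def label_rows(annotation):
    """Yield one row per LabelSegment of a LabelAnnotation."""
    label = annotation["entity"]["description"]
    for seg in annotation.get("segments", []):
        s = seg["segment"]
        yield (label, s["startTimeOffset"], s["endTimeOffset"],
               seg["confidence"])

rows = list(label_rows(annotation))
```

The same shape works for `frames[]` by reading `timeOffset` instead of the segment boundaries.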
Entity
Detected entity from video analysis.
JSON representation

```
{
  "entityId": string,
  "description": string,
  "languageCode": string
}
```

| Fields | |
|---|---|
| entityId | Opaque entity ID. Some IDs may be available in the Google Knowledge Graph Search API. |
| description | Textual description, e.g. "Fixed-gear bicycle". |
| languageCode | Language code for description in BCP-47 format. |
LabelSegment
Video segment level annotation results for label detection.
JSON representation

```
{
  "segment": {
    object(VideoSegment)
  },
  "confidence": number
}
```

| Fields | |
|---|---|
| segment | Video segment where a label was detected. |
| confidence | Confidence that the label is accurate. Range: [0, 1]. |
VideoSegment
Video segment.
JSON representation

```
{
  "startTimeOffset": string,
  "endTimeOffset": string
}
```

| Fields | |
|---|---|
| startTimeOffset | Time offset, relative to the beginning of the video, corresponding to the start of the segment (inclusive). A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s". |
| endTimeOffset | Time offset, relative to the beginning of the video, corresponding to the end of the segment (inclusive). A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s". |
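The offsets use the JSON mapping of the protobuf Duration type: a decimal number of seconds followed by 's'. A small parser sketch for working with these strings:

```python
def parse_duration(value):
    """Parse a Duration JSON string like "3.5s" into float seconds."""
    if not value.endswith("s"):
        raise ValueError(f"expected a duration ending in 's', got {value!r}")
    return float(value[:-1])

def segment_length(segment):
    """Length in seconds of a VideoSegment dict."""
    return (parse_duration(segment["endTimeOffset"])
            - parse_duration(segment["startTimeOffset"]))

length = segment_length({"startTimeOffset": "1.5s",
                         "endTimeOffset": "3.5s"})  # 2.0
```

Note that float parsing is fine for display and comparisons; use `decimal.Decimal` if you need the full nine fractional digits exactly.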
LabelFrame
Video frame level annotation results for label detection.
JSON representation

```
{
  "timeOffset": string,
  "confidence": number
}
```

| Fields | |
|---|---|
| timeOffset | Time offset, relative to the beginning of the video, corresponding to the video frame for this location. A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s". |
| confidence | Confidence that the label is accurate. Range: [0, 1]. |
ExplicitContentAnnotation
Explicit content annotation (based on per-frame visual signals only). If no explicit content has been detected in a frame, no annotations are present for that frame.
JSON representation

```
{
  "frames": [
    {
      object(ExplicitContentFrame)
    }
  ]
}
```

| Fields | |
|---|---|
| frames[] | All video frames where explicit content was detected. |
ExplicitContentFrame
Video frame level annotation results for explicit content.
JSON representation

```
{
  "timeOffset": string,
  "pornographyLikelihood": enum(Likelihood)
}
```

| Fields | |
|---|---|
| timeOffset | Time offset, relative to the beginning of the video, corresponding to the video frame for this location. A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s". |
| pornographyLikelihood | Likelihood of the pornography content. |
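Assuming `pornographyLikelihood` uses the usual Cloud API Likelihood enum values (LIKELIHOOD_UNSPECIFIED, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, VERY_LIKELY), a sketch that collects the time offsets of frames at or above a chosen threshold from a parsed annotation dict:

```python
# Ordered from least to most likely; assumed enum value names.
LIKELIHOOD_ORDER = ["LIKELIHOOD_UNSPECIFIED", "VERY_UNLIKELY", "UNLIKELY",
                    "POSSIBLE", "LIKELY", "VERY_LIKELY"]

def flagged_frames(explicit_annotation, threshold="LIKELY"):
    """Return timeOffsets of frames whose likelihood >= threshold."""
    min_rank = LIKELIHOOD_ORDER.index(threshold)
    return [f["timeOffset"]
            for f in explicit_annotation.get("frames", [])
            if LIKELIHOOD_ORDER.index(f["pornographyLikelihood"]) >= min_rank]

# Hand-written sample ExplicitContentAnnotation for illustration.
annotation = {"frames": [
    {"timeOffset": "0.5s", "pornographyLikelihood": "VERY_UNLIKELY"},
    {"timeOffset": "1.5s", "pornographyLikelihood": "VERY_LIKELY"},
]}
hits = flagged_frames(annotation)
```

Because frames with no detected explicit content carry no annotation at all, an empty `frames` list simply yields no hits.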
SpeechTranscription
A speech recognition result corresponding to a portion of the audio.
JSON representation

```
{
  "alternatives": [
    {
      object(SpeechRecognitionAlternative)
    }
  ]
}
```

| Fields | |
|---|---|
| alternatives[] | Output only. May contain one or more recognition hypotheses (up to the maximum specified in maxAlternatives). These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer. |
SpeechRecognitionAlternative
Alternative hypotheses (a.k.a. n-best list).
JSON representation

```
{
  "transcript": string,
  "confidence": number,
  "words": [
    {
      object(WordInfo)
    }
  ]
}
```

| Fields | |
|---|---|
| transcript | Output only. Transcript text representing the words that the user spoke. |
| confidence | Output only. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is typically provided only for the top hypothesis, and only for is_final=true results. |
| words[] | Output only. A list of word-specific information for each recognized word. |
WordInfo
Word-specific information for recognized words. Word information is only included in the response when certain request parameters are set, such as enable_word_time_offsets.
JSON representation

```
{
  "startTime": string,
  "endTime": string,
  "word": string
}
```

| Fields | |
|---|---|
| startTime | Output only. Time offset relative to the beginning of the audio, corresponding to the start of the spoken word. This field is only set if enable_word_time_offsets is set in the request. A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "1.5s". |
| endTime | Output only. Time offset relative to the beginning of the audio, corresponding to the end of the spoken word. This field is only set if enable_word_time_offsets is set in the request. A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "1.5s". |
| word | Output only. The word corresponding to this set of information. |
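Putting the speech messages together: a sketch that takes the top (most probable) alternative of each SpeechTranscription and collects word timings, assuming a parsed result dict with word time offsets enabled (the sample data is illustrative):

```python
# Hand-written sample speechTranscriptions list for illustration.
transcriptions = [{
    "alternatives": [{
        "transcript": "hello world",
        "confidence": 0.93,
        "words": [
            {"startTime": "0s", "endTime": "0.4s", "word": "hello"},
            {"startTime": "0.4s", "endTime": "0.9s", "word": "world"},
        ],
    }],
}]

def word_timings(transcriptions):
    """Yield (word, startTime, endTime) from the top alternative
    of each SpeechTranscription."""
    for t in transcriptions:
        alts = t.get("alternatives", [])
        if not alts:
            continue
        top = alts[0]  # alternatives are ordered most probable first
        for w in top.get("words", []):
            yield (w["word"], w["startTime"], w["endTime"])

timings = list(word_timings(transcriptions))
```

If word time offsets were not requested, `words` is absent and the helper simply yields nothing for that transcription.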