Package com.google.cloud.speech.v1 (2.2.15)

A client to Cloud Speech-to-Text API

The interfaces provided are listed below, along with usage samples.

SpeechClient

Service Description: Service that implements Google Cloud Speech API.

Sample for SpeechClient:


 try (SpeechClient speechClient = SpeechClient.create()) {
   RecognitionConfig config = RecognitionConfig.newBuilder().build();
   RecognitionAudio audio = RecognitionAudio.newBuilder().build();
   RecognizeResponse response = speechClient.recognize(config, audio);
 }
 

Classes

CustomClass

A set of words or phrases that represents a common concept likely to appear in your audio, for example a list of passenger ship names. CustomClass items can be substituted into placeholders that you set in PhraseSet phrases.

Protobuf type google.cloud.speech.v1.CustomClass

CustomClass.Builder

A set of words or phrases that represents a common concept likely to appear in your audio, for example a list of passenger ship names. CustomClass items can be substituted into placeholders that you set in PhraseSet phrases.

Protobuf type google.cloud.speech.v1.CustomClass

CustomClass.ClassItem

An item of the class.

Protobuf type google.cloud.speech.v1.CustomClass.ClassItem

CustomClass.ClassItem.Builder

An item of the class.

Protobuf type google.cloud.speech.v1.CustomClass.ClassItem

LongRunningRecognizeMetadata

Describes the progress of a long-running LongRunningRecognize call. It is included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

Protobuf type google.cloud.speech.v1.LongRunningRecognizeMetadata

LongRunningRecognizeMetadata.Builder

Describes the progress of a long-running LongRunningRecognize call. It is included in the metadata field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

Protobuf type google.cloud.speech.v1.LongRunningRecognizeMetadata

LongRunningRecognizeRequest

The top-level message sent by the client for the LongRunningRecognize method.

Protobuf type google.cloud.speech.v1.LongRunningRecognizeRequest

LongRunningRecognizeRequest.Builder

The top-level message sent by the client for the LongRunningRecognize method.

Protobuf type google.cloud.speech.v1.LongRunningRecognizeRequest

LongRunningRecognizeResponse

The only message returned to the client by the LongRunningRecognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages. It is included in the result.response field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

Protobuf type google.cloud.speech.v1.LongRunningRecognizeResponse

LongRunningRecognizeResponse.Builder

The only message returned to the client by the LongRunningRecognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages. It is included in the result.response field of the Operation returned by the GetOperation call of the google::longrunning::Operations service.

Protobuf type google.cloud.speech.v1.LongRunningRecognizeResponse

PhraseSet

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

Protobuf type google.cloud.speech.v1.PhraseSet

PhraseSet.Builder

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

Protobuf type google.cloud.speech.v1.PhraseSet

PhraseSet.Phrase

A phrase containing words and phrase "hints" so that the speech recognizer is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer. See usage limits.

List items can also include pre-built or custom classes containing groups of words that represent common concepts that occur in natural language. For example, rather than providing a phrase hint for every month of the year (e.g. "i was born in january", "i was born in february", ...), using the pre-built $MONTH class improves the likelihood of correctly transcribing audio that includes months (e.g. "i was born in $month").

To refer to pre-built classes, use the class's symbol prepended with $, e.g. $MONTH. To refer to custom classes that were defined inline in the request, set the class's custom_class_id to a string unique to all class resources and inline classes, then use the class's id wrapped in ${...}, e.g. "${my-months}". To refer to custom class resources, use the class's id wrapped in ${} (e.g. ${my-months}).

Speech-to-Text supports three locations: global, us (US North America), and eu (Europe). If you are calling the speech.googleapis.com endpoint, use the global location. To specify a region, use a regional endpoint with a matching us or eu location value.
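The two class-token forms described above can be sketched with plain string helpers. This is a minimal illustration only: PhraseTokens and its methods are hypothetical names and not part of this package. The resulting strings would presumably be supplied as the phrase value, for example through PhraseSet.Phrase.newBuilder().setValue(...).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the library): builds phrase strings with class tokens. */
final class PhraseTokens {
  // Pre-built classes are referenced by their symbol prefixed with '$', e.g. $MONTH.
  static String prebuilt(String symbol) {
    return "$" + symbol;
  }

  // Custom classes are referenced by their id wrapped in ${...}, e.g. ${my-months}.
  static String custom(String customClassId) {
    return "${" + customClassId + "}";
  }

  public static void main(String[] args) {
    List<String> phrases = new ArrayList<>();
    phrases.add("i was born in " + prebuilt("MONTH"));
    phrases.add("i was born in " + custom("my-months"));
    for (String phrase : phrases) {
      System.out.println(phrase);
    }
  }
}
```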

Protobuf type google.cloud.speech.v1.PhraseSet.Phrase

PhraseSet.Phrase.Builder

A phrase containing words and phrase "hints" so that the speech recognizer is more likely to recognize them. This can be used to improve the accuracy for specific words and phrases, for example, if specific commands are typically spoken by the user. This can also be used to add additional words to the vocabulary of the recognizer. See usage limits.

List items can also include pre-built or custom classes containing groups of words that represent common concepts that occur in natural language. For example, rather than providing a phrase hint for every month of the year (e.g. "i was born in january", "i was born in february", ...), using the pre-built $MONTH class improves the likelihood of correctly transcribing audio that includes months (e.g. "i was born in $month").

To refer to pre-built classes, use the class's symbol prepended with $, e.g. $MONTH. To refer to custom classes that were defined inline in the request, set the class's custom_class_id to a string unique to all class resources and inline classes, then use the class's id wrapped in ${...}, e.g. "${my-months}". To refer to custom class resources, use the class's id wrapped in ${} (e.g. ${my-months}).

Speech-to-Text supports three locations: global, us (US North America), and eu (Europe). If you are calling the speech.googleapis.com endpoint, use the global location. To specify a region, use a regional endpoint with a matching us or eu location value.

Protobuf type google.cloud.speech.v1.PhraseSet.Phrase

RecognitionAudio

Contains audio data in the encoding specified in the RecognitionConfig. Either content or uri must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. See content limits.
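The "exactly one of content or uri" rule can be checked on the client before a request is sent. The sketch below is a hypothetical pre-check written with plain Java types; AudioSourceCheck is not part of this package, the gs:// URI is a made-up example, and the service remains the authority that returns INVALID_ARGUMENT.

```java
/** Hypothetical client-side pre-check; the service performs the authoritative validation. */
final class AudioSourceCheck {
  // Returns true only when exactly one of the two audio sources is supplied.
  static boolean hasExactlyOneSource(byte[] content, String uri) {
    boolean hasContent = content != null && content.length > 0;
    boolean hasUri = uri != null && !uri.isEmpty();
    return hasContent ^ hasUri;
  }

  public static void main(String[] args) {
    byte[] bytes = new byte[] {1, 2};
    String uri = "gs://example-bucket/audio.flac"; // made-up URI for illustration

    System.out.println(hasExactlyOneSource(bytes, null)); // true: inline content only
    System.out.println(hasExactlyOneSource(null, uri));   // true: URI only
    System.out.println(hasExactlyOneSource(bytes, uri));  // false: both supplied
    System.out.println(hasExactlyOneSource(null, null));  // false: neither supplied
  }
}
```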

Protobuf type google.cloud.speech.v1.RecognitionAudio

RecognitionAudio.Builder

Contains audio data in the encoding specified in the RecognitionConfig. Either content or uri must be supplied. Supplying both or neither returns google.rpc.Code.INVALID_ARGUMENT. See content limits.

Protobuf type google.cloud.speech.v1.RecognitionAudio

RecognitionConfig

Provides information to the recognizer that specifies how to process the request.

Protobuf type google.cloud.speech.v1.RecognitionConfig

RecognitionConfig.Builder

Provides information to the recognizer that specifies how to process the request.

Protobuf type google.cloud.speech.v1.RecognitionConfig

RecognitionMetadata

Description of audio data to be recognized.

Protobuf type google.cloud.speech.v1.RecognitionMetadata

RecognitionMetadata.Builder

Description of audio data to be recognized.

Protobuf type google.cloud.speech.v1.RecognitionMetadata

RecognizeRequest

The top-level message sent by the client for the Recognize method.

Protobuf type google.cloud.speech.v1.RecognizeRequest

RecognizeRequest.Builder

The top-level message sent by the client for the Recognize method.

Protobuf type google.cloud.speech.v1.RecognizeRequest

RecognizeResponse

The only message returned to the client by the Recognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages.

Protobuf type google.cloud.speech.v1.RecognizeResponse

RecognizeResponse.Builder

The only message returned to the client by the Recognize method. It contains the result as zero or more sequential SpeechRecognitionResult messages.

Protobuf type google.cloud.speech.v1.RecognizeResponse

SpeakerDiarizationConfig

Config to enable speaker diarization.

Protobuf type google.cloud.speech.v1.SpeakerDiarizationConfig

SpeakerDiarizationConfig.Builder

Config to enable speaker diarization.

Protobuf type google.cloud.speech.v1.SpeakerDiarizationConfig

SpeechAdaptation

Speech adaptation configuration.

Protobuf type google.cloud.speech.v1.SpeechAdaptation

SpeechAdaptation.Builder

Speech adaptation configuration.

Protobuf type google.cloud.speech.v1.SpeechAdaptation

SpeechClient

Service Description: Service that implements Google Cloud Speech API.

This class provides the ability to make remote calls to the backing service through method calls that map to API methods. Sample code to get started:


 try (SpeechClient speechClient = SpeechClient.create()) {
   RecognitionConfig config = RecognitionConfig.newBuilder().build();
   RecognitionAudio audio = RecognitionAudio.newBuilder().build();
   RecognizeResponse response = speechClient.recognize(config, audio);
 }
 

Note: close() needs to be called on the SpeechClient object to clean up resources such as threads. In the example above, try-with-resources is used, which automatically calls close().

The surface of this class includes several types of Java methods for each of the API's methods:

  1. A "flattened" method. With this type of method, the fields of the request type have been converted into function parameters. It may be the case that not all fields are available as parameters, and not every API method will have a flattened method entry point.
  2. A "request object" method. This type of method only takes one parameter, a request object, which must be constructed before the call. Not every API method will have a request object method.
  3. A "callable" method. This type of method takes no parameters and returns an immutable API callable object, which can be used to initiate calls to the service.

See the individual methods for example code.

Many parameters require resource names to be formatted in a particular way. To assist with these names, this class includes a format method for each type of name, and additionally a parse method to extract the individual identifiers contained within names that are returned.
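To illustrate what formatting and parsing a resource name involve, here is a self-contained sketch in plain Java. The class PhraseSetNameSketch and the name pattern projects/{project}/locations/{location}/phraseSets/{phrase_set} are assumptions made for this example; in real code, the format and parse helpers shipped with the library are the better choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of format/parse helpers for an assumed resource name pattern. */
final class PhraseSetNameSketch {
  // Assumed pattern: projects/{project}/locations/{location}/phraseSets/{phrase_set}
  private static final Pattern NAME =
      Pattern.compile("projects/([^/]+)/locations/([^/]+)/phraseSets/([^/]+)");

  // Builds the full resource name from its individual identifiers.
  static String format(String project, String location, String phraseSet) {
    return "projects/" + project + "/locations/" + location + "/phraseSets/" + phraseSet;
  }

  // Returns {project, location, phraseSet}; throws if the name does not match the pattern.
  static String[] parse(String name) {
    Matcher m = NAME.matcher(name);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unrecognized resource name: " + name);
    }
    return new String[] {m.group(1), m.group(2), m.group(3)};
  }

  public static void main(String[] args) {
    String name = format("my-project", "global", "my-phrases");
    System.out.println(name);
    String[] parts = parse(name);
    System.out.println(parts[0] + " " + parts[1] + " " + parts[2]);
  }
}
```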

This class can be customized by passing in a custom instance of SpeechSettings to create(). For example:

To customize credentials:


 SpeechSettings speechSettings =
     SpeechSettings.newBuilder()
         .setCredentialsProvider(FixedCredentialsProvider.create(myCredentials))
         .build();
 SpeechClient speechClient = SpeechClient.create(speechSettings);
 

To customize the endpoint:


 SpeechSettings speechSettings = SpeechSettings.newBuilder().setEndpoint(myEndpoint).build();
 SpeechClient speechClient = SpeechClient.create(speechSettings);
 

Please refer to the GitHub repository's samples for more quickstart code snippets.

SpeechContext

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

Protobuf type google.cloud.speech.v1.SpeechContext

SpeechContext.Builder

Provides "hints" to the speech recognizer to favor specific words and phrases in the results.

Protobuf type google.cloud.speech.v1.SpeechContext

SpeechGrpc

Service that implements Google Cloud Speech API.

SpeechGrpc.SpeechBlockingStub

Service that implements Google Cloud Speech API.

SpeechGrpc.SpeechFutureStub

Service that implements Google Cloud Speech API.

SpeechGrpc.SpeechImplBase

Service that implements Google Cloud Speech API.

SpeechGrpc.SpeechStub

Service that implements Google Cloud Speech API.

SpeechProto

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

Protobuf type google.cloud.speech.v1.SpeechRecognitionAlternative

SpeechRecognitionAlternative.Builder

Alternative hypotheses (a.k.a. n-best list).

Protobuf type google.cloud.speech.v1.SpeechRecognitionAlternative

SpeechRecognitionResult

A speech recognition result corresponding to a portion of the audio.

Protobuf type google.cloud.speech.v1.SpeechRecognitionResult

SpeechRecognitionResult.Builder

A speech recognition result corresponding to a portion of the audio.

Protobuf type google.cloud.speech.v1.SpeechRecognitionResult

SpeechResourceProto

SpeechSettings

Settings class to configure an instance of SpeechClient.

The default instance has everything set to sensible defaults:

  • The default service address (speech.googleapis.com) and default port (443) are used.
  • Credentials are acquired automatically through Application Default Credentials.
  • Retries are configured for idempotent methods but not for non-idempotent methods.

The builder of this class is recursive, so contained classes are themselves builders. When build() is called, the tree of builders is called to create the complete settings object.

For example, to set the total timeout of recognize to 30 seconds:


 SpeechSettings.Builder speechSettingsBuilder = SpeechSettings.newBuilder();
 speechSettingsBuilder
     .recognizeSettings()
     .setRetrySettings(
         speechSettingsBuilder
             .recognizeSettings()
             .getRetrySettings()
             .toBuilder()
             .setTotalTimeout(Duration.ofSeconds(30))
             .build());
 SpeechSettings speechSettings = speechSettingsBuilder.build();
 

SpeechSettings.Builder

Builder for SpeechSettings.

StreamingRecognitionConfig

Provides information to the recognizer that specifies how to process the request.

Protobuf type google.cloud.speech.v1.StreamingRecognitionConfig

StreamingRecognitionConfig.Builder

Provides information to the recognizer that specifies how to process the request.

Protobuf type google.cloud.speech.v1.StreamingRecognitionConfig

StreamingRecognitionResult

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

Protobuf type google.cloud.speech.v1.StreamingRecognitionResult

StreamingRecognitionResult.Builder

A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.

Protobuf type google.cloud.speech.v1.StreamingRecognitionResult

StreamingRecognizeRequest

The top-level message sent by the client for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent. The first message must contain a streaming_config message and must not contain audio_content. All subsequent messages must contain audio_content and must not contain a streaming_config message.
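The message ordering described above (one configuration message, then audio-only messages) can be sketched as follows. The Message class is a hypothetical stand-in for StreamingRecognizeRequest, the configuration is reduced to a placeholder string, and the chunk size is an arbitrary value chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the request sequence for one streaming call. */
final class StreamingRequestPlan {
  /** Simplified stand-in for a request: exactly one of the two fields is set. */
  static final class Message {
    final String streamingConfig; // non-null only in the first message
    final byte[] audioContent;    // non-null in every later message

    Message(String streamingConfig, byte[] audioContent) {
      this.streamingConfig = streamingConfig;
      this.audioContent = audioContent;
    }
  }

  // Emits the config-only message first, then the audio split into chunks.
  static List<Message> plan(String config, byte[] audio, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    List<Message> messages = new ArrayList<>();
    messages.add(new Message(config, null));
    for (int start = 0; start < audio.length; start += chunkSize) {
      int end = Math.min(start + chunkSize, audio.length);
      messages.add(new Message(null, Arrays.copyOfRange(audio, start, end)));
    }
    return messages;
  }

  public static void main(String[] args) {
    // 10 bytes in chunks of 4 gives one config message plus chunks of 4, 4, and 2 bytes.
    List<Message> messages = plan("placeholder-config", new byte[10], 4);
    System.out.println(messages.size()); // prints 4
  }
}
```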

Protobuf type google.cloud.speech.v1.StreamingRecognizeRequest

StreamingRecognizeRequest.Builder

The top-level message sent by the client for the StreamingRecognize method. Multiple StreamingRecognizeRequest messages are sent. The first message must contain a streaming_config message and must not contain audio_content. All subsequent messages must contain audio_content and must not contain a streaming_config message.

Protobuf type google.cloud.speech.v1.StreamingRecognizeRequest

StreamingRecognizeResponse

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio, and single_utterance is set to false, then no messages are streamed back to the client. Here's an example of a series of StreamingRecognizeResponses that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }
  2. results { alternatives { transcript: "to be a" } stability: 0.01 }
  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }
  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }
  5. results { alternatives { transcript: " that's" } stability: 0.01 }
  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }
  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }

Notes:

  • Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".
  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.
  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.
  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.

Protobuf type google.cloud.speech.v1.StreamingRecognizeResponse
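The note about concatenating final results can be sketched with plain Java types. Result here is a hypothetical, simplified stand-in that keeps only the top alternative's transcript and the is_final flag of each result in the sample sequence; it is not the generated StreamingRecognitionResult class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: assembles a transcript from the final results only. */
final class FinalTranscript {
  /** Simplified stand-in for one streaming result (top alternative only). */
  static final class Result {
    final String transcript;
    final boolean isFinal;

    Result(String transcript, boolean isFinal) {
      this.transcript = transcript;
      this.isFinal = isFinal;
    }
  }

  // Appends the transcripts of final results in arrival order; interim results are skipped.
  static String concatenateFinals(List<Result> results) {
    StringBuilder transcript = new StringBuilder();
    for (Result result : results) {
      if (result.isFinal) {
        transcript.append(result.transcript);
      }
    }
    return transcript.toString();
  }

  public static void main(String[] args) {
    List<Result> results = new ArrayList<>();
    results.add(new Result("tube", false));
    results.add(new Result("to be a", false));
    results.add(new Result("to be", false));
    results.add(new Result(" or not to be", false));
    results.add(new Result("to be or not to be", true));
    results.add(new Result(" that's", false));
    results.add(new Result(" that is", false));
    results.add(new Result(" the question", false));
    results.add(new Result(" that is the question", true));

    System.out.println(concatenateFinals(results)); // to be or not to be that is the question
  }
}
```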

StreamingRecognizeResponse.Builder

StreamingRecognizeResponse is the only message returned to the client by StreamingRecognize. A series of zero or more StreamingRecognizeResponse messages are streamed back to the client. If there is no recognizable audio, and single_utterance is set to false, then no messages are streamed back to the client. Here's an example of a series of StreamingRecognizeResponses that might be returned while processing audio:

  1. results { alternatives { transcript: "tube" } stability: 0.01 }
  2. results { alternatives { transcript: "to be a" } stability: 0.01 }
  3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }
  4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }
  5. results { alternatives { transcript: " that's" } stability: 0.01 }
  6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }
  7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }

Notes:

  • Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".
  • The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.
  • The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.
  • In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.

Protobuf type google.cloud.speech.v1.StreamingRecognizeResponse

TranscriptOutputConfig

Specifies an optional destination for the recognition results.

Protobuf type google.cloud.speech.v1.TranscriptOutputConfig

TranscriptOutputConfig.Builder

Specifies an optional destination for the recognition results.

Protobuf type google.cloud.speech.v1.TranscriptOutputConfig

WordInfo

Word-specific information for recognized words.

Protobuf type google.cloud.speech.v1.WordInfo

WordInfo.Builder

Word-specific information for recognized words.

Protobuf type google.cloud.speech.v1.WordInfo

Interfaces

CustomClass.ClassItemOrBuilder

CustomClassOrBuilder

LongRunningRecognizeMetadataOrBuilder

LongRunningRecognizeRequestOrBuilder

LongRunningRecognizeResponseOrBuilder

PhraseSet.PhraseOrBuilder

PhraseSetOrBuilder

RecognitionAudioOrBuilder

RecognitionConfigOrBuilder

RecognitionMetadataOrBuilder

RecognizeRequestOrBuilder

RecognizeResponseOrBuilder

SpeakerDiarizationConfigOrBuilder

SpeechAdaptationOrBuilder

SpeechContextOrBuilder

SpeechRecognitionAlternativeOrBuilder

SpeechRecognitionResultOrBuilder

StreamingRecognitionConfigOrBuilder

StreamingRecognitionResultOrBuilder

StreamingRecognizeRequestOrBuilder

StreamingRecognizeResponseOrBuilder

TranscriptOutputConfigOrBuilder

WordInfoOrBuilder

Enums

RecognitionAudio.AudioSourceCase

RecognitionConfig.AudioEncoding

The encoding of the audio data sent in the request. All encodings support only 1 channel (mono) audio, unless the audio_channel_count and enable_separate_recognition_per_channel fields are set.

For best results, the audio source should be captured and transmitted using a lossless encoding (FLAC or LINEAR16). The accuracy of the speech recognition can be reduced if lossy codecs are used to capture or transmit audio, particularly if background noise is present. Lossy codecs include MULAW, AMR, AMR_WB, OGG_OPUS, SPEEX_WITH_HEADER_BYTE, MP3, and WEBM_OPUS.

The FLAC and WAV audio file formats include a header that describes the included audio content. You can request recognition for WAV files that contain either LINEAR16 or MULAW encoded audio. If you send FLAC or WAV audio file format in your request, you do not need to specify an AudioEncoding; the audio encoding format is determined from the file header. If you specify an AudioEncoding when you send FLAC or WAV audio, the encoding configuration must match the encoding described in the audio header; otherwise the request returns a google.rpc.Code.INVALID_ARGUMENT error code.
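Because FLAC and WAV files describe themselves in a header, a client can inspect the first bytes to decide whether the encoding may be left unspecified. The sketch below is a heuristic written with plain Java types; AudioHeaderSniffer is a hypothetical name and not part of this package. It checks only the well-known "fLaC" and "RIFF"/"WAVE" markers and does not verify that a WAV payload is LINEAR16 or MULAW.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical heuristic: detects FLAC and WAV headers by their leading markers. */
final class AudioHeaderSniffer {
  // True when the ASCII marker appears in data at the given byte offset.
  static boolean startsWith(byte[] data, int offset, String ascii) {
    byte[] expected = ascii.getBytes(StandardCharsets.US_ASCII);
    if (data.length < offset + expected.length) {
      return false;
    }
    for (int i = 0; i < expected.length; i++) {
      if (data[offset + i] != expected[i]) {
        return false;
      }
    }
    return true;
  }

  // FLAC streams begin with the ASCII marker "fLaC".
  static boolean looksLikeFlac(byte[] data) {
    return startsWith(data, 0, "fLaC");
  }

  // WAV files are RIFF containers: "RIFF", a 4-byte size, then "WAVE".
  static boolean looksLikeWav(byte[] data) {
    return startsWith(data, 0, "RIFF") && startsWith(data, 8, "WAVE");
  }

  // True when the header describes the audio, so the encoding may be left unspecified.
  static boolean hasSelfDescribingHeader(byte[] data) {
    return looksLikeFlac(data) || looksLikeWav(data);
  }

  public static void main(String[] args) {
    // The four bytes after "RIFF" are a size field; "0000" is a placeholder here.
    byte[] wav = "RIFF0000WAVEfmt ".getBytes(StandardCharsets.US_ASCII);
    byte[] flac = "fLaC".getBytes(StandardCharsets.US_ASCII);
    byte[] raw = new byte[] {0, 1, 2, 3};

    System.out.println(hasSelfDescribingHeader(wav));  // true
    System.out.println(hasSelfDescribingHeader(flac)); // true
    System.out.println(hasSelfDescribingHeader(raw));  // false
  }
}
```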

Protobuf enum google.cloud.speech.v1.RecognitionConfig.AudioEncoding

RecognitionMetadata.InteractionType

Use case categories that the audio recognition request can be described by.

Protobuf enum google.cloud.speech.v1.RecognitionMetadata.InteractionType

RecognitionMetadata.MicrophoneDistance

Enumerates the types of capture settings describing an audio file.

Protobuf enum google.cloud.speech.v1.RecognitionMetadata.MicrophoneDistance

RecognitionMetadata.OriginalMediaType

The original media the speech was recorded on.

Protobuf enum google.cloud.speech.v1.RecognitionMetadata.OriginalMediaType

RecognitionMetadata.RecordingDeviceType

The type of device the speech was recorded with.

Protobuf enum google.cloud.speech.v1.RecognitionMetadata.RecordingDeviceType

StreamingRecognizeRequest.StreamingRequestCase

StreamingRecognizeResponse.SpeechEventType

Indicates the type of speech event.

Protobuf enum google.cloud.speech.v1.StreamingRecognizeResponse.SpeechEventType

TranscriptOutputConfig.OutputTypeCase