
Speech models

Dialogflow voice agents use Speech-to-Text for speech recognition, which is included in Dialogflow pricing. Dialogflow automatically selects a speech recognition model for you, but you can optionally specify the model.

Available models

All available models are listed at Speech-to-Text models. Select a model that is best suited to your domain and supports your agent language and speech features.

If a model is not explicitly specified, then Dialogflow auto-selects a model based on the audio configuration in API requests and agent settings.

The following models typically have the best performance:

telephony_short (best for telephony Dialogflow)
telephony (best for Agent Assist; also good for telephony Dialogflow when advanced timeout-based end of speech sensitivity is enabled)
phone_call (good for Agent Assist and telephony Dialogflow)
latest_short (best for non-telephony Dialogflow)
command_and_search (best for languages where other models are not available)
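As a rough illustration of the recommendations above, the list can be read as a preference table with command_and_search as the fallback. The use-case keys, the `pick_model` helper, and the `available_models` argument are all hypothetical names made up for this sketch; the set of models available for a given language has to come from the Speech-to-Text models page. The conditional recommendation for telephony (advanced timeout-based end of speech sensitivity) is left out for simplicity.

```python
# Preference order per use case, taken from the list above.
# The dictionary keys are illustrative labels, not Dialogflow identifiers.
PREFERRED_MODELS = {
    "telephony_dialogflow": ["telephony_short", "phone_call"],
    "agent_assist": ["telephony", "phone_call"],
    "non_telephony_dialogflow": ["latest_short"],
}

# Recommended where the other models are not available for the language.
FALLBACK_MODEL = "command_and_search"


def pick_model(use_case, available_models):
    """Return the first preferred model that is available, else the fallback.

    available_models is assumed to be a collection of model names supported
    for the agent language; this sketch does not look it up anywhere.
    """
    for model in PREFERRED_MODELS[use_case]:
        if model in available_models:
            return model
    return FALLBACK_MODEL


print(pick_model("telephony_dialogflow", {"telephony_short", "phone_call"}))
print(pick_model("agent_assist", {"phone_call"}))
print(pick_model("non_telephony_dialogflow", set()))
```

Leaving the model unspecified remains a valid choice, since Dialogflow then auto-selects one from the audio configuration and agent settings.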

Specify a model

You can supply the model for an agent, flow, or page with the model selection setting.

You can also supply the model when calling the Sessions.detectIntent or Sessions.streamingDetectIntent methods, or when configuring the ConversationProfile for Agent Assist.

Select a protocol and version for the Session reference:

Protocol	V3	V3beta1
REST	Session resource	Session resource
RPC	Session interface	Session interface
C++	SessionsClient	Not available
C#	SessionsClient	Not available
Go	SessionsClient	Not available
Java	SessionsClient	SessionsClient
Node.js	SessionsClient	SessionsClient
PHP	Not available	Not available
Python	SessionsClient	SessionsClient
Ruby	Not available	Not available

Specifying the model in a detect intent or conversation profile API call overrides any model selections applied to the agent, flow, or page, unless you enable the Override request-level speech model setting.
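A minimal sketch of a request-level model selection, assuming the REST form of Sessions.detectIntent: the model name goes in the audio configuration of the query input. The helper name `build_detect_intent_request`, the default language, the sample rate, and the mu-law encoding are illustrative assumptions; check the field names against the Session reference for your API version before relying on them.

```python
import base64
import json


def build_detect_intent_request(audio_bytes, model,
                                language_code="en-US",
                                sample_rate_hertz=8000):
    """Build a JSON-serializable body for a REST detectIntent call.

    Hypothetical helper for illustration only. The encoding and sample
    rate are assumed values typical of telephony audio.
    """
    return {
        "queryInput": {
            "audio": {
                "config": {
                    "audioEncoding": "AUDIO_ENCODING_MULAW",
                    "sampleRateHertz": sample_rate_hertz,
                    # Request-level speech model selection.
                    "model": model,
                },
                # Raw audio bytes are base64-encoded in JSON requests.
                "audio": base64.b64encode(audio_bytes).decode("ascii"),
            },
            "languageCode": language_code,
        }
    }


# Placeholder bytes stand in for real audio content.
body = build_detect_intent_request(b"\x00\x01", model="telephony_short")
print(json.dumps(body, indent=2))
```

The resulting body would be sent to the detectIntent endpoint for a session. With default settings, the model named here takes precedence over the agent, flow, or page selection.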
