Method: projects.agent.sessions.detectIntent

Processes a natural language query and returns structured, actionable data as a result. This method is not idempotent, because it may cause contexts and session entity types to be updated, which in turn might affect results of future queries.

HTTP request

POST https://dialogflow.googleapis.com/v2/{session=projects/*/agent/sessions/*}:detectIntent

The URL uses gRPC Transcoding syntax.
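
For example, with a hypothetical project ID my-project and a caller-chosen session ID 123456789, the full request URL would look like:

POST https://dialogflow.googleapis.com/v2/projects/my-project/agent/sessions/123456789:detectIntent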

Path parameters

Parameters
session

string

Required. The name of the session this query is sent to. Format: projects/<Project ID>/agent/sessions/<Session ID>. It's up to the API caller to choose an appropriate session ID. It can be a random number or some type of user identifier (preferably hashed). The length of the session ID must not exceed 36 bytes.

Authorization requires the following Google IAM permission on the specified resource session:

  • dialogflow.sessions.detectIntent

Request body

The request body contains data with the following structure:

JSON representation
{
  "queryParams": {
    object(QueryParameters)
  },
  "queryInput": {
    object(QueryInput)
  },
  "outputAudioConfig": {
    object(OutputAudioConfig)
  },
  "inputAudio": string
}
Fields
queryParams

object(QueryParameters)

Optional. The parameters of this query.

queryInput

object(QueryInput)

Required. The input specification. It can be set to:

  1. an audio config which instructs the speech recognizer how to process the speech audio,

  2. a conversational query in the form of text, or

  3. an event that specifies which intent to trigger.

outputAudioConfig

object(OutputAudioConfig)

Optional. Instructs the speech synthesizer how to generate the output audio. If this field is not set and agent-level speech synthesizer is not configured, no output audio is generated.

inputAudio

string (bytes format)

Optional. The natural language speech audio to be processed. This field should be populated if and only if queryInput is set to an input audio config. A single request can contain up to 1 minute of speech audio data.

A base64-encoded string.
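
For illustration, a minimal request body for a text query might look like the following; the query text, language code, and time zone below are placeholder values:

{
  "queryParams": {
    "timeZone": "America/New_York"
  },
  "queryInput": {
    "text": {
      "text": "I'd like to book a room for tomorrow",
      "languageCode": "en-US"
    }
  }
}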

Response body

If successful, the response body contains data with the following structure:

The message returned from the sessions.detectIntent method.

JSON representation
{
  "responseId": string,
  "queryResult": {
    object(QueryResult)
  },
  "webhookStatus": {
    object(Status)
  },
  "outputAudio": string,
  "outputAudioConfig": {
    object(OutputAudioConfig)
  }
}
Fields
responseId

string

The unique identifier of the response. It can be used to locate a response in the training example set or for reporting issues.

queryResult

object(QueryResult)

The selected results of the conversational query or event processing. See alternativeQueryResults for additional potential results.

webhookStatus

object(Status)

Specifies the status of the webhook request.

outputAudio

string (bytes format)

The audio data bytes encoded as specified in the request. Note: The output audio is generated based on the values of default platform text responses found in the queryResult.fulfillmentMessages field. If multiple default text responses exist, they will be concatenated when generating audio. If no default platform text responses exist, the generated audio content will be empty.

A base64-encoded string.

outputAudioConfig

object(OutputAudioConfig)

The config used by the speech synthesizer to generate the output audio.

Authorization Scopes

Requires one of the following OAuth scopes:

  • https://www.googleapis.com/auth/cloud-platform
  • https://www.googleapis.com/auth/dialogflow

For more information, see the Authentication Overview.

QueryParameters

Represents the parameters of the conversational query.

JSON representation
{
  "timeZone": string,
  "geoLocation": {
    object(LatLng)
  },
  "contexts": [
    {
      object(Context)
    }
  ],
  "resetContexts": boolean,
  "sessionEntityTypes": [
    {
      object(SessionEntityType)
    }
  ],
  "payload": {
    object
  },
  "sentimentAnalysisRequestConfig": {
    object(SentimentAnalysisRequestConfig)
  }
}
Fields
timeZone

string

Optional. The time zone of this conversational query from the time zone database, e.g., America/New_York, Europe/Paris. If not provided, the time zone specified in agent settings is used.

geoLocation

object(LatLng)

Optional. The geo location of this conversational query.

contexts[]

object(Context)

Optional. The collection of contexts to be activated before this query is executed.

resetContexts

boolean

Optional. Specifies whether to delete all contexts in the current session before the new ones are activated.

sessionEntityTypes[]

object(SessionEntityType)

Optional. Additional session entity types to replace or extend developer entity types with. The entity synonyms apply to all languages and persist for the session of this query.

payload

object (Struct format)

Optional. This field can be used to pass custom data into the webhook associated with the agent. Arbitrary JSON objects are supported.

sentimentAnalysisRequestConfig

object(SentimentAnalysisRequestConfig)

Optional. Configures the type of sentiment analysis to perform. If not provided, sentiment analysis is not performed.
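
For illustration, a query that activates a context, passes custom data to the webhook, and requests sentiment analysis might use parameters like these; the project ID, context name, and payload values are placeholders:

{
  "timeZone": "Europe/Paris",
  "contexts": [
    {
      "name": "projects/my-project/agent/sessions/123456789/contexts/booking-followup",
      "lifespanCount": 2
    }
  ],
  "payload": {
    "userTier": "premium"
  },
  "sentimentAnalysisRequestConfig": {
    "analyzeQueryTextSentiment": true
  }
}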

LatLng

An object representing a latitude/longitude pair. This is expressed as a pair of doubles representing degrees latitude and degrees longitude. Unless specified otherwise, this must conform to the WGS84 standard. Values must be within normalized ranges.

JSON representation
{
  "latitude": number,
  "longitude": number
}
Fields
latitude

number

The latitude in degrees. It must be in the range [-90.0, +90.0].

longitude

number

The longitude in degrees. It must be in the range [-180.0, +180.0].
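
For example, a query sent from central Paris could attach coordinates such as the following (values illustrative):

{
  "latitude": 48.8566,
  "longitude": 2.3522
}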

SentimentAnalysisRequestConfig

Configures the types of sentiment analysis to perform.

JSON representation
{
  "analyzeQueryTextSentiment": boolean
}
Fields
analyzeQueryTextSentiment

boolean

Optional. Instructs the service to perform sentiment analysis on queryText. If not provided, sentiment analysis is not performed on queryText.

QueryInput

Represents the query input. It can contain one of the following:

  1. An audio config which instructs the speech recognizer how to process the speech audio.

  2. A conversational query in the form of text, or

  3. An event that specifies which intent to trigger.

JSON representation
{

  // Union field input can be only one of the following:
  "audioConfig": {
    object(InputAudioConfig)
  },
  "text": {
    object(TextInput)
  },
  "event": {
    object(EventInput)
  }
  // End of list of possible types for union field input.
}
Fields
Union field input. Required. The input specification. input can be only one of the following:
audioConfig

object(InputAudioConfig)

Instructs the speech recognizer how to process the speech audio.

text

object(TextInput)

The natural language text to be processed.

event

object(EventInput)

The event to be processed.
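
Because input is a union field, exactly one of the three members is set per request. For illustration, a query that sends a 16 kHz linear PCM recording for speech recognition would set only audioConfig (values illustrative):

{
  "audioConfig": {
    "audioEncoding": "AUDIO_ENCODING_LINEAR_16",
    "sampleRateHertz": 16000,
    "languageCode": "en-US"
  }
}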

InputAudioConfig

Instructs the speech recognizer how to process the audio content.

JSON representation
{
  "audioEncoding": enum(AudioEncoding),
  "sampleRateHertz": number,
  "languageCode": string,
  "phraseHints": [
    string
  ]
}
Fields
audioEncoding

enum(AudioEncoding)

Required. Audio encoding of the audio content to process.

sampleRateHertz

number

Required. Sample rate (in Hertz) of the audio content sent in the query. Refer to Cloud Speech API documentation for more details.

languageCode

string

Required. The language of the supplied audio. Dialogflow does not do translations. See Language Support for a list of the currently supported language codes. Note that queries in the same session do not necessarily need to specify the same language.

phraseHints[]

string

Optional. The collection of phrase hints which are used to boost accuracy of speech recognition. Refer to Cloud Speech API documentation for more details.
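
For illustration, a config for an 8 kHz AMR narrowband recording that boosts recognition of a few domain-specific terms might look like this; the phrase hints are placeholders:

{
  "audioEncoding": "AUDIO_ENCODING_AMR",
  "sampleRateHertz": 8000,
  "languageCode": "en-US",
  "phraseHints": [
    "deluxe suite",
    "late checkout"
  ]
}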

AudioEncoding

Audio encoding of the audio content sent in the conversational query request. Refer to the Cloud Speech API documentation for more details.

Enums
AUDIO_ENCODING_UNSPECIFIED Not specified.
AUDIO_ENCODING_LINEAR_16 Uncompressed 16-bit signed little-endian samples (Linear PCM).
AUDIO_ENCODING_FLAC FLAC (Free Lossless Audio Codec) is the recommended encoding because it is lossless (therefore recognition is not compromised) and requires only about half the bandwidth of LINEAR16. FLAC stream encoding supports 16-bit and 24-bit samples, however, not all fields in STREAMINFO are supported.
AUDIO_ENCODING_MULAW 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
AUDIO_ENCODING_AMR Adaptive Multi-Rate Narrowband codec. sampleRateHertz must be 8000.
AUDIO_ENCODING_AMR_WB Adaptive Multi-Rate Wideband codec. sampleRateHertz must be 16000.
AUDIO_ENCODING_OGG_OPUS Opus encoded audio frames in Ogg container (OggOpus). sampleRateHertz must be 16000.
AUDIO_ENCODING_SPEEX_WITH_HEADER_BYTE Although the use of lossy encodings is not recommended, if a very low bitrate encoding is required, OGG_OPUS is highly preferred over Speex encoding. The Speex encoding supported by Dialogflow API has a header byte in each block, as in MIME type audio/x-speex-with-header-byte. It is a variant of the RTP Speex encoding defined in RFC 5574. The stream is a sequence of blocks, one block per RTP packet. Each block starts with a byte containing the length of the block, in bytes, followed by one or more frames of Speex data, padded to an integral number of bytes (octets) as specified in RFC 5574. In other words, each RTP header is replaced with a single byte containing the block length. Only Speex wideband is supported. sampleRateHertz must be 16000.

TextInput

Represents the natural language text to be processed.

JSON representation
{
  "text": string,
  "languageCode": string
}
Fields
text

string

Required. The UTF-8 encoded natural language text to be processed. Text length must not exceed 256 characters.

languageCode

string

Required. The language of this conversational query. See Language Support for a list of the currently supported language codes. Note that queries in the same session do not necessarily need to specify the same language.

EventInput

Events allow for matching intents by event name instead of the natural language input. For instance, input <event: { name: "welcome_event", parameters: { name: "Sam" } }> can trigger a personalized welcome response. The parameter name may be used by the agent in the response: "Hello #welcome_event.name! What can I do for you today?".

JSON representation
{
  "name": string,
  "parameters": {
    object
  },
  "languageCode": string
}
Fields
name

string

Required. The unique identifier of the event.

parameters

object (Struct format)

Optional. The collection of parameters associated with the event.

languageCode

string

Required. The language of this query. See Language Support for a list of the currently supported language codes. Note that queries in the same session do not necessarily need to specify the same language.
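
The welcome example from the description above would be expressed as follows; the parameter value is illustrative:

{
  "name": "welcome_event",
  "parameters": {
    "name": "Sam"
  },
  "languageCode": "en-US"
}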

OutputAudioConfig

Instructs the speech synthesizer how to generate the output audio content.

JSON representation
{
  "audioEncoding": enum(OutputAudioEncoding),
  "sampleRateHertz": number,
  "synthesizeSpeechConfig": {
    object(SynthesizeSpeechConfig)
  }
}
Fields
audioEncoding

enum(OutputAudioEncoding)

Required. Audio encoding of the synthesized audio content.

sampleRateHertz

number

Optional. The synthesis sample rate (in hertz) for this audio. If not provided, then the synthesizer will use the default sample rate based on the audio encoding. If this is different from the voice's natural sample rate, then the synthesizer will honor this request by converting to the desired sample rate (which might result in worse audio quality).

synthesizeSpeechConfig

object(SynthesizeSpeechConfig)

Optional. Configuration of how speech should be synthesized.
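
For illustration, a config requesting 24 kHz Ogg Opus output could look like this; the sample rate is a placeholder, and if omitted the encoding's default rate is used:

{
  "audioEncoding": "OUTPUT_AUDIO_ENCODING_OGG_OPUS",
  "sampleRateHertz": 24000
}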

OutputAudioEncoding

Audio encoding of the output audio format in Text-To-Speech.

Enums
OUTPUT_AUDIO_ENCODING_UNSPECIFIED Not specified.
OUTPUT_AUDIO_ENCODING_LINEAR_16 Uncompressed 16-bit signed little-endian samples (Linear PCM). Audio content returned as LINEAR16 also contains a WAV header.
OUTPUT_AUDIO_ENCODING_MP3 MP3 audio.
OUTPUT_AUDIO_ENCODING_OGG_OPUS Opus encoded audio wrapped in an ogg container. The result will be a file which can be played natively on Android, and in browsers (at least Chrome and Firefox). The quality of the encoding is considerably higher than MP3 while using approximately the same bitrate.

SynthesizeSpeechConfig

Configuration of how speech should be synthesized.

JSON representation
{
  "speakingRate": number,
  "pitch": number,
  "volumeGainDb": number,
  "effectsProfileId": [
    string
  ],
  "voice": {
    object(VoiceSelectionParams)
  }
}
Fields
speakingRate

number

Optional. Speaking rate/speed, in the range [0.25, 4.0]. 1.0 is the normal native speed supported by the specific voice. 2.0 is twice as fast, and 0.5 is half as fast. If unset (0.0), defaults to the native 1.0 speed. Any other value outside the range [0.25, 4.0] will return an error.

pitch

number

Optional. Speaking pitch, in the range [-20.0, 20.0]. 20 means increase 20 semitones from the original pitch. -20 means decrease 20 semitones from the original pitch.

volumeGainDb

number

Optional. Volume gain (in dB) of the normal native volume supported by the specific voice, in the range [-96.0, 16.0]. If unset, or set to a value of 0.0 (dB), will play at normal native signal amplitude. A value of -6.0 (dB) will play at approximately half the amplitude of the normal native signal amplitude. A value of +6.0 (dB) will play at approximately twice the amplitude of the normal native signal amplitude. We strongly recommend not to exceed +10 (dB) as there's usually no effective increase in loudness for any value greater than that.

effectsProfileId[]

string

Optional. An identifier which selects 'audio effects' profiles that are applied to the synthesized text-to-speech audio (post-synthesis). Effects are applied on top of each other in the order they are given.

voice

object(VoiceSelectionParams)

Optional. The desired voice of the synthesized audio.
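
For illustration, a config for slightly faster, quieter speech from a female voice might look like this (values illustrative and within the documented ranges):

{
  "speakingRate": 1.15,
  "pitch": -2.0,
  "volumeGainDb": -3.0,
  "voice": {
    "ssmlGender": "SSML_VOICE_GENDER_FEMALE"
  }
}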

VoiceSelectionParams

Description of which voice to use for speech synthesis.

JSON representation
{
  "name": string,
  "ssmlGender": enum(SsmlVoiceGender)
}
Fields
name

string

Optional. The name of the voice. If not set, the service will choose a voice based on the other parameters such as languageCode and gender.

ssmlGender

enum(SsmlVoiceGender)

Optional. The preferred gender of the voice. If not set, the service will choose a voice based on the other parameters such as languageCode and name. Note that this is only a preference, not a requirement. If a voice of the appropriate gender is not available, the synthesizer should substitute a voice with a different gender rather than failing the request.

SsmlVoiceGender

Gender of the voice as described in SSML voice element.

Enums
SSML_VOICE_GENDER_UNSPECIFIED An unspecified gender, which means that the client doesn't care which gender the selected voice will have.
SSML_VOICE_GENDER_MALE A male voice.
SSML_VOICE_GENDER_FEMALE A female voice.
SSML_VOICE_GENDER_NEUTRAL A gender-neutral voice.

QueryResult

Represents the result of conversational query or event processing.

JSON representation
{
  "queryText": string,
  "languageCode": string,
  "speechRecognitionConfidence": number,
  "action": string,
  "parameters": {
    object
  },
  "allRequiredParamsPresent": boolean,
  "fulfillmentText": string,
  "fulfillmentMessages": [
    {
      object(Message)
    }
  ],
  "webhookSource": string,
  "webhookPayload": {
    object
  },
  "outputContexts": [
    {
      object(Context)
    }
  ],
  "intent": {
    object(Intent)
  },
  "intentDetectionConfidence": number,
  "diagnosticInfo": {
    object
  },
  "sentimentAnalysisResult": {
    object(SentimentAnalysisResult)
  }
}
Fields
queryText

string

The original conversational query text:

  • If natural language text was provided as input, queryText contains a copy of the input.
  • If natural language speech audio was provided as input, queryText contains the speech recognition result. If the speech recognizer produced multiple alternatives, a particular one is picked.
  • If an event was provided as input, queryText is not set.

languageCode

string

The language that was triggered during intent detection. See Language Support for a list of the currently supported language codes.

speechRecognitionConfidence

number

The speech recognition confidence between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. The default of 0.0 is a sentinel value indicating that confidence was not set.

This field is not guaranteed to be accurate or set. In particular this field isn't set for StreamingDetectIntent since the streaming endpoint has separate confidence estimates per portion of the audio in StreamingRecognitionResult.

action

string

The action name from the matched intent.

parameters

object (Struct format)

The collection of extracted parameters.

allRequiredParamsPresent

boolean

This field is set to:

  • false if the matched intent has required parameters and not all of the required parameter values have been collected.
  • true if all required parameter values have been collected, or if the matched intent doesn't contain any required parameters.

fulfillmentText

string

The text to be pronounced to the user or shown on the screen. Note: This is a legacy field; fulfillmentMessages should be preferred.

fulfillmentMessages[]

object(Message)

The collection of rich messages to present to the user.

webhookSource

string

If the query was fulfilled by a webhook call, this field is set to the value of the source field returned in the webhook response.

webhookPayload

object (Struct format)

If the query was fulfilled by a webhook call, this field is set to the value of the payload field returned in the webhook response.

outputContexts[]

object(Context)

The collection of output contexts. If applicable, outputContexts.parameters contains entries with name <parameter name>.original containing the original parameter values before the query.

intent

object(Intent)

The intent that matched the conversational query. Only some fields are filled in this message, including but not limited to: name, displayName, and webhookState.

intentDetectionConfidence

number

The intent detection confidence. Values range from 0.0 (completely uncertain) to 1.0 (completely certain). If there are multiple knowledgeAnswers messages, this value is set to the greatest knowledgeAnswers.match_confidence value in the list.

diagnosticInfo

object (Struct format)

The free-form diagnostic info. For example, this field could contain webhook call latency. The string keys of the Struct's fields map can change without notice.

sentimentAnalysisResult

object(SentimentAnalysisResult)

The sentiment analysis result, which depends on the sentimentAnalysisRequestConfig specified in the request.
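
For illustration, a trimmed queryResult for a matched room-booking query might look like the following; the intent name, parameter values, and confidence are placeholders:

{
  "queryText": "I'd like to book a room for tomorrow",
  "languageCode": "en-US",
  "action": "room.reservation",
  "parameters": {
    "date": "2019-07-16"
  },
  "allRequiredParamsPresent": false,
  "fulfillmentText": "What time would you like to check in?",
  "intent": {
    "name": "projects/my-project/agent/intents/29bcd7f8-f717-4261-a8fd-2d3e451b8af8",
    "displayName": "room.reservation"
  },
  "intentDetectionConfidence": 0.87
}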

SentimentAnalysisResult

The result of sentiment analysis as configured by sentimentAnalysisRequestConfig.

JSON representation
{
  "queryTextSentiment": {
    object(Sentiment)
  }
}
Fields
queryTextSentiment

object(Sentiment)

The sentiment analysis result for queryText.

Sentiment

The sentiment, such as positive/negative feeling or association, for a unit of analysis, such as the query text.

JSON representation
{
  "score": number,
  "magnitude": number
}
Fields
score

number

Sentiment score between -1.0 (negative sentiment) and 1.0 (positive sentiment).

magnitude

number

A non-negative number in the [0, +inf) range, which represents the absolute magnitude of sentiment, regardless of score (positive or negative).
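
For example, a mildly negative query such as a complaint might be scored as follows (values illustrative); score conveys the direction of the sentiment, while magnitude conveys its overall strength:

{
  "score": -0.4,
  "magnitude": 0.4
}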
