Detect intent with audio input stream

This page shows how to stream audio input to a detect intent request using the API. Dialogflow processes the audio and converts it to text before attempting an intent match. This conversion is known as audio input, speech recognition, speech-to-text, or STT.

Before you begin

You should do the following before reading this guide:

  1. Read Dialogflow basics.
  2. Perform setup steps.

Create an agent

The steps in this guide make assumptions about your agent, so it's best to start with a new agent. You should delete any existing agent for your project before creating a new one. To delete an existing agent:

  1. Go to the Dialogflow Console.
  2. If requested, sign in to the Dialogflow Console. See Dialogflow console overview for more information.
  3. Select the agent you wish to delete.
  4. Click the settings settings button next to the agent's name.
  5. Scroll down to the bottom of the General settings tab.
  6. Click Delete this agent.
  7. Enter DELETE in the text field.
  8. Click Delete.

To create an agent:

  1. Go to the Dialogflow Console.
  2. If requested, sign in to the Dialogflow Console. See Dialogflow console overview for more information.
  3. Click Create Agent in the left sidebar menu. (If you already have other agents, click the agent name, scroll to the bottom and click Create new agent.)
  4. Enter your agent's name, default language, and default time zone.
  5. If you have already created a project, enter that project. If you want to allow the Dialogflow Console to create the project, select Create a new Google project.
  6. Click the Create button.

Import the example file to your agent

Importing will add intents and entities to your agent. If any existing intents or entities have the same name as those in the imported file, they will be replaced.

To import the file, follow these steps:

  1. Download the file
  2. Go to the Dialogflow Console
  3. Select your agent
  4. Click the settings settings button next to the agent name
  5. Select the Export and Import tab
  6. Select Import From Zip and import the zip file that you downloaded

Streaming basics

The Session type's streamingDetectIntent method returns a bidirectional gRPC streaming object. The available methods for this object vary by language, so see the reference documentation for your client library for details.

The streaming object is used to send and receive data concurrently. Using this object, your client streams audio content to Dialogflow, while concurrently listening for a StreamingDetectIntentResponse.

The streamingDetectIntent method has a single_utterance parameter that affects speech recognition:

  • If false (default), speech recognition does not cease until the client closes the stream.
  • If true, Dialogflow will detect a single spoken utterance in input audio. When Dialogflow detects the audio's voice has stopped or paused, it ceases speech recognition and sends a StreamingDetectIntentResponse with a recognition result of END_OF_SINGLE_UTTERANCE to your client. Any audio sent to Dialogflow on the stream after receipt of END_OF_SINGLE_UTTERANCE is ignored by Dialogflow.

In bidirectional streaming, a client can half-close the stream object to signal to the server that it won't send more data. For example, in Java and Go, this method is called closeSend. It is important to half-close (but not abort) streams in the following situations:

  • Your client has finished sending data.
  • Your client is configured with single_utterance set to true, and it receives a StreamingDetectIntentResponse with a recognition result of END_OF_SINGLE_UTTERANCE.

After closing a stream, your client should start a new request with a new stream as needed.

Streaming detect intent

The following samples use the Session type's streamingDetectIntent method to stream audio.


func DetectIntentStream(projectID, sessionID, audioFile, languageCode string) (string, error) {
	ctx := context.Background()

	sessionClient, err := dialogflow.NewSessionsClient(ctx)
	if err != nil {
		return "", err
	defer sessionClient.Close()

	if projectID == "" || sessionID == "" {
		return "", errors.New(fmt.Sprintf("Received empty project (%s) or session (%s)", projectID, sessionID))

	sessionPath := fmt.Sprintf("projects/%s/agent/sessions/%s", projectID, sessionID)

	// In this example, we hard code the encoding and sample rate for simplicity.
	audioConfig := dialogflowpb.InputAudioConfig{AudioEncoding: dialogflowpb.AudioEncoding_AUDIO_ENCODING_LINEAR_16, SampleRateHertz: 16000, LanguageCode: languageCode}

	queryAudioInput := dialogflowpb.QueryInput_AudioConfig{AudioConfig: &audioConfig}

	queryInput := dialogflowpb.QueryInput{Input: &queryAudioInput}

	streamer, err := sessionClient.StreamingDetectIntent(ctx)
	if err != nil {
		return "", err

	f, err := os.Open(audioFile)
	if err != nil {
		return "", err

	defer f.Close()

	go func() {
		audioBytes := make([]byte, 1024)

		request := dialogflowpb.StreamingDetectIntentRequest{Session: sessionPath, QueryInput: &queryInput}
		err = streamer.Send(&request)
		if err != nil {

		for {
			_, err := f.Read(audioBytes)
			if err == io.EOF {
			if err != nil {

			request = dialogflowpb.StreamingDetectIntentRequest{InputAudio: audioBytes}
			err = streamer.Send(&request)
			if err != nil {

	var queryResult *dialogflowpb.QueryResult

	for {
		response, err := streamer.Recv()
		if err == io.EOF {
		if err != nil {

		recognitionResult := response.GetRecognitionResult()
		transcript := recognitionResult.GetTranscript()
		log.Printf("Recognition transcript: %s\n", transcript)

		queryResult = response.GetQueryResult()

	fulfillmentText := queryResult.GetFulfillmentText()
	return fulfillmentText, nil


// Imports the Google Cloud client library

 * DialogFlow API Detect Intent sample with audio files processes as an audio stream.
class DetectIntentStream {

  static void detectIntentStream(String projectId, String audioFilePath, String sessionId) {
    // String projectId = "YOUR_PROJECT_ID";
    // String audioFilePath = "path_to_your_audio_file";
    // Using the same `sessionId` between requests allows continuation of the conversation.
    // String sessionId = "Identifier of the DetectIntent session";

    // Instantiates a client
    try (SessionsClient sessionsClient = SessionsClient.create()) {
      // Set the session name using the sessionId (UUID) and projectID (my-project-id)
      SessionName session = SessionName.of(projectId, sessionId);

      // Instructs the speech recognizer how to process the audio content.
      // Note: hard coding audioEncoding and sampleRateHertz for simplicity.
      // Audio encoding of the audio content sent in the query request.
      InputAudioConfig inputAudioConfig = InputAudioConfig.newBuilder()
          .setLanguageCode("en-US") // languageCode = "en-US"
          .setSampleRateHertz(16000) // sampleRateHertz = 16000

      // Build the query with the InputAudioConfig
      QueryInput queryInput = QueryInput.newBuilder().setAudioConfig(inputAudioConfig).build();

      // Create the Bidirectional stream
      BidiStream<StreamingDetectIntentRequest, StreamingDetectIntentResponse> bidiStream =

      // The first request must **only** contain the audio configuration:

      try (FileInputStream audioStream = new FileInputStream(audioFilePath)) {
        // Subsequent requests must **only** contain the audio data.
        // Following messages: audio chunks. We just read the file in fixed-size chunks. In reality
        // you would split the user input by time.
        byte[] buffer = new byte[4096];
        int bytes;
        while ((bytes = != -1) {
                  .setInputAudio(ByteString.copyFrom(buffer, 0, bytes))

      // Tell the service you are done sending data

      for (StreamingDetectIntentResponse response : bidiStream) {
        QueryResult queryResult = response.getQueryResult();
        System.out.format("Intent Display Name: %s\n", queryResult.getIntent().getDisplayName());
        System.out.format("Query Text: '%s'\n", queryResult.getQueryText());
        System.out.format("Detected Intent: %s (confidence: %f)\n",
            queryResult.getIntent().getDisplayName(), queryResult.getIntentDetectionConfidence());
        System.out.format("Fulfillment Text: '%s'\n", queryResult.getFulfillmentText());

    } catch (IOException e) {


// Imports the Dialogflow library
const dialogflow = require('dialogflow');

// Instantiates a session client
const sessionClient = new dialogflow.SessionsClient();

// The path to the local file on which to perform speech recognition, e.g.
// /path/to/audio.raw const filename = '/path/to/audio.raw';

// The encoding of the audio file, e.g. 'AUDIO_ENCODING_LINEAR_16'
// const encoding = 'AUDIO_ENCODING_LINEAR_16';

// The sample rate of the audio file in hertz, e.g. 16000
// const sampleRateHertz = 16000;

// The BCP-47 language code to use, e.g. 'en-US'
// const languageCode = 'en-US';
const sessionPath = sessionClient.sessionPath(projectId, sessionId);

const initialStreamRequest = {
  session: sessionPath,
  queryInput: {
    audioConfig: {
      audioEncoding: encoding,
      sampleRateHertz: sampleRateHertz,
      languageCode: languageCode,
    singleUtterance: true,

// Create a stream for the streaming request.
const detectStream = sessionClient
  .on('error', console.error)
  .on('data', data => {
    if (data.recognitionResult) {
        `Intermediate transcript: ${data.recognitionResult.transcript}`
    } else {
      console.log(`Detected intent:`);
      logQueryResult(sessionClient, data.queryResult);

// Write the initial stream request to config for audio input.

// Stream an audio file from disk to the Conversation API, e.g.
// "./resources/audio.raw"
await pump(
  // Format the audio stream into the request format.
  new Transform({
    objectMode: true,
    transform: (obj, _, next) => {
      next(null, {inputAudio: obj});


namespace Google\Cloud\Samples\Dialogflow;

use Google\Cloud\Dialogflow\V2\SessionsClient;
use Google\Cloud\Dialogflow\V2\AudioEncoding;
use Google\Cloud\Dialogflow\V2\InputAudioConfig;
use Google\Cloud\Dialogflow\V2\QueryInput;
use Google\Cloud\Dialogflow\V2\StreamingDetectIntentRequest;

* Returns the result of detect intent with streaming audio as input.
* Using the same `session_id` between requests allows continuation
* of the conversation.
function detect_intent_stream($projectId, $path, $sessionId, $languageCode = 'en-US')
    // need to use gRPC
    if (!defined('Grpc\STATUS_OK')) {
        throw new \Exception('Install the grpc extension ' .
            '(pecl install grpc)');

    // new session
    $sessionsClient = new SessionsClient();
    $session = $sessionsClient->sessionName($projectId, $sessionId ?: uniqid());
    printf('Session path: %s' . PHP_EOL, $session);

    // hard coding audio_encoding and sample_rate_hertz for simplicity
    $audioConfig = new InputAudioConfig();

    // create query input
    $queryInput = new QueryInput();

    // first request contains the configuration
    $request = new StreamingDetectIntentRequest();
    $requests = [$request];

    // we are going to read small chunks of audio data from
    // a local audio file. in practice, these chunks should
    // come from an audio input device.
    $audioStream = fopen($path, 'rb');
    while (true) {
        $chunk = stream_get_contents($audioStream, 4096);
        if (!$chunk) {
        $request = new StreamingDetectIntentRequest();
        $requests[] = $request;

    // intermediate transcript info
    print(PHP_EOL . str_repeat("=", 20) . PHP_EOL);
    $stream = $sessionsClient->streamingDetectIntent();
    foreach ($requests as $request) {
    foreach ($stream->closeWriteAndReadAll() as $response) {
        $recognitionResult = $response->getRecognitionResult();
        if ($recognitionResult) {
            $transcript = $recognitionResult->getTranscript();
            printf('Intermediate transcript: %s' . PHP_EOL, $transcript);

    // get final response and relevant info
    if ($response) {
        print(str_repeat("=", 20) . PHP_EOL);
        $queryResult = $response->getQueryResult();
        $queryText = $queryResult->getQueryText();
        $intent = $queryResult->getIntent();
        $displayName = $intent->getDisplayName();
        $confidence = $queryResult->getIntentDetectionConfidence();
        $fulfilmentText = $queryResult->getFulfillmentText();

        // output relevant info
        printf('Query text: %s' . PHP_EOL, $queryText);
        printf('Detected intent: %s (confidence: %f)' . PHP_EOL, $displayName,
        printf('Fulfilment text: %s' . PHP_EOL, $fulfilmentText);



def detect_intent_stream(project_id, session_id, audio_file_path,
    """Returns the result of detect intent with streaming audio as input.

    Using the same `session_id` between requests allows continuation
    of the conversation."""
    import dialogflow_v2 as dialogflow
    session_client = dialogflow.SessionsClient()

    # Note: hard coding audio_encoding and sample_rate_hertz for simplicity.
    audio_encoding = dialogflow.enums.AudioEncoding.AUDIO_ENCODING_LINEAR_16
    sample_rate_hertz = 16000

    session_path = session_client.session_path(project_id, session_id)
    print('Session path: {}\n'.format(session_path))

    def request_generator(audio_config, audio_file_path):
        query_input = dialogflow.types.QueryInput(audio_config=audio_config)

        # The first request contains the configuration.
        yield dialogflow.types.StreamingDetectIntentRequest(
            session=session_path, query_input=query_input)

        # Here we are reading small chunks of audio data from a local
        # audio file.  In practice these chunks should come from
        # an audio input device.
        with open(audio_file_path, 'rb') as audio_file:
            while True:
                chunk =
                if not chunk:
                # The later requests contains audio data.
                yield dialogflow.types.StreamingDetectIntentRequest(

    audio_config = dialogflow.types.InputAudioConfig(
        audio_encoding=audio_encoding, language_code=language_code,

    requests = request_generator(audio_config, audio_file_path)
    responses = session_client.streaming_detect_intent(requests)

    print('=' * 20)
    for response in responses:
        print('Intermediate transcript: "{}".'.format(

    # Note: The result from the last response is the final transcript along
    # with the detected content.
    query_result = response.query_result

    print('=' * 20)
    print('Query text: {}'.format(query_result.query_text))
    print('Detected intent: {} (confidence: {})\n'.format(
    print('Fulfillment text: {}\n'.format(


# project_id = "Your Google Cloud project ID"
# session_id = "mysession"
# audio_file_path = "resources/book_a_room.wav"
# language_code = "en-US"

require "google/cloud/dialogflow"
require "monitor"

session_client =
session = session_client.class.session_path project_id, session_id
puts "Session path: #{session}"

audio_config = {
  audio_encoding:    :AUDIO_ENCODING_LINEAR_16,
  sample_rate_hertz: 16_000,
  language_code:     language_code
query_input = { audio_config: audio_config }
streaming_config = { session: session, query_input: query_input }

# To signal the main thread when all responses have been processed
completed = false

# Use session_client as the sentinel to signal the end of queue
request_queue = session_client

# The first request needs to be the configuration.
request_queue.push streaming_config

# Consume the queue and process responses in a separate thread do
  session_client.streaming_detect_intent(request_queue.each_item).each do |response|
    if response.recognition_result
      puts "Intermediate transcript: #{response.recognition_result.transcript}\n"
      # the last response has the actual query result
      query_result = response.query_result
      puts "Query text:        #{query_result.query_text}"
      puts "Intent detected:   #{query_result.intent.display_name}"
      puts "Intent confidence: #{query_result.intent_detection_confidence}"
      puts "Fulfillment text:  #{query_result.fulfillment_text}\n"
  completed = true

# While the main thread adds chunks of audio data to the queue
  audio_file = audio_file_path, "rb"
  loop do
    chunk = 4096
    break unless chunk
    request_queue.push input_audio: chunk
    sleep 0.5
  # pushing the sentinel session_client to end the streaming queues
  request_queue.push session_client

# Do not exit the main thread until the processing thread is completed
sleep 1 until completed

Var denne siden nyttig? Si fra hva du synes:

Send tilbakemelding om ...

Dialogflow Documentation
Trenger du hjelp? Gå til brukerstøttesiden vår.