Detecta intents en un archivo de audio

En esta página, se muestra cómo enviar la entrada de audio a una solicitud de intent de detección mediante la API. Dialogflow procesa el audio y lo convierte en texto antes de buscar una coincidencia del intent. Esta conversión se conoce como reconocimiento de voz, voz a texto o STT.

Configura el proyecto de GCP y la autenticación

Crea un agente

Importa el archivo de ejemplo al agente

La importación agregará intents y entidades al agente. Si hay intents o entidades con el mismo nombre que los del archivo importado, se reemplazarán los primeros por los segundos.

A fin de importar el archivo, sigue estos pasos:

  1. Descarga el archivo RoomReservation.zip.
  2. Ve a la Consola de Dialogflow.
  3. Selecciona el agente.
  4. Haz clic en el botón configuración (settings) junto al nombre del agente.
  5. Selecciona la pestaña Importar y exportar (Export and Import).
  6. Selecciona Importar desde el archivo ZIP y, luego, importa el archivo ZIP que descargaste.

Detecta intents

LÍNEA DE REST Y CMD

Para detectar el intent, llama el método detectIntent en el recurso session.

Descarga el archivo de entrada de audio de muestra book_a_room.wav, que dice "reservar una habitación". El archivo de audio debe estar codificado en base64 para este ejemplo, por lo que se puede proporcionar en la solicitud JSON a continuación. El siguiente es un ejemplo de Linux:

wget https://raw.githubusercontent.com/GoogleCloudPlatform/java-docs-samples/master/dialogflow/cloud-client/resources/book_a_room.wav
base64 -w 0 book_a_room.wav > book_a_room.b64

Para ver ejemplos en otras plataformas, consulta Contenido de audio codificado en Base64 en la documentación de la API de Cloud Speech-to-Text.

Antes de usar cualquiera de los datos de solicitud a continuación, realiza los siguientes reemplazos:

  • project-id: tu ID del proyecto de GCP
  • audio: contenido de audio codificado en Base64

Método HTTP y URL:

POST https://dialogflow.googleapis.com/v2/projects/project-id/agent/sessions/123456789:detectIntent

Cuerpo JSON de la solicitud:

{
  "queryInput": {
    "audioConfig": {
      "languageCode": "en-US"
    }
  },
  "inputAudio": "audio"
}

Para enviar tu solicitud, expande una de estas opciones:

Deberías recibir una respuesta JSON similar a la que se muestra a continuación:

{
  "responseId": "3c1e5a89-75b9-4c3f-b63d-4b1351dd5e32",
  "queryResult": {
    "queryText": "book a room",
    "action": "room.reservation",
    "parameters": {
      "time": "",
      "date": "",
      "guests": "",
      "duration": "",
      "location": ""
    },
    "fulfillmentText": "I can help with that. Where would you like to reserve a room?",
    "fulfillmentMessages": [
      {
        "text": {
          "text": [
            "I can help with that. Where would you like to reserve a room?"
          ]
        },
        "platform": "FACEBOOK"
      },
      {
        "text": {
          "text": [
            "I can help with that. Where would you like to reserve a room?"
          ]
        }
      }
    ],
    "outputContexts": [
      {
        "name": "projects/project-id/agent/sessions/123456789/contexts/e8f6a63e-73da-4a1a-8bfc-857183f71228_id_dialog_context",
        "lifespanCount": 2,
        "parameters": {
          "date": "",
          "guests": "",
          "duration": "",
          "location.original": "",
          "guests.original": "",
          "location": "",
          "date.original": "",
          "time.original": "",
          "time": "",
          "duration.original": ""
        }
      },
      {
        "name": "projects/project-id/agent/sessions/123456789/contexts/room_reservation_dialog_params_location",
        "lifespanCount": 1,
        "parameters": {
          "date.original": "",
          "time.original": "",
          "time": "",
          "duration.original": "",
          "date": "",
          "guests": "",
          "duration": "",
          "location.original": "",
          "guests.original": "",
          "location": ""
        }
      },
      {
        "name": "projects/project-id/agent/sessions/123456789/contexts/room_reservation_dialog_context",
        "lifespanCount": 2,
        "parameters": {
          "time.original": "",
          "time": "",
          "duration.original": "",
          "date": "",
          "guests": "",
          "duration": "",
          "location.original": "",
          "guests.original": "",
          "location": "",
          "date.original": ""
        }
      }
    ],
    "intent": {
      "name": "projects/project-id/agent/intents/e8f6a63e-73da-4a1a-8bfc-857183f71228",
      "displayName": "room.reservation"
    },
    "intentDetectionConfidence": 1,
    "diagnosticInfo": {},
    "languageCode": "en-us"
  }
}

Ten en cuenta que el valor del campo queryResult.action es "room.reservation" y el valor del campo queryResult.fulfillmentMessages[0|1].text.text[0] solicita más información al usuario.

Go

Esta muestra utiliza SessionsClient para detectar el intent.

func DetectIntentAudio(projectID, sessionID, audioFile, languageCode string) (string, error) {
	ctx := context.Background()

	sessionClient, err := dialogflow.NewSessionsClient(ctx)
	if err != nil {
		return "", err
	}
	defer sessionClient.Close()

	if projectID == "" || sessionID == "" {
		return "", errors.New(fmt.Sprintf("Received empty project (%s) or session (%s)", projectID, sessionID))
	}

	sessionPath := fmt.Sprintf("projects/%s/agent/sessions/%s", projectID, sessionID)

	// In this example, we hard code the encoding and sample rate for simplicity.
	audioConfig := dialogflowpb.InputAudioConfig{AudioEncoding: dialogflowpb.AudioEncoding_AUDIO_ENCODING_LINEAR_16, SampleRateHertz: 16000, LanguageCode: languageCode}

	queryAudioInput := dialogflowpb.QueryInput_AudioConfig{AudioConfig: &audioConfig}

	audioBytes, err := ioutil.ReadFile(audioFile)
	if err != nil {
		return "", err
	}

	queryInput := dialogflowpb.QueryInput{Input: &queryAudioInput}
	request := dialogflowpb.DetectIntentRequest{Session: sessionPath, QueryInput: &queryInput, InputAudio: audioBytes}

	response, err := sessionClient.DetectIntent(ctx, &request)
	if err != nil {
		return "", err
	}

	queryResult := response.GetQueryResult()
	fulfillmentText := queryResult.GetFulfillmentText()
	return fulfillmentText, nil
}

Java

Esta muestra utiliza SessionsClient para detectar el intent.

/**
 * Returns the result of detect intent with an audio file as input.
 *
 * Using the same `session_id` between requests allows continuation of the conversation.
 *
 * @param projectId     Project/Agent Id.
 * @param audioFilePath Path to the audio file.
 * @param sessionId     Identifier of the DetectIntent session.
 * @param languageCode  Language code of the query.
 * @return QueryResult for the request.
 */
public static QueryResult detectIntentAudio(
    String projectId,
    String audioFilePath,
    String sessionId,
    String languageCode)
    throws Exception {
  // Instantiates a client
  try (SessionsClient sessionsClient = SessionsClient.create()) {
    // Set the session name using the sessionId (UUID) and projectID (my-project-id)
    SessionName session = SessionName.of(projectId, sessionId);
    System.out.println("Session Path: " + session.toString());

    // Note: hard coding audioEncoding and sampleRateHertz for simplicity.
    // Audio encoding of the audio content sent in the query request.
    AudioEncoding audioEncoding = AudioEncoding.AUDIO_ENCODING_LINEAR_16;
    int sampleRateHertz = 16000;

    // Instructs the speech recognizer how to process the audio content.
    InputAudioConfig inputAudioConfig = InputAudioConfig.newBuilder()
        .setAudioEncoding(audioEncoding) // audioEncoding = AudioEncoding.AUDIO_ENCODING_LINEAR_16
        .setLanguageCode(languageCode) // languageCode = "en-US"
        .setSampleRateHertz(sampleRateHertz) // sampleRateHertz = 16000
        .build();

    // Build the query with the InputAudioConfig
    QueryInput queryInput = QueryInput.newBuilder().setAudioConfig(inputAudioConfig).build();

    // Read the bytes from the audio file
    byte[] inputAudio = Files.readAllBytes(Paths.get(audioFilePath));

    // Build the DetectIntentRequest
    DetectIntentRequest request = DetectIntentRequest.newBuilder()
        .setSession(session.toString())
        .setQueryInput(queryInput)
        .setInputAudio(ByteString.copyFrom(inputAudio))
        .build();

    // Performs the detect intent request
    DetectIntentResponse response = sessionsClient.detectIntent(request);

    // Display the query result
    QueryResult queryResult = response.getQueryResult();
    System.out.println("====================");
    System.out.format("Query Text: '%s'\n", queryResult.getQueryText());
    System.out.format("Detected Intent: %s (confidence: %f)\n",
        queryResult.getIntent().getDisplayName(), queryResult.getIntentDetectionConfidence());
    System.out.format("Fulfillment Text: '%s'\n", queryResult.getFulfillmentText());

    return queryResult;
  }
}

Node.js

Esta muestra utiliza SessionsClient para detectar el intent.

// Imports the Dialogflow library
const dialogflow = require('dialogflow');

// Instantiates a session client
const sessionClient = new dialogflow.SessionsClient();

// The path to identify the agent that owns the created intent.
const sessionPath = sessionClient.sessionPath(projectId, sessionId);

// Read the content of the audio file and send it as part of the request.
const readFile = util.promisify(fs.readFile);
const inputAudio = await readFile(filename);
const request = {
  session: sessionPath,
  queryInput: {
    audioConfig: {
      audioEncoding: encoding,
      sampleRateHertz: sampleRateHertz,
      languageCode: languageCode,
    },
  },
  inputAudio: inputAudio,
};

// Recognizes the speech in the audio and detects its intent.
const [response] = await sessionClient.detectIntent(request);

console.log('Detected intent:');
logQueryResult(sessionClient, response.queryResult);

PHP

Esta muestra utiliza SessionsClient para detectar el intent.

namespace Google\Cloud\Samples\Dialogflow;

use Google\Cloud\Dialogflow\V2\SessionsClient;
use Google\Cloud\Dialogflow\V2\AudioEncoding;
use Google\Cloud\Dialogflow\V2\InputAudioConfig;
use Google\Cloud\Dialogflow\V2\QueryInput;

/**
* Returns the result of detect intent with an audio file as input.
* Using the same `session_id` between requests allows continuation
* of the conversation.
*/
function detect_intent_audio($projectId, $path, $sessionId, $languageCode = 'en-US')
{
    // new session
    $sessionsClient = new SessionsClient();
    $session = $sessionsClient->sessionName($projectId, $sessionId ?: uniqid());
    printf('Session path: %s' . PHP_EOL, $session);

    // load audio file
    $inputAudio = file_get_contents($path);

    // hard coding audio_encoding and sample_rate_hertz for simplicity
    $audioConfig = new InputAudioConfig();
    $audioConfig->setAudioEncoding(AudioEncoding::AUDIO_ENCODING_LINEAR_16);
    $audioConfig->setLanguageCode($languageCode);
    $audioConfig->setSampleRateHertz(16000);

    // create query input
    $queryInput = new QueryInput();
    $queryInput->setAudioConfig($audioConfig);

    // get response and relevant info
    $response = $sessionsClient->detectIntent($session, $queryInput, ['inputAudio' => $inputAudio]);
    $queryResult = $response->getQueryResult();
    $queryText = $queryResult->getQueryText();
    $intent = $queryResult->getIntent();
    $displayName = $intent->getDisplayName();
    $confidence = $queryResult->getIntentDetectionConfidence();
    $fulfilmentText = $queryResult->getFulfillmentText();

    // output relevant info
    print(str_repeat("=", 20) . PHP_EOL);
    printf('Query text: %s' . PHP_EOL, $queryText);
    printf('Detected intent: %s (confidence: %f)' . PHP_EOL, $displayName,
        $confidence);
    print(PHP_EOL);
    printf('Fulfilment text: %s' . PHP_EOL, $fulfilmentText);

    $sessionsClient->close();
}

Python

Esta muestra utiliza SessionsClient para detectar el intent.

def detect_intent_audio(project_id, session_id, audio_file_path,
                        language_code):
    """Returns the result of detect intent with an audio file as input.

    Using the same `session_id` between requests allows continuation
    of the conversation."""
    import dialogflow_v2 as dialogflow

    session_client = dialogflow.SessionsClient()

    # Note: hard coding audio_encoding and sample_rate_hertz for simplicity.
    audio_encoding = dialogflow.enums.AudioEncoding.AUDIO_ENCODING_LINEAR_16
    sample_rate_hertz = 16000

    session = session_client.session_path(project_id, session_id)
    print('Session path: {}\n'.format(session))

    with open(audio_file_path, 'rb') as audio_file:
        input_audio = audio_file.read()

    audio_config = dialogflow.types.InputAudioConfig(
        audio_encoding=audio_encoding, language_code=language_code,
        sample_rate_hertz=sample_rate_hertz)
    query_input = dialogflow.types.QueryInput(audio_config=audio_config)

    response = session_client.detect_intent(
        session=session, query_input=query_input,
        input_audio=input_audio)

    print('=' * 20)
    print('Query text: {}'.format(response.query_result.query_text))
    print('Detected intent: {} (confidence: {})\n'.format(
        response.query_result.intent.display_name,
        response.query_result.intent_detection_confidence))
    print('Fulfillment text: {}\n'.format(
        response.query_result.fulfillment_text))

Ruby

Esta muestra utiliza SessionsClient para detectar el intent.

# project_id = "Your Google Cloud project ID"
# session_id = "mysession"
# audio_file_path = "resources/book_a_room.wav"
# language_code = "en-US"

require "google/cloud/dialogflow"

session_client = Google::Cloud::Dialogflow::Sessions.new
session = session_client.class.session_path project_id, session_id
puts "Session path: #{session}"

begin
  audio_file = File.open audio_file_path, "rb"
  input_audio = audio_file.read
ensure
  audio_file.close
end

audio_config = {
  audio_encoding:    :AUDIO_ENCODING_LINEAR_16,
  sample_rate_hertz: 16_000,
  language_code:     language_code
}

query_input = { audio_config: audio_config }

response = session_client.detect_intent session, query_input, input_audio: input_audio
query_result = response.query_result

puts "Query text:        #{query_result.query_text}"
puts "Intent detected:   #{query_result.intent.display_name}"
puts "Intent confidence: #{query_result.intent_detection_confidence}"
puts "Fulfillment text:  #{query_result.fulfillment_text}"

¿Te sirvió esta página? Envíanos tu opinión:

Enviar comentarios sobre…

Documentación de Dialogflow
¿Necesitas ayuda? Visita nuestra página de asistencia.