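Detect intent with an audio input file

This guide shows how to use the API to send audio input in a detect intent request. Dialogflow processes the audio and converts it to text before attempting to match an intent. This conversion is known as audio input, speech recognition, speech-to-text, or STT.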

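Before you begin

This feature applies only when you use the API for end-user interactions. If you are using an integration, you can skip this guide.

Complete the following before reading this guide:

  1. Read Dialogflow basics.
  2. Perform the setup steps.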

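Create an agent

If you have not already created an agent, create one now:

  1. Go to the Dialogflow ES console.
  2. If prompted, sign in to the Dialogflow console. For details, see the Dialogflow console overview.
  3. Click Create Agent in the left sidebar menu. (If you already have other agents, click the agent name, scroll to the bottom, and click Create new agent.)
  4. Enter the agent's name, default language, and default time zone.
  5. If you have already created a project, enter that project. To let the Dialogflow console create the project, select Create a new Google project.
  6. Click the Create button.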

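Import the example file to your agent

The steps in this guide make assumptions about your agent, so you need to import an agent prepared for this guide. The import steps use the restore option, which overwrites all agent settings, intents, and entities.

To import the file, follow these steps:

  1. Download the room-booking-agent.zip file.
  2. Go to the Dialogflow ES console.
  3. Select your agent.
  4. Click the settings button next to the agent name.
  5. Select the Export and Import tab.
  6. Select Restore From Zip and follow the instructions to restore the zip file that you downloaded.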

Detect intent

To detect intent, call the detectIntent method on the Sessions type.
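Every detectIntent call targets a session, which is addressed by a path of the form projects/PROJECT_ID/agent/sessions/SESSION_ID (the same form used in the REST URL and the client library samples in this guide). The following is a minimal sketch of building that path. The helper name make_session_path is hypothetical, and using a random UUID as the session ID is an assumption made for illustration.

```python
import uuid


def make_session_path(project_id, session_id=None):
    """Build a session path; generate a random session ID if none is given."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    if session_id is None:
        # Assumption: any unique string works as a session ID, so use a UUID.
        session_id = str(uuid.uuid4())
    return f"projects/{project_id}/agent/sessions/{session_id}"
```

Reusing the same session ID across requests keeps them in the same conversation, so generate a new ID only when a new conversation starts.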

REST

Download the book-a-room.wav sample input audio file, which says "book a room". The audio file must be Base64 encoded so that it can be provided in the JSON request below. Here is a Linux example:

wget https://cloud.google.com/dialogflow/es/docs/data/book-a-room.wav
base64 -w 0 book-a-room.wav > book-a-room.b64

For examples on other platforms, see "Base64 encoding audio content" in the Cloud Speech-to-Text API documentation.
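Where the base64 command is not available, the same encoding can be done with a few lines of Python. This is a sketch using only the standard library. The function names are hypothetical, and the output matches the unwrapped, single-line form that base64 -w 0 produces.

```python
import base64
from pathlib import Path


def encode_audio_bytes(data):
    """Return raw audio bytes as a single-line Base64 ASCII string."""
    return base64.b64encode(data).decode("ascii")


def encode_audio_file(src, dst):
    """Read the audio file at src and write its Base64 form to dst."""
    encoded = encode_audio_bytes(Path(src).read_bytes())
    Path(dst).write_text(encoded, encoding="ascii")
    return encoded
```

For example, encode_audio_file("book-a-room.wav", "book-a-room.b64") would write the same kind of file as the Linux commands above.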

Before using any of the request data, make the following replacements:

  • PROJECT_ID: your Google Cloud project ID
  • AUDIO: the Base64-encoded audio content

HTTP method and URL:

POST https://dialogflow.googleapis.com/v2/projects/PROJECT_ID/agent/sessions/123456789:detectIntent

Request JSON body:

{
  "queryInput": {
    "audioConfig": {
      "languageCode": "en-US"
    }
  },
  "inputAudio": "AUDIO"
}
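As an illustration of how the URL and body fit together, the following Python sketch assembles the same request from raw audio bytes and posts it with the standard library. The function names are hypothetical. The sketch also assumes that the caller already has an OAuth access token (for example, one printed by gcloud auth print-access-token) and that a bearer token is sufficient for the project.

```python
import base64
import json
import urllib.request

API_ROOT = "https://dialogflow.googleapis.com/v2"


def build_detect_intent_request(project_id, session_id, audio_bytes, language_code="en-US"):
    """Return the (url, body) pair for a detectIntent call with audio input."""
    url = f"{API_ROOT}/projects/{project_id}/agent/sessions/{session_id}:detectIntent"
    body = {
        "queryInput": {"audioConfig": {"languageCode": language_code}},
        # JSON cannot carry raw bytes, so the audio is sent as Base64 text.
        "inputAudio": base64.b64encode(audio_bytes).decode("ascii"),
    }
    return url, body


def send_detect_intent(url, body, access_token):
    """POST the request and return the decoded JSON response.

    Assumption: access_token is a valid OAuth 2.0 token for the project.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the body easy to inspect or log before any network call is made.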

You should receive a JSON response similar to the following:

{
  "responseId": "3c1e5a89-75b9-4c3f-b63d-4b1351dd5e32",
  "queryResult": {
    "queryText": "book a room",
    "action": "room.reservation",
    "parameters": {
      "time": "",
      "date": "",
      "guests": "",
      "duration": "",
      "location": ""
    },
    "fulfillmentText": "I can help with that. Where would you like to reserve a room?",
    "fulfillmentMessages": [
      {
        "text": {
          "text": [
            "I can help with that. Where would you like to reserve a room?"
          ]
        }
      }
    ],
    "intent": {
      "name": "projects/PROJECT_ID/agent/intents/e8f6a63e-73da-4a1a-8bfc-857183f71228",
      "displayName": "room.reservation"
    },
    "intentDetectionConfidence": 1,
    "diagnosticInfo": {},
    "languageCode": "en-us"
  }
}

Notice that the value of the queryResult.action field is "room.reservation", and the value of the queryResult.fulfillmentMessages[0|1].text.text[0] field asks the user for more information.
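A client usually reads these fields programmatically. The sketch below pulls the action, the matched intent, and the follow-up prompt out of a response shaped like the one above. The function name and the returned keys are hypothetical, and treating empty parameter values as "not yet provided" is an assumption based on this sample response.

```python
def summarize_query_result(response):
    """Extract the fields discussed above from a detectIntent JSON response."""
    result = response.get("queryResult", {})
    # Collect every text string from the fulfillment messages, in order.
    texts = [
        text
        for message in result.get("fulfillmentMessages", [])
        for text in message.get("text", {}).get("text", [])
    ]
    parameters = result.get("parameters", {})
    return {
        "action": result.get("action", ""),
        "intent": result.get("intent", {}).get("displayName", ""),
        "prompt": texts[0] if texts else result.get("fulfillmentText", ""),
        # Assumption: an empty value means the parameter is still unfilled.
        "unfilled_parameters": sorted(
            name for name, value in parameters.items() if value in ("", None, [], {})
        ),
    }


# Abbreviated copy of the sample response shown above.
sample_response = {
    "queryResult": {
        "queryText": "book a room",
        "action": "room.reservation",
        "parameters": {"time": "", "date": "", "guests": "", "duration": "", "location": ""},
        "fulfillmentText": "I can help with that. Where would you like to reserve a room?",
        "fulfillmentMessages": [
            {"text": {"text": ["I can help with that. Where would you like to reserve a room?"]}}
        ],
        "intent": {"displayName": "room.reservation"},
    }
}

summary = summarize_query_result(sample_response)
```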

Go

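To authenticate to Dialogflow, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".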

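import (
	"context"
	"fmt"
	"os"

	dialogflow "cloud.google.com/go/dialogflow/apiv2"
	"cloud.google.com/go/dialogflow/apiv2/dialogflowpb"
)

// DetectIntentAudio sends the audio file to Dialogflow and returns the fulfillment text.
func DetectIntentAudio(projectID, sessionID, audioFile, languageCode string) (string, error) {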
	ctx := context.Background()

	sessionClient, err := dialogflow.NewSessionsClient(ctx)
	if err != nil {
		return "", err
	}
	defer sessionClient.Close()

	if projectID == "" || sessionID == "" {
		return "", fmt.Errorf("detect.DetectIntentAudio empty project (%s) or session (%s)", projectID, sessionID)
	}

	sessionPath := fmt.Sprintf("projects/%s/agent/sessions/%s", projectID, sessionID)

	// In this example, we hard code the encoding and sample rate for simplicity.
	audioConfig := dialogflowpb.InputAudioConfig{AudioEncoding: dialogflowpb.AudioEncoding_AUDIO_ENCODING_LINEAR_16, SampleRateHertz: 16000, LanguageCode: languageCode}

	queryAudioInput := dialogflowpb.QueryInput_AudioConfig{AudioConfig: &audioConfig}

	audioBytes, err := os.ReadFile(audioFile)
	if err != nil {
		return "", err
	}

	queryInput := dialogflowpb.QueryInput{Input: &queryAudioInput}
	request := dialogflowpb.DetectIntentRequest{Session: sessionPath, QueryInput: &queryInput, InputAudio: audioBytes}

	response, err := sessionClient.DetectIntent(ctx, &request)
	if err != nil {
		return "", err
	}

	queryResult := response.GetQueryResult()
	fulfillmentText := queryResult.GetFulfillmentText()
	return fulfillmentText, nil
}

Java

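To authenticate to Dialogflow, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".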


import com.google.api.gax.rpc.ApiException;
import com.google.cloud.dialogflow.v2.AudioEncoding;
import com.google.cloud.dialogflow.v2.DetectIntentRequest;
import com.google.cloud.dialogflow.v2.DetectIntentResponse;
import com.google.cloud.dialogflow.v2.InputAudioConfig;
import com.google.cloud.dialogflow.v2.QueryInput;
import com.google.cloud.dialogflow.v2.QueryResult;
import com.google.cloud.dialogflow.v2.SessionName;
import com.google.cloud.dialogflow.v2.SessionsClient;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DetectIntentAudio {

  // Dialogflow API Detect Intent sample with audio files.
  public static QueryResult detectIntentAudio(
      String projectId, String audioFilePath, String sessionId, String languageCode)
      throws IOException, ApiException {
    // Instantiates a client
    try (SessionsClient sessionsClient = SessionsClient.create()) {
      // Set the session name using the sessionId (UUID) and projectID (my-project-id)
      SessionName session = SessionName.of(projectId, sessionId);
      System.out.println("Session Path: " + session.toString());

      // Note: hard coding audioEncoding and sampleRateHertz for simplicity.
      // Audio encoding of the audio content sent in the query request.
      AudioEncoding audioEncoding = AudioEncoding.AUDIO_ENCODING_LINEAR_16;
      int sampleRateHertz = 16000;

      // Instructs the speech recognizer how to process the audio content.
      InputAudioConfig inputAudioConfig =
          InputAudioConfig.newBuilder()
              .setAudioEncoding(
                  audioEncoding) // audioEncoding = AudioEncoding.AUDIO_ENCODING_LINEAR_16
              .setLanguageCode(languageCode) // languageCode = "en-US"
              .setSampleRateHertz(sampleRateHertz) // sampleRateHertz = 16000
              .build();

      // Build the query with the InputAudioConfig
      QueryInput queryInput = QueryInput.newBuilder().setAudioConfig(inputAudioConfig).build();

      // Read the bytes from the audio file
      byte[] inputAudio = Files.readAllBytes(Paths.get(audioFilePath));

      // Build the DetectIntentRequest
      DetectIntentRequest request =
          DetectIntentRequest.newBuilder()
              .setSession(session.toString())
              .setQueryInput(queryInput)
              .setInputAudio(ByteString.copyFrom(inputAudio))
              .build();

      // Performs the detect intent request
      DetectIntentResponse response = sessionsClient.detectIntent(request);

      // Display the query result
      QueryResult queryResult = response.getQueryResult();
      System.out.println("====================");
      System.out.format("Query Text: '%s'\n", queryResult.getQueryText());
      System.out.format(
          "Detected Intent: %s (confidence: %f)\n",
          queryResult.getIntent().getDisplayName(), queryResult.getIntentDetectionConfidence());
      System.out.format(
          "Fulfillment Text: '%s'\n",
          queryResult.getFulfillmentMessagesCount() > 0
              ? queryResult.getFulfillmentMessages(0).getText()
              : "Triggered Default Fallback Intent");

      return queryResult;
    }
  }
}

Node.js

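To authenticate to Dialogflow, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".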

const fs = require('fs');
const util = require('util');
const {struct} = require('pb-util');
// Imports the Dialogflow library
const dialogflow = require('@google-cloud/dialogflow');

// Wrapped in an async function so that `await` is valid and the inputs
// (project, session, audio file, and audio settings) are explicit parameters.
async function detectAudioIntent(
  projectId,
  sessionId,
  filename,
  encoding,
  sampleRateHertz,
  languageCode
) {
  // Instantiates a session client
  const sessionClient = new dialogflow.SessionsClient();

  // The path that identifies the session for this conversation.
  const sessionPath = sessionClient.projectAgentSessionPath(
    projectId,
    sessionId
  );

  // Read the content of the audio file and send it as part of the request.
  const readFile = util.promisify(fs.readFile);
  const inputAudio = await readFile(filename);
  const request = {
    session: sessionPath,
    queryInput: {
      audioConfig: {
        audioEncoding: encoding,
        sampleRateHertz: sampleRateHertz,
        languageCode: languageCode,
      },
    },
    inputAudio: inputAudio,
  };

  // Recognizes the speech in the audio and detects its intent.
  const [response] = await sessionClient.detectIntent(request);

  console.log('Detected intent:');
  const result = response.queryResult;

  console.log(`  Query: ${result.queryText}`);
  console.log(`  Response: ${result.fulfillmentText}`);
  if (result.intent) {
    console.log(`  Intent: ${result.intent.displayName}`);
  } else {
    console.log('  No intent matched.');
  }
  const parameters = JSON.stringify(struct.decode(result.parameters));
  console.log(`  Parameters: ${parameters}`);
  if (result.outputContexts && result.outputContexts.length) {
    // Instantiates a context client, used only to parse context names.
    const contextClient = new dialogflow.ContextsClient();

    console.log('  Output contexts:');
    result.outputContexts.forEach(context => {
      const contextId =
        contextClient.matchContextFromProjectAgentSessionContextName(
          context.name
        );
      const contextParameters = JSON.stringify(
        struct.decode(context.parameters)
      );
      console.log(`    ${contextId}`);
      console.log(`      lifespan: ${context.lifespanCount}`);
      console.log(`      parameters: ${contextParameters}`);
    });
  }
}

Python

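To authenticate to Dialogflow, set up Application Default Credentials. For more information, see "Set up authentication for a local development environment".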

def detect_intent_audio(project_id, session_id, audio_file_path, language_code):
    """Returns the result of detect intent with an audio file as input.

    Using the same `session_id` between requests allows continuation
    of the conversation."""
    from google.cloud import dialogflow

    session_client = dialogflow.SessionsClient()

    # Note: hard coding audio_encoding and sample_rate_hertz for simplicity.
    audio_encoding = dialogflow.AudioEncoding.AUDIO_ENCODING_LINEAR_16
    sample_rate_hertz = 16000

    session = session_client.session_path(project_id, session_id)
    print("Session path: {}\n".format(session))

    with open(audio_file_path, "rb") as audio_file:
        input_audio = audio_file.read()

    audio_config = dialogflow.InputAudioConfig(
        audio_encoding=audio_encoding,
        language_code=language_code,
        sample_rate_hertz=sample_rate_hertz,
    )
    query_input = dialogflow.QueryInput(audio_config=audio_config)

    request = dialogflow.DetectIntentRequest(
        session=session,
        query_input=query_input,
        input_audio=input_audio,
    )
    response = session_client.detect_intent(request=request)

    print("=" * 20)
    print("Query text: {}".format(response.query_result.query_text))
    print(
        "Detected intent: {} (confidence: {})\n".format(
            response.query_result.intent.display_name,
            response.query_result.intent_detection_confidence,
        )
    )
    print("Fulfillment text: {}\n".format(response.query_result.fulfillment_text))

    # Return the query result, as the docstring states.
    return response.query_result
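The client library samples above hard-code LINEAR16 encoding and a 16000 Hz sample rate, so they only describe the audio correctly when the file really has that format. A small check with Python's standard wave module, sketched below, can confirm this before sending a request. The function names are hypothetical, and the check assumes that LINEAR16 corresponds to 16-bit (2-byte) PCM samples in a WAV container.

```python
import wave


def read_wav_format(source):
    """Return basic format details for a PCM WAV file (path or file object)."""
    with wave.open(source, "rb") as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hertz": wav_file.getframerate(),
        }


def matches_sample_settings(source, sample_rate_hertz=16000):
    """Check a WAV file against the hard-coded settings in the samples."""
    wav_format = read_wav_format(source)
    # Assumption: LINEAR16 means 2 bytes per sample.
    return (
        wav_format["sample_width_bytes"] == 2
        and wav_format["sample_rate_hertz"] == sample_rate_hertz
    )
```

If the check fails, pass the file's actual sample rate to the request instead of the hard-coded value, or convert the audio first.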

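Additional languages

C#: Follow the C# setup instructions on the client libraries page, and then visit the Dialogflow reference documentation for .NET.

PHP: Follow the PHP setup instructions on the client libraries page, and then visit the Dialogflow reference documentation for PHP.

Ruby: Follow the Ruby setup instructions on the client libraries page, and then visit the Dialogflow reference documentation for Ruby.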