Auto speech adaptation

The auto speech adaptation feature improves the speech recognition accuracy of your agent by automatically using conversation state to pass relevant entities and training phrases as speech context hints for all detect intent requests. This feature is disabled by default.

Enable or disable auto speech adaptation

To enable or disable auto speech adaptation:

Console

  1. Open the Dialogflow CX Console.
  2. Choose your GCP project.
  3. Select your agent.
  4. Click Agent Settings.
  5. Click the Speech and IVR tab.
  6. Toggle Enable speech adaptation on or off.
  7. Click Save.

API

See the get and patch/update methods for the Agent type.

Select a protocol and version for the Agent reference:

Protocol  V3               V3beta1
REST      Agent resource   Agent resource
RPC       Agent interface  Agent interface
C#        Not available    Not available
Go        Not available    Not available
Java      AgentsClient     AgentsClient
Node.js   AgentsClient     AgentsClient
PHP       Not available    Not available
Python    AgentsClient     AgentsClient
Ruby      Not available    Not available
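
As a rough illustration of the patch/update call, the sketch below builds (but does not send) a REST PATCH request for an agent. The field path speechToTextSettings.enableSpeechAdaptation, the regional host naming, and the helper name build_patch_request are assumptions made for this example and should be checked against the Agent reference before use.

```python
import json
from urllib.parse import urlencode


def build_patch_request(project: str, location: str, agent_id: str, enabled: bool) -> dict:
    """Build (without sending) a PATCH request that toggles speech adaptation."""
    name = f"projects/{project}/locations/{location}/agents/{agent_id}"

    # Assumption: regional agents use a "<location>-dialogflow.googleapis.com" host.
    if location == "global":
        host = "dialogflow.googleapis.com"
    else:
        host = f"{location}-dialogflow.googleapis.com"

    # Assumption: the setting lives at speechToTextSettings.enableSpeechAdaptation.
    field_path = "speechToTextSettings.enableSpeechAdaptation"
    query = urlencode({"updateMask": field_path})
    body = {"speechToTextSettings": {"enableSpeechAdaptation": enabled}}

    return {
        "method": "PATCH",
        "url": f"https://{host}/v3/{name}?{query}",
        "body": json.dumps(body),
    }


if __name__ == "__main__":
    request = build_patch_request("my-project", "global", "my-agent", True)
    print(request["method"], request["url"])
    print(request["body"])
```

Sending the request also requires an OAuth access token in an Authorization header, which is omitted here. The update mask limits the change to the one field, so other agent settings are left untouched.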

Example speech recognition improvements

With auto speech adaptation enabled, you can design your agent to take advantage of it. The following sections explain how changes to your agent's training phrases and entities can improve speech recognition.

Training Phrases

  • If you define training phrases that contain a phrase like "stuffy nose", a similar-sounding end-user utterance is reliably recognized as "stuffy nose" and not "stuff he knows".
  • When you have a required parameter that forces Dialogflow into form-filling prompts, auto speech adaptation will strongly bias towards the entity being filled.

In all cases, auto speech adaptation only biases speech recognition; it does not limit it. For example, even if Dialogflow is prompting a user for a required parameter, users can still trigger other intents, such as a top-level "talk to an agent" intent.

System Entities

If you define a training phrase that uses the @sys.number system entity, and the end user says "I want two", it may be recognized as "to", "too", "2", or "two".

With auto speech adaptation enabled, Dialogflow uses the @sys.number entity as a hint during speech recognition, and the parameter is more likely to be extracted as "2".

Custom Entities

  • If you define entities for product or service names offered by your company, and the end-user mentions these terms in an utterance, they are more likely to be recognized. A training phrase "I love Dialogflow", where "Dialogflow" is annotated as the @product entity, will tell auto speech adaptation to bias for "I love Dialogflow", "I love Cloud Speech", and all other entries in the @product entity.

  • It is especially important to define clean entity synonyms when using Dialogflow to detect speech. Imagine you have two @product entity entries, "Dialogflow" and "Dataflow". Your synonyms for "Dialogflow" might be "Dialogflow", "dialogue flow", "dialogue builder", "Speaktoit", "speak to it", "API.ai", "API dot AI". These are good synonyms because they cover the most common variations. You don't need to add "the dialogue flow builder" because "dialogue flow" already covers that.

  • User utterances with consecutive but distinct number entities can be ambiguous. For example, "I want two sixteen packs" might mean a quantity of 2 sixteen-packs, or a quantity of 216 packs. Speech adaptation can help disambiguate these cases if you set up entities with spelled-out values:
    • Define a quantity entity with entries:
      zero
      one
      ...
      twenty
    • Define a product or size entity with entries:
      sixteen pack
      two ounce
      ...
      five liter
    • Only entity synonyms are used in speech adaptation, so you can define an entity with the reference value "1" and the single synonym "one" to simplify your fulfillment logic.
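
The quantity entity described above can be sketched as a small Python structure. The dictionary mirrors the general shape of an entity type definition (a display name, a kind, and entries with a value and synonyms). The field names, the KIND_MAP value, and the helper name build_quantity_entity_type are assumptions for illustration, not a confirmed export format.

```python
import json

# Spelled-out synonyms for the quantities zero through twenty.
NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]


def build_quantity_entity_type() -> dict:
    """Pair each numeric reference value with its spelled-out synonym."""
    return {
        "displayName": "quantity",  # hypothetical entity name
        "kind": "KIND_MAP",         # assumed kind for value-to-synonym mappings
        "entities": [
            {"value": str(number), "synonyms": [word]}
            for number, word in enumerate(NUMBER_WORDS)
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_quantity_entity_type(), indent=2))
```

Because the reference value is the digit string, fulfillment code receives "2" even though speech adaptation is biased toward the spoken form "two".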

Regexp Entities

Regexp entities can trigger auto speech adaptation for alphanumeric and digit sequences like "ABC123" or "12345" when configured properly. While any regular expression can be used to extract entities in the NLP, only certain expressions will tell auto speech adaptation to bias for spelled-out alphanumeric or digit sequences when recognizing speech.

Make sure you meet all of the following requirements if you want to recognize these sequences over voice:

  1. At least one of your regexp entity entries follows all of these rules:

    • Can use character sets []
    • Can use repetition quantifiers like *, ?, +, {3,5}
    • Does not contain whitespace or \s, though \s* and \s? are allowed
    • Does not contain capture groups ()
    • Does not try to match any special characters or punctuation like: ` ~ ! @ # $ % ^ & * ( ) - _ = + , . < > / ? ; ' : " [ ] { } \ |
  2. In your intent, mark the regexp entity as a required parameter, so it can be collected during slot-filling. This allows auto speech adaptation to strongly bias for sequence recognition instead of trying to recognize an intent and sequence at the same time. Otherwise, "Where is my package for ABC123" might be misrecognized as "Where is my package 4ABC123".

For example, a regexp entity with a single entry ([a-zA-Z0-9]\s?){5,9} will not trigger the speech sequence recognizer because it contains a capture group. To fix this, add another entry for [a-zA-Z0-9]{5,9}. Now you will benefit from the sequence recognizer when form-filling over voice for "ABC123", yet the NLP will still match inputs like "ABC 123" thanks to the original rule that allows spaces.
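
As a rough aid, two of the rules above (no capture groups, and no whitespace other than \s* or \s?) can be checked mechanically before an entry is added. The sketch below is a heuristic and not Dialogflow's own validation. The function name violated_rules is hypothetical, any parenthesis is treated as a capture group, and the special-character rule is not checked because telling regex syntax apart from literal punctuation would need a real parser.

```python
import re
from typing import List

# Matches "\s" that is NOT immediately followed by "*" or "?".
_BARE_WHITESPACE_CLASS = re.compile(r"\\s(?![*?])")


def violated_rules(entry: str) -> List[str]:
    """Return the rules that a regexp entity entry appears to break."""
    problems = []

    # Simplification: any parenthesis counts as a capture group.
    if "(" in entry or ")" in entry:
        problems.append("capture group")

    # Literal whitespace, or "\s" without a "*" or "?" quantifier.
    has_literal_space = any(ch.isspace() for ch in entry)
    if has_literal_space or _BARE_WHITESPACE_CLASS.search(entry):
        problems.append("disallowed whitespace")

    return problems


if __name__ == "__main__":
    for entry in [r"([a-zA-Z0-9]\s?){5,9}", r"[a-zA-Z0-9]{5,9}"]:
        print(entry, "->", violated_rules(entry) or "ok")
```

An entity only needs one entry that passes these checks, so a failing entry can stay in place for text matching as long as a passing entry is added alongside it.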

The following examples of regular expressions adapt for alphanumeric sequences:

^[A-Za-z0-9]{1,10}$
WAC\d+
215[2-8]{3}[A-Z]+
[a-zA-Z]\s?[a-zA-Z]\s?[0-9]\s?[0-9]\s?[0-9]\s?[a-zA-Z]\s?[a-zA-Z]

The following examples of regular expressions adapt for digit sequences:

\d{2,8}
^[0-9]+$
2[0-9]{7}
[2-9]\d{2}[0-8]{3}\d{4}

Also consider using @sys.number-sequence for accepting any digit sequence, and @sys.phone-number for a localized phone number recognizer. System entities and non-regexp custom entities work well with auto speech adaptation even outside of required form-filling prompts.

Limitations

The following limitations apply:

  1. Recognizing long character sequences is challenging. On phone channels with 8 kHz audio, for example, you may not consistently recognize sequences longer than 16 digits or 10 alphanumeric characters. Consider more conversational alternatives, for example:

    • When validating the sequence against a database, consider cross-referencing other collected parameters like dates, names, or phone numbers to allow for incomplete matches. For example, instead of just asking a user for their order number, also ask for their phone number. Now, when your webhook queries your database for order status, it can rely first on the phone number, then return the closest matching order for that account. This could allow Dialogflow to mishear "ABC" as "AVC", yet still return the correct order status for the user.
    • For extra long sequences, consider designing a flow that encourages end-users to pause in the middle so that the bot can confirm each part of the sequence along the way.
  2. Auto speech adaptation's built-in support for system and regexp entities varies by language. See Speech class tokens for the languages that support $OOV_CLASS_ALPHANUMERIC_SEQUENCE and $OOV_CLASS_DIGIT_SEQUENCE. If your language is not listed, you can work around this limitation. For example, if you want an employee ID that is three letters followed by three digits to be accurately recognized, you could build your agent with the following entities and parameters:

    • Define a digit entity that contains 10 entity entries (with synonyms):
      0, 0
      1, 1
      ...
      9, 9
    • Define a letter entity that contains 26 entity entries (with synonyms):
      A, A
      B, B
      ...
      Z, Z
    • Define an employee-id entity that contains a single entity entry (without synonyms):
      @letter @letter @letter @digit @digit @digit
    • Use @employee-id as a parameter in a training phrase.
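
Returning to the first limitation above, the cross-referencing idea can be sketched with Python's standard difflib module: the webhook looks up the orders for the caller's phone number, then picks the one closest to what was heard. The in-memory ORDERS_BY_PHONE table, the find_order helper, and the 0.6 similarity cutoff are hypothetical stand-ins for a real database query and a tuned threshold.

```python
from difflib import get_close_matches
from typing import Dict, List, Optional

# Hypothetical stand-in for a database: phone number -> orders on that account.
ORDERS_BY_PHONE: Dict[str, List[str]] = {
    "+15550100": ["ABC123", "XYZ789"],
    "+15550101": ["QRS456"],
}


def find_order(phone: str, heard_order: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the account's order that best matches the heard sequence, if any."""
    # Rely on the phone number first to narrow the candidates.
    candidates = ORDERS_BY_PHONE.get(phone, [])
    # Then take the single closest candidate above the similarity cutoff.
    matches = get_close_matches(heard_order.upper(), candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # "ABC123" misheard as "AVC123" still resolves for the right account.
    print(find_order("+15550100", "AVC123"))
    # The same misheard sequence does not match a different account.
    print(find_order("+15550101", "AVC123"))
```

Restricting candidates to a single account keeps the fuzzy match from returning another customer's order, which is why the phone number is consulted before the order number.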