Regexp entities

Some entities need to match patterns rather than specific terms, for example national identification numbers, IDs, or license plates. With regexp entities, you can provide regular expressions for matching.

Compound regular expressions

Each regexp entity corresponds to a single pattern, but you can provide multiple regular expressions if they all represent variations of a single pattern. During agent training, all regular expressions of a single entity are combined with the alternation operator (|) to form one compound regular expression.

For example, if you provide the following regular expressions for a phone number:

  • ^[2-9]\d{2}-\d{3}-\d{4}$
  • ^(1?(-?\d{3})-?)?(\d{3})(-?\d{4})$

The compound regular expression becomes:

  • ^[2-9]\d{2}-\d{3}-\d{4}$|^(1?(-?\d{3})-?)?(\d{3})(-?\d{4})$
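The combination step can be sketched in a few lines of Python. This is only an illustration: it uses Python's `re` module, whose syntax and matching behavior may differ from the engine Dialogflow uses, and the helper names `build_compound` and `matches_phone` are hypothetical.

```python
import re

# The two phone-number patterns from the example above.
PHONE_PATTERNS = [
    r"^[2-9]\d{2}-\d{3}-\d{4}$",
    r"^(1?(-?\d{3})-?)?(\d{3})(-?\d{4})$",
]


def build_compound(patterns):
    """Join the patterns with the alternation operator, keeping their order."""
    return "|".join(patterns)


COMPOUND = build_compound(PHONE_PATTERNS)


def matches_phone(text):
    """Return True if the compound expression matches the text (Python semantics)."""
    return re.search(COMPOUND, text) is not None
```

Under these assumptions, `matches_phone("555-123-4567")` and `matches_phone("1-555-123-4567")` both return True, while `matches_phone("hello")` returns False.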

The order of the regular expressions matters. The regular expressions in the compound regular expression are processed in order, and searching stops once a valid match is found. For example, for an end-user expression of "Seattle":

  • Sea|Seattle matches "Sea"
  • Seattle|Sea matches "Seattle"
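Python's `re` module happens to resolve alternation the same way (the first alternative that matches wins), so the ordering effect can be reproduced locally. This is a sketch for illustration only, not a statement about how Dialogflow implements matching; `first_match` is a hypothetical helper.

```python
import re


def first_match(compound, text):
    """Return the text matched by the compound expression, or None."""
    m = re.search(compound, text)
    return m.group(0) if m else None


print(first_match("Sea|Seattle", "Seattle"))  # prints "Sea"
print(first_match("Seattle|Sea", "Seattle"))  # prints "Seattle"
```

Placing the longer, more specific expression first avoids a shorter expression capturing only a prefix of the intended value.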

Special handling for speech recognition

If your agent uses speech recognition (also known as audio input, speech-to-text, or STT), your regular expressions will need special handling when matching letters and numbers. A spoken end-user utterance is first processed by the speech recognizer before entities are matched. When an utterance contains a series of letters or numbers, the recognizer may pad each character with spaces. In addition, the recognizer may interpret digits in word form. For example, an end-user utterance of "My ID is 123" may be recognized as any of the following:

  • "My ID is 123"
  • "My ID is 1 2 3"
  • "My ID is one two three"

To accommodate three-digit numbers, you could use the following regular expressions:

\d{3}
\d \d \d
(zero|one|two|three|four|five|six|seven|eight|nine) (zero|one|two|three|four|five|six|seven|eight|nine) (zero|one|two|three|four|five|six|seven|eight|nine)
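To check that these three expressions cover each of the recognized forms, they can be tried against the sample utterances. The sketch below uses Python's `re` module as a stand-in for the matching engine, which is an assumption; `find_three_digits` is a hypothetical helper.

```python
import re

# One spoken digit in word form.
DIGIT_WORD = "(zero|one|two|three|four|five|six|seven|eight|nine)"

# The three expressions from above, in the same order.
PATTERNS = [
    r"\d{3}",
    r"\d \d \d",
    " ".join([DIGIT_WORD] * 3),
]

# Combined with the alternation operator, as for any regexp entity.
COMPOUND = "|".join(PATTERNS)


def find_three_digits(utterance):
    """Return the matched portion of the utterance, or None if nothing matches."""
    m = re.search(COMPOUND, utterance)
    return m.group(0) if m else None


for text in ["My ID is 123", "My ID is 1 2 3", "My ID is one two three"]:
    print(repr(find_three_digits(text)))
```

Each of the three utterances yields a match, whereas an utterance such as "My ID is 12" yields None.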

Create a regexp entity

Console

  1. Open the Dialogflow CX console.
  2. Choose your GCP project.
  3. Select your agent.
  4. Select the Manage tab.
  5. Click Entity Types.
  6. Click Create.
  7. Check Regexp entities.
  8. Complete remaining fields.
  9. Click Save.

API

Set the EntityType.kind field to KIND_REGEXP.
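As a rough sketch, a request body for creating a regexp entity type over REST might be assembled as follows. Only the `kind` value comes from this page. The `displayName` and `entities` field names follow the EntityType resource, but the shape of each entry (the pattern placed in `value` and repeated in `synonyms`) is an assumption, and `build_regexp_entity_type` is a hypothetical helper. Check the EntityType reference before relying on it.

```python
import json


def build_regexp_entity_type(display_name, patterns):
    """Build a JSON-serializable EntityType body with kind KIND_REGEXP."""
    # Assumption: one entry per regular expression, pattern stored in "value"
    # and mirrored in "synonyms".
    entries = [{"value": p, "synonyms": [p]} for p in patterns]
    return {
        "displayName": display_name,
        "kind": "KIND_REGEXP",
        "entities": entries,
    }


body = build_regexp_entity_type(
    "phone-number",
    [r"^[2-9]\d{2}-\d{3}-\d{4}$", r"^(1?(-?\d{3})-?)?(\d{3})(-?\d{4})$"],
)
print(json.dumps(body, indent=2))
```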

Select a protocol and version for the EntityType reference:

Protocol   V3                     V3beta1
REST       EntityType resource    EntityType resource
RPC        EntityType interface   EntityType interface
C++        EntityTypesClient      Not available
C#         EntityTypesClient      Not available
Go         EntityTypesClient      Not available
Java       EntityTypesClient      EntityTypesClient
Node.js    EntityTypesClient      EntityTypesClient
PHP        Not available          Not available
Python     EntityTypesClient      EntityTypesClient
Ruby       Not available          Not available

Limitations

The following limitations apply:

  • Fuzzy matching cannot be enabled for regexp entities. These features are mutually exclusive.
  • Each agent can have a maximum of 50 regexp entities.
  • The compound regular expression for an entity has a maximum length of 2000 characters.
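The two numeric limits can be checked before uploading entities. The sketch below assumes that the 2000-character limit counts the alternation operators added when the expressions are combined. `check_limits` and its input format are hypothetical.

```python
MAX_REGEXP_ENTITIES = 50     # per agent, from the limitations above
MAX_COMPOUND_LENGTH = 2000   # characters per compound regular expression


def check_limits(entities):
    """Return a list of limit violations.

    `entities` maps an entity name to its list of regular expressions.
    """
    problems = []
    if len(entities) > MAX_REGEXP_ENTITIES:
        problems.append(
            f"{len(entities)} regexp entities exceeds the limit of {MAX_REGEXP_ENTITIES}"
        )
    for name, patterns in entities.items():
        # Assumption: the limit applies to the joined expression, "|" included.
        compound = "|".join(patterns)
        if len(compound) > MAX_COMPOUND_LENGTH:
            problems.append(
                f"{name}: compound length {len(compound)} exceeds {MAX_COMPOUND_LENGTH}"
            )
    return problems
```

For example, `check_limits({"id": [r"\d{3}"]})` returns an empty list, while an entity whose combined expressions exceed 2000 characters produces one violation message.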