Some entities need to match patterns rather than specific terms, for example national identification numbers, IDs, or license plates. With regexp entities, you can provide regular expressions for matching.
Compound regular expressions
Each regexp entity corresponds to a single pattern, but you can provide multiple regular expressions if they all represent variations of a single pattern. During agent training, all regular expressions of a single entity are combined with the alternation operator (|) to form one compound regular expression.
For example, if you provide the following regular expressions for a phone number:
^[2-9]\d{2}-\d{3}-\d{4}$
^(1?(-?\d{3})-?)?(\d{3})(-?\d{4})$
The compound regular expression becomes:
^[2-9]\d{2}-\d{3}-\d{4}$|^(1?(-?\d{3})-?)?(\d{3})(-?\d{4})$
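The combination step can be reproduced locally to preview what the compound expression looks like. The following is a minimal sketch using Python's standard `re` module. The helper name `matches_phone` is hypothetical, and Python's regex engine may not behave identically to the one the agent uses.

```python
import re

# The two phone-number expressions from the example above.
patterns = [
    r"^[2-9]\d{2}-\d{3}-\d{4}$",
    r"^(1?(-?\d{3})-?)?(\d{3})(-?\d{4})$",
]

# Join the expressions with the alternation operator, in the order given.
compound = "|".join(patterns)
compound_re = re.compile(compound)


def matches_phone(text):
    """Return True if the text matches any expression in the compound (hypothetical helper)."""
    return compound_re.search(text) is not None


print(compound)
print(matches_phone("555-123-4567"))
print(matches_phone("1-555-123-4567"))
```

Trying a handful of expected inputs against the compound expression this way is a quick check that the variations together cover the formats you care about.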
The ordering of regular expressions matters. Each regular expression in the compound regular expression is processed in order, and searching stops once a valid match is found. For example, for an end-user expression of "Seattle":
- Sea|Seattle matches "Sea"
- Seattle|Sea matches "Seattle"
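The ordering effect described above can be observed with any regex engine that tries alternatives left to right. Here is a small sketch with Python's `re` module, which is such an engine. It illustrates the principle and is not a statement about the agent's internal matcher.

```python
import re

utterance = "Seattle"

# The first alternative that matches wins, so the shorter "Sea" is returned.
short_first = re.match(r"Sea|Seattle", utterance).group()

# Listing the longer alternative first lets it match the whole word.
long_first = re.match(r"Seattle|Sea", utterance).group()

print(short_first)  # Sea
print(long_first)   # Seattle
```

When one expression is a prefix of another, listing the longer expression first is usually the safer choice.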
Special handling for speech recognition
If your agent uses speech recognition (also known as audio input, speech-to-text, or STT), your regular expressions will need special handling when matching letters and numbers. A spoken end-user utterance is first processed by the speech recognizer before entities are matched. When an utterance contains a series of letters or numbers, the recognizer may pad each character with spaces. In addition, the recognizer may interpret digits in word form. For example, an end-user utterance of "My ID is 123" may be recognized as any of the following:
- "My ID is 123"
- "My ID is 1 2 3"
- "My ID is one two three"
To accommodate three-digit numbers, you could use the following regular expressions:
\d{3}
\d \d \d
(zero|one|two|three|four|five|six|seven|eight|nine) (zero|one|two|three|four|five|six|seven|eight|nine) (zero|one|two|three|four|five|six|seven|eight|nine)
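Writing the three variants by hand gets tedious for longer numbers. The sketch below generates them for any digit count. The function name `spoken_digit_patterns` is hypothetical, and it assumes the recognizer emits lowercase digit words separated by single spaces, as in the example above.

```python
import re

# One spoken digit in word form, as used in the example above.
DIGIT_WORDS = "(zero|one|two|three|four|five|six|seven|eight|nine)"


def spoken_digit_patterns(n):
    """Return three expressions for an n-digit number (hypothetical helper).

    Covers compact digits, space-padded digits, and digits in word form.
    """
    return [
        rf"\d{{{n}}}",                # e.g. \d{3}
        " ".join([r"\d"] * n),        # e.g. \d \d \d
        " ".join([DIGIT_WORDS] * n),  # e.g. (zero|...) (zero|...) (zero|...)
    ]


# Preview how the three variants behave once combined with alternation.
compound_re = re.compile("|".join(spoken_digit_patterns(3)))
for text in ["My ID is 123", "My ID is 1 2 3", "My ID is one two three"]:
    print(compound_re.search(text).group())
```

Mixed transcriptions such as "1 two 3" are not covered by these three variants and would need additional expressions.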
Create a regexp entity
Console
- Open the Dialogflow CX console.
- Choose your GCP project.
- Select your agent.
- Select the Manage tab.
- Click Entity Types.
- Click Create.
- Check Regexp entities.
- Complete remaining fields.
- Click Save.
API
Set the EntityType.kind field to KIND_REGEXP.
Select a protocol and version for the EntityType reference:
Protocol | V3 | V3beta1 |
---|---|---|
REST | EntityType resource | EntityType resource |
RPC | EntityType interface | EntityType interface |
C++ | EntityTypesClient | Not available |
C# | EntityTypesClient | Not available |
Go | EntityTypesClient | Not available |
Java | EntityTypesClient | EntityTypesClient |
Node.js | EntityTypesClient | EntityTypesClient |
PHP | Not available | Not available |
Python | EntityTypesClient | EntityTypesClient |
Ruby | Not available | Not available |
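As a rough illustration of the kind setting, the sketch below builds a JSON body shaped like an EntityType resource using only the Python standard library. The helper name `build_regexp_entity_type` and the display name are hypothetical. Placing each regular expression in an entry's value field is an assumption, and this section does not state whether other entry fields are also needed for regexp entities, so check the EntityType reference before sending such a body.

```python
import json


def build_regexp_entity_type(display_name, patterns):
    """Return a dict shaped like an EntityType JSON body (hypothetical helper)."""
    return {
        "displayName": display_name,
        # The kind setting described in this section.
        "kind": "KIND_REGEXP",
        # Assumption: one entry per regular expression, carried in "value".
        "entities": [{"value": pattern} for pattern in patterns],
    }


body = build_regexp_entity_type(
    "phone-number",  # hypothetical display name
    [
        r"^[2-9]\d{2}-\d{3}-\d{4}$",
        r"^(1?(-?\d{3})-?)?(\d{3})(-?\d{4})$",
    ],
)
print(json.dumps(body, indent=2))
```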
Limitations
The following limitations apply:
- Fuzzy matching cannot be enabled for regexp entities. These features are mutually exclusive.
- Each agent can have a maximum of 50 regexp entities.
- The compound regular expression for an entity has a maximum length of 2000 characters.
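These limits can be checked before you create or update an entity. The following is a minimal sketch of such a pre-flight check. The function and parameter names are hypothetical, and it assumes the alternation separators count toward the 2000-character limit, which this page does not state explicitly.

```python
MAX_REGEXP_ENTITIES_PER_AGENT = 50
MAX_COMPOUND_LENGTH = 2000


def check_regexp_entity(patterns, fuzzy_matching=False, existing_regexp_entities=0):
    """Return a list of limit violations for a planned regexp entity (hypothetical helper)."""
    problems = []

    # Assumption: the length is measured on the joined expression, separators included.
    compound = "|".join(patterns)
    if len(compound) > MAX_COMPOUND_LENGTH:
        problems.append(
            f"compound expression is {len(compound)} characters; "
            f"the maximum is {MAX_COMPOUND_LENGTH}"
        )

    if fuzzy_matching:
        problems.append("fuzzy matching cannot be enabled for regexp entities")

    if existing_regexp_entities >= MAX_REGEXP_ENTITIES_PER_AGENT:
        problems.append(
            f"agent already has {existing_regexp_entities} regexp entities; "
            f"the maximum is {MAX_REGEXP_ENTITIES_PER_AGENT}"
        )

    return problems


print(check_regexp_entity([r"\d{3}", r"\d \d \d"]))
```

An empty list means none of the three limits is exceeded under the stated assumptions.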