This guide provides best practices for using the Dialogflow service. Following these guidelines will improve efficiency and accuracy, and help you get optimal response times from the service.
You should also see the general agent design guide for all agent types, and the voice agent design guide specifically for designing voice agents.
Productionization
Before running your agent in production, be sure to implement the following best practices:
Enable audit logs
Enable Data Access audit logs for Dialogflow API in your project. This can help you track design-time changes in the Dialogflow agents linked to this project.
Agent versions
You should always use agent versions for your production traffic. See Versions and environments for details.
Create agent backup
Keep an up-to-date exported agent backup. This will allow you to quickly recover if you or your team members accidentally delete the agent or the project.
Client reuse
You can improve the performance of your application by reusing *Client client library instances for the duration of your application's execution lifetime. Most importantly, you can improve the performance of detect intent API calls by reusing a SessionsClient client library instance. For more information, see the Best Practices with Client Libraries guide.
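The reuse pattern above can be sketched as a cached factory. This is a minimal illustration, not the real client library: the SessionsClient class below is a stand-in for the actual Dialogflow client, which is expensive to construct because it sets up channels and credentials.

```python
import functools

class SessionsClient:
    """Stand-in for a real Dialogflow SessionsClient (assumption: the
    real client is costly to construct, so it should be created once)."""
    instances_created = 0

    def __init__(self):
        SessionsClient.instances_created += 1

    def detect_intent(self, session, text):
        return f"response for {text!r}"

@functools.lru_cache(maxsize=None)
def get_sessions_client():
    """Create the client once and reuse it for the process lifetime."""
    return SessionsClient()

def handle_user_message(session, text):
    # The cached client is reused; it is NOT re-created per request.
    client = get_sessions_client()
    return client.detect_intent(session, text)
```

The same effect can be achieved with a module-level client instance; the cached factory simply delays construction until the first request.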
Batch updates to agent
If you are sending many individual agent update API requests over a short period of time, your requests may be throttled. These design-time API methods are not implemented to handle high update rates for a single agent.
Some data types have batch methods for this purpose:
- Instead of sending many EntityTypes create, patch, or delete requests, use the batchUpdate or batchDelete methods.
- Instead of sending many Intents create, patch, or delete requests, use the batchUpdate or batchDelete methods.
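The difference in request volume can be illustrated with a small sketch. The AgentApiStub class below is a hypothetical stand-in for the design-time API, used only to count requests; the method names mirror the patch and batchUpdate operations described above.

```python
class AgentApiStub:
    """Hypothetical stand-in for the Dialogflow design-time API,
    counting requests to show the effect of batching."""
    def __init__(self):
        self.requests = 0
        self.entity_types = {}

    def patch_entity_type(self, name, value):
        # One API request per entity type.
        self.requests += 1
        self.entity_types[name] = value

    def batch_update_entity_types(self, updates):
        # A single API request for the whole batch.
        self.requests += 1
        self.entity_types.update(updates)

updates = {f"type-{i}": f"value-{i}" for i in range(50)}

many = AgentApiStub()
for name, value in updates.items():
    many.patch_entity_type(name, value)   # 50 requests, may be throttled

batched = AgentApiStub()
batched.batch_update_entity_types(updates)  # 1 request
```

Both approaches end in the same agent state; the batched variant issues one request instead of fifty.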
API error retries
When calling API methods, you may receive error responses. Some errors should be retried, because they are often caused by transient issues. There are two types of errors:
- Cloud API errors.
- Errors sent from your webhook service.
In addition, you should implement an exponential backoff for retries. This allows your system to find an acceptable rate while the API service is under heavy load.
Cloud API errors
If you are using a Google-supplied client library, Cloud API error retries with exponential backoff are implemented for you.
If you have implemented your own client library using REST or gRPC, you must implement retries for your client. For information on the errors that you should or should not retry, see API Improvement Proposals: Automatic retry configuration.
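If you do implement retries yourself, the exponential backoff described above can be sketched as follows. This is a generic illustration, not the client libraries' built-in retry logic; which exceptions count as retriable depends on your transport, so TimeoutError below is an assumption standing in for a transient failure.

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      retriable=(TimeoutError,), sleep=time.sleep):
    """Retry fn() on transient errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Delay doubles each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads retries from many clients apart.
            sleep(delay * random.uniform(0.5, 1.0))
```

The injectable sleep parameter makes the backoff schedule easy to verify in tests without real waiting.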
Webhook errors
If your API call triggers a webhook call, your webhook may return an error. Even if you are using a Google-supplied client library, webhook errors will not be retried automatically. Your code should retry 503 Service Unavailable errors received from your webhook. See the webhook service documentation for information on the types of webhook errors and how to check for them.
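A webhook retry loop can be sketched as below. send_request is a hypothetical callable standing in for your detect intent call, returning a (status_code, body) pair; how you detect a webhook 503 in a real response is described in the webhook service documentation.

```python
import time

def detect_intent_with_retry(send_request, max_attempts=3,
                             base_delay=0.5, sleep=time.sleep):
    """Retry the call when the response indicates a webhook 503.

    send_request: hypothetical callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send_request()
        if status != 503:
            # Only 503 Service Unavailable from the webhook is retried.
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    return status, body  # still failing after max_attempts
```

Other webhook error codes are returned to the caller unchanged, since retrying them is unlikely to help.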
Load testing
It is a best practice to execute load testing for your system before you release code to production. Consider these points before implementing your load tests:
Summary | Details |
---|---|
Ramp up load. | Your load test must ramp up the load applied to the Dialogflow service. The service is not designed to handle abrupt bursts of load, which are rarely experienced with real traffic. It takes time for the service to adjust to load demands, so ramp up the request rate slowly, until your test achieves the desired load. |
API calls are charged. | You will be charged for API calls during a test, and the calls will be limited by project quota. |
Use test doubles. | You may not need to call the API during your load test. If the purpose of your load test is to determine how your system handles load, it is often best to use a test double in place of actual calls to the API. Your test double can simulate the behavior of the API under load. |
Use retries. | Your load test must perform retries with a backoff. |
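Two of the points above, ramping up load and using a test double, can be sketched together. Both names below (DialogflowDouble, ramp_up_rates) are hypothetical illustrations, not part of any Dialogflow SDK; the double's response shape only loosely mimics a detect intent result.

```python
class DialogflowDouble:
    """Test double simulating the API under load (an assumption;
    replace with your own latency and error model)."""
    def detect_intent(self, text):
        return {"queryResult": {"fulfillmentText": f"echo: {text}"}}

def ramp_up_rates(start_rps=1.0, target_rps=20.0, step_rps=1.0):
    """Yield a gradually increasing request rate instead of an
    abrupt burst, as the service needs time to adjust to load."""
    rps = start_rps
    while rps <= target_rps:
        yield rps
        rps += step_rps

# In a real load test, each step would run for a fixed window
# (for example 30 s) at the given rate; here we only build the schedule.
schedule = list(ramp_up_rates())
client = DialogflowDouble()
```

Once the system behaves well against the double, a smaller targeted test against the real API (within quota) can validate end-to-end behavior.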
Calling Dialogflow securely from an end-user device
You should never store your private keys used to access the Dialogflow API on an end-user device. This applies to storing keys on the device directly and to hard coding keys in applications. When your client application needs to call the Dialogflow API, it should send requests to a developer-owned proxy service on a secure platform. The proxy service can make the actual, authenticated Dialogflow calls.
For example, you should not create a mobile application that calls Dialogflow directly. Doing so would require you to store private keys on an end-user device. Your mobile application should instead pass requests through a secure proxy service.
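The proxy pattern can be sketched as follows. This is a structural illustration only: DialogflowProxy and call_dialogflow are hypothetical names, and a real proxy would be an authenticated HTTPS service, not an in-process object.

```python
class DialogflowProxy:
    """Hypothetical developer-owned proxy: the private key lives only
    on the server, and the end-user device never sees it."""
    def __init__(self, call_dialogflow, server_side_key):
        self._call = call_dialogflow   # authenticated call, server-side only
        self._key = server_side_key    # never shipped to the device

    def handle_client_request(self, session, text):
        # The device sends only (session, text); authentication with
        # the server-held credentials happens here, behind the proxy.
        return self._call(self._key, session, text)
```

The mobile application talks only to handle_client_request (over HTTPS in practice), so compromising the device never exposes the Dialogflow credentials.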
Performance
This section outlines performance information for various operations within Dialogflow. Understanding latency is important for designing responsive agents and setting realistic performance expectations, although these values are not part of the Dialogflow SLA.
When building monitoring and alerting tools, note that Large Language Models (LLMs) and speech processing are typically handled using streaming methods. Responses are sent to the client as soon as possible, often much earlier than the total duration of the method call. For more information, see the Best practices with large language models (LLMs).
Performance per operation
The following table provides information about the typical performance of Dialogflow operations:
Action | Notes |
---|---|
Intent detection (text) | Fast operation |
Parameter detection (text) | Fast operation |
Speech recognition (streaming) | Data is processed and responses are returned as soon as possible. The total execution time is primarily determined by the length of the input audio. Measuring latency using the total execution time is not recommended. |
Speech synthesis (streaming) | The total execution time is primarily determined by the length of the output audio. Data is processed and responses are returned as quickly as possible. |
Webhook calls | Performance is directly determined by the execution time of your code in the webhook. |
Import / Export agent | Performance depends on the size of the agent. |
Agent training | Performance depends on the number of flows, intents, and training phrases. Training large agents can take tens of minutes. |
Environment creation | Creating an environment involves training the agent, so the total time will depend on the size and complexity of the agent. |
Key Notes:
- Streaming: For streaming calls (speech recognition and synthesis), data is processed as it arrives, and responses are returned as soon as possible. This means the initial response is typically much faster than the total time of the call.
- Playbooks: An LLM prompt is constructed based on the playbook instructions, the conversation context, and the tool input. Multiple LLM prompts can be executed in a single playbook call. This is why playbook execution time varies, depending on the number of prompts issued and the complexity of the calls.
Important Latency Considerations
- No Latency Guarantees: Dialogflow SLAs do not consider latency, even under Provisioned Throughput.
- LLM Latency: Be aware that LLM processing can introduce significant latency. Factor this into your agent design and user expectations.
- Monitoring and Alerting: When setting up monitoring and alerting, account for the streamed nature of responses from LLMs and speech services. Don't assume full response time is equal to time to first response.
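The distinction between time to first response and total response time can be made concrete with a small measurement sketch. fake_streaming_synthesis below is a stand-in for any streaming speech or LLM call that yields chunks as they become ready.

```python
import time

def fake_streaming_synthesis(chunks, delay_per_chunk=0.0):
    """Stand-in for a streaming call: yields chunks as they are ready."""
    for chunk in chunks:
        time.sleep(delay_per_chunk)
        yield chunk

def measure_stream(stream, clock=time.monotonic):
    """Return (time_to_first_response, total_time) for a streaming call.

    For monitoring, time_to_first_response is usually the metric that
    reflects perceived responsiveness; total_time grows with the length
    of the audio or generated text.
    """
    start = clock()
    first = None
    for _ in stream:
        if first is None:
            first = clock() - start  # first chunk arrived
    return first, clock() - start    # total duration of the call
```

Alerting on total_time alone would flag long outputs as slow even when the first chunk arrived quickly, which is why the two metrics should be tracked separately.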