This guide provides best practices for using the Dialogflow service. Following these guidelines will improve efficiency and accuracy, and help you get optimal response times from the service.
You should also see the general agent design guide for all agent types, and the voice agent design guide specifically for designing voice agents.
Productionization
Before running your agent in production, be sure to implement the following best practices:
Enable audit logs
Enable Data Access audit logs for Dialogflow API in your project. This can help you track design-time changes in the Dialogflow agents linked to this project.
Agent versions
You should always use agent versions for your production traffic. See Versions and environments for details.
Create agent backup
Keep an up-to-date exported agent backup. This will allow you to quickly recover if you or your team members accidentally delete the agent or the project.
Client reuse
You can improve the performance of your application by reusing *Client client library instances for the duration of your application's execution lifetime. Most importantly, you can improve the performance of detect intent API calls by reusing a SessionsClient client library instance. For more information, see the Best Practices with Client Libraries guide.
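The reuse pattern above can be sketched as a cached factory. This is a minimal illustration, not the real client library: the SessionsClient class below is a stand-in for the actual Dialogflow client, which is expensive to construct because it sets up channels and credentials.

```python
import functools

class SessionsClient:
    """Stand-in for a real Dialogflow SessionsClient (assumption: the
    real client is costly to construct, so it should be created once)."""
    instances_created = 0

    def __init__(self):
        SessionsClient.instances_created += 1

    def detect_intent(self, session, text):
        return f"response for {text!r}"

@functools.lru_cache(maxsize=None)
def get_sessions_client():
    """Create the client once and reuse it for the process lifetime."""
    return SessionsClient()

def handle_user_message(session, text):
    # The cached client is reused; it is NOT re-created per request.
    client = get_sessions_client()
    return client.detect_intent(session, text)
```

The same effect can be achieved with a module-level client instance; the cached factory simply delays construction until the first request.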
Batch updates to agent
If you are sending many individual agent update API requests over a short period of time, your requests may be throttled. These design-time API methods are not implemented to handle high update rates for a single agent.
Some data types have batch methods for this purpose:
- Instead of sending many EntityTypes create, patch, or delete requests, use the batchUpdate or batchDelete methods.
- Instead of sending many Intents create, patch, or delete requests, use the batchUpdate or batchDelete methods.
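The difference in request volume can be illustrated with a small sketch. The AgentApiStub class below is a hypothetical stand-in for the design-time API, used only to count requests; the method names mirror the patch and batchUpdate operations described above.

```python
class AgentApiStub:
    """Hypothetical stand-in for the Dialogflow design-time API,
    counting requests to show the effect of batching."""
    def __init__(self):
        self.requests = 0
        self.entity_types = {}

    def patch_entity_type(self, name, value):
        # One API request per entity type.
        self.requests += 1
        self.entity_types[name] = value

    def batch_update_entity_types(self, updates):
        # A single API request for the whole batch.
        self.requests += 1
        self.entity_types.update(updates)

updates = {f"type-{i}": f"value-{i}" for i in range(50)}

many = AgentApiStub()
for name, value in updates.items():
    many.patch_entity_type(name, value)   # 50 requests, may be throttled

batched = AgentApiStub()
batched.batch_update_entity_types(updates)  # 1 request
```

Both approaches end in the same agent state; the batched variant issues one request instead of fifty.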
API error retries
When calling API methods, you may receive error responses. Some errors should be retried, because they are often caused by transient issues. There are two types of errors:
- Cloud API errors.
- Errors sent from your webhook service.
In addition, you should implement an exponential backoff for retries. This allows your system to find an acceptable rate while the API service is under heavy load.
Cloud API errors
If you are using a Google-supplied client library, Cloud API error retries with exponential backoff are implemented for you.
If you have implemented your own client library using REST or gRPC, you must implement retries for your client. For information on the errors that you should or should not retry, see API Improvement Proposals: Automatic retry configuration.
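If you do implement retries yourself, the exponential backoff described above can be sketched as follows. This is a generic illustration, not the client libraries' built-in retry logic; which exceptions count as retriable depends on your transport, so TimeoutError below is an assumption standing in for a transient failure.

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      retriable=(TimeoutError,), sleep=time.sleep):
    """Retry fn() on transient errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Delay doubles each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads retries from many clients apart.
            sleep(delay * random.uniform(0.5, 1.0))
```

The injectable sleep parameter makes the backoff schedule easy to verify in tests without real waiting.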
Webhook errors
If your API call triggers a webhook call, your webhook may return an error. Even if you are using a Google-supplied client library, webhook errors will not be retried automatically. Your code should retry 503 Service Unavailable errors received from your webhook. See the webhook service documentation for information on the types of webhook errors and how to check for them.
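A webhook retry loop can be sketched as below. send_request is a hypothetical callable standing in for your detect intent call, returning a (status_code, body) pair; how you detect a webhook 503 in a real response is described in the webhook service documentation.

```python
import time

def detect_intent_with_retry(send_request, max_attempts=3,
                             base_delay=0.5, sleep=time.sleep):
    """Retry the call when the response indicates a webhook 503.

    send_request: hypothetical callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send_request()
        if status != 503:
            # Only 503 Service Unavailable from the webhook is retried.
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    return status, body  # still failing after max_attempts
```

Other webhook error codes are returned to the caller unchanged, since retrying them is unlikely to help.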
Load testing
It is a best practice to execute load testing for your system before you release code to production. Consider these points before implementing your load tests:
Summary | Details |
---|---|
Ramp up load. | Your load test must ramp up the load applied to the Dialogflow service. The service is not designed to handle abrupt bursts of load, which are rarely experienced with real traffic. It takes time for the service to adjust to load demands, so ramp up the request rate slowly, until your test achieves the desired load. |
API calls are charged. | You will be charged for API calls during a test, and the calls will be limited by project quota. |
Use test doubles. | You may not need to call the API during your load test. If the purpose of your load test is to determine how your system handles load, it is often best to use a test double in place of actual calls to the API. Your test double can simulate the behavior of the API under load. |
Use retries. | Your load test must perform retries with a backoff. |
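Two of the points above, ramping up load and using a test double, can be sketched together. Both names below (DialogflowDouble, ramp_up_rates) are hypothetical illustrations, not part of any Dialogflow SDK; the double's response shape only loosely mimics a detect intent result.

```python
class DialogflowDouble:
    """Test double simulating the API under load (an assumption;
    replace with your own latency and error model)."""
    def detect_intent(self, text):
        return {"queryResult": {"fulfillmentText": f"echo: {text}"}}

def ramp_up_rates(start_rps=1.0, target_rps=20.0, step_rps=1.0):
    """Yield a gradually increasing request rate instead of an
    abrupt burst, as the service needs time to adjust to load."""
    rps = start_rps
    while rps <= target_rps:
        yield rps
        rps += step_rps

# In a real load test, each step would run for a fixed window
# (for example 30 s) at the given rate; here we only build the schedule.
schedule = list(ramp_up_rates())
client = DialogflowDouble()
```

Once the system behaves well against the double, a smaller targeted test against the real API (within quota) can validate end-to-end behavior.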
Calling Dialogflow securely from an end-user device
You should never store your private keys used to access the Dialogflow API on an end-user device. This applies to storing keys on the device directly and to hard coding keys in applications. When your client application needs to call the Dialogflow API, it should send requests to a developer-owned proxy service on a secure platform. The proxy service can make the actual, authenticated Dialogflow calls.
For example, you should not create a mobile application that calls Dialogflow directly. Doing so would require you to store private keys on an end-user device. Your mobile application should instead pass requests through a secure proxy service.
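The proxy pattern can be sketched as follows. This is a structural illustration only: DialogflowProxy and call_dialogflow are hypothetical names, and a real proxy would be an authenticated HTTPS service, not an in-process object.

```python
class DialogflowProxy:
    """Hypothetical developer-owned proxy: the private key lives only
    on the server, and the end-user device never sees it."""
    def __init__(self, call_dialogflow, server_side_key):
        self._call = call_dialogflow   # authenticated call, server-side only
        self._key = server_side_key    # never shipped to the device

    def handle_client_request(self, session, text):
        # The device sends only (session, text); authentication with
        # the server-held credentials happens here, behind the proxy.
        return self._call(self._key, session, text)
```

The mobile application talks only to handle_client_request (over HTTPS in practice), so compromising the device never exposes the Dialogflow credentials.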
Performance
This section outlines performance information for various operations within Dialogflow. Understanding latency is important for designing responsive agents and setting realistic performance expectations, although these values are not part of the Dialogflow SLA.
When building monitoring and alerting tools, note that Large Language Models (LLMs) and speech processing are typically handled using streaming methods. Responses are sent to the client as soon as possible, often much earlier than the total duration of the method call. For more information, see the Best practices with large language models (LLMs).
Performance per operation
The following table provides information about the typical performance of Dialogflow operations:
Action | Notes |
---|---|
Intent detection (text) | Fast operation |
Parameter detection (text) | Fast operation |
Speech recognition (streaming) | Data is processed and responses are returned as soon as possible. The total execution time is primarily determined by the length of the input audio. Measuring latency using the total execution time is not recommended. |
Speech synthesis (streaming) | The total execution time is primarily determined by the length of the output audio. Data is processed and responses are returned as quickly as possible. |
Webhook calls | Performance is directly determined by the execution time of your code in the webhook. |
Import / Export agent | Performance depends on the size of the agent. |
Agent training | Performance depends on the number of flows, intents, and training phrases. Training large agents can take tens of minutes. |
Environment creation | Creating an environment involves training the agent, so the total time will depend on the size and complexity of the agent. |
Key Notes:
- Streaming: For streaming calls (speech recognition and synthesis), data is processed as it arrives, and responses are returned as soon as possible. This means the initial response is typically much faster than the total time of the call.
- Playbooks: An LLM prompt is constructed based on the playbook instructions, the conversation context, and the tool input. Multiple LLM prompts can be executed in a single playbook call. This is why playbook execution time varies, depending on the number of prompts issued and the complexity of the calls.
Important Latency Considerations
- No Latency Guarantees: Dialogflow SLAs do not consider latency, even under Provisioned Throughput.
- LLM Latency: Be aware that LLM processing can introduce significant latency. Factor this into your agent design and user expectations.
- Monitoring and Alerting: When setting up monitoring and alerting, account for the streamed nature of responses from LLMs and speech services. Don't assume full response time is equal to time to first response.
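The distinction between time to first response and total response time can be made concrete with a small measurement sketch. fake_streaming_synthesis below is a stand-in for any streaming speech or LLM call that yields chunks as they become ready.

```python
import time

def fake_streaming_synthesis(chunks, delay_per_chunk=0.0):
    """Stand-in for a streaming call: yields chunks as they are ready."""
    for chunk in chunks:
        time.sleep(delay_per_chunk)
        yield chunk

def measure_stream(stream, clock=time.monotonic):
    """Return (time_to_first_response, total_time) for a streaming call.

    For monitoring, time_to_first_response is usually the metric that
    reflects perceived responsiveness; total_time grows with the length
    of the audio or generated text.
    """
    start = clock()
    first = None
    for _ in stream:
        if first is None:
            first = clock() - start  # first chunk arrived
    return first, clock() - start    # total duration of the call
```

Alerting on total_time alone would flag long outputs as slow even when the first chunk arrived quickly, which is why the two metrics should be tracked separately.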