How to keep sensitive data out of your chatbots
Anthony Okwechime
Customer Engineer
Max Saltonstall
Developer Advocate
As virtual agent adoption has grown, use cases increasingly span conversation flows that include personally identifiable information (PII) and other sensitive data.
Organizations and government agencies often view storing this data as an unacceptable risk and require automated redaction of sensitive information.
But how? What's the fastest, easiest way to achieve the right level of data privacy?
Custom solutions take time
Many tools can redact sensitive information, but using them requires custom integration into each virtual agent implementation. That integration can be costly to develop and can take time to fine-tune for different platforms.
On top of that time cost, the data redaction systems don't always play nicely with the tools you already use to handle requests or process tickets.
Integrations may be incomplete
Building integrations that automatically redact sensitive data can be complex, and getting the granularity of redaction right makes it even more complicated.
For example:
Is redaction only required for sensitive PII data or is it required for complete user utterances?
Does data need to be completely redacted or will de-identification methods like tokenization and masking be sufficient?
The options are myriad, and aligning the capabilities of a redaction solution with an organization’s virtual agent strategy can be a challenge.
Dialogflow supports automated redaction
Dialogflow, powered by Google AI, has grown into a popular development platform for creating rich, intuitive customer conversations. Dialogflow CX now includes three options for redacting sensitive information:
Parameter redaction, which is enabled by selecting the Redact in log option in the console or through the Dialogflow API.
Redaction through the SecuritySettings API.
Security Settings in the Dialogflow CX console.
By default, Dialogflow does not send its logs to Cloud Logging. If you do not need logging for analytics or other purposes, you do not need to turn it on. That is rarely the case, however, so to ensure that virtual agent logs are written to Cloud Logging, select the Enable Stackdriver Logging option in the General tab of the Agent Settings page, as shown below.
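If you manage agents through the API rather than the console, the same logging switch can be set with an agents.patch request. The sketch below only builds the request and sends nothing. It assumes the setting lives at `advancedSettings.loggingSettings.enableStackdriverLogging` on the Dialogflow CX v3 Agent resource and that the agent is in the `global` location; the project and agent IDs are placeholders. Actually sending it requires an OAuth access token, for example from `gcloud auth print-access-token`.

```python
import json

# Assumption: the agent is in the "global" location, so the global
# Dialogflow CX v3 REST endpoint is used.
API_BASE = "https://dialogflow.googleapis.com/v3"

# Assumption: the console's "Enable Stackdriver Logging" checkbox maps to this
# field path on the v3 Agent resource.
LOGGING_FIELD = "advancedSettings.loggingSettings.enableStackdriverLogging"


def build_enable_logging_request(agent_name: str) -> tuple[str, str, dict]:
    """Return (method, url, body) for an agents.patch call that enables Cloud Logging.

    Because only the field named in updateMask is written, other agent
    settings are left untouched.
    """
    url = f"{API_BASE}/{agent_name}?updateMask={LOGGING_FIELD}"
    body = {"advancedSettings": {"loggingSettings": {"enableStackdriverLogging": True}}}
    return "PATCH", url, body


if __name__ == "__main__":
    # Placeholder resource name; substitute your own project and agent ID.
    agent = "projects/my-project/locations/global/agents/my-agent-id"
    method, url, body = build_enable_logging_request(agent)
    print(method, url)
    print(json.dumps(body, indent=2))
```

Using an update mask keeps the call narrow: it flips one setting instead of round-tripping the whole agent definition.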
Parameter redaction
Once this option is set, parameter redaction can be configured and its results reviewed in Cloud Logging. Parameter redaction can be applied to any parameter defined in an intent or form, and it redacts the selected parameters both in Dialogflow storage and in Cloud Logging.
The image below shows the Redact in Log checkbox selected for the form parameter named social-security-number.
Examining the logs in the Logs Explorer provided by Cloud Logging shows that the selected parameter has been redacted.
The parameter named social-security-number has had its data replaced.
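Parameter redaction can also be set through the API: a form parameter carries a `redact` flag that corresponds to the Redact in log checkbox. The sketch below builds a pages.patch request for a page whose form holds the `social-security-number` parameter. It only constructs the request; the page and entity type resource names are hypothetical placeholders, and the prompt text is illustrative.

```python
import json

API_BASE = "https://dialogflow.googleapis.com/v3"


def redacted_form_parameter(display_name: str, entity_type: str, prompt: str) -> dict:
    """Build a required form parameter (REST JSON form) whose value is redacted.

    "redact": True is the API counterpart of the console's "Redact in log" checkbox.
    """
    return {
        "displayName": display_name,
        "entityType": entity_type,
        "required": True,
        # The prompt Dialogflow uses to ask the end user for this value.
        "fillBehavior": {
            "initialPromptFulfillment": {"messages": [{"text": {"text": [prompt]}}]}
        },
        "redact": True,
    }


def build_form_patch_request(page_name: str, parameters: list[dict]) -> tuple[str, str, dict]:
    """Return (method, url, body) for a pages.patch call that replaces the page's form.

    The update mask "form" overwrites the whole form, so pass every parameter
    the page should keep, not only the redacted one.
    """
    url = f"{API_BASE}/{page_name}?updateMask=form"
    return "PATCH", url, {"form": {"parameters": parameters}}


if __name__ == "__main__":
    # Hypothetical page and entity type names; replace with your own resources.
    page = "projects/my-project/locations/global/agents/my-agent-id/flows/my-flow-id/pages/my-page-id"
    ssn_entity = "projects/my-project/locations/global/agents/my-agent-id/entityTypes/my-ssn-entity-id"
    param = redacted_form_parameter(
        "social-security-number", ssn_entity, "What is your social security number?"
    )
    method, url, body = build_form_patch_request(page, [param])
    print(method, url)
    print(json.dumps(body, indent=2))
```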
The Security Settings API
The SecuritySettings API provides the ability to manage settings related to security issues such as data redaction and data retention.
The API lets users set a strategy that determines whether data is redacted. The strategy guides the overall security posture of the virtual agent with regard to handling sensitive data.
The table below shows the redaction strategy options.
Setting the REDACTION_STRATEGY_UNSPECIFIED switch results in no redaction-related action being taken: the entire conversation is stored without modification. Setting the REDACT_WITH_SERVICE switch enables the redaction process.
Consider the REDACT_WITH_SERVICE option as the “On” switch. Once activated, it enables redaction of personally identifiable information and provides more granular options, such as selecting the scope of redaction and the types of data to purge.
When redaction is enabled, the specified data types are purged prior to being written to permanent Dialogflow storage. This ensures that the defined data types are never persisted within Dialogflow.
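To see how the strategy values map onto an actual resource, the sketch below builds a securitySettings.create request body. It mirrors the two strategy values described above in a small enum and, when redaction is on, sets `redactionScope` to `REDACT_DISK_STORAGE` (redact before data reaches Dialogflow's persistent storage). The project, location, and display name are placeholders, and the request is only constructed, not sent.

```python
import json
from enum import Enum
from typing import Optional

API_BASE = "https://dialogflow.googleapis.com/v3"


class RedactionStrategy(str, Enum):
    """The two strategy values described above."""
    REDACTION_STRATEGY_UNSPECIFIED = "REDACTION_STRATEGY_UNSPECIFIED"  # no redaction
    REDACT_WITH_SERVICE = "REDACT_WITH_SERVICE"  # the "on" switch: redact via DLP


def build_security_settings_request(project: str, location: str, display_name: str,
                                    strategy: RedactionStrategy,
                                    inspect_template: Optional[str] = None) -> tuple[str, str, dict]:
    """Return (method, url, body) for a securitySettings.create call."""
    parent = f"projects/{project}/locations/{location}"
    body = {"displayName": display_name, "redactionStrategy": strategy.value}
    if strategy is RedactionStrategy.REDACT_WITH_SERVICE:
        # Redact before anything is written to Dialogflow's persistent storage.
        body["redactionScope"] = "REDACT_DISK_STORAGE"
        if inspect_template:
            # Optional custom Cloud DLP inspect template; without one, the
            # default DLP inspect configuration applies.
            body["inspectTemplate"] = inspect_template
    return "POST", f"{API_BASE}/{parent}/securitySettings", body


if __name__ == "__main__":
    method, url, body = build_security_settings_request(
        "my-project", "global", "pii-redaction", RedactionStrategy.REDACT_WITH_SERVICE
    )
    print(method, url)
    print(json.dumps(body, indent=2))
```

Leaving `redactionScope` out when the strategy is unspecified keeps the "off" case explicit: nothing about redaction is configured, so the conversation is stored as is.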
Redaction via the Security Settings API - process overview
The redaction process relies on Dialogflow calling Cloud Data Loss Prevention (DLP), a fully managed service designed to help discover, classify, and protect sensitive information, using the configuration options set in the Security Settings API. Dialogflow acts as both the data source and the client that makes the DLP API request, as shown in the figure below:
The DLP service redacts sensitive data contained in the Dialogflow user utterance and agent response according to the redaction strategy configured for the agent. The API request itself is encrypted in transit, stateless, and not persisted, and it supports data residency.
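You do not call DLP yourself when Dialogflow redaction is enabled. To make the flow concrete, though, the sketch below builds a standalone DLP `content:deidentify` request that performs a comparable replacement on a single utterance. It is an illustration, not the request Dialogflow actually sends; `US_SOCIAL_SECURITY_NUMBER` and `DATE_OF_BIRTH` are built-in DLP infoTypes, and the project ID is a placeholder.

```python
import json

DLP_API_BASE = "https://dlp.googleapis.com/v2"


def build_deidentify_request(project: str, location: str, text: str,
                             info_types: list[str]) -> tuple[str, str, dict]:
    """Return (method, url, body) for a DLP content:deidentify call that
    replaces every finding of the given infoTypes with "[redacted]"."""
    url = f"{DLP_API_BASE}/projects/{project}/locations/{location}/content:deidentify"
    body = {
        "item": {"value": text},
        # What to look for.
        "inspectConfig": {"infoTypes": [{"name": name} for name in info_types]},
        # What to do with each finding: swap it for a fixed string.
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [
                    {
                        "primitiveTransformation": {
                            "replaceConfig": {"newValue": {"stringValue": "[redacted]"}}
                        }
                    }
                ]
            }
        },
    }
    return "POST", url, body


if __name__ == "__main__":
    method, url, body = build_deidentify_request(
        "my-project",
        "global",
        "My SSN is 123-45-6789 and I was born on March 3, 1980",
        ["US_SOCIAL_SECURITY_NUMBER", "DATE_OF_BIRTH"],
    )
    print(method, url)
    print(json.dumps(body, indent=2))
```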
Security Settings in the Dialogflow CX console
The Security tab of the Virtual Agent Settings page provides a means to configure data redaction and data retention in the Dialogflow CX console instead of using the API directly.
As with the API, a Security Setting must be configured before it can be applied to the virtual agent.
Clicking Manage Security Settings will take you to the Create Security Settings page where the specifics of your policy can be configured.
The Create Security Settings form lets you specify a Cloud DLP inspect template. The steps to create a template are described in the Cloud DLP documentation.
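A custom inspect template can also be created through the DLP API. The sketch below builds an inspectTemplates.create request and returns the resulting template resource name, which is the value the Security Settings inspect template field expects. It assumes, for simplicity, that the template lives in the same project and location as the agent; the template ID, infoTypes, and likelihood threshold are illustrative choices.

```python
import json

DLP_API_BASE = "https://dlp.googleapis.com/v2"


def build_inspect_template_request(project: str, location: str, template_id: str,
                                   info_types: list[str],
                                   min_likelihood: str = "POSSIBLE") -> tuple[str, str, dict, str]:
    """Return (method, url, body, template_name) for creating a DLP inspect template.

    template_name is the resource name to reference from Security Settings.
    """
    # Assumption: same project and location as the Dialogflow agent.
    parent = f"projects/{project}/locations/{location}"
    body = {
        "templateId": template_id,
        "inspectTemplate": {
            "displayName": template_id,
            "inspectConfig": {
                "infoTypes": [{"name": name} for name in info_types],
                # Ignore findings DLP rates as less likely than this threshold.
                "minLikelihood": min_likelihood,
            },
        },
    }
    template_name = f"{parent}/inspectTemplates/{template_id}"
    return "POST", f"{DLP_API_BASE}/{parent}/inspectTemplates", body, template_name


if __name__ == "__main__":
    method, url, body, name = build_inspect_template_request(
        "my-project", "global", "dialogflow-pii",
        ["US_SOCIAL_SECURITY_NUMBER", "DATE_OF_BIRTH"],
    )
    print(method, url)
    print(json.dumps(body, indent=2))
    print("Use in Security Settings:", name)
```

Keeping the infoType list in a template, rather than in each agent, means several agents can share one reviewed definition of "sensitive".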
Once Security Settings have been configured, the appropriate policy can be applied to the virtual agent on the Security tab of the Agent Settings page.
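Applying the policy through the API amounts to pointing the agent's `securitySettings` field at the Security Settings resource. The sketch below builds that agents.patch request. The same-location check is an assumption added for safety in this sketch, and all resource names are placeholders.

```python
import json

API_BASE = "https://dialogflow.googleapis.com/v3"


def _location_prefix(resource_name: str) -> str:
    """Return the 'projects/<p>/locations/<l>' prefix of a resource name."""
    return "/".join(resource_name.split("/")[:4])


def build_attach_security_settings_request(agent_name: str,
                                           security_settings_name: str) -> tuple[str, str, dict]:
    """Return (method, url, body) for an agents.patch call that applies a
    Security Settings resource to an agent."""
    # Assumption: agent and security settings must share project and location.
    if _location_prefix(agent_name) != _location_prefix(security_settings_name):
        raise ValueError("agent and security settings are in different projects or locations")
    url = f"{API_BASE}/{agent_name}?updateMask=securitySettings"
    return "PATCH", url, {"securitySettings": security_settings_name}


if __name__ == "__main__":
    method, url, body = build_attach_security_settings_request(
        "projects/my-project/locations/global/agents/my-agent-id",
        "projects/my-project/locations/global/securitySettings/my-settings-id",
    )
    print(method, url)
    print(json.dumps(body))
```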
Reviewing data after Security Settings based redaction
When redaction is configured through Security Settings, it applies to the conversation transcript. The data types selected for redaction are replaced with the text [redacted] in both the user and virtual agent message text in Cloud Logging.
The images below show the text of the user and virtual agent messages in Cloud Logging before and after redaction is enabled.
In the above example, the social security number and date of birth have been replaced by the text string [redacted].
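When reviewing logs after enabling redaction, a quick automated spot check can complement manual inspection. The sketch below scans message texts you have copied or exported from Logs Explorer, counts `[redacted]` markers, and flags anything that still looks like a US social security number. The regular expression is a deliberately simple assumption (the common NNN-NN-NNNN layout) and is no substitute for DLP inspection; the sample messages are hypothetical.

```python
import re

# Common US SSN layout NNN-NN-NNNN. Deliberately simple: a spot check for
# exported log text, not a substitute for DLP inspection.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
REDACTED_MARKER = "[redacted]"


def audit_messages(messages: list[str]) -> dict:
    """Count redaction markers and collect any SSN-like strings that slipped through."""
    leaks = [
        (index, match.group())
        for index, message in enumerate(messages)
        for match in SSN_PATTERN.finditer(message)
    ]
    markers = sum(message.count(REDACTED_MARKER) for message in messages)
    return {"redacted_markers": markers, "possible_leaks": leaks}


if __name__ == "__main__":
    # Hypothetical message texts copied from Logs Explorer.
    sample = [
        "My social security number is [redacted]",
        "I was born on [redacted]",
        "Test message with 123-45-6789 in it",
    ]
    print(audit_messages(sample))
    # → {'redacted_markers': 2, 'possible_leaks': [(2, '123-45-6789')]}
```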
Fully managed data protection
Organizations and agencies often cannot predict when user conversations with virtual agents will include sensitive information. An easy-to-use tool for automated redaction reduces the operational burden of keeping sensitive user data protected.
The Parameter Redaction and Security Settings features simplify the management of data within Dialogflow and in adjacent systems. Users can specify that particular parameters be redacted, use the default DLP inspect configuration, or create a custom inspect template tailored to their environment’s needs.
Conversational architects implementing virtual agents can now take advantage of automated redaction in Dialogflow. Doing so helps ensure that sensitive user data is handled in line with industry best practices and with security and compliance requirements.
Try it today
You can get started today and try it yourself. We recommend some basic knowledge of DLP and Dialogflow, which you can gain from these tutorials and how-to guides.
Keep in mind that DLP and Dialogflow both have associated costs, so turn off your experiments when you're done to avoid surprises.
Ready to go? Check out the Parameter Redaction section of the Dialogflow CX documentation, the Security Settings API reference and the Security Settings overview page to set up data protection for Dialogflow.