Use the API to import conversations

Learn how you can import audio and transcript files with their metadata using the API. You can import a single file using the UploadConversation API, or you can bulk import all the files from a Cloud Storage bucket using the IngestConversations API.

The UploadConversation and IngestConversations requests support the following capabilities:

| Request command | Number of files | Speech-to-Text | Redaction | Metadata ingestion | Automatic analysis |
| --- | --- | --- | --- | --- | --- |
| UploadConversation | 1 | ✔ | ✔ | Metadata in the request | ✔ |
| IngestConversations | All files in a bucket | ✔ | ✔ | Metadata files in a Cloud Storage bucket | ✔ |

Prerequisites

  1. Enable the Cloud Storage, Speech-to-Text, Cloud Data Loss Prevention, and Conversational Insights APIs on your Google Cloud project.
  2. Save your conversation data (dual-channel audio and transcript files) in a Cloud Storage bucket. Note the object path with the following format: gs://<bucket>/<object>
  3. Give the Speech-to-Text and Conversational Insights service agents access to the objects in your Cloud Storage bucket. See this troubleshooting page for help with service accounts.
  4. If you opt to import conversation metadata, ensure that metadata files are in their own bucket and the metadata filenames match their corresponding conversation filename.

    For example, a conversation with the Cloud Storage URI gs://transcript-bucket-name/conversation.mp3 must have a corresponding metadata file such as gs://metadata-bucket-name/conversation.json.
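The filename-matching rule above can be sketched in Python. This is only an illustration; the helper name is hypothetical:

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

def metadata_uri_for(conversation_uri: str, metadata_bucket: str) -> str:
    """Derive the expected metadata object URI for a conversation object.

    The metadata file must share the conversation's base filename, use a
    .json extension, and live in the separate metadata bucket.
    """
    parsed = urlparse(conversation_uri)          # gs://<bucket>/<object>
    stem = PurePosixPath(parsed.path).stem       # filename without extension
    return f"gs://{metadata_bucket}/{stem}.json"

print(metadata_uri_for("gs://transcript-bucket-name/conversation.mp3",
                       "metadata-bucket-name"))
# gs://metadata-bucket-name/conversation.json
```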

Conversation data

Conversation data consists of voice or chat transcripts and audio.

Transcripts

Chat transcripts must be supplied as JSON-formatted files that match the CCAI conversation data format.

Voice transcripts can be supplied in the CCAI conversation data format or as the returned speech recognition result of a Speech-to-Text API transcription. The response is identical for synchronous and asynchronous recognition across all Speech-to-Text API versions.

Audio

Conversational Insights uses Cloud Speech-to-Text batch recognition to transcribe audio. Insights configures Speech-to-Text transcription settings with Recognizer resources. You can create a custom recognizer in the request, or, if you don't provide a recognizer in either Settings or the request, Insights creates a default ccai-insights-recognizer in your project.

The Insights recognizer transcribes English speech using the telephony model, and the default language is en-US. For a full list of Speech-to-Text support per region, language, model, and recognition feature, refer to the Speech-to-Text language support docs.

Before your first audio import to Insights, assess whether you would like to:

  • Use a custom Speech-to-Text transcription configuration.
  • Analyze the (optionally) redacted conversations.

You can configure these actions to run by default in each UploadConversation or IngestConversations request by setting the proper fields in the project Settings resource. The speech and redaction settings can also be overridden per request. If you don't specify any speech settings, Insights uses the default speech settings and doesn't redact the transcripts.

Redaction

Cloud Data Loss Prevention does not redact transcripts unless you explicitly supply redaction configs in the project Settings, the UploadConversationRequest, or in the IngestConversationsRequest. Cloud Data Loss Prevention supports both inspection templates and de-identification templates for redaction.

Configure project settings

Redaction and speech can be configured for UploadConversation and IngestConversations requests by setting the corresponding project settings parameters. These configurations can also be set individually per request, which overrides the project settings. UploadConversation also supports analysis percentage configuration, though IngestConversations does not.
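A sketch of a possible request.json body follows. The top-level field names match the updateMask used in the curl command below; the template and recognizer resource-name formats shown are assumptions and should be replaced with your own resource names:

```json
{
  "redaction_config": {
    "deidentify_template": "projects/PROJECT_ID/locations/LOCATION_ID/deidentifyTemplates/TEMPLATE_ID",
    "inspect_template": "projects/PROJECT_ID/locations/LOCATION_ID/inspectTemplates/TEMPLATE_ID"
  },
  "speech_config": {
    "speech_recognizer": "projects/PROJECT_ID/locations/LOCATION_ID/recognizers/RECOGNIZER_ID"
  },
  "analysis_config": {
    "upload_conversation_analysis_percentage": 100
  }
}
```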

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://contactcenterinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/settings?updateMask=redaction_config,speech_config,analysis_config.upload_conversation_analysis_percentage"

Metadata

You can include metadata with both single-file and bulk imports.

Import one file

For a single-file import, include your quality metadata in the curl command for UploadConversationRequest.

curl --request POST \
  'https://contactcenterinsights.googleapis.com/v1/projects/project-id/locations/location-id/conversations:upload' \
  --header 'Authorization: Bearer [YOUR_ACCESS_TOKEN]' \
  --header 'Accept: application/json' \
  --header 'Content-Type: application/json' \
  --data '{
    "conversation": {
      "qualityMetadata": {
        "agentInfo": [
          {
            "agentId": "agent-id",
            "displayName": "agent-name"
          }
        ]
      },
      "dataSource": {
        "gcsSource": {
          "transcriptUri": "gs://path/to_transcript"
        }
      }
    }
  }'

Do a bulk import

Supply conversation metadata files as JSON-formatted files in a bucket specified in the gcs_source.metadata_bucket_uri field of the IngestConversationsRequest. Insights populates conversation quality metadata found in the file, but you can also create custom metadata.

For example, to specify a custom conversation ID for each conversation in your dataset, specify custom metadata on the conversation object within Cloud Storage. Set the key to ccai_insights_conversation_id. The value is your custom conversation ID. Custom conversation IDs can also be provided within the metadata file.

If you provide any custom metadata in the custom_metadata_keys field of an IngestConversationsRequest, Insights stores that custom metadata in the conversation labels. It supports up to 100 labels.

See the following example of a valid metadata file:

{
  "customer_satisfaction_rating": 5,
  "agent_info": [
    {
      "agent_id": "123456",
      "display_name": "Agent Name",
      "team": "Agent Team",
      "disposition_code": "resolved"
    }
  ],
  "custom_key": "custom value",
  "conversation_id": "custom-conversation-id"
}
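Before ingesting, it can help to sanity-check a metadata file: confirm it parses as JSON and that the number of custom keys fits within the 100-label limit noted above. This is a sketch; the key classification below is inferred from the example file, not from an official schema:

```python
import json

METADATA = """
{
  "customer_satisfaction_rating": 5,
  "agent_info": [
    {"agent_id": "123456", "display_name": "Agent Name",
     "team": "Agent Team", "disposition_code": "resolved"}
  ],
  "custom_key": "custom value",
  "conversation_id": "custom-conversation-id"
}
"""

# Quality/ID fields shown in the example above; everything else is
# treated as custom metadata stored in conversation labels.
KNOWN_QUALITY_KEYS = {"customer_satisfaction_rating", "agent_info",
                      "conversation_id"}

def check_metadata(text: str, max_labels: int = 100) -> dict:
    """Parse a metadata file and verify the custom-key count fits the
    100-label limit on conversation labels."""
    data = json.loads(text)  # raises a ValueError on invalid JSON
    custom_keys = [k for k in data if k not in KNOWN_QUALITY_KEYS]
    if len(custom_keys) > max_labels:
        raise ValueError(f"too many custom keys: {len(custom_keys)}")
    return data

meta = check_metadata(METADATA)
print(sorted(k for k in meta if k not in KNOWN_QUALITY_KEYS))  # ['custom_key']
```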

Import a single audio file

The UploadConversation API creates a long-running operation that transcribes and optionally redacts your conversations. Insights transcribes the audio file if the conversation's DataSource contains only an audio_uri; otherwise, it reads and uses the provided transcript_uri.

Request JSON body:

{ 
  "conversation": { 
    "data_source": { 
      "gcs_source": { "audio_uri": AUDIO_URI }
    }
  },
  "redaction_config": {
    "deidentify_template": DEIDENTIFY_TEMPLATE,
    "inspect_template": INSPECT_TEMPLATE
  },
  "speech_config": {
    "speech_recognizer": RECOGNIZER_NAME
  }
}

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://contactcenterinsights.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/conversations:upload"

Bulk import

REST

Refer to the conversations:ingest API endpoint for complete details.

Before using any of the request data, make the following replacements:

  • PROJECT_ID: your Google Cloud Platform project ID.
  • GCS_BUCKET_URI: the Cloud Storage URI that points to the bucket containing the conversation transcripts. May contain a prefix. For example gs://BUCKET_NAME or gs://BUCKET_NAME/PREFIX. Wildcards are not supported.
  • MEDIUM: set to either PHONE_CALL or CHAT, depending on the data type. If unspecified, the default value is PHONE_CALL.
  • AGENT_ID: Optional. The agent ID for the entire bucket.

HTTP method and URL:

POST https://contactcenterinsights.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/conversations:ingest

Request JSON body:

{
  "gcsSource":  {
    "bucketUri": "GCS_BUCKET_URI",
    "bucketObjectType": "AUDIO"
  },
  "transcriptObjectConfig": { "medium": "PHONE_CALL" },
  "conversationConfig": {
    "agentId": "AGENT_ID",
    "agentChannel": "AGENT_CHANNEL",
    "customerChannel": "CUSTOMER_CHANNEL"
  }
}

Or

{
  "gcsSource":  {
    "bucketUri": "GCS_BUCKET_URI",
    "bucketObjectType": "TRANSCRIPT"
  },
  "transcriptObjectConfig": { "medium": "MEDIUM" },
  "conversationConfig": {"agentId": "AGENT_ID"}
}
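The two request shapes above differ only in the bucket object type and the medium. A small builder can make that symmetry explicit; this is a sketch, and the helper name is hypothetical:

```python
import json

def ingest_body(bucket_uri: str, object_type: str,
                medium: str = "PHONE_CALL", agent_id: str = "") -> dict:
    """Build an IngestConversations request body for an audio or
    transcript bucket. object_type is "AUDIO" or "TRANSCRIPT"."""
    if object_type not in ("AUDIO", "TRANSCRIPT"):
        raise ValueError(f"unknown bucketObjectType: {object_type}")
    body = {
        "gcsSource": {
            "bucketUri": bucket_uri,
            "bucketObjectType": object_type,
        },
        "transcriptObjectConfig": {"medium": medium},
        "conversationConfig": {},
    }
    if agent_id:
        body["conversationConfig"]["agentId"] = agent_id
    return body

print(json.dumps(
    ingest_body("gs://my-bucket", "TRANSCRIPT", medium="CHAT",
                agent_id="agent-1"),
    indent=2))
```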

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/us-central1/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.contactcenterinsights.v1main.IngestConversationsMetadata",
    "createTime": "...",
    "request": {
      "parent": "projects/PROJECT_ID/locations/us-central1",
      "gcsSource": {
        "bucketUri": "GCS_BUCKET_URI",
        "bucketObjectType": "BUCKET_OBJECT_TYPE"
      },
      "transcriptObjectConfig": {
        "medium": "MEDIUM"
      },
      "conversationConfig": {
        "agentId": "AGENT_ID"
      }
    }
  }
}

Poll the operation

Both the UploadConversation and IngestConversations requests return a long-running operation. Long-running methods are asynchronous, and the operation might not be complete when the method returns a response. You can poll the operation to check its status. See the long-running operations page for details and code samples.
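The polling pattern can be sketched generically. Here, fetch_operation is a hypothetical stand-in for whatever call retrieves the current operation resource (for example, a GET on the operation name); the example drives it with a fake fetcher:

```python
import time

def wait_for_operation(fetch_operation, poll_interval: float = 5.0,
                       timeout: float = 3600.0) -> dict:
    """Poll a long-running operation until it reports done=True.

    fetch_operation is any zero-argument callable that returns the
    current operation resource as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        op = fetch_operation()
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(f"operation failed: {op['error']}")
            return op
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish in time")
        time.sleep(poll_interval)

# Example with a fake fetcher that finishes on the third poll:
responses = iter([{"done": False}, {"done": False},
                  {"done": True, "response": {}}])
print(wait_for_operation(lambda: next(responses), poll_interval=0.01))
```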

Speech-to-Text quotas

Conversational Insights uses two different Speech-to-Text APIs: BatchRecognize and GetOperation. Insights makes a BatchRecognize request to start the Speech-to-Text transcription and GetOperation requests to monitor whether the transcription has finished. Each request type consumes its own per-minute, per-region quota.

For a single UploadConversation call, Conversational Insights consumes one BatchRecognize request but possibly several GetOperation requests, depending on the duration of the task. For a bulk import, Conversational Insights consumes 100 requests of each type.
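As a rough illustration of how GetOperation usage scales with task duration, the sketch below counts one poll per interval until the task completes. The 30-second poll interval is an assumption for illustration, not a documented value:

```python
import math

def estimated_get_operation_calls(task_seconds: float,
                                  poll_interval_seconds: float = 30.0) -> int:
    """Rough count of GetOperation polls for one transcription task,
    assuming one poll per interval until the task completes."""
    return max(1, math.ceil(task_seconds / poll_interval_seconds))

print(estimated_get_operation_calls(300))  # 10 polls for a 5-minute task
```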