CCAI Insights supports creating and analyzing conversations through either the Console or the REST API. After conversations are uploaded, you can view them and their analysis results in the Console. To make uploading large numbers of conversations more efficient, CCAI Insights provides a Python script that performs a bulk upload.
Prerequisites
- Make sure that the Cloud Storage, Speech-to-Text, and Insights APIs are enabled on your Google Cloud Platform project.
- Create and analyze a conversation using the Insights API.
- Set up the Google Cloud CLI.
- Set up service account authentication.
Access the import tool
You can access the import tool using git. If you have never used git before, you might need to set it up first.
To access the client tooling, enter the following command:
git clone "https://partner-code.googlesource.com/ccai-insights-client"
Set up your development environment
The import tooling has some Python package dependencies that must be installed. Run the following commands from the directory where you have saved the downloaded tooling:
python3 -m pip install --user --upgrade pip
pip install -r requirements.txt
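After installation, it can be useful to confirm that every dependency is actually present in the interpreter you will run the tool with. The following is a minimal sketch, not part of the tooling: the helper name missing_requirements is hypothetical, and it only understands simple requirement lines (a package name with optional version pins), not URLs or editable installs.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(requirements_path):
    """Return the package names in a requirements file that are not installed."""
    missing = []
    for raw_line in Path(requirements_path).read_text().splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as "-r other.txt"
        # Keep only the distribution name: drop extras, version pins, and markers.
        name = re.split(r"[\s\[<>=!~;]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    requirements = Path("requirements.txt")
    if requirements.exists():
        print("Missing packages:", missing_requirements(requirements) or "none")
    else:
        print("requirements.txt not found; run this from the cloned tooling directory.")
```

Run it from the directory that contains requirements.txt, using the same python3 that you use for the import tool.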
Set up import tool inputs
The tooling expects a Google Cloud Storage bucket containing the conversations that you want to upload into CCAI Insights. The inputs can be one of the following:
- Chat transcripts supplied as JSON-formatted files that match the CCAI conversation data format.
- Voice transcripts supplied as JSON-formatted files that match the response format from Cloud Speech-to-Text recognition.
- Two-channel audio files of a uniform sample rate and encoding that are supported by Cloud Speech-to-Text.
For audio files, the tooling automatically transcribes the files and places the resulting transcripts in the specified dest_gcs_bucket.
For each audio and transcript file, a conversation will be created in CCAI Insights and analyzed.
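For the chat transcript input type, each file in the bucket is one JSON document. The sketch below builds such a document in Python. The field names (entries, start_timestamp_usec, text, role, user_id) and the role values are assumptions based on the publicly documented CCAI conversation data format, and build_chat_transcript is a hypothetical helper, so verify both against the format reference before uploading real data.

```python
import json


def build_chat_transcript(turns):
    """Build a transcript dict from (role, text, start_usec) tuples."""
    user_ids = {}  # assumption: one speaker per role, numbered in order of appearance
    entries = []
    for role, text, start_usec in turns:
        user_id = user_ids.setdefault(role, len(user_ids) + 1)
        entries.append({
            "start_timestamp_usec": start_usec,
            "text": text,
            "role": role,
            "user_id": user_id,
        })
    return {"entries": entries}


transcript = build_chat_transcript([
    ("CUSTOMER", "Hi, I have a question about my bill.", 1_000_000),
    ("AGENT", "Happy to help. Which charge looks wrong?", 5_000_000),
    ("CUSTOMER", "The roaming fee from last week.", 9_000_000),
])

# Print the JSON that would be saved as one file in the source bucket.
print(json.dumps(transcript, indent=2))
```

Each resulting file would then be copied to the bucket that is passed as the chat transcript source.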
Run the import tool
Import tool usage:
usage: import_conversations.py [-h]
                               (--source_local_audio_path SOURCE_LOCAL_AUDIO_PATH |
                                --source_audio_gcs_bucket SOURCE_AUDIO_GCS_BUCKET |
                                --source_voice_transcript_gcs_bucket SOURCE_VOICE_TRANSCRIPT_GCS_BUCKET |
                                --source_chat_transcript_gcs_bucket SOURCE_CHAT_TRANSCRIPT_GCS_BUCKET)
                               [--dest_gcs_bucket DEST_GCS_BUCKET]
                               [--impersonated_service_account IMPERSONATED_SERVICE_ACCOUNT]
                               [--redact REDACT]
                               [--analyze ANALYZE]
                               [--insights_endpoint INSIGHTS_ENDPOINT]
                               [--language_code LANGUAGE_CODE]
                               [--encoding ENCODING]
                               [--sample_rate_hertz SAMPLE_RATE_HERTZ]
                               [--agent_id AGENT_ID]
                               PROJECT
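The parenthesized group in the usage means that exactly one source option must be given, followed by any optional settings and the project ID as the final positional argument. The sketch below reproduces that shape with Python's argparse to make the rule concrete. It is reconstructed from the usage text only and is not the tool's actual source; the integer type on --sample_rate_hertz is an assumption, and several optional flags are left out for brevity.

```python
import argparse


def build_parser():
    """Parser that mirrors part of the documented usage string."""
    parser = argparse.ArgumentParser(prog="import_conversations.py")
    # Exactly one of the four source options is required.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--source_local_audio_path")
    source.add_argument("--source_audio_gcs_bucket")
    source.add_argument("--source_voice_transcript_gcs_bucket")
    source.add_argument("--source_chat_transcript_gcs_bucket")
    # A subset of the optional settings from the usage string.
    parser.add_argument("--dest_gcs_bucket")
    parser.add_argument("--encoding")
    parser.add_argument("--sample_rate_hertz", type=int)  # int type is an assumption
    parser.add_argument("project", metavar="PROJECT")
    return parser


args = build_parser().parse_args([
    "--source_audio_gcs_bucket", "my_audios_bucket",
    "--dest_gcs_bucket", "my_transcripts_bucket",
    "--encoding", "MP3",
    "--sample_rate_hertz", "44100",
    "my-project-id",
])
print(args.source_audio_gcs_bucket, args.sample_rate_hertz, args.project)
```

With this structure, supplying two source options in the same command, or none at all, is rejected with a usage error before any upload starts.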
An example command for importing conversations from a Google Cloud Storage bucket containing audio files:
python3 import_conversations.py --source_audio_gcs_bucket my_audios_bucket --dest_gcs_bucket my_transcripts_bucket --encoding MP3 --sample_rate_hertz 44100 my-project-id
An example command for importing conversations from a Google Cloud Storage bucket containing chat transcripts:
python3 import_conversations.py --source_chat_transcript_gcs_bucket my_chat_transcripts_bucket my-project-id
An example command for importing conversations from a Google Cloud Storage bucket containing voice transcripts:
python3 import_conversations.py --source_voice_transcript_gcs_bucket my_voice_transcripts_bucket my-project-id
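When several buckets need to be imported, the example commands above can be generated from a script instead of typed by hand. The following is a minimal sketch under stated assumptions: build_import_command and SOURCE_FLAGS are hypothetical names, the flags are taken from the usage text, and the loop only prints each command as a dry run rather than executing it.

```python
import shlex

# Map a short source kind to the corresponding flag from the usage text.
SOURCE_FLAGS = {
    "audio": "--source_audio_gcs_bucket",
    "chat": "--source_chat_transcript_gcs_bucket",
    "voice": "--source_voice_transcript_gcs_bucket",
}


def build_import_command(kind, source_bucket, project, dest_bucket=None, extra=()):
    """Return the argument list for one import_conversations.py run."""
    if kind not in SOURCE_FLAGS:
        raise ValueError(f"unknown source kind: {kind!r}")
    command = ["python3", "import_conversations.py", SOURCE_FLAGS[kind], source_bucket]
    if dest_bucket:
        command += ["--dest_gcs_bucket", dest_bucket]
    command += list(extra)
    command.append(project)  # the project ID is the final positional argument
    return command


jobs = [
    ("chat", "my_chat_transcripts_bucket"),
    ("voice", "my_voice_transcripts_bucket"),
]
for kind, bucket in jobs:
    command = build_import_command(kind, bucket, "my-project-id")
    # Dry run: print the command. To execute it, pass `command` to
    # subprocess.run(command, check=True) from the tooling directory.
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems if a bucket or project name ever contains unusual characters.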
If you are using an impersonated service account (the --impersonated_service_account option), make sure beforehand that your user account has permission to impersonate that service account. Then, before running the tool, set your application default credentials to your user credentials by running:
gcloud auth application-default login