Objectives
Pass text recognized by the Cloud Vision API to the Cloud Translation API.
Create and use Cloud Translation glossaries to personalize Cloud Translation API translations.
Create an audio representation of translated text using the Text-to-Speech API.
Costs
Each Google Cloud API uses a separate pricing structure. For pricing details, refer to the Cloud Vision pricing guide, the Cloud Translation pricing guide, and the Text-to-Speech pricing guide.
Before you begin
Make sure that you have:
- A project in the Google Cloud console with the Vision API, the Cloud Translation API, and the Text-to-Speech API enabled
- A basic familiarity with Python programming
Downloading the code samples
This tutorial uses code in the samples/snippets/hybrid_glossaries directory of the Cloud Client Libraries for Python.
To download and navigate to the code for this tutorial, run the following commands from the terminal.
git clone https://github.com/googleapis/python-translate.git
cd python-translate/samples/snippets/hybrid_glossaries/
Setting up client libraries
This tutorial uses Vision, Translation, and Text-to-Speech client libraries.
To install the relevant client libraries, run the following commands from the terminal.
pip install --upgrade google-cloud-vision
pip install --upgrade google-cloud-translate
pip install --upgrade google-cloud-texttospeech
Setting up permissions for glossary creation
Creating Translation glossaries requires using a service account key with "Cloud Translation API Editor" permissions.
To set up a service account key with Cloud Translation API Editor permissions, do the following:
Create a service account:
In the Google Cloud console, go to the Service Accounts page.
Select your project.
Click Create Service Account.
In the Service account name field, enter a name. The Google Cloud console fills in the Service account ID field based on this name.
Optional: In the Service account description field, enter a description for the service account.
Click Create and continue.
Click the Select a role field and select Cloud Translation > Cloud Translation API Editor.
Click Done to finish creating the service account.
Do not close your browser window. You will use it in the next step.
Download a JSON key for the service account you just created:
- In the Google Cloud console, click the email address for the service account that you created.
- Click Keys.
- Click Add key, then click Create new key.
- Click Create. A JSON key file is downloaded to your computer.
Store the key file securely, because it can be used to authenticate as your service account. You can move and rename this file as you like.
- Click Close.
From the hybrid_glossaries folder in the terminal, set the GOOGLE_APPLICATION_CREDENTIALS variable using the following command. Replace path_to_key with the path to the downloaded JSON file containing your new service account key.
Linux or macOS
export GOOGLE_APPLICATION_CREDENTIALS=path_to_key
Windows
set GOOGLE_APPLICATION_CREDENTIALS=path_to_key
Importing libraries
This tutorial imports Python standard-library modules along with the Vision, Cloud Translation, and Text-to-Speech client libraries.
Setting your project ID
You must associate a Google Cloud project with each request to a Google Cloud API. Designate your Google Cloud project by setting the GOOGLE_CLOUD_PROJECT environment variable from the terminal.
In the following command, replace PROJECT_NUMBER_OR_ID with your Google Cloud project number or ID. Run the following command from the terminal.
Linux or macOS
export GOOGLE_CLOUD_PROJECT=PROJECT_NUMBER_OR_ID
Windows
set GOOGLE_CLOUD_PROJECT=PROJECT_NUMBER_OR_ID
This tutorial uses the following global project ID variable.
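A minimal sketch of how that global variable might be defined, assuming it is read once when the module loads. The helper name get_project_id is hypothetical. Indexing os.environ directly, rather than calling os.environ.get, makes a missing variable fail immediately with the KeyError: 'GOOGLE_CLOUD_PROJECT' error listed under Troubleshooting.

```python
import os


def get_project_id() -> str:
    """Return the project number or ID stored in GOOGLE_CLOUD_PROJECT.

    Raises KeyError: 'GOOGLE_CLOUD_PROJECT' if the variable is not set.
    """
    return os.environ["GOOGLE_CLOUD_PROJECT"]
```

In a script, you would then set the global once, for example PROJECT_ID = get_project_id(), and pass PROJECT_ID to the glossary and translation requests.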
Using Vision to detect text from an image
Use the Vision API to detect and extract text from an image.
The Vision API uses Optical Character Recognition (OCR) to support two text-detection features: detection of dense text, or DOCUMENT_TEXT_DETECTION, and sparse text detection, or TEXT_DETECTION.
The following code shows how to use the Vision API DOCUMENT_TEXT_DETECTION feature to detect text in a photo with dense text.
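A minimal sketch of such a function, assuming the google-cloud-vision client library installed earlier. The function name detect_document_text and its optional client parameter are illustrative choices, not necessarily what the sample repository uses; the parameter lets you pass in a stand-in object for testing, and a real ImageAnnotatorClient is created when it is omitted. The image is passed as a dict with a content key, which the client library accepts in place of a vision.Image object.

```python
from pathlib import Path


def detect_document_text(image_path: str, client=None) -> str:
    """Return the dense text that Vision OCR finds in a local image file."""
    if client is None:
        # Imported only when no client is passed in.
        from google.cloud import vision

        client = vision.ImageAnnotatorClient()

    content = Path(image_path).read_bytes()

    # DOCUMENT_TEXT_DETECTION suits dense text; for sparse text, such as
    # a street sign, client.text_detection uses TEXT_DETECTION instead.
    response = client.document_text_detection(image={"content": content})
    if response.error.message:
        raise RuntimeError(f"Vision API error: {response.error.message}")

    # full_text_annotation.text holds all detected text, including line breaks.
    return response.full_text_annotation.text
```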
Using Translation with glossaries
After extracting text from an image, use Translation glossaries to personalize the translation of the extracted text. Glossaries provide pre-defined translations that override the Cloud Translation API translations of designated terms.
Glossary use cases include:
Product names: For example, 'Google Home' must translate to 'Google Home'.
Ambiguous words: For example, the word 'bat' can mean a piece of sports equipment or an animal. If you know that you are translating words about sports, you might want to use a glossary to feed the Cloud Translation API the sports translation of 'bat', not the animal translation.
Borrowed words: For example, 'bouillabaisse' in French translates to 'bouillabaisse' in English; the English language borrowed the word 'bouillabaisse' from the French language. An English speaker lacking French cultural context might not know that bouillabaisse is a French fish stew dish. Glossaries can override a translation so that 'bouillabaisse' in French translates to 'fish stew' in English.
Making a glossary file
The Cloud Translation API accepts TSV, CSV, or TMX glossary files. This tutorial uses a CSV file uploaded to Cloud Storage to define sets of equivalent terms.
To make a glossary CSV file:
Designate the language of a column using either ISO-639 or BCP-47 language codes in the first row of the CSV file.
fr,en,
List pairs of equivalent terms in each row of the CSV file. Separate terms with commas. The following example defines the English translation for several culinary French words.
fr,en,
chèvre,goat cheese,
crème brulée,crème brulée,
bouillabaisse,fish stew,
steak frites,steak with french fries,
Define variants of a word. The Cloud Translation API is case-sensitive and sensitive to special characters such as accented letters. Ensure that your glossary handles variations on a word by explicitly defining each spelling of the word.
fr,en,
chevre,goat cheese,
Chevre,Goat cheese,
chèvre,goat cheese,
Chèvre,Goat cheese,
crème brulée,crème brulée,
Crème brulée,Crème brulée,
Crème Brulée,Crème Brulée,
bouillabaisse,fish stew,
Bouillabaisse,Fish stew,
steak frites,steak with french fries,
Steak frites,Steak with french fries,
Steak Frites,Steak with French Fries,
Upload the glossary to a Cloud Storage bucket. For this tutorial, you do not need to create a Cloud Storage bucket or upload a glossary file to one. Instead, use the publicly available glossary file created for this tutorial, which avoids any Cloud Storage costs. To create a glossary resource, you send the Cloud Storage URI of a glossary file to the Cloud Translation API. The URI of the publicly available glossary file for this tutorial is gs://cloud-samples-data/translation/bistro_glossary.csv. To download the glossary, open https://storage.googleapis.com/cloud-samples-data/translation/bistro_glossary.csv in your browser.
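Listing every variant by hand is error-prone, so the variant step can be partly automated. The following sketch is not part of the tutorial's sample code; it writes a glossary CSV in the format shown above from a list of base term pairs, adding a variant with the first letter capitalized and a variant of each French term with its accents removed. The helper names are hypothetical. Title-case variants such as "Crème Brulée" are not generated, so review the output and add those by hand before uploading.

```python
import csv
import unicodedata
from typing import Iterable, TextIO


def strip_accents(term: str) -> str:
    # NFD splits "è" into "e" plus a combining accent; drop the accent marks.
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def capitalize_first(term: str) -> str:
    # Uppercase only the first character: "crème brulée" -> "Crème brulée".
    return term[:1].upper() + term[1:]


def glossary_rows(pairs: Iterable[tuple[str, str]]) -> list[tuple[str, str]]:
    """Expand (source, target) pairs into case and accent variants."""
    rows: list[tuple[str, str]] = []
    for source, target in pairs:
        # dict.fromkeys removes the duplicate when the term has no accents.
        for src in dict.fromkeys([source, strip_accents(source)]):
            for row in ((src, target), (capitalize_first(src), capitalize_first(target))):
                if row not in rows:
                    rows.append(row)
    return rows


def write_glossary_csv(
    pairs: Iterable[tuple[str, str]],
    out: TextIO,
    languages: tuple[str, str] = ("fr", "en"),
) -> None:
    writer = csv.writer(out, lineterminator="\n")
    # The trailing "" reproduces the trailing comma in the example rows.
    writer.writerow([*languages, ""])
    for row in glossary_rows(pairs):
        writer.writerow([*row, ""])
```

For example, writing to a file opened with open("bistro_glossary.csv", "w", encoding="utf-8", newline="") produces a file that you can upload to your own bucket.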
Creating a glossary resource
To use a glossary, you must create a glossary resource with the Cloud Translation API. To create a glossary resource, send the URI of a glossary file in Cloud Storage to the Cloud Translation API.
Make sure that you are using a service account key with "Cloud Translation API Editor" permissions and that you have set your project ID from the terminal.
The following function creates a glossary resource. With this glossary resource, you can personalize the translation request in the next step of this tutorial.
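A minimal sketch of such a function, assuming the translate_v3 module of the google-cloud-translate library and a glossary stored in the us-central1 location (both assumptions of this sketch). The request is built as a plain dict in a separate, hypothetical helper so that its structure is easy to inspect: an equivalent-term-sets glossary lists its languages in language_codes_set, and the CSV location goes in input_config.gcs_source.input_uri. The 180-second timeout is an arbitrary choice.

```python
def build_glossary_request(
    project_id: str,
    glossary_name: str,
    glossary_uri: str,
    languages: list[str],
    location: str = "us-central1",
) -> dict:
    parent = f"projects/{project_id}/locations/{location}"
    return {
        "parent": parent,
        "glossary": {
            "name": f"{parent}/glossaries/{glossary_name}",
            # Must match the language codes in the CSV header row.
            "language_codes_set": {"language_codes": list(languages)},
            "input_config": {"gcs_source": {"input_uri": glossary_uri}},
        },
    }


def create_glossary(
    project_id: str,
    glossary_name: str,
    glossary_uri: str,
    languages: list[str],
    location: str = "us-central1",
    timeout: float = 180,
) -> str:
    # Imported here so build_glossary_request works without the library.
    from google.api_core.exceptions import AlreadyExists
    from google.cloud import translate_v3 as translate

    client = translate.TranslationServiceClient()
    request = build_glossary_request(
        project_id, glossary_name, glossary_uri, languages, location
    )
    try:
        # create_glossary starts a long-running operation; wait for it.
        client.create_glossary(request=request).result(timeout=timeout)
        print(f"Created glossary {glossary_name}.")
    except AlreadyExists:
        print(f"The glossary {glossary_name} already exists. No new glossary was created.")
    return glossary_name
```

Catching AlreadyExists lets you rerun the script without deleting the glossary first.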
Translating with glossaries
Once you create a glossary resource, you can use the glossary resource to personalize translations of text that you send to the Cloud Translation API.
The following function uses your previously created glossary resource to personalize the translation of text.
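A minimal sketch of such a function, under the same assumptions as for glossary creation (the translate_v3 module and the us-central1 location); the function and helper names are hypothetical. The glossary is referenced by its full resource name in glossary_config, and the request is sent to the same location as the glossary. When a glossary is supplied, the response carries the glossary-aware results in glossary_translations, while translations holds the output produced without the glossary.

```python
def build_translate_request(
    text: str,
    source_language_code: str,
    target_language_code: str,
    project_id: str,
    glossary_name: str,
    location: str = "us-central1",
) -> dict:
    parent = f"projects/{project_id}/locations/{location}"
    return {
        "parent": parent,
        "contents": [text],
        "mime_type": "text/plain",
        "source_language_code": source_language_code,
        "target_language_code": target_language_code,
        "glossary_config": {"glossary": f"{parent}/glossaries/{glossary_name}"},
    }


def translate_text(
    text: str,
    source_language_code: str,
    target_language_code: str,
    project_id: str,
    glossary_name: str,
    location: str = "us-central1",
) -> str:
    # Imported here so build_translate_request works without the library.
    from google.cloud import translate_v3 as translate

    client = translate.TranslationServiceClient()
    response = client.translate_text(
        request=build_translate_request(
            text, source_language_code, target_language_code,
            project_id, glossary_name, location,
        )
    )
    # One result per entry in "contents"; this sketch sends a single string.
    return response.glossary_translations[0].translated_text
```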
Using Text-to-Speech with Speech Synthesis Markup Language
Now that you have personalized a translation of image-detected text, you are ready to use the Text-to-Speech API. The Text-to-Speech API can create synthetic audio of your translated text.
The Text-to-Speech API generates synthetic audio from either a string of plain text or a string of text marked up with Speech Synthesis Markup Language (SSML). SSML lets you annotate text with tags that control how the Text-to-Speech API renders the synthetic speech, for example by inserting a pause with a <break> tag.
The following function converts a string of SSML to an MP3 file of synthetic speech.
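A minimal sketch of the two steps involved, assuming the google-cloud-texttospeech library; the function names, the two-second pause, and the en-US voice are illustrative assumptions. text_to_ssml escapes characters such as & and < with html.escape so that translated text cannot be mistaken for SSML tags, then adds a <break> after each line. ssml_to_mp3 sends the SSML to the API and writes the returned MP3 bytes to a file.

```python
import html


def text_to_ssml(text: str, pause: str = "2s") -> str:
    # Escape &, <, > and quotes so the text is not parsed as SSML markup.
    escaped = html.escape(text)
    # Pause after every line break, then wrap everything in <speak>.
    with_breaks = escaped.replace("\n", f'\n<break time="{pause}"/>')
    return f"<speak>{with_breaks}</speak>"


def ssml_to_mp3(ssml: str, outfile: str, language_code: str = "en-US") -> None:
    # Imported here so text_to_ssml works without the client library.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=texttospeech.VoiceSelectionParams(language_code=language_code),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    # audio_content holds the encoded MP3 bytes.
    with open(outfile, "wb") as out:
        out.write(response.audio_content)
    print(f"Audio content written to file {outfile}")
```

For example, ssml_to_mp3(text_to_ssml(translated_text), "resources/example.mp3") reads each translated line with a pause between lines.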
Putting it all together
In the previous steps, you defined functions in hybrid_tutorial.py that use Vision, Translation, and Text-to-Speech. Now, you are ready to use these functions to generate synthetic speech of translated text from the following photo.
The following code calls functions defined in hybrid_tutorial.py to:
- create a Cloud Translation API glossary resource
- use the Vision API to detect text in the above image
- perform a Cloud Translation API glossary translation of the detected text
- generate Text-to-Speech synthetic speech of the translated text
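A minimal sketch of that orchestration, written so that it does not depend on particular function names: the four stages are passed in as callables and run in the order listed. The function run_pipeline and its parameter names are hypothetical, and the hard-coded "fr" and "en" match the French-to-English glossary used in this tutorial.

```python
from typing import Callable


def run_pipeline(
    infile: str,
    outfile: str,
    create_glossary: Callable[[], str],
    detect_text: Callable[[str], str],
    translate: Callable[[str, str, str, str], str],
    synthesize: Callable[[str, str], None],
) -> str:
    """Photo -> detected text -> glossary translation -> synthetic speech."""
    glossary_name = create_glossary()               # glossary resource
    detected = detect_text(infile)                  # Vision OCR
    translated = translate(detected, "fr", "en", glossary_name)  # Translation
    synthesize(translated, outfile)                 # Text-to-Speech
    return translated
```

To run it against the real APIs, pass wrappers around your glossary, Vision, Translation, and Text-to-Speech functions, for example using functools.partial to fix the project ID, the glossary name bistro-glossary, the glossary URI gs://cloud-samples-data/translation/bistro_glossary.csv, and the language list ["fr", "en"].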
Running the code
To run the code, enter the following command in the terminal from your cloned hybrid_glossaries directory:
python hybrid_tutorial.py
The following output appears:
Created glossary bistro-glossary.
Audio content written to file resources/example.mp3
After running hybrid_tutorial.py, navigate into the resources directory from the hybrid_glossaries directory.
Check the resources directory for an example.mp3 file.
Listen to the following audio clip to check that your example.mp3 file sounds the same.
Troubleshooting error messages
403 IAM permission 'cloudtranslate.glossaries.create' denied.
Using a service account key without "Cloud Translation API Editor" permissions raises this exception.
KeyError: 'GOOGLE_CLOUD_PROJECT'
Not setting your GOOGLE_CLOUD_PROJECT variable generates this error.
400 Invalid resource name project id
Either using a glossary name which contains characters other than lowercase letters, digits, periods, a colon, or hyphens, or using a service account key without "Cloud Translation API Editor" permissions raises this exception.
File filename was not found.
Setting the GOOGLE_APPLICATION_CREDENTIALS variable to an invalid filepath raises this exception.
Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application
Not setting the GOOGLE_APPLICATION_CREDENTIALS variable raises this exception.
Forbidden: 403 POST API has not been used or is disabled
Calling the Cloud Translation API, the Cloud Vision API, or the Text-to-Speech API before enabling that API generates this error.
AttributeError: 'module' object has no attribute 'escape'
Python 2.7.10 and earlier are not compatible with the html module used in this tutorial. To fix this error, run the code in a virtual environment created with Python 3.
UnicodeEncodeError
Python 2.7.10 and earlier are not compatible with the html module used in this tutorial. To fix this error, run the code in a virtual environment created with Python 3.
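Several of the errors above come from the local setup rather than from the APIs themselves. The following hypothetical pre-flight check, which is not part of the tutorial's sample code, reports a missing GOOGLE_CLOUD_PROJECT, a missing or nonexistent GOOGLE_APPLICATION_CREDENTIALS key file, and a glossary name that breaks the character rule described for the "400 Invalid resource name" error. It cannot detect a missing IAM role or a disabled API; those still surface as 403 errors.

```python
import os
import re
from typing import Mapping

# Lowercase letters, digits, periods, colons, and hyphens, per the
# "400 Invalid resource name" entry above.
GLOSSARY_NAME_PATTERN = re.compile(r"[a-z0-9.:-]+")


def preflight_problems(
    glossary_name: str, environ: Mapping[str, str] = os.environ
) -> list[str]:
    """Return human-readable setup problems; an empty list means none found."""
    problems = []
    if not environ.get("GOOGLE_CLOUD_PROJECT"):
        problems.append("GOOGLE_CLOUD_PROJECT is not set")

    key_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not os.path.isfile(key_path):
        problems.append(f"Key file not found: {key_path}")

    if not GLOSSARY_NAME_PATTERN.fullmatch(glossary_name):
        problems.append(f"Invalid glossary name: {glossary_name!r}")
    return problems
```

Calling preflight_problems("bistro-glossary") before creating the glossary turns these failures into a readable list instead of an exception partway through the run.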
Cleaning up
Use the Google Cloud console to delete your project if you do not need it. Deleting your project prevents incurring additional charges to your Cloud Billing account for the resources used in this tutorial.
Deleting your project
- In the Google Cloud console, go to the Projects page.
- In the project list, select the project you want to delete and click Delete.
- In the dialog box, type the project ID, and click Shut down to delete the project.
What's next
Congratulations! You just used Vision OCR to detect text in an image. Then, you created a Translation glossary and performed a translation with that glossary. Afterwards, you used Text-to-Speech to generate synthetic audio of the translated text.
To build on your knowledge of Vision, Cloud Translation, and Text-to-Speech:
- Make your own glossary. Learn how to create a Cloud Storage bucket and to upload your glossary CSV file to the bucket.
- Experiment with other ways to use Translation glossaries.
- Learn how to use Cloud Storage with Cloud Vision OCR.
- Learn more about how to use SSML with Text-to-Speech.
- Learn how to use the Vision API imageContext field to pass along additional context about a photo when using Vision OCR.
- Explore community tutorials.