A glossary is a custom dictionary the Cloud Translation API uses to consistently translate the customer's domain-specific terminology. This typically involves specifying how to translate a named entity.
Glossary use cases may involve:
- Product names: For example, "Google Home" must translate to "Google Home".
- Ambiguous words: For example, the word "bat" can mean a piece of sports equipment or an animal. If you know that you are translating words about sports, you might want to use a glossary to feed the Cloud Translation API the sports translation of "bat", not the translation for the animal.
- Borrowed words: For example, "bouillabaisse" in French translates to "bouillabaisse" in English. English borrowed the word "bouillabaisse" from French in the 19th century. An English speaker lacking French cultural context might not know that bouillabaisse is a fish stew dish. Glossaries can override a translation so that "bouillabaisse" in French translates to "fish stew" in English.
Before you begin
Before you can start using the Cloud Translation API, you must have a project that has the Cloud Translation API enabled and the appropriate credentials. You can also install client libraries for common programming languages to help you make calls to the API.
For more information, see the Setup page.
Required permissions
To work with glossaries, your service account requires glossary-specific permissions. You can grant a role to your service account by using one of the pre-defined IAM roles, such as Cloud Translation API Editor (roles/cloudtranslate.editor), or you can create a custom role that grants the necessary permissions. You can view all of the Cloud Translation API permissions in the IAM permissions reference. Translation permissions begin with cloudtranslate.
For creating glossaries, you also need permissions to read objects in the Cloud Storage bucket where your glossary file is located. You can grant a role to your service account by using one of the pre-defined IAM roles, such as Storage Object Viewer (roles/storage.objectViewer), or you can create a custom role that grants permissions to read objects.
For information on adding an account to a role, see Granting, changing, and revoking access to resources.
Create a glossary
The terms in a glossary can be single tokens (words) or short phrases (usually fewer than five words). The current limit for the number of separate glossaries is 1000 per project.
The main steps for using a glossary are:
- Create a glossary file
- Create the glossary resource with the Cloud Translation API
- Specify which glossary to use when you request a translation
A project can have multiple glossaries. You can get a list of the available glossaries and can delete glossaries that you no longer need.
Creating a glossary file
Fundamentally, a glossary is a text file in which each line contains corresponding terms in multiple languages. The Cloud Translation API supports both unidirectional glossaries, which specify the desired translation for a single pair of source and target languages, and equivalent term sets, which identify the equivalent terms in multiple languages.
The total size of a glossary input file can't exceed 10,485,760 UTF-8 bytes (10 MiB) for all terms in all the languages combined. Any single glossary term must be less than 1024 UTF-8 bytes. Terms longer than 1024 bytes are ignored.
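The limits above can be checked locally before uploading a file. The following is a minimal sketch, not the service's own validation: the function name is hypothetical, and it assumes that only the term text counts toward the limits (how separators and line endings are counted is not stated here). It flags any term of 1024 bytes or more, the conservative reading of the per-term limit.

```python
MAX_TOTAL_BYTES = 10_485_760  # limit for all terms in all languages combined
MAX_TERM_BYTES = 1024         # a single term must be smaller than this


def check_glossary_terms(rows):
    """Return (total_bytes, oversized_terms) for an iterable of term rows.

    Each row is a sequence of terms, one per language. Only the UTF-8
    encoded term text is counted (an assumption of this sketch).
    """
    total = 0
    oversized = []
    for row in rows:
        for term in row:
            size = len(term.encode("utf-8"))
            total += size
            if size >= MAX_TERM_BYTES:
                oversized.append(term)
    return total, oversized


rows = [("account", "cuenta"), ("directions", "indicaciones")]
total, oversized = check_glossary_terms(rows)
within_limit = total <= MAX_TOTAL_BYTES
```

Byte counts matter more than character counts for non-ASCII terms, because a single character can occupy several UTF-8 bytes.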
Unidirectional glossaries
The Cloud Translation API accepts TSV, CSV or TMX files.
Tab-separated values (TSV)
The Cloud Translation API supports tab-separated files, where each row has this format:
Term in source language [tab] Term in target language
For example:
account\tcuenta
directions\tindicaciones
The tab-separated source data does not include language codes to identify the source and target languages. You identify the source and target language codes when you create the glossary.
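A tab-separated glossary file can be generated from term pairs in code. This is a minimal sketch using Python's csv module; the function name is hypothetical, and the sketch assumes the terms themselves contain no tabs or line breaks, since the quoting rules the service applies are not described here.

```python
import csv
import io


def glossary_to_tsv(pairs):
    """Serialize (source_term, target_term) pairs as tab-separated rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for source_term, target_term in pairs:
        writer.writerow([source_term, target_term])
    return buffer.getvalue()


tsv_text = glossary_to_tsv([("account", "cuenta"), ("directions", "indicaciones")])
```

When saving the result, write it with encoding="utf-8" so that the byte limits described earlier are measured on the same encoding the file uses.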
Comma-separated values (CSV)
For each row in a CSV file, use a comma (,) to separate the source language term and target language term, as shown in the following example:
account,cuenta
directions,indicaciones
Do not include a header row in the CSV file to identify the source and target languages. You identify them when you create the glossary.
Translation Memory eXchange (TMX)
Translation Memory eXchange (TMX) is a standard XML format for providing source and target translations. The Cloud Translation API supports input files in a format based on TMX version 1.4. This example illustrates the required structure:
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
<header segtype="sentence" o-tmf="UTF-8"
adminlang="en" srclang="en" datatype="PlainText"/>
<body>
<tu>
<tuv xml:lang="en">
<seg>account</seg>
</tuv>
<tuv xml:lang="es">
<seg>cuenta</seg>
</tuv>
</tu>
<tu>
<tuv xml:lang="en">
<seg>directions</seg>
</tuv>
<tuv xml:lang="es">
<seg>indicaciones</seg>
</tuv>
</tu>
</body>
</tmx>
The <header> element of a well-formed TMX file must identify the source language using the srclang attribute, and every <tuv> element must identify the language of the contained text using the xml:lang attribute. You identify the source and target languages using their ISO-639-1 codes.
All <tu> elements must contain a pair of <tuv> elements with the same source and target languages. If a <tu> element contains more than two <tuv> elements, the Cloud Translation API processes only the first <tuv> matching the source language and the first matching the target language and ignores the rest. If a <tu> element does not have a matching pair of <tuv> elements, the Cloud Translation API skips over the invalid <tu> element.
The Cloud Translation API strips the markup tags from around a <seg> element before processing it. If a <tuv> element contains more than one <seg> element, the Cloud Translation API concatenates their text into a single element with a space between them.
If the file contains XML tags other than those shown above, the Cloud Translation API ignores them.
If the file does not conform to proper XML and TMX format (for example, if it is missing an end tag or a <tmx> element), the Cloud Translation API aborts processing it. The Cloud Translation API also aborts processing if it skips more than 1024 invalid <tu> elements.
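To preview which term pairs a TMX file would yield, the processing rules above can be approximated locally. The following is a sketch of those rules as described on this page, not the service's implementation; the function name is hypothetical, and details such as whitespace trimming are assumptions.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # expanded xml:lang
MAX_INVALID_TU = 1024


def tmx_pairs(tmx_text, source_lang, target_lang):
    """Extract (source, target) term pairs from TMX text."""
    root = ET.fromstring(tmx_text)  # raises ParseError on malformed XML
    pairs, invalid = [], 0
    for tu in root.iter("tu"):
        found = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG)
            # Keep only the first <tuv> for each of the two languages.
            if lang in (source_lang, target_lang) and lang not in found:
                # Drop inline markup and join multiple <seg> texts with a space.
                found[lang] = " ".join(
                    "".join(seg.itertext()).strip() for seg in tuv.findall("seg")
                )
        if source_lang in found and target_lang in found:
            pairs.append((found[source_lang], found[target_lang]))
        else:
            invalid += 1  # no matching pair: skip this <tu>
            if invalid > MAX_INVALID_TU:
                raise ValueError("too many invalid <tu> elements")
    return pairs
```

Calling tmx_pairs(text, "en", "es") on the example file shown earlier would be expected to return the account and directions pairs in document order.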
Equivalent term sets (CSV)
For equivalent term sets, the Cloud Translation API accepts only files in the CSV format. To define equivalent term sets, create a multi-column CSV file in which each row lists a single glossary term in multiple languages.
The first row in the file is a header row identifying the language for each column, using its ISO-639-1 or BCP-47 language code. You can also include optional columns for part of speech (pos) and a description (description).
Each subsequent row contains equivalent glossary terms in the languages identified in the header. You can leave columns blank if the term is not available in all languages.
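The layout described above can be produced with Python's csv module. This is a minimal sketch under stated assumptions: the function name is hypothetical, the optional pos and description columns are placed after the language columns (the required column order is not specified here), and the value "noun" is only an illustrative part-of-speech value.

```python
import csv
import io


def term_sets_to_csv(language_codes, rows, extra_columns=("pos", "description")):
    """Build a CSV with a language-code header row and one term set per row.

    Each row is a dict keyed by column name; missing keys become blank cells.
    """
    fieldnames = list(language_codes) + list(extra_columns)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = term_sets_to_csv(
    ["en", "es", "fr"],
    [
        {"en": "account", "es": "cuenta", "fr": "compte", "pos": "noun"},
        {"en": "directions", "es": "indicaciones"},  # no French term available
    ],
)
```

Leaving a cell blank, as in the second row, corresponds to a term that is not available in that language.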
Create a glossary resource
Once you have the equivalent glossary terms identified, make the glossary file available to the Cloud Translation API by creating a glossary resource.
Unidirectional glossary
When creating a unidirectional glossary, you must indicate the language pair (language_pair) by specifying the source language (source_language_code) and the target language (target_language_code). The following example uses the REST API and command line, but you can also use the client libraries to create a unidirectional glossary.
REST & CMD LINE
When you create a new glossary, you supply a glossary ID (a resource name). For example:
projects/my-project/locations/us-central1/glossaries/my-en-to-ru-glossary
where my-project is the project-number-or-id, and my-en-to-ru-glossary is the glossary-id provided by you.
Before using any of the request data below, make the following replacements:
- project-number-or-id: your Google Cloud project number or ID
- glossary-id: your glossary ID, for example, my_en_ru_glossary
- bucket-name: name of bucket where your glossary file is located
- glossary-filename: filename of your glossary
HTTP method and URL:
POST https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries
Request JSON body:
{
  "name": "projects/project-number-or-id/locations/us-central1/glossaries/glossary-id",
  "languagePair": {
    "sourceLanguageCode": "en",
    "targetLanguageCode": "ru"
  },
  "inputConfig": {
    "gcsSource": {
      "inputUri": "gs://bucket-name/glossary-filename"
    }
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file called request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries
PowerShell
Save the request body in a file called request.json, and execute the following command:
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "name": "projects/project-number/locations/us-central1/operations/operation-id",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.translation.v3beta1.CreateGlossaryMetadata",
    "name": "projects/project-number/locations/us-central1/glossaries/glossary-id",
    "state": "RUNNING",
    "submitTime": "2019-11-19T19:05:10.650047636Z"
  }
}
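The same request can be assembled programmatically instead of being edited by hand. The following minimal sketch builds the URL and JSON body shown above using only the standard library; the function name and the example values (my-project, my-en-to-ru-glossary) are placeholders, and the sketch does not send the request or handle authentication.

```python
import json


def create_glossary_request(project, glossary_id, source_lang, target_lang,
                            input_uri, location="us-central1"):
    """Return (url, body) for creating a unidirectional glossary."""
    parent = f"projects/{project}/locations/{location}"
    url = f"https://translation.googleapis.com/v3/{parent}/glossaries"
    body = {
        "name": f"{parent}/glossaries/{glossary_id}",
        "languagePair": {
            "sourceLanguageCode": source_lang,
            "targetLanguageCode": target_lang,
        },
        "inputConfig": {"gcsSource": {"inputUri": input_uri}},
    }
    return url, body


url, body = create_glossary_request(
    "my-project", "my-en-to-ru-glossary", "en", "ru",
    "gs://bucket-name/glossary-filename",
)
request_json = json.dumps(body, indent=2)  # text to save as request.json
```

Building the parent path once keeps the URL and the name field in the body consistent with each other.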
Equivalent term sets glossary
Once you have the glossary terms identified in your equivalent term set, make the glossary file available to the Cloud Translation API by creating a glossary resource.
REST & CMD LINE
Before using any of the request data below, make the following replacements:
- project-number-or-id: your Google Cloud project number or ID
- glossary-id: your glossary ID
- bucket-name: name of bucket where your glossary file is located
- glossary-filename: filename of your glossary
HTTP method and URL:
POST https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries
Request JSON body:
{
  "name": "projects/project-number-or-id/locations/us-central1/glossaries/glossary-id",
  "languageCodesSet": {
    "languageCodes": ["en", "en-GB", "ru", "fr", "pt-BR", "pt-PT", "es"]
  },
  "inputConfig": {
    "gcsSource": {
      "inputUri": "gs://bucket-name/glossary-filename"
    }
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file called request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries
PowerShell
Save the request body in a file called request.json, and execute the following command:
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "name": "projects/project-number/locations/us-central1/operations/20191103-09061569945989-5d937985-0000-21ac-816d-f4f5e80782d4",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.translation.v3beta1.CreateGlossaryMetadata",
    "name": "projects/project-number/locations/us-central1/glossaries/glossary-id",
    "state": "RUNNING",
    "submitTime": "2019-11-03T16:06:29.134496675Z"
  }
}
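For an equivalent term set, the languageCodes list in the request can be derived from the header row of the CSV file rather than typed by hand. This is a sketch only: the function names are hypothetical, and it assumes that the request should list the same language codes as the file header, which is a reasonable reading but is not stated explicitly on this page.

```python
import csv
import io

NON_LANGUAGE_COLUMNS = {"pos", "description"}  # optional columns, not languages


def language_codes_from_header(csv_text):
    """Read the header row and return the language-code columns in order."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [col.strip() for col in header if col.strip() not in NON_LANGUAGE_COLUMNS]


def term_set_glossary_body(name, csv_text, input_uri):
    """Build the create-glossary request body with a languageCodesSet."""
    return {
        "name": name,
        "languageCodesSet": {"languageCodes": language_codes_from_header(csv_text)},
        "inputConfig": {"gcsSource": {"inputUri": input_uri}},
    }
```

Deriving the list from the file avoids a mismatch between the header and the request when columns are added or removed later.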
Go
Before trying this sample, follow the Go setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Node.js API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Python API reference documentation.
Operation status
Creating a glossary resource is a long-running operation, so it may take a substantial amount of time to complete. You can poll the status of this operation to see if it has completed, or you can cancel the operation.
For more information, see Long-running operations.
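Polling can be sketched as a small loop. The following example is a minimal sketch, not a definitive client: the function name and the fetch_operation callback are hypothetical, and it assumes the operation resource exposes the done and error fields of the standard Google long-running operation format, retrieved with a GET request on the operation name returned by the create call.

```python
import time


def wait_for_operation(fetch_operation, operation_name,
                       interval_s=5.0, timeout_s=600.0):
    """Poll until the operation reports done, then return its final JSON.

    fetch_operation(name) must return the operation resource as a dict, for
    example by sending GET https://translation.googleapis.com/v3/{name}.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        operation = fetch_operation(operation_name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(operation["error"])
            return operation
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{operation_name} did not finish in {timeout_s}s")
        time.sleep(interval_s)
```

Passing the fetch step in as a callback keeps the loop independent of how you authenticate and send HTTP requests.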
Use glossaries
Translate text with a glossary
In Cloud Translation - Advanced, you explicitly specify which translation model to use for translating the text. You can also identify a glossary to use for domain-specific terminology.
REST & CMD LINE
This example translates text using the default NMT model and a glossary.
Before using any of the request data below, make the following replacements:
- project-number-or-id: your Google Cloud project number or ID
- glossary-id: your glossary ID, for example, my-en-ru-glossary
HTTP method and URL:
POST https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1:translateText
Request JSON body:
{
  "sourceLanguageCode": "en",
  "targetLanguageCode": "ru",
  "contents": ["Dr. Watson, please discard your trash. You've shared unsolicited email with me. Let's talk about spam and importance ranking in a confidential mode."],
  "glossaryConfig": {
    "glossary": "projects/project-number-or-id/locations/us-central1/glossaries/glossary-id"
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file called request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1:translateText
PowerShell
Save the request body in a file called request.json, and execute the following command:
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1:translateText" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "glossaryTranslations": {
    "translatedText": "Доктор Ватсон, пожалуйста, откажитесь от своего мусора. Вы поделились нежелательной электронной почтой со я . Давайте поговорим о спаме и важности рейтинга в конфиденциальном режиме.",
    "glossaryConfig": {
      "glossary": "projects/project-number/locations/us-central1/glossaries/glossary-id"
    }
  },
  "translations": {
    "translatedText": "Доктор Ватсон, пожалуйста, откажитесь от своего мусора. Вы поделились нежелательной электронной почтой со мной. Давайте поговорим о спаме и важности рейтинга в конфиденциальном режиме."
  }
}
The translations field contains the regular machine translation before the glossary is applied; the glossaryTranslations field contains the translation after the glossary is applied.
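The request and the choice between the two response fields can be sketched in a few lines. The function names below are hypothetical. The sketch sends contents as a one-element list, on the assumption that the API treats it as a repeated field, and it accepts either a single object or a list of results for each response field, because the exact shape may differ from the sample above.

```python
def translate_with_glossary_body(text, source_lang, target_lang, glossary_name):
    """Build the translateText request body that applies a glossary."""
    return {
        "sourceLanguageCode": source_lang,
        "targetLanguageCode": target_lang,
        "contents": [text],
        "glossaryConfig": {"glossary": glossary_name},
    }


def preferred_translation(response):
    """Return the glossary translation if present, else the plain one."""
    for field in ("glossaryTranslations", "translations"):
        entry = response.get(field)
        if isinstance(entry, list):  # tolerate a list of results
            entry = entry[0] if entry else None
        if entry:
            return entry["translatedText"]
    return None
```

Preferring glossaryTranslations and falling back to translations gives you the glossary-adjusted text whenever the service returns one.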
Go
Before trying this sample, follow the Go setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Node.js API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Python API reference documentation.
Get information about a glossary
REST & CMD LINE
Before using any of the request data below, make the following replacements:
- project-number-or-id: your Google Cloud project number or ID
- glossary-id: your glossary ID, for example, "my-en-to-ru-glossary"
HTTP method and URL:
GET https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries/glossary-id
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries/glossary-id
PowerShell
Execute the following command:
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries/glossary-id" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "name": "projects/project-number/locations/us-central1/glossaries/glossary-id",
  "languagePair": {
    "sourceLanguageCode": "en",
    "targetLanguageCode": "ru"
  },
  "inputConfig": {
    "gcsSource": {
      "inputUri": "gs://bucket-name/glossary-file-name"
    }
  },
  "entryCount": 9603
}
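The GET request above can also be built with Python's standard library. This is a minimal sketch: the function name is hypothetical, ACCESS_TOKEN stands in for a token such as the one printed by the gcloud command used in the curl example, and the request is constructed but not sent.

```python
import urllib.request


def get_glossary_request(project, glossary_id, access_token,
                         location="us-central1"):
    """Build (but do not send) the GET request for one glossary."""
    url = (
        "https://translation.googleapis.com/v3/"
        f"projects/{project}/locations/{location}/glossaries/{glossary_id}"
    )
    return urllib.request.Request(
        url, method="GET", headers={"Authorization": f"Bearer {access_token}"}
    )


request = get_glossary_request("my-project", "my-en-to-ru-glossary", "ACCESS_TOKEN")
# To send it: json.load(urllib.request.urlopen(request))
```

The parsed response would then expose fields such as entryCount, as in the sample response.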
Go
Before trying this sample, follow the Go setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Node.js API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Python API reference documentation.
List glossaries
A project can include numerous glossaries. This section describes how to retrieve a list of the available glossaries for a particular project.
REST & CMD LINE
This example lists all glossaries associated with the specified project.
Before using any of the request data below, make the following replacements:
- project-number-or-id: your Google Cloud project number or ID
HTTP method and URL:
GET https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries
PowerShell
Execute the following command:
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "glossaries": [
    {
      "name": "projects/project-number/locations/us-central1/glossaries/glossary-id",
      "languagePair": {
        "sourceLanguageCode": "en",
        "targetLanguageCode": "ru"
      },
      "inputConfig": {
        "gcsSource": {
          "inputUri": "gs://bucket-name/glossary-file-name"
        }
      },
      "entryCount": 9603
    },
    ...
  ]
}
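When a project has many glossaries, the list may arrive in several pages. The following sketch iterates over all of them. It assumes the response carries a nextPageToken field in the usual Google API list style, which the sample response above does not show; the function name and the fetch_page callback are hypothetical.

```python
def iter_glossaries(fetch_page):
    """Yield every glossary, following nextPageToken across pages.

    fetch_page(page_token) must return one parsed list response as a dict;
    page_token is None for the first page.
    """
    page_token = None
    while True:
        page = fetch_page(page_token)
        yield from page.get("glossaries", [])
        page_token = page.get("nextPageToken")
        if not page_token:
            return
```

If the service returns everything in a single response, the loop simply ends after the first page.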
Go
Before trying this sample, follow the Go setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Node.js API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Python API reference documentation.
Delete a glossary
The following example deletes a glossary.
REST & CMD LINE
Before using any of the request data below, make the following replacements:
- project-number-or-id: your Google Cloud project number or ID
- glossary-id: your glossary ID
HTTP method and URL:
DELETE https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries/glossary-id
To send your request, choose one of these options:
curl
Execute the following command:
curl -X DELETE \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries/glossary-id
PowerShell
Execute the following command:
$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method DELETE `
-Headers $headers `
-Uri "https://translation.googleapis.com/v3/projects/project-number-or-id/locations/us-central1/glossaries/glossary-id" | Select-Object -Expand Content
You should receive a successful status code (2xx) and an empty response.
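Because deletion cannot easily be undone, it can help to validate the resource name before building the request. This is a minimal sketch with a hypothetical function name; the check is a simple guard against passing a project or location path by mistake, and the request is constructed but not sent.

```python
import urllib.request


def delete_glossary_request(glossary_name, access_token):
    """Build (but do not send) a DELETE request for a full glossary name."""
    if "/glossaries/" not in glossary_name:
        raise ValueError(f"not a glossary resource name: {glossary_name!r}")
    return urllib.request.Request(
        f"https://translation.googleapis.com/v3/{glossary_name}",
        method="DELETE",
        headers={"Authorization": f"Bearer {access_token}"},
    )


request = delete_glossary_request(
    "projects/my-project/locations/us-central1/glossaries/my-en-to-ru-glossary",
    "ACCESS_TOKEN",
)
```

Sending the request with urllib.request.urlopen raises an HTTPError for non-2xx status codes, so a normal return indicates success.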
Go
Before trying this sample, follow the Go setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Go API reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Java API reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Node.js API reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Python API reference documentation.