Using glossaries (v3beta1)

A glossary is a custom dictionary the Cloud Translation API uses to consistently translate the customer's domain-specific terminology. This typically involves specifying how to translate a named entity. For example, provide a translation for "Google Summer of Code," "Gmail confidential mode," or "placement performance report." Glossaries can be used with AutoML models or the Google model.

Using glossaries

The terms in a glossary can be single tokens (words) or short phrases (usually fewer than five words). The current limit is 1000 glossaries per project.

The main steps for using a glossary are:

  1. Create a glossary file
  2. Create the glossary resource with the Cloud Translation API
  3. Specify which glossary to use when you request a translation

A project can have multiple glossaries. You can get a list of the available glossaries and can delete glossaries you no longer need.

Creating a glossary file

Fundamentally, a glossary is a text file in which each line contains corresponding terms in multiple languages. The Cloud Translation API supports both unidirectional glossaries, which specify the desired translation for a single pair of source and target languages, and equivalent term sets, which identify the equivalent terms in multiple languages.

Tab-separated values (.tsv)

The Cloud Translation API supports tab-separated files, where each row has this format:

  • Term in source language tab Term in target language

For example:

account\tcuenta
directions\tindicaciones

The tab-separated source data does not include language codes to identify the source and target languages. You identify the source and target language codes when you create the online glossary.
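
As a minimal sketch of the format described above, the following Python snippet produces a two-column tab-separated glossary from a list of term pairs. The helper name `format_tsv_glossary` is hypothetical and not part of the Cloud Translation API. The check for embedded tabs and newlines is an assumption about what would break the two-column layout, not a documented validation rule.

```python
def format_tsv_glossary(pairs):
    """Return TSV text with one 'source<TAB>target' row per term pair."""
    lines = []
    for source_term, target_term in pairs:
        for term in (source_term, target_term):
            # Assumption: a tab or newline inside a term would corrupt the row.
            if "\t" in term or "\n" in term:
                raise ValueError("term contains a tab or newline: {!r}".format(term))
        lines.append("{}\t{}".format(source_term, target_term))
    return "\n".join(lines) + "\n"


tsv_text = format_tsv_glossary([
    ("account", "cuenta"),
    ("directions", "indicaciones"),
])
print(tsv_text, end="")
```

The resulting text can be saved as a .tsv file and uploaded to Cloud Storage. The language codes are still supplied separately when the glossary resource is created.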

Translation Memory eXchange (.tmx)

Translation Memory eXchange (TMX) is a standard XML format for providing source and target translations. The Cloud Translation API supports input files in a format based on TMX version 1.4. This example illustrates the required structure:

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
  <header segtype="sentence" o-tmf="UTF-8"
  adminlang="en" srclang="en" datatype="PlainText"/>
  <body>
    <tu>
      <tuv xml:lang="en">
        <seg>account</seg>
      </tuv>
      <tuv xml:lang="es">
        <seg>cuenta</seg>
      </tuv>
    </tu>
    <tu>
      <tuv xml:lang="en">
        <seg>directions</seg>
      </tuv>
      <tuv xml:lang="es">
        <seg>indicaciones</seg>
      </tuv>
    </tu>
  </body>
</tmx>

The <header> element of a well-formed .tmx file must identify the source language using the srclang attribute, and every <tuv> element must identify the language of the contained text using the xml:lang attribute. You identify the source and target languages using their ISO-639-1 codes.

All <tu> elements must contain a pair of <tuv> elements with the same source and target languages. If a <tu> element contains more than two <tuv> elements, the Cloud Translation API processes only the first <tuv> matching the source language and the first matching the target language and ignores the rest. If a <tu> element does not have a matching pair of <tuv> elements, the Cloud Translation API skips over the invalid <tu> element.

The Cloud Translation API strips the markup tags from around a <seg> element before processing it. If a <tuv> element contains more than one <seg> element, the Cloud Translation API concatenates their text into a single element with a space between them.

If the file contains XML tags other than those shown above, the Cloud Translation API ignores them.

If the file does not conform to proper XML and TMX format – for example, if it is missing an end tag or a <tmx> element – the Cloud Translation API aborts processing it. The Cloud Translation API also aborts processing if it skips more than 1024 invalid <tu> elements.
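
The processing rules above can be illustrated with a short Python sketch that reads a TMX document using the standard library. This is only a local approximation for checking a file before upload, not the Cloud Translation API's actual implementation. The function name `extract_pairs` and the sample data are hypothetical, and flattening inline markup with `itertext()` is an assumption about what stripping markup tags means.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_pairs(tmx_text, target_lang):
    """Return (pairs, skipped) following the rules described above."""
    root = ET.fromstring(tmx_text)  # raises ParseError on malformed XML
    header = root.find("header")
    if header is None or not header.get("srclang"):
        raise ValueError("the <header> element must set srclang")
    source_lang = header.get("srclang")

    pairs, skipped = [], 0
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG)
            # Keep only the first <tuv> for each of the two languages.
            if lang in (source_lang, target_lang) and lang not in texts:
                segs = ["".join(seg.itertext()) for seg in tuv.findall("seg")]
                texts[lang] = " ".join(segs)  # join multiple <seg> with a space
        if source_lang in texts and target_lang in texts:
            pairs.append((texts[source_lang], texts[target_lang]))
        else:
            skipped += 1  # no matching pair: this <tu> is invalid
    return pairs, skipped


SAMPLE = """<tmx version="1.4">
  <header segtype="sentence" o-tmf="UTF-8" adminlang="en" srclang="en" datatype="PlainText"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>account</seg></tuv>
      <tuv xml:lang="es"><seg>cuenta</seg></tuv>
      <tuv xml:lang="es"><seg>extra entry</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>directions</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs, skipped = extract_pairs(SAMPLE, "es")
print(pairs, skipped)
```

In the sample, the third <tuv> of the first <tu> is ignored and the second <tu> is counted as skipped because it has no Spanish entry.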

Equivalent term sets (.csv)

To define equivalent term sets, you create a multi-column CSV file in which each row lists a single glossary term in multiple languages.

The first row in the file is a header row identifying the language for each column, using its ISO-639-1 or BCP-47 language code. You can also include optional columns for part of speech (pos) and a description (description).

Each subsequent row contains equivalent glossary terms in the languages identified in the header. You can leave columns blank if the term is not available in all languages.
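
A minimal sketch of building such a file with Python's csv module follows. The helper name `format_term_sets`, the sample terms, and the choice to place the pos and description columns last are assumptions made for illustration. The text above only states that those columns are optional.

```python
import csv
import io


def format_term_sets(language_codes, rows):
    """Return CSV text: a header of language codes, then one row per term."""
    header = list(language_codes) + ["pos", "description"]
    buffer = io.StringIO()
    # restval="" leaves a column blank when a term has no entry for it.
    writer = csv.DictWriter(
        buffer, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


csv_text = format_term_sets(
    ["en", "es", "fr"],
    [
        {"en": "account", "es": "cuenta", "fr": "compte", "pos": "noun",
         "description": "A user's account."},
        # No French term is available here, so that column stays blank.
        {"en": "directions", "es": "indicaciones", "pos": "noun"},
    ],
)
print(csv_text, end="")
```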

[Figure: example of an equivalent term set]

Creating a glossary resource

Once you have the equivalent glossary terms identified, you make the glossary available to the Cloud Translation API by creating the online glossary.
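
The two kinds of glossary differ in how the languages are declared in the request body: a unidirectional glossary carries a language pair, while an equivalent term set carries a set of language codes. The Python sketch below assembles either body as a plain dictionary. The function name `build_glossary_body` is hypothetical, and the camelCase field names are taken from the JSON and Node.js samples on this page. Treat it as an illustration of the shape rather than a definitive client.

```python
import json


def build_glossary_body(name, input_uri, language_codes, equivalent_term_set=False):
    """Build a glossary resource body for a create request."""
    body = {
        "name": name,
        "inputConfig": {"gcsSource": {"inputUri": input_uri}},
    }
    if equivalent_term_set:
        # Equivalent term sets list every language in the glossary file.
        body["languageCodesSet"] = {"languageCodes": list(language_codes)}
    else:
        # Unidirectional glossaries name exactly one source and one target.
        if len(language_codes) != 2:
            raise ValueError("a unidirectional glossary needs two language codes")
        source, target = language_codes
        body["languagePair"] = {
            "sourceLanguageCode": source,
            "targetLanguageCode": target,
        }
    return body


glossary_name = "projects/project-id/locations/us-central1/glossaries/glossary-id"
pair_body = build_glossary_body(
    glossary_name, "gs://bucket-name/glossary-file-name", ["en", "ru"])
set_body = build_glossary_body(
    glossary_name, "gs://bucket-name/glossary-file-name", ["en", "es"],
    equivalent_term_set=True)
print(json.dumps(pair_body, indent=2))
```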

curl command

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
         name: 'projects/project-id/locations/us-central1/glossaries/glossary-id',
         language_pair: {
           source_language_code: 'en',
           target_language_code: 'ru'
         },
         input_config: {
              gcs_source: {
                 input_uri: 'gs://bucket-name/glossary-file-name'
              }
         }
  }" "https://translation.googleapis.com/v3beta1/projects/project-id/locations/us-central1/glossaries"

You should see output similar to the following.

{
  "name": "projects/project-id/locations/us-central1/operations/20190130-10291548872956-5c51de8a-0000-2f98-8884-001a114a77aa",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.translation.v3beta1.CreateGlossaryMetadata",
    "name": "projects/project-id/locations/us-central1/glossaries/glossary-id",
    "state": "RUNNING",
    "submitTime": "2019-01-30T18:29:16.918633642Z"
  }
}

You can check the status of creating a glossary using the operation ID from the response. In the command below, replace operation-id with the operation ID returned by the request.

curl -X GET \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  https://translation.googleapis.com/v3beta1/projects/project-id/locations/us-central1/operations/operation-id

You should see output similar to the following for a create glossary operation:

{
  "name": "projects/project-id/locations/us-central1/operations/20190130-10291548872956-5c51de8a-0000-2f98-8884-001a114a77aa",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.translation.v3beta1.CreateGlossaryMetadata",
    "name": "projects/project-id/locations/us-central1/glossaries/glossary-id",
    "state": "SUCCEEDED",
    "submitTime": "2019-01-30T18:29:16.918633642Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.translation.v3beta1.Glossary",
    "name": "projects/project-id/locations/us-central1/glossaries/glossary-id",
    "languagePair": {
      "sourceLanguageCode": "en",
      "targetLanguageCode": "ru"
    },
    "inputConfig": {
      "gcsSource": {
        "inputUri": "gs://bucket-name/glossary-file-name"
      }
    },
    "entryCount": 9603
  }
}
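
When polling, the two responses above can be told apart by the done field and the state in the metadata. The following Python sketch interprets an operation that has already been decoded from JSON. The function name `summarize_operation` is hypothetical, and the handling of an error field is an assumption based on the general shape of long-running operations. No error response is shown on this page.

```python
def summarize_operation(operation):
    """Return (state, result) for a decoded long-running operation."""
    state = operation.get("metadata", {}).get("state", "UNKNOWN")
    if not operation.get("done", False):
        return state, None  # still running: poll again later
    if "error" in operation:
        # Assumption: a failed operation carries an "error" object.
        return state, operation["error"]
    return state, operation.get("response")


# Abbreviated versions of the two sample responses shown above.
running = {
    "name": "projects/project-id/locations/us-central1/operations/operation-id",
    "metadata": {"state": "RUNNING"},
}
finished = {
    "name": "projects/project-id/locations/us-central1/operations/operation-id",
    "metadata": {"state": "SUCCEEDED"},
    "done": True,
    "response": {
        "name": "projects/project-id/locations/us-central1/glossaries/glossary-id",
        "entryCount": 9603,
    },
}

print(summarize_operation(running))
print(summarize_operation(finished))
```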

Java

Before trying this sample, follow the Java setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Java API reference documentation.

static Glossary createGlossary(String projectId, String location, String name, String gcsUri) {
  try (TranslationServiceClient translationServiceClient = TranslationServiceClient.create()) {

    LocationName locationName =
        LocationName.newBuilder().setProject(projectId).setLocation(location).build();
    LanguageCodesSet languageCodesSet =
        LanguageCodesSet.newBuilder().addLanguageCodes("en").addLanguageCodes("es").build();
    GcsSource gcsSource = GcsSource.newBuilder().setInputUri(gcsUri).build();
    GlossaryInputConfig glossaryInputConfig =
        GlossaryInputConfig.newBuilder().setGcsSource(gcsSource).build();
    GlossaryName glossaryName =
        GlossaryName.newBuilder()
            .setProject(projectId)
            .setLocation(location)
            .setGlossary(name)
            .build();
    Glossary glossary =
        Glossary.newBuilder()
            .setLanguageCodesSet(languageCodesSet)
            .setInputConfig(glossaryInputConfig)
            .setName(glossaryName.toString())
            .build();
    CreateGlossaryRequest request =
        CreateGlossaryRequest.newBuilder()
            .setParent(locationName.toString())
            .setGlossary(glossary)
            .build();

    // Call the API
    Glossary response =
        translationServiceClient.createGlossaryAsync(request).get(300, TimeUnit.SECONDS);
    System.out.format("Created: %s\n", response.getName());
    System.out.format(
        "Input Uri: %s\n", response.getInputConfig().getGcsSource().getInputUri());
    return response;

  } catch (Exception e) {
    throw new RuntimeException("Couldn't create client.", e);
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Node.js API reference documentation.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1'; // The location of the glossary
// const glossaryId = 'YOUR_GLOSSARY_ID';

// Imports the Google Cloud Translation library
const {TranslationServiceClient} = require('@google-cloud/translate').v3beta1;

// Instantiates a client
const translationClient = new TranslationServiceClient();

async function createGlossary() {
  // Construct glossary
  const glossary = {
    languageCodesSet: {
      languageCodes: ['en', 'es'],
    },
    inputConfig: {
      gcsSource: {
        inputUri: 'gs://cloud-samples-data/translation/glossary.csv',
      },
    },
    name: translationClient.glossaryPath(projectId, location, glossaryId),
  };

  // Construct request
  const request = {
    parent: translationClient.locationPath(projectId, location),
    glossary: glossary,
  };

  // Create glossary using a long-running operation.
  // You can wait for now, or get results later.
  const [operation] = await translationClient.createGlossary(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();

  console.log(`Created glossary: ${response.name}`);
  console.log(`InputUri ${request.glossary.inputConfig.gcsSource.inputUri}`);
}

createGlossary();

Python

Before trying this sample, follow the Python setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Python API reference documentation.

from google.cloud import translate_v3beta1 as translate
client = translate.TranslationServiceClient()

# project_id = 'YOUR_PROJECT_ID'
# glossary_id = 'glossary-id'
location = 'us-central1'  # The location of the glossary

name = client.glossary_path(
    project_id,
    location,
    glossary_id)

language_codes_set = translate.types.Glossary.LanguageCodesSet(
    language_codes=['en', 'es'])

gcs_source = translate.types.GcsSource(
    input_uri='gs://cloud-samples-data/translation/glossary.csv')

input_config = translate.types.GlossaryInputConfig(
    gcs_source=gcs_source)

glossary = translate.types.Glossary(
    name=name,
    language_codes_set=language_codes_set,
    input_config=input_config)

parent = client.location_path(project_id, location)

operation = client.create_glossary(parent=parent, glossary=glossary)

result = operation.result(timeout=90)
print('Created: {}'.format(result.name))
print('Input Uri: {}'.format(result.input_config.gcs_source.input_uri))

Translating text with a glossary

In Cloud Translation v3, you explicitly specify which translation model to use for translating the text. You can also identify a glossary to use for domain-specific terminology.

curl command

This example translates text using the default NMT model and a glossary. Replace project-id and glossary-id with the IDs for your project and glossary.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
         source_language_code: 'en',
         target_language_code: 'ru',
         contents: 'Dr. Watson, please discard your trash. You\'ve shared unsolicited email with me. \
                   Let\'s talk about spam and importance ranking in a confidential mode.',
         glossary_config: {
           glossary: 'projects/project-id/locations/us-central1/glossaries/glossary-id'
         }
     }" "https://translation.googleapis.com/v3beta1/projects/project-id/locations/us-central1:translateText"

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "glossaryTranslations": {
    "translatedText": "Доктор Ватсон, пожалуйста, откажитесь от своего мусора. Вы поделились нежелательной электронной почтой со я . Давайте поговорим о спаме и важности рейтинга в конфиденциальном режиме.",
    "glossaryConfig": {
      "glossary": "projects/project-id/locations/us-central1/glossaries/glossary-id"
    }
  },
  "translations": {
    "translatedText": "Доктор Ватсон, пожалуйста, откажитесь от своего мусора. Вы поделились нежелательной электронной почтой со мной. Давайте поговорим о спаме и важности рейтинга в конфиденциальном режиме."
  }
}

The translations field contains the regular machine translation before the glossary is applied; the glossaryTranslations field contains the translation after the glossary is applied.
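
As a small sketch of how a caller might use the two fields, the Python function below prefers the glossary-adjusted text and falls back to the regular machine translation when no glossary result is present. The function name `preferred_translations` is hypothetical. Accepting either a single object (as in the sample response above) or a list of objects is an assumption made so the sketch tolerates both shapes.

```python
def preferred_translations(response):
    """Return translated strings, preferring glossary-adjusted results."""

    def as_list(value):
        # Assumption: the field may be one object or a list of objects.
        if value is None:
            return []
        return value if isinstance(value, list) else [value]

    chosen = (as_list(response.get("glossaryTranslations"))
              or as_list(response.get("translations")))
    return [item["translatedText"] for item in chosen]


with_glossary = {
    "glossaryTranslations": {"translatedText": "text after the glossary"},
    "translations": {"translatedText": "text before the glossary"},
}
without_glossary = {
    "translations": [{"translatedText": "text before the glossary"}],
}

print(preferred_translations(with_glossary))
print(preferred_translations(without_glossary))
```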

Java

Before trying this sample, follow the Java setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Java API reference documentation.

static TranslateTextResponse translateTextWithGlossary(
    String projectId,
    String location,
    String name,
    String text,
    String sourceLanguageCode,
    String targetLanguageCode) {
  try (TranslationServiceClient translationServiceClient = TranslationServiceClient.create()) {

    LocationName locationName =
        LocationName.newBuilder().setProject(projectId).setLocation(location).build();
    GlossaryName glossaryName =
        GlossaryName.newBuilder()
            .setProject(projectId)
            .setLocation(location)
            .setGlossary(name)
            .build();
    TranslateTextGlossaryConfig translateTextGlossaryConfig =
        TranslateTextGlossaryConfig.newBuilder().setGlossary(glossaryName.toString()).build();
    TranslateTextRequest translateTextRequest =
        TranslateTextRequest.newBuilder()
            .setParent(locationName.toString())
            .setMimeType("text/plain")
            .setSourceLanguageCode(sourceLanguageCode)
            .setTargetLanguageCode(targetLanguageCode)
            .addContents(text)
            .setGlossaryConfig(translateTextGlossaryConfig)
            .build();

    // Call the API
    TranslateTextResponse response = translationServiceClient.translateText(translateTextRequest);
    // The glossary translations hold the result after the glossary is applied.
    System.out.format(
        "Translated text: %s",
        response.getGlossaryTranslationsList().get(0).getTranslatedText());
    return response;

  } catch (Exception e) {
    throw new RuntimeException("Couldn't create client.", e);
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Node.js API reference documentation.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
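// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1'; // The location of the glossary
// const glossaryId = 'YOUR_GLOSSARY_ID';
// const text = 'Text you wish to translate';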

// Imports the Google Cloud Translation library
const {TranslationServiceClient} = require('@google-cloud/translate').v3beta1;

// Instantiates a client
const translationClient = new TranslationServiceClient();

async function translateTextWithGlossary() {
  const glossary = translationClient.glossaryPath(
    projectId,
    location,
    glossaryId
  );
  const glossaryConfig = {
    glossary: glossary,
  };
  // Construct request
  const request = {
    parent: translationClient.locationPath(projectId, location),
    contents: [text],
    mimeType: 'text/plain', // mime types: text/plain, text/html
    sourceLanguageCode: 'en-US',
    targetLanguageCode: 'es',
    glossaryConfig: glossaryConfig,
  };

  // Run request
  const [response] = await translationClient.translateText(request);

  // glossaryTranslations holds the translations after the glossary is applied.
  for (const translation of response.glossaryTranslations) {
    console.log(`Translation: ${translation.translatedText}`);
  }
}

translateTextWithGlossary();

Python

Before trying this sample, follow the Python setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Python API reference documentation.

from google.cloud import translate_v3beta1 as translate
client = translate.TranslationServiceClient()

# project_id = 'YOUR_PROJECT_ID'
# glossary_id = 'GLOSSARY_ID'
# text = 'Text you wish to translate'
location = 'us-central1'  # The location of the glossary

glossary = client.glossary_path(
    project_id,
    location,
    glossary_id)

glossary_config = translate.types.TranslateTextGlossaryConfig(
    glossary=glossary)

parent = client.location_path(project_id, location)

result = client.translate_text(
    parent=parent,
    contents=[text],
    mime_type='text/plain',  # mime types: text/plain, text/html
    source_language_code='en',
    target_language_code='es',
    glossary_config=glossary_config)

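# glossary_translations holds the translations after the glossary is applied.
for translation in result.glossary_translations:
    print('Translated text: {}'.format(translation.translated_text))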

Getting information about a glossary

curl command

curl -X GET \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  "https://translation.googleapis.com/v3beta1/projects/project-id/locations/us-central1/glossaries/glossary-id"

You should see output similar to the following:

{
  "name": "projects/project-id/locations/us-central1/glossaries/glossary-id",
  "languagePair": {
    "sourceLanguageCode": "en",
    "targetLanguageCode": "ru"
  },
  "inputConfig": {
    "gcsSource": {
      "inputUri": "gs://bucket-name/glossary-file-name"
    }
  },
  "entryCount": 9603
}
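
The name field in this response follows the same projects/…/locations/…/glossaries/… pattern used in the request URL. As a hedged sketch, the Python helper below splits such a name into its parts, which can be handy when a script needs the location or glossary ID from a returned resource. The function name `parse_glossary_name` and the regular expression are assumptions for illustration. They are not part of the client libraries.

```python
import re

# Pattern inferred from the resource names shown on this page.
_GLOSSARY_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/glossaries/(?P<glossary>[^/]+)$"
)


def parse_glossary_name(name):
    """Split a glossary resource name into project, location, and glossary."""
    match = _GLOSSARY_NAME.match(name)
    if match is None:
        raise ValueError("not a glossary resource name: {!r}".format(name))
    return match.groupdict()


parts = parse_glossary_name(
    "projects/project-id/locations/us-central1/glossaries/glossary-id")
print(parts)
```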

Java

Before trying this sample, follow the Java setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Java API reference documentation.

static Glossary getGlossary(String projectId, String location, String name) {
  try (TranslationServiceClient translationServiceClient = TranslationServiceClient.create()) {

    GlossaryName glossaryName =
        GlossaryName.newBuilder()
            .setProject(projectId)
            .setLocation(location)
            .setGlossary(name)
            .build();

    // Call the API
    Glossary response = translationServiceClient.getGlossary(glossaryName.toString());
    System.out.format("Got: %s\n", response.getName());
    return response;

  } catch (Exception e) {
    throw new RuntimeException("Couldn't create client.", e);
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Node.js API reference documentation.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1'; // The location of the glossary
// const glossaryId = 'YOUR_GLOSSARY_ID';

// Imports the Google Cloud Translation library
const {TranslationServiceClient} = require('@google-cloud/translate').v3beta1;

// Instantiates a client
const translationClient = new TranslationServiceClient();

async function getGlossary() {
  // Construct request
  const name = translationClient.glossaryPath(
    projectId,
    location,
    glossaryId
  );
  const request = {
    name: name,
  };

  // Get glossary
  const [response] = await translationClient.getGlossary(request);

  console.log(`Got glossary: ${response.name}`);
}

getGlossary();

Python

Before trying this sample, follow the Python setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Python API reference documentation.

from google.cloud import translate_v3beta1 as translate
client = translate.TranslationServiceClient()

# project_id = 'YOUR_PROJECT_ID'
# glossary_id = 'GLOSSARY_ID'

# The full resource name of the glossary to retrieve
name = client.glossary_path(
    project_id,
    'us-central1',  # The location of the glossary
    glossary_id)

response = client.get_glossary(name)
print('Name: {}'.format(response.name))
print('Language Pair:')
print('\tSource Language Code: {}'.format(
    response.language_pair.source_language_code))
print('\tTarget Language Code: {}'.format(
    response.language_pair.target_language_code))
print('Input Uri: {}'.format(
    response.input_config.gcs_source.input_uri))

Listing glossaries

A project can include numerous glossaries. This section describes how to retrieve a list of the available glossaries for a project.

Java

Before trying this sample, follow the Java setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Java API reference documentation.

static ListGlossariesPagedResponse listGlossary(
    String projectId, String location, String filter) {

  try (TranslationServiceClient translationServiceClient = TranslationServiceClient.create()) {

    LocationName locationName =
        LocationName.newBuilder().setProject(projectId).setLocation(location).build();

    // Call the API
    ListGlossariesPagedResponse response =
        translationServiceClient.listGlossaries(locationName.toString(), filter);

    // Print each glossary in the paged response
    for (Glossary element : response.iterateAll()) {
      System.out.format("Name: %s\n", element.getName());
      System.out.format("Language Codes Set:\n");
      System.out.format(
          "Source Language Code: %s\n",
          element.getLanguageCodesSet().getLanguageCodesList().get(0));
      System.out.format(
          "Target Language Code: %s\n",
          element.getLanguageCodesSet().getLanguageCodesList().get(1));
      System.out.format(
          "Input Uri: %s\n", element.getInputConfig().getGcsSource().getInputUri());
    }
    return response;
  } catch (Exception e) {
    throw new RuntimeException("Couldn't create client.", e);
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Node.js API reference documentation.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1'; // The location of the glossaries

// Imports the Google Cloud Translation library
const {TranslationServiceClient} = require('@google-cloud/translate').v3beta1;

// Instantiates a client
const translationClient = new TranslationServiceClient();

async function listGlossaries() {
  // Construct request
  const request = {
    parent: translationClient.locationPath(projectId, location),
  };

  // Run request
  const [response] = await translationClient.listGlossaries(request);

  for (const glossary of response) {
    console.log(`Name: ${glossary.name}`);
    console.log(`Entry count: ${glossary.entryCount}`);
    console.log(`Input uri: ${glossary.inputConfig.gcsSource.inputUri}`);
    for (const languageCode of glossary.languageCodesSet.languageCodes) {
      console.log(`Language code: ${languageCode}`);
    }
  }
}

listGlossaries();

Python

Before trying this sample, follow the Python setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Python API reference documentation.

from google.cloud import translate_v3beta1 as translate
client = translate.TranslationServiceClient()

# project_id = 'YOUR_PROJECT_ID'
location = 'us-central1'  # The location of the glossary

parent = client.location_path(project_id, location)

for glossary in client.list_glossaries(parent):
    print('Name: {}'.format(glossary.name))
    print('Entry count: {}'.format(glossary.entry_count))
    print('Input uri: {}'.format(
        glossary.input_config.gcs_source.input_uri))
    for language_code in glossary.language_codes_set.language_codes:
        print('Language code: {}'.format(language_code))

Deleting a glossary

The following example deletes a glossary.

curl command

curl -X DELETE \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  "https://translation.googleapis.com/v3beta1/projects/project-id/locations/us-central1/glossaries/glossary-id"

Java

Before trying this sample, follow the Java setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Java API reference documentation.

static DeleteGlossaryResponse deleteGlossary(String projectId, String location, String name) {
  try (TranslationServiceClient translationServiceClient = TranslationServiceClient.create()) {

    GlossaryName glossaryName =
        GlossaryName.newBuilder()
            .setProject(projectId)
            .setLocation(location)
            .setGlossary(name)
            .build();

    // Call the API
    DeleteGlossaryResponse response =
        translationServiceClient
            .deleteGlossaryAsync(glossaryName.toString())
            .get(300, TimeUnit.SECONDS);

    System.out.format("Deleted: %s\n", response.getName());
    return response;
  } catch (Exception e) {
    throw new RuntimeException("Couldn't create client.", e);
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Node.js API reference documentation.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'us-central1'; // The location of the glossary
// const glossaryId = 'YOUR_GLOSSARY_ID';

// Imports the Google Cloud Translation library
const {TranslationServiceClient} = require('@google-cloud/translate').v3beta1;

// Instantiates a client
const translationClient = new TranslationServiceClient();

async function deleteGlossary() {
  // Construct request
  const name = translationClient.glossaryPath(
    projectId,
    location,
    glossaryId
  );
  const request = {
    name: name,
  };

  // Delete glossary using a long-running operation.
  // You can wait for now, or get results later.
  const [operation] = await translationClient.deleteGlossary(request);

  // Wait for operation to complete.
  const [response] = await operation.promise();

  console.log(`Deleted glossary: ${response.name}`);
}

deleteGlossary();

Python

Before trying this sample, follow the Python setup instructions in the Translation Quickstart Using Client Libraries. For more information, see the Translation Python API reference documentation.

from google.cloud import translate_v3beta1 as translate
client = translate.TranslationServiceClient()

# project_id = 'YOUR_PROJECT_ID'
# glossary_id = 'GLOSSARY_ID'

# The full resource name of the glossary to delete
name = client.glossary_path(
    project_id,
    'us-central1',  # The location of the glossary
    glossary_id)

operation = client.delete_glossary(name)
result = operation.result(timeout=90)
print('Deleted: {}'.format(result.name))
