Translate documents

Cloud Translation - Advanced provides a Document Translation API for directly translating documents in formats such as PDF and DOCX. Compared to plain text translations, Document Translation preserves the original formatting and layout in your translated documents, helping you retain much of the original context like paragraph breaks.

The following sections describe how to translate documents and how to use Document Translation with other Cloud Translation - Advanced features, such as glossaries and AutoML Translation models. Document Translation supports both online and batch translation requests.

Supported file formats

Document Translation supports the following file types.

Input    Document MIME type
DOCX     application/vnd.openxmlformats-officedocument.wordprocessingml.document
PDF*     application/pdf
PPTX     application/vnd.openxmlformats-officedocument.presentationml.presentation
XLSX     application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

*Document Translation supports both native and scanned PDF documents.

Generally, DOCX and PPTX file translations preserve the original layout better than PDF file translations. If you have the original content of a PDF as a DOCX or PPTX file, we recommend that you translate that file and then convert the result to PDF.
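The supported formats above can be captured as a small lookup, which is handy when a request needs an explicit MIME type. The following Python sketch is illustrative only: the names SUPPORTED_MIME_TYPES and mime_type_for are hypothetical helpers, not part of any client library.

```python
from pathlib import PurePosixPath

# MIME types copied from the supported file formats list, keyed by extension.
SUPPORTED_MIME_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pdf": "application/pdf",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def mime_type_for(path: str) -> str:
    """Return the MIME type for a supported document path, or raise ValueError."""
    extension = PurePosixPath(path).suffix.lower()
    try:
        return SUPPORTED_MIME_TYPES[extension]
    except KeyError:
        raise ValueError(f"Unsupported document type: {extension or path!r}") from None
```

The lookup is case-insensitive on the extension, so a path such as gs://my-bucket/report.PDF resolves to application/pdf.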

Before you begin

Before you can start using the Cloud Translation API, you must have a project that has the Cloud Translation API enabled and the appropriate credentials. You can also install client libraries for common programming languages to help you make calls to the API.

For more information, see the Setup page.

Required permissions

For requests that require Cloud Storage access, such as batch Document Translation, you might require Cloud Storage permissions to read input files or send output files to a bucket. For example, to read input files from a bucket, you must have at least read object permissions (provided by the role roles/storage.objectViewer) on the bucket. For more information about Cloud Storage roles, see the Cloud Storage documentation.

Translate documents (online)

Online translation provides real-time processing (synchronous processing) of a single file. For PDFs, the file size can be up to 20 MB and up to 20 pages. For other document types, the file sizes can be up to 20 MB with no page limits.
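A client can check these limits before sending a request. The following Python sketch is a hypothetical pre-flight check: it assumes the conservative decimal reading of 20 MB (20,000,000 bytes), and it expects the caller to supply the PDF page count, because counting pages requires a PDF library that is outside the scope of this example.

```python
# Assumption: "20 MB" is read conservatively as 20,000,000 bytes.
MAX_ONLINE_BYTES = 20_000_000
MAX_ONLINE_PDF_PAGES = 20
PDF_MIME_TYPE = "application/pdf"


def check_online_limits(size_bytes: int, mime_type: str, page_count=None) -> list[str]:
    """Return a list of limit violations; an empty list means the file fits."""
    problems = []
    if size_bytes > MAX_ONLINE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_ONLINE_BYTES}")
    # The page limit applies to PDFs only; other types have no page limit.
    if mime_type == PDF_MIME_TYPE and page_count is not None:
        if page_count > MAX_ONLINE_PDF_PAGES:
            problems.append(f"PDF has {page_count} pages; limit is {MAX_ONLINE_PDF_PAGES}")
    return problems
```

Files that fail this check are candidates for batch translation instead.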

Translate a document from Cloud Storage

The following example translates a file from a Cloud Storage bucket and outputs the result to a Cloud Storage bucket. The response also returns a byte stream. You can specify the MIME type; if you don't, Document Translation determines it by using the input file's extension.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • PROJECT_NUMBER_OR_ID: Your Google Cloud project number or ID
  • LOCATION: Region where you want to run this operation. For example, us-central1.
  • SOURCE_LANGUAGE: (Optional) The language code of the input document. If known, set to one of the language codes listed in Language support.
  • TARGET_LANGUAGE: The target language to translate the input document to. Set to one of the language codes listed in Language support.
  • INPUT_FILE_PATH: The Cloud Storage location and file name of the input document.
  • OUTPUT_FILE_PREFIX: The Cloud Storage location where the output document will be stored.

HTTP method and URL:

POST https://translation.googleapis.com/v3beta1/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:translateDocument

Request JSON body:

{
  "source_language_code": "SOURCE_LANGUAGE",
  "target_language_code": "TARGET_LANGUAGE",
  "document_input_config": {
    "gcsSource": {
      "inputUri": "gs://INPUT_FILE_PATH"
    }
  },
  "document_output_config": {
    "gcsDestination": {
      "outputUriPrefix": "gs://OUTPUT_FILE_PREFIX"
    }
  }
}

You should receive a JSON response similar to the following:

{
  "documentTranslation": {
    "byteStreamOutputs": ["BYTE_STREAM"],
    "mimeType": "MIME_TYPE"
  },
  "model": "projects/PROJECT_NUMBER/locations/LOCATION/models/general/nmt"
}
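The request above can also be assembled in code. The following Python sketch builds, but does not send, the HTTP request by using only the standard library. It assumes that you authenticate with an OAuth 2.0 access token in a Bearer header (for example, one printed by gcloud auth print-access-token); the function name build_translate_document_request is a hypothetical helper.

```python
import json
import urllib.request

API_ROOT = "https://translation.googleapis.com/v3beta1"


def build_translate_document_request(project, location, target_language,
                                     input_uri, output_uri_prefix,
                                     access_token, source_language=None):
    """Build a translateDocument request for a file in Cloud Storage."""
    body = {
        "target_language_code": target_language,
        "document_input_config": {"gcsSource": {"inputUri": input_uri}},
        "document_output_config": {
            "gcsDestination": {"outputUriPrefix": output_uri_prefix}
        },
    }
    # The source language is optional for this request, so add it only if known.
    if source_language:
        body["source_language_code"] = source_language
    url = f"{API_ROOT}/projects/{project}/locations/{location}:translateDocument"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # assumed auth scheme
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and parse the response body as JSON.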

Translate a document inline

The following example sends a document inline as part of the request. You must include the MIME type for inline document translations.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • PROJECT_NUMBER_OR_ID: Your Google Cloud project number or ID
  • LOCATION: Region where you want to run this operation. For example, us-central1.
  • SOURCE_LANGUAGE: (Optional) The language code of the input document. If known, set to one of the language codes listed in Language support.
  • TARGET_LANGUAGE: The target language to translate the input document to. Set to one of the language codes listed in Language support.
  • MIME_TYPE: The format of the source document, such as application/pdf.
  • INPUT_BYTE_STREAM: The input document's content represented as a stream of bytes. In a JSON request body, byte content is passed as a base64-encoded string.
  • OUTPUT_FILE_PREFIX: The Cloud Storage location where the output document will be stored.

HTTP method and URL:

POST https://translation.googleapis.com/v3beta1/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:translateDocument

Request JSON body:

{
  "source_language_code": "SOURCE_LANGUAGE",
  "target_language_code": "TARGET_LANGUAGE",
  "document_input_config": {
    "mimeType": "MIME_TYPE",
    "content": "INPUT_BYTE_STREAM"
  },
  "document_output_config": {
    "gcsDestination": {
      "outputUriPrefix": "gs://OUTPUT_FILE_PREFIX"
    }
  }
}

You should receive a JSON response similar to the following:

{
  "documentTranslation": {
    "byteStreamOutputs": ["BYTE_STREAM"],
    "mimeType": "MIME_TYPE"
  },
  "model": "projects/PROJECT_NUMBER/locations/LOCATION/models/general/nmt"
}
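For inline requests, the document bytes must be serialized into the JSON body, and the translated bytes must be read back out of the response. The following Python sketch assumes the standard JSON mapping for byte fields, in which bytes travel as base64-encoded strings. The helper names are hypothetical.

```python
import base64


def build_inline_input_config(document_bytes: bytes, mime_type: str) -> dict:
    """Build document_input_config for an inline request (MIME type required)."""
    if not mime_type:
        raise ValueError("Inline document translation requires a MIME type")
    return {
        "mimeType": mime_type,
        # Assumption: byte fields are base64-encoded in the JSON body.
        "content": base64.b64encode(document_bytes).decode("ascii"),
    }


def decode_byte_stream_outputs(response: dict) -> list[bytes]:
    """Decode the translated document bytes from a translateDocument response."""
    outputs = response.get("documentTranslation", {}).get("byteStreamOutputs", [])
    return [base64.b64decode(output) for output in outputs]
```

A typical use is to read a local file in binary mode, pass its bytes to build_inline_input_config, and write each decoded output to a new file.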

Use an AutoML model or a glossary

Instead of the Google-managed model, you can use your own AutoML Translation models to translate documents. In addition to specifying a model, you can also include a glossary to handle domain-specific terminology. If you specify a model or a glossary, you must specify the source language. The following example uses an AutoML model and a glossary. If the model or glossary is in a different project, you must have the corresponding IAM permission to access those resources.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • PROJECT_NUMBER_OR_ID: Your Google Cloud project number or ID
  • LOCATION: Region where you want to run this operation, such as us-central1. The location must match the region where your model, glossary, or both are located.
  • SOURCE_LANGUAGE: The language code of the input document. Set to one of the language codes listed in Language support.
  • TARGET_LANGUAGE: The target language to translate the input document to. Set to one of the language codes listed in Language support.
  • INPUT_FILE_PATH: The Cloud Storage location and file name of the input document.
  • OUTPUT_FILE_PREFIX: The Cloud Storage location where the output document will be stored.
  • MODEL_PROJECT_ID: The project ID where the model is located.
  • MODEL_LOCATION: The region where the model is located.
  • MODEL_ID: The ID of the model to use.
  • GLOSSARY_PROJECT_ID: The project ID where the glossary is located.
  • GLOSSARY_LOCATION: The region where the glossary is located.
  • GLOSSARY_ID: The ID of the glossary to use.

HTTP method and URL:

POST https://translation.googleapis.com/v3beta1/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:translateDocument

Request JSON body:

{
  "source_language_code": "SOURCE_LANGUAGE",
  "target_language_code": "TARGET_LANGUAGE",
  "document_input_config": {
    "gcsSource": {
      "inputUri": "gs://INPUT_FILE_PATH"
    }
  },
  "document_output_config": {
    "gcsDestination": {
      "outputUriPrefix": "gs://OUTPUT_FILE_PREFIX"
    }
  },
  "model": "projects/MODEL_PROJECT_ID/locations/MODEL_LOCATION/models/MODEL_ID",
  "glossary_config": {
    "glossary": "projects/GLOSSARY_PROJECT_ID/locations/GLOSSARY_LOCATION/glossaries/GLOSSARY_ID"
  }
}

You should receive a JSON response similar to the following:

{
  "documentTranslation": {
    "byteStreamOutputs": ["BYTE_STREAM"],
    "mimeType": "MIME_TYPE"
  },
  "glossaryDocumentTranslation": {
    "byteStreamOutputs": ["BYTE_STREAM_USING_GLOSSARY"],
    "mimeType": "MIME_TYPE"
  },
  "model": "projects/MODEL_PROJECT_ID/locations/MODEL_LOCATION/models/MODEL_ID",
  "glossaryConfig": {
    "glossary": "projects/GLOSSARY_PROJECT_ID/locations/GLOSSARY_LOCATION/glossaries/GLOSSARY_ID"
  }
}
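The model and glossary fields are full resource names, and both require a source language. The following Python sketch shows one way to build those names and attach them to an existing request body while enforcing the source-language rule. All function names are hypothetical helpers.

```python
def model_name(project_id: str, location: str, model_id: str) -> str:
    """Format the resource name of an AutoML Translation model."""
    return f"projects/{project_id}/locations/{location}/models/{model_id}"


def glossary_name(project_id: str, location: str, glossary_id: str) -> str:
    """Format the resource name of a glossary."""
    return f"projects/{project_id}/locations/{location}/glossaries/{glossary_id}"


def add_model_and_glossary(body: dict, model=None, glossary=None) -> dict:
    """Return a copy of a translateDocument body with a model and/or glossary."""
    # A model or a glossary requires an explicit source language.
    if (model or glossary) and not body.get("source_language_code"):
        raise ValueError("source_language_code is required with a model or glossary")
    updated = dict(body)
    if model:
        updated["model"] = model
    if glossary:
        updated["glossary_config"] = {"glossary": glossary}
    return updated
```

Failing early on a missing source language gives a clearer local error than waiting for the API to reject the request.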

Translate documents (batch)

Batch translation lets you translate multiple files into multiple languages in a single request. For each request, you can send up to 100 files with a total content size of up to 1 GB or 100 million Unicode code points, whichever limit is reached first. You can specify a particular translation model for each language.
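When a job exceeds these limits, the files can be split across several requests. The following Python sketch groups files greedily so that each group stays within the file-count and size limits. It assumes the conservative decimal reading of 1 GB (1,000,000,000 bytes) and does not check the code point limit, which would require reading each document's text. The name plan_batches is a hypothetical helper.

```python
MAX_BATCH_FILES = 100
# Assumption: "1 GB" is read conservatively as 1,000,000,000 bytes.
MAX_BATCH_BYTES = 1_000_000_000


def plan_batches(files: list[tuple[str, int]]) -> list[list[str]]:
    """Group (uri, size_in_bytes) pairs into batches that respect the limits."""
    batches, current, current_bytes = [], [], 0
    for uri, size in files:
        if size > MAX_BATCH_BYTES:
            raise ValueError(f"{uri} alone exceeds the batch size limit")
        # Start a new batch when adding this file would break either limit.
        too_many = len(current) >= MAX_BATCH_FILES
        too_big = current_bytes + size > MAX_BATCH_BYTES
        if current and (too_many or too_big):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(uri)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Each resulting list of URIs can then be sent as the input configurations of one batch request.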

Translate multiple documents

The following example includes multiple input configurations. Each input configuration is a pointer to a file in Cloud Storage.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • PROJECT_NUMBER_OR_ID: Your Google Cloud project number or ID
  • LOCATION: Region where you want to run this operation. For example, us-central1.
  • SOURCE_LANGUAGE: The language code of the input documents. Set to one of the language codes listed in Language support.
  • TARGET_LANGUAGE: The target language or languages to translate the input documents to. Use the language codes listed in Language support.
  • INPUT_FILE_PATH: The Cloud Storage location and file name of one or more input documents.
  • OUTPUT_FILE_PREFIX: The Cloud Storage location where all output documents are stored.

HTTP method and URL:

POST https://translation.googleapis.com/v3beta1/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:batchTranslateDocument

Request JSON body:

{
  "source_language_code": "SOURCE_LANGUAGE",
  "target_language_codes": ["TARGET_LANGUAGE", ...],
  "input_configs": [
    {
      "gcsSource": {
        "inputUri": "gs://INPUT_FILE_PATH_1"
      }
    },
    {
      "gcsSource": {
        "inputUri": "gs://INPUT_FILE_PATH_2"
      }
    },
    ...
  ],
  "output_config": {
    "gcsDestination": {
      "outputUriPrefix": "gs://OUTPUT_FILE_PREFIX"
    }
  }
}

The response contains the ID for a long-running operation:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.translation.v3beta1.BatchTranslateDocumentMetadata",
    "state": "RUNNING"
  }
}
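The batch request body follows directly from a list of input files and target languages. The following Python sketch builds that body and extracts the operation ID from a response shaped like the one above. Both function names are hypothetical helpers.

```python
def build_batch_body(source_language: str, target_languages: list[str],
                     input_uris: list[str], output_uri_prefix: str) -> dict:
    """Build a batchTranslateDocument body with one input config per file."""
    return {
        "source_language_code": source_language,
        "target_language_codes": list(target_languages),
        "input_configs": [
            {"gcsSource": {"inputUri": uri}} for uri in input_uris
        ],
        "output_config": {
            "gcsDestination": {"outputUriPrefix": output_uri_prefix}
        },
    }


def operation_id(response: dict) -> str:
    """Return the last path segment of the long-running operation name."""
    return response["name"].rsplit("/", 1)[-1]
```

The operation ID is what you use later to check the status of the batch job.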

Use an AutoML model or a glossary

Instead of the Google-managed model, you can use your own AutoML Translation models to translate documents. In addition to specifying a model, you can also include a glossary to handle domain-specific terminology. Note that if you specify a model or a glossary, you must specify the source language. The following example uses an AutoML model and a glossary. You can specify up to 10 target languages with their own model and glossary.

If you specify a model for some target languages and not others, Document Translation uses the Google-managed model for the unspecified languages. Similarly, if you specify a glossary for some target languages, Document Translation doesn't use any glossary for the unspecified languages.

REST & CMD LINE

Before using any of the request data below, make the following replacements:

  • PROJECT_NUMBER_OR_ID: Your Google Cloud project number or ID
  • LOCATION: Region where you want to run this operation, such as us-central1. The location must match the region where your model, glossary, or both are located.
  • SOURCE_LANGUAGE: The language code of the input documents. Set to one of the language codes listed in Language support.
  • TARGET_LANGUAGE: The target language or languages to translate the input documents to. Use the language codes listed in Language support.
  • INPUT_FILE_PATH: The Cloud Storage location and file name of one or more input documents.
  • OUTPUT_FILE_PREFIX: The Cloud Storage location where all output documents are stored.
  • MODEL_PROJECT_ID: The project ID where the model is located.
  • MODEL_LOCATION: The region where the model is located.
  • MODEL_ID: The ID of the model to use.
  • GLOSSARY_PROJECT_ID: The project ID where the glossary is located.
  • GLOSSARY_LOCATION: The region where the glossary is located.
  • GLOSSARY_ID: The ID of the glossary to use.

HTTP method and URL:

POST https://translation.googleapis.com/v3beta1/projects/PROJECT_NUMBER_OR_ID/locations/LOCATION:batchTranslateDocument

Request JSON body:

{
  "source_language_code": "SOURCE_LANGUAGE",
  "target_language_codes": ["TARGET_LANGUAGE", ...],
  "input_configs": [
    {
      "gcsSource": {
        "inputUri": "gs://INPUT_FILE_PATH"
      }
    }
  ],
  "output_config": {
    "gcsDestination": {
      "outputUriPrefix": "gs://OUTPUT_FILE_PREFIX"
    }
  },
  "models": {
    "TARGET_LANGUAGE": "projects/MODEL_PROJECT_ID/locations/MODEL_LOCATION/models/MODEL_ID",
    ...
  },
  "glossaries": {
    "TARGET_LANGUAGE": {
      "glossary": "projects/GLOSSARY_PROJECT_ID/locations/GLOSSARY_LOCATION/glossaries/GLOSSARY_ID"
    },
    ...
  }
}

The response contains the ID for a long-running operation:

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.translation.v3beta1.BatchTranslateDocumentMetadata",
    "state": "RUNNING"
  }
}
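In a batch request, models and glossaries are maps keyed by target language. The following Python sketch attaches those maps to an existing batch body and validates them first. It reads the limit of 10 target languages as applying to the languages that have a custom model or glossary, which is an interpretation of the text above rather than a confirmed API rule. The function name is a hypothetical helper.

```python
# Assumption: the limit of 10 applies to languages with a custom model or glossary.
MAX_CUSTOM_TARGET_LANGUAGES = 10


def add_per_language_resources(body: dict, models=None, glossaries=None) -> dict:
    """Return a copy of a batch body with per-language models and glossaries."""
    models = models or {}
    glossaries = glossaries or {}
    if (models or glossaries) and not body.get("source_language_code"):
        raise ValueError("source_language_code is required with a model or glossary")
    customized = set(models) | set(glossaries)
    unknown = customized - set(body["target_language_codes"])
    if unknown:
        raise ValueError(f"not in target_language_codes: {sorted(unknown)}")
    if len(customized) > MAX_CUSTOM_TARGET_LANGUAGES:
        raise ValueError("too many target languages with a custom model or glossary")
    updated = dict(body)
    # Languages left out of "models" use the Google-managed model;
    # languages left out of "glossaries" use no glossary.
    if models:
        updated["models"] = dict(models)
    if glossaries:
        updated["glossaries"] = {
            language: {"glossary": name} for language, name in glossaries.items()
        }
    return updated
```

For example, passing a model for "fr" only leaves every other target language on the Google-managed model.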

What's next

  • Document Translation is priced per page. For more information, see Pricing.