Method: projects.locations.batchTranslateDocument

Translates a large volume of documents in asynchronous batch mode. This function provides real-time output as the inputs are being processed. If the caller cancels a request, partial results may still be available in the specified output location; each input file is either fully translated or not output at all.

This call returns immediately and you can use google.longrunning.Operation.name to poll the status of the call.

HTTP request

POST https://translate.googleapis.com/v3beta1/{parent=projects/*/locations/*}:batchTranslateDocument

The URL uses gRPC Transcoding syntax.

Path parameters

Parameters
`parent`	`string` Required. Location to make a regional call. Format: `projects/{project-number-or-id}/locations/{location-id}`. The `global` location is not supported for batch translation. Only AutoML Translation models or glossaries in the same region (that is, with the same location-id) can be used; otherwise, an INVALID_ARGUMENT (400) error is returned.
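
As an illustration of the `parent` format and the request URL above, the following minimal sketch builds both strings and rejects the unsupported `global` location on the client side. The helper names `build_parent` and `batch_translate_document_url` are hypothetical and not part of any client library; the server remains the authority on which locations are valid.

```python
BASE_URL = "https://translate.googleapis.com/v3beta1"


def build_parent(project: str, location: str) -> str:
    """Return the `parent` path, rejecting the unsupported `global` location."""
    if location == "global":
        raise ValueError("batch translation does not support the 'global' location")
    if not project or not location or "/" in project or "/" in location:
        raise ValueError("project and location must be non-empty single path segments")
    return f"projects/{project}/locations/{location}"


def batch_translate_document_url(parent: str) -> str:
    """Return the POST URL for batchTranslateDocument under the given parent."""
    return f"{BASE_URL}/{parent}:batchTranslateDocument"
```

For example, `batch_translate_document_url(build_parent("my-project", "us-central1"))` yields the URL to POST the request body to.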

Request body

The request body contains data with the following structure:

JSON representation

{
  "sourceLanguageCode": string,
  "targetLanguageCodes": [
    string
  ],
  "inputConfigs": [
    {
      object (BatchDocumentInputConfig)
    }
  ],
  "outputConfig": {
    object (BatchDocumentOutputConfig)
  },
  "models": {
    string: string,
    ...
  },
  "glossaries": {
    string: {
      object (TranslateTextGlossaryConfig)
    },
    ...
  },
  "formatConversions": {
    string: string,
    ...
  },
  "customizedAttribution": string,
  "enableShadowRemovalNativePdf": boolean,
  "enableRotationCorrection": boolean
}

Fields
`sourceLanguageCode`	`string` Required. The BCP-47 language code of the input document if known, for example, "en-US" or "sr-Latn". Supported language codes are listed in Language Support.
`targetLanguageCodes[]`	`string` Required. The BCP-47 language codes to use for translation of the input document. Specify up to 10 language codes here.
`inputConfigs[]`	`object (BatchDocumentInputConfig)` Required. Input configurations. The total number of files matched should be <= 100. The total content size to translate should be <= 100M Unicode codepoints. The files must use UTF-8 encoding.
`outputConfig`	`object (BatchDocumentOutputConfig)` Required. Output configuration. If two input configs match the same file (that is, the same input path), no output is generated for the duplicate inputs.
`models`	`map (key: string, value: string)` Optional. The models to use for translation. The map key is the target language code and the map value is the model name. The value can be a built-in general model or an AutoML Translation model, and its format depends on the model type: - AutoML Translation models: `projects/{project-number-or-id}/locations/{location-id}/models/{model-id}` - General (built-in) models: `projects/{project-number-or-id}/locations/{location-id}/models/general/nmt` If the map is empty or no specific model is requested for a language pair, the default Google model (nmt) is used. Authorization requires one or more of the following IAM permissions on the specified resource `models`: - `cloudtranslate.generalModels.batchDocPredict` - `automl.models.predict`
`glossaries`	`map (key: string, value: object (TranslateTextGlossaryConfig))` Optional. Glossaries to be applied. It's keyed by target language code. Authorization requires the following IAM permission on the specified resource `glossaries`: `cloudtranslate.glossaries.batchDocPredict`
`formatConversions`	`map (key: string, value: string)` Optional. File format conversion map to be applied to all input files. The map key is the original mimeType and the map value is the target mimeType of the translated documents. Supported file format conversions: - `application/pdf` to `application/vnd.openxmlformats-officedocument.wordprocessingml.document` If nothing is specified, output files are in the same format as the original file.
`customizedAttribution`	`string` Optional. Supports user-customized attribution. If not provided, the default is `Machine Translated by Google`. Customized attribution should follow the rules in https://cloud.google.com/translate/attribution#attribution_and_logos
`enableShadowRemovalNativePdf`	`boolean` Optional. If true, uses the text removal server to remove the shadow text on the background image for native PDF translation. The shadow removal feature can only be enabled when both `isTranslateNativePdfOnly` and `pdfNativeOnly` are false.
`enableRotationCorrection`	`boolean` Optional. If true, enable auto rotation correction in DVS.
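
To make the field table concrete, here is a minimal sketch that assembles a request body as a Python dict and checks the documented limit of 10 target language codes before sending. The function name `build_request_body` is hypothetical. The nested keys `inputUri` and `outputUriPrefix` are assumptions: they are the camelCase JSON forms of the `gcsSource.input_uri` and `gcsDestination.output_uri_prefix` names that appear elsewhere on this page. The check that every `models` key is a requested target language is also an assumption made for early error detection, not a documented server rule.

```python
from typing import Dict, List, Optional


def build_request_body(
    source_language_code: str,
    target_language_codes: List[str],
    input_uris: List[str],
    output_uri_prefix: str,
    models: Optional[Dict[str, str]] = None,
) -> dict:
    """Assemble a batchTranslateDocument request body (sketch, not a client library)."""
    if not 1 <= len(target_language_codes) <= 10:
        raise ValueError("specify between 1 and 10 target language codes")
    body = {
        "sourceLanguageCode": source_language_code,
        "targetLanguageCodes": list(target_language_codes),
        # Assumed JSON field names, derived from gcsSource.input_uri.
        "inputConfigs": [{"gcsSource": {"inputUri": uri}} for uri in input_uris],
        # Assumed JSON field name, derived from gcsDestination.output_uri_prefix.
        "outputConfig": {"gcsDestination": {"outputUriPrefix": output_uri_prefix}},
    }
    if models:
        # Client-side sanity check only (an assumption): keys are target language codes.
        unknown = set(models) - set(target_language_codes)
        if unknown:
            raise ValueError(f"models keyed by unrequested languages: {sorted(unknown)}")
        body["models"] = dict(models)
    return body
```

Optional fields that are left out, such as `glossaries` and `formatConversions`, are simply omitted from the dict, which corresponds to the defaults described in the table.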

Response body

If successful, the response body contains an instance of Operation.
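
Because the call returns an Operation immediately, a caller typically polls until the operation's `done` field is true. The sketch below shows one way to structure that loop. The `fetch_operation` callable is hypothetical: it is expected to perform an authenticated GET for the operation name and return the parsed Operation JSON, and how it does so (HTTP library, credentials) is left to the caller. The polling interval and attempt limit are arbitrary illustrative values.

```python
import time
from typing import Callable


def wait_for_operation(
    fetch_operation: Callable[[str], dict],
    operation_name: str,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a long-running operation until it is done, fails, or the poll budget runs out."""
    for _ in range(max_polls):
        operation = fetch_operation(operation_name)
        if operation.get("done"):
            # A finished Operation carries either `error` or `response`.
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        sleep(poll_seconds)
    raise TimeoutError(f"{operation_name} not done after {max_polls} polls")
```

Injecting `fetch_operation` and `sleep` keeps the loop testable without network access; in tests, both can be replaced with simple stand-ins.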

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

BatchDocumentInputConfig

Input configuration for a locations.batchTranslateDocument request.

JSON representation
{
  // Union field source can be only one of the following:
  "gcsSource": {
    object (GcsSource)
  }
  // End of list of possible types for union field source.
}

Fields
Union field `source`. Specify the input. `source` can be only one of the following:
`gcsSource`	`object (GcsSource)` Google Cloud Storage location for the source input. This can be a single file (for example, `gs://translation-test/input.docx`) or a wildcard (for example, `gs://translation-test/*`). The file MIME type is determined from the file extension. Supported MIME types: - `pdf`: application/pdf - `docx`: application/vnd.openxmlformats-officedocument.wordprocessingml.document - `pptx`: application/vnd.openxmlformats-officedocument.presentationml.presentation - `xlsx`: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet The maximum file size for `.docx`, `.pptx`, and `.xlsx` files is 100MB. The maximum file size for `.pdf` files is 1GB, with a limit of 1000 pages. The maximum file size for all input documents is 1GB.
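
The extension-to-MIME-type mapping and the size limits above can be checked on the client before submitting a batch. The following is a minimal sketch with hypothetical helper names. Two points are assumptions rather than documented behavior: extensions are matched case-insensitively, and "MB" and "GB" are read as decimal units (10^6 and 10^9 bytes).

```python
import posixpath
from typing import Optional

_OOXML = "application/vnd.openxmlformats-officedocument"
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".docx": f"{_OOXML}.wordprocessingml.document",
    ".pptx": f"{_OOXML}.presentationml.presentation",
    ".xlsx": f"{_OOXML}.spreadsheetml.sheet",
}
# Assumption: decimal units; the page does not say whether MB/GB are decimal or binary.
MAX_BYTES = {".pdf": 10**9, ".docx": 10**8, ".pptx": 10**8, ".xlsx": 10**8}


def _extension(uri: str) -> str:
    # Assumption: extension matching is case-insensitive.
    return posixpath.splitext(uri)[1].lower()


def mime_type_for(uri: str) -> Optional[str]:
    """Return the MIME type implied by the file extension, or None if unsupported."""
    return SUPPORTED_MIME_TYPES.get(_extension(uri))


def within_size_limit(uri: str, size_bytes: int) -> bool:
    """Return True if a file of this size is within the per-type limit listed above."""
    limit = MAX_BYTES.get(_extension(uri))
    return limit is not None and size_bytes <= limit
```

The 1000-page limit for PDFs is not checked here, since it requires opening the file.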

BatchDocumentOutputConfig

Output configuration for a locations.batchTranslateDocument request.

JSON representation
{
  // Union field destination can be only one of the following:
  "gcsDestination": {
    object (GcsDestination)
  }
  // End of list of possible types for union field destination.
}

Fields
Union field `destination`. The destination of output. The destination directory provided must exist and be empty. `destination` can be only one of the following:
`gcsDestination`	`object (GcsDestination)` Google Cloud Storage destination for output content. For every input document (for example, `gs://a/b/c.[extension]`), at most 2 * n output files are generated, where n is the number of `targetLanguageCodes` in the BatchTranslateDocumentRequest.

While the input documents are being processed, an index file `index.csv` is written and updated under `gcsDestination.output_uri_prefix` (for example, `gs://translation_output/index.csv`). The index file is generated and updated as new files are translated. The format is:

`input_document,targetLanguageCode,translation_output,error_output,glossary_translation_output,glossary_error_output`

- `input_document` is one file matched using `gcsSource.input_uri`.
- `targetLanguageCode` is provided in the request.
- `translation_output` contains the translations (see the naming format below).
- `error_output` contains the error message from processing the file. Both `translation_output` and `error_output` can be empty strings if there is no content to output.
- `glossary_translation_output` and `glossary_error_output` are the translated output and error output when glossaries are applied. They can also be empty if there is no content to output.

Once a row is present in `index.csv`, the input/output matching never changes. Callers can also expect that all the content in `input_document` has been processed and is ready to be consumed (that is, no partial output file is written).

Because `index.csv` is updated continuously during processing, make sure the output bucket has no custom retention policy that could prevent file updates (https://cloud.google.com/storage/docs/bucket-lock#retention-policy).

The naming format of translation output files is as follows (for target language code [trg]):
- `translation_output`: `gs://translation_output/a_b_c_[trg]_translation.[extension]`
- `glossary_translation_output`: `gs://translation_test/a_b_c_[trg]_glossary_translation.[extension]`

The output document keeps the same file format as the input document.

The naming format of error output files is as follows (for target language code [trg]):
- `error_output`: `gs://translation_test/a_b_c_[trg]_errors.txt`
- `glossary_error_output`: `gs://translation_test/a_b_c_[trg]_glossary_translation.txt`

The error output is a txt file containing error details.
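
The index format and naming pattern described for `gcsDestination` can be sketched as follows. `parse_index` turns the contents of `index.csv` into one dict per row, and `expected_translation_output` reproduces the `a_b_c_[trg]_translation.[extension]` pattern from the single example given. Both function names are hypothetical. The sketch assumes that `index.csv` has no header row and that the flattened name is formed by replacing `/` with `_` in the bucket and object path; neither is stated explicitly on this page, so `index.csv` itself should be treated as the authoritative mapping.

```python
import csv
import io
import posixpath
from typing import Dict, List

INDEX_COLUMNS = [
    "input_document",
    "targetLanguageCode",
    "translation_output",
    "error_output",
    "glossary_translation_output",
    "glossary_error_output",
]


def parse_index(text: str) -> List[Dict[str, str]]:
    """Parse index.csv contents into dicts keyed by the documented column names."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        # Pad short rows so that missing trailing columns read as empty strings.
        padded = (record + [""] * len(INDEX_COLUMNS))[: len(INDEX_COLUMNS)]
        rows.append(dict(zip(INDEX_COLUMNS, (field.strip() for field in padded))))
    return rows


def expected_translation_output(input_uri: str, output_uri_prefix: str, trg: str) -> str:
    """Guess the translation output URI from the documented naming example."""
    path = input_uri[len("gs://"):] if input_uri.startswith("gs://") else input_uri
    stem, extension = posixpath.splitext(path)  # "a/b/c", ".docx"
    flattened = stem.replace("/", "_")          # "a_b_c" (assumed flattening rule)
    return f"{output_uri_prefix}{flattened}_{trg}_translation{extension}"
```

An empty `translation_output` together with a non-empty `error_output` in a parsed row indicates that processing of that input failed for that target language.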
