Method: projects.locations.batchTranslateText

Translates a large volume of text in asynchronous batch mode. This method writes output in real time as the inputs are being processed. If the caller cancels a request, partial results may still be available at the specified output location. Each input file is processed on an all-or-nothing basis.

This call returns immediately and you can use google.longrunning.Operation.name to poll the status of the call.

HTTP request

POST https://translate.googleapis.com/v3beta1/{parent=projects/*/locations/*}:batchTranslateText

The URL uses gRPC Transcoding syntax.

Path parameters

Parameters
parent

string

Required. Location to make a call. Must refer to a caller's project.

Format: projects/{project-number-or-id}/locations/{location-id}.

The global location is not supported for batch translation.

Only AutoML Translation models or glossaries within the same region (that is, with the same location-id) can be used; otherwise an INVALID_ARGUMENT (400) error is returned.
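As a minimal sketch of the path rules above, the following builds the request URL from a project and a location and rejects the global location up front. The function name batch_translate_url and the check that each value is a single path segment are illustrative assumptions, not part of the API.

```python
API_ROOT = "https://translate.googleapis.com/v3beta1"


def batch_translate_url(project: str, location: str) -> str:
    """Return the POST URL for batchTranslateText (hypothetical helper)."""
    if location == "global":
        # The reference states that global is not supported for batch translation.
        raise ValueError("the global location is not supported for batch translation")
    for value in (project, location):
        # Assumption: each value fills exactly one '*' segment of the URL template.
        if not value or "/" in value:
            raise ValueError("project and location must be non-empty single path segments")
    parent = f"projects/{project}/locations/{location}"
    return f"{API_ROOT}/{parent}:batchTranslateText"


print(batch_translate_url("my-project", "us-central1"))
```

Here my-project and us-central1 are placeholder values.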

Request body

The request body contains data with the following structure:

JSON representation
{
  "sourceLanguageCode": string,
  "targetLanguageCodes": [
    string
  ],
  "models": {
    string: string,
    ...
  },
  "inputConfigs": [
    {
      object (InputConfig)
    }
  ],
  "outputConfig": {
    object (OutputConfig)
  },
  "glossaries": {
    string: {
      object (TranslateTextGlossaryConfig)
    },
    ...
  },
  "labels": {
    string: string,
    ...
  }
}
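A request body of this shape could be assembled as in the sketch below. The helper name build_request_body and the bucket paths are hypothetical. The nested field names inputUri and outputUriPrefix are inferred from the gcsSource.input_uri and outputUriPrefix mentions in this reference, since GcsSource and GcsDestination are not defined in this section. The limit of 10 target language codes comes from the targetLanguageCodes field description.

```python
import json


def build_request_body(source, targets, input_uris, output_uri_prefix, labels=None):
    """Assemble a batchTranslateText request body as a plain dict (hypothetical helper)."""
    targets = list(targets)
    if not 1 <= len(targets) <= 10:
        raise ValueError("targetLanguageCodes must contain between 1 and 10 codes")
    body = {
        "sourceLanguageCode": source,
        "targetLanguageCodes": targets,
        # Assumed JSON field name: inputUri (from gcsSource.input_uri).
        "inputConfigs": [{"gcsSource": {"inputUri": uri}} for uri in input_uris],
        "outputConfig": {"gcsDestination": {"outputUriPrefix": output_uri_prefix}},
    }
    if labels:
        body["labels"] = dict(labels)
    return body


body = build_request_body(
    "en", ["de", "fr"], ["gs://my-bucket/input.tsv"], "gs://my-bucket/results/"
)
print(json.dumps(body, indent=2))
```

The optional models and glossaries maps are omitted here for brevity; they would be added to the dict in the same way as labels.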
Fields
sourceLanguageCode

string

Required. Source language code.

targetLanguageCodes[]

string

Required. Specify up to 10 language codes here.

models

map (key: string, value: string)

Optional. The models to use for translation. The map key is the target language code and the map value is the model name. The value can be a built-in general model or an AutoML Translation model.

The value format depends on model type:

  • AutoML Translation models: projects/{project-number-or-id}/locations/{location-id}/models/{model-id}

  • General (built-in) models: projects/{project-number-or-id}/locations/{location-id}/models/general/nmt

If the map is empty or a specific model is not requested for a language pair, the default Google model (nmt) is used.

Authorization requires one or more of the following IAM permissions on the specified resource models:

  • cloudtranslate.generalModels.batchPredict
  • automl.models.predict
inputConfigs[]

object (InputConfig)

Required. Input configurations. The total number of files matched should be <= 100. The total content size should be <= 100M Unicode codepoints. The files must use UTF-8 encoding.

outputConfig

object (OutputConfig)

Required. Output configuration. If two input configs match the same file (that is, the same input path), no output is generated for the duplicate inputs.

glossaries

map (key: string, value: object (TranslateTextGlossaryConfig))

Optional. Glossaries to be applied for translation. The map is keyed by target language code.

Authorization requires the following IAM permission on the specified resource glossaries:

  • cloudtranslate.glossaries.batchPredict
labels

map (key: string, value: string)

Optional. The labels with user-defined metadata for the request.

Label keys and values can be no longer than 63 characters (Unicode codepoints) and can contain only lowercase letters, numeric characters, underscores, and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter.

See https://cloud.google.com/translate/docs/labels for more information.
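The label rules can be checked on the client before sending a request. The sketch below is one interpretation of the rules as worded here: a character is accepted if it is an underscore, a dash, a decimal digit, or any letter that is not uppercase (which admits international letters). The function name is_valid_label is hypothetical, and the service may apply stricter or different checks.

```python
MAX_LABEL_LENGTH = 63  # Unicode codepoints; len() on a Python str counts codepoints


def _is_allowed(ch):
    """Underscore, dash, decimal digit, or a letter with no uppercase form in use."""
    if ch in "_-" or ch.isdecimal():
        return True
    return ch.isalpha() and ch == ch.lower()


def is_valid_label(key, value=""):
    """Return True if the key/value pair satisfies the documented label rules."""
    if not key or len(key) > MAX_LABEL_LENGTH or len(value) > MAX_LABEL_LENGTH:
        return False
    if not key[0].isalpha():
        return False  # keys must start with a letter
    return all(_is_allowed(ch) for ch in key + value)


print(is_valid_label("team", "nlp-2"))
print(is_valid_label("1team"))
```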

Response body

If successful, the response body contains an instance of Operation.
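Because the call returns an Operation immediately, a client typically polls until the operation reports done. The loop below is a sketch that relies only on the standard google.longrunning.Operation fields (name, done, error, response). The fetch callable is an assumption: it stands in for whatever authenticated GET the caller uses to retrieve the operation by name, and the interval and poll limit are arbitrary.

```python
import time


def wait_for_operation(name, fetch, interval=30.0, max_polls=120, sleep=time.sleep):
    """Poll an operation until done and return its final dict (hypothetical helper).

    fetch(name) must return the Operation as a dict; it is supplied by the caller.
    """
    for _ in range(max_polls):
        operation = fetch(name)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation
        sleep(interval)
    raise TimeoutError(f"operation {name} did not finish after {max_polls} polls")


# Usage with a stand-in fetch function that finishes on the second poll.
_replies = iter([{"name": "op-1", "done": False}, {"name": "op-1", "done": True, "response": {}}])
result = wait_for_operation("op-1", lambda name: next(_replies), sleep=lambda s: None)
print(result["done"])
```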

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

InputConfig

Input configuration for locations.batchTranslateText request.

JSON representation
{
  "mimeType": string,

  // Union field source can be only one of the following:
  "gcsSource": {
    object (GcsSource)
  }
  // End of list of possible types for union field source.
}
Fields
mimeType

string

Optional. Can be "text/plain" or "text/html". For .tsv, "text/html" is used if mimeType is missing. For .html, this field must be "text/html" or empty. For .txt, this field must be "text/plain" or empty.
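The mimeType rules can be expressed as a small check keyed on the file extension. This is a sketch for concrete file names, not wildcards. The function name effective_mime_type is hypothetical, and treating an empty mimeType on .html and .txt files as the type implied by the extension is an assumption; the reference only states which values are permitted.

```python
import posixpath

# Permitted non-empty mimeType per extension, per the field description.
_REQUIRED_BY_EXTENSION = {".html": "text/html", ".txt": "text/plain"}


def effective_mime_type(file_name, mime_type=""):
    """Validate mime_type against the extension and return the type in effect."""
    if mime_type not in ("", "text/plain", "text/html"):
        raise ValueError(f"unsupported mimeType: {mime_type!r}")
    extension = posixpath.splitext(file_name)[1]
    if extension == ".tsv":
        # Documented default for .tsv when mimeType is missing.
        return mime_type or "text/html"
    if extension in _REQUIRED_BY_EXTENSION:
        required = _REQUIRED_BY_EXTENSION[extension]
        if mime_type and mime_type != required:
            raise ValueError(f"{extension} files require {required!r} or an empty mimeType")
        return required  # assumption: empty means the extension's own type
    raise ValueError(f"unsupported file extension: {extension!r}")


print(effective_mime_type("gs://translation-test/input.tsv"))
```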

Union field source. Required. Specify the input. source can be only one of the following:
gcsSource

object (GcsSource)

Required. Google Cloud Storage location for the source input. This can be a single file (for example, gs://translation-test/input.tsv) or a wildcard (for example, gs://translation-test/*).

If the file extension is .tsv, the file can contain either one or two columns. The first column (optional) is the ID of the text request. If the first column is missing, the row number (0-based) from the input file is used as the ID in the output file. The second column is the actual text to be translated. We recommend that each row be <= 10K Unicode codepoints; otherwise an error might be returned. The input tsv must be RFC 4180 compliant.

You can use https://github.com/Clever/csvlint to check for potential formatting errors in your tsv file:

csvlint --delimiter='\t' your_input_file.tsv

The other supported file extensions are .txt and .html. Files with these extensions are treated as a single large chunk of text.
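A local pre-check of a .tsv input can mirror the column rules described above. The sketch below reads one- or two-column rows, assigns the 0-based row number as the ID when the ID column is absent, and reports rows above the recommended size. The function name read_input_tsv is hypothetical, and reading "10K" as 10,000 codepoints per row is an assumption.

```python
import csv
import io

MAX_ROW_CODEPOINTS = 10_000  # assumption: "10K" means 10,000


def read_input_tsv(text):
    """Return (rows, oversized_ids) where rows is a list of (id, text) pairs."""
    rows, oversized_ids = [], []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for row_number, columns in enumerate(reader):
        if len(columns) == 1:
            row_id, content = str(row_number), columns[0]
        elif len(columns) == 2:
            row_id, content = columns
        else:
            raise ValueError(f"row {row_number}: expected 1 or 2 columns, got {len(columns)}")
        if sum(len(column) for column in columns) > MAX_ROW_CODEPOINTS:
            oversized_ids.append(row_id)
        rows.append((row_id, content))
    return rows, oversized_ids


rows, oversized = read_input_tsv("hello\nworld\n")
print(rows, oversized)
```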

OutputConfig

Output configuration for locations.batchTranslateText request.

JSON representation
{

  // Union field destination can be only one of the following:
  "gcsDestination": {
    object (GcsDestination)
  }
  // End of list of possible types for union field destination.
}
Fields
Union field destination. Required. The destination of output. destination can be only one of the following:
gcsDestination

object (GcsDestination)

Google Cloud Storage destination for output content. For every single input file (for example, gs://a/b/c.[extension]), we generate at most 2 * n output files, where n is the number of targetLanguageCodes in the BatchTranslateTextRequest.

Output files (tsv) generated are compliant with RFC 4180 except that record delimiters are '\n' instead of '\r\n'. We don't provide any way to change record delimiters.

While the input files are being processed, we write and update an index file 'index.csv' under 'outputUriPrefix' (for example, gs://translation-test/index.csv). The index file is updated as new files are translated. The format is:

input_file,targetLanguageCode,translations_file,errors_file,glossary_translations_file,glossary_errors_file

  • input_file is one file matched using gcsSource.input_uri.
  • targetLanguageCode is provided in the request.
  • translations_file contains the translations (details below).
  • errors_file contains the errors encountered while processing the file (details below).

Both translations_file and errors_file could be empty strings if there is no content to output. glossary_translations_file and glossary_errors_file are always empty strings if the input_file is tsv. They could also be empty if there is no content to output.

Once a row is present in index.csv, the input/output matching never changes. Callers can also expect that all the content in input_file has been processed and is ready to be consumed (that is, no partial output file is written).

Because index.csv is updated continuously during processing, make sure that no custom retention policy is applied to the output bucket that could prevent the file from being updated. (https://cloud.google.com/storage/docs/bucket-lock#retention-policy)
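Since each index.csv row is final once written, a consumer can parse the file repeatedly and act on any rows it has not seen before. The sketch below parses the six-column format into named tuples. The IndexRow type and parse_index function are hypothetical, and the code assumes that index.csv has no header row, which this reference does not state either way.

```python
import csv
import io
from typing import NamedTuple


class IndexRow(NamedTuple):
    input_file: str
    target_language_code: str
    translations_file: str
    errors_file: str
    glossary_translations_file: str
    glossary_errors_file: str


def parse_index(text):
    """Parse the contents of index.csv into a list of IndexRow values."""
    rows = []
    for columns in csv.reader(io.StringIO(text)):
        if not columns:
            continue  # skip blank lines
        if len(columns) != 6:
            raise ValueError(f"expected 6 columns, got {len(columns)}: {columns}")
        rows.append(IndexRow(*(column.strip() for column in columns)))
    return rows


sample = "gs://a/b/c.tsv,de,gs://translation_test/a_b_c_de_translations.tsv,,,\n"
for row in parse_index(sample):
    print(row.input_file, row.target_language_code, row.translations_file or "(none)")
```

Empty strings in the file columns mean that there was no content to output for that file.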

The format of translations_file (for target language code 'trg') is: gs://translation_test/a_b_c_'trg'_translations.[extension]

If the input file extension is tsv, the output has the following columns:

  • Column 1: ID of the request provided in the input. If no ID is provided in the input, the input row number is used (0-based).
  • Column 2: source sentence.
  • Column 3: translation without applying a glossary. Empty string if there is an error.
  • Column 4 (only present if a glossary is provided in the request): translation after applying the glossary. Empty string if there is an error applying the glossary. Could be the same string as column 3 if no glossary is applied.

If the input file extension is txt or html, the translation is written directly to the output file. If a glossary is requested, a separate glossary_translations_file is written with the format gs://translation_test/a_b_c_'trg'_glossary_translations.[extension]
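For tsv inputs, the translations file can be read back with a standard CSV parser configured for tabs. The sketch below maps the three or four columns to named fields. The function name parse_translations and the dictionary keys are hypothetical. The parser accepts '\n' record delimiters, which matches the output format described in this section.

```python
import csv
import io


def parse_translations(text):
    """Parse a tsv translations file into a list of dicts, one per row."""
    records = []
    for columns in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(columns) not in (3, 4):
            raise ValueError(f"expected 3 or 4 columns, got {len(columns)}")
        records.append({
            "id": columns[0],
            "source": columns[1],
            "translation": columns[2],  # empty string if translation failed
            # Column 4 exists only when a glossary was provided in the request.
            "glossary_translation": columns[3] if len(columns) == 4 else None,
        })
    return records


for record in parse_translations("0\tHello\tHallo\n1\tGoodbye\t\n"):
    print(record)
```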

The format of errors_file (for target language code 'trg') is: gs://translation_test/a_b_c_'trg'_errors.[extension]

If the input file extension is tsv, errors_file contains the following:

  • Column 1: ID of the request provided in the input. If no ID is provided in the input, the input row number is used (0-based).
  • Column 2: source sentence.
  • Column 3: error detail for the translation. Could be empty.
  • Column 4 (only present if a glossary is provided in the request): error when applying the glossary.

If the input file extension is txt or html, a glossary_errors_file containing error details is generated. It has the format gs://translation_test/a_b_c_'trg'_glossary_errors.[extension]
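The file name patterns in this section can be approximated as follows. This sketch extrapolates from the single gs://a/b/c.[extension] example: it assumes that the bucket and path segments are joined with underscores and that the output keeps the input extension. The function name output_uri is hypothetical, and index.csv remains the authoritative source for the actual file names.

```python
import posixpath

KINDS = ("translations", "errors", "glossary_translations", "glossary_errors")


def output_uri(input_uri, output_uri_prefix, target_language_code, kind):
    """Guess the output file URI for one input file, target language, and kind."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    if not input_uri.startswith("gs://"):
        raise ValueError("input_uri must start with gs://")
    stem, extension = posixpath.splitext(input_uri[len("gs://"):])
    if not extension:
        raise ValueError("input_uri has no file extension")
    flattened = stem.replace("/", "_")  # assumption drawn from the a_b_c example
    prefix = output_uri_prefix if output_uri_prefix.endswith("/") else output_uri_prefix + "/"
    return f"{prefix}{flattened}_{target_language_code}_{kind}{extension}"


print(output_uri("gs://a/b/c.tsv", "gs://translation_test/", "de", "translations"))
```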