Output configuration for BatchTranslateText request.
Google Cloud Storage destination for output content. For every
single input file (for example, gs://a/b/c.[extension]), we
generate at most 2 * n output files, where n is the number of
target_language_codes in the BatchTranslateTextRequest.
Output files (tsv) generated are compliant with RFC 4180
except that record delimiters are \n instead of \r\n. We don’t
provide any way to
change record delimiters. While the input files are being
processed, we write/update an index file ‘index.csv’ under
‘output_uri_prefix’ (for example,
gs://translation-test/index.csv). The index file is generated/updated as new
files are being translated. The format is:
input_file,target_language_code,translations_file,errors_file,glossary_translations_file,glossary_errors_file

input_file is one file we matched using gcs_source.input_uri.
target_language_code is provided in the request.
translations_file contains the translations (details provided
below). errors_file contains the errors during processing of
the file (details below). Both translations_file and
errors_file could be empty strings if we have no content to
output. glossary_translations_file and glossary_errors_file
are always empty strings if the input_file is tsv. They could
also be empty if we have no content to output. Once a row is
present in index.csv, the input/output matching never changes.
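
As an illustration of the index format described above, here is a minimal Python sketch that reads the contents of index.csv into records. It assumes the file has no header row and that every row carries all six columns (empty strings included); neither is stated above. The names IndexRow and parse_index are hypothetical and not part of any client library.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class IndexRow:
    # Field order follows the index format given in the description.
    input_file: str
    target_language_code: str
    translations_file: str
    errors_file: str
    glossary_translations_file: str
    glossary_errors_file: str


def parse_index(text: str) -> List[IndexRow]:
    """Parse the text of index.csv (assumed header-less) into IndexRow records."""
    rows = []
    # newline="" lets the csv module handle record delimiters itself.
    for record in csv.reader(io.StringIO(text, newline="")):
        if not record:
            continue  # ignore blank lines
        if len(record) != 6:
            raise ValueError(f"expected 6 columns, got {len(record)}: {record!r}")
        rows.append(IndexRow(*record))
    return rows
```

A caller would typically download index.csv from ‘output_uri_prefix’, pass its text to parse_index, and skip any output column that is an empty string.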
Callers should also expect that all the content in input_file is
processed and ready to be consumed (that is, no partial output
file is written).

The format of translations_file (for target
language code ‘trg’) is:
gs://translation_test/a_b_c_‘trg’_translations.[extension]
If the input file extension is tsv, the output has the
following columns:
Column 1: ID of the request provided in the input. If it’s not
provided in the input, then the input row number is used
(0-based).
Column 2: source sentence.
Column 3: translation without applying a glossary. Empty string
if there is an error.
Column 4 (only present if a glossary is provided in the
request): translation after applying the glossary. Empty string
if there is an error applying the glossary. Could be the same
string as column 3 if there is no glossary applied.
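
The tsv column layout above can be read with Python's standard csv module, as in the following sketch. It assumes the text of a translations_file has already been downloaded, and it relies only on what is stated above: tab-separated fields, RFC 4180 quoting, and \n record delimiters. TranslationRow and parse_translations_tsv are hypothetical names used for illustration.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class TranslationRow(NamedTuple):
    row_id: str                          # column 1: request ID or 0-based row number
    source: str                          # column 2: source sentence
    translation: str                     # column 3: empty string if there was an error
    glossary_translation: Optional[str]  # column 4: None when the column is absent


def parse_translations_tsv(text: str) -> List[TranslationRow]:
    """Parse the text of a tsv translations_file into TranslationRow records."""
    rows = []
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    for record in reader:
        if not record:
            continue  # ignore blank lines
        if len(record) not in (3, 4):
            raise ValueError(f"expected 3 or 4 columns, got {len(record)}")
        # Column 4 exists only when a glossary was provided in the request.
        glossary = record[3] if len(record) == 4 else None
        rows.append(TranslationRow(record[0], record[1], record[2], glossary))
    return rows
```

Rows whose translation is an empty string can then be looked up in the matching errors_file by their ID.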
If the input file extension is txt or html, the translation is
directly written to the output file. If a glossary is requested,
a separate glossary_translations_file has the format
gs://translation_test/a_b_c_‘trg’_glossary_translations.[extension]
The format of errors file (for target language code ‘trg’) is:
gs://translation_test/a_b_c_‘trg’_errors.[extension]
If the input file extension is tsv, errors_file contains the
following:
Column 1: ID of the request provided in the input. If it’s not
provided in the input, then the input row number is used
(0-based).
Column 2: source sentence.
Column 3: error detail for the translation. Could be empty.
Column 4 (only present if a glossary is provided in the
request): error when applying the glossary.
If the input file extension is txt or html, a
glossary_errors_file will be generated that contains error
details. glossary_errors_file has the format
gs://translation_test/a_b_c_‘trg’_glossary_errors.[extension]
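
The four output file names above share one visible pattern, which the following Python sketch reproduces. The rule it applies (drop the gs:// scheme, replace "/" with "_", then append the language code, the file kind and the original extension) is only inferred from the single example gs://a/b/c.[extension]; it is an assumption, not a documented guarantee, and index.csv remains the authoritative mapping. The function name expected_output_uri is hypothetical, and output_uri_prefix is assumed to end with "/".

```python
import posixpath

# File kinds named in the description above.
KINDS = ("translations", "glossary_translations", "errors", "glossary_errors")


def expected_output_uri(input_uri: str, output_uri_prefix: str,
                        target_language_code: str,
                        kind: str = "translations") -> str:
    """Guess an output file URI from the naming pattern in the examples.

    This is an inferred rule; prefer the paths listed in index.csv.
    """
    if not input_uri.startswith("gs://"):
        raise ValueError("input_uri must start with gs://")
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    # Split "a/b/c.txt" into ("a/b/c", ".txt").
    stem, extension = posixpath.splitext(input_uri[len("gs://"):])
    # Assumption: bucket and path segments are joined with underscores.
    flattened = stem.replace("/", "_")
    return f"{output_uri_prefix}{flattened}_{target_language_code}_{kind}{extension}"
```

Such a helper is mainly useful as a sanity check against the paths read from index.csv, not as a replacement for them.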