This tutorial demonstrates how to create a custom translation model using AutoML Translation. The application trains a custom model using an English to Spanish dataset of technology-oriented sentence pairs from software localization.
The tutorial covers training the custom model, evaluating its performance, and translating new content.
Prerequisites
Configure your project environment
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.
- Enable the AutoML Translation APIs.
- Install the Google Cloud CLI.
- Follow the instructions to create a service account and download a key file.
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the service account key file that you downloaded when you created the service account. For example:
  export GOOGLE_APPLICATION_CREDENTIALS=key-file
- Grant your new service account the AutoML Editor IAM role with the following commands. Replace project-id with your Google Cloud project ID, and replace service-account-name with the email address of your new service account, for example service-account1@myproject.iam.gserviceaccount.com.
  gcloud auth login
  gcloud config set project project-id
  gcloud projects add-iam-policy-binding project-id \
    --member=serviceAccount:service-account-name \
    --role='roles/automl.editor'
- Allow the AutoML Translation service account to access your Google Cloud project resources. In the member address, replace project-number (after the service- prefix) with your Google Cloud project number:
  gcloud projects add-iam-policy-binding project-id \
    --member="serviceAccount:service-project-number@gcp-sa-automl.iam.gserviceaccount.com" \
    --role="roles/automl.serviceAgent"
- Install the client library.
- Set the PROJECT_ID and REGION_NAME environment variables. Replace project-id with the Project ID of your Google Cloud project. AutoML Translation currently requires the location us-central1.
  export PROJECT_ID="project-id"
  export REGION_NAME="us-central1"
- Create a Google Cloud Storage bucket to store the documents that you will use to train your custom model. The bucket name must be in the format $PROJECT_ID-vcm. The following command creates a storage bucket named $PROJECT_ID-vcm in the us-central1 region:
  gsutil mb -p $PROJECT_ID -c regional -l $REGION_NAME gs://$PROJECT_ID-vcm/
- Download the archive file containing the sample data for training the model, extract its contents, and upload the files to your Google Cloud Storage bucket.
  See Preparing your training data for details about the formats. The sample code in this tutorial uses the English to Spanish dataset. Datasets with target languages German, French, Russian, and Chinese are also available. If you use one of these alternate datasets, replace the language code es in the samples with the appropriate language code.
- In the en-es.csv file from the previous step, replace {project_id} with the Project ID for your project.
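The following Python sketch is an optional sanity check and is not one of the tutorial's sample files. The names check_environment and fill_project_id are hypothetical. The sketch verifies the environment variables set above, derives the bucket URI from the $PROJECT_ID-vcm naming rule, and performs the {project_id} substitution in en-es.csv. It only reads environment variables and local files and makes no Google Cloud API calls.

```python
import os
from pathlib import Path
from typing import Mapping, Tuple

# The tutorial states that AutoML Translation requires this location.
REQUIRED_REGION = "us-central1"


def check_environment(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (project_id, bucket_uri), raising RuntimeError if a setup step was skipped.

    Hypothetical helper: it inspects only environment variables and the local
    file system; it does not contact Google Cloud.
    """
    key_file = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not key_file or not Path(key_file).is_file():
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS must point to an existing key file")

    project_id = env.get("PROJECT_ID", "")
    if not project_id:
        raise RuntimeError("PROJECT_ID is not set")

    region = env.get("REGION_NAME", REQUIRED_REGION)
    if region != REQUIRED_REGION:
        raise RuntimeError(f"REGION_NAME must be {REQUIRED_REGION}, got {region}")

    # The training bucket must be named <PROJECT_ID>-vcm.
    return project_id, f"gs://{project_id}-vcm/"


def fill_project_id(csv_text: str, project_id: str) -> str:
    """Replace every literal {project_id} placeholder in the CSV contents."""
    return csv_text.replace("{project_id}", project_id)
```

With PROJECT_ID set to a hypothetical my-project, check_environment() returns ("my-project", "gs://my-project-vcm/"). The call Path("en-es.csv").write_text(fill_project_id(Path("en-es.csv").read_text(), project_id)) rewrites a local copy of the CSV, and the copy in your bucket must contain the substituted values. Passing an explicit mapping instead of os.environ keeps the check easy to test.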
Source code file locations
Download the source code for the language you want to use, then copy it into your Google Cloud project folder. The files for each language are listed below.
Python
The tutorial consists of these Python files:
- translate_create_dataset.py – Includes functionality to create a dataset
- import_dataset.py – Includes functionality to import a dataset
- translate_create_model.py – Includes functionality to create a model
- list_model_evaluations.py – Includes functionality to list model evaluations
- translate_predict.py – Includes functionality related to prediction
- delete_model.py – Includes functionality to delete a model
Java
The tutorial consists of these Java files:
- TranslateCreateDataset.java – Includes functionality to create a dataset
- ImportDataset.java – Includes functionality to import a dataset
- TranslateCreateModel.java – Includes functionality to create a model
- ListModelEvaluations.java – Includes functionality to list model evaluations
- TranslatePredict.java – Includes functionality related to prediction
- DeleteModel.java – Includes functionality to delete a model
Node.js
The tutorial consists of these Node.js programs:
- translate_create_dataset.js – Includes functionality to create a dataset
- import_dataset.js – Includes functionality to import a dataset
- translate_create_model.js – Includes functionality to create a model
- list_model_evaluations.js – Includes functionality to list model evaluations
- translate_predict.js – Includes functionality related to prediction
- delete_model.js – Includes functionality to delete a model
Running the application
Step 1: Create a dataset
The first step in creating a custom model is to create an empty dataset that will eventually hold the training data for the model. When you create a dataset, you specify the source and target languages for the translation.
Copy the Code
Python
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Python API reference documentation.
Java
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Java API reference documentation.
Node.js
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Node.js API reference documentation.
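The sample source for this step is not reproduced here. As a stand-in, here is a minimal Python sketch of a create_dataset function. It assumes that the google-cloud-automl client library (AutoML v1 API) is installed and that credentials are configured as described in Prerequisites. The client classes belong to that library. The function and helper names are illustrative and are not necessarily those used in translate_create_dataset.py. The client import is deferred into the function so that location_path can be used without the library installed.

```python
from typing import Any

REGION = "us-central1"  # the location the tutorial says AutoML Translation requires


def location_path(project_id: str, region: str = REGION) -> str:
    """Parent resource name under which datasets and models are created."""
    return f"projects/{project_id}/locations/{region}"


def create_dataset(project_id: str, display_name: str = "en_es_dataset",
                   source_language: str = "en", target_language: str = "es") -> Any:
    """Create an empty translation dataset and return it once the operation finishes."""
    # Deferred import: requires `pip install google-cloud-automl`.
    from google.cloud import automl

    client = automl.AutoMlClient()
    metadata = automl.TranslationDatasetMetadata(
        source_language_code=source_language,
        target_language_code=target_language,
    )
    dataset = automl.Dataset(
        display_name=display_name,
        translation_dataset_metadata=metadata,
    )
    # create_dataset returns a long-running operation; result() waits for it.
    operation = client.create_dataset(parent=location_path(project_id), dataset=dataset)
    created = operation.result()
    print(f"Dataset name: {created.name}")
    print(f"Dataset id: {created.name.split('/')[-1]}")
    return created
```

The defaults mirror the edits this step asks for: display name en_es_dataset and target language es instead of ja.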
Request
Run the create_dataset function to create an empty dataset. You must modify the following lines of code:
- Set project_id to your PROJECT_ID.
- Set display_name for the dataset (en_es_dataset).
- Modify the target_language_code field from ja to es.
Python
python translate_create_dataset.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.TranslateCreateDataset"
Node.js
node translate_create_dataset.js
Response
The response includes the details of the newly created dataset, including the
Dataset ID that you'll use to reference the dataset in future requests. We
recommend that you set an environment variable DATASET_ID
to the returned
Dataset ID value.
Dataset name: projects/216065747626/locations/us-central1/datasets/TRL7372141011130533778
Dataset id: TRL7372141011130533778
Dataset display name: en_es_dataset
Translation dataset Metadata:
  source_language_code: en
  target_language_code: es
Dataset example count: 0
Dataset create time:
  seconds: 1530251987
  nanos: 216586000
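Resource names in these responses end in .../<collection>/<id>. The value to export as DATASET_ID (and later MODEL_ID) is therefore the segment after datasets or models. resource_id below is a hypothetical helper, not part of the client library:

```python
def resource_id(resource_name: str, collection: str) -> str:
    """Return the ID that follows `collection` in an AutoML resource name.

    Surrounding whitespace and double quotes (as in printed protobuf output)
    are ignored. Raises ValueError if the collection or its ID is missing.
    """
    parts = resource_name.strip().strip('"').split("/")
    if collection not in parts:
        raise ValueError(f"{collection!r} not found in {resource_name!r}")
    index = parts.index(collection)
    if index + 1 >= len(parts) or not parts[index + 1]:
        raise ValueError(f"no ID follows {collection!r} in {resource_name!r}")
    return parts[index + 1]
```

For the response above, resource_id(name, "datasets") returns TRL7372141011130533778, which you can then export with export DATASET_ID=TRL7372141011130533778. Matching whole path segments means "models" does not accidentally match "modelEvaluations".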
Step 2: Import training sentence pairs into the dataset
The next step is to populate the dataset with a list of training sentence pairs.
The import_dataset function takes as input a .csv file that lists the Cloud Storage locations of the files containing the training sentence pairs. (See Prepare your data for details about the required format.) For this tutorial, you use en-es.csv, which you uploaded to Google Cloud Storage above.
Copy the Code
Python
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Python API reference documentation.
Java
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Java API reference documentation.
Node.js
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Node.js API reference documentation.
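A minimal Python sketch of the import step follows. It assumes that the google-cloud-automl client library (AutoML v1 API) is installed and that credentials are configured as in Prerequisites. The helper names are illustrative. The default CSV location follows the $PROJECT_ID-vcm bucket convention from Prerequisites and can be overridden with path.

```python
from typing import Any

REGION = "us-central1"


def dataset_path(project_id: str, dataset_id: str, region: str = REGION) -> str:
    """Full resource name of an existing dataset."""
    return f"projects/{project_id}/locations/{region}/datasets/{dataset_id}"


def training_csv_uri(project_id: str, filename: str = "en-es.csv") -> str:
    """URI of the import CSV in the tutorial's <PROJECT_ID>-vcm bucket."""
    return f"gs://{project_id}-vcm/{filename}"


def import_dataset(project_id: str, dataset_id: str, path: str = "") -> Any:
    """Import the sentence-pair files listed in the CSV at `path` into the dataset."""
    from google.cloud import automl  # requires google-cloud-automl

    client = automl.AutoMlClient()
    gcs_source = automl.GcsSource(input_uris=[path or training_csv_uri(project_id)])
    input_config = automl.InputConfig(gcs_source=gcs_source)
    operation = client.import_data(
        name=dataset_path(project_id, dataset_id), input_config=input_config
    )
    print("Processing import...")
    result = operation.result()  # waits until the import operation finishes
    print("Dataset imported.")
    return result
```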
Request
Run the import_dataset function to import the training content. You must modify the following lines of code:
- Set project_id to your PROJECT_ID.
- Set dataset_id to the ID of the dataset (from the output of the previous step).
- Set path to the URI of the en-es.csv file (gs://YOUR_PROJECT_ID-vcm/en-es.csv).
Python
python import_dataset.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.ImportDataset"
Node.js
node import_dataset.js
Response
Processing import...
Dataset imported.
Step 3: Create (train) the model
Now that your dataset contains training sentence pairs, you can train a new model.
Copy the Code
Python
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Python API reference documentation.
Java
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Java API reference documentation.
Node.js
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Node.js API reference documentation.
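A minimal Python sketch of starting training with the google-cloud-automl client library (AutoML v1 API) follows. It assumes that the library is installed and credentials are configured. The helper names are illustrative. Because training can take a long time, the function returns the long-running operation instead of waiting on it.

```python
from typing import Any

REGION = "us-central1"


def location_path(project_id: str, region: str = REGION) -> str:
    """Parent resource name under which the model is created."""
    return f"projects/{project_id}/locations/{region}"


def create_model(project_id: str, dataset_id: str,
                 display_name: str = "en_es_test_model") -> Any:
    """Start training and return the long-running operation without waiting."""
    from google.cloud import automl  # requires google-cloud-automl

    client = automl.AutoMlClient()
    model = automl.Model(
        display_name=display_name,
        dataset_id=dataset_id,
        translation_model_metadata=automl.TranslationModelMetadata(),
    )
    operation = client.create_model(parent=location_path(project_id), model=model)
    # operation.operation is the underlying google.longrunning Operation message.
    print(f"Training operation name: {operation.operation.name}")
    print("Training started...")
    return operation
```

Calling .result() on the returned operation blocks until training ends and then returns the trained model.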
Request
To run create_model, you must modify the following lines of code:
- Set project_id to your PROJECT_ID.
- Set dataset_id to the ID of the dataset (from the output of Step 1).
- Set display_name for the new model (en_es_test_model).
Python
python translate_create_model.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.TranslateCreateModel"
Node.js
node translate_create_model.js
Response
The create_model
function kicks off a training operation and prints the operation
name. Training happens asynchronously and can take a while to complete, so you can
use the operation ID to check training status.
When training is complete, create_model
returns the Model ID. As with the Dataset
ID, you might want to set an environment variable MODEL_ID
to the returned
Model ID value.
Training operation name: projects/216065747626/locations/us-central1/operations/TRL3007727620979824033
Training started...
Model name: projects/216065747626/locations/us-central1/models/TRL3007727620979824033
Model id: TRL3007727620979824033
Model display name: en_es_test_model
Model create time:
  seconds: 1529649600
  nanos: 966000000
Model deployment state: deployed
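A script that needs the Model ID must wait for the training operation to finish. The sketch below polls any object with done() and result() methods, such as the long-running operation object that create_model returns in the Python client library (a google.api_core Operation). The names wait_for_operation, poll_seconds, and max_polls are hypothetical, and the five-minute default interval is an arbitrary choice. Calling the operation's result(timeout=...) directly is a simpler alternative.

```python
import time
from typing import Any, Callable, Optional


def wait_for_operation(operation: Any, poll_seconds: float = 300.0,
                       max_polls: Optional[int] = None,
                       sleep: Callable[[float], None] = time.sleep) -> Any:
    """Poll operation.done() until it is True, then return operation.result().

    Raises TimeoutError after `max_polls` checks that find the operation
    still running. `sleep` is injectable so the loop can be tested quickly.
    """
    polls = 0
    while not operation.done():
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"operation still running after {polls} checks")
        print(f"Training still in progress (check {polls}); waiting {poll_seconds} s")
        sleep(poll_seconds)
    return operation.result()
```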
Step 4: Evaluate the model
After training, you can evaluate your model's readiness by reviewing its BLEU score.
The list_model_evaluations
function takes the Model ID as a parameter.
Copy the Code
Python
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Python API reference documentation.
Java
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Java API reference documentation.
Node.js
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Node.js API reference documentation.
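A minimal Python sketch of listing evaluations follows. It assumes that the google-cloud-automl client library (AutoML v1 API) is installed and credentials are configured. It prints the fields that appear in the response below. model_path and the function name are illustrative.

```python
from typing import Any, List

REGION = "us-central1"


def model_path(project_id: str, model_id: str, region: str = REGION) -> str:
    """Full resource name of a trained model."""
    return f"projects/{project_id}/locations/{region}/models/{model_id}"


def list_model_evaluations(project_id: str, model_id: str) -> List[Any]:
    """Print and return every evaluation recorded for the model."""
    from google.cloud import automl  # requires google-cloud-automl

    client = automl.AutoMlClient()
    # An empty filter string requests all evaluations.
    evaluations = list(
        client.list_model_evaluations(parent=model_path(project_id, model_id), filter="")
    )
    print("List of model evaluations:")
    for evaluation in evaluations:
        metrics = evaluation.translation_evaluation_metrics
        print(f"name: {evaluation.name}")
        print(f"evaluated_example_count: {evaluation.evaluated_example_count}")
        print(f"bleu_score: {metrics.bleu_score}")
        print(f"base_bleu_score: {metrics.base_bleu_score}")
    return evaluations
```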
Request
Run the list_model_evaluations function to display the overall evaluation performance of the model. You must modify the following lines of code:
- Set project_id to your PROJECT_ID.
- Set model_id to your model's ID.
Python
python list_model_evaluations.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.ListModelEvaluations"
Node.js
node list_model_evaluations.js
Response
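In the response, bleu_score is the BLEU score of your custom model and base_bleu_score is the score of the base Google model, both computed over the evaluated examples. If your model's BLEU score is too low, you can strengthen the training dataset and retrain your model. For more information, see Evaluating models.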
List of model evaluations:
name: "projects/216065747626/locations/us-central1/models/5419131644870929143/modelEvaluations/TRL7683346839371803263"
create_time {
  seconds: 1530196488
  nanos: 509247000
}
evaluated_example_count: 3
translation_evaluation_metrics {
  bleu_score: 19.23076957464218
  base_bleu_score: 11.428571492433548
}
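Comparing bleu_score with base_bleu_score is one way to decide whether retraining is worthwhile. The helpers below are a hypothetical sketch. min_gain is a threshold you would choose yourself, not a value defined by AutoML Translation. This sample evaluation covers only 3 examples (evaluated_example_count), so its scores are noisy.

```python
def bleu_gain(bleu_score: float, base_bleu_score: float) -> float:
    """BLEU points gained by the custom model over the base model."""
    return bleu_score - base_bleu_score


def should_retrain(bleu_score: float, base_bleu_score: float,
                   min_gain: float = 0.0) -> bool:
    """True when the custom model's gain over the base model is below min_gain."""
    return bleu_gain(bleu_score, base_bleu_score) < min_gain
```

For the evaluation above, bleu_gain(19.23076957464218, 11.428571492433548) is about 7.8 BLEU points, so should_retrain returns False with the default threshold and True with min_gain=10.0.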
Step 5: Use a model to make a prediction
When your custom model meets your quality standards, you can use it to translate novel content.
Copy the Code
Python
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Python API reference documentation.
Java
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Java API reference documentation.
Node.js
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Node.js API reference documentation.
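A minimal Python sketch of the prediction call follows. It assumes that the google-cloud-automl client library (AutoML v1 API) is installed and credentials are configured. read_input, translate, and model_path are illustrative names. The predict method takes the model's full resource name and an ExamplePayload wrapping a TextSnippet.

```python
from pathlib import Path

REGION = "us-central1"


def model_path(project_id: str, model_id: str, region: str = REGION) -> str:
    """Full resource name of a trained model."""
    return f"projects/{project_id}/locations/{region}/models/{model_id}"


def read_input(file_path: str = "resources/input.txt") -> str:
    """Read the text to translate, dropping surrounding whitespace."""
    return Path(file_path).read_text(encoding="utf-8").strip()


def translate(project_id: str, model_id: str,
              file_path: str = "resources/input.txt") -> str:
    """Translate the contents of file_path with the custom model."""
    from google.cloud import automl  # requires google-cloud-automl

    prediction_client = automl.PredictionServiceClient()
    text_snippet = automl.TextSnippet(content=read_input(file_path))
    payload = automl.ExamplePayload(text_snippet=text_snippet)
    response = prediction_client.predict(
        name=model_path(project_id, model_id), payload=payload
    )
    translated = response.payload[0].translation.translated_content.content
    print(f"Translated content: {translated}")
    return translated
```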
Request
For the predict function, you must modify the following lines of code:
- Set project_id to your PROJECT_ID.
- Set model_id to your model's ID.
- Set file_path to the path of the downloaded input file ("resources/input.txt").
Python
python translate_predict.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.TranslatePredict"
Node.js
node translate_predict.js predict
Response
The function returns the translated content.
Translated content: Ver y administrar tus cuentas de Google Tag Manager.
Above is the Spanish translation for the English sentence: “View and manage your Google Tag Manager accounts.” Contrast this custom translation with the translation from the base Google model:
Ver y administrar sus cuentas de Administrador de etiquetas de Google
Step 6: Delete a model
When you are done using this sample model, you can delete it permanently. You will no longer be able to use the model for prediction.
Copy the Code
Python
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Python API reference documentation.
Java
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Java API reference documentation.
Node.js
To learn how to install and use the client library for AutoML Translation, see AutoML Translation client libraries. For more information, see the AutoML Translation Node.js API reference documentation.
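A minimal Python sketch of deleting the model follows. It assumes that the google-cloud-automl client library (AutoML v1 API) is installed and credentials are configured. The function names are illustrative. Deletion is a long-running operation, so the sketch waits for it to finish before reporting success.

```python
REGION = "us-central1"


def model_path(project_id: str, model_id: str, region: str = REGION) -> str:
    """Full resource name of a trained model."""
    return f"projects/{project_id}/locations/{region}/models/{model_id}"


def delete_model(project_id: str, model_id: str) -> None:
    """Permanently delete the model; it can no longer be used for prediction."""
    from google.cloud import automl  # requires google-cloud-automl

    client = automl.AutoMlClient()
    operation = client.delete_model(name=model_path(project_id, model_id))
    operation.result()  # waits until the deletion completes
    print("Model deleted.")
```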
Request
Run the delete_model function to delete the model you created. You must modify the following lines of code:
- Set project_id to your PROJECT_ID.
- Set model_id to your model's ID.
Python
python delete_model.py
Java
mvn compile exec:java -Dexec.mainClass="com.example.automl.DeleteModel"
Node.js
node delete_model.js
Response
Model deleted.