Create a custom translation model
This page shows you how to train and use a custom AutoML translation model by using the Google Cloud console. The following example trains a custom English-to-Spanish translation model by using technology-oriented sentence pairs from software localization.
Before you begin
Go to the AutoML Translation page and select your project from the drop-down list. You must have at least roles/editor access to the project. The AutoML documentation walks you through setting up a project and granting the necessary permissions.
Create a translation dataset and import sentence pairs
Download the archive file containing the sample data for training the model, and extract the file en-es.tsv.
Go to the AutoML Translation console page.
Select the project for which you enabled AutoML Translation.
Click the Create Dataset button.
On the Create dataset page, enter a name for the dataset and select the source and target languages.
When you select English as the Translate from language, the available Translate to languages appear. Select Spanish.
Click Create.
On the Import tab for your dataset, do the following:
- Select Upload files from your computer, click Select Files, and choose the en-es.tsv file you downloaded previously.
- When you upload files from your computer, you must specify the Cloud Storage path where the uploaded files are to be stored. The Cloud Storage bucket region must be us-central1.
Click Continue.
You're returned to the Datasets page, where your dataset shows an in-progress animation while your data is being imported. When the import finishes successfully, you receive a message at the email address that you used to sign up for the program.
Review the dataset.
After your data has been successfully imported, select the dataset from the dataset listing page (or click the link in the email notification) to see the details about the dataset. The name of the selected dataset appears in the title bar, and the page lists the sentence pairs and which stage of processing they will be used for (TRAIN, VALIDATION, TEST).
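Before uploading a file of your own, it can help to check that every line holds exactly one sentence pair. The following is a minimal sketch, assuming the file is UTF-8 text with the source segment and the target segment separated by a single tab, as in en-es.tsv. The function names load_sentence_pairs and check_file are hypothetical helpers for illustration, not part of any AutoML tooling.

```python
def load_sentence_pairs(tsv_text):
    """Parse tab-separated text into (source, target) sentence pairs.

    Returns (pairs, problems), where problems is a list of
    (line_number, reason) tuples for lines that were skipped.
    Assumption: one pair per line, exactly two tab-separated columns.
    """
    pairs = []
    problems = []
    for line_number, line in enumerate(tsv_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        columns = line.split("\t")
        if len(columns) != 2:
            problems.append(
                (line_number, f"expected 2 columns, found {len(columns)}")
            )
            continue
        source, target = (column.strip() for column in columns)
        if not source or not target:
            problems.append((line_number, "empty source or target segment"))
            continue
        pairs.append((source, target))
    return pairs, problems


def check_file(path):
    """Read a TSV file from disk and print a short summary."""
    with open(path, encoding="utf-8") as handle:
        pairs, problems = load_sentence_pairs(handle.read())
    print(f"{len(pairs)} usable sentence pairs, {len(problems)} problem lines")
    for line_number, reason in problems:
        print(f"  line {line_number}: {reason}")
    return pairs, problems
```

For example, calling check_file("en-es.tsv") in the directory where you extracted the sample archive prints the number of usable pairs and lists any lines that would need fixing.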
Train an AutoML translation model
To begin training your custom model, click the Train tab just below the title bar, then the Start Training button.
Training a model can take several hours to complete. After the model is successfully trained, you will receive a message at the email address you used to sign up for the program.
When you receive notification that training is complete, open the email message and click the link to go to the Google Cloud console. The Train page shows high-level metrics for the model, most notably its BLEU score. The BLEU (Bilingual Evaluation Understudy) score indicates how similar the candidate text is to the reference texts, with values closer to one representing more similar texts.
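To build intuition for what a BLEU score measures, the sketch below computes a simplified sentence-level score: the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty. It assumes a single reference, whitespace tokenization, and no smoothing. It is an illustration of the metric only and is not how AutoML Translation computes the score shown on the Train page, which is evaluated over the whole test set.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU for one candidate against one reference.

    Returns a value between 0 and 1. Returns 0.0 when any n-gram
    order has no overlap, because no smoothing is applied.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(
            min(count, ref_counts[gram]) for gram, count in cand_counts.items()
        )
        if total == 0 or overlap == 0:
            return 0.0
        log_precision_sum += math.log(overlap / total) / max_n

    # Penalize candidates that are shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum)
```

A candidate identical to its reference scores 1, a candidate that shares no words with it scores 0, and partial matches fall in between.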
Use the AutoML translation model
Click the Predict tab just below the title bar or the Test and use link below the model information. Enter some text to translate and click the Translate button. You can compare the results from your custom model to the Google NMT model.
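Outside the console, a trained model can also be called programmatically. The following is a minimal sketch, assuming the google-cloud-translate Python client library is installed, Application Default Credentials are configured, and the model is located in us-central1. The project ID and model ID are placeholders that you replace with your own values, and model_resource_name is a hypothetical helper introduced here for illustration.

```python
def model_resource_name(project_id, model_id, location="us-central1"):
    """Build the full resource name for a custom model.

    Assumption: the model lives in the given location
    (us-central1 by default in this sketch).
    """
    return f"projects/{project_id}/locations/{location}/models/{model_id}"


def translate_with_custom_model(project_id, model_id, text,
                                location="us-central1"):
    """Translate English text to Spanish with a custom model."""
    # Imported here so the helper above works without the client library.
    from google.cloud import translate_v3

    client = translate_v3.TranslationServiceClient()
    parent = f"projects/{project_id}/locations/{location}"
    response = client.translate_text(
        request={
            "parent": parent,
            "contents": [text],
            "mime_type": "text/plain",
            "source_language_code": "en",
            "target_language_code": "es",
            "model": model_resource_name(project_id, model_id, location),
        }
    )
    return [translation.translated_text for translation in response.translations]
```

For example, translate_with_custom_model("your-project-id", "your-model-id", "Click the Save button.") returns a list containing the translated string, provided that both IDs refer to real resources that your credentials can access.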
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, use the Google Cloud console to delete your project if you do not need it.
What's next
- When you're ready to create your own dataset to create an AutoML Translation model, read the instructions on how to prepare your data.