Create a custom translation model

This page shows you how to train and use a custom AutoML translation model by using the Google Cloud console. The following example trains a custom English-to-Spanish translation model by using technology-oriented sentence pairs from software localization.

Before you begin

Go to the AutoML Translation page and select your project from the drop-down list. You must have at least roles/editor access to the project. The AutoML documentation walks you through setting up a project and granting the necessary permissions.

Create a translation dataset and import sentence pairs

  1. Download the archive file containing the sample data for training the model, and extract the file en-es.tsv.

  2. Go to the AutoML Translation console page.

  3. Select the project for which you enabled AutoML Translation.

  4. Click the Create Dataset button.

  5. On the Create dataset page, enter a name for the dataset and select the source and target languages.

    When you select English as the Translate from language, the available Translate to languages appear. Select Spanish.

  6. Click Create.

  7. On the Import tab for your dataset, do the following:

    • Select Upload files from your computer, click Select Files, and choose the en-es.tsv file you downloaded previously.
    • When you upload files from your computer, you must also specify the Cloud Storage path where the uploaded files will be stored. The Cloud Storage bucket must be in the us-central1 region.
  8. Click Continue.

    You're returned to the Datasets page, where your dataset shows an in-progress animation while the sentence pairs are imported. When the import finishes successfully, you receive a message at the email address that you used to sign up for the program.

  9. Review the dataset.

    After your data has been successfully imported, select the dataset from the dataset listing page (or click the link in the email notification) to see the details about the dataset. The name of the selected dataset appears in the title bar, and the page lists the sentence pairs and which stage of processing they will be used for (TRAIN, VALIDATION, TEST).
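Before you import a file such as en-es.tsv, it can help to check it locally. The following Python sketch assumes that each line holds one source segment and one target segment separated by a single tab. The function name check_sentence_pairs and the sample text are hypothetical and are not part of AutoML Translation.

```python
import csv
import io


def check_sentence_pairs(tsv_text):
    """Return (valid_pairs, problems) for tab-separated source/target text."""
    valid_pairs = []
    problems = []
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if len(row) != 2:
            problems.append((line_number, f"expected 2 columns, found {len(row)}"))
            continue
        source, target = (cell.strip() for cell in row)
        if not source or not target:
            problems.append((line_number, "empty source or target segment"))
            continue
        valid_pairs.append((source, target))
    return valid_pairs, problems


# Hypothetical sample: one good pair, one line without a tab, one empty source.
sample = "Open the file.\tAbra el archivo.\nSave changes\n\tGuardar\n"
pairs, issues = check_sentence_pairs(sample)
print(len(pairs), "valid pairs")
for line_number, message in issues:
    print(f"line {line_number}: {message}")
```

To check a real file, read it as UTF-8 text and pass the contents to the function. The console import performs its own validation, so treat this only as an early local check.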

Train an AutoML translation model

To begin training your custom model, click the Train tab just below the title bar, and then click Start Training.

Training a model can take several hours to complete. After the model is successfully trained, you will receive a message at the email address you used to sign up for the program.

When you receive notification that training is complete, open the email message and click the link to go to the Google Cloud console. The Train page shows high-level metrics for the model, most notably its BLEU score. The BLEU (Bilingual Evaluation Understudy) score indicates how similar the candidate text (the model's translation) is to the reference texts (existing translations of the same source), with values closer to one representing more similar texts.
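To build intuition for what a BLEU score measures, the following Python sketch computes a simplified sentence-level BLEU against a single reference. It is an illustration only and makes several assumptions: whitespace tokenization, lowercasing, n-grams up to length 4, and no smoothing. It is not the evaluation code that AutoML Translation runs, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference, in [0, 1]."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precision_sum += math.log(overlap / total)
    # The brevity penalty discourages candidates shorter than the reference.
    brevity_penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "the file was saved successfully"
print(simple_bleu("the file was saved successfully", reference))
print(simple_bleu("the file was saved successfully today", reference))
```

An exact match scores 1, and the score drops as the candidate shares fewer n-grams with the reference. Production evaluations are normally computed over a whole test set rather than one sentence at a time.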

Use the AutoML translation model

Click the Predict tab just below the title bar or the Test and use link below the model information. Enter some text to translate and click Translate. You can compare the results from your custom model with those from the Google NMT model.
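Outside the console, a trained model can also be called programmatically. The following Python sketch assumes that the google-cloud-translate client library is installed, that application default credentials are configured, and that the model is located in us-central1. The project ID my-project and the model ID TRL0000000000 are placeholders, and both helper functions are hypothetical names chosen for this example.

```python
def build_translate_request(project_id, model_id, texts, location="us-central1"):
    """Build a Cloud Translation v3 translate_text request for a custom model."""
    parent = f"projects/{project_id}/locations/{location}"
    return {
        "parent": parent,
        "contents": list(texts),
        "mime_type": "text/plain",
        "source_language_code": "en",
        "target_language_code": "es",
        # Full resource name of the custom model (assumed to be in `location`).
        "model": f"{parent}/models/{model_id}",
    }


def translate_with_custom_model(project_id, model_id, texts):
    """Send the request; needs google-cloud-translate and valid credentials."""
    # Imported here so the request builder works without the client library.
    from google.cloud import translate_v3

    client = translate_v3.TranslationServiceClient()
    request = build_translate_request(project_id, model_id, texts)
    response = client.translate_text(request=request)
    return [translation.translated_text for translation in response.translations]


# Placeholder IDs; replace them with your own project and model.
request = build_translate_request("my-project", "TRL0000000000", ["Click Save."])
print(request["model"])
```

Only build_translate_request runs in the example above. Call translate_with_custom_model with your own IDs once the client library and credentials are in place.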

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, use the Google Cloud console to delete your project if you no longer need it.

What's next