Run a batch translation using the Cloud Translation connector


This tutorial shows you how to create a workflow that uses the Cloud Translation API connector to translate files to other languages in asynchronous batch mode. This provides real-time output as the inputs are being processed.

Objectives

In this tutorial you will:

  1. Create an input Cloud Storage bucket.
  2. Create two files in English and upload them to the input bucket.
  3. Create a workflow that uses the Cloud Translation API connector to translate the two files to French and Spanish and saves the results in an output bucket.
  4. Deploy and execute the workflow to orchestrate the entire process.

Costs

In this document, you use the following billable components of Google Cloud: Cloud Storage, Cloud Translation, and Workflows.

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

Security constraints defined by your organization might prevent you from completing the following steps. For troubleshooting information, see Develop applications in a constrained Google Cloud environment.

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. Install the Google Cloud CLI.
  3. To initialize the gcloud CLI, run the following command:

    gcloud init
  4. Create or select a Google Cloud project.

    • Create a Google Cloud project:

      gcloud projects create PROJECT_ID
    • Select the Google Cloud project that you created:

      gcloud config set project PROJECT_ID
  5. Make sure that billing is enabled for your Google Cloud project.

  6. Enable the Cloud Storage, Translation, and Workflows APIs:

    gcloud services enable storage.googleapis.com translate.googleapis.com workflows.googleapis.com
  7. Update gcloud components:

    gcloud components update
  8. Log in using your account:

    gcloud auth login
  9. Set the default location used in this tutorial:

    gcloud config set workflows/location us-central1
    

    Because this tutorial uses the default AutoML Translation model, which resides in us-central1, you must set the location to us-central1.

    If using an AutoML Translation model or glossary other than the default, ensure that it resides in the same location as the call to the connector; otherwise, an INVALID_ARGUMENT (400) error is returned. For details, see the batchTranslateText method.
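
Because a location mismatch surfaces only as an INVALID_ARGUMENT (400) error when the workflow runs, it can help to check the configured location up front. The following is a minimal sketch rather than part of the tutorial itself: the function name check_workflows_location is hypothetical, and the usage line assumes that gcloud config get-value workflows/location prints the value set in the previous step.

```shell
# Hypothetical helper: compare a location string against the one this
# tutorial requires (us-central1). Returns non-zero on a mismatch.
check_workflows_location() {
  actual="$1"
  expected="us-central1"
  if [ "$actual" = "$expected" ]; then
    echo "OK: workflows/location is ${expected}"
  else
    echo "Mismatch: expected ${expected}, got '${actual:-unset}'" >&2
    return 1
  fi
}

# Usage (requires the gcloud CLI):
#   check_workflows_location "$(gcloud config get-value workflows/location 2>/dev/null)"
```

The comparison is kept separate from the gcloud call so that the check can be reused with any source of the location value.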

Create an input Cloud Storage bucket and files

You can use Cloud Storage to store objects. Objects are immutable pieces of data consisting of a file of any format, and are stored in containers called buckets.

  1. Create a Cloud Storage bucket to hold the files to translate:

    BUCKET_INPUT=${GOOGLE_CLOUD_PROJECT}-input-files
    gsutil mb gs://${BUCKET_INPUT}
  2. Create two files in English and upload them to the input bucket:

    echo "Hello World!" > file1.txt
    gsutil cp file1.txt gs://${BUCKET_INPUT}
    echo "Workflows connectors simplify calling services." > file2.txt
    gsutil cp file2.txt gs://${BUCKET_INPUT}
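
The workflow in the next section derives the input bucket name as the project ID followed by -input-files, so the bucket created here must use exactly that name. The GOOGLE_CLOUD_PROJECT variable might be unset outside of Cloud Shell, in which case the bucket name above would begin with a hyphen. The following minimal sketch guards against that; the function name input_bucket_name is hypothetical and not part of the tutorial.

```shell
# Hypothetical helper: build the input bucket name that the workflow
# expects, that is, the project ID followed by "-input-files".
# Fails if the project ID is empty.
input_bucket_name() {
  if [ -z "$1" ]; then
    echo "project ID is empty; set GOOGLE_CLOUD_PROJECT first" >&2
    return 1
  fi
  echo "$1-input-files"
}

# Usage (requires gsutil):
#   BUCKET_INPUT="$(input_bucket_name "${GOOGLE_CLOUD_PROJECT}")" \
#     && gsutil mb "gs://${BUCKET_INPUT}"
```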

Deploy and execute the workflow

A workflow is made up of a series of steps described using the Workflows syntax, which can be written in either YAML or JSON format. This series of steps is the workflow's definition. After creating a workflow, you deploy it to make it available for execution.

  1. Create a text file with the filename workflow.yaml and with the following content:

    main:
      steps:
      - init:
          assign:
          - projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          - location: ${sys.get_env("GOOGLE_CLOUD_LOCATION")}
          - inputBucketName: ${projectId + "-input-files"}
          - outputBucketName: ${projectId + "-output-files-" + string(int(sys.now()))}
      - createOutputBucket:
            call: googleapis.storage.v1.buckets.insert
            args:
              project: ${projectId}
              body:
                name: ${outputBucketName}
      - batchTranslateText:
          call: googleapis.translate.v3beta1.projects.locations.batchTranslateText
          args:
              parent: ${"projects/" + projectId + "/locations/" + location}
              body:
                  inputConfigs:
                    gcsSource:
                      inputUri: ${"gs://" + inputBucketName + "/*"}
                  outputConfig:
                      gcsDestination:
                        outputUriPrefix: ${"gs://" + outputBucketName + "/"}
                  sourceLanguageCode: "en"
                  targetLanguageCodes: ["es", "fr"]
          result: batchTranslateTextResult

    The workflow assigns variables, creates an output bucket, and initiates the translation of the files, saving the results to the output bucket.

  2. After creating the workflow, deploy it:

    gcloud workflows deploy batch-translation --source=workflow.yaml
  3. Execute the workflow:

    gcloud workflows execute batch-translation
  4. To view the workflow status, you can run the returned command. For example:

    gcloud workflows executions describe eb4a6239-cffa-4672-81d8-d4caef7d8424 \
      --workflow batch-translation \
      --location us-central1

    The execution state should be ACTIVE. After a few minutes, the translated files (in French and Spanish) are uploaded to the output bucket.
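
Rather than rerunning the describe command by hand, the state check can be scripted. The following is a minimal sketch, not part of the tutorial itself: the function names is_terminal_state and wait_for_execution are hypothetical, the 10-second polling interval is arbitrary, and the sketch assumes that an execution stays ACTIVE until it reaches SUCCEEDED, FAILED, or CANCELLED.

```shell
# Hypothetical helper: succeed if the given execution state is final.
is_terminal_state() {
  case "$1" in
    SUCCEEDED|FAILED|CANCELLED) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: poll an execution until it reaches a final state.
# $1 is the execution ID printed by "gcloud workflows execute".
# Returns zero only if the final state is SUCCEEDED.
wait_for_execution() {
  while :; do
    state="$(gcloud workflows executions describe "$1" \
      --workflow batch-translation \
      --location us-central1 \
      --format='value(state)')" || return 1
    echo "state: ${state}"
    if is_terminal_state "$state"; then
      break
    fi
    sleep 10
  done
  [ "$state" = "SUCCEEDED" ]
}

# Usage (requires the gcloud CLI):
#   wait_for_execution eb4a6239-cffa-4672-81d8-d4caef7d8424
```

As an alternative, gcloud workflows run batch-translation executes the workflow and waits for it to finish in a single command.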

List objects in the output bucket

You can confirm that the workflow worked as expected by listing the objects in your output bucket.

  1. Retrieve your output bucket name:

    gsutil ls

    The output is similar to the following:

    gs://PROJECT_ID-input-files/
    gs://PROJECT_ID-output-files-TIMESTAMP/

  2. List the objects in your output bucket:

    gsutil ls -r gs://PROJECT_ID-output-files-TIMESTAMP/**

    After a few minutes, the translated files are listed: two in French and two in Spanish.
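
Because the output bucket name ends in a timestamp generated by the workflow, it has to be looked up before it can be listed. The following minimal sketch picks it out of the gsutil ls output; the function name filter_output_buckets is hypothetical, and the pattern assumes the PROJECT_ID-output-files-TIMESTAMP naming shown above.

```shell
# Hypothetical helper: read bucket URLs on stdin (one per line, as printed
# by "gsutil ls") and print only those that match the output bucket naming
# pattern: "-output-files-" followed by digits and a trailing slash.
filter_output_buckets() {
  grep -e '-output-files-[0-9][0-9]*/$'
}

# Usage (requires gsutil); lists the objects in every matching bucket:
#   gsutil ls | filter_output_buckets | while read -r bucket; do
#     gsutil ls -r "${bucket}**"
#   done
```

If the workflow has been executed more than once, several output buckets match, and the loop lists each of them.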

Clean up

If you created a new project for this tutorial, delete the project. If you used an existing project and wish to keep it without the changes added in this tutorial, delete resources created for the tutorial.

Delete the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the Google Cloud console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete tutorial resources

  1. Remove the gcloud default configuration you added during the tutorial setup:

    gcloud config unset workflows/location
    
  2. Delete the workflow created in this tutorial:

    gcloud workflows delete batch-translation
    
  3. Delete the buckets created in this tutorial:

    gsutil rm -r gs://BUCKET_NAME

    Replace BUCKET_NAME with the name of the bucket to delete, for example, my-bucket. Run the command once for the input bucket and once for the output bucket.

    The response is similar to the following:

    Removing gs://my-bucket/...

What's next