Running the Automated Speech Recognition (ASR) model

This tutorial shows you how to train an Automated Speech Recognition (ASR) model using the publicly available Librispeech ASR corpus dataset with Tensor2Tensor on a Cloud TPU.

The speech recognition model is just one of the models in the Tensor2Tensor (T2T) library. T2T is a library of deep learning models and datasets, along with a set of scripts that download and prepare the data and train the models. This model performs speech-to-text conversion.
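
T2T registers its models, hyperparameter sets, and problems by name. If you want to browse everything that is registered before you start, one option (on any machine where tensor2tensor is installed) is the trainer's registry listing:

t2t-trainer --registry_help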

Objectives

  • Create a Cloud Storage bucket to hold your dataset and model output.
  • Use the Tensor2Tensor library to download and prepare the dataset.
  • Run the training job.
  • Verify the output results.

Costs

This tutorial uses billable components of Google Cloud, including:

  • Compute Engine
  • Cloud TPU
  • Cloud Storage

Use the pricing calculator to generate a cost estimate based on your projected usage. New Google Cloud users might be eligible for a free trial.

Before you begin

Before starting this tutorial, check that your Google Cloud project is correctly set up.

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. In the Cloud Console, on the project selector page, select or create a Cloud project.

  3. Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

  4. This walkthrough uses billable components of Google Cloud. Check the Cloud TPU pricing page to estimate your costs. Be sure to clean up resources you create when you've finished with them to avoid unnecessary charges.

Set up your resources

This section provides information on setting up the Cloud Storage bucket, VM, and Cloud TPU resources for this tutorial.

  1. Open a Cloud Shell window.

  2. Create a variable for your project's ID.

    export PROJECT_ID=project-id
    
  3. Configure the gcloud command-line tool to use the project where you want to create the Cloud TPU.

    gcloud config set project ${PROJECT_ID}
    

    The first time you run this command in a new Cloud Shell VM, an Authorize Cloud Shell page is displayed. Click Authorize at the bottom of the page to allow gcloud to make GCP API calls with your credentials.

  4. Create a Service Account for the Cloud TPU project.

    gcloud beta services identity create --service tpu.googleapis.com --project $PROJECT_ID
    

    The command returns a Cloud TPU Service Account with the following format:

    service-PROJECT_NUMBER@cloud-tpu.iam.gserviceaccount.com
    

  5. Create a Cloud Storage bucket using the following command:

    gsutil mb -p ${PROJECT_ID} -c standard -l europe-west4 -b on gs://bucket-name
    

    This Cloud Storage bucket stores the data you use to train your model and the training results. The ctpu up tool used in this tutorial sets up default permissions for the Cloud TPU Service Account. If you want finer-grained permissions, review the access level permissions.

    The bucket location must be in the same region as your virtual machine (VM) and your TPU node. VMs and TPU nodes are located in specific zones, which are subdivisions within a region.
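
    If you want to confirm where your bucket was created, one option (a quick check, not part of the original setup steps) is to inspect the bucket's metadata with gsutil:

    gsutil ls -L -b gs://bucket-name | grep -i location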

  6. Launch the Compute Engine resources required for this tutorial using the ctpu up command. The --vm-only flag creates only the VM; you create the Cloud TPU later, just before training.

    ctpu up --project=${PROJECT_ID} \
     --zone=europe-west4-a \
     --vm-only \
     --disk-size-gb=300 \
     --machine-type=n1-standard-8 \
     --tf-version=1.15.4 \
     --name=auto-speech-recog-tutorial

    Command flag descriptions

    • project: Your GCP project ID.
    • zone: The zone where you plan to create your Cloud TPU.
    • vm-only: Creates the VM without creating a Cloud TPU. By default, the ctpu up command creates a VM and a Cloud TPU.
    • disk-size-gb: The size in GB of the hard disk of the VM created by the ctpu up command.
    • machine-type: The machine type of the Compute Engine VM to create.
    • tf-version: The version of TensorFlow that ctpu installs on the VM.
    • name: The name of the Cloud TPU to create.

    For more information on the CTPU utility, see CTPU Reference.

  7. When prompted, press y to create your resources.

When the ctpu up command has finished executing, verify that your shell prompt has changed from username@projectname to username@vm-name. This change shows that you are now logged into your Compute Engine VM. If you are not connected to the Compute Engine instance, you can do so by running the following command:

gcloud compute ssh auto-speech-recog-tutorial --zone=europe-west4-a

From this point on, a prefix of (vm)$ means you should run the command on the Compute Engine VM instance.

  1. Create the following environment variables:

    (vm)$ export STORAGE_BUCKET=gs://bucket-name
    (vm)$ export TPU_NAME=auto-speech-recog-tutorial
    (vm)$ export DATA_DIR=$STORAGE_BUCKET/data
    (vm)$ export OUT_DIR=$STORAGE_BUCKET/output
    (vm)$ export TMP_DIR=~/tmp
    

Generate the training and evaluation datasets

T2T conveniently packages data generation for many common open-source datasets in its t2t-datagen script. The script downloads the data, preprocesses it, and prepares it for training.

On your Compute Engine VM:

  1. Use the t2t-datagen script to generate both the full dataset and the smaller clean version, which you will use for evaluation.

    The audio import in t2t-datagen uses sox to generate normalized waveforms. Install it on your Compute Engine VM and then run the t2t-datagen commands that follow.

    (vm)$ sudo apt-get install sox
    (vm)$ t2t-datagen --problem=librispeech --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR
    (vm)$ t2t-datagen --problem=librispeech_clean --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR

The problem librispeech_train_full_test_clean trains on the full dataset but evaluates on the clean dataset.

You can also use librispeech_clean_small, which is a smaller version of the clean dataset, as shown in the sketch below.
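
A sketch of generating the smaller clean dataset (the flags mirror the commands above; only the problem name changes):

(vm)$ t2t-datagen --problem=librispeech_clean_small --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR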

You can view the data on Cloud Storage by going to the Google Cloud Console and choosing Storage from the left-hand menu. Click the name of the bucket that you created for this tutorial.
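
You can also list the generated files directly from your VM, using the environment variables you set earlier:

(vm)$ gsutil ls $DATA_DIR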

Training the model

To train the model on a Cloud TPU, you run the trainer with large batches and truncated sequences. First, make sure you have a Cloud TPU to attach to.
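
Because the setup section created the VM with the --vm-only flag, no Cloud TPU exists yet. One way to create it is to run ctpu up again from Cloud Shell with the --tpu-only flag; the flags below mirror the earlier setup, and --tpu-size=v3-8 matches the timing estimate later in this section (adjust both if your setup differs):

$ ctpu up --project=${PROJECT_ID} \
  --zone=europe-west4-a \
  --tpu-only \
  --tpu-size=v3-8 \
  --tf-version=1.15.4 \
  --name=auto-speech-recog-tutorial

With the TPU running, start the first training pass on the VM: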

(vm)$ t2t-trainer \
  --model=transformer \
  --hparams_set=transformer_librispeech_tpu \
  --problem=librispeech_train_full_test_clean \
  --train_steps=210000 \
  --eval_steps=3 \
  --local_eval_frequency=100 \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --use_tpu \
  --cloud_tpu_name=$TPU_NAME

After this step completes, run the training again for more steps with a smaller batch size and full sequences. This training takes approximately 11 hours on a v3-8 TPU node.

(vm)$ t2t-trainer \
  --model=transformer \
  --hparams_set=transformer_librispeech_tpu \
  --hparams=max_length=295650,max_input_seq_length=3650,max_target_seq_length=650,batch_size=6 \
  --problem=librispeech_train_full_test_clean \
  --train_steps=230000 \
  --eval_steps=3 \
  --local_eval_frequency=100 \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --use_tpu \
  --cloud_tpu_name=$TPU_NAME
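
Both training passes write checkpoints and event summaries to $OUT_DIR, so you can monitor progress with TensorBoard (a minimal sketch; TensorBoard is installed alongside TensorFlow on the VM):

(vm)$ tensorboard --logdir=$OUT_DIR

To view the dashboard in your browser, tunnel the TensorBoard port through SSH or use your preferred port-forwarding setup.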

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

  1. Disconnect from the Compute Engine instance, if you have not already done so:

    (vm)$ exit
    

    Your prompt should now be username@projectname, showing you are in the Cloud Shell.

  2. In your Cloud Shell, run ctpu delete with the --zone flag you used when you set up the Cloud TPU to delete your Compute Engine VM and your Cloud TPU:

    $ ctpu delete --project=${PROJECT_ID} \
      --zone=europe-west4-a \
      --name=auto-speech-recog-tutorial
    
  3. Run ctpu status to make sure you have no instances allocated, to avoid unnecessary charges for TPU usage. The deletion might take several minutes. A response like the one below indicates that there are no more allocated instances:

    2018/04/28 16:16:23 WARNING: Setting zone to "europe-west4-a"
    No instances currently exist.
            Compute Engine VM:     --
            Cloud TPU:             --
    
  4. Run gsutil as shown, replacing bucket-name with the name of the Cloud Storage bucket you created for this tutorial:

    $ gsutil rm -r gs://bucket-name
    

What's next

In this tutorial you have trained the Automated Speech Recognition model using a sample dataset. The results of this training are (in most cases) not usable for inference. To use a model for inference, you can train it on a publicly available dataset or on your own dataset. Models trained on Cloud TPUs require datasets to be in TFRecord format.

You can use the dataset conversion tool sample to convert an image classification dataset into TFRecord format. If you are not using an image classification model, you will have to convert your dataset to TFRecord format yourself. For more information, see TFRecord and tf.Example.

Hyperparameter tuning

To improve the model's performance with your dataset, you can tune the model's hyperparameters. You can find information about hyperparameters common to all TPU-supported models on GitHub. Information about model-specific hyperparameters can be found in the source code for each model. For more information on hyperparameter tuning, see Overview of hyperparameter tuning, Using the Hyperparameter tuning service, and Tune hyperparameters.
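
With t2t-trainer, hyperparameters from the base --hparams_set can be overridden on the command line through the --hparams flag, as the second training command above does. A sketch of a short trial run with overridden values (the batch size and learning rate here are illustrative, not tuned recommendations):

(vm)$ t2t-trainer \
  --model=transformer \
  --hparams_set=transformer_librispeech_tpu \
  --hparams=batch_size=8,learning_rate=0.15 \
  --problem=librispeech_train_full_test_clean \
  --train_steps=1000 \
  --eval_steps=3 \
  --data_dir=$DATA_DIR \
  --output_dir=$STORAGE_BUCKET/output_trial \
  --use_tpu \
  --cloud_tpu_name=$TPU_NAME

Pointing --output_dir at a fresh directory keeps the trial's checkpoints separate from the ones you trained earlier.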

Inference

Once you have trained your model you can use it for inference (also called prediction). AI Platform is a cloud-based solution for developing, training, and deploying machine learning models. Once a model is deployed, you can use the AI Platform Prediction service.