This tutorial shows you how to train the Bidirectional Encoder Representations from Transformers (BERT) model on AI Platform Training.
BERT is a method of pre-training language representations. Pre-training refers to how BERT is first trained on a large source of text, such as Wikipedia. You can then apply the training results to other Natural Language Processing (NLP) tasks, such as question answering and sentiment analysis. With BERT and AI Platform Training, you can train a variety of NLP models in about 30 minutes.
For more information about BERT, see the following resources:
- Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Objectives
- Create a Cloud Storage bucket to hold your model output.
- Run the training job.
- Verify the output results.
Before starting this tutorial, check that your Google Cloud project is correctly set up.
Complete the following steps to set up a GCP account, enable the required APIs, and install and activate the Google Cloud CLI:
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the AI Platform Training & Prediction and Compute Engine APIs.
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:

  gcloud init
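The objectives above include creating a Cloud Storage bucket to hold your model output. A minimal sketch of that step, assuming a hypothetical project ID and bucket name (substitute your own), created in the same us-central1 region used for training later in this tutorial:

```shell
# Hypothetical names -- substitute your own project ID and a globally
# unique bucket name.
PROJECT_ID="your-project-id"
BUCKET_NAME="${PROJECT_ID}-bert-output"
JOB_DIR="gs://${BUCKET_NAME}/bert-job"

# Create the bucket in the region used for training in this tutorial.
gcloud storage buckets create "gs://${BUCKET_NAME}" \
  --project="${PROJECT_ID}" \
  --location=us-central1

echo "Model output will be written under ${JOB_DIR}"
```

Keeping the bucket and the training job in the same region avoids cross-region reads during training.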
Prepare the data
This tutorial doesn't require any data preprocessing or downloads. All of the data and model checkpoints you need are available in public Cloud Storage buckets. If you are interested in that process, see the Cloud TPU tutorial, which covers creating this dataset from the command line.
Submit a training job
To submit a job, you must specify some basic training arguments and some basic arguments related to the BERT algorithm.
General arguments for the training job:
| Argument | Description |
|---|---|
| job-id | Unique ID for your training job. You can use it to find logs for the status of your training job after you submit it. |
| job-dir | Cloud Storage path where AI Platform Training saves training files after completing a successful training job. |
| scale-tier | Specifies machine types for training. Use BASIC to select a configuration of just one machine. |
| master-image-uri | Container Registry URI that specifies which Docker container to use for the training job. Use the container for the built-in BERT algorithm defined earlier as IMAGE_URI. |
| region | The available region in which to run your training job. For this tutorial, you can use the region us-central1. |
Arguments specific to the built-in BERT algorithm training with the provided dataset:
| Argument | Value | Description |
|---|---|---|
| mode | train_and_eval | Whether to run fine-tuning training or to export the model. |
| train_dataset_path | gs://cloud-tpu-checkpoints/bert/classification/mnli_train.tf_record | Cloud Storage path where the training data is stored. |
| eval_dataset_path | gs://cloud-tpu-checkpoints/bert/classification/mnli_eval.tf_record | Cloud Storage path where the evaluation data is stored. |
| input_meta_data_path | gs://cloud-tpu-checkpoints/bert/classification/mnli_meta_data | Cloud Storage path where the input schema is stored. |
| bert_config_file | gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16/bert_config.json | Cloud Storage path where the BERT config file is stored. |
| init_checkpoint | gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16/bert_model.ckpt | Starting checkpoint for fine-tuning (usually a pre-trained BERT model). |
| train_batch_size | 32 | Batch size for training. |
| eval_batch_size | 32 | Batch size for evaluation. |
| learning_rate | 2e-5 | Learning rate used by the Adam optimizer. |
| num_train_epochs | 1 | Number of training epochs to run (only available in train_and_eval mode). |
| steps_per_loop | 1000 | Number of steps per graph-mode loop. |
For a detailed list of all other BERT algorithm flags, refer to the built-in BERT reference.
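The arguments in the preceding tables can also be combined into a single job submission from the command line. A sketch, assuming hypothetical JOB_ID, JOB_DIR, and IMAGE_URI values (substitute your own bucket and the BERT container image URI from your setup); the dataset and checkpoint paths come from the table above:

```shell
# Hypothetical values -- substitute your own bucket and the image URI
# defined earlier in your setup as IMAGE_URI.
JOB_ID="bert_mnli_$(date +%Y%m%d_%H%M%S)"
JOB_DIR="gs://your-bucket/bert-job"
IMAGE_URI="gcr.io/your-project/bert:latest"

# Arguments before "--" go to AI Platform Training; arguments after it
# are passed through to the built-in BERT algorithm.
gcloud ai-platform jobs submit training "${JOB_ID}" \
  --job-dir="${JOB_DIR}" \
  --scale-tier=BASIC_TPU \
  --master-image-uri="${IMAGE_URI}" \
  --region=us-central1 \
  -- \
  --mode=train_and_eval \
  --train_dataset_path=gs://cloud-tpu-checkpoints/bert/classification/mnli_train.tf_record \
  --eval_dataset_path=gs://cloud-tpu-checkpoints/bert/classification/mnli_eval.tf_record \
  --input_meta_data_path=gs://cloud-tpu-checkpoints/bert/classification/mnli_meta_data \
  --bert_config_file=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16/bert_config.json \
  --init_checkpoint=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16/bert_model.ckpt \
  --train_batch_size=32 \
  --eval_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=1 \
  --steps_per_loop=1000
```

Using a timestamped JOB_ID keeps each submission's ID unique, which makes it easy to find the matching logs later.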
Run the training job
- Navigate to the AI Platform > Jobs page.
- At the top of the page, click New training job and select Built-in algorithm training.
- Select BERT as your training algorithm.
- Use the browse button to select the training and evaluation datasets in your Cloud Storage bucket, and choose the output directory.
- On the next page, use the argument values above to configure the training job.
- Give your training job a name and use the BASIC_TPU machine type.
- Click Submit to start your job.
Understand your job directory
After the successful completion of a training job, AI Platform Training creates a trained model in your Cloud Storage bucket, along with some other artifacts. You can find the following directory structure within your JOB_DIR:

- model/ (a TensorFlow SavedModel directory)
  - saved_model.pb
  - assets/
  - variables/
- summaries/ (logging from training and evaluation)
  - eval/
  - train/
- various checkpoint files (created and used during training)
  - checkpoint
  - ctl_checkpoint-1.data-00000-of-00002
  - ...
  - ctl_checkpoint-1.index

Confirm that the directory structure in your JOB_DIR matches the structure described in the preceding list:

gcloud storage ls -a $JOB_DIR/*
Deploy the trained model
AI Platform Prediction organizes your trained models using model and version resources. An AI Platform Prediction model is a container for the versions of your machine learning model.
To deploy a model, you create a model resource in AI Platform Prediction, create a version of that model, then use the model and version to request online predictions.
Learn more about how to deploy models to AI Platform Prediction.
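The console steps below also have a command-line equivalent. A sketch, assuming a hypothetical model name and the JOB_DIR from your training job; the runtime and Python versions are assumptions and should match your training environment:

```shell
# Hypothetical names -- substitute your own model name and job directory.
MODEL_NAME="bert_classifier"
JOB_DIR="gs://your-bucket/bert-job"

# Create the model resource (the container for versions).
gcloud ai-platform models create "${MODEL_NAME}" \
  --regions=us-central1

# Create version v1 from the SavedModel exported under JOB_DIR/model.
gcloud ai-platform versions create v1 \
  --model="${MODEL_NAME}" \
  --origin="${JOB_DIR}/model" \
  --framework=tensorflow \
  --runtime-version=2.3 \
  --python-version=3.7
```

The --origin flag points at the model/ subdirectory of the job output, since that is where the SavedModel described in the previous section is written.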
Console
- On the Jobs page, you can find a list of all your training jobs. Click the name of the training job you just submitted.
- On the Job details page, you can view the general progress of your job, or click View logs for a more detailed view of its progress.
- When the job is successful, the Deploy model button appears at the top. Click Deploy model.
- Select Deploy as new model, and enter a model name. Then click Confirm.
- On the Create version page, enter a version name, such as v1, and leave all other fields at their default settings. Click Save.
- On the Model details page, your version name displays. The version takes a few minutes to create. When the version is ready, a checkmark icon appears by the version name.
- Click the version name (v1) to navigate to the Version details page. In the next step of this tutorial, you send a prediction request.
Get online predictions
When you request predictions, you must format input data as JSON in a manner that the model expects. Current BERT models do not automatically preprocess inputs.
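Besides the console flow below, you can send the same request from the command line. A sketch, assuming a hypothetical model name, the v1 version name used in this tutorial, and a local request.json file containing an {"instances": [...]} body like the sample in the console steps:

```shell
# Hypothetical model name -- substitute your own; v1 matches the version
# created earlier in this tutorial.
MODEL_NAME="bert_classifier"

# request.json holds the JSON request body, e.g. {"instances": [ ... ]}.
gcloud ai-platform predict \
  --model="${MODEL_NAME}" \
  --version=v1 \
  --json-request=request.json
```

Keeping the request body in a file avoids shell-quoting problems with the long input_mask, input_type_ids, and input_word_ids arrays.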
Console
- On the Version details page for v1, the version you just created, you can send a sample prediction request. Select the Test & Use tab.
- Copy the following sample to the input field:
{ "instances": [ { "input_mask": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "input_type_ids":[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "input_word_ids": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] } ] }
- Click Test.

Wait a moment, and a prediction vector should be returned.
What's next
In this tutorial, you trained the BERT model using a sample dataset. In most cases, the results of this training are not usable for inference. To use a model for inference, you can train it on a publicly available dataset or on your own dataset. Models trained on Cloud TPU require datasets to be in TFRecord format.
You can use the dataset conversion tool sample to convert an image classification dataset into TFRecord format. If you are not using an image classification model, you will have to convert your dataset to TFRecord format yourself. For more information, see TFRecord and tf.Example.
- Learn more about using the built-in BERT algorithm.