After creating an AutoML Vision Edge model and exporting it to a Google Cloud Storage bucket, you can use RESTful services with your AutoML Vision Edge models and TF Serving Docker images.
What you will build
Docker containers can help you deploy edge models easily on different devices. You can run edge models by calling REST APIs from containers with any language you prefer, with the added benefit of not having to install dependencies or find proper TensorFlow versions.
In this tutorial, you will have a step-by-step experience of running edge models on devices using Docker containers.
Specifically, this tutorial will walk you through three steps:
- Getting pre-built containers.
- Running containers with Edge models to start REST APIs.
- Making predictions.
Many devices have only CPUs, while some have GPUs that provide faster predictions, so this tutorial covers both pre-built CPU and GPU containers.
Objectives
In this introductory, end-to-end walkthrough you will use code samples to:
- Get the Docker container.
- Start REST APIs using Docker containers with edge models.
- Make predictions to get analyzed results.
Before you begin
To complete this tutorial, you must:
- Train an exportable Edge model. Follow the Edge device model quickstart to train an Edge model.
- Export an AutoML Vision Edge model. This model will be served with containers as REST APIs.
- Install Docker. This is the required software to run Docker containers.
- (Optional) Install the NVIDIA driver and NVIDIA Docker. This optional step applies if you have devices with GPUs and want faster predictions.
- Prepare test images. These images will be sent in requests to get analyzed results.
Details for exporting models and installing necessary software are in the following section.
Export AutoML Vision Edge Model
After training an Edge model, you can export it to different devices. The containers support TensorFlow models, which are named saved_model.pb on export.
To export an AutoML Vision Edge model for containers, select the Container tab in the UI and then export the model to ${YOUR_MODEL_PATH} on Google Cloud Storage. This exported model will be served with containers as REST APIs later.
To download the exported model locally, run the following command:

gsutil cp ${YOUR_MODEL_PATH} ${YOUR_LOCAL_MODEL_PATH}/saved_model.pb

Where:
- ${YOUR_MODEL_PATH} - The model location on Google Cloud Storage (for example, gs://my-bucket-vcm/models/edge/ICN4245971651915048908/2020-01-20_01-27-14-064_tf-saved-model/).
- ${YOUR_LOCAL_MODEL_PATH} - The local path where you want to download your model (for example, /tmp).
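If you prefer to script the download in Python instead of using gsutil, the following sketch uses the google-cloud-storage client library. It assumes the library is installed (pip install google-cloud-storage), that your environment is authenticated for the bucket, and that the exported saved_model.pb sits directly under ${YOUR_MODEL_PATH}; the bucket and object names below are placeholders based on the example path above.

# Optional alternative to gsutil: download saved_model.pb with the
# google-cloud-storage client library. The bucket and object names below are
# placeholders based on the example path above -- replace them with your own.
from google.cloud import storage

BUCKET_NAME = "my-bucket-vcm"
MODEL_BLOB = ("models/edge/ICN4245971651915048908/"
              "2020-01-20_01-27-14-064_tf-saved-model/saved_model.pb")
LOCAL_PATH = "/tmp/saved_model.pb"

client = storage.Client()                # uses your default credentials
blob = client.bucket(BUCKET_NAME).blob(MODEL_BLOB)
blob.download_to_filename(LOCAL_PATH)    # writes the model to LOCAL_PATH
print(f"Downloaded gs://{BUCKET_NAME}/{MODEL_BLOB} to {LOCAL_PATH}")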
Install Docker
Docker is software used for deploying and running applications inside containers.
Install Docker Community Edition (CE) on your system. You will use this to serve Edge models as REST APIs.
Install NVIDIA driver and NVIDIA Docker (optional - for GPU only)
Some devices have GPUs that provide faster predictions. A pre-built GPU Docker container that supports NVIDIA GPUs is provided.
In order to run GPU containers, you must install the NVIDIA driver and NVIDIA Docker on your system.
Running model inference using CPU
This section gives step-by-step instructions for running model inference using CPU containers. You will use Docker to get and run the CPU container, serve the exported Edge model as REST APIs, and then send a request with a test image to the REST APIs to get analyzed results.
Pull the Docker image
First, you will use Docker to get a pre-built CPU container. The pre-built CPU container already has the complete environment for serving exported Edge models, but it does not yet contain any Edge models.
The pre-built CPU container is stored in Google Container Registry. Before requesting the container, set an environment variable for the container's location in Google Container Registry:
export CPU_DOCKER_GCR_PATH=gcr.io/cloud-devrel-public-resources/gcloud-container-1.14.0:latest
After setting the environment variable for the Container Registry path, run the following command line to get the CPU container:
sudo docker pull ${CPU_DOCKER_GCR_PATH}
Run the Docker container
After getting the existing container, you will run this CPU container to serve Edge model inferences with REST APIs.
Before starting the CPU container you must set system variables:
- ${CONTAINER_NAME} - A string indicating the container name when it runs, for example CONTAINER_NAME=automl_high_accuracy_model_cpu.
- ${PORT} - A number indicating the port on your device that will accept REST API calls later, such as PORT=8501.
After setting the variables, run Docker in command line to serve Edge model inferences with REST APIs:
sudo docker run --rm --name ${CONTAINER_NAME} -p ${PORT}:8501 -v ${YOUR_LOCAL_MODEL_PATH}:/tmp/mounted_model/0001 -t ${CPU_DOCKER_GCR_PATH}
After the container is running successfully, the REST APIs are ready for serving at http://localhost:${PORT}/v1/models/default:predict. The following section details how to send prediction requests to this location.
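Before sending a full prediction request, you can first confirm that the model has loaded. The following is a minimal sketch, assuming TensorFlow Serving's standard model status endpoint and Python 3 with only the standard library; replace PORT with the value you chose above.

# Query TensorFlow Serving's model status endpoint to confirm the model loaded.
import json
import urllib.request

PORT = 8501  # the host port you mapped with -p ${PORT}:8501
status_url = f"http://localhost:{PORT}/v1/models/default"

with urllib.request.urlopen(status_url) as response:
    status = json.load(response)

# A successfully loaded model reports state "AVAILABLE".
print(json.dumps(status, indent=2))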
Send a prediction request
Now that the container is running successfully, you can send a prediction request on a test image to the REST APIs.
Command-line
The command line request body contains base64-encoded image_bytes and a string key to identify the given image. See the Base64 encoding topic for more information about image encoding. The format of the request JSON file is as follows:
/tmp/request.json
{ "instances": [ { "image_bytes": { "b64": "/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z" }, "key": "your-chosen-image-key" } ] }
After you have created a local JSON request file, use the following command to send the prediction request:
curl -X POST -d @/tmp/request.json http://localhost:${PORT}/v1/models/default:predict
You should see output similar to the following:
{ "predictions": [ { "labels": ["Good", "Bad"], "scores": [0.665018, 0.334982] } ] }
Python
For more information, see the AutoML Vision Python API reference documentation.
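The official client samples are in the reference documentation linked above. As a rough Python equivalent of the curl call, the following sketch posts the same request body to the locally running container using only the standard library; the port and request file path match the values used earlier in this tutorial.

# Send the prediction request to the locally running container.
import json
import urllib.request

PORT = 8501  # replace with your ${PORT} value
predict_url = f"http://localhost:{PORT}/v1/models/default:predict"

with open("/tmp/request.json", "rb") as request_file:
    request_data = request_file.read()

request = urllib.request.Request(
    predict_url,
    data=request_data,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    predictions = json.load(response)

# Expect labels and scores, as in the curl response shown above.
print(json.dumps(predictions, indent=2))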
Run Model Inference Using GPU Containers (optional)
This section shows how to run model inferences using GPU containers. This process is very similar to running model inference using a CPU. The key differences are the GPU container path and how you start GPU containers.
Pull the Docker image
First, you will use Docker to get a pre-built GPU container. The pre-built GPU container already has the environment for serving exported Edge models with GPUs, but it does not yet contain any Edge models or the NVIDIA drivers.
The pre-built GPU container is stored in Google Container Registry. Before requesting the container, set an environment variable for the container's location in Google Container Registry:
export GPU_DOCKER_GCR_PATH=gcr.io/cloud-devrel-public-resources/gcloud-container-1.14.0-gpu:latest
Run the following command line to get the GPU container:
sudo docker pull ${GPU_DOCKER_GCR_PATH}
Run the Docker container
This step will run the GPU container to serve Edge model inferences with REST APIs. You must install the NVIDIA driver and NVIDIA Docker as mentioned above. You also must set the following system variables:
- ${CONTAINER_NAME} - A string indicating the container name when it runs, for example CONTAINER_NAME=automl_high_accuracy_model_gpu.
- ${PORT} - A number indicating the port on your device that will accept REST API calls later, such as PORT=8502.
After setting the variables, run Docker in command line to serve Edge model inferences with REST APIs:
sudo docker run --runtime=nvidia --rm --name "${CONTAINER_NAME}" \
  -v ${YOUR_LOCAL_MODEL_PATH}:/tmp/mounted_model/0001 \
  -p ${PORT}:8501 \
  -t ${GPU_DOCKER_GCR_PATH}
After the container is running successfully, the REST APIs are ready for serving at http://localhost:${PORT}/v1/models/default:predict. The following section details how to send prediction requests to this location.
Send a prediction request
Now that the container is running successfully, you can send a prediction request on a test image to the REST APIs.
Command-line
The command line request body contains base64-encoded image_bytes and a string key to identify the given image. See the Base64 encoding topic for more information about image encoding. The format of the request JSON file is as follows:
/tmp/request.json
{ "instances": [ { "image_bytes": { "b64": "/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z" }, "key": "your-chosen-image-key" } ] }
After you have created a local JSON request file, use the following command to send the prediction request:
curl -X POST -d @/tmp/request.json http://localhost:${PORT}/v1/models/default:predict
You should see output similar to the following:
{ "predictions": [ { "labels": ["Good", "Bad"], "scores": [0.665018, 0.334982] } ] }
Python
For more information, see the AutoML Vision Python API reference documentation.
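The Python sketch shown in the CPU section above works here unchanged; only the PORT value differs (for example, 8502 if you used the variable settings in this section).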
Summary
In this tutorial, you have walked through running Edge models using CPU or GPU Docker containers. You can now deploy this container-based solution on more devices.
What's next
- Learn more about TensorFlow generally with TensorFlow's Getting Started documentation.
- Learn more about TensorFlow Serving.
- Learn how to use TensorFlow Serving with Kubernetes.