Edge containers tutorial

After creating an AutoML Vision Edge model and exporting it to a Google Cloud Storage bucket, you can use RESTful services with your AutoML Vision Edge models and TF Serving Docker images.

What you will build

Docker containers can help you deploy edge models easily on different devices. You can run edge models by calling REST APIs from containers with any language you prefer, with the added benefit of not having to install dependencies or find proper TensorFlow versions.

This tutorial walks you step by step through running Edge models on devices using Docker containers.

Specifically, this tutorial will walk you through three steps:

  1. Getting pre-built containers.
  2. Running containers with Edge models to start REST APIs.
  3. Making predictions.

Many devices have only CPUs, while some have GPUs that provide faster predictions, so this tutorial covers both pre-built CPU and GPU containers.

Objectives

In this introductory, end-to-end walkthrough you will use code samples to:

  1. Get the Docker container.
  2. Start REST APIs using Docker containers with edge models.
  3. Make predictions to get analyzed results.

Before you begin

To complete this tutorial, you must:

  1. Train an exportable Edge model. Follow the Edge device model quickstart to train an Edge model.
  2. Export an AutoML Vision Edge model. This model will be served with containers as REST APIs.
  3. Install Docker. This is the required software to run Docker containers.
  4. (Optional) Install the NVIDIA driver and NVIDIA Docker. This is needed only if you have devices with GPUs and would like to get faster predictions.
  5. Prepare test images. These images will be sent in requests to get analyzed results.

Details for exporting models and installing necessary software are in the following section.

Export AutoML Vision Edge Model

After training an Edge model, you can export it to different devices.

The containers support TensorFlow models, which are named saved_model.pb on export.

To export an AutoML Vision Edge model for containers, select the Container tab in the UI and then export the model to ${YOUR_MODEL_PATH} on Google Cloud Storage. This exported model will be served with containers as REST APIs later.

Export to container option

To download the exported model locally, run the following command:

gsutil cp ${YOUR_MODEL_PATH} ${YOUR_LOCAL_MODEL_PATH}/saved_model.pb

Where:

  • ${YOUR_MODEL_PATH} - The model location on Google Cloud Storage (for example, gs://my-bucket-vcm/models/edge/ICN4245971651915048908/2020-01-20_01-27-14-064_tf-saved-model/)
  • ${YOUR_LOCAL_MODEL_PATH} - Your local path where you want to download your model (for example, /tmp).
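
If you want to confirm the exported files before downloading, you can optionally list the export location (a quick sketch; this assumes ${YOUR_MODEL_PATH} is set to the export directory shown in the example above):

gsutil ls ${YOUR_MODEL_PATH}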

Install Docker

Docker is software used for deploying and running applications inside containers.

Install Docker Community Edition (CE) on your system. You will use this to serve Edge models as REST APIs.
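
After installation, a quick sanity check (using Docker's standard hello-world test image) confirms that the Docker daemon is running and that you can pull and run containers:

sudo docker --version
sudo docker run --rm hello-world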

Install NVIDIA driver and NVIDIA Docker (optional - for GPU only)

Some devices have GPUs to provide faster predictions. A pre-built GPU Docker container that supports NVIDIA GPUs is provided.

In order to run GPU containers, you must install the NVIDIA driver and NVIDIA Docker on your system.
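
A common way to verify both pieces before continuing (a sketch; the nvidia/cuda image tag below is only an example) is to run nvidia-smi on the host and then inside a GPU-enabled container:

nvidia-smi
sudo docker run --runtime=nvidia --rm nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi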

Running model inference using CPU

This section gives step-by-step instructions to run model inferences using CPU containers. You will use the installed Docker to get and run the CPU container to serve the exported Edge models as REST APIs, and then send requests of a test image to the REST APIs to get analyzed results.

Pull the Docker image

First, you will use Docker to get a pre-built CPU container. The pre-built CPU container contains the complete environment needed to serve exported Edge models, but it does not yet contain any Edge models.

The pre-built CPU container is stored in Google Container Registry. Before requesting the container, set an environment variable for the container's location in Google Container Registry:

export CPU_DOCKER_GCR_PATH=gcr.io/cloud-devrel-public-resources/gcloud-container-1.14.0:latest

After setting the environment variable for the Container Registry path, run the following command to get the CPU container:

sudo docker pull ${CPU_DOCKER_GCR_PATH}
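
You can confirm that the image was pulled successfully by listing your local Docker images:

sudo docker images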

Run the Docker container

After pulling the container, you will run this CPU container to serve Edge model inferences with REST APIs.

Before starting the CPU container, you must set the following system variables:

  • ${CONTAINER_NAME} - A string indicating the container name when it runs, for example CONTAINER_NAME=automl_high_accuracy_model_cpu.
  • ${PORT} - A number indicating the port on your device that will accept REST API calls later, such as PORT=8501.
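
For example, you might set these variables as follows (the values are illustrative):

export CONTAINER_NAME=automl_high_accuracy_model_cpu
export PORT=8501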

After setting the variables, run the following Docker command to serve Edge model inferences with REST APIs:

sudo docker run --rm --name ${CONTAINER_NAME} -p ${PORT}:8501 -v ${YOUR_MODEL_PATH}:/tmp/mounted_model/0001 -t ${CPU_DOCKER_GCR_PATH}

After the container is running successfully, the REST APIs are ready for serving at http://localhost:${PORT}/v1/models/default:predict. The following section details how to send requests for prediction to this location.
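
Optionally, before sending image requests you can confirm that the model has loaded by querying TensorFlow Serving's model status endpoint:

curl http://localhost:${PORT}/v1/models/default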

Send a prediction request

Now that the container is running successfully, you can send a prediction request on a test image to the REST APIs.

Command-line

The command line request body contains base64-encoded image_bytes and a string key to identify the given image. See the Base64 encoding topic for more information about image encoding. The format of the request JSON file is as follows:

/tmp/request.json
{
  "instances":
  [
    {
      "image_bytes":
      {
        "b64": "/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z"
      },
      "key": "your-chosen-image-key"
    }
  ]
}
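
One way to create this file from a local test image is to base64-encode the image with standard command-line tools (a sketch; /tmp/test.jpg and the key are placeholders, and -w 0 disables line wrapping on GNU base64):

echo "{\"instances\": [{\"image_bytes\": {\"b64\": \"$(base64 -w 0 /tmp/test.jpg)\"}, \"key\": \"your-chosen-image-key\"}]}" > /tmp/request.json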

After you have created a local JSON request file you can send your prediction request.

Use the following command to send the prediction request:

curl -X POST -d @/tmp/request.json http://localhost:${PORT}/v1/models/default:predict

Response

You should see output similar to the following:

{
    "predictions": [
        {
            "labels": ["Good", "Bad"],
            "scores": [0.665018, 0.334982]
        }
    ]
}

Python

For more information, see the AutoML Vision Python API reference documentation.

To authenticate to AutoML Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import base64
import cv2
import io
import json

import requests


def preprocess_image(image_file_path, max_width, max_height):
    """Preprocesses input images for AutoML Vision Edge models.

    Args:
        image_file_path: Path to a local image for the prediction request.
        max_width: The max width for preprocessed images. The max width is 640
            (1024) for AutoML Vision Image Classification (Object Detection)
            models.
        max_height: The max height for preprocessed images. The max height is
            480 (1024) for AutoML Vision Image Classification (Object
            Detection) models.
    Returns:
        The preprocessed encoded image bytes.
    """
    # cv2 is used to read, resize and encode images.
    encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 85]
    im = cv2.imread(image_file_path)
    [height, width, _] = im.shape
    if height > max_height or width > max_width:
        ratio = max(height / float(max_width), width / float(max_height))
        new_height = int(height / ratio + 0.5)
        new_width = int(width / ratio + 0.5)
        resized_im = cv2.resize(
            im, (new_width, new_height), interpolation=cv2.INTER_AREA
        )
        _, processed_image = cv2.imencode(".jpg", resized_im, encode_param)
    else:
        _, processed_image = cv2.imencode(".jpg", im, encode_param)
    return base64.b64encode(processed_image).decode("utf-8")


def container_predict(image_file_path, image_key, port_number=8501):
    """Sends a prediction request to TFServing docker container REST API.

    Args:
        image_file_path: Path to a local image for the prediction request.
        image_key: Your chosen string key to identify the given image.
        port_number: The port number on your device to accept REST API calls.
    Returns:
        The response of the prediction request.
    """
    # AutoML Vision Edge models will preprocess the input images.
    # The max width and height for AutoML Vision Image Classification and
    # Object Detection models are 640*480 and 1024*1024, respectively. The
    # example here is for Image Classification models.
    encoded_image = preprocess_image(
        image_file_path=image_file_path, max_width=640, max_height=480
    )

    # The example here only shows prediction with one image. You can extend it
    # to predict with a batch of images, each identified by a different key,
    # so that every response can be matched to its corresponding image.
    instances = {
        "instances": [{"image_bytes": {"b64": str(encoded_image)}, "key": image_key}]
    }

    # This example sends requests to the same machine on which you started the
    # Docker container. To send requests to another machine, replace localhost
    # with that machine's IP address.
    url = "http://localhost:{}/v1/models/default:predict".format(port_number)

    response = requests.post(url, data=json.dumps(instances))
    print(response.json())
    return response.json()
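
When you are finished testing, you can stop the serving container from another terminal; because it was started with --rm, it is removed automatically once stopped:

sudo docker stop ${CONTAINER_NAME}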

Running model inference using GPU containers (optional)

This section shows how to run model inferences using GPU containers. This process is very similar to running model inference using a CPU. The key differences are the GPU container path and how you start GPU containers.

Pull the Docker image

First, you will use Docker to get a pre-built GPU container. The pre-built GPU container contains the environment needed to serve exported Edge models with GPUs, but it does not yet contain any Edge models or the NVIDIA drivers.

The pre-built GPU container is stored in Google Container Registry. Before requesting the container, set an environment variable for the container's location in Google Container Registry:

export GPU_DOCKER_GCR_PATH=gcr.io/cloud-devrel-public-resources/gcloud-container-1.14.0-gpu:latest

Run the following command to get the GPU container:

sudo docker pull ${GPU_DOCKER_GCR_PATH}

Run the Docker container

This step will run the GPU container to serve Edge model inferences with REST APIs. You must have installed the NVIDIA driver and NVIDIA Docker as mentioned above. You must also set the following system variables:

  • ${CONTAINER_NAME} - A string indicating the container name when it runs, for example CONTAINER_NAME=automl_high_accuracy_model_gpu.
  • ${PORT} - A number indicating the port on your device that will accept REST API calls later, such as PORT=8502.
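
For example, you might set these variables as follows (the values are illustrative):

export CONTAINER_NAME=automl_high_accuracy_model_gpu
export PORT=8502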

After setting the variables, run the following Docker command to serve Edge model inferences with REST APIs:

sudo docker run --runtime=nvidia --rm --name "${CONTAINER_NAME}" -v \
${YOUR_MODEL_PATH}:/tmp/mounted_model/0001 -p \
${PORT}:8501 -t ${GPU_DOCKER_GCR_PATH}

After the container is running successfully, the REST APIs are ready for serving at http://localhost:${PORT}/v1/models/default:predict. The following section details how to send requests for prediction to this location.
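
As with the CPU container, you can optionally confirm that the model has loaded by querying TensorFlow Serving's model status endpoint:

curl http://localhost:${PORT}/v1/models/default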

Send a prediction request

Now that the container is running successfully, you can send a prediction request on a test image to the REST APIs.

Command-line

The command line request body contains base64-encoded image_bytes and a string key to identify the given image. See the Base64 encoding topic for more information about image encoding. The format of the request JSON file is as follows:

/tmp/request.json
{
  "instances":
  [
    {
      "image_bytes":
      {
        "b64": "/9j/7QBEUGhvdG9zaG9...base64-encoded-image-content...fXNWzvDEeYxxxzj/Coa6Bax//Z"
      },
      "key": "your-chosen-image-key"
    }
  ]
}

After you have created a local JSON request file you can send your prediction request.

Use the following command to send the prediction request:

curl -X POST -d @/tmp/request.json http://localhost:${PORT}/v1/models/default:predict

Response

You should see output similar to the following:

{
    "predictions": [
        {
            "labels": ["Good", "Bad"],
            "scores": [0.665018, 0.334982]
        }
    ]
}

Python

For more information, see the AutoML Vision Python API reference documentation.

To authenticate to AutoML Vision, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import base64
import cv2
import io
import json

import requests


def preprocess_image(image_file_path, max_width, max_height):
    """Preprocesses input images for AutoML Vision Edge models.

    Args:
        image_file_path: Path to a local image for the prediction request.
        max_width: The max width for preprocessed images. The max width is 640
            (1024) for AutoML Vision Image Classification (Object Detection)
            models.
        max_height: The max height for preprocessed images. The max height is
            480 (1024) for AutoML Vision Image Classification (Object
            Detection) models.
    Returns:
        The preprocessed encoded image bytes.
    """
    # cv2 is used to read, resize and encode images.
    encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 85]
    im = cv2.imread(image_file_path)
    [height, width, _] = im.shape
    if height > max_height or width > max_width:
        ratio = max(height / float(max_width), width / float(max_height))
        new_height = int(height / ratio + 0.5)
        new_width = int(width / ratio + 0.5)
        resized_im = cv2.resize(
            im, (new_width, new_height), interpolation=cv2.INTER_AREA
        )
        _, processed_image = cv2.imencode(".jpg", resized_im, encode_param)
    else:
        _, processed_image = cv2.imencode(".jpg", im, encode_param)
    return base64.b64encode(processed_image).decode("utf-8")


def container_predict(image_file_path, image_key, port_number=8501):
    """Sends a prediction request to TFServing docker container REST API.

    Args:
        image_file_path: Path to a local image for the prediction request.
        image_key: Your chosen string key to identify the given image.
        port_number: The port number on your device to accept REST API calls.
    Returns:
        The response of the prediction request.
    """
    # AutoML Vision Edge models will preprocess the input images.
    # The max width and height for AutoML Vision Image Classification and
    # Object Detection models are 640*480 and 1024*1024, respectively. The
    # example here is for Image Classification models.
    encoded_image = preprocess_image(
        image_file_path=image_file_path, max_width=640, max_height=480
    )

    # The example here only shows prediction with one image. You can extend it
    # to predict with a batch of images, each identified by a different key,
    # so that every response can be matched to its corresponding image.
    instances = {
        "instances": [{"image_bytes": {"b64": str(encoded_image)}, "key": image_key}]
    }

    # This example sends requests to the same machine on which you started the
    # Docker container. To send requests to another machine, replace localhost
    # with that machine's IP address.
    url = "http://localhost:{}/v1/models/default:predict".format(port_number)

    response = requests.post(url, data=json.dumps(instances))
    print(response.json())
    return response.json()

Summary

In this tutorial, you walked through running Edge models using CPU or GPU Docker containers. You can now deploy this container-based solution on more devices.

What's next