Edge TF Lite iOS tutorial

What you will build

In this tutorial you will download an exported custom TensorFlow Lite model from AutoML Vision Edge. You will then run a pre-made iOS app that uses the model to detect multiple objects within an image (with bounding boxes), and provide custom labeling of object categories.

End product mobile screenshot

Objectives

In this introductory, end-to-end walkthrough you will use code to:

  • Run an AutoML Vision Object Detection Edge model in an iOS app using the TF Lite interpreter.

Before you begin

Clone the Git repository

Using the command line, clone the following Git repository:

git clone https://github.com/tensorflow/examples.git

Navigate to the ios directory of the local clone of the repository (examples/lite/examples/object_detection/ios/). You will run all of the following code samples from the ios directory:

cd examples/lite/examples/object_detection/ios

Prerequisites

  • Git installed.
  • iOS versions supported: iOS 12.0 and above.

Set up the iOS app

Generate & open the workspace file

To begin setting up the original iOS app, you must first generate the workspace file with CocoaPods:

  1. Navigate to the ios folder if you haven't done so already:

    cd examples/lite/examples/object_detection/ios
  2. Install the pod to generate the workspace file:

    pod install

    If you have installed this pod before, use the following command:

    pod update
  3. After you have generated the workspace file, you can open the project with Xcode. To open the project via the command line, run the following command from the ios directory:

    open ./ObjectDetection.xcworkspace

Create a unique identifier & build the app

With ObjectDetection.xcworkspace open in Xcode, you must first change the bundle identifier (bundle ID) to a unique value.

  1. Select the top ObjectDetection project item in the project navigator on the left.

    image of ObjectDetection project in side navigation

  2. Make sure you have Targets > ObjectDetection selected.

    image of Targets options selected

  3. In the General > Identity section, change the Bundle Identifier field to a unique value. The preferred style is reverse domain name notation (for example, com.example.ObjectDetection).

    image of Bundle ID example in Identity section

  4. In the General > Signing section below Identity, specify a Team from the drop-down menu. This value is provided by your Developer ID.

    Choose a team drop down menu image

  5. Connect an iOS device to your computer. After the device is detected, select it from the list of devices.

    default device selected

    select your connected device

  6. After making these configuration changes, build the app in Xcode by pressing Command + B.

Run the original app

The sample app is a camera app that continuously detects the objects (bounding boxes and labels) in the frames seen by your device's back camera, using a quantized MobileNet SSD model trained on the COCO dataset.

These instructions walk you through building and running the demo on an iOS device.

The model files are downloaded via scripts in Xcode when you build and run. You don't need to do any steps to download TF Lite models into the project explicitly.

Before inserting your customized model, test the baseline version of the app, which uses the base pretrained MobileNet model.

  1. To launch the app on your device, select the play button in the upper left corner of the Xcode window.

    allow camera access prompt image

  2. After you allow the app access to your camera by selecting the Allow button, the app will then start live detection and annotation. Objects will be detected and marked with a bounding box and label in each camera frame.

  3. Move your device to different objects in your surroundings and verify that the app detects objects correctly.

    End product mobile screenshot

Run the customized app

Modify the app so that it will use your retrained model with custom object image categories.

Add your model files to the project

The demo project is configured to search for two files in the ios/objectDetection/model directory:

  • detect.tflite
  • labelmap.txt

To replace those two files with your custom versions, run the following command:

cp tf_files/optimized_graph.lite ios/objectDetection/model/detect.tflite
cp tf_files/retrained_labels.txt ios/objectDetection/model/labelmap.txt

Run your app

To relaunch the app on your iOS device, select the play button in the upper left corner of the Xcode window.

To test the modifications, move your device's camera to a variety of objects to see live predictions.

Results should look similar to this:

custom app object detection with kitchen items

How does it work?

Now that you have the app running, look at the TensorFlow Lite specific code.

TensorFlowLite Pod

This app uses a precompiled TF Lite CocoaPod. The Podfile includes the CocoaPod in the project:

Podfile

# Uncomment the next line to define a global platform for your project
platform :ios, '12.0'

target 'ObjectDetection' do
  # Comment the next line if you're not using Swift and don't want to use dynamic frameworks
  use_frameworks!

  # Pods for ObjectDetection
  pod 'TensorFlowLiteSwift'

end

The code that interfaces with TF Lite is all contained in the ModelDataHandler.swift file. This class handles all data preprocessing and makes calls to run inference on a given frame by invoking the Interpreter. It then formats the inferences obtained and returns the top N results for a successful inference.
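
In outline, the rest of the app only needs to construct this handler and pass it camera frames. The following condensed sketch shows that calling pattern; the names (MobileNetSSD.modelInfo, runModel(onFrame:)) come from the sample and appear in full context later in this section:

// Condensed calling pattern (sketch); see ViewController.swift later in this section.
let handler = ModelDataHandler(
  modelFileInfo: MobileNetSSD.modelInfo,
  labelsFileInfo: MobileNetSSD.labelsInfo)

// pixelBuffer is a CVPixelBuffer supplied by the camera feed.
let result = handler?.runModel(onFrame: pixelBuffer)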

Exploring the code

ModelDataHandler.swift

The first block of interest (after the necessary imports) is the property declarations. The TF Lite model's inputShape parameters (batchSize, inputChannels, inputWidth, inputHeight) are defined in tflite_metadata.json, which is included when you export the TF Lite model. For more information, see the Exporting Edge models how-to topic.

An example tflite_metadata.json file looks similar to the following:

{
    "inferenceType": "QUANTIZED_UINT8",
    "inputShape": [
        1,    // This represents batch size
        512,  // This represents image width
        512,  // This represents image height
        3     // This represents inputChannels
    ],
    "inputTensor": "normalized_input_image_tensor",
    "maxDetections": 20,  // This represents the max number of boxes.
    "outputTensorRepresentation": [
        "bounding_boxes",
        "class_labels",
        "class_confidences",
        "num_of_boxes"
    ],
    "outputTensors": [
        "TFLite_Detection_PostProcess",
        "TFLite_Detection_PostProcess:1",
        "TFLite_Detection_PostProcess:2",
        "TFLite_Detection_PostProcess:3"
    ]
}
...
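
If you prefer not to hard-code these values, you can read them from the metadata file at runtime. The following is a minimal sketch, assuming the exported tflite_metadata.json is bundled with the app and is plain JSON (without the explanatory comments shown above); the TFLiteMetadata type and loadMetadata() helper are illustrative, not part of the sample:

import Foundation

// Illustrative helper for reading tflite_metadata.json at runtime; not part of the sample app.
struct TFLiteMetadata: Decodable {
  let inferenceType: String
  let inputShape: [Int]   // Batch size, image dimensions, and channels, as described above
  let maxDetections: Int
}

func loadMetadata() -> TFLiteMetadata? {
  guard let url = Bundle.main.url(forResource: "tflite_metadata", withExtension: "json"),
        let data = try? Data(contentsOf: url) else {
    return nil
  }
  return try? JSONDecoder().decode(TFLiteMetadata.self, from: data)
}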

Model parameters:

Replace the values below with the values from the tflite_metadata.json file for your model.

let batchSize = 1     // Number of images per prediction; the model takes 1 image at a time
let inputChannels = 3 // The pixels of the input image, represented as RGB values
let inputWidth = 300  // Width of the image
let inputHeight = 300 // Height of the image
...

init

The init method creates the Interpreter with the model path and InterpreterOptions, and then allocates memory for the model's input:

init?(modelFileInfo: FileInfo, labelsFileInfo: FileInfo, threadCount: Int = 1) {
    let modelFilename = modelFileInfo.name

    // Construct the path to the model file.
    guard let modelPath = Bundle.main.path(
      forResource: modelFilename,
      ofType: modelFileInfo.extension
    ) else {
      print("Failed to load the model file with name: \(modelFilename).")
      return nil
    }

    // Specify the options for the `Interpreter`.
    var options = InterpreterOptions()
    options.threadCount = threadCount
    do {
      // Create the `Interpreter`.
      interpreter = try Interpreter(modelPath: modelPath, options: options)
      // Allocate memory for the model's input `Tensor`s.
      try interpreter.allocateTensors()
    } catch let error {
      print("Failed to create the interpreter with error: \(error.localizedDescription)")
      return nil
    }

    super.init()

    // Load the classes listed in the labels file.
    loadLabels(fileInfo: labelsFileInfo)
  }
…

runModel

The runModel method below:

  1. Scales the input image to the aspect ratio for which the model is trained.
  2. Removes the alpha component from the image buffer to get the RGB data.
  3. Copies the RGB data to the input Tensor.
  4. Runs inference by invoking the Interpreter.
  5. Gets the output from the Interpreter.
  6. Formats the output.

func runModel(onFrame pixelBuffer: CVPixelBuffer) -> Result? {

Crops the image to the biggest square in the center and scales it down to model dimensions:

let scaledSize = CGSize(width: inputWidth, height: inputHeight)
guard let scaledPixelBuffer = pixelBuffer.resized(to: scaledSize) else {
    return nil
}
...

do {
  let inputTensor = try interpreter.input(at: 0)

Remove the alpha component from the image buffer to get the RGB data:

guard let rgbData = rgbDataFromBuffer(
  scaledPixelBuffer,
  byteCount: batchSize * inputWidth * inputHeight * inputChannels,
  isModelQuantized: inputTensor.dataType == .uInt8
) else {
  print("Failed to convert the image buffer to RGB data.")
  return nil
}

Copy the RGB data to the input Tensor:

try interpreter.copy(rgbData, toInputAt: 0)

Run inference by invoking the Interpreter:

    let startDate = Date()
    try interpreter.invoke()
    interval = Date().timeIntervalSince(startDate) * 1000
    outputBoundingBox = try interpreter.output(at: 0)
    outputClasses = try interpreter.output(at: 1)
    outputScores = try interpreter.output(at: 2)
    outputCount = try interpreter.output(at: 3)
  }

Formats the results:

    let resultArray = formatResults(
      boundingBox: [Float](unsafeData: outputBoundingBox.data) ?? [],
      outputClasses: [Float](unsafeData: outputClasses.data) ?? [],
      outputScores: [Float](unsafeData: outputScores.data) ?? [],
      outputCount: Int(([Float](unsafeData: outputCount.data) ?? [0])[0]),
      width: CGFloat(imageWidth),
      height: CGFloat(imageHeight)
    )
...
}

Filters out all the results with confidence score < threshold and returns the top N results sorted in descending order:

  func formatResults(boundingBox: [Float], outputClasses: [Float],
                     outputScores: [Float], outputCount: Int,
                     width: CGFloat, height: CGFloat) -> [Inference] {
    var resultsArray: [Inference] = []
    // Return early if there are no detections so the closed range below is valid.
    if outputCount == 0 {
      return resultsArray
    }
    for i in 0...outputCount - 1 {

      let score = outputScores[i]

Filters results with confidence < threshold:

      guard score >= threshold else {
        continue
      }

Gets the output class names for detected classes from labels list:

      let outputClassIndex = Int(outputClasses[i])
      let outputClass = labels[outputClassIndex + 1]

      var rect: CGRect = CGRect.zero

Translates the detected bounding box to CGRect.

      rect.origin.y = CGFloat(boundingBox[4*i])
      rect.origin.x = CGFloat(boundingBox[4*i+1])
      rect.size.height = CGFloat(boundingBox[4*i+2]) - rect.origin.y
      rect.size.width = CGFloat(boundingBox[4*i+3]) - rect.origin.x

The detected corners are in model dimensions, so the rect is scaled with respect to the actual image dimensions:

let newRect = rect.applying(CGAffineTransform(scaleX: width, y: height))

Gets the color assigned for the class:

let colorToAssign = colorForClass(withIndex: outputClassIndex + 1)
      let inference = Inference(confidence: score,
                                className: outputClass,
                                rect: newRect,
                                displayColor: colorToAssign)
      resultsArray.append(inference)
    }

    // Sort results in descending order of confidence.
    resultsArray.sort { (first, second) -> Bool in
      return first.confidence  > second.confidence
    }

    return resultsArray
  }
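
The Inference value built in formatResults is a simple container for one detection. In the sample it looks roughly like the following sketch (check ModelDataHandler.swift for the exact definition):

import UIKit

// Roughly what each detection carries (sketch of the sample's Inference type).
struct Inference {
  let confidence: Float      // Detection score from the model
  let className: String      // Label looked up from labelmap.txt
  let rect: CGRect           // Bounding box scaled to the image dimensions
  let displayColor: UIColor  // Color used to draw the box and label
}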

CameraFeedManager

CameraFeedManager.swift manages all camera-related functionality.

  1. It initializes and configures the AVCaptureSession:

    private func configureSession() {
    session.beginConfiguration()
  2. It then tries to add an AVCaptureDeviceInput, adding the builtInWideAngleCamera as the device input for the session (a condensed sketch of this configuration flow follows this list).

    addVideoDeviceInput()

    It then tries to add an AVCaptureVideoDataOutput:

    addVideoDataOutput()

    session.commitConfiguration()
    self.cameraConfiguration = .success
    }
  3. Starts the session.

  4. Stops the session.

  5. Adds and removes the AVCaptureSession notifications, and handles session errors.
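
Taken together, the configuration flow looks roughly like the following sketch. It assumes the sample's session and cameraConfiguration properties, inlines the helper methods with standard AVFoundation calls, and omits the sample's own error handling:

import AVFoundation

// Condensed sketch of the session configuration; see CameraFeedManager.swift for the full version.
private func configureSession() {
  session.beginConfiguration()

  // Add the back wide-angle camera as the video input.
  if let camera = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .back),
     let input = try? AVCaptureDeviceInput(device: camera),
     session.canAddInput(input) {
    session.addInput(input)
  }

  // Add a video data output so that each frame is delivered to the delegate.
  let videoDataOutput = AVCaptureVideoDataOutput()
  videoDataOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "videoOutputQueue"))
  if session.canAddOutput(videoDataOutput) {
    session.addOutput(videoDataOutput)
  }

  session.commitConfiguration()
  self.cameraConfiguration = .success
}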

The InferenceViewController.swift class is responsible for the screen below; our main focus is the highlighted part.

  • Resolution: It displays the resolution of the current frame (the image from the video session).
  • Crop: It displays the crop size of the current frame.
  • InferenceTime: It displays how much time the model takes to detect the object.
  • Threads: It displays the number of threads currently used by the TensorFlow Lite Interpreter. You can increase or decrease this count by tapping the + or - sign on the stepper.

adjust threads image

The ViewController.swift class holds instances of CameraFeedManager, which manages the camera-related functionality, and ModelDataHandler, which handles the trained model and gets the output for each image frame of the video session.

private lazy var cameraFeedManager = CameraFeedManager(previewView: previewView)

private var modelDataHandler: ModelDataHandler? =
    ModelDataHandler(modelFileInfo: MobileNetSSD.modelInfo, labelsFileInfo: MobileNetSSD.labelsInfo)

It starts the camera session by calling:

cameraFeedManager.checkCameraConfigurationAndStartSession()

When you change the thread count, this class re-initializes the model with the new thread count in the didChangeThreadCount function.
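
A sketch of that re-initialization, based on the delegate callback in the sample (the exact signature in your copy of the sample may differ):

// Re-create the ModelDataHandler whenever the user changes the thread count with the stepper.
func didChangeThreadCount(to count: Int) {
  if modelDataHandler?.threadCount == count { return }
  modelDataHandler = ModelDataHandler(
    modelFileInfo: MobileNetSSD.modelInfo,
    labelsFileInfo: MobileNetSSD.labelsInfo,
    threadCount: count)
}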

The CameraFeedManager class sends the image frame as a CVPixelBuffer to ViewController, which passes it to the model for prediction.
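
That hand-off happens in the video output delegate. A minimal sketch, assuming the sample's didOutput(pixelBuffer:) delegate method:

import AVFoundation

// Convert each CMSampleBuffer from the camera into a CVPixelBuffer and forward it
// to the ViewController through the delegate.
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
  guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
  delegate?.didOutput(pixelBuffer: pixelBuffer)
}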

This method runs the live camera pixelBuffer through TensorFlow to get the result.

@objc
func runModel(onPixelBuffer pixelBuffer: CVPixelBuffer) {

Run the live camera pixelBuffer through TensorFlow to get the result:

result = self.modelDataHandler?.runModel(onFrame: pixelBuffer)
...
let displayResult = result

let width = CVPixelBufferGetWidth(pixelBuffer)
let height = CVPixelBufferGetHeight(pixelBuffer)

    DispatchQueue.main.async {

Display results by handing off to the InferenceViewController:

self.inferenceViewController?.resolution = CGSize(width: width, height: height)

self.inferenceViewController?.inferenceTime = inferenceTime

Draws the bounding boxes and displays class names and confidence scores:

self.drawAfterPerformingCalculations(onInferences: displayResult.inferences, withImageSize: CGSize(width: CGFloat(width), height: CGFloat(height)))
    }
  }

What Next

You've now completed a walkthrough of an iOS object detection and annotation app using an Edge model. You used a trained Edge TensorFlow Lite model to test an object detection app before modifying it and getting sample annotations. You then examined TensorFlow Lite specific code to understand the underlying functionality.

The following resources can help you continue to learn about TensorFlow models and AutoML Vision Edge: