
AI in Motion: designing a simple system to see, understand, and react in the real world (Part III)

October 22, 2018

Noah Negrey

Cloud Developer Programs Engineer

Yu-Han Liu

Developer Programs Engineer, Google Cloud AI

In the first two posts of this blog series, we covered the design and architecture of our AI in Motion demo project, walked step by step through how we generated data, and explained how we selected and trained a model. In this final post, we’ll show how we deployed the two models to devices, and how we achieved near real-time predictions that led to entertaining and educational gameplay.

In the last edition, we selected and trained our two TensorFlow machine learning models. Once we had models that we were confident could deliver the “sight”, “understanding”, and “reaction” to real-world objects and human players, we were ready to deploy the models for inference directly on the Android devices.

Converting models to mobile formats

TensorFlow Lite comes with tools that convert a normal TensorFlow model to the TensorFlow Lite format. TensorFlow 1.9 also comes with the object detection post-processing operation. The instructions for converting a TensorFlow model to the TensorFlow Lite format with the command line tool can be found here, or feel free to follow the steps in this post.
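As a rough illustration of the command line route, the sketch below only assembles the arguments for a `tflite_convert` call and prints them; it does not run the converter. The file names, the 300x300 input size, and the float inference type are assumptions for illustration, and the tensor names are the ones conventionally written by the Object Detection API's TensorFlow Lite export step, so check them against your own exported graph.

```python
import shlex


def build_tflite_convert_command(graph_def_file, output_file, input_size=300):
    """Return the argument list for a tflite_convert call (nothing is executed)."""
    # Assumed tensor names: verify them against the graph you actually exported.
    output_arrays = [
        "TFLite_Detection_PostProcess",    # bounding boxes
        "TFLite_Detection_PostProcess:1",  # class indices
        "TFLite_Detection_PostProcess:2",  # confidence scores
        "TFLite_Detection_PostProcess:3",  # number of detections
    ]
    return [
        "tflite_convert",
        "--graph_def_file=" + graph_def_file,
        "--output_file=" + output_file,
        "--input_arrays=normalized_input_image_tensor",
        "--input_shapes=1,{0},{0},3".format(input_size),
        "--output_arrays=" + ",".join(output_arrays),
        "--inference_type=FLOAT",
        # The detection post-processing operation is a custom op in TensorFlow Lite.
        "--allow_custom_ops",
    ]


# Hypothetical file names, used only to show the resulting command line.
command = build_tflite_convert_command("tflite_graph.pb", "detect.tflite")
print(" ".join(shlex.quote(part) for part in command))
```

Keeping the command in a small function like this makes it easy to reuse the same flags for several exported graphs, but pasting the printed command into a shell works just as well.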

Prediction during gameplay

For this project, we spent the lion's share of our development effort on data gathering, data synthesis, data preparation, model selection, and training steps. Inference on devices was a relatively straightforward final stage, as a consequence of decisions we had made during the initial stages of the project.

Object detection

As discussed earlier, during gameplay we used the Android device’s camera to capture an image of the arena, then passed that image to our TensorFlow Lite model on the device for inference. The model identified the robots and obstacles in the image and, for each object, returned its location along with a confidence score: how confident the model was that the object was in that spot. As expected, inference on the device took between 60 and 80 milliseconds, which was fast enough to give us actionable information on the locations of the various robots during gameplay.
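To make the "location plus confidence score" output concrete, here is a minimal sketch of the kind of filtering an app might apply to raw detections. It is not the code from our apps: the label list, the 0.5 threshold, and the function name are hypothetical, and it assumes boxes arrive as `[ymin, xmin, ymax, xmax]` normalized to the range 0 to 1.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    score: float
    left: int
    top: int
    right: int
    bottom: int


def filter_detections(boxes, class_ids, scores, labels,
                      image_width, image_height, min_score=0.5):
    """Keep detections scoring at least min_score and convert boxes to pixels."""
    results = []
    for box, class_id, score in zip(boxes, class_ids, scores):
        if score < min_score:
            continue  # Too uncertain to act on.
        ymin, xmin, ymax, xmax = box
        results.append(Detection(
            label=labels[int(class_id)],
            score=score,
            left=round(xmin * image_width),
            top=round(ymin * image_height),
            right=round(xmax * image_width),
            bottom=round(ymax * image_height),
        ))
    return results


# Hypothetical model output for a 640x480 camera frame.
detections = filter_detections(
    boxes=[[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]],
    class_ids=[0, 1],
    scores=[0.9, 0.3],
    labels=["robot", "obstacle"],
    image_width=640,
    image_height=480,
)
print(detections)
```

The threshold is a trade-off: a higher value drops more false positives but can make a robot briefly "disappear" when the model is unsure.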

We built the individual gameplay Android apps on top of the sample apps provided by the TensorFlow team. To show how we leveraged the sample apps, let’s look at our Human Freeze Tag app. The main Object Detection blocks can be seen in these lines of code, which show how we transformed an Android Bitmap (the image taken from the camera) into something usable by the TensorFlow Lite model. We found it quite simple to use TensorFlow with Android.
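The Bitmap conversion mentioned above boils down to unpacking each pixel into its color channels and scaling them to the range the model expects. The following Python sketch illustrates the idea rather than reproducing the Android code: it assumes pixels packed as `0xAARRGGBB` integers, and the mean and standard deviation of 128 are assumed normalization constants for a float model that may not match how your model was trained.

```python
def argb_pixels_to_rgb_floats(pixels, mean=128.0, std=128.0):
    """Unpack 0xAARRGGBB integers into a flat [R, G, B, R, G, B, ...] list.

    Each channel is shifted and scaled to roughly the range -1..1.
    The alpha channel is ignored.
    """
    values = []
    for pixel in pixels:
        red = (pixel >> 16) & 0xFF
        green = (pixel >> 8) & 0xFF
        blue = pixel & 0xFF
        values.extend((
            (red - mean) / std,
            (green - mean) / std,
            (blue - mean) / std,
        ))
    return values


# Two example pixels: mid-gray red channel with full blue, then opaque white.
print(argb_pixels_to_rgb_floats([0xFF8000FF, 0xFFFFFFFF]))
```

A quantized model would typically skip the scaling step and take the raw 0 to 255 channel values as bytes instead.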

Commander model

During gameplay, the Commander Model received information from the Object Detection Model and delivered directions to the physical Sphero robotic balls via the Bluetooth connection provided by the Sphero APIs. The TensorFlow Lite Commander Model on the Pixel 2 Android phone was only 2.5 MB and, thanks in part to its small size, it delivered inference results quickly enough to play competitive and compelling games against human players.
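To illustrate the last step of that pipeline, the sketch below turns a direction into a heading and speed that could be sent to a robot. Everything here is an assumption for illustration: it supposes the model's output has been reduced to a 2D direction vector, that 0 degrees means "forward" with angles growing clockwise, and that speed ranges from 0 to 255. It does not use the Sphero APIs.

```python
import math


def direction_to_roll(dx, dy, max_speed=255):
    """Convert a 2D direction vector into (heading_degrees, speed).

    The heading is measured clockwise from the +y axis, in the range 0..359.
    The speed scales with the vector length and is capped at max_speed.
    """
    magnitude = min(1.0, math.hypot(dx, dy))
    heading = math.degrees(math.atan2(dx, dy)) % 360
    return int(round(heading)) % 360, int(round(magnitude * max_speed))


# Hypothetical directions: straight ahead, hard right, and standing still.
for vector in [(0.0, 1.0), (1.0, 0.0), (0.0, 0.0)]:
    print(vector, direction_to_roll(*vector))
```

Capping the vector length keeps an over-confident model output from asking the robot for more than full speed.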

Tools used in gameplay

We used Firebase’s Realtime Database to handle all communication between the Android devices and the leaderboard that holds each player’s score. One device was mounted above each arena to run the Object Detection Model and the Commander Model that controlled the AI Sphero robots, so the human players needed separate Android devices to control their robots. As a consequence, we needed a fast, reliable platform to run the backend of our games: deciding when a game should start, running timers, recording game events (tagged, frozen, and zombie state changes), and keeping score. It also needed to keep all the devices synced and up to date so they could share that information with the players.
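A realtime database of this kind stores data as a JSON-like tree, so one way to picture the game backend is as a small tree that every device reads and updates. The sketch below is a plain Python stand-in, not our schema: the field names, the event types, and the rule that a tag is worth one point are all hypothetical.

```python
import copy
import json


def new_game(player_ids, duration_seconds=60):
    """Build the initial game tree that all devices would share."""
    return {
        "status": "waiting",
        "secondsRemaining": duration_seconds,
        "players": {pid: {"state": "active", "score": 0} for pid in player_ids},
    }


def apply_event(game, event):
    """Return a new game tree with one event applied (the input is not mutated)."""
    updated = copy.deepcopy(game)
    kind = event["type"]
    if kind == "start":
        updated["status"] = "running"
    elif kind == "tagged":
        updated["players"][event["target"]]["state"] = "frozen"
        updated["players"][event["by"]]["score"] += 1
    elif kind == "unfrozen":
        updated["players"][event["target"]]["state"] = "active"
    else:
        raise ValueError("unknown event type: " + kind)
    return updated


game = new_game(["red", "blue", "green"])
game = apply_event(game, {"type": "start"})
game = apply_event(game, {"type": "tagged", "by": "red", "target": "blue"})
print(json.dumps(game, indent=2, sort_keys=True))
```

Treating each event as a small, self-contained update is what lets several phones and a leaderboard stay consistent while reading the same tree.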

We also added a fun feature for games that had three human players. If only one player showed up to play, we didn’t want to leave the other two robots sitting idle. So we used the phone above the arena to also run the Commander Model to “act” as a human player and either try to run away, unfreeze, or tag other robot balls. We achieved this interactivity by (1) sending commands from the phone above the arena through Firebase’s Realtime Database, and (2) configuring the Android phones that were connected to the Sphero devices via Bluetooth to relay those same commands to the robotic balls.
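The two-step relay described above can be sketched with an in-memory stand-in for the database. This is only an illustration of the pattern: `InMemoryDatabase`, `FakeRobot`, the `commands/<robot id>` path, and the command fields are hypothetical names, and none of this is the Firebase or Sphero API.

```python
class InMemoryDatabase:
    """Tiny stand-in for a realtime database: listeners fire on every write."""

    def __init__(self):
        self._values = {}
        self._listeners = {}

    def add_listener(self, path, callback):
        self._listeners.setdefault(path, []).append(callback)

    def set_value(self, path, value):
        self._values[path] = value
        for callback in self._listeners.get(path, []):
            callback(value)


class FakeRobot:
    """Records the commands it receives instead of talking over Bluetooth."""

    def __init__(self):
        self.received = []

    def roll(self, heading, speed):
        self.received.append((heading, speed))


def attach_relay(database, robot_id, robot):
    """Forward every command written for robot_id to the connected robot."""
    def on_command(command):
        robot.roll(command["heading"], command["speed"])

    database.add_listener("commands/" + robot_id, on_command)


# Step 2: the player's phone listens for commands for its robot.
database = InMemoryDatabase()
robot = FakeRobot()
attach_relay(database, "robot2", robot)

# Step 1: the phone above the arena writes a command for the idle robot.
database.set_value("commands/robot2", {"heading": 90, "speed": 120})
print(robot.received)
```

Because the phone holding the Bluetooth connection only forwards what it reads, the same robot can be driven by a human or by the model without changing the relay.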

Conclusion

In the AI in Motion project, we set out to build a system that demonstrated how computers can see, understand, and react to the world around us. We used Google Cloud Platform for complex training and deployed predictions in the real world to control physical robotic balls. Across the project, we used many existing tools, models, and insights from the TensorFlow community. Our approach reflected the related and sequential processes of gathering and preparing data, choosing a model, training the model, evaluating and tuning the model, and finally deploying the model for predictions. We hope you’ll find the resources and inspiration to build your own interactive, physical, and robotic games!
