AI in motion: designing a simple system to see, understand, and react in the real world (Part I)

Google Next '18 attendees play a game, controlling spherical robots against an AI-controlled one

Companies want to use AI to build smart systems that can react to real-world events. However, developers who are new to the field are often intimidated by both the pace of change and the perceived complexity of building an AI system. At Google Cloud Next ’18, we set out to build a simple system that demonstrates how computers can see, understand, and react to the world around us. The system we designed plays a series of robotic games with, and against, humans, and shows how off-the-shelf components and the cloud can be used together to train and deploy complex models in the real world.

In this three-part blog series, we will walk through how we designed the system, how we selected, trained and tuned the models, and finally how we deployed the models to do near real-time predictions.


AI in motion

There are many opportunities for machine learning to influence or control physical systems. For example, a warehouse or retail inventory management system could use object detection to identify when supply is low or in the wrong location. The system could then decide on the best corrective action, such as sending an alert or automatically placing an order to resupply the warehouse. Similarly, a quality control system on an assembly line could use an object detection model to look for flaws in the product and then isolate the item for further inspection. You might even take it one step further and use machine learning insights and robotics to correct the problem.
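
As a rough illustration of the inventory example above, the sketch below turns a list of detections into corrective actions. It is only a sketch under stated assumptions: the `Detection` schema, the `corrective_actions` function, and the action names are hypothetical and not part of our demo, and a real system would get its detections from an object detection model rather than a hand-written list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: a product label and the shelf it was seen on (hypothetical schema)."""
    label: str
    shelf: str


def corrective_actions(detections, expected_shelf, min_count):
    """Return a sorted list of (action, label) pairs for misplaced or low-stock products.

    expected_shelf maps each tracked product label to the shelf it belongs on.
    min_count is the assumed minimum number of correctly shelved items per product.
    """
    actions = set()
    counts = {label: 0 for label in expected_shelf}
    for det in detections:
        if det.label not in expected_shelf:
            continue  # ignore objects we do not track
        if det.shelf == expected_shelf[det.label]:
            counts[det.label] += 1
        else:
            actions.add(("alert_misplaced", det.label))
    for label, count in counts.items():
        if count < min_count:
            actions.add(("reorder", label))
    return sorted(actions)


seen = [Detection("soap", "A"), Detection("soap", "B"), Detection("tea", "B")]
plan = corrective_actions(seen, {"soap": "A", "tea": "B"}, min_count=2)
```

Here one bar of soap sits on the wrong shelf and both products are below the assumed minimum, so the plan contains one alert and two reorders.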

Along these lines, we wanted to show how a system could leverage machine learning to see the world, understand complex and rapidly changing events, and then react in real time to what it sees and understands.

Design goals

System design considerations included:

  1. Cloud and the physical world. As more and more machines and devices become ‘thinking machines’, we wanted to demonstrate the ‘motion’ that AI can drive in the form of a physical device—in this case a robotic ball. To do that, we aimed to use the cloud for complex, intensive training, deploying predictions in the real world to control physical devices.

  2. Leverage. Our aim was to use existing tools, models, and apps whenever possible. We wanted to show that great things can be built using widely available assets, so we used existing blogs, APIs, and scripts published by the TensorFlow community to help us move rapidly.

  3. Automation. Machine learning projects can be labor intensive. We wanted to show how close approximations of real-world data can be produced through automated data generation and simulation, lowering costs and improving time-to-market.

  4. Fun. Finally, our demo needed to interact with people and the experience needed to be as fun as it was educational.

Architecture decisions

In order to accelerate training and to demonstrate the benefits of the cloud, we decided to do all of our training in the cloud. This decision meant that we could do rapid iterations on our models at a low cost and we could leverage the latest hardware developments such as GPUs and TPUs.

A second important consideration was the need to do on-device prediction. On-device prediction mimics many real-world scenarios that require low-latency decisions. Local prediction also demonstrates that AI can be useful on mobile or IoT devices even when an internet connection is not available. We chose to use an Android device (a Pixel 2), but our approach applies to many IoT devices with the requisite compute power.

In order to accomplish the local predictions and near real-time gameplay, we realized we would need at least two models: one to “see” what was happening and another to “predict and react”. We dubbed the “seeing” model the Object Detection Model and the “react” model the Commander Model. Both models heavily leveraged and were informed by existing TensorFlow assets and the TensorFlow community via blogs, scripts, and tools.
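
To make the two-model split concrete, here is a minimal sketch of one “see, then react” step. Everything in it is an assumption for illustration: the `Detections` format, the `step` function, and the hand-written `chase_nearest` rule are hypothetical stand-ins, not the trained Object Detection Model or Commander Model.

```python
import math
from typing import Callable, Dict, Tuple

Position = Tuple[float, float]
Detections = Dict[str, Position]  # object name -> (x, y) in arena coordinates (assumed format)


def chase_nearest(detections: Detections, me: str = "ai_bot") -> float:
    """Stand-in 'commander': heading in degrees toward the nearest other object.

    Assumes at least one object besides `me` was detected.
    """
    mx, my = detections[me]
    others = [pos for name, pos in detections.items() if name != me]
    tx, ty = min(others, key=lambda p: math.hypot(p[0] - mx, p[1] - my))
    return math.degrees(math.atan2(ty - my, tx - mx)) % 360.0


def step(frame: object,
         detect: Callable[[object], Detections],
         command: Callable[[Detections], float]) -> float:
    """One loop iteration: the 'seeing' model runs first, then the 'react' model."""
    return command(detect(frame))


def fake_detector(frame: object) -> Detections:
    """Hypothetical detector that ignores the frame and returns fixed positions."""
    return {"ai_bot": (0.0, 0.0), "human_1": (0.0, 2.0), "human_2": (5.0, 5.0)}


heading = step(frame=None, detect=fake_detector, command=chase_nearest)
```

Keeping the two stages behind separate functions means either one can be swapped out, for example replacing the rule with a trained model, without touching the other.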

AI games

The robotic games we designed were all variants of tag. They required the deployed models to see objects and obstacles and then make near real-time decisions about the optimal direction and path based on what they saw. We experimented with many alternatives but settled on four distinct AI-human games.

  1. Bot Freeze Tag. In this game, one AI-controlled bot played three human-controlled robots. The AI bot’s objective was to ‘tag’, or run into, all of the human-controlled robots before time ran out. When a human-controlled robot was tagged, it became frozen until a human teammate unfroze it by touching the robot with his or her own robot. Players earned points based on how long a human player avoided getting frozen, and for unfreezing teammates.

  2. Human Freeze Tag. We turned things around in this game, with three AI bots playing just one human player. The human player’s objective was to tag all of the AI-controlled robots before time ran out. When the player tagged an AI robot, it became frozen until one of its AI teammates unfroze it by touching it. Points were earned based on how quickly the player froze all of the AI robots.

  3. Zombie Apocalypse. At the beginning of the game, one AI bot competed against three human players. The AI bot’s objective was to tag the untagged human-controlled robots. At first, all three human players had to avoid the AI bot. However, when a human’s robot was tagged, it joined the “it” team and had to partner with the AI bot and other infected players to tag the remaining uninfected players. Points were earned based on how long a player remained untagged.

  4. Capture the Flag. In this game, one human player competed against two AI-controlled robots with the objective of touching each of the four targets (the ‘flags’) while avoiding being tagged by the AI-controlled robots. Points were earned based on how quickly the four flags were captured, with points lost for being tagged by the AI robots.
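
The rules of a game like Bot Freeze Tag boil down to a small amount of state. The sketch below is a hypothetical model of those rules, not the code that ran the demo. It assumes that a frozen robot cannot unfreeze a teammate, and it leaves out the timer and the scoring.

```python
from dataclasses import dataclass, field


@dataclass
class FreezeTagGame:
    """Tracks which human-controlled robots are frozen in Bot Freeze Tag (illustrative only)."""
    humans: set
    frozen: set = field(default_factory=set)

    def ai_tags(self, human):
        """The AI bot runs into a human-controlled robot and freezes it."""
        if human in self.humans:
            self.frozen.add(human)

    def teammate_touches(self, rescuer, target):
        """A teammate touches a frozen robot; returns True if the target was unfrozen.

        Assumption: only a robot that is not itself frozen can unfreeze a teammate.
        """
        can_rescue = rescuer in self.humans and rescuer not in self.frozen
        if can_rescue and rescuer != target and target in self.frozen:
            self.frozen.discard(target)
            return True
        return False

    def ai_wins(self):
        """The AI bot wins once every human-controlled robot is frozen."""
        return self.frozen == self.humans


game = FreezeTagGame(humans={"red", "green", "blue"})
game.ai_tags("red")
rescued = game.teammate_touches("green", "red")
```

The other three games could reuse the same idea with different transitions, for example moving a tagged robot onto the “it” team in Zombie Apocalypse instead of freezing it.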

We decided to use Sphero robots: cue-ball-sized balls enclosed in hardened plastic that can roll around and are controlled by a smartphone or tablet. We chose Sphero robots because of their simple SDK, end-user application, low cost, and overall curb appeal.


The process we used was a generally straightforward machine learning process, as described in several TensorFlow 101 trainings such as Yufeng Guo’s fantastic blog series AI Adventures.

Machine learning process: training an on-device model for competition

The process consisted of:

  1. Gather and prepare the data. In our case we had to generate and simulate data.

  2. Choose a model. We were able to leverage and modify existing models.

  3. Train, evaluate and tune. Cloud was a key accelerator for training.

  4. Predict. On-device prediction in near real-time was key to the experience.
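
The four steps above can be sketched end to end on a toy problem. This is a deliberately tiny illustration, not the pipeline we used: the data is simulated offsets to a target, the “model” is a 1-nearest-neighbor lookup chosen only because it needs no libraries, and all function names are hypothetical.

```python
import random


def simulate(n, rng):
    """Step 1: generate synthetic (dx, dy) offsets to a target, labeled with the best direction."""
    data = []
    for _ in range(n):
        dx, dy = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if abs(dx) >= abs(dy):
            label = "right" if dx >= 0 else "left"
        else:
            label = "up" if dy >= 0 else "down"
        data.append(((dx, dy), label))
    return data


def predict(train, x):
    """Steps 2 and 4: a 1-nearest-neighbor 'model' returns the label of the closest training example."""
    nearest = min(train, key=lambda ex: (ex[0][0] - x[0]) ** 2 + (ex[0][1] - x[1]) ** 2)
    return nearest[1]


def evaluate(train, test):
    """Step 3: fraction of held-out examples whose direction is predicted correctly."""
    hits = sum(predict(train, x) == y for x, y in test)
    return hits / len(test)


rng = random.Random(0)                     # fixed seed so the run is repeatable
data = simulate(500, rng)
train, test = data[:400], data[400:]       # simple hold-out split
accuracy = evaluate(train, test)
move = predict(train, (0.9, 0.0))          # target is far to the right of the bot
```

Because the data is simulated, more of it can be generated at will, which is the same property that makes generated data attractive for a real project.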

You too can apply artificial intelligence to problems that exist in the physical world. In the AI in Motion project at Next ’18, we set out to demonstrate a system that could see, understand, and react to the world. We set key design principles and made architecture decisions that reflect what is possible with simple, standard components, and we planned a series of robotic games where AI-controlled robots interact with human-controlled robots.

Stay tuned for more—we’ll be publishing the second and third parts of this series in the coming weeks. In part two of this series, we’ll walk you through the actual steps we used to generate and prepare the data, to select models for object detection and control, and to train, evaluate, and tune the models. Part three will share how we managed on-device prediction to drive AI-controlled gameplay.