
Simplifying ML predictions with Google Cloud Functions

September 21, 2018
Sara Robinson

Staff Developer Relations Engineer

Zack Akil

Developer Advocates

If you develop on Google Cloud Platform (GCP) and haven’t tried Cloud Functions, our serverless event-driven platform, it’s worth a look. Our favorite part about Cloud Functions is that you can use it to connect all sorts of services across GCP and beyond. Think of it as the glue between the different pieces of your application. Since we focus on machine learning, we wanted to share a demo that uses Cloud Functions to generate on-the-fly predictions from a text classification model hosted on Cloud Machine Learning Engine, our managed service for training and serving ML models. Cloud ML Engine supports multiple ML frameworks, including TensorFlow, Scikit-learn, Keras, and XGBoost. If you’re more of a video person, you can watch us present this demo at Cloud Next ’18.

Demo architecture

For our demo, we built an app for predicting the genres of a movie from its description using a text classification model. Here’s what it looks like:


To show what Cloud ML Engine can do, we built two versions of the model independently, one in Scikit-learn and one in TensorFlow, along with a web app that generates predictions from both. Because the two models use entirely different frameworks and dependencies, even a simple app that queried both used to require a lot of code. Cloud ML Engine gives us a single place to host multiple types of models and streamlines querying them.

And before we get into the details, you may be wondering why you’d need multiple versions of the same model. If you’ve got data scientists or ML engineers on your team, they may want to experiment independently with different model inputs and frameworks. Or maybe they’ve built an initial prototype of a model, then obtained additional training data and trained a new version. A web app like ours provides an easy way to compare outputs across versions, or even to load test them.
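If you want to compare versions programmatically rather than by eye, one approach is to send the same input to every version concurrently and collect the results side by side. The sketch below assumes a hypothetical `predictFn(version, instances)` helper that returns a Promise of that version’s predictions; it illustrates the idea and is not part of the demo itself.

```javascript
// Sketch: query several model versions with the same input and collect
// each version's predictions, keyed by version name.
// `predictFn` is a hypothetical helper: (version, instances) => Promise<predictions>.
async function compareVersions(predictFn, versions, instances) {
  // Send all requests concurrently; results come back in the same order.
  const results = await Promise.all(
    versions.map((version) => predictFn(version, instances))
  );

  // Pair each version name with its predictions.
  const byVersion = {};
  versions.forEach((version, i) => {
    byVersion[version] = results[i];
  });
  return byVersion;
}
```

For example, `compareVersions(predictFn, ['sklearn', 'tensorflow'], [description])` would resolve to an object with one entry per version; the version names here are placeholders for whatever you deployed.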

For the frontend, we needed a way to request predictions directly from our web app. Because we wanted the demo to focus on Cloud ML Engine serving rather than on boilerplate such as authenticating our Cloud ML Engine API requests, Cloud Functions was a great fit. The frontend is a single HTML page hosted on Cloud Storage. When a user enters a movie description in the web app and clicks “Get Prediction,” the page invokes a cloud function through its HTTP trigger. The function sends the text to Cloud ML Engine and parses the genres returned by the model so the web UI can display them.
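On the browser side, the button click could boil down to a single `fetch` call to the function’s HTTP trigger. This is a minimal sketch: the function URL and the `{version, instances}` payload shape are assumptions rather than the demo’s actual contract, and `fetchFn` is a parameter only so the call can be swapped out in tests.

```javascript
// Browser-side sketch: send a movie description to the HTTP-triggered
// cloud function and return its parsed JSON response.
// The URL below is a hypothetical placeholder for your deployed function.
const FUNCTION_URL = 'https://REGION-PROJECT_ID.cloudfunctions.net/predict';

async function getPrediction(description, version, fetchFn = fetch) {
  const response = await fetchFn(FUNCTION_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // Assumed payload: which model version to query plus the input text.
    body: JSON.stringify({ version, instances: [description] }),
  });
  if (!response.ok) {
    throw new Error(`Prediction request failed with status ${response.status}`);
  }
  return response.json();
}
```

Because the page on Cloud Storage sends a JSON body to a different origin, the browser treats this as a non-simple cross-origin request, which is why the function also has to answer CORS preflight requests.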

Here’s an architecture diagram of how it all fits together:


Now it’s time to dive into the specifics of our cloud function.

Calling Cloud ML Engine from an HTTP cloud function

One of the great things about Cloud ML Engine is that it supports models built with multiple frameworks. We’ve deployed our Scikit-learn and TensorFlow models as two versions of the same model:


Switching between versions is as simple as changing the version name in our API request. From the frontend, we pass our function the name of the model version we’d like to query, along with the input data we want a prediction for. Cloud Functions handles project authentication for us out of the box, so all we need to do to authenticate is call getApplicationDefault():
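A minimal sketch of that authentication step, assuming the callback form of `getApplicationDefault()` on the googleapis auth client, which passes `(err, authClient, projectId)`. The `auth` object is passed in rather than required directly so the helper can be exercised without credentials, and the model and version names given to `versionName()` are placeholders.

```javascript
// Sketch: wrap getApplicationDefault() in a Promise so later steps can await it.
// In a deployed function, `auth` would be `google.auth` from
// `const {google} = require('googleapis');` (assumption; injected here).
function getAuthClient(auth) {
  return new Promise((resolve, reject) => {
    auth.getApplicationDefault((err, authClient, projectId) => {
      if (err) {
        reject(err);
        return;
      }
      resolve({ authClient, projectId });
    });
  });
}

// Switching versions only changes the last segment of this resource name.
function versionName(projectId, model, version) {
  return `projects/${projectId}/models/${model}/versions/${version}`;
}
```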


Don’t forget to add the googleapis dependency to your function’s package.json.


Then, if authentication succeeds, we use the version parameter and the input data passed to our function to build the ML Engine request JSON:
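As a rough sketch of that step, the parameters for the predict call could be assembled as below. It assumes Cloud ML Engine’s standard online prediction body, `{instances: [...]}`, and a hypothetical model name `movie_genres`; what goes inside each instance depends on the input your exported model accepts.

```javascript
// Sketch: build the parameters object for ml.projects.predict().
// `name` selects the model version; `resource` carries the request body
// (newer googleapis releases also accept it as `requestBody`).
// 'movie_genres' is a hypothetical model name.
function buildPredictRequest(authClient, projectId, version, instances, model = 'movie_genres') {
  if (!Array.isArray(instances) || instances.length === 0) {
    throw new Error('instances must be a non-empty array');
  }
  return {
    auth: authClient,
    name: `projects/${projectId}/models/${model}/versions/${version}`,
    resource: { instances },
  };
}
```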


To make the Cloud ML Engine request, we call projects.predict() from the ml export in the googleapis package, and if the request succeeds, we send the prediction response back to the app frontend.

First, we import the ml module right after our googleapis import:


Then, once we have the ML Engine request JSON, we call ml.projects.predict():
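Here is a minimal sketch of that call, assuming `ml` is the client returned by `google.ml('v1')` and that the callback receives the HTTP response with the parsed body on `result.data`, as in recent googleapis releases. The Express-style `res` object matches Cloud Functions’ HTTP handlers; passing `ml` in as a parameter is only a convenience for testing.

```javascript
// Sketch: call ml.projects.predict() and forward the result to the frontend.
// `ml` is assumed to be google.ml('v1'); `res` is the HTTP function's response.
function predictAndRespond(ml, params, res) {
  ml.projects.predict(params, (err, result) => {
    if (err) {
      // Report ML Engine errors to the caller instead of failing silently.
      res.status(500).send({ error: err.message });
      return;
    }
    // The parsed response body, e.g. {predictions: [...]}, is on result.data.
    res.status(200).send(result.data);
  });
}
```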


The exact structure of the JSON response depends on the type of machine learning model being called. The following is an example request and response JSON for our model, which predicts a movie’s genre from its description:


The prediction response is a nine-element one-hot vector, with one position for each of nine possible movie genres. One-hot encoding is a common output format for classification problems. In this example, the third element of the vector, which represents comedy, is the one that is set, so our model is predicting that this movie is a comedy. If we were querying a regression model instead, for example one that predicts a continuous value such as movie revenue, we would see just a single numeric value in the predictions field.
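To show the result in the UI, each position in the vector has to be mapped back to a genre name. In this sketch only index 2 = comedy comes from the example above; the other labels are placeholders for the model’s real label list, and the 0.5 threshold is an assumption that yields exactly one genre for a strict one-hot vector while still working if the model outputs probabilities.

```javascript
// Sketch: map the model's nine-element output vector to genre labels.
// Only 'comedy' at index 2 is known; the other names are hypothetical.
const GENRES = [
  'genre_0', 'genre_1', 'comedy', 'genre_3', 'genre_4',
  'genre_5', 'genre_6', 'genre_7', 'genre_8',
];

// Returns every genre whose score meets the (assumed) threshold.
function decodeGenres(vector, threshold = 0.5) {
  if (vector.length !== GENRES.length) {
    throw new Error(`expected ${GENRES.length} values, got ${vector.length}`);
  }
  return GENRES.filter((_, i) => vector[i] >= threshold);
}
```

For a response of the form `{predictions: [vector]}`, you would call `decodeGenres(response.predictions[0])`.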


The final piece of the puzzle when setting up this cloud function is to define the CORS headers, which allow a frontend hosted on a different domain to call our cloud function. Our frontend is served directly from Cloud Storage, so we add the https://storage.googleapis.com origin to the Access-Control-Allow-Origin header. Before sending a cross-origin request that isn’t a “simple” request (a POST with a JSON body, for example), browsers send a preflight OPTIONS request with no body just to check the CORS headers, so we make sure to set the headers and return an empty 204 response to those requests. Here is the complete code of our cloud function:
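As a hedged sketch of what a complete function along these lines could look like (not the authors’ exact code), the handler below combines CORS handling, authentication, request building, and the predict call. The model name, the `{version, instances}` body shape, and the allowed methods and headers are assumptions, and `auth` and `ml` are injected so the handler can be exercised without credentials.

```javascript
// Sketch of a complete HTTP cloud function: CORS, auth, request, predict.
// A deployment might wire it up like this (not executed here):
//   const {google} = require('googleapis');
//   exports.predict = createPredictHandler(google.auth, google.ml('v1'));
const ALLOWED_ORIGIN = 'https://storage.googleapis.com';
const MODEL_NAME = 'movie_genres'; // hypothetical model name

function createPredictHandler(auth, ml) {
  return (req, res) => {
    // CORS headers go on every response, including the preflight.
    res.set('Access-Control-Allow-Origin', ALLOWED_ORIGIN);
    res.set('Access-Control-Allow-Methods', 'POST, OPTIONS');
    res.set('Access-Control-Allow-Headers', 'Content-Type');

    // Browser preflight: answer with an empty 204 and stop.
    if (req.method === 'OPTIONS') {
      res.status(204).send('');
      return;
    }

    // Cloud Functions parses JSON request bodies into req.body.
    const { version, instances } = req.body || {};
    if (!version || !Array.isArray(instances)) {
      res.status(400).send({ error: 'Expected a JSON body with version and instances' });
      return;
    }

    auth.getApplicationDefault((authErr, authClient, projectId) => {
      if (authErr) {
        res.status(500).send({ error: authErr.message });
        return;
      }
      const params = {
        auth: authClient,
        name: `projects/${projectId}/models/${MODEL_NAME}/versions/${version}`,
        resource: { instances },
      };
      ml.projects.predict(params, (err, result) => {
        if (err) {
          res.status(500).send({ error: err.message });
          return;
        }
        // result.data holds the parsed body, e.g. {predictions: [...]}.
        res.status(200).send(result.data);
      });
    });
  };
}
```

Deploying it would mean replacing the injected stand-ins with `google.auth` and `google.ml('v1')`, as in the comment at the top, and listing googleapis in package.json.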


That’s it! Using a single cloud function, we have a serverless app that can make predictions from two ML models built with entirely different frameworks.

Start building

Want to build your own serverless ML applications with Cloud Functions? Dive into the Cloud Functions documentation. We focused on Cloud ML Engine here, but you could easily write a similar function to call any of our Machine Learning APIs or AutoML. If you’ve got questions or topics you’d like to see covered in a future post, find us on Twitter @SRobTweets and @ZackAkil.