Data Analytics

The democratization of data and insights: Expanding machine learning access

November 16, 2020

Sudhir Hasbe

Sr. Director of Product Management, Google Cloud

Ryan Lippert

Product Manager

In the first blog in this series, we discussed how data availability, data access, and insight access have evolved over time, and what Google Cloud is doing today to help customers democratize the production of insights across organizational personas. In this blog we’ll discuss why artificial intelligence (AI) and machine learning (ML) are critical to generating insights in today’s world of big data, as well as what Google Cloud is doing to expand access to this powerful method of analysis.

A report by McKinsey highlights the stakes at play: by 2030, companies that fully absorb AI could double their cash flow, while companies that don’t could see a 20% decline. ML and AI have traditionally been seen as the domain of experts and specialists with PhDs, so it’s no surprise that many business leaders frame their ML goals around HR challenges: creating new departments, hiring new employees, developing retraining programs for the existing workforce, and so on. But this isn’t the way it has to be. At Google Cloud, we’re focused not only on making the experts more efficient but also on driving ML capabilities into the day-to-day work of anyone who works with data.

For experts, the traditional ML audience, we’ve built an entire suite of tools. Our AI Platform makes it easy for them to iterate rapidly and take ideas to deployment efficiently. Across ML teams, AI Hub makes it easier to collaborate with teammates to avoid duplicating work streams and get work done faster. Finally, TensorFlow Enterprise delivers supported and scalable TensorFlow in the cloud, directly from the leading contributors to the OSS project (us!). Making existing experts nimbler and faster helps them increase their output, which expands access to ML within an organization.

However, to truly integrate ML throughout an entire organization, we need to create tools that more personas can use to drive actionable insights. Let’s take a look at what Google Cloud is doing to democratize ML across three key personas: data analysts, developers, and data engineers.

Data Analysts

Data analysts, as we mentioned in our first blog, are the data analytics backbone of many Fortune 500 companies. They’re experts within a data warehouse, very comfortable with SQL, and knowledgeable about the needs of the business. We knew that to drive ML capabilities to this persona, we would need to meet them where their expertise already was.

That’s exactly what BigQuery ML does: it brings ML inside the data warehouse, and it’s deployed using just a few easy-to-use SQL statements, which are much more familiar to analysts than the Python-, R-, and Scala-reliant tools that many data scientists use. When combined with BigQuery’s ability to scale to larger data volumes than traditional enterprise data warehouses, BigQuery ML gives data analysts the ability to drive ML across vast amounts of data to uncover previously unseen insights. A wide variety of models is available within BigQuery to help customers drive use cases as varied as recommendations, segmentation, anomaly detection, forecasting, and prediction. Further, if there’s a need for custom models, ML experts can build models to import into BigQuery, where analysts can use them at scale.
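To give a sense of what "a few SQL statements" can look like, here is a minimal sketch that assembles a BigQuery ML CREATE MODEL statement and an ML.PREDICT query as strings. The dataset, table, model, and column names (mydataset.customers, mydataset.churn_model, churned) are hypothetical and do not come from this post, and the helper functions are illustrative rather than part of any Google library. Running the statements would additionally require the google-cloud-bigquery package and configured credentials, which this sketch only assumes.

```python
def create_model_sql(model, source_table, label_column, model_type="logistic_reg"):
    """Build a BigQuery ML CREATE MODEL statement.

    All names passed in are placeholders chosen by the caller.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', "
        f"input_label_cols = ['{label_column}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def predict_sql(model, input_table):
    """Build a query that scores rows of input_table with a trained model."""
    return (
        f"SELECT * FROM ML.PREDICT(MODEL `{model}`, "
        f"(SELECT * FROM `{input_table}`))"
    )


def run(sql):
    """Submit a statement to BigQuery (assumes credentials are configured)."""
    # Imported here so the string helpers above work without the package.
    from google.cloud import bigquery

    client = bigquery.Client()
    return client.query(sql).result()


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(create_model_sql("mydataset.churn_model", "mydataset.customers", "churned"))
    print(predict_sql("mydataset.churn_model", "mydataset.new_customers"))
```

An analyst would typically paste the generated SQL straight into the BigQuery console; the Python wrapper is here only to keep the example self-contained.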

We’ve seen customers in very different industries with very different use cases successfully deploy BigQuery ML. Telus has used ML to deploy anomaly detection that secures its network; UPS has used it to achieve precise package volume forecasting; Geotab is driving smarter cities by blending ML and geospatial analytics; and we’ve even seen BigQuery ML deployed to predict movie audiences. Beyond that, we see retailers predicting purchasing, financial services institutions determining insurance risk, and gaming companies forecasting long-term customer value. This analysis would have been impossible for data analysts to drive in the past. Today, it’s not only efficient, but it also has a very quick path to production.

With the growing functionality of BigQuery ML, data-savvy team members have less need to also build expertise in transferring large amounts of data into and out of the BigQuery environment, and learning how to parallelize and scale data pipelines to handle deployment. By working directly in BigQuery for data cleaning, model training, and deployment, you can spend more time focused on understanding the data and delivering value from it, rather than moving it around.

Daniel Lewis, Senior Data Scientist, R&D Specialist, Geotab

Developers

For the developer audience, we’ve developed two different types of services that democratize ML and serve as “building blocks” in creating applications. The first is a set of pre-trained models that are easily accessible through APIs. These APIs tackle many common use cases around sight, language, conversation, and more. For models that require more specificity, such as identifying all trucks of a particular make and model versus general identification of a truck, we offer AutoML custom models, which empower developers to build domain-specific custom models. These tools are in use at companies like Keller Williams, USA Today, PwC, AES Corporation, and more.

With AutoML Vision, nearly half of our inspection images no longer need human review. Google is a great partner, because their technology is consistently among the world leaders.

Nicholas Osborn, Director, AES Digital Hub
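As a rough illustration of the pre-trained "building block" APIs mentioned above, the sketch below assembles the JSON body for a label detection call, assuming the Cloud Vision API as the "sight" example (this post does not name a specific API). The bucket path is hypothetical, and authentication and the actual HTTP request are deliberately left out.

```python
import json

# REST endpoint for Cloud Vision image annotation requests.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def label_detection_request(image_uri, max_results=5):
    """Build the request body asking a pre-trained model to label one image."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_results}
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical Cloud Storage path, for illustration only.
    body = label_detection_request("gs://my-bucket/truck.jpg")
    print("POST", VISION_ENDPOINT)
    print(json.dumps(body, indent=2))
```

A general-purpose request like this would recognize "a truck"; telling apart specific makes and models is the kind of case where the post points to AutoML custom models instead.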

When it comes to building machine learning models at scale, AutoML Tables gives developers (as well as data scientists and analysts) the ability to automatically build and deploy ML models on structured data with incredible speed. A codeless interface not only makes it easy for anyone to build models and incorporate them into broader applications, but it also saves time, saves money, and increases the quality of deployed ML models. Using AutoML Tables, we’ve seen customers deliver marketing programs that delivered 150% more subscribers per dollar spent and user engagement at 140% of industry averages, all by communicating to the right user in the right place at the right time.

Further, these ML APIs do more than enable application developers. For ETL developers using Cloud Data Fusion, it’s easy to integrate these APIs into your data integration pipelines to enhance and prepare analysis for downstream applications and users. ML is now as easy as point, click, drag, and drop.

Data Engineers

The final persona in our discussion of ML democratization is the data engineer. It’s worth mentioning that all of the personas we’ve discussed benefit from the autoscaling nature of Google Cloud’s platform, which eliminates the need for time-intensive tuning and provisioning of infrastructure to run ML models. This work can disproportionately fall to data engineers (or can turn data scientists into de facto data engineers as they try to productionize their models).

We’ve worked to embed ML capabilities in both buckets of data engineering we see at Google: the Dataproc-oriented open source path, as well as the cloud-native Dataflow path. Let’s examine both.

For open source adherents and those familiar with Hadoop and Spark environments, we make it easy to run SparkML jobs that you may be comfortable building, or have previously built. We have an easy-to-run Qwiklab that can introduce you to the concept of ML with Spark on Dataproc, and you can try that out with free credits. We also give customers the ability to build custom OSS clusters on custom machines, and do it fast, to bring GPU-powered ML to our customers. Together with features announced earlier this year, Dataproc users can now quickly deploy ML, leverage easy-to-use notebooks, schedule cluster deletion, and more.

For data engineers using Dataflow, Google Cloud has made it easy to use TensorFlow Extended (TFX) to build and manage ML workflows in production. Working through Apache Beam (Dataflow’s SDK), this integration yields a toolkit for building ML pipelines, a set of standard components you can use as a part of a pipeline or ML training script, and libraries for the base functionality of many standard components. Our solutions teams are working to make this even easier, releasing common patterns like anomaly detection, which telco customers are putting to use for cybersecurity while banks use it to detect financial fraud.

Wrapping up

Bringing ML capabilities to this broad set of new personas democratizes the most important aspect of big data: generating insights that help businesses drive predictions, new customer segments, recommendations, and more. The deeper insights provided by ML are going to become more and more critical to business success, which means the businesses that succeed are going to be the ones that can deploy ML and artificial intelligence widely. At Google, we know the best ideas tend to bubble up rather than get pushed down. When your full organization has access to both data and the tools to analyze the data, you’re ready for whatever comes next. If you’d like to give machine learning a try today, the BigQuery sandbox is a great (and free!) place to get started with BigQuery ML.

Having discussed the importance of democratizing data, insights, and ML, our next blog will address how to take advantage of these insights in real-time—a critical piece of delighting customers and staying ahead of the competition.
