A closer look at our newest Google Cloud AI capabilities for developers
Levent Besik
Director of Product Management, Google Cloud Artificial Intelligence, Google Cloud
At Next ’18 this past July, we announced a range of updates to our AI and machine learning offerings aimed at making AI more accessible to developers. With the excitement of Next behind us, we thought we’d share a little more on these updates and how they can help you quickly and easily inject AI into your applications.
Easily build custom ML models with AutoML
Cloud AutoML is a suite of machine learning products that leverages Google’s state-of-the-art transfer learning and neural architecture search (NAS) technology so you can easily train high-quality custom models, even if you have limited experience with machine learning. This delivers the best of both worlds: high model quality and ease of use. This new suite of products aligns with our mission to democratize AI and make it easy, fast, and useful for all developers and enterprises.
At Next ’18, we announced our first three AutoML offerings: AutoML Vision, AutoML Natural Language, and AutoML Translation. All are now available in beta.
AutoML Vision
Although our pre-trained Cloud Vision API is a popular way for customers to quickly inject AI into their applications, some have found they need a more specialized ML model to address their unique business cases. With AutoML Vision, you can upload your own image datasets, and then create a custom image ML model, even if you have limited machine learning or coding experience.
At Next ’18, Chevron shared how they’re using AutoML Vision to classify millions of documents containing decades’ worth of geographic data. Their solution enables them to identify and classify images inside those documents so their analysts can quickly access crucial information.
“AutoML met our unique requirements in a very short timeframe,” says Laura L. Bandura, Ph.D., Research Geophysicist at Chevron. “We can now find our documents in seconds instead of weeks, giving us the freedom to make well-informed and timely decisions, and transforming the way business decisions are made at Chevron.”
To learn more about AutoML Vision, watch our Next ’18 breakout session.
AutoML Natural Language
AutoML Natural Language lets you automatically predict custom text categories through either single-label or multi-label classification. As with AutoML Vision, AutoML Natural Language helps customers that need a more customized solution than our pre-trained Natural Language API. For example, you can train the model to identify specific categories of requests from schools to a charity, or to apply a universal taxonomy to content, as Hearst has done.
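The difference between single-label and multi-label classification can be sketched in a few lines of Python. This is a minimal illustration, not AutoML's actual decision logic: the category names, the scores, and the 0.5 threshold are all hypothetical values chosen for the example.

```python
from typing import Dict, List


def single_label(scores: Dict[str, float]) -> str:
    """Single-label: return exactly one category, the highest-scoring one."""
    return max(scores, key=scores.get)


def multi_label(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label: return every category whose score meets the threshold."""
    return sorted(label for label, score in scores.items() if score >= threshold)


# Hypothetical per-category scores for one piece of content (illustrative only).
scores = {"recipes": 0.91, "entertaining": 0.64, "gardening": 0.08}

print(single_label(scores))  # one category
print(multi_label(scores))   # zero or more categories
```

With single-label classification each document ends up in exactly one category, while multi-label classification lets a document carry several taxonomy terms at once, or none if no score is high enough.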
“With world-renowned brands such as PEOPLE, Better Homes & Gardens, Martha Stewart Living, Allrecipes, and Food & Wine, Meredith Corporation is continually innovating the ways we develop, deliver and manage our content,” says Alysia Borsa, Chief Marketing & Data Officer, Meredith Corporation. “We’re looking forward to using Natural Language and AutoML services to apply our custom universal taxonomy to our content. This will help us automatically classify content based on our specific business needs, our custom taxonomy and our custom models, accelerating time to insights. This means we can better identify and respond to content trends and create more relevant and engaging audience experiences. This solution met our needs more than other solutions that we considered.”
AutoML Translation
With AutoML Translation, you can upload translated language pairs to create your own custom domain-specific translation models that leverage Google’s translation expertise. You can then deploy that model to dynamically translate between languages. For example, as Nikkei has done, you can train a domain-specific model to translate specific taxonomy for financial news.
“Nikkei Group is a leading media organization with trusted news sources around the world, from The Nikkei, our flagship Japanese-language paper, to our English-language publication Nikkei Asian Review, to the Financial Times. Translating content so that it can be distributed and shared globally is an absolute necessity for us,” says Hiroyuki Watanabe, Managing Director, Digital Business, Nikkei. “AutoML Translation has the level of customization we need, and we’ve been impressed by its accuracy.”
WeLocalize, a major language services provider, recently conducted an analysis to evaluate AutoML Translation against other major offerings. They determined that AutoML Translation has the lowest post-edit (PE) distance, which is the number of changes needed after translation (lower numbers are better). They also evaluated usability, defined by how understandable and actionable sentences are, and determined that AutoML Translation has the highest usability of all the offerings they reviewed. To see more from WeLocalize, watch their presentation from Next.
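To make the idea of post-edit distance concrete, here is a minimal sketch that counts the word-level insertions, deletions, and substitutions needed to turn a machine translation into its human-corrected version. The exact metric WeLocalize used is not described here, so treat word-level Levenshtein distance as an assumption made for illustration; the function names and sample sentences are hypothetical.

```python
from typing import List


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))  # distance from empty prefix of a to each prefix of b
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def post_edit_distance(machine_output: str, post_edited: str) -> int:
    """Number of word-level edits between raw MT output and its post-edited form."""
    return edit_distance(machine_output.split(), post_edited.split())


# Hypothetical example: the editor changed one word.
print(post_edit_distance("the shares rose sharply", "the stock rose sharply"))
```

A lower value means the translator's output needed fewer corrections, which matches the "lower numbers are better" reading above.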
To learn more on AutoML Natural Language and AutoML Translation, watch our Next breakout session, “Latest Developments in Translate & Natural Language AI.”
Updates to our Cloud AI APIs
We are also continuing to improve our pre-trained Cloud AI APIs to better meet your needs:
Cloud Vision API
We’ve recently launched several enhancements in the Cloud Vision API. Handwriting recognition makes it possible for you to identify handwritten text in a document. Support for additional file types now lets you recognize text in PDF and TIFF files. And object localization helps you identify where an object is located within an image. We’re also making some of the core technology that powers style search in Google Lens available in the Vision API Product Search feature. This means you can now identify visually similar items from your own product catalog within an image.
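As a rough sketch of how two of these features are requested together, the Python below builds the JSON body for the Vision API's REST `images:annotate` method. It only constructs and prints the request; it does not send it or handle authentication. The helper name and the `gs://` path are hypothetical, and the feature type names (`DOCUMENT_TEXT_DETECTION` for dense or handwritten text, `OBJECT_LOCALIZATION` for object positions) should be checked against the current API reference before use.

```python
import json
from typing import Any, Dict, List

# REST endpoint for synchronous image annotation (the request is not sent here).
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_uri: str, feature_types: List[str]) -> Dict[str, Any]:
    """Build an images:annotate body asking for the given features on one image."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [{"type": feature} for feature in feature_types],
            }
        ]
    }


# Hypothetical image location; replace with your own Cloud Storage object.
body = build_annotate_request(
    "gs://my-bucket/scanned-form.png",
    ["DOCUMENT_TEXT_DETECTION", "OBJECT_LOCALIZATION"],
)
print(json.dumps(body, indent=2))
```

Text in PDF and TIFF files goes through a separate asynchronous file-annotation method rather than this image endpoint, so it is left out of the sketch.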
Cloud Text-to-Speech and Cloud Speech-to-Text
We’re also launching improvements to Cloud Text-to-Speech, including multilingual access to DeepMind WaveNet voices and the ability to optimize the audio for the type of speaker, such as headphones or a phone line, on which your speech is intended to play.
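The sketch below shows how a WaveNet voice and a playback-device optimization might be combined in one request body for the Text-to-Speech REST `text:synthesize` method. It builds the JSON only and does not call the service. The helper name is hypothetical, and the voice name `en-US-Wavenet-D` and profile ID `telephony-class-application` are given as examples from memory; confirm the available voices and audio profile IDs in the current documentation.

```python
import json
from typing import Any, Dict

# REST endpoint for speech synthesis (the request is not sent here).
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(
    text: str,
    language_code: str = "en-US",
    voice_name: str = "en-US-Wavenet-D",                  # example WaveNet voice
    effects_profile: str = "telephony-class-application",  # example audio profile
) -> Dict[str, Any]:
    """Build a text:synthesize body with a named voice and one audio profile."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {
            "audioEncoding": "MP3",
            # Audio profiles tune the output for the device it will play on.
            "effectsProfileId": [effects_profile],
        },
    }


request_body = build_synthesize_request("Your order has shipped.")
print(json.dumps(request_body, indent=2))
```

Swapping the profile ID, for instance to a headphone or car-speaker profile, changes only the post-processing applied to the audio, not the voice itself.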
Improvements to Cloud Speech-to-Text include the ability to identify what language is spoken in the utterance, speaker diarization, and multi-channel recognition, which transcribes each channel separately when participants are recorded on separate channels. We’ll be expanding on these updates in a speech-focused blog post coming soon, so keep an eye out for it.
We’re excited to offer this new functionality to developers and enterprises and can’t wait to see how you will use it to infuse AI into your applications. Start exploring Cloud AutoML today, or take a look at our APIs for Vision, Speech-to-Text, and Text-to-Speech.
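These three Speech-to-Text options map onto fields of the recognition configuration. The following is a sketch under assumptions: it builds a config dictionary using field names from the REST `RecognitionConfig` as best recalled (`alternativeLanguageCodes`, `diarizationConfig`, `audioChannelCount`, `enableSeparateRecognitionPerChannel`), which may differ between API versions, so verify them against the reference for the version you call. The helper name is hypothetical.

```python
from typing import Any, Dict, Sequence


def build_recognition_config(
    primary_language: str,
    alternative_languages: Sequence[str] = (),
    diarize: bool = False,
    channel_count: int = 1,
) -> Dict[str, Any]:
    """Build a RecognitionConfig-style dict enabling only the requested options."""
    config: Dict[str, Any] = {"languageCode": primary_language}
    if alternative_languages:
        # Language identification: the service picks among the listed candidates.
        config["alternativeLanguageCodes"] = list(alternative_languages)
    if diarize:
        # Speaker diarization: label which speaker said which words.
        config["diarizationConfig"] = {"enableSpeakerDiarization": True}
    if channel_count > 1:
        # Multi-channel recognition: transcribe each audio channel on its own.
        config["audioChannelCount"] = channel_count
        config["enableSeparateRecognitionPerChannel"] = True
    return config


# Hypothetical two-channel call recording in English or Spanish.
call_config = build_recognition_config(
    "en-US", alternative_languages=["es-ES"], diarize=True, channel_count=2
)
print(call_config)
```

Diarization and multi-channel recognition solve a similar problem in different ways: the first separates speakers within one mixed channel, while the second relies on each participant already being on a separate channel.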