Google Cloud

Experience Google’s machine learning on your own images, voice and text

September 16, 2016

Kaz Sato

Developer Advocate, Cloud AI

Recently, Google added Try the API boxes on the product pages of each of its Cloud Machine Learning APIs: Cloud Vision API, Speech API and Natural Language API. Now anyone can instantly experience the power of Google's machine intelligence on their own images, voice and text. Let's see how it works.

Try Cloud Vision API

Cloud Vision API enables developers to understand the content of an image by encapsulating powerful machine learning models in an easy-to-use REST API. To try it now, go to the Cloud Vision API product page and drop an image file onto the Try the API box, or open one from the box. Click on the Captcha dialog box to prove you're not an automated script. Here’s what the Vision API had to say about a picture I took of a Jack O’Lantern that my son and I carved at a Halloween party:

https://storage.googleapis.com/gweb-cloudblog-publish/images/machine-intelligence-11rzr5.max-700x700.PNG

Using the label detection method of the API, Cloud Vision analyzes the content of the uploaded image. It looks like Cloud Vision’s machine intelligence is smart enough to understand not just the object, but also the context ("halloween," "holiday," "carving"). Awesome, isn't it? You can also see the API’s response in raw JSON format by clicking the JSON Response tab.
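For readers who want to go from the Try the API box to code, here is a minimal Python sketch of what a label detection call could look like. It only builds the JSON request body and reads labels out of a response; it does not send anything over the network. The endpoint URL and field names follow the Cloud Vision v1 REST API as I understand it, the helper names are my own, and the sample response is hand-written for illustration rather than real API output.

```python
import base64
import json

# Assumption: the public REST endpoint for Cloud Vision v1 image annotation.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes, max_results=5):
    """Build the JSON body for a label detection request (hypothetical helper)."""
    return {
        "requests": [
            {
                # Image bytes travel base64-encoded inside the JSON body.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def extract_labels(response):
    """Return (description, score) pairs from an annotate response."""
    annotations = response["responses"][0].get("labelAnnotations", [])
    return [(a["description"], a["score"]) for a in annotations]


# Hand-written sample response for illustration only (scores are made up).
sample_response = {
    "responses": [
        {
            "labelAnnotations": [
                {"description": "halloween", "score": 0.95},
                {"description": "holiday", "score": 0.90},
                {"description": "carving", "score": 0.85},
            ]
        }
    ]
}

if __name__ == "__main__":
    body = build_label_request(b"placeholder image bytes")
    print(json.dumps(body, indent=2))
    for description, score in extract_labels(sample_response):
        print(f"{description}: {score:.2f}")
```

To call the real service, you would POST the body to the endpoint with your own credentials; that step is left out here on purpose.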

Optical Character Recognition (OCR)

Another impressive feature of the API is Optical Character Recognition (OCR). It can recognize characters and words in multiple languages inside an image and extract them as strings, along with the position of each word in the image. Let's try the feature with this image.

https://storage.googleapis.com/gweb-cloudblog-publish/images/machine-intelligene-148vux.max-600x600.PNG

To see this image with full resolution, go here

When you drop this image onto the box and open the Text tab, you can see the OCR result.

https://storage.googleapis.com/gweb-cloudblog-publish/images/machine-intelligence-14fhnx.max-700x700.PNG

Even though the words in this image were slanted and unclear, the OCR extracts the words and their positions correctly. It even picks up the word "beacon" on the presenter's t-shirt.
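The words and positions shown in the Text tab can also be read from the JSON response. The sketch below assumes the v1 response shape for the TEXT_DETECTION feature, where the first text annotation carries the full text and the following ones carry single words with a bounding polygon. The function name and the sample response are my own illustration, not output from the image above.

```python
def extract_words(response):
    """Return (word, vertices) pairs; vertices are (x, y) tuples."""
    annotations = response["responses"][0].get("textAnnotations", [])
    words = []
    # Assumption: the first annotation is the full text, the rest are words.
    for annotation in annotations[1:]:
        vertices = [
            # A coordinate of 0 may be omitted from the JSON, so default to 0.
            (v.get("x", 0), v.get("y", 0))
            for v in annotation["boundingPoly"]["vertices"]
        ]
        words.append((annotation["description"], vertices))
    return words


# Hand-written sample response for illustration only (coordinates are made up).
sample_response = {
    "responses": [
        {
            "textAnnotations": [
                {
                    "locale": "en",
                    "description": "beacon\ndemo\n",
                    "boundingPoly": {
                        "vertices": [{"y": 20}, {"x": 90, "y": 20},
                                     {"x": 90, "y": 70}, {"y": 70}]
                    },
                },
                {
                    "description": "beacon",
                    "boundingPoly": {
                        "vertices": [{"x": 10, "y": 20}, {"x": 90, "y": 20},
                                     {"x": 90, "y": 40}, {"x": 10, "y": 40}]
                    },
                },
                {
                    "description": "demo",
                    "boundingPoly": {
                        "vertices": [{"y": 50}, {"x": 40, "y": 50},
                                     {"x": 40, "y": 70}, {"y": 70}]
                    },
                },
            ]
        }
    ]
}

if __name__ == "__main__":
    for word, vertices in extract_words(sample_response):
        print(word, vertices)
```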

Detection of explicit images, landmarks and logos

Cloud Vision API can also detect other features, including popular landmarks and corporate or product logos — all with a high degree of accuracy. Safe search detection, meanwhile, checks for adult, violent, medical or spoofed images, and is already in production use at several social content providers. Previously, those social providers filtered out objectionable images uploaded by users by hiring a large staff and checking every single image manually. With Safe Search, they can reduce that staff significantly. For more details about safe search detection, check out Sara Robinson’s blog post Filtering inappropriate content with Cloud Vision API. And for landmark detection, check out this detailed post by Greg Wilson, Testing Google Cloud Vision API Landmark detection with my own travel photographs, which has lots of great examples.

https://storage.googleapis.com/gweb-cloudblog-publish/images/machine-intelligence-151ad7.max-500x500.PNG
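A content filter like the one described above could be sketched as follows. This assumes the v1 safe search response, which reports a likelihood level per category (adult, spoof, medical, violence) rather than a numeric score. The threshold and the helper name are my own choices, and the sample response is hand-written for illustration.

```python
# Likelihood levels ordered from least to most likely (Vision API enum names).
LIKELIHOODS = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]


def flagged_categories(response, threshold="LIKELY"):
    """Return the sorted categories whose likelihood is at or above threshold."""
    annotation = response["responses"][0].get("safeSearchAnnotation", {})
    limit = LIKELIHOODS.index(threshold)
    return sorted(
        category
        for category, level in annotation.items()
        if level in LIKELIHOODS and LIKELIHOODS.index(level) >= limit
    )


# Hand-written sample response for illustration only (not real API output).
sample_response = {
    "responses": [
        {
            "safeSearchAnnotation": {
                "adult": "VERY_UNLIKELY",
                "spoof": "UNLIKELY",
                "medical": "POSSIBLE",
                "violence": "LIKELY",
            }
        }
    ]
}

if __name__ == "__main__":
    print(flagged_categories(sample_response))
    print(flagged_categories(sample_response, threshold="POSSIBLE"))
```

Where to set the threshold is a policy decision: a stricter threshold sends more images to human review, a looser one lets more through automatically.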

Try Cloud Speech API

Have you noticed teenagers controlling their smartphones with their voice? The same voice recognition engine that powers Google Search and Google Now in modern smartphones is behind Cloud Speech API. You can now take advantage of this disruptive technology for your own applications. For example, a call center provider can use Cloud Speech API to convert audio data to text and later analyze it with Natural Language API, which we will discuss next.

Cloud Speech API also has a Try the API box. Go to the product page, click on the microphone icon, and make a recording up to 30 seconds long. When you finish recording, it uploads the audio data to the API and displays the result.

You can also try Cloud Speech API with many languages besides English. Pick one of the 80 supported languages and variants from the drop-down menu. Personally, I found the technology works impressively well with Japanese too.
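The same choice of language is available when you call the API from code. The sketch below builds a recognition request and reads the transcript from a response, without sending anything. It uses the field names of the v1 REST API as I know them; the beta version available when this article was written may use different names, so treat the endpoint and fields as assumptions. The helper names and the sample response are mine.

```python
import base64

# Assumption: v1 endpoint for short, synchronous recognition.
SPEECH_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes, language_code="en-US",
                            sample_rate_hertz=16000):
    """Build the JSON body for a recognition request (hypothetical helper)."""
    return {
        "config": {
            # Assumption: raw 16-bit linear PCM audio.
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            # A BCP-47 tag selects the language, e.g. "ja-JP" for Japanese.
            "languageCode": language_code,
        },
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def best_transcript(response):
    """Join the top alternative of each result into one string."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0]["transcript"].strip())
    return " ".join(parts)


# Hand-written sample response for illustration only (not real API output).
sample_response = {
    "results": [
        {"alternatives": [{"transcript": "hello", "confidence": 0.97}]},
        {"alternatives": [{"transcript": " how are you", "confidence": 0.93}]},
    ]
}

if __name__ == "__main__":
    body = build_recognize_request(b"placeholder audio", language_code="ja-JP")
    print(body["config"])
    print(best_transcript(sample_response))
```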

Try Natural Language API

Many developers use simple keyword or regular expression matches to process natural language text. In other words, they process text as unstructured data without any clue about what it means.

With Cloud Natural Language API, powerful machine learning models reveal the structure and meaning of text through an easy-to-use REST API. Now that you can handle text as structured data with various attributes and metadata, it’s possible to add intelligence to your application by processing, analyzing or querying the text generated by end customers.

Let's look at Natural Language API’s Try the API box. Click the Analyze button to explore the default sample text.

https://storage.googleapis.com/gweb-cloudblog-publish/images/machine-intelligence-6kaj7.max-700x700.PNG

The Entities tab displays the entity analysis result. Given the default sample text, the word Google is classified as an organization, and Mountain View is a location. And according to the API, Sundar Pichai is a popular person, and Android must be a consumer good. Cloud Natural Language API also returns links to Wikipedia pages of those entities if they are available.
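In code, the entity analysis result can be grouped by type in a few lines. This sketch assumes the v1 REST request and response shapes (a document with type PLAIN_TEXT, and entities carrying a name, a type and optional Wikipedia metadata). The helper names are mine, and the sample response is hand-written to mirror the entity types described above, not copied from the API.

```python
from collections import defaultdict

# Assumption: v1 endpoint for entity analysis.
LANGUAGE_URL = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_entities_request(text):
    """Build the JSON body for an entity analysis request (hypothetical helper)."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def group_entities(response):
    """Map each entity type to a list of (name, wikipedia_url or None)."""
    grouped = defaultdict(list)
    for entity in response.get("entities", []):
        url = entity.get("metadata", {}).get("wikipedia_url")
        grouped[entity["type"]].append((entity["name"], url))
    return dict(grouped)


# Hand-written sample response for illustration only (not real API output).
sample_response = {
    "entities": [
        {"name": "Google", "type": "ORGANIZATION",
         "metadata": {"wikipedia_url": "https://en.wikipedia.org/wiki/Google"}},
        {"name": "Mountain View", "type": "LOCATION", "metadata": {}},
        {"name": "Sundar Pichai", "type": "PERSON", "metadata": {}},
        {"name": "Android", "type": "CONSUMER_GOOD", "metadata": {}},
    ]
}

if __name__ == "__main__":
    for entity_type, items in group_entities(sample_response).items():
        print(entity_type, items)
```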

Now give it a try on your own.

Sentiment and syntactic analysis

For sentiment analysis of the text, click on the Sentiment tab.

https://storage.googleapis.com/gweb-cloudblog-publish/images/machine-intelligence-10g24m.max-300x300.PNG

According to the Cloud Natural Language API, the sentence "Sundar Pichai said in his keynote that users love their new Android phones" has a positive sentiment.
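If you want to turn the sentiment result into a simple label in your own application, something like the following could work. It assumes the v1 response, where the document sentiment has a score between -1 and 1 and a magnitude; the beta version available when this article was written may name these fields differently. The 0.25 cutoff is an arbitrary choice of mine, and the sample response is hand-written.

```python
def describe_sentiment(response, threshold=0.25):
    """Label a document as positive, negative or neutral by its score."""
    score = response["documentSentiment"]["score"]
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Hand-written sample response for illustration only (values are made up).
sample_response = {
    "documentSentiment": {"score": 0.6, "magnitude": 0.6},
    "language": "en",
}

if __name__ == "__main__":
    print(describe_sentiment(sample_response))
```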

On the Syntax tab, you can see the sentence’s syntactic analysis.

https://storage.googleapis.com/gweb-cloudblog-publish/images/machine-intelligence-7k9zk.max-800x800.PNG

The JSON response from the syntactic analysis method provides the data to build a dependency parse tree of the text, like the one pictured above. With this feature, you can split a sentence into tokens and get the part of speech (POS) of each token, such as noun or verb, along with the dependencies between tokens. The unstructured text becomes structured data that you can analyze.
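To make the idea of a dependency parse tree concrete, here is a small sketch that rebuilds a tree from a list of tokens. It assumes the v1 token shape, where each token has its text, a part-of-speech tag, and a dependency edge pointing at the index of its head token (the root points at itself). The function names and the three-token sample are my own simplified illustration, not the response for the sentence pictured above.

```python
def build_tree(tokens):
    """Return (root_index, children) where children maps head -> dependents."""
    children = {index: [] for index in range(len(tokens))}
    root = None
    for index, token in enumerate(tokens):
        head = token["dependencyEdge"]["headTokenIndex"]
        if head == index:
            # Assumption: the root token lists itself as its own head.
            root = index
        else:
            children[head].append(index)
    return root, children


def render(tokens, node, children, depth=0):
    """Return the subtree under node as indented lines of text."""
    token = tokens[node]
    line = "{}{} ({}, {})".format(
        "  " * depth,
        token["text"]["content"],
        token["partOfSpeech"]["tag"],
        token["dependencyEdge"]["label"],
    )
    lines = [line]
    for child in children[node]:
        lines.extend(render(tokens, child, children, depth + 1))
    return lines


# Hand-written sample tokens for "users love phones" (illustration only).
sample_tokens = [
    {"text": {"content": "users"}, "partOfSpeech": {"tag": "NOUN"},
     "dependencyEdge": {"headTokenIndex": 1, "label": "NSUBJ"}},
    {"text": {"content": "love"}, "partOfSpeech": {"tag": "VERB"},
     "dependencyEdge": {"headTokenIndex": 1, "label": "ROOT"}},
    {"text": {"content": "phones"}, "partOfSpeech": {"tag": "NOUN"},
     "dependencyEdge": {"headTokenIndex": 1, "label": "DOBJ"}},
]

if __name__ == "__main__":
    root, children = build_tree(sample_tokens)
    print("\n".join(render(sample_tokens, root, children)))
```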

Develop amazing apps with Cloud Machine Learning APIs

As we have seen in this article, it’s easy to experience the power of Google's latest machine learning technologies with their respective Try the API boxes.

Cloud Vision API is now generally available and ready for production use. Speech API and Natural Language API are in beta and anyone can start evaluating them. The time is now for developers to start playing with this game-changing technology.
