Google Cloud Big Data and Machine Learning Blog

Innovation in data processing and machine learning technology

Classifying text content with the Natural Language API

Tuesday, September 19, 2017
By Sara Robinson, Developer Advocate

If you work in the media industry, chances are you’ve spent more hours than you’d like manually tagging text content like blog posts, news articles or marketing copy. With the Natural Language API, you can now tag all of this content with a single API call.

Using the new classify_text endpoint, the Natural Language API will return a content category for your text. The content categories include a set of Tier 1 high-level categories (like “Arts & Entertainment”) along with a set of Tier 2 categories that provide more granularity (like “Visual Art & Design”), with around 700 categories in total.

To try it out, I wrote a Python script that uses data provided by the New York Times API to get the top stories for each section. Then, I combined the title and abstract for each article and sent it to the classify_text endpoint for categorization. For example, the following title and abstract from this article:

Rafael Montero Shines in Mets’ Victory Over the Reds. Montero, who was demoted at midseason, took a one-hitter into the ninth inning as the Mets continued to dominate Cincinnati with a win at Great American Ball Park.

Results in this JSON response from the NL API:

{ categories:
   [ { name: '/Sports/Team Sports/Baseball',
       confidence: 0.99 } ] }

Each response includes a Tier 1 and Tier 2 category, and we can look at the original article to confirm that these categories are correct.
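
Since the category name comes back as a slash-separated path, it can be handy to break it into its levels before storing it as a tag. The following is a minimal sketch, not part of the API: `split_category` is a hypothetical helper, and it assumes that no individual level contains a slash of its own.

```python
def split_category(name):
    """Split a category path such as '/Sports/Team Sports/Baseball' into levels."""
    # The path starts with '/', so drop the empty string that split() produces.
    return [part for part in name.split('/') if part]


levels = split_category('/Sports/Team Sports/Baseball')
print(levels[0])   # broadest (Tier 1) level
print(levels[-1])  # most specific level
```

Keeping the levels as a list makes it easy to tag an article with only the broadest level, only the most specific one, or every level in between.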

Once I get the article title and abstract text from the NYT API, calling the Natural Language API is just a few lines of code. Here’s an example using Python:

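from google.cloud import language_v1beta2
from google.cloud.language_v1beta2 import enums
from google.cloud.language_v1beta2 import types

language_client = language_v1beta2.LanguageServiceClient()

# Wrap the text to classify in a plain-text Document.
document = types.Document(
    content="Your text to classify here",
    type=enums.Document.Type.PLAIN_TEXT)

result = language_client.classify_text(document)

# Each returned category has a name (a path) and a confidence score.
for category in result.categories:
    print('category name: ', category.name)
    print('category confidence: ', category.confidence, '\n')
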
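
The step described earlier, combining each article's title and abstract into one string before classifying it, might look roughly like this. It is a sketch under stated assumptions: `build_text` is a hypothetical helper, and it assumes each article is a dict with `title` and `abstract` keys, which is a guess at the shape of the NYT data rather than something shown in this post.

```python
def build_text(article):
    """Join an article's title and abstract into one string to classify."""
    # Assumed keys: 'title' and 'abstract' (hypothetical article shape).
    title = article.get('title', '').strip()
    abstract = article.get('abstract', '').strip()
    # End the title with a period so it reads as its own sentence.
    if title and not title.endswith(('.', '!', '?')):
        title += '.'
    return ' '.join(part for part in (title, abstract) if part)


article = {
    'title': 'A Smoky Lobster Salad With a Tapa Twist',
    'abstract': 'This spin on the Spanish pulpo a la gallega skips the octopus.',
}
print(build_text(article))
```

The resulting string is what you would pass as the document content in the classification call.
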
The API can also return multiple categories. Here’s an example of an article from the food section of The New York Times that fits more than one category:

A Smoky Lobster Salad With a Tapa Twist. This spin on the Spanish pulpo a la gallega skips the octopus, but keeps the sea salt, olive oil, pimentón and boiled potatoes.

And here’s the NL API’s response:

{ categories:
   [ { name: '/Food & Drink/Cooking & Recipes',
       confidence: 0.85 },
     { name: '/Food & Drink/Food/Meat & Seafood',
       confidence: 0.63 } ] }
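
When more than one category comes back, you may want to keep only the ones the API is reasonably sure about. Here is a small sketch of that idea using plain dicts that mirror the response above. The `tags_above` helper and the 0.7 cutoff are illustrative assumptions, not values recommended by the API.

```python
def tags_above(categories, threshold=0.7):
    """Return the names of categories whose confidence meets the threshold."""
    return [c['name'] for c in categories if c['confidence'] >= threshold]


# Plain dicts standing in for the categories in the response above.
categories = [
    {'name': '/Food & Drink/Cooking & Recipes', 'confidence': 0.85},
    {'name': '/Food & Drink/Food/Meat & Seafood', 'confidence': 0.63},
]

print(tags_above(categories))                 # only the stronger match
print(tags_above(categories, threshold=0.5))  # both categories
```

A lower threshold yields more tags per article at the cost of some weaker matches, so the right cutoff depends on how the tags will be used.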

Get started

Start classifying your own text by diving into the docs here. We’d love to hear what you build with the NL API. Let us know what you think in the comments or find me on Twitter @SRobTweets.
