Classifying text content with the Natural Language API
Sara Robinson
Developer Advocate, Google Cloud Platform
If you work in the media industry, chances are you’ve spent more hours than you’d like manually tagging text content like blog posts, news articles or marketing copy. With the Natural Language API, you can now tag all of this content with a single API call.
With the new classify_text endpoint, the Natural Language API returns a content category for your text. The content categories include a set of high-level Tier 1 categories (like “Arts & Entertainment”) along with a set of Tier 2 categories that provide more granularity (like “Visual Art & Design”), with around 700 categories in total.
To try it out, I wrote a Python script that uses data provided by the New York Times API to get the top stories for each section. Then, I combined the title and abstract for each article and sent the combined text to the classify_text endpoint for categorization. For example, the following title and abstract from this article:
Rafael Montero Shines in Mets’ Victory Over the Reds. Montero, who was demoted at midseason, took a one-hitter into the ninth inning as the Mets continued to dominate Cincinnati with a win at Great American Ball Park.
Results in this JSON response from the NL API:
Each response includes a Tier 1 and Tier 2 category, and we can look at the original article to confirm that these categories are correct.
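To make the tiers concrete, here is a minimal sketch of reading them out of a response. It assumes the response is a JSON object with a `categories` list whose entries carry a slash-separated `name` path and a `confidence` score; the helper names `split_tiers` and `summarize` are hypothetical, and the sample values are illustrative rather than real API output.

```python
def split_tiers(category_name):
    """Split a category path such as '/A/B' into its tiers: ['A', 'B']."""
    return [part for part in category_name.split("/") if part]


def summarize(response):
    """Return (tiers, confidence) pairs, highest confidence first."""
    categories = response.get("categories", [])
    ranked = sorted(categories, key=lambda c: c.get("confidence", 0.0), reverse=True)
    return [(split_tiers(c["name"]), c.get("confidence")) for c in ranked]


# Illustrative response only; not the actual output for the article above.
sample_response = {
    "categories": [
        {"name": "/Arts & Entertainment/Visual Art & Design", "confidence": 0.9},
    ]
}

for tiers, confidence in summarize(sample_response):
    # The first element is the Tier 1 category, the second is Tier 2.
    print(" > ".join(tiers), confidence)
```

Because `summarize` walks the whole `categories` list, the same code also handles responses that contain more than one category.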
Once I get the article title and abstract text from the NYT API, calling the Natural Language API is just a few lines of code. Here’s an example using Python:
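As a rough sketch of what such a call could look like, the snippet below uses only the Python standard library. It assumes the v1 REST method `documents:classifyText` authenticated with an API key; the helpers `combine`, `build_request` and `classify_text` are hypothetical names chosen for illustration, and joining title and abstract with a period is likewise an assumption.

```python
import json
import urllib.request

# Assumed REST endpoint for the classifyText method (v1).
CLASSIFY_URL = "https://language.googleapis.com/v1/documents:classifyText"


def combine(title, abstract):
    """Join an article's title and abstract into one string to classify."""
    return f"{title.strip()}. {abstract.strip()}"


def build_request(text, api_key):
    """Build (but do not send) the POST request for classifyText."""
    body = {"document": {"type": "PLAIN_TEXT", "content": text}}
    return urllib.request.Request(
        f"{CLASSIFY_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_text(text, api_key):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        return json.load(resp)
```

With a valid key, `classify_text(combine(title, abstract), "YOUR_API_KEY")` would return the parsed response for one article. Keeping `build_request` separate makes it possible to inspect the payload without touching the network.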
The API can also return multiple categories. Here’s an example of an article from the food section of The New York Times that fits more than one category:
A Smoky Lobster Salad With a Tapa Twist. This spin on the Spanish pulpo a la gallega skips the octopus, but keeps the sea salt, olive oil, pimentón and boiled potatoes.
And here’s the NL API’s response: