Sentiment Analysis Tutorial

Audience

This tutorial is designed to let you quickly start exploring and developing applications with the Google Cloud Natural Language API. It is written for people familiar with basic programming, though you should be able to follow along even without much programming knowledge. After working through this tutorial, you should be able to use the Reference documentation to create your own basic applications.

This tutorial steps through a Natural Language API application using Python code. The purpose here is not to explain the Python client libraries, but to explain how to make calls to the Natural Language API. Applications in Java and Node.js are essentially similar. Consult the Natural Language API Samples for samples in other languages (including this sample within the tutorial).

Prerequisites

This tutorial has several prerequisites:

Analyzing document sentiment

This tutorial walks you through a basic Natural Language API application that uses an analyzeSentiment request to perform sentiment analysis on text. Sentiment analysis attempts to determine the overall attitude (positive or negative) expressed in the text, and represents it with numerical score and magnitude values. (For more information on these concepts, consult Natural Language Basics.)

We'll show the entire code first. (Note that we have removed most comments from this code in order to show you how brief it is. We'll provide more comments as we walk through the code.)

"""Demonstrates how to make a simple call to the Natural Language API."""

import argparse

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types


def print_result(annotations):
    score = annotations.document_sentiment.score
    magnitude = annotations.document_sentiment.magnitude

    for index, sentence in enumerate(annotations.sentences):
        sentence_sentiment = sentence.sentiment.score
        print('Sentence {} has a sentiment score of {}'.format(
            index, sentence_sentiment))

    print('Overall Sentiment: score of {} with magnitude of {}'.format(
        score, magnitude))
    return 0


def analyze(movie_review_filename):
    """Run a sentiment analysis request on text within a passed filename."""
    client = language.LanguageServiceClient()

    # Read the review text from the file.
    with open(movie_review_filename, 'r') as review_file:
        content = review_file.read()

    # Instantiates a plain text document.
    document = types.Document(
        content=content,
        type=enums.Document.Type.PLAIN_TEXT)
    annotations = client.analyze_sentiment(document=document)

    # Print the results
    print_result(annotations)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument(
        'movie_review_filename',
        help='The filename of the movie review you\'d like to analyze.')
    args = parser.parse_args()

    analyze(args.movie_review_filename)

This simple application performs the following tasks:

  • Imports the libraries necessary to run the application
  • Takes a text filename and passes it to the analyze() function
  • Reads the text file and makes a request to the service
  • Parses the response from the service and displays it to the user

We'll go over these steps in more detail below.

Importing libraries

import argparse

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

We import argparse, a standard library, to allow the application to accept input filenames as arguments.

To use the Cloud Natural Language API, we also import the language module from the google-cloud-language library. The types module contains classes that are required for creating requests, and the enums module contains enumerations such as the document type (Document.Type.PLAIN_TEXT).

Running your application

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument(
        'movie_review_filename',
        help='The filename of the movie review you\'d like to analyze.')
    args = parser.parse_args()

    analyze(args.movie_review_filename)

Here, we simply parse the passed argument for the text filename and pass it to the analyze() function.
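
To see what the parser produces without running the script from a shell, you can hand parse_args() an explicit argument list. The following is a minimal sketch: build_parser() is a hypothetical helper that is not part of the sample, and the path is only an example value.

```python
import argparse


def build_parser():
    # Hypothetical helper: same parser configuration as the sample,
    # with the description written out instead of using __doc__.
    parser = argparse.ArgumentParser(
        description='Demonstrates how to make a simple call to the '
                    'Natural Language API.',
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument(
        'movie_review_filename',
        help="The filename of the movie review you'd like to analyze.")
    return parser


# Passing a list to parse_args() stands in for the command-line arguments.
args = build_parser().parse_args(['reviews/bladerunner-pos.txt'])
print(args.movie_review_filename)
```

The positional argument becomes an attribute of the returned namespace with the same name, which is why the sample can pass args.movie_review_filename straight to analyze().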

Authenticating to the API

Before communicating with the Natural Language API service, you need to authenticate your service using previously acquired credentials. Within an application, the simplest way to obtain credentials is to use Application Default Credentials (ADC). By default, ADC attempts to obtain credentials from the file named by the GOOGLE_APPLICATION_CREDENTIALS environment variable, which should be set to point to your service account's JSON key file. (You should have set up your service account and environment to use ADC in the Quickstart. See Setting Up a Service Account for more information.)

The Google Cloud Client Library for Python automatically uses the application default credentials.
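
If a request fails with a credentials error, a quick local check of the environment variable can help narrow down the cause. The sketch below uses only the standard library; credentials_problem() is a hypothetical helper, not part of the client library. ADC can also find credentials from other sources, so an unset variable is reported as a warning rather than treated as fatal.

```python
import os


def credentials_problem(environ=None):
    """Return a description of the problem, or None if the key file exists.

    Hypothetical helper for illustration; it only inspects the
    GOOGLE_APPLICATION_CREDENTIALS variable and does not contact the API.
    """
    environ = os.environ if environ is None else environ
    path = environ.get('GOOGLE_APPLICATION_CREDENTIALS')
    if not path:
        return 'GOOGLE_APPLICATION_CREDENTIALS is not set'
    if not os.path.isfile(path):
        return ('GOOGLE_APPLICATION_CREDENTIALS points to a missing file: '
                '{}'.format(path))
    return None


problem = credentials_problem()
if problem:
    print('Warning: ' + problem)
```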

Making the request

Now that our Natural Language API service is ready, we can access the service by calling the analyze_sentiment method of the LanguageServiceClient instance.

The client library encapsulates the details for requests and responses to the API. See the Natural Language API Reference for complete information on the specific structure of such a request.

def analyze(movie_review_filename):
    """Run a sentiment analysis request on text within a passed filename."""
    client = language.LanguageServiceClient()

    # Read the review text from the file.
    with open(movie_review_filename, 'r') as review_file:
        content = review_file.read()

    # Instantiates a plain text document.
    document = types.Document(
        content=content,
        type=enums.Document.Type.PLAIN_TEXT)
    annotations = client.analyze_sentiment(document=document)

    # Print the results
    print_result(annotations)

This code snippet performs the following tasks:

  1. Instantiates a LanguageServiceClient instance as the client.
  2. Reads the contents of the file containing the text data into a variable.
  3. Instantiates a Document object with the contents of the file.
  4. Calls the client's analyze_sentiment method.

Parsing the response

def print_result(annotations):
    score = annotations.document_sentiment.score
    magnitude = annotations.document_sentiment.magnitude

    for index, sentence in enumerate(annotations.sentences):
        sentence_sentiment = sentence.sentiment.score
        print('Sentence {} has a sentiment score of {}'.format(
            index, sentence_sentiment))

    print('Overall Sentiment: score of {} with magnitude of {}'.format(
        score, magnitude))
    return 0

We walk through the response to extract the sentiment score values for each sentence, and the overall score and magnitude values for the entire review, and display those to the user.
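
To experiment with the response handling without calling the service, you can build a stand-in object that has the same attribute names the sample reads. This is only a sketch: format_result() is a hypothetical variant of print_result() that returns the lines instead of printing them, and the numbers in fake_annotations are made up.

```python
# Note: this is the standard library's types module, not
# google.cloud.language.types.
from types import SimpleNamespace


def format_result(annotations):
    """Return the lines that print_result() would print, as a list."""
    lines = []
    for index, sentence in enumerate(annotations.sentences):
        lines.append('Sentence {} has a sentiment score of {}'.format(
            index, sentence.sentiment.score))
    lines.append('Overall Sentiment: score of {} with magnitude of {}'.format(
        annotations.document_sentiment.score,
        annotations.document_sentiment.magnitude))
    return lines


# Stand-in for a real response; the values are invented for illustration.
fake_annotations = SimpleNamespace(
    document_sentiment=SimpleNamespace(score=0.5, magnitude=1.7),
    sentences=[
        SimpleNamespace(sentiment=SimpleNamespace(score=0.8)),
        SimpleNamespace(sentiment=SimpleNamespace(score=-0.1)),
    ])

for line in format_result(fake_annotations):
    print(line)
```

Returning the lines rather than printing them makes the formatting easy to check in a test, while the attribute access mirrors what print_result() does with a real response.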

Run the sample

To run our sample, we'll test it on a set of (fake) movie reviews for the movie "Blade Runner."

  1. Download the samples from Google Cloud Storage:

    gsutil cp gs://cloud-samples-tests/natural-language/sentiment-samples.tgz .
    

  2. Unzip those samples, which will create a "reviews" folder:

    gunzip sentiment-samples.tgz
    tar -xvf sentiment-samples.tar
    

  3. Run our sentiment analysis on one of the specified files:

    python sentiment-analysis.py reviews/bladerunner-pos.txt
    Sentence 0 has a sentiment score of 0.8
    Sentence 1 has a sentiment score of 0.9
    Sentence 2 has a sentiment score of 0.8
    Sentence 3 has a sentiment score of 0.2
    Sentence 4 has a sentiment score of 0.1
    Sentence 5 has a sentiment score of 0.4
    Sentence 6 has a sentiment score of 0.3
    Sentence 7 has a sentiment score of 0.4
    Sentence 8 has a sentiment score of 0.2
    Sentence 9 has a sentiment score of 0.9
    Overall Sentiment: score of 0.5 with magnitude of 5.5
    

This output indicates a review that was relatively positive (score of 0.5) and relatively emotional (magnitude of 5.5).

Running analysis on the other examples should produce values similar to those shown below:

python sentiment-analysis.py reviews/bladerunner-neg.txt
...
Overall Sentiment: score of -0.6 with magnitude of 3.3

python sentiment-analysis.py reviews/bladerunner-mixed.txt
...
Overall Sentiment: score of 0 with magnitude of 4.7

python sentiment-analysis.py reviews/bladerunner-neutral.txt
...
Overall Sentiment: score of -0.1 with magnitude of 1.8

Note that the magnitudes are all similar (indicating a relatively equal amount of emotionally significant sentiment) except for the "neutral" case, which indicates a review with little emotional sentiment, either positive or negative. (For more information on sentiment scores and magnitude, and how to interpret these values, see Interpreting Sentiment Analysis Values.)
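
One way to turn a score and magnitude pair into a rough label is sketched below. The thresholds (0.25 for score, 2.0 for magnitude) are assumptions chosen only so that the four sample reviews above fall into distinct categories; they are not values defined by the API, and you would need to tune them for your own data. describe_sentiment() is a hypothetical helper.

```python
def describe_sentiment(score, magnitude,
                       score_threshold=0.25, magnitude_threshold=2.0):
    """Map a (score, magnitude) pair to a rough label.

    The default thresholds are illustrative assumptions, not API constants.
    """
    if score >= score_threshold:
        return 'positive'
    if score <= -score_threshold:
        return 'negative'
    # Score is near zero: a high magnitude suggests positive and negative
    # sentiment cancelling out, a low magnitude suggests little sentiment.
    if magnitude >= magnitude_threshold:
        return 'mixed'
    return 'neutral'


# Overall values taken from the sample runs shown in this tutorial.
samples = {
    'bladerunner-pos.txt': (0.5, 5.5),
    'bladerunner-neg.txt': (-0.6, 3.3),
    'bladerunner-mixed.txt': (0, 4.7),
    'bladerunner-neutral.txt': (-0.1, 1.8),
}
for name, (score, magnitude) in samples.items():
    print('{}: {}'.format(name, describe_sentiment(score, magnitude)))
```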

If you wish to explore sentiment analysis with more data, Stanford provides a dataset of IMDB movie reviews. To retrieve these movie reviews:

  1. Download the Large Movie Review dataset.
  2. Unzip the file into your working directory. The movie reviews are divided into pos and neg directories within train and test data directories, with each text file containing one movie review.
  3. Run the sentiment-analysis.py tool on any of the movie review text files.
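
To run the analysis over several review files at once, you could loop over a directory, as in the sketch below. The helpers iter_review_files() and analyze_directory() are hypothetical and not part of the sample; analyze_file is any function that takes a path, such as the analyze() function from this tutorial. The limit parameter is there because each file results in one API request.

```python
import os


def iter_review_files(directory):
    """Yield paths of the .txt files directly inside directory, sorted."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.endswith('.txt') and os.path.isfile(path):
            yield path


def analyze_directory(directory, analyze_file, limit=5):
    """Call analyze_file(path) for at most `limit` files; return the paths."""
    processed = []
    for path in iter_review_files(directory):
        if len(processed) >= limit:
            break
        analyze_file(path)
        processed.append(path)
    return processed
```

For example, assuming the archive was extracted into a directory named aclImdb in your working directory (check the actual name after unpacking), analyze_directory('aclImdb/test/pos', analyze) would analyze the first five positive test reviews.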

Congratulations! You've performed your first inference tasks using the Google Cloud Natural Language API!
