
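Sentiment Analysis Tutorial

Audience

This tutorial is designed to let you quickly start exploring and developing applications with the Cloud Natural Language API. It is designed for people familiar with basic programming, though even without much programming knowledge, you should be able to follow along. Having walked through this tutorial, you should be able to use the reference documentation to create your own basic applications.

This tutorial steps through a Natural Language API application using Python code. The purpose here is not to explain the Python client libraries, but to explain how to make calls to the Natural Language API. Applications in Java and Node.js are essentially similar. See the Natural Language API samples for examples in other languages (including the sample in this tutorial).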

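Prerequisites

This tutorial has several prerequisites:

You have a Google Cloud account. If you are new to the platform, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
You've set up a Cloud Natural Language API project in the Google Cloud console.
You've set up your environment using Application Default Credentials.
You have basic familiarity with Python programming.
You've set up your Python development environment. We recommend that you have the latest versions of Python, pip, and virtualenv installed on your system. For instructions, see the Python Development Environment Setup Guide for Google Cloud Platform.
You've installed the Google Cloud Client Library for Python.

Analyzing document sentiment

This tutorial walks you through a basic Natural Language API application using an analyzeSentiment request, which performs sentiment analysis on text. Sentiment analysis attempts to determine the overall attitude expressed in the text (positive or negative) and represents it with numerical score and magnitude values. (For more information on these concepts, see Natural Language Basics.)

We'll first show the entire code. (Note that we have removed most comments from this code in order to show you how brief it is. We'll provide more comments as we walk through the code.)

For more information on installing and using the Google Cloud Natural Language Client Library for Python, see Natural Language API Client Libraries.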

"""Demonstrates how to make a simple call to the Natural Language API."""

import argparse

from google.cloud import language_v1


def print_result(annotations):
    score = annotations.document_sentiment.score
    magnitude = annotations.document_sentiment.magnitude

    for index, sentence in enumerate(annotations.sentences):
        sentence_sentiment = sentence.sentiment.score
        print(f"Sentence {index} has a sentiment score of {sentence_sentiment}")

    print(f"Overall Sentiment: score of {score} with magnitude of {magnitude}")
    return 0


def analyze(movie_review_filename):
    """Run a sentiment analysis request on text within a passed filename."""
    client = language_v1.LanguageServiceClient()

    with open(movie_review_filename) as review_file:
        # Instantiates a plain text document.
        content = review_file.read()

    document = language_v1.Document(
        content=content, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    annotations = client.analyze_sentiment(request={"document": document})

    # Print the results
    print_result(annotations)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument(
        "movie_review_filename",
        help="The filename of the movie review you'd like to analyze.",
    )
    args = parser.parse_args()

    analyze(args.movie_review_filename)

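This simple application performs the following tasks:

Imports the libraries necessary to run the application
Takes a text file and passes it to the analyze() function
Reads the text file and makes a request to the service
Parses the response from the service and displays it to the user

We'll go over these steps in more detail below.

Importing libraries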

import argparse

from google.cloud import language_v1

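We import argparse, a standard library, to allow the application to accept input filenames as arguments.

To use the Cloud Natural Language API, we also import the language_v1 module from the google-cloud-language library. The language_v1 module contains the classes required for creating requests, such as LanguageServiceClient and Document.

Running your application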

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument(
        "movie_review_filename",
        help="The filename of the movie review you'd like to analyze.",
    )
    args = parser.parse_args()

    analyze(args.movie_review_filename)

Here, we simply parse the passed argument for the text filename and pass it to the analyze() function.
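To see the argument parsing in isolation, the sketch below rebuilds the same one-argument parser and feeds it an explicit list instead of the real command line. The helper name build_parser and the description string are illustrative choices, not part of the sample.

```python
import argparse


def build_parser():
    """Build a parser with the same positional argument as the sample.

    build_parser is a hypothetical helper introduced for illustration.
    """
    parser = argparse.ArgumentParser(
        description="Analyze the sentiment of a movie review file."
    )
    parser.add_argument(
        "movie_review_filename",
        help="The filename of the movie review you'd like to analyze.",
    )
    return parser


# An explicit list stands in for sys.argv[1:].
args = build_parser().parse_args(["reviews/bladerunner-pos.txt"])
print(args.movie_review_filename)
```

Passing a list to parse_args makes the parser easy to exercise without launching the script from a shell.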

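Authenticating to the API

Before communicating with the Natural Language API service, you need to authenticate using previously acquired credentials. Within an application, the simplest way to obtain credentials is to use Application Default Credentials (ADC). By default, ADC attempts to obtain credentials from the file named by the GOOGLE_APPLICATION_CREDENTIALS environment variable, which should be set to point to your service account's JSON key file. (You should have set up your service account and environment to use ADC in the Quickstart. See Setting Up a Service Account for more information.)

The Google Cloud Client Library for Python automatically uses the Application Default Credentials.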
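If a request fails with a credentials error, a quick local check of the environment variable can help. The following is a minimal sketch using only the standard library; check_adc_env is a hypothetical helper, and it inspects only GOOGLE_APPLICATION_CREDENTIALS. ADC can also find credentials in other ways, so an unset variable is not necessarily a problem.

```python
import os


def check_adc_env(environ=None):
    """Return (ok, message) describing GOOGLE_APPLICATION_CREDENTIALS.

    check_adc_env is a hypothetical helper; it only looks at the
    environment variable and does not contact any Google service.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return False, f"key file not found: {path}"
    return True, f"using key file: {path}"


ok, message = check_adc_env()
print(message)
```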

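Making the request

Now that our Natural Language API service is ready, we can access the service by calling the analyze_sentiment method of the LanguageServiceClient instance.

The client library encapsulates the details of requests to and responses from the API. See the Natural Language API Reference for complete information on the specific structure of such a request.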

def analyze(movie_review_filename):
    """Run a sentiment analysis request on text within a passed filename."""
    client = language_v1.LanguageServiceClient()

    with open(movie_review_filename) as review_file:
        # Instantiates a plain text document.
        content = review_file.read()

    document = language_v1.Document(
        content=content, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    annotations = client.analyze_sentiment(request={"document": document})

    # Print the results
    print_result(annotations)

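This code snippet performs the following tasks:

Instantiates a LanguageServiceClient instance as the client.
Reads the contents of the file containing the text data into a variable.
Instantiates a Document object with the file's contents.
Calls the client's analyze_sentiment method.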
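For orientation, the request that the client library assembles corresponds roughly to a JSON body like the one sketched here. The field names follow the REST form of analyzeSentiment as we understand it; treat them as an assumption and check the Natural Language API Reference before relying on them. build_request_body is a hypothetical helper and sends nothing over the network.

```python
import json


def build_request_body(text):
    """Build a dict shaped like an analyzeSentiment REST request body.

    Hypothetical helper for illustration; field names are assumed from
    the REST reference and should be verified there.
    """
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


body = build_request_body("The film was a delight from start to finish.")
print(json.dumps(body, indent=2))
```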

Parsing the response

def print_result(annotations):
    score = annotations.document_sentiment.score
    magnitude = annotations.document_sentiment.magnitude

    for index, sentence in enumerate(annotations.sentences):
        sentence_sentiment = sentence.sentiment.score
        print(f"Sentence {index} has a sentiment score of {sentence_sentiment}")

    print(f"Overall Sentiment: score of {score} with magnitude of {magnitude}")
    return 0

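We walk through the response to extract the sentiment score value for each sentence, along with the overall score and magnitude values for the entire review, and display those to the user.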
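If you would rather collect the values than print them, the same attribute walk can return a plain dictionary. The sketch below assumes only the attributes used in print_result (document_sentiment.score, document_sentiment.magnitude, and sentences[i].sentiment.score). The SimpleNamespace objects are stand-ins for a real response so the example runs offline; summarize and fake_sentence are hypothetical names.

```python
from types import SimpleNamespace


def summarize(annotations):
    """Collect sentence scores and overall values into a plain dict."""
    scores = [s.sentiment.score for s in annotations.sentences]
    indexes = range(len(scores))
    return {
        "sentence_scores": scores,
        # Index of the highest- and lowest-scoring sentence, if any.
        "most_positive": max(indexes, key=scores.__getitem__) if scores else None,
        "most_negative": min(indexes, key=scores.__getitem__) if scores else None,
        "score": annotations.document_sentiment.score,
        "magnitude": annotations.document_sentiment.magnitude,
    }


def fake_sentence(score):
    """Stand-in for one sentence of a response (not a real API type)."""
    return SimpleNamespace(sentiment=SimpleNamespace(score=score))


# Stand-in response with made-up sentence scores, for offline use only.
fake = SimpleNamespace(
    document_sentiment=SimpleNamespace(score=0.5, magnitude=5.5),
    sentences=[fake_sentence(s) for s in (0.8, 0.9, 0.2)],
)
summary = summarize(fake)
print(summary)
```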

Running the sample

To run our sample, we'll test it on a set of (fake) movie reviews for the movie "Bladerunner."

Download the samples from Google Cloud Storage:
```
gcloud storage cp gs://cloud-samples-tests/natural-language/sentiment-samples.tgz .
```
To install the latest version of the Google Cloud CLI, see the gcloud CLI documentation.
Unzip the samples, which creates a "reviews" folder:
```
gunzip sentiment-samples.tgz
tar -xvf sentiment-samples.tar
```

Run our sentiment analysis on one of the specified files:

python sentiment_analysis.py reviews/bladerunner-pos.txt
Sentence 0 has a sentiment score of 0.8
Sentence 1 has a sentiment score of 0.9
Sentence 2 has a sentiment score of 0.8
Sentence 3 has a sentiment score of 0.2
Sentence 4 has a sentiment score of 0.1
Sentence 5 has a sentiment score of 0.4
Sentence 6 has a sentiment score of 0.3
Sentence 7 has a sentiment score of 0.4
Sentence 8 has a sentiment score of 0.2
Sentence 9 has a sentiment score of 0.9
Overall Sentiment: score of 0.5 with magnitude of 5.5

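The above example indicates a review that was relatively positive (score of 0.5) and relatively emotional (magnitude of 5.5).

Running the analysis on the other examples should produce values similar to those shown below: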

python sentiment_analysis.py reviews/bladerunner-neg.txt
...
Overall Sentiment: score of -0.6 with magnitude of 3.3

python sentiment_analysis.py reviews/bladerunner-mixed.txt
...
Overall Sentiment: score of 0 with magnitude of 4.7

python sentiment_analysis.py reviews/bladerunner-neutral.txt
...
Overall Sentiment: score of -0.1 with magnitude of 1.8

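Note that the magnitudes are all similar (indicating a relatively equal amount of emotionally significant sentiment) except for the "neutral" case, which indicates a review without much emotional sentiment, either positive or negative. (For more information on sentiment scores and magnitude, and how to interpret these values, see Interpreting Sentiment Analysis Values.)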
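One way to turn a score and magnitude pair into a rough label is sketched below. The cutoffs (0.25 for score, 3.0 for magnitude) are assumptions chosen only so that the four sample reviews above fall into their intended categories; they are not values recommended by the API, and describe_sentiment is a hypothetical helper.

```python
def describe_sentiment(score, magnitude, score_cutoff=0.25, magnitude_cutoff=3.0):
    """Map (score, magnitude) to a rough label.

    The cutoffs are illustrative assumptions, not API-defined thresholds.
    """
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # Near-zero score: high magnitude suggests mixed feelings,
    # low magnitude suggests little emotional content.
    return "mixed" if magnitude >= magnitude_cutoff else "neutral"


# Overall values taken from the sample runs shown above.
samples = {
    "bladerunner-pos.txt": (0.5, 5.5),
    "bladerunner-neg.txt": (-0.6, 3.3),
    "bladerunner-mixed.txt": (0, 4.7),
    "bladerunner-neutral.txt": (-0.1, 1.8),
}
for name, (score, magnitude) in samples.items():
    print(name, describe_sentiment(score, magnitude))
```

Suitable cutoffs depend on your data, so adjust them after looking at results for your own texts.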

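If you wish to explore sentiment analysis with more data, Stanford provides a dataset of IMDB movie reviews. To retrieve these movie reviews:

Download the Large Movie Review dataset.
Unzip the file into your working directory. The movie reviews are divided into pos and neg directories within the train and test data directories, with each text file containing one movie review.
Run the sentiment_analysis.py tool on any of the movie review text files.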
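To pick files from the unpacked dataset, a short helper can walk the directory layout described in the steps above. This is a sketch using only the standard library; iter_review_files is a hypothetical helper, and the root folder name "aclImdb" is an assumption about where the archive was unpacked.

```python
from pathlib import Path


def iter_review_files(dataset_root):
    """Yield (label, path) for each .txt review under train/ and test/.

    Hypothetical helper; assumes the train|test / pos|neg layout
    described in the steps above.
    """
    root = Path(dataset_root)
    for split in ("train", "test"):
        for label in ("pos", "neg"):
            folder = root / split / label
            if not folder.is_dir():
                continue  # Skip folders missing from a partial download.
            for path in sorted(folder.glob("*.txt")):
                yield label, path


# "aclImdb" is an assumed folder name; nothing prints if it is absent.
for label, path in iter_review_files("aclImdb"):
    print(label, path)
```

Each yielded path can then be passed to sentiment_analysis.py, or to the analyze() function directly.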

Congratulations! You've performed your first inference tasks using the Google Cloud Natural Language API!