Migrating to Python Client Library v0.26.1

The Client Library for Python v0.26.1 makes some significant changes to the design of previous client libraries. These changes can be summarized as follows:

  • Consolidation of modules into fewer types

  • Replacing untyped parameters with strongly-typed classes and enumerations

This topic details the changes you need to make to your Cloud Natural Language API Python code in order to use the v0.26.1 Python client library.

Running previous versions of the client library

You are not required to upgrade your Python client library to v0.26.1. If you want to continue using a previous version of the Python client library without migrating your code, pin the version of the Python client library that your app uses. To pin the library to a specific version, edit the requirements.txt file as shown below:

google-cloud-language==0.25
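
After pinning the version, reinstall your app's dependencies so that the older library is used, for example:

pip install -r requirements.txt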

Removed Modules

The following modules were removed in the Python Client Library v0.26.1. A sketch of replacement imports follows the list.

  • google.cloud.language.api_responses

  • google.cloud.language.client

  • google.cloud.language.document

  • google.cloud.language.entity

  • google.cloud.language.sentence

  • google.cloud.language.sentiment

  • google.cloud.language.syntax
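
If your code imports any of these modules directly, those imports will now fail. As a rough sketch (the mapping below is an assumption based on the consolidated layout, not an exhaustive list), classes that previously lived in their own modules are now reached through the top-level package and its types module:

# Previous versions of the client libraries:
# from google.cloud.language.client import Client
# from google.cloud.language.document import Document
#
# Python Client Library v0.26.1 (assumed equivalents):
from google.cloud import language
from google.cloud.language import types

client = language.LanguageServiceClient()    # replaces language.client.Client
document = types.Document(content="Hello!")  # replaces language.document.Document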

Required Code Changes

Imports

Import the new google.cloud.language.types module to access the new types in the Python Client Library v0.26.1.

The types module contains the new classes that are required for creating requests, such as types.Document. The enums module contains the enumerations for specifying the document type. You can continue to use strings such as 'PLAIN_TEXT' and 'HTML' to specify your document type; however, we recommend using the enumerations in the enums module.

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
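
For example, assuming text_content holds the string to analyze, the following two document dictionaries are equivalent; the enumeration form is simply less error-prone:

document = {"content": text_content, "type": "PLAIN_TEXT"}
document = {"content": text_content, "type": enums.Document.Type.PLAIN_TEXT}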

Additionally, the new google.cloud.language.enums module contains the enumerations useful for parsing and understanding API responses, such as enums.Entity.Type.PERSON and enums.PartOfSpeech.Tag.ADJ.
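
As a minimal sketch, you can compare a returned value directly against these enumerations (entity here is assumed to be one element of an analyze_entities response, as shown later in this topic):

if entity.type == enums.Entity.Type.PERSON:
    print(u"{} refers to a person".format(entity.name))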

Create a client

The Client class has been replaced with the LanguageServiceClient class. Replace references to the Client class with LanguageServiceClient.

Previous versions of the client libraries:

old_client = language.Client()

Python Client Library v0.26.1:

client = language.LanguageServiceClient()

Constructing objects that represent text content

To identify text content from a text string or from a Google Cloud Storage URI, use the new Document class.

Constructing objects that represent text content from a text string

The following example shows the new way to represent text content from a text string.

Previous versions of the client libraries:

document = old_client.document_from_text(content=text_content)

Python Client Library v0.26.1:

# Available types: PLAIN_TEXT, HTML
type_ = enums.Document.Type.PLAIN_TEXT

# Optional. If not specified, the language is automatically detected.
# For list of supported languages:
# https://cloud.google.com/natural-language/docs/languages
language = "en"
document = {"content": text_content, "type": type_, "language": language}

You can also analyze HTML by setting the document type to enums.Document.Type.HTML (that is, type_ = enums.Document.Type.HTML in the snippet above).
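
The dictionary shown above is converted into a types.Document message for you. If you prefer, you can construct the message explicitly; a sketch under the same assumptions about text_content:

document = types.Document(
    content=text_content,
    type=enums.Document.Type.PLAIN_TEXT,
    language="en",
)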

Constructing objects that represent text content from Google Cloud Storage URI

The following example shows the new way to represent text content stored in a file on Google Cloud Storage. In these snippets, gcs_uri and gcs_content_uri hold the Google Cloud Storage URI of a text file, for example gs://your-bucket-name/your-file.txt.

Previous versions of the client libraries:

document = old_client.document_from_gcs_uri(gcs_uri=gcs_uri)

Python Client Library v0.26.1:

# Available types: PLAIN_TEXT, HTML
type_ = enums.Document.Type.PLAIN_TEXT

# Optional. If not specified, the language is automatically detected.
# For list of supported languages:
# https://cloud.google.com/natural-language/docs/languages
language = "en"
document = {"gcs_content_uri": gcs_content_uri, "type": type_, "language": language}

You can also analyze HTML by setting the document type to enums.Document.Type.HTML (that is, type_ = enums.Document.Type.HTML in the snippet above).
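
As with plain text, the dictionary can be replaced by an explicit types.Document message:

document = types.Document(
    gcs_content_uri=gcs_content_uri,
    type=enums.Document.Type.PLAIN_TEXT,
    language="en",
)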

Making requests and processing responses

With the Python Client Library v0.26.1, API methods such as analyze_sentiment belong to the LanguageServiceClient object rather than to the Document objects.

The returned values are slightly different for analyze_entities and analyze_syntax, as explained below.

Making an analyze sentiment request and processing the response

Previous versions of the client libraries:

document = old_client.document_from_text(text_content)

sentiment = document.analyze_sentiment().sentiment

print('Score: {}'.format(sentiment.score))
print('Magnitude: {}'.format(sentiment.magnitude))

Python Client Library v0.26.1:

# Available types: PLAIN_TEXT, HTML
type_ = enums.Document.Type.PLAIN_TEXT

# Optional. If not specified, the language is automatically detected.
# For list of supported languages:
# https://cloud.google.com/natural-language/docs/languages
language = "en"
document = {"content": text_content, "type": type_, "language": language}

# Available values: NONE, UTF8, UTF16, UTF32
encoding_type = enums.EncodingType.UTF8

response = client.analyze_sentiment(document, encoding_type=encoding_type)
# Get overall sentiment of the input document
print(u"Document sentiment score: {}".format(response.document_sentiment.score))
print(
    u"Document sentiment magnitude: {}".format(
        response.document_sentiment.magnitude
    )
)
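
The response also contains a sentiment for each sentence of the input. A minimal sketch of reading it, assuming the same response object as above:

for sentence in response.sentences:
    print(u"Sentence text: {}".format(sentence.text.content))
    print(u"Sentence sentiment score: {}".format(sentence.sentiment.score))
    print(u"Sentence sentiment magnitude: {}".format(sentence.sentiment.magnitude))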

Making an analyze entities request and processing the response

An entity's type is now stored as entity.type, as opposed to entity.entity_type.

Previous versions of the client libraries:

document = old_client.document_from_text(text_content)

entities = document.analyze_entities().entities

for entity in entities:
    print('=' * 20)
    print(u'{:<16}: {}'.format('name', entity.name))
    print(u'{:<16}: {}'.format('type', entity.entity_type))
    print(u'{:<16}: {}'.format('metadata', entity.metadata))
    print(u'{:<16}: {}'.format('salience', entity.salience))
    print(u'{:<16}: {}'.format('wikipedia_url',
          entity.metadata.get('wikipedia_url', '-')))

Python Client Library v0.26.1:

# Available types: PLAIN_TEXT, HTML
type_ = enums.Document.Type.PLAIN_TEXT

# Optional. If not specified, the language is automatically detected.
# For list of supported languages:
# https://cloud.google.com/natural-language/docs/languages
language = "en"
document = {"content": text_content, "type": type_, "language": language}

# Available values: NONE, UTF8, UTF16, UTF32
encoding_type = enums.EncodingType.UTF8

response = client.analyze_entities(document, encoding_type=encoding_type)
# Loop through entities returned from the API
for entity in response.entities:
    print(u"Representative name for the entity: {}".format(entity.name))
    # Get entity type, e.g. PERSON, LOCATION, ADDRESS, NUMBER, et al
    print(u"Entity type: {}".format(enums.Entity.Type(entity.type).name))
    # Get the salience score associated with the entity in the [0, 1.0] range
    print(u"Salience score: {}".format(entity.salience))
    # Loop over the metadata associated with entity. For many known entities,
    # the metadata is a Wikipedia URL (wikipedia_url) and Knowledge Graph MID (mid).
    # Some entity types may have additional metadata, e.g. ADDRESS entities
    # may have metadata for the address street_name, postal_code, et al.
    for metadata_name, metadata_value in entity.metadata.items():
        print(u"{}: {}".format(metadata_name, metadata_value))

    # Loop over the mentions of this entity in the input document.
    # The API currently supports proper noun mentions.
    for mention in entity.mentions:
        print(u"Mention text: {}".format(mention.text.content))
        # Get the mention type, e.g. PROPER for proper noun
        print(
            u"Mention type: {}".format(enums.EntityMention.Type(mention.type).name)
        )

Making an analyze syntax request and processing the response

A token's part-of-speech tag, token.part_of_speech.tag, is now returned as an enumeration value; its name can be recovered with the enums.PartOfSpeech.Tag enumeration from the google.cloud.language.enums module.

A token's text content is now stored as token.text.content, as opposed to token.text_content.

Previous versions of the client libraries:

document = old_client.document_from_text(text_content)

tokens = document.analyze_syntax().tokens

for token in tokens:
    print(u'{}: {}'.format(token.part_of_speech.tag, token.text_content))

Python Client Library v0.26.1:

# Available types: PLAIN_TEXT, HTML
type_ = enums.Document.Type.PLAIN_TEXT

# Optional. If not specified, the language is automatically detected.
# For list of supported languages:
# https://cloud.google.com/natural-language/docs/languages
language = "en"
document = {"content": text_content, "type": type_, "language": language}

# Available values: NONE, UTF8, UTF16, UTF32
encoding_type = enums.EncodingType.UTF8

response = client.analyze_syntax(document, encoding_type=encoding_type)
# Loop through tokens returned from the API
for token in response.tokens:
    # Get the text content of this token. Usually a word or punctuation.
    text = token.text
    print(u"Token text: {}".format(text.content))
    print(
        u"Location of this token in overall document: {}".format(text.begin_offset)
    )
    # Get the part of speech information for this token.
    # Parts of speech are as defined in:
    # http://www.lrec-conf.org/proceedings/lrec2012/pdf/274_Paper.pdf
    part_of_speech = token.part_of_speech
    # Get the tag, e.g. NOUN, ADJ for Adjective, et al.
    print(
        u"Part of Speech tag: {}".format(
            enums.PartOfSpeech.Tag(part_of_speech.tag).name
        )
    )
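
Each token in the syntax response carries more than the part-of-speech tag. As a sketch (the lemma and dependency_edge fields below are assumed from the token message and are not themselves part of the migration changes):

for token in response.tokens:
    # The lemma (dictionary form) of the token text
    print(u"Lemma: {}".format(token.lemma))
    # The dependency parse edge: index of the head token and the edge label
    print(u"Dependency head token index: {}".format(token.dependency_edge.head_token_index))
    print(
        u"Dependency label: {}".format(
            enums.DependencyEdge.Label(token.dependency_edge.label).name
        )
    )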