Analyzing Entities

Entity Analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.), and returns information about those entities. Entity analysis is performed with the analyzeEntities method. For information on which languages are supported by the Natural Language API, see Language Support.

This section demonstrates a few ways to detect entities in a document.

Analyzing Entities in a String

Here is an example of performing entity analysis on a text string sent directly to the Natural Language API:

Protocol

To analyze entities in a document, make a POST request to the documents:analyzeEntities REST method and provide the appropriate request body as shown in the following example.

The example uses the gcloud auth application-default print-access-token command to obtain an access token for a service account set up for the project using the Google Cloud Platform Cloud SDK. For instructions on installing the Cloud SDK and setting up a project with a service account, see the Quickstart.

curl -X POST \
     -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
  'encodingType': 'UTF8',
  'document': {
    'type': 'PLAIN_TEXT',
    'content': 'President Obama is speaking at the White House.'
  }
}" "https://language.googleapis.com/v1/documents:analyzeEntities"

If you don't specify document.language, then the language will be automatically detected. For information on which languages are supported by the Natural Language API, see Language Support. See the Document reference documentation for more information on configuring the request body.
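As a rough sketch of the request body described above, the following Python builds the same JSON payload and optionally sets document.language. The helper name build_request_body is hypothetical (it is not part of any client library), and the snippet only constructs the body; it does not call the API.

```python
import json


def build_request_body(text, language=None):
    """Build a documents:analyzeEntities request body (illustrative helper)."""
    document = {"type": "PLAIN_TEXT", "content": text}
    # Leaving "language" out hands language detection to the API.
    if language is not None:
        document["language"] = language
    return {"encodingType": "UTF8", "document": document}


body = build_request_body(
    "President Obama is speaking at the White House.", language="en")
print(json.dumps(body, indent=2))
```

The printed JSON could be passed to curl with --data in place of the inline body shown earlier.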

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "entities": [
    {
      "name": "Obama",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/02mjmr",
        "wikipedia_url": "http://en.wikipedia.org/wiki/Barack_Obama"
      },
      "salience": 0.9143443,
      "mentions": [
        {
          "text": {
            "content": "Obama",
            "beginOffset": 10
          },
          "type": "PROPER"
        },
        {
          "text": {
            "content": "President",
            "beginOffset": 0
          },
          "type": "COMMON"
        }
      ]
    },
    {
      "name": "White House",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/081sq",
        "wikipedia_url": "http://en.wikipedia.org/wiki/White_House"
      },
      "salience": 0.08565566,
      "mentions": [
        {
          "text": {
            "content": "White House",
            "beginOffset": 35
          },
          "type": "PROPER"
        }
      ]
    }
  ],
  "language": "en"
}

The entities array contains Entity objects representing the detected entities, which include information such as the entity name and type.
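To illustrate how the mentions relate to the input, here is a minimal sketch that uses each beginOffset to recover the mention from the original text. It assumes the request set encodingType to UTF8, so that offsets count UTF-8 bytes; the function name mention_spans is hypothetical, and the response is a trimmed copy of the sample above.

```python
import json

TEXT = "President Obama is speaking at the White House."

# Trimmed copy of the sample response; only the fields used here are kept.
RESPONSE = json.loads("""
{
  "entities": [
    {"name": "Obama", "type": "PERSON",
     "mentions": [
       {"text": {"content": "Obama", "beginOffset": 10}, "type": "PROPER"},
       {"text": {"content": "President", "beginOffset": 0}, "type": "COMMON"}
     ]},
    {"name": "White House", "type": "LOCATION",
     "mentions": [
       {"text": {"content": "White House", "beginOffset": 35}, "type": "PROPER"}
     ]}
  ]
}
""")


def mention_spans(text, response):
    """Return (entity name, mention type, text recovered from the offset)."""
    raw = text.encode("utf-8")  # assumption: offsets count UTF-8 bytes
    spans = []
    for entity in response["entities"]:
        for mention in entity["mentions"]:
            start = mention["text"]["beginOffset"]
            length = len(mention["text"]["content"].encode("utf-8"))
            recovered = raw[start:start + length].decode("utf-8")
            spans.append((entity["name"], mention["type"], recovered))
    return spans


for name, kind, span in mention_spans(TEXT, RESPONSE):
    print(f"{name:<12} {kind:<7} {span!r}")
```

Note that the common-noun mention "President" is grouped under the Obama entity, which is why one entity can carry several mentions.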

gcloud command

Refer to the analyze-entities command for complete details.

To perform entity analysis, use the gcloud command-line tool with the --content flag to specify the content to analyze:

gcloud ml language analyze-entities --content="President Obama is speaking at the White House."

If the request is successful, the server returns a response in JSON format:

{
  "entities": [
    {
      "name": "Obama",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/02mjmr",
        "wikipedia_url": "http://en.wikipedia.org/wiki/Barack_Obama"
      },
      "salience": 0.9143443,
      "mentions": [
        {
          "text": {
            "content": "Obama",
            "beginOffset": 10
          },
          "type": "PROPER"
        },
        {
          "text": {
            "content": "President",
            "beginOffset": 0
          },
          "type": "COMMON"
        }
      ]
    },
    {
      "name": "White House",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/081sq",
        "wikipedia_url": "http://en.wikipedia.org/wiki/White_House"
      },
      "salience": 0.08565566,
      "mentions": [
        {
          "text": {
            "content": "White House",
            "beginOffset": 35
          },
          "type": "PROPER"
        }
      ]
    }
  ],
  "language": "en"
}

The entities array contains Entity objects representing the detected entities, which include information such as the entity name and type.
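Because the response is plain JSON, it can be post-processed with any JSON library. The sketch below ranks entities by their salience score, highest first; the function name rank_by_salience is hypothetical, and the sample data is a trimmed copy of the response above, deliberately listed out of order.

```python
import json


def rank_by_salience(response_json):
    """Return (name, type, salience) tuples, most salient entity first."""
    entities = json.loads(response_json).get("entities", [])
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)
    return [(e["name"], e["type"], e.get("salience", 0.0)) for e in ranked]


# Trimmed sample response with the entities listed in reverse order.
SAMPLE = """
{"entities": [
  {"name": "White House", "type": "LOCATION", "salience": 0.08565566},
  {"name": "Obama", "type": "PERSON", "salience": 0.9143443}
], "language": "en"}
"""

for name, kind, salience in rank_by_salience(SAMPLE):
    print(f"{name:<12} {kind:<9} {salience:.3f}")
```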

C#

private static void AnalyzeEntitiesFromText(string text)
{
    var client = LanguageServiceClient.Create();
    var response = client.AnalyzeEntities(new Document()
    {
        Content = text,
        Type = Document.Types.Type.PlainText
    });
    WriteEntities(response.Entities);
}

private static void WriteEntities(IEnumerable<Entity> entities)
{
    Console.WriteLine("Entities:");
    foreach (var entity in entities)
    {
        Console.WriteLine($"\tName: {entity.Name}");
        Console.WriteLine($"\tType: {entity.Type}");
        Console.WriteLine($"\tSalience: {entity.Salience}");
        Console.WriteLine("\tMentions:");
        foreach (var mention in entity.Mentions)
            Console.WriteLine($"\t\t{mention.Text.BeginOffset}: {mention.Text.Content}");
        Console.WriteLine("\tMetadata:");
        foreach (var keyval in entity.Metadata)
        {
            Console.WriteLine($"\t\t{keyval.Key}: {keyval.Value}");
        }
    }
}

Go

func analyzeEntities(ctx context.Context, client *language.Client, text string) (*languagepb.AnalyzeEntitiesResponse, error) {
	return client.AnalyzeEntities(ctx, &languagepb.AnalyzeEntitiesRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_Content{
				Content: text,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
		EncodingType: languagepb.EncodingType_UTF8,
	})
}

Java

// Instantiate the Language client com.google.cloud.language.v1.LanguageServiceClient
try (LanguageServiceClient language = LanguageServiceClient.create()) {
  Document doc = Document.newBuilder()
      .setContent(text)
      .setType(Type.PLAIN_TEXT)
      .build();
  AnalyzeEntitiesRequest request = AnalyzeEntitiesRequest.newBuilder()
      .setDocument(doc)
      .setEncodingType(EncodingType.UTF16)
      .build();

  AnalyzeEntitiesResponse response = language.analyzeEntities(request);

  // Print the response
  for (Entity entity : response.getEntitiesList()) {
    System.out.printf("Entity: %s\n", entity.getName());
    System.out.printf("Salience: %.3f\n", entity.getSalience());
    System.out.println("Metadata: ");
    for (Map.Entry<String, String> entry : entity.getMetadataMap().entrySet()) {
      System.out.printf("%s : %s\n", entry.getKey(), entry.getValue());
    }
    for (EntityMention mention : entity.getMentionsList()) {
      System.out.printf("Begin offset: %d\n", mention.getText().getBeginOffset());
      System.out.printf("Content: %s\n", mention.getText().getContent());
      System.out.printf("Type: %s\n\n", mention.getType());
    }
  }
}

Node.js

// Imports the Google Cloud client library
const language = require('@google-cloud/language');

// Creates a client
const client = new language.LanguageServiceClient();

/**
 * TODO(developer): Uncomment the following line to run this code.
 */
// const text = 'Your text to analyze, e.g. Hello, world!';

// Prepares a document, representing the provided text
const document = {
  content: text,
  type: 'PLAIN_TEXT',
};

// Detects entities in the document
client
  .analyzeEntities({document: document})
  .then(results => {
    const entities = results[0].entities;

    console.log('Entities:');
    entities.forEach(entity => {
      console.log(entity.name);
      console.log(` - Type: ${entity.type}, Salience: ${entity.salience}`);
      if (entity.metadata && entity.metadata.wikipedia_url) {
        console.log(` - Wikipedia URL: ${entity.metadata.wikipedia_url}`);
      }
    });
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

namespace Google\Cloud\Samples\Language;

use Google\Cloud\Language\LanguageClient;

/**
 * Find the entities in text.
 * ```
 * analyze_entities('Do you know the way to San Jose?');
 * ```
 *
 * @param string $text The text to analyze.
 * @param string $projectId (optional) Your Google Cloud Project ID
 *
 */
function analyze_entities($text, $projectId = null)
{
    // Create the Natural Language client
    $language = new LanguageClient([
        'projectId' => $projectId,
    ]);

    // Call the analyzeEntities function
    $annotation = $language->analyzeEntities($text);

    // Print out information about each entity
    $entities = $annotation->entities();
    foreach ($entities as $entity) {
        printf('Name: %s' . PHP_EOL, $entity['name']);
        printf('Type: %s' . PHP_EOL, $entity['type']);
        printf('Salience: %s' . PHP_EOL, $entity['salience']);
        if (array_key_exists('wikipedia_url', $entity['metadata'])) {
            printf('Wikipedia URL: %s' . PHP_EOL, $entity['metadata']['wikipedia_url']);
        }
        if (array_key_exists('mid', $entity['metadata'])) {
            printf('Knowledge Graph MID: %s' . PHP_EOL, $entity['metadata']['mid']);
        }
        printf(PHP_EOL);
    }
}

Python

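# Imports assume the pre-2.0 google-cloud-language client, which exposes
# the enums and types modules used below.
import six
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types


def entities_text(text):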
    """Detects entities in the text."""
    client = language.LanguageServiceClient()

    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    # Instantiates a plain text document.
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects entities in the document. To analyze HTML instead, build the
    # document with type=enums.Document.Type.HTML.
    entities = client.analyze_entities(document).entities

    # entity types from enums.Entity.Type
    entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
                   'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')

    for entity in entities:
        print('=' * 20)
        print(u'{:<16}: {}'.format('name', entity.name))
        print(u'{:<16}: {}'.format('type', entity_type[entity.type]))
        print(u'{:<16}: {}'.format('metadata', entity.metadata))
        print(u'{:<16}: {}'.format('salience', entity.salience))
        print(u'{:<16}: {}'.format('wikipedia_url',
              entity.metadata.get('wikipedia_url', '-')))

Ruby

# text_content = "Text to extract entities from"

require "google/cloud/language"

language = Google::Cloud::Language.new

response = language.analyze_entities content: text_content, type: :PLAIN_TEXT

entities = response.entities

entities.each do |entity|
  puts "Entity #{entity.name} #{entity.type}"

  if entity.metadata["wikipedia_url"]
    puts "URL: #{entity.metadata['wikipedia_url']}"
  end
end

Analyzing Entities from Google Cloud Storage

For your convenience, the Natural Language API can perform entity analysis directly on a file located in Google Cloud Storage, without the need to send the contents of the file in the body of your request.

Here is an example of performing entity analysis on a file located in Cloud Storage.

Protocol

To analyze entities from a document stored in Google Cloud Storage, make a POST request to the documents:analyzeEntities REST method and provide the appropriate request body with the path to the document as shown in the following example.

curl -X POST \
     -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
  'document':{
    'type':'PLAIN_TEXT',
    'gcsContentUri':'gs://<bucket-name>/<object-name>'
  }
}" "https://language.googleapis.com/v1/documents:analyzeEntities"

If you don't specify document.language, the language is detected automatically. For information on which languages are supported by the Natural Language API, see Language Support. See the Document reference documentation for more information on configuring the request body.
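For illustration, the Cloud Storage variant of the request body can be assembled the same way. This is a minimal sketch: the helper name build_gcs_request_body and the bucket and object names are hypothetical placeholders, and the snippet only builds the body without sending it.

```python
import json


def build_gcs_request_body(bucket_name, object_name):
    """Build a request body that points at a Cloud Storage object."""
    if not bucket_name or not object_name:
        raise ValueError("bucket_name and object_name must be non-empty")
    return {
        "document": {
            "type": "PLAIN_TEXT",
            "gcsContentUri": f"gs://{bucket_name}/{object_name}",
        }
    }


# "my-bucket" and "my-file.txt" are placeholder names, not real resources.
print(json.dumps(build_gcs_request_body("my-bucket", "my-file.txt"), indent=2))
```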

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "entities": [
    {
      "name": "Obama",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/02mjmr",
        "wikipedia_url": "http://en.wikipedia.org/wiki/Barack_Obama"
      },
      "salience": 0.9143443,
      "mentions": [
        {
          "text": {
            "content": "Obama",
            "beginOffset": 10
          },
          "type": "PROPER"
        },
        {
          "text": {
            "content": "President",
            "beginOffset": 0
          },
          "type": "COMMON"
        }
      ]
    },
    {
      "name": "White House",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/081sq",
        "wikipedia_url": "http://en.wikipedia.org/wiki/White_House"
      },
      "salience": 0.08565566,
      "mentions": [
        {
          "text": {
            "content": "White House",
            "beginOffset": 35
          },
          "type": "PROPER"
        }
      ]
    }
  ],
  "language": "en"
}

The entities array contains Entity objects representing the detected entities, which include information such as the entity name and type.

gcloud command

Refer to the analyze-entities command for complete details.

To perform entity analysis on a file in Google Cloud Storage, use the gcloud command-line tool with the --content-file flag to specify the path of the file that contains the content to analyze:

gcloud ml language analyze-entities --content-file=gs://YOUR_BUCKET_NAME/YOUR_FILE_NAME

If the request is successful, the server returns a response in JSON format:

{
  "entities": [
    {
      "name": "Obama",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/02mjmr",
        "wikipedia_url": "http://en.wikipedia.org/wiki/Barack_Obama"
      },
      "salience": 0.9143443,
      "mentions": [
        {
          "text": {
            "content": "Obama",
            "beginOffset": 10
          },
          "type": "PROPER"
        },
        {
          "text": {
            "content": "President",
            "beginOffset": 0
          },
          "type": "COMMON"
        }
      ]
    },
    {
      "name": "White House",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/081sq",
        "wikipedia_url": "http://en.wikipedia.org/wiki/White_House"
      },
      "salience": 0.08565566,
      "mentions": [
        {
          "text": {
            "content": "White House",
            "beginOffset": 35
          },
          "type": "PROPER"
        }
      ]
    }
  ],
  "language": "en"
}

The entities array contains Entity objects representing the detected entities, which include information such as the entity name and type.

C#

private static void AnalyzeEntitiesFromFile(string gcsUri)
{
    var client = LanguageServiceClient.Create();
    var response = client.AnalyzeEntities(new Document()
    {
        GcsContentUri = gcsUri,
        Type = Document.Types.Type.PlainText
    });
    WriteEntities(response.Entities);
}

private static void WriteEntities(IEnumerable<Entity> entities)
{
    Console.WriteLine("Entities:");
    foreach (var entity in entities)
    {
        Console.WriteLine($"\tName: {entity.Name}");
        Console.WriteLine($"\tType: {entity.Type}");
        Console.WriteLine($"\tSalience: {entity.Salience}");
        Console.WriteLine("\tMentions:");
        foreach (var mention in entity.Mentions)
            Console.WriteLine($"\t\t{mention.Text.BeginOffset}: {mention.Text.Content}");
        Console.WriteLine("\tMetadata:");
        foreach (var keyval in entity.Metadata)
        {
            Console.WriteLine($"\t\t{keyval.Key}: {keyval.Value}");
        }
    }
}

Go

func analyzeEntitiesFromGCS(ctx context.Context, client *language.Client, gcsURI string) (*languagepb.AnalyzeEntitiesResponse, error) {
	return client.AnalyzeEntities(ctx, &languagepb.AnalyzeEntitiesRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_GcsContentUri{
				GcsContentUri: gcsURI,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
		EncodingType: languagepb.EncodingType_UTF8,
	})
}

Java

// Instantiate the Language client com.google.cloud.language.v1.LanguageServiceClient
try (LanguageServiceClient language = LanguageServiceClient.create()) {
  // set the GCS Content URI path to the file to be analyzed
  Document doc = Document.newBuilder()
      .setGcsContentUri(gcsUri)
      .setType(Type.PLAIN_TEXT)
      .build();
  AnalyzeEntitiesRequest request = AnalyzeEntitiesRequest.newBuilder()
      .setDocument(doc)
      .setEncodingType(EncodingType.UTF16)
      .build();

  AnalyzeEntitiesResponse response = language.analyzeEntities(request);

  // Print the response
  for (Entity entity : response.getEntitiesList()) {
    System.out.printf("Entity: %s\n", entity.getName());
    System.out.printf("Salience: %.3f\n", entity.getSalience());
    System.out.println("Metadata: ");
    for (Map.Entry<String, String> entry : entity.getMetadataMap().entrySet()) {
      System.out.printf("%s : %s\n", entry.getKey(), entry.getValue());
    }
    for (EntityMention mention : entity.getMentionsList()) {
      System.out.printf("Begin offset: %d\n", mention.getText().getBeginOffset());
      System.out.printf("Content: %s\n", mention.getText().getContent());
      System.out.printf("Type: %s\n\n", mention.getType());
    }
  }
}

Node.js

// Imports the Google Cloud client library
const language = require('@google-cloud/language');

// Creates a client
const client = new language.LanguageServiceClient();

/**
 * TODO(developer): Uncomment the following lines to run this code
 */
// const bucketName = 'Your bucket name, e.g. my-bucket';
// const fileName = 'Your file name, e.g. my-file.txt';

// Prepares a document, representing a text file in Cloud Storage
const document = {
  gcsContentUri: `gs://${bucketName}/${fileName}`,
  type: 'PLAIN_TEXT',
};

// Detects entities in the document
client
  .analyzeEntities({document: document})
  .then(results => {
    const entities = results[0].entities;

    console.log('Entities:');
    entities.forEach(entity => {
      console.log(entity.name);
      console.log(` - Type: ${entity.type}, Salience: ${entity.salience}`);
      if (entity.metadata && entity.metadata.wikipedia_url) {
        console.log(` - Wikipedia URL: ${entity.metadata.wikipedia_url}`);
      }
    });
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

namespace Google\Cloud\Samples\Language;

use Google\Cloud\Language\LanguageClient;
use Google\Cloud\Storage\StorageClient;

/**
 * Find the entities in text stored in a Cloud Storage bucket.
 * ```
 * analyze_entities_from_file('my-bucket', 'file_with_text.txt');
 * ```
 *
 * @param string $bucketName The Cloud Storage bucket.
 * @param string $objectName The Cloud Storage object with text.
 * @param string $projectId (optional) Your Google Cloud Project ID
 *
 */
function analyze_entities_from_file($bucketName, $objectName, $projectId = null)
{
    // Create the Cloud Storage object
    $storage = new StorageClient();
    $bucket = $storage->bucket($bucketName);
    $storageObject = $bucket->object($objectName);

    // Create the Natural Language client
    $language = new LanguageClient([
        'projectId' => $projectId,
    ]);

    // Call the analyzeEntities function
    $annotation = $language->analyzeEntities($storageObject);

    // Print out information about each entity
    $entities = $annotation->entities();
    foreach ($entities as $entity) {
        printf('Name: %s' . PHP_EOL, $entity['name']);
        printf('Type: %s' . PHP_EOL, $entity['type']);
        printf('Salience: %s' . PHP_EOL, $entity['salience']);
        if (array_key_exists('wikipedia_url', $entity['metadata'])) {
            printf('Wikipedia URL: %s' . PHP_EOL, $entity['metadata']['wikipedia_url']);
        }
        if (array_key_exists('mid', $entity['metadata'])) {
            printf('Knowledge Graph MID: %s' . PHP_EOL, $entity['metadata']['mid']);
        }
        printf(PHP_EOL);
    }
}

Python

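# Imports assume the pre-2.0 google-cloud-language client, which exposes
# the enums and types modules used below.
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types


def entities_file(gcs_uri):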
    """Detects entities in the file located in Google Cloud Storage."""
    client = language.LanguageServiceClient()

    # Instantiates a plain text document.
    document = types.Document(
        gcs_content_uri=gcs_uri,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects entities in the document. To analyze HTML instead, build the
    # document with type=enums.Document.Type.HTML.
    entities = client.analyze_entities(document).entities

    # entity types from enums.Entity.Type
    entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
                   'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')

    for entity in entities:
        print('=' * 20)
        print(u'{:<16}: {}'.format('name', entity.name))
        print(u'{:<16}: {}'.format('type', entity_type[entity.type]))
        print(u'{:<16}: {}'.format('metadata', entity.metadata))
        print(u'{:<16}: {}'.format('salience', entity.salience))
        print(u'{:<16}: {}'.format('wikipedia_url',
              entity.metadata.get('wikipedia_url', '-')))

Ruby

# storage_path = "Path to file in Google Cloud Storage, e.g. gs://bucket/file"

require "google/cloud/language"

language = Google::Cloud::Language.new
response = language.analyze_entities gcs_content_uri: storage_path, type: :PLAIN_TEXT

entities = response.entities

entities.each do |entity|
  puts "Entity #{entity.name} #{entity.type}"

  if entity.metadata["wikipedia_url"]
    puts "URL: #{entity.metadata['wikipedia_url']}"
  end
end
