Analyzing Entities

Entity Analysis inspects the given text for known entities (proper nouns such as public figures, landmarks, and so on) and returns information about those entities. Entity analysis is performed with the analyzeEntities method. For information on which languages are supported by the Natural Language API, see Language Support.

This section demonstrates a few ways to detect entities in a document.

Analyzing Entities in a String

Here is an example of performing entity analysis on a text string sent directly to the Natural Language API:

Protocol

To analyze entities in a document, make a POST request to the documents:analyzeEntities REST method and provide the appropriate request body as shown in the following example.

The example uses the gcloud auth application-default print-access-token command to obtain an access token for a service account set up for the project using the Google Cloud Platform Cloud SDK. For instructions on installing the Cloud SDK and setting up a project with a service account, see the Quickstart.

curl -X POST \
     -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
  'encodingType': 'UTF8',
  'document': {
    'type': 'PLAIN_TEXT',
    'content': 'President Obama is speaking at the White House.'
  }
}" "https://language.googleapis.com/v1/documents:analyzeEntities"

If you do not specify document.language, the language is detected automatically. For information on which languages are supported by the Natural Language API, see Language Support. See the Document reference documentation for more information on configuring the request body.
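For instance, to pin the language instead of relying on auto-detection, the request body can carry a document.language field. Below is a minimal sketch that assembles such a body as a Python dictionary before serializing it as the POST payload; the field names follow the Document reference above, and the content string is the same example sentence:

```python
import json

# Request body for documents:analyzeEntities with an explicit language.
body = {
    "encodingType": "UTF8",
    "document": {
        "type": "PLAIN_TEXT",
        "language": "en",  # omit this field to let the API auto-detect
        "content": "President Obama is speaking at the White House.",
    },
}

# Serialize before sending it as the body of the POST request.
payload = json.dumps(body)
```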

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "entities": [
    {
      "name": "Obama",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/02mjmr",
        "wikipedia_url": "http://en.wikipedia.org/wiki/Barack_Obama"
      },
      "salience": 0.9143443,
      "mentions": [
        {
          "text": {
            "content": "Obama",
            "beginOffset": 10
          },
          "type": "PROPER"
        },
        {
          "text": {
            "content": "President",
            "beginOffset": 0
          },
          "type": "COMMON"
        }
      ]
    },
    {
      "name": "White House",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/081sq",
        "wikipedia_url": "http://en.wikipedia.org/wiki/White_House"
      },
      "salience": 0.08565566,
      "mentions": [
        {
          "text": {
            "content": "White House",
            "beginOffset": 35
          },
          "type": "PROPER"
        }
      ]
    }
  ],
  "language": "en"
}

The entities array contains Entity objects representing the detected entities, including information such as each entity's name and type.
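To show how these fields fit together, here is a small sketch (not one of the official samples) that parses an abbreviated copy of the response above and checks each mention's beginOffset against the original sentence. With encodingType set to UTF8, offsets are byte offsets into the UTF-8 text, which for this all-ASCII sentence coincide with Python string indices:

```python
import json

sentence = "President Obama is speaking at the White House."

# Abbreviated copy of the JSON response shown above.
response = json.loads("""
{
  "entities": [
    {"name": "Obama", "type": "PERSON", "salience": 0.9143443,
     "mentions": [
       {"text": {"content": "Obama", "beginOffset": 10}, "type": "PROPER"},
       {"text": {"content": "President", "beginOffset": 0}, "type": "COMMON"}]},
    {"name": "White House", "type": "LOCATION", "salience": 0.08565566,
     "mentions": [
       {"text": {"content": "White House", "beginOffset": 35}, "type": "PROPER"}]}
  ],
  "language": "en"
}
""")

# In this response, entities are ordered by decreasing salience.
for entity in response["entities"]:
    print(entity["name"], entity["type"], entity["salience"])
    for mention in entity["mentions"]:
        text = mention["text"]
        # For ASCII text the UTF-8 byte offset equals the string index.
        assert sentence[text["beginOffset"]:].startswith(text["content"])
```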

gcloud command

Refer to the analyze-entities command for complete details.

To perform entity analysis, use the gcloud command line tool and use the --content flag to identify the content to analyze:

gcloud ml language analyze-entities --content="President Obama is speaking at the White House."

If the request is successful, the server returns a response in JSON format:

{
  "entities": [
    {
      "name": "Obama",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/02mjmr",
        "wikipedia_url": "http://en.wikipedia.org/wiki/Barack_Obama"
      },
      "salience": 0.9143443,
      "mentions": [
        {
          "text": {
            "content": "Obama",
            "beginOffset": 10
          },
          "type": "PROPER"
        },
        {
          "text": {
            "content": "President",
            "beginOffset": 0
          },
          "type": "COMMON"
        }
      ]
    },
    {
      "name": "White House",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/081sq",
        "wikipedia_url": "http://en.wikipedia.org/wiki/White_House"
      },
      "salience": 0.08565566,
      "mentions": [
        {
          "text": {
            "content": "White House",
            "beginOffset": 35
          },
          "type": "PROPER"
        }
      ]
    }
  ],
  "language": "en"
}

The entities array contains Entity objects representing the detected entities, including information such as each entity's name and type.

C#

private static void AnalyzeEntitiesFromText(string text)
{
    var client = LanguageServiceClient.Create();
    var response = client.AnalyzeEntities(new Document()
    {
        Content = text,
        Type = Document.Types.Type.PlainText
    });
    WriteEntities(response.Entities);
}

private static void WriteEntities(IEnumerable<Entity> entities)
{
    Console.WriteLine("Entities:");
    foreach (var entity in entities)
    {
        Console.WriteLine($"\tName: {entity.Name}");
        Console.WriteLine($"\tType: {entity.Type}");
        Console.WriteLine($"\tSalience: {entity.Salience}");
        Console.WriteLine("\tMentions:");
        foreach (var mention in entity.Mentions)
            Console.WriteLine($"\t\t{mention.Text.BeginOffset}: {mention.Text.Content}");
        Console.WriteLine("\tMetadata:");
        foreach (var keyval in entity.Metadata)
        {
            Console.WriteLine($"\t\t{keyval.Key}: {keyval.Value}");
        }
    }
}

Go

func analyzeEntities(ctx context.Context, client *language.Client, text string) (*languagepb.AnalyzeEntitiesResponse, error) {
	return client.AnalyzeEntities(ctx, &languagepb.AnalyzeEntitiesRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_Content{
				Content: text,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
		EncodingType: languagepb.EncodingType_UTF8,
	})
}

Java

// Instantiate the Language client com.google.cloud.language.v1.LanguageServiceClient
try (LanguageServiceClient language = LanguageServiceClient.create()) {
  Document doc = Document.newBuilder()
      .setContent(text)
      .setType(Type.PLAIN_TEXT)
      .build();
  AnalyzeEntitiesRequest request = AnalyzeEntitiesRequest.newBuilder()
      .setDocument(doc)
      .setEncodingType(EncodingType.UTF16)
      .build();

  AnalyzeEntitiesResponse response = language.analyzeEntities(request);

  // Print the response
  for (Entity entity : response.getEntitiesList()) {
    System.out.printf("Entity: %s\n", entity.getName());
    System.out.printf("Salience: %.3f\n", entity.getSalience());
    System.out.println("Metadata: ");
    for (Map.Entry<String, String> entry : entity.getMetadataMap().entrySet()) {
      System.out.printf("%s : %s\n", entry.getKey(), entry.getValue());
    }
    for (EntityMention mention : entity.getMentionsList()) {
      System.out.printf("Begin offset: %d\n", mention.getText().getBeginOffset());
      System.out.printf("Content: %s\n", mention.getText().getContent());
      System.out.printf("Type: %s\n\n", mention.getType());
    }
  }
}

Node.js

// Imports the Google Cloud client library
const language = require('@google-cloud/language');

// Creates a client
const client = new language.LanguageServiceClient();

/**
 * TODO(developer): Uncomment the following line to run this code.
 */
// const text = 'Your text to analyze, e.g. Hello, world!';

// Prepares a document, representing the provided text
const document = {
  content: text,
  type: 'PLAIN_TEXT',
};

// Detects entities in the document
const [result] = await client.analyzeEntities({document});

const entities = result.entities;

console.log('Entities:');
entities.forEach(entity => {
  console.log(entity.name);
  console.log(` - Type: ${entity.type}, Salience: ${entity.salience}`);
  if (entity.metadata && entity.metadata.wikipedia_url) {
    console.log(` - Wikipedia URL: ${entity.metadata.wikipedia_url}`);
  }
});

PHP

namespace Google\Cloud\Samples\Language;

use Google\Cloud\Language\V1beta2\Document;
use Google\Cloud\Language\V1beta2\LanguageServiceClient;

/**
 * Find the entities in text.
 * ```
 * analyze_entities('Do you know the way to San Jose?');
 * ```
 *
 * @param string $text The text to analyze.
 * @param string $projectId (optional) Your Google Cloud Project ID
 *
 */
function analyze_entities($text, $projectId = null)
{
    // Create the Natural Language client
    $languageServiceClient = new LanguageServiceClient(['projectId' => $projectId]);
    try {
        $entity_types = [
            0 => 'UNKNOWN',
            1 => 'PERSON',
            2 => 'LOCATION',
            3 => 'ORGANIZATION',
            4 => 'EVENT',
            5 => 'WORK_OF_ART',
            6 => 'CONSUMER_GOOD',
            7 => 'OTHER',
        ];
        $document = new Document();
        // Add text as content and set document type to PLAIN_TEXT
        $document->setContent($text)->setType(1);
        // Call the analyzeEntities function
        $response = $languageServiceClient->analyzeEntities($document, []);
        $entities = $response->getEntities();
        // Print out information about each entity
        foreach ($entities as $entity) {
            printf('Name: %s' . PHP_EOL, $entity->getName());
            printf('Type: %s' . PHP_EOL, $entity_types[$entity->getType()]);
            printf('Salience: %s' . PHP_EOL, $entity->getSalience());
            if ($entity->getMetadata()->offsetExists('wikipedia_url')) {
                printf('Wikipedia URL: %s' . PHP_EOL, $entity->getMetadata()->offsetGet('wikipedia_url'));
            }
            if ($entity->getMetadata()->offsetExists('mid')) {
                printf('Knowledge Graph MID: %s' . PHP_EOL, $entity->getMetadata()->offsetGet('mid'));
            }
            printf(PHP_EOL);
        }
    } finally {
        $languageServiceClient->close();
    }
}

Python

import six
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

text = 'President Kennedy spoke at the White House.'

client = language.LanguageServiceClient()

if isinstance(text, six.binary_type):
    text = text.decode('utf-8')

# Instantiates a plain text document.
document = types.Document(
    content=text,
    type=enums.Document.Type.PLAIN_TEXT)

# Detects entities in the document. You can also analyze HTML with:
#   document.type == enums.Document.Type.HTML
entities = client.analyze_entities(document).entities

for entity in entities:
    entity_type = enums.Entity.Type(entity.type)
    print('=' * 20)
    print(u'{:<16}: {}'.format('name', entity.name))
    print(u'{:<16}: {}'.format('type', entity_type.name))
    print(u'{:<16}: {}'.format('salience', entity.salience))
    print(u'{:<16}: {}'.format('wikipedia_url',
          entity.metadata.get('wikipedia_url', '-')))
    print(u'{:<16}: {}'.format('mid', entity.metadata.get('mid', '-')))

Ruby

# text_content = "Text to extract entities from"

require "google/cloud/language"

language = Google::Cloud::Language.new

response = language.analyze_entities content: text_content, type: :PLAIN_TEXT

entities = response.entities

entities.each do |entity|
  puts "Entity #{entity.name} #{entity.type}"

  if entity.metadata["wikipedia_url"]
    puts "URL: #{entity.metadata['wikipedia_url']}"
  end
end

Analyzing Entities from Google Cloud Storage

For your convenience, the Natural Language API can perform entity analysis directly on a file located in Google Cloud Storage, without the need to send the contents of the file in the body of your request.

Here is an example of performing entity analysis on a file located in Cloud Storage.

Protocol

To analyze entities from a document stored in Google Cloud Storage, make a POST request to the documents:analyzeEntities REST method and provide the appropriate request body with the path to the document, as shown in the following example.

curl -X POST \
     -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
  'document':{
    'type':'PLAIN_TEXT',
    'gcsContentUri':'gs://<bucket-name>/<object-name>'
  }
}" "https://language.googleapis.com/v1/documents:analyzeEntities"

If you do not specify document.language, the language is detected automatically. For information on which languages are supported by the Natural Language API, see Language Support. See the Document reference documentation for more information on configuring the request body.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "entities": [
    {
      "name": "Obama",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/02mjmr",
        "wikipedia_url": "http://en.wikipedia.org/wiki/Barack_Obama"
      },
      "salience": 0.9143443,
      "mentions": [
        {
          "text": {
            "content": "Obama",
            "beginOffset": 10
          },
          "type": "PROPER"
        },
        {
          "text": {
            "content": "President",
            "beginOffset": 0
          },
          "type": "COMMON"
        }
      ]
    },
    {
      "name": "White House",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/081sq",
        "wikipedia_url": "http://en.wikipedia.org/wiki/White_House"
      },
      "salience": 0.08565566,
      "mentions": [
        {
          "text": {
            "content": "White House",
            "beginOffset": 35
          },
          "type": "PROPER"
        }
      ]
    }
  ],
  "language": "en"
}

The entities array contains Entity objects representing the detected entities, including information such as each entity's name and type.

gcloud command

Refer to the analyze-entities command for complete details.

To perform entity analysis on a file in Google Cloud Storage, use the gcloud command line tool and use the --content-file flag to identify the file path that contains the content to analyze:

gcloud ml language analyze-entities --content-file=gs://YOUR_BUCKET_NAME/YOUR_FILE_NAME

If the request is successful, the server returns a response in JSON format:

{
  "entities": [
    {
      "name": "Obama",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/02mjmr",
        "wikipedia_url": "http://en.wikipedia.org/wiki/Barack_Obama"
      },
      "salience": 0.9143443,
      "mentions": [
        {
          "text": {
            "content": "Obama",
            "beginOffset": 10
          },
          "type": "PROPER"
        },
        {
          "text": {
            "content": "President",
            "beginOffset": 0
          },
          "type": "COMMON"
        }
      ]
    },
    {
      "name": "White House",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/081sq",
        "wikipedia_url": "http://en.wikipedia.org/wiki/White_House"
      },
      "salience": 0.08565566,
      "mentions": [
        {
          "text": {
            "content": "White House",
            "beginOffset": 35
          },
          "type": "PROPER"
        }
      ]
    }
  ],
  "language": "en"
}

The entities array contains Entity objects representing the detected entities, including information such as each entity's name and type.

C#

private static void AnalyzeEntitiesFromFile(string gcsUri)
{
    var client = LanguageServiceClient.Create();
    var response = client.AnalyzeEntities(new Document()
    {
        GcsContentUri = gcsUri,
        Type = Document.Types.Type.PlainText
    });
    WriteEntities(response.Entities);
}
private static void WriteEntities(IEnumerable<Entity> entities)
{
    Console.WriteLine("Entities:");
    foreach (var entity in entities)
    {
        Console.WriteLine($"\tName: {entity.Name}");
        Console.WriteLine($"\tType: {entity.Type}");
        Console.WriteLine($"\tSalience: {entity.Salience}");
        Console.WriteLine("\tMentions:");
        foreach (var mention in entity.Mentions)
            Console.WriteLine($"\t\t{mention.Text.BeginOffset}: {mention.Text.Content}");
        Console.WriteLine("\tMetadata:");
        foreach (var keyval in entity.Metadata)
        {
            Console.WriteLine($"\t\t{keyval.Key}: {keyval.Value}");
        }
    }
}

Go

func analyzeEntitiesFromGCS(ctx context.Context, client *language.Client, gcsURI string) (*languagepb.AnalyzeEntitiesResponse, error) {
	return client.AnalyzeEntities(ctx, &languagepb.AnalyzeEntitiesRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_GcsContentUri{
				GcsContentUri: gcsURI,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
		EncodingType: languagepb.EncodingType_UTF8,
	})
}

Java

// Instantiate the Language client com.google.cloud.language.v1.LanguageServiceClient
try (LanguageServiceClient language = LanguageServiceClient.create()) {
  // set the GCS Content URI path to the file to be analyzed
  Document doc = Document.newBuilder()
      .setGcsContentUri(gcsUri)
      .setType(Type.PLAIN_TEXT)
      .build();
  AnalyzeEntitiesRequest request = AnalyzeEntitiesRequest.newBuilder()
      .setDocument(doc)
      .setEncodingType(EncodingType.UTF16)
      .build();

  AnalyzeEntitiesResponse response = language.analyzeEntities(request);

  // Print the response
  for (Entity entity : response.getEntitiesList()) {
    System.out.printf("Entity: %s\n", entity.getName());
    System.out.printf("Salience: %.3f\n", entity.getSalience());
    System.out.println("Metadata: ");
    for (Map.Entry<String, String> entry : entity.getMetadataMap().entrySet()) {
      System.out.printf("%s : %s\n", entry.getKey(), entry.getValue());
    }
    for (EntityMention mention : entity.getMentionsList()) {
      System.out.printf("Begin offset: %d\n", mention.getText().getBeginOffset());
      System.out.printf("Content: %s\n", mention.getText().getContent());
      System.out.printf("Type: %s\n\n", mention.getType());
    }
  }
}

Node.js

// Imports the Google Cloud client library
const language = require('@google-cloud/language');

// Creates a client
const client = new language.LanguageServiceClient();

/**
 * TODO(developer): Uncomment the following lines to run this code
 */
// const bucketName = 'Your bucket name, e.g. my-bucket';
// const fileName = 'Your file name, e.g. my-file.txt';

// Prepares a document, representing a text file in Cloud Storage
const document = {
  gcsContentUri: `gs://${bucketName}/${fileName}`,
  type: 'PLAIN_TEXT',
};

// Detects entities in the document
const [result] = await client.analyzeEntities({document});
const entities = result.entities;

console.log('Entities:');
entities.forEach(entity => {
  console.log(entity.name);
  console.log(` - Type: ${entity.type}, Salience: ${entity.salience}`);
  if (entity.metadata && entity.metadata.wikipedia_url) {
    console.log(` - Wikipedia URL: ${entity.metadata.wikipedia_url}`);
  }
});

PHP

namespace Google\Cloud\Samples\Language;

use Google\Cloud\Language\V1beta2\Document;
use Google\Cloud\Language\V1beta2\LanguageServiceClient;

/**
 * Find the entities in text stored in a Cloud Storage bucket.
 * ```
 * analyze_entities_from_file('gs://my-bucket/text.txt');
 * ```
 *
 * @param string $gcsUri The Cloud Storage path with text.
 * @param string $projectId (optional) Your Google Cloud Project ID
 *
 */
function analyze_entities_from_file($gcsUri, $projectId = null)
{
    // Create the Natural Language client
    $languageServiceClient = new LanguageServiceClient(['projectId' => $projectId]);
    try {
        $entity_types = [
            0 => 'UNKNOWN',
            1 => 'PERSON',
            2 => 'LOCATION',
            3 => 'ORGANIZATION',
            4 => 'EVENT',
            5 => 'WORK_OF_ART',
            6 => 'CONSUMER_GOOD',
            7 => 'OTHER',
        ];
        $document = new Document();
        // Pass GCS URI and set document type to PLAIN_TEXT
        $document->setGcsContentUri($gcsUri)->setType(1);
        // Call the analyzeEntities function
        $response = $languageServiceClient->analyzeEntities($document, []);
        $entities = $response->getEntities();
        // Print out information about each entity
        foreach ($entities as $entity) {
            printf('Name: %s' . PHP_EOL, $entity->getName());
            printf('Type: %s' . PHP_EOL, $entity_types[$entity->getType()]);
            printf('Salience: %s' . PHP_EOL, $entity->getSalience());
            if ($entity->getMetadata()->offsetExists('wikipedia_url')) {
                printf('Wikipedia URL: %s' . PHP_EOL, $entity->getMetadata()->offsetGet('wikipedia_url'));
            }
            if ($entity->getMetadata()->offsetExists('mid')) {
                printf('Knowledge Graph MID: %s' . PHP_EOL, $entity->getMetadata()->offsetGet('mid'));
            }
            printf(PHP_EOL);
        }
    } finally {
        $languageServiceClient->close();
    }
}

Python

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

gcs_uri = 'gs://cloud-samples-data/language/president.txt'

client = language.LanguageServiceClient()

# Instantiates a plain text document.
document = types.Document(
    gcs_content_uri=gcs_uri,
    type=enums.Document.Type.PLAIN_TEXT)

# Detects entities in the document. You can also analyze HTML with:
#   document.type == enums.Document.Type.HTML
entities = client.analyze_entities(document).entities

for entity in entities:
    entity_type = enums.Entity.Type(entity.type)
    print('=' * 20)
    print(u'{:<16}: {}'.format('name', entity.name))
    print(u'{:<16}: {}'.format('type', entity_type.name))
    print(u'{:<16}: {}'.format('salience', entity.salience))
    print(u'{:<16}: {}'.format('wikipedia_url',
          entity.metadata.get('wikipedia_url', '-')))
    print(u'{:<16}: {}'.format('mid', entity.metadata.get('mid', '-')))

Ruby

# storage_path = "Path to file in Google Cloud Storage, eg. gs://bucket/file"

require "google/cloud/language"

language = Google::Cloud::Language.new
response = language.analyze_entities gcs_content_uri: storage_path, type: :PLAIN_TEXT

entities = response.entities

entities.each do |entity|
  puts "Entity #{entity.name} #{entity.type}"

  if entity.metadata["wikipedia_url"]
    puts "URL: #{entity.metadata['wikipedia_url']}"
  end
end
