Analyzing Syntax

While most Natural Language API methods analyze what a given text is about, the analyzeSyntax method inspects the structure of the language itself. Syntactic Analysis breaks up the given text into a series of sentences and tokens (generally, words) and provides linguistic information about those tokens. See Morphology & Dependency Trees for details about the linguistic analysis and Language Support for a list of the languages whose syntax the Natural Language API can analyze.

This section demonstrates a few ways to detect syntax in a document.

Analyzing Syntax in a String

Here is an example of performing syntactic analysis on a text string sent directly to the Natural Language API:

Protocol

To analyze syntax in a document, make a POST request to the documents:analyzeSyntax REST method and provide the appropriate request body as shown in the following example.

The example uses the gcloud auth application-default print-access-token command to obtain an access token for a service account set up for the project using the Google Cloud SDK. For instructions on installing the Cloud SDK and setting up a project with a service account, see the Quickstart.

curl -X POST \
     -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
  'encodingType': 'UTF8',
  'document': {
    'type': 'PLAIN_TEXT',
    'content': 'Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.  Sundar Pichai said in his keynote that users love their new Android phones.'
  }
}" "https://language.googleapis.com/v1/documents:analyzeSyntax"

If you don't specify document.language, then the language will be automatically detected. For information on which languages are supported by the Natural Language API, see Language Support. See the Document reference documentation for more information on configuring the request body.
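
If you already know the language of the text, you can set it on the document rather than relying on detection. Here is a minimal sketch using the Python client library shown later in this section ('en' is an example language code):

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

client = language.LanguageServiceClient()

# Setting language explicitly skips automatic detection.
# 'en' is an example; use the code for your text's language.
document = types.Document(
    content='President Kennedy spoke at the White House.',
    language='en',
    type=enums.Document.Type.PLAIN_TEXT)

tokens = client.analyze_syntax(document).tokens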

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "sentences": [
    {
      "text": {
        "content": "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.",
        "beginOffset": 0
      }
    },
    {
      "text": {
        "content": "Sundar Pichai said in his keynote that users love their new Android phones.",
        "beginOffset": 105
      }
    }
  ],
  "tokens": [
    {
      "text": {
        "content": "Google",
        "beginOffset": 0
      },
      "partOfSpeech": {
        "tag": "NOUN",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "SINGULAR",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 7,
        "label": "NSUBJ"
      },
      "lemma": "Google"
    },
    ...
    {
      "text": {
        "content": ".",
        "beginOffset": 179
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "NUMBER_UNKNOWN",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 20,
        "label": "P"
      },
      "lemma": "."
    }
  ],
  "language": "en"
}

The tokens array contains Token objects representing the detected sentence tokens, which include information such as a token's part of speech and its position in the sentence.
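
Each token's dependencyEdge gives the index of its syntactic head within the tokens array, so the parse tree can be reconstructed from the response alone. Here is a minimal sketch in Python, assuming the complete JSON response has been saved to a file (the file name is illustrative; the listing above is truncated):

import json

# Reconstruct head/dependent relations from an analyzeSyntax response
# saved to disk ('response.json' is an illustrative file name).
with open('response.json') as f:
    response = json.load(f)

tokens = response['tokens']
for token in tokens:
    head = tokens[token['dependencyEdge']['headTokenIndex']]
    print('{} <-{}- {}'.format(
        token['text']['content'],
        token['dependencyEdge']['label'],
        head['text']['content']))

The root of each sentence is its own head: its headTokenIndex equals the token's own index in the array.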

gcloud command

Refer to the analyze-syntax command for complete details.

To perform syntax analysis, use the gcloud command-line tool with the --content flag to identify the content to analyze:

gcloud ml language analyze-syntax --content="Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.  Sundar Pichai said in his keynote that users love their new Android phones."

If the request is successful, the server returns a response in JSON format:

{
  "sentences": [
    {
      "text": {
        "content": "Google, headquartered in Mountain View, unveiled the new Android phone at the Consumer Electronic Show.",
        "beginOffset": 0
      }
    },
    {
      "text": {
        "content": "Sundar Pichai said in his keynote that users love their new Android phones.",
        "beginOffset": 105
      }
    }
  ],
  "tokens": [
    {
      "text": {
        "content": "Google",
        "beginOffset": 0
      },
      "partOfSpeech": {
        "tag": "NOUN",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "SINGULAR",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 7,
        "label": "NSUBJ"
      },
      "lemma": "Google"
    },
    ...
    {
      "text": {
        "content": ".",
        "beginOffset": 179
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "NUMBER_UNKNOWN",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 20,
        "label": "P"
      },
      "lemma": "."
    }
  ],
  "language": "en"
}

The tokens array contains Token objects representing the detected sentence tokens, which include information such as a token's part of speech and its position in the sentence.

C#

private static void AnalyzeSyntaxFromText(string text)
{
    var client = LanguageServiceClient.Create();
    var response = client.AnnotateText(new Document()
    {
        Content = text,
        Type = Document.Types.Type.PlainText
    },
    new Features() { ExtractSyntax = true });
    WriteSentences(response.Sentences, response.Tokens);
}

private static void WriteSentences(IEnumerable<Sentence> sentences,
    RepeatedField<Token> tokens)
{
    Console.WriteLine("Sentences:");
    foreach (var sentence in sentences)
    {
        Console.WriteLine($"\t{sentence.Text.BeginOffset}: {sentence.Text.Content}");
    }
    Console.WriteLine("Tokens:");
    foreach (var token in tokens)
    {
        Console.WriteLine($"\t{token.PartOfSpeech.Tag} "
            + $"{token.Text.Content}");
    }
}

Go

func analyzeSyntax(ctx context.Context, client *language.Client, text string) (*languagepb.AnnotateTextResponse, error) {
	return client.AnnotateText(ctx, &languagepb.AnnotateTextRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_Content{
				Content: text,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
		Features: &languagepb.AnnotateTextRequest_Features{
			ExtractSyntax: true,
		},
		EncodingType: languagepb.EncodingType_UTF8,
	})
}

Java

// Instantiate the Language client com.google.cloud.language.v1.LanguageServiceClient
try (LanguageServiceClient language = LanguageServiceClient.create()) {
  Document doc = Document.newBuilder()
      .setContent(text)
      .setType(Type.PLAIN_TEXT)
      .build();
  AnalyzeSyntaxRequest request = AnalyzeSyntaxRequest.newBuilder()
      .setDocument(doc)
      .setEncodingType(EncodingType.UTF16)
      .build();
  // analyze the syntax in the given text
  AnalyzeSyntaxResponse response = language.analyzeSyntax(request);
  // print the response
  for (Token token : response.getTokensList()) {
    System.out.printf("\tText: %s\n", token.getText().getContent());
    System.out.printf("\tBeginOffset: %d\n", token.getText().getBeginOffset());
    System.out.printf("Lemma: %s\n", token.getLemma());
    System.out.printf("PartOfSpeechTag: %s\n", token.getPartOfSpeech().getTag());
    System.out.printf("\tAspect: %s\n", token.getPartOfSpeech().getAspect());
    System.out.printf("\tCase: %s\n", token.getPartOfSpeech().getCase());
    System.out.printf("\tForm: %s\n", token.getPartOfSpeech().getForm());
    System.out.printf("\tGender: %s\n", token.getPartOfSpeech().getGender());
    System.out.printf("\tMood: %s\n", token.getPartOfSpeech().getMood());
    System.out.printf("\tNumber: %s\n", token.getPartOfSpeech().getNumber());
    System.out.printf("\tPerson: %s\n", token.getPartOfSpeech().getPerson());
    System.out.printf("\tProper: %s\n", token.getPartOfSpeech().getProper());
    System.out.printf("\tReciprocity: %s\n", token.getPartOfSpeech().getReciprocity());
    System.out.printf("\tTense: %s\n", token.getPartOfSpeech().getTense());
    System.out.printf("\tVoice: %s\n", token.getPartOfSpeech().getVoice());
    System.out.println("DependencyEdge");
    System.out.printf("\tHeadTokenIndex: %d\n", token.getDependencyEdge().getHeadTokenIndex());
    System.out.printf("\tLabel: %s\n\n", token.getDependencyEdge().getLabel());
  }
  return response.getTokensList();
}

Node.js

// Imports the Google Cloud client library
const language = require('@google-cloud/language');

// Creates a client
const client = new language.LanguageServiceClient();

/**
 * TODO(developer): Uncomment the following line to run this code.
 */
// const text = 'Your text to analyze, e.g. Hello, world!';

// Prepares a document, representing the provided text
const document = {
  content: text,
  type: 'PLAIN_TEXT',
};

// Detects syntax in the document
const [syntax] = await client.analyzeSyntax({document});

console.log('Tokens:');
syntax.tokens.forEach(part => {
  console.log(`${part.partOfSpeech.tag}: ${part.text.content}`);
  console.log(`Morphology:`, part.partOfSpeech);
});

PHP

use Google\Cloud\Language\V1\Document;
use Google\Cloud\Language\V1\Document\Type;
use Google\Cloud\Language\V1\LanguageServiceClient;
use Google\Cloud\Language\V1\PartOfSpeech\Tag;

/** Uncomment and populate these variables in your code */
// $text = 'The text to analyze.';

// Create the Natural Language client
$languageServiceClient = new LanguageServiceClient();

try {
    // Create a new Document, add text as content and set type to PLAIN_TEXT
    $document = (new Document())
        ->setContent($text)
        ->setType(Type::PLAIN_TEXT);

    // Call the analyzeSyntax function
    $response = $languageServiceClient->analyzeSyntax($document, []);
    $tokens = $response->getTokens();
    // Print out information about each token
    foreach ($tokens as $token) {
        printf('Token text: %s' . PHP_EOL, $token->getText()->getContent());
        printf('Token part of speech: %s' . PHP_EOL, Tag::name($token->getPartOfSpeech()->getTag()));
        print(PHP_EOL);
    }
} finally {
    $languageServiceClient->close();
}

Python

import six
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

text = 'President Kennedy spoke at the White House.'

client = language.LanguageServiceClient()

if isinstance(text, six.binary_type):
    text = text.decode('utf-8')

# Instantiates a plain text document.
document = types.Document(
    content=text,
    type=enums.Document.Type.PLAIN_TEXT)

# Detects syntax in the document. You can also analyze HTML with:
#   type=enums.Document.Type.HTML
tokens = client.analyze_syntax(document).tokens

for token in tokens:
    part_of_speech_tag = enums.PartOfSpeech.Tag(token.part_of_speech.tag)
    print(u'{}: {}'.format(part_of_speech_tag.name,
                           token.text.content))

Ruby

# text_content = "Text to analyze syntax of"

require "google/cloud/language"

language = Google::Cloud::Language.new
response = language.analyze_syntax content: text_content, type: :PLAIN_TEXT

sentences = response.sentences
tokens    = response.tokens

puts "Sentences: #{sentences.count}"
puts "Tokens: #{tokens.count}"

tokens.each do |token|
  puts "#{token.part_of_speech.tag} #{token.text.content}"
end

Analyzing Syntax from Google Cloud Storage

For your convenience, the Natural Language API can perform syntactic analysis directly on a file located in Google Cloud Storage, without the need to send the contents of the file in the body of your request.

Here is an example of performing syntactic analysis on a file located in Cloud Storage.

Protocol

To analyze syntax in a document stored in Google Cloud Storage, make a POST request to the documents:analyzeSyntax REST method and provide the appropriate request body with the path to the document as shown in the following example.

curl -X POST \
     -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
     -H "Content-Type: application/json; charset=utf-8" \
     --data "{
  'encodingType': 'UTF8',
  'document': {
    'type': 'PLAIN_TEXT',
    'gcsContentUri': 'gs://<bucket-name>/<object-name>'
  }
}" "https://language.googleapis.com/v1/documents:analyzeSyntax"

If you don't specify document.language, then the language will be automatically detected. For information on which languages are supported by the Natural Language API, see Language Support. See the Document reference documentation for more information on configuring the request body.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "sentences": [
    {
      "text": {
        "content": "Hello, world!",
        "beginOffset": 0
      }
    }
  ],
  "tokens": [
    {
      "text": {
        "content": "Hello",
        "beginOffset": 0
      },
      "partOfSpeech": {
        "tag": "X",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "DISCOURSE"
      },
      "lemma": "Hello"
    },
    {
      "text": {
        "content": ",",
        "beginOffset": 5
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "P"
      },
      "lemma": ","
    },
    // ...
  ],
  "language": "en"
}

The tokens array contains Token objects representing the detected sentence tokens, which include information such as a token's part of speech and its position in the sentence.
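
The response has the same shape as for inline content, so the same post-processing applies. As one more minimal sketch, this Python snippet tallies part-of-speech tags across the returned tokens (again assuming a saved copy of the complete response; the file name is illustrative):

import json
from collections import Counter

# Tally part-of-speech tags from a saved analyzeSyntax response
# ('response.json' is an illustrative file name).
with open('response.json') as f:
    response = json.load(f)

tag_counts = Counter(
    token['partOfSpeech']['tag'] for token in response['tokens'])
for tag, count in tag_counts.most_common():
    print('{}: {}'.format(tag, count))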

gcloud command

Refer to the analyze-syntax command for complete details.

To perform syntax analysis on a file in Google Cloud Storage, use the gcloud command-line tool with the --content-file flag to identify the file path that contains the content to analyze:

gcloud ml language analyze-syntax --content-file=gs://YOUR_BUCKET_NAME/YOUR_FILE_NAME

If the request is successful, the server returns a response in JSON format:

{
  "sentences": [
    {
      "text": {
        "content": "Hello, world!",
        "beginOffset": 0
      }
    }
  ],
  "tokens": [
    {
      "text": {
        "content": "Hello",
        "beginOffset": 0
      },
      "partOfSpeech": {
        "tag": "X",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "DISCOURSE"
      },
      "lemma": "Hello"
    },
    {
      "text": {
        "content": ",",
        "beginOffset": 5
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "P"
      },
      "lemma": ","
    },
    // ...
  ],
  "language": "en"
}

The tokens array contains Token objects representing the detected sentence tokens, which include information such as a token's part of speech and its position in the sentence.

C#

private static void AnalyzeSyntaxFromFile(string gcsUri)
{
    var client = LanguageServiceClient.Create();
    var response = client.AnnotateText(new Document()
    {
        GcsContentUri = gcsUri,
        Type = Document.Types.Type.PlainText
    },
    new Features() { ExtractSyntax = true });
    WriteSentences(response.Sentences, response.Tokens);
}
private static void WriteSentences(IEnumerable<Sentence> sentences,
    RepeatedField<Token> tokens)
{
    Console.WriteLine("Sentences:");
    foreach (var sentence in sentences)
    {
        Console.WriteLine($"\t{sentence.Text.BeginOffset}: {sentence.Text.Content}");
    }
    Console.WriteLine("Tokens:");
    foreach (var token in tokens)
    {
        Console.WriteLine($"\t{token.PartOfSpeech.Tag} "
            + $"{token.Text.Content}");
    }
}

Go

func analyzeSyntaxFromGCS(ctx context.Context, client *language.Client, gcsURI string) (*languagepb.AnnotateTextResponse, error) {
	return client.AnnotateText(ctx, &languagepb.AnnotateTextRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_GcsContentUri{
				GcsContentUri: gcsURI,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
		Features: &languagepb.AnnotateTextRequest_Features{
			ExtractSyntax: true,
		},
		EncodingType: languagepb.EncodingType_UTF8,
	})
}

Java

// Instantiate the Language client com.google.cloud.language.v1.LanguageServiceClient
try (LanguageServiceClient language = LanguageServiceClient.create()) {
  Document doc = Document.newBuilder()
      .setGcsContentUri(gcsUri)
      .setType(Type.PLAIN_TEXT)
      .build();
  AnalyzeSyntaxRequest request = AnalyzeSyntaxRequest.newBuilder()
      .setDocument(doc)
      .setEncodingType(EncodingType.UTF16)
      .build();
  // analyze the syntax in the given text
  AnalyzeSyntaxResponse response = language.analyzeSyntax(request);
  // print the response
  for (Token token : response.getTokensList()) {
    System.out.printf("\tText: %s\n", token.getText().getContent());
    System.out.printf("\tBeginOffset: %d\n", token.getText().getBeginOffset());
    System.out.printf("Lemma: %s\n", token.getLemma());
    System.out.printf("PartOfSpeechTag: %s\n", token.getPartOfSpeech().getTag());
    System.out.printf("\tAspect: %s\n", token.getPartOfSpeech().getAspect());
    System.out.printf("\tCase: %s\n", token.getPartOfSpeech().getCase());
    System.out.printf("\tForm: %s\n", token.getPartOfSpeech().getForm());
    System.out.printf("\tGender: %s\n", token.getPartOfSpeech().getGender());
    System.out.printf("\tMood: %s\n", token.getPartOfSpeech().getMood());
    System.out.printf("\tNumber: %s\n", token.getPartOfSpeech().getNumber());
    System.out.printf("\tPerson: %s\n", token.getPartOfSpeech().getPerson());
    System.out.printf("\tProper: %s\n", token.getPartOfSpeech().getProper());
    System.out.printf("\tReciprocity: %s\n", token.getPartOfSpeech().getReciprocity());
    System.out.printf("\tTense: %s\n", token.getPartOfSpeech().getTense());
    System.out.printf("\tVoice: %s\n", token.getPartOfSpeech().getVoice());
    System.out.println("DependencyEdge");
    System.out.printf("\tHeadTokenIndex: %d\n", token.getDependencyEdge().getHeadTokenIndex());
    System.out.printf("\tLabel: %s\n\n", token.getDependencyEdge().getLabel());
  }

  return response.getTokensList();
}

Node.js

// Imports the Google Cloud client library
const language = require('@google-cloud/language');

// Creates a client
const client = new language.LanguageServiceClient();

/**
 * TODO(developer): Uncomment the following lines to run this code
 */
// const bucketName = 'Your bucket name, e.g. my-bucket';
// const fileName = 'Your file name, e.g. my-file.txt';

// Prepares a document, representing a text file in Cloud Storage
const document = {
  gcsContentUri: `gs://${bucketName}/${fileName}`,
  type: 'PLAIN_TEXT',
};

// Detects syntax in the document
const [syntax] = await client.analyzeSyntax({document});

console.log('Parts of speech:');
syntax.tokens.forEach(part => {
  console.log(`${part.partOfSpeech.tag}: ${part.text.content}`);
  console.log(`Morphology:`, part.partOfSpeech);
});

PHP

use Google\Cloud\Language\V1\Document;
use Google\Cloud\Language\V1\Document\Type;
use Google\Cloud\Language\V1\LanguageServiceClient;
use Google\Cloud\Language\V1\PartOfSpeech\Tag;

/** Uncomment and populate these variables in your code */
// $uri = 'The cloud storage object to analyze (gs://your-bucket-name/your-object-name)';

// Create the Natural Language client
$languageServiceClient = new LanguageServiceClient();

try {
    // Create a new Document, pass GCS URI and set type to PLAIN_TEXT
    $document = (new Document())
        ->setGcsContentUri($uri)
        ->setType(Type::PLAIN_TEXT);

    // Call the analyzeSyntax function
    $response = $languageServiceClient->analyzeSyntax($document, []);
    $tokens = $response->getTokens();
    // Print out information about each token
    foreach ($tokens as $token) {
        printf('Token text: %s' . PHP_EOL, $token->getText()->getContent());
        printf('Token part of speech: %s' . PHP_EOL, Tag::name($token->getPartOfSpeech()->getTag()));
        print(PHP_EOL);
    }
} finally {
    $languageServiceClient->close();
}

Python

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

gcs_uri = 'gs://cloud-samples-data/language/president.txt'

client = language.LanguageServiceClient()

# Instantiates a plain text document.
document = types.Document(
    gcs_content_uri=gcs_uri,
    type=enums.Document.Type.PLAIN_TEXT)

# Detects syntax in the document. You can also analyze HTML with:
#   type=enums.Document.Type.HTML
tokens = client.analyze_syntax(document).tokens

for token in tokens:
    part_of_speech_tag = enums.PartOfSpeech.Tag(token.part_of_speech.tag)
    print(u'{}: {}'.format(part_of_speech_tag.name,
                           token.text.content))

Ruby

# storage_path = "Path to file in Google Cloud Storage, e.g. gs://bucket/file"

require "google/cloud/language"

language = Google::Cloud::Language.new
response = language.analyze_syntax gcs_content_uri: storage_path, type: :PLAIN_TEXT

sentences = response.sentences
tokens    = response.tokens

puts "Sentences: #{sentences.count}"
puts "Tokens: #{tokens.count}"

tokens.each do |token|
  puts "#{token.part_of_speech.tag} #{token.text.content}"
end
