Analyzing Syntax

Syntactic analysis extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries) and providing further analysis on those tokens. Syntactic analysis is performed with the analyzeSyntax method. For information on which languages are supported by the Natural Language API, see Language Support.

This section demonstrates a few ways to detect syntax in a document.

Analyzing Syntax in a String

Here is an example of performing syntactic analysis on a text string sent directly to the Natural Language API:

Protocol

Refer to the documents:analyzeSyntax API endpoint for complete details.

To perform syntactic analysis, make a POST request and provide the appropriate request body:

POST https://language.googleapis.com/v1/documents:analyzeSyntax?key=YOUR_API_KEY
{
  "encodingType": "UTF8",
  "document": {
    "type": "PLAIN_TEXT",
    "content": "Hello, world!"
  }
}

If you don't specify document.language, then the language will be automatically detected. For information on which languages are supported by the Natural Language API, see Language Support. See the Document reference documentation for more information on configuring the request body.
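As a sketch, the same request body with document.language set explicitly might look like this (the "en" value here is illustrative; see Language Support for valid codes):

```json
{
  "encodingType": "UTF8",
  "document": {
    "type": "PLAIN_TEXT",
    "language": "en",
    "content": "Hello, world!"
  }
}
```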

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "sentences": [
    {
      "text": {
        "content": "Hello, world!",
        "beginOffset": 0
      }
    }
  ],
  "tokens": [
    {
      "text": {
        "content": "Hello",
        "beginOffset": 0
      },
      "partOfSpeech": {
        "tag": "X",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "DISCOURSE"
      },
      "lemma": "Hello"
    },
    {
      "text": {
        "content": ",",
        "beginOffset": 5
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "P"
      },
      "lemma": ","
    },
    // ...
  ],
  "language": "en"
}

The tokens array contains Token objects representing the detected sentence tokens, which include information such as a token's part of speech and its position in the sentence.
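Because each token's dependencyEdge.headTokenIndex points at another entry in the same tokens array, you can pair every token with its syntactic head directly from the parsed JSON. The following is a minimal sketch, not part of the official client libraries; the response dict below is a trimmed version of the example response above:

```python
# Trimmed analyzeSyntax response (see the JSON example above).
response = {
    "tokens": [
        {"text": {"content": "Hello"},
         "dependencyEdge": {"headTokenIndex": 2, "label": "DISCOURSE"}},
        {"text": {"content": ","},
         "dependencyEdge": {"headTokenIndex": 2, "label": "P"}},
        {"text": {"content": "world"},
         "dependencyEdge": {"headTokenIndex": 2, "label": "ROOT"}},
        {"text": {"content": "!"},
         "dependencyEdge": {"headTokenIndex": 2, "label": "P"}},
    ],
}

def dependency_pairs(tokens):
    """Return (token, label, head-token) triples from analyzeSyntax tokens."""
    pairs = []
    for token in tokens:
        edge = token["dependencyEdge"]
        # headTokenIndex is an index into the tokens array itself.
        head = tokens[edge["headTokenIndex"]]["text"]["content"]
        pairs.append((token["text"]["content"], edge["label"], head))
    return pairs

for content, label, head in dependency_pairs(response["tokens"]):
    print(f"{content} --{label}--> {head}")
```

Note that the root of the sentence ("world" here) points at itself, which is how the API encodes the ROOT edge.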

C#

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

private static void AnalyzeSyntaxFromText(string text)
{
    var client = LanguageServiceClient.Create();
    var response = client.AnnotateText(new Document()
    {
        Content = text,
        Type = Document.Types.Type.PlainText
    },
    new Features() { ExtractSyntax = true });
    WriteSentences(response.Sentences, response.Tokens);
}

private static void WriteSentences(IEnumerable<Sentence> sentences,
    RepeatedField<Token> tokens)
{
    Console.WriteLine("Sentences:");
    foreach (var sentence in sentences)
    {
        Console.WriteLine($"\t{sentence.Text.BeginOffset}: {sentence.Text.Content}");
    }
    Console.WriteLine("Tokens:");
    foreach (var token in tokens)
    {
        Console.WriteLine($"{token.PartOfSpeech.Tag} "
            + $"{token.Text.Content}");
    }
}

Go

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

func analyzeSyntax(ctx context.Context, client *language.Client, text string) (*languagepb.AnnotateTextResponse, error) {
	return client.AnnotateText(ctx, &languagepb.AnnotateTextRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_Content{
				Content: text,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
		Features: &languagepb.AnnotateTextRequest_Features{
			ExtractSyntax: true,
		},
		EncodingType: languagepb.EncodingType_UTF8,
	})
}

Java

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

public List<Token> analyzeSyntaxText(String text) throws IOException {
  Document doc = Document.newBuilder()
      .setContent(text).setType(Type.PLAIN_TEXT).build();
  AnalyzeSyntaxRequest request = AnalyzeSyntaxRequest.newBuilder()
      .setDocument(doc)
      .setEncodingType(EncodingType.UTF16).build();
  AnalyzeSyntaxResponse response = languageApi.analyzeSyntax(request);
  return response.getTokensList();
}

Node.js

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

// Imports the Google Cloud client library
const Language = require('@google-cloud/language');

// Instantiates a client
const language = Language();

// The text to analyze, e.g. "Hello, world!"
// const text = 'Hello, world!';

// Instantiates a Document, representing the provided text
const document = language.document({ content: text });

// Detects syntax in the document
document.detectSyntax()
  .then((results) => {
    const syntax = results[0];

    console.log('Parts of speech:');
    syntax.forEach((part) => {
      console.log(`${part.partOfSpeech.tag}: ${part.text.content}`);
      console.log(`Morphology:`, part.partOfSpeech);
    });
  })
  .catch((err) => {
    console.error('ERROR:', err);
  });

PHP

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

use Google\Cloud\NaturalLanguage\NaturalLanguageClient;
use Google\Cloud\NaturalLanguage\Annotation;

/**
 * Find the syntax in text.
 * ```
 * analyze_syntax('Do you know the way to San Jose?');
 * ```
 *
 * @param string $text The text to analyze.
 *
 * @return Annotation
 */
function analyze_syntax($text, $options = [])
{
    $language = new NaturalLanguageClient();
    $annotation = $language->analyzeSyntax($text, $options);
    return $annotation;
}

Python

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

def syntax_text(text):
    """Detects syntax in the text."""
    language_client = language.Client()

    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    # Instantiates a plain text document.
    document = language_client.document_from_text(text)

    # Detects syntax in the document. You can also analyze HTML with:
    #   document.doc_type = language.Document.HTML
    tokens = document.analyze_syntax().tokens

    for token in tokens:
        print(u'{}: {}'.format(token.part_of_speech, token.text_content))

Ruby

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

# project_id   = "Your Google Cloud project ID"
# text_content = "Text to analyze syntax of"

require "google/cloud/language"

language = Google::Cloud::Language.new project: project_id
document = language.document text_content
syntax   = document.syntax

puts "Sentences: #{syntax.sentences.count}"
puts "Tokens: #{syntax.tokens.count}"

syntax.tokens.each do |token|
  puts "#{token.part_of_speech.tag} #{token.text_span.text}"
end

Analyzing Syntax in a Remote File

For your convenience, the Natural Language API can perform syntactic analysis directly on a file located in Google Cloud Storage, without the need to send the contents of the file in the body of your request.

Here is an example of performing syntactic analysis on a file located in Cloud Storage.

Protocol

Refer to the documents:analyzeSyntax API endpoint for complete details.

To perform syntactic analysis on a file in Google Cloud Storage, make a POST request and provide the appropriate request body:

POST https://language.googleapis.com/v1/documents:analyzeSyntax?key=YOUR_API_KEY
{
  "encodingType": "UTF8",
  "document": {
    "type": "PLAIN_TEXT",
    "gcsContentUri": "gs://your-bucket-name/your-file.txt"
  }
}

If you don't specify document.language, then the language will be automatically detected. For information on which languages are supported by the Natural Language API, see Language Support. See the Document reference documentation for more information on configuring the request body.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "sentences": [
    {
      "text": {
        "content": "Hello, world!",
        "beginOffset": 0
      }
    }
  ],
  "tokens": [
    {
      "text": {
        "content": "Hello",
        "beginOffset": 0
      },
      "partOfSpeech": {
        "tag": "X",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "DISCOURSE"
      },
      "lemma": "Hello"
    },
    {
      "text": {
        "content": ",",
        "beginOffset": 5
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "P"
      },
      "lemma": ","
    },
    // ...
  ],
  "language": "en"
}

The tokens array contains Token objects representing the detected sentence tokens, which include information such as a token's part of speech and its position in the sentence.

C#

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

private static void AnalyzeSyntaxFromFile(string gcsUri)
{
    var client = LanguageServiceClient.Create();
    var response = client.AnnotateText(new Document()
    {
        GcsContentUri = gcsUri,
        Type = Document.Types.Type.PlainText
    },
    new Features() { ExtractSyntax = true });
    WriteSentences(response.Sentences, response.Tokens);
}

private static void WriteSentences(IEnumerable<Sentence> sentences,
    RepeatedField<Token> tokens)
{
    Console.WriteLine("Sentences:");
    foreach (var sentence in sentences)
    {
        Console.WriteLine($"\t{sentence.Text.BeginOffset}: {sentence.Text.Content}");
    }
    Console.WriteLine("Tokens:");
    foreach (var token in tokens)
    {
        Console.WriteLine($"{token.PartOfSpeech.Tag} "
            + $"{token.Text.Content}");
    }
}

Go

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

func analyzeSyntaxFromGCS(ctx context.Context, client *language.Client, gcsURI string) (*languagepb.AnnotateTextResponse, error) {
	return client.AnnotateText(ctx, &languagepb.AnnotateTextRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_GcsContentUri{
				GcsContentUri: gcsURI,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
		Features: &languagepb.AnnotateTextRequest_Features{
			ExtractSyntax: true,
		},
		EncodingType: languagepb.EncodingType_UTF8,
	})
}

Java

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

public List<Token> analyzeSyntaxFile(String path) throws IOException {
  Document doc = Document.newBuilder()
      .setGcsContentUri(path).setType(Type.PLAIN_TEXT).build();
  AnalyzeSyntaxRequest request = AnalyzeSyntaxRequest.newBuilder()
      .setDocument(doc)
      .setEncodingType(EncodingType.UTF16).build();
  AnalyzeSyntaxResponse response = languageApi.analyzeSyntax(request);
  return response.getTokensList();
}

Node.js

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

// Imports the Google Cloud client libraries
const Language = require('@google-cloud/language');
const Storage = require('@google-cloud/storage');

// Instantiates the clients
const language = Language();
const storage = Storage();

// The name of the bucket where the file resides, e.g. "my-bucket"
// const bucketName = 'my-bucket';

// The name of the file to analyze, e.g. "file.txt"
// const fileName = 'file.txt';

// Instantiates a Document, representing a text file in Cloud Storage
const document = language.document({
  // The Google Cloud Storage file
  content: storage.bucket(bucketName).file(fileName)
});

// Detects syntax in the document
document.detectSyntax()
  .then((results) => {
    const syntax = results[0];

    console.log('Parts of speech:');
    syntax.forEach((part) => {
      console.log(`${part.partOfSpeech.tag}: ${part.text.content}`);
      console.log(`Morphology:`, part.partOfSpeech);
    });
  })
  .catch((err) => {
    console.error('ERROR:', err);
  });

PHP

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

use Google\Cloud\NaturalLanguage\NaturalLanguageClient;
use Google\Cloud\NaturalLanguage\Annotation;
use Google\Cloud\Storage\StorageClient;

/**
 * Find the syntax in text stored in a Cloud Storage bucket.
 * ```
 * analyze_syntax_from_file('my-bucket', 'file_with_text.txt');
 * ```
 *
 * @param string $bucketName The Cloud Storage bucket.
 * @param string $objectName The Cloud Storage object with text.
 *
 * @return Annotation
 */
function analyze_syntax_from_file($bucketName, $objectName, $options = [])
{
    // Create the Cloud Storage object
    $storage = new StorageClient();
    $bucket = $storage->bucket($bucketName);
    $storageObject = $bucket->object($objectName);

    // Call the Natural Language client
    $language = new NaturalLanguageClient();
    $annotation = $language->analyzeSyntax($storageObject, $options);
    return $annotation;
}

Python

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

def syntax_file(gcs_uri):
    """Detects syntax in the file located in Google Cloud Storage."""
    language_client = language.Client()

    # Instantiates a plain text document.
    document = language_client.document_from_url(gcs_uri)

    # Detects syntax in the document. You can also analyze HTML with:
    #   document.doc_type = language.Document.HTML
    tokens = document.analyze_syntax().tokens

    for token in tokens:
        print(u'{}: {}'.format(token.part_of_speech, token.text_content))

Ruby

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

# project_id   = "Your Google Cloud project ID"
# storage_path = "Path to file in Google Cloud Storage, e.g. gs://bucket/file"

require "google/cloud/language"

language = Google::Cloud::Language.new project: project_id
document = language.document storage_path
syntax   = document.syntax

puts "Sentences: #{syntax.sentences.count}"
puts "Tokens: #{syntax.tokens.count}"

syntax.tokens.each do |token|
  puts "#{token.part_of_speech.tag} #{token.text_span.text}"
end
