Analyzing Syntax

Syntactic analysis extracts linguistic information, breaking the given text into a series of sentences and tokens (generally, word boundaries) and providing further analysis of those tokens. Syntactic analysis is performed with the analyzeSyntax method. For information on which languages are supported by the Natural Language API, see Language Support.

This section demonstrates a few ways to detect syntax in a document.

Analyzing Syntax in a String

Here is an example of performing syntactic analysis on a text string sent directly to the Natural Language API:

Protocol

Refer to the documents:analyzeSyntax API endpoint for complete details.

To perform syntactic analysis, make a POST request and provide the appropriate request body:

POST https://language.googleapis.com/v1/documents:analyzeSyntax?key=YOUR_API_KEY
{
  "encodingType": "UTF8",
  "document": {
    "type": "PLAIN_TEXT",
    "content": "Hello, world!"
  }
}

If you don't specify document.language, the language is detected automatically. For information on which languages are supported by the Natural Language API, see Language Support. See the Document reference documentation for more information on configuring the request body.
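
The request body above can also be assembled programmatically. A minimal Python sketch (a hypothetical helper using only the standard library; it builds the JSON body but does not call the API):

```python
import json

# Hypothetical helper: build the analyzeSyntax request body shown
# above. Passing language (e.g. "en") sets document.language
# explicitly; omitting it leaves language detection to the API.
def build_request(content, language=None):
    document = {"type": "PLAIN_TEXT", "content": content}
    if language is not None:
        document["language"] = language
    return json.dumps({"encodingType": "UTF8", "document": document})

body = build_request("Hello, world!")
```

The resulting string is what you would send as the POST body to the documents:analyzeSyntax endpoint.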

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "sentences": [
    {
      "text": {
        "content": "Hello, world!",
        "beginOffset": 0
      }
    }
  ],
  "tokens": [
    {
      "text": {
        "content": "Hello",
        "beginOffset": 0
      },
      "partOfSpeech": {
        "tag": "X",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "DISCOURSE"
      },
      "lemma": "Hello"
    },
    {
      "text": {
        "content": ",",
        "beginOffset": 5
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "P"
      },
      "lemma": ","
    },
    // ...
  ],
  "language": "en"
}

The tokens array contains Token objects representing the detected sentence tokens, which include information such as a token's part of speech and its position in the sentence.
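
Each headTokenIndex indexes into this same tokens array, so the dependency tree can be walked directly on the parsed response. A sketch in plain Python (no client library; the third token below stands in for the part elided with // ... above, so its exact fields are an assumption):

```python
# Tokens from the sample response, as parsed JSON. The third entry
# is filled in by hand for the elided "// ..." portion above.
tokens = [
    {"text": {"content": "Hello"},
     "partOfSpeech": {"tag": "X"},
     "dependencyEdge": {"headTokenIndex": 2, "label": "DISCOURSE"}},
    {"text": {"content": ","},
     "partOfSpeech": {"tag": "PUNCT"},
     "dependencyEdge": {"headTokenIndex": 2, "label": "P"}},
    {"text": {"content": "world"},
     "partOfSpeech": {"tag": "NOUN"},
     "dependencyEdge": {"headTokenIndex": 2, "label": "ROOT"}},
]

# Resolve each token's head: a token whose head is itself
# (here "world") is the root of the sentence.
edges = []
for token in tokens:
    head = tokens[token["dependencyEdge"]["headTokenIndex"]]
    edges.append((token["text"]["content"],
                  token["dependencyEdge"]["label"],
                  head["text"]["content"]))
```

Both "Hello" and "," attach to the token at index 2, which is why their headTokenIndex values in the sample response are the same.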

gcloud

Refer to the analyze-syntax command for complete details.

To perform syntax analysis, use the gcloud command-line tool with the --content flag to identify the content to analyze:

gcloud ml language analyze-syntax --content="Hello, world!"

If the request is successful, the server returns a response in JSON format:

{
  "sentences": [
    {
      "text": {
        "content": "Hello, world!",
        "beginOffset": 0
      }
    }
  ],
  "tokens": [
    {
      "text": {
        "content": "Hello",
        "beginOffset": 0
      },
      "partOfSpeech": {
        "tag": "X",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "DISCOURSE"
      },
      "lemma": "Hello"
    },
    {
      "text": {
        "content": ",",
        "beginOffset": 5
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "P"
      },
      "lemma": ","
    },
    // ...
  ],
  "language": "en"
}

The tokens array contains Token objects representing the detected sentence tokens, which include information such as a token's part of speech and its position in the sentence.

C#

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

// Requires: using Google.Cloud.Language.V1; and
// using Google.Protobuf.Collections; (for RepeatedField<Token>)
private static void AnalyzeSyntaxFromText(string text)
{
    var client = LanguageServiceClient.Create();
    var response = client.AnnotateText(new Document()
    {
        Content = text,
        Type = Document.Types.Type.PlainText
    },
    new Features() { ExtractSyntax = true });
    WriteSentences(response.Sentences, response.Tokens);
}

private static void WriteSentences(IEnumerable<Sentence> sentences,
    RepeatedField<Token> tokens)
{
    Console.WriteLine("Sentences:");
    foreach (var sentence in sentences)
    {
        Console.WriteLine($"\t{sentence.Text.BeginOffset}: {sentence.Text.Content}");
    }
    Console.WriteLine("Tokens:");
    foreach (var token in tokens)
    {
        Console.WriteLine($"\t{token.PartOfSpeech.Tag} "
            + $"{token.Text.Content}");
    }
}

Go

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

// Requires the client library imports:
//   language "cloud.google.com/go/language/apiv1"
//   languagepb "google.golang.org/genproto/googleapis/cloud/language/v1"
func analyzeSyntax(ctx context.Context, client *language.Client, text string) (*languagepb.AnnotateTextResponse, error) {
	return client.AnnotateText(ctx, &languagepb.AnnotateTextRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_Content{
				Content: text,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
		Features: &languagepb.AnnotateTextRequest_Features{
			ExtractSyntax: true,
		},
		EncodingType: languagepb.EncodingType_UTF8,
	})
}

Java

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

// Instantiate the Language client (com.google.cloud.language.v1.LanguageServiceClient)
try (LanguageServiceClient language = LanguageServiceClient.create()) {
  Document doc = Document.newBuilder()
      .setContent(text)
      .setType(Type.PLAIN_TEXT)
      .build();
  AnalyzeSyntaxRequest request = AnalyzeSyntaxRequest.newBuilder()
      .setDocument(doc)
      .setEncodingType(EncodingType.UTF16)
      .build();
  // analyze the syntax in the given text
  AnalyzeSyntaxResponse response = language.analyzeSyntax(request);
  // print the response
  for (Token token : response.getTokensList()) {
    System.out.printf("\tText: %s\n", token.getText().getContent());
    System.out.printf("\tBeginOffset: %d\n", token.getText().getBeginOffset());
    System.out.printf("Lemma: %s\n", token.getLemma());
    System.out.printf("PartOfSpeechTag: %s\n", token.getPartOfSpeech().getTag());
    System.out.printf("\tAspect: %s\n", token.getPartOfSpeech().getAspect());
    System.out.printf("\tCase: %s\n", token.getPartOfSpeech().getCase());
    System.out.printf("\tForm: %s\n", token.getPartOfSpeech().getForm());
    System.out.printf("\tGender: %s\n", token.getPartOfSpeech().getGender());
    System.out.printf("\tMood: %s\n", token.getPartOfSpeech().getMood());
    System.out.printf("\tNumber: %s\n", token.getPartOfSpeech().getNumber());
    System.out.printf("\tPerson: %s\n", token.getPartOfSpeech().getPerson());
    System.out.printf("\tProper: %s\n", token.getPartOfSpeech().getProper());
    System.out.printf("\tReciprocity: %s\n", token.getPartOfSpeech().getReciprocity());
    System.out.printf("\tTense: %s\n", token.getPartOfSpeech().getTense());
    System.out.printf("\tVoice: %s\n", token.getPartOfSpeech().getVoice());
    System.out.println("DependencyEdge");
    System.out.printf("\tHeadTokenIndex: %d\n", token.getDependencyEdge().getHeadTokenIndex());
    System.out.printf("\tLabel: %s\n\n", token.getDependencyEdge().getLabel());
  }
  return response.getTokensList();
}

Node.js

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

// Imports the Google Cloud client library
const language = require('@google-cloud/language');

// Creates a client
const client = new language.LanguageServiceClient();

/**
 * TODO(developer): Uncomment the following line to run this code.
 */
// const text = 'Your text to analyze, e.g. Hello, world!';

// Prepares a document, representing the provided text
const document = {
  content: text,
  type: 'PLAIN_TEXT',
};

// Detects syntax in the document
client
  .analyzeSyntax({document: document})
  .then(results => {
    const syntax = results[0];

    console.log('Tokens:');
    syntax.tokens.forEach(part => {
      console.log(`${part.partOfSpeech.tag}: ${part.text.content}`);
      console.log(`Morphology:`, part.partOfSpeech);
    });
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

namespace Google\Cloud\Samples\Language;

use Google\Cloud\Language\LanguageClient;

/**
 * Find the syntax in text.
 * ```
 * analyze_syntax('Do you know the way to San Jose?');
 * ```
 *
 * @param string $text The text to analyze.
 * @param string $projectId (optional) Your Google Cloud Project ID
 *
 */
function analyze_syntax($text, $projectId = null)
{
    // Create the Natural Language client
    $language = new LanguageClient([
        'projectId' => $projectId,
    ]);

    // Call the analyzeSyntax function
    $annotation = $language->analyzeSyntax($text);

    // Print syntax information. See https://cloud.google.com/natural-language/docs/reference/rest/v1/Token
    // to learn about more information you can extract from Token objects.
    $tokens = $annotation->tokens();
    foreach ($tokens as $token) {
        printf('Token text: %s' . PHP_EOL, $token['text']['content']);
        printf('Token part of speech: %s' . PHP_EOL, $token['partOfSpeech']['tag']);
        printf(PHP_EOL);
    }
}

Python

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
import six


def syntax_text(text):
    """Detects syntax in the text."""
    client = language.LanguageServiceClient()

    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    # Instantiates a plain text document.
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects syntax in the document. You can also analyze HTML by
    # constructing the document with type=enums.Document.Type.HTML.
    tokens = client.analyze_syntax(document).tokens

    # part-of-speech tags from enums.PartOfSpeech.Tag
    pos_tag = ('UNKNOWN', 'ADJ', 'ADP', 'ADV', 'CONJ', 'DET', 'NOUN', 'NUM',
               'PRON', 'PRT', 'PUNCT', 'VERB', 'X', 'AFFIX')

    for token in tokens:
        print(u'{}: {}'.format(pos_tag[token.part_of_speech.tag],
                               token.text.content))

Ruby

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

# text_content = "Text to analyze syntax of"

require "google/cloud/language"

language = Google::Cloud::Language.new
response = language.analyze_syntax content: text_content, type: :PLAIN_TEXT

sentences = response.sentences
tokens    = response.tokens

puts "Sentences: #{sentences.count}"
puts "Tokens: #{tokens.count}"

tokens.each do |token|
  puts "#{token.part_of_speech.tag} #{token.text.content}"
end

Analyzing Syntax in a Remote File

For your convenience, the Natural Language API can perform syntactic analysis directly on a file located in Google Cloud Storage, without the need to send the contents of the file in the body of your request.
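
Relative to the inline-content request shown earlier, only the document source changes: gcsContentUri replaces content. A sketch of the two body variants (a hypothetical helper, no API call; the bucket and file names are placeholders):

```python
def build_document(content=None, gcs_uri=None):
    # Exactly one source may be set: inline content or a Cloud
    # Storage URI of the form gs://bucket-name/object-name.
    if (content is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of content or gcs_uri")
    document = {"type": "PLAIN_TEXT"}
    if content is not None:
        document["content"] = content
    else:
        document["gcsContentUri"] = gcs_uri
    return {"encodingType": "UTF8", "document": document}

inline = build_document(content="Hello, world!")
remote = build_document(gcs_uri="gs://your-bucket-name/your-file.txt")
```

Everything else about the request, and the shape of the response, is the same as in the inline-content case.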

Here is an example of performing syntactic analysis on a file located in Cloud Storage.

Protocol

Refer to the documents:analyzeSyntax API endpoint for complete details.

To perform syntactic analysis on a file in Google Cloud Storage, make a POST request and provide the appropriate request body:

POST https://language.googleapis.com/v1/documents:analyzeSyntax?key=YOUR_API_KEY
{
  "encodingType": "UTF8",
  "document": {
    "type": "PLAIN_TEXT",
    "gcsContentUri": "gs://your-bucket-name/your-file.txt"
  }
}

If you don't specify document.language, the language is detected automatically. For information on which languages are supported by the Natural Language API, see Language Support. See the Document reference documentation for more information on configuring the request body.

If the request is successful, the server returns a 200 OK HTTP status code and the response in JSON format:

{
  "sentences": [
    {
      "text": {
        "content": "Hello, world!",
        "beginOffset": 0
      }
    }
  ],
  "tokens": [
    {
      "text": {
        "content": "Hello",
        "beginOffset": 0
      },
      "partOfSpeech": {
        "tag": "X",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "DISCOURSE"
      },
      "lemma": "Hello"
    },
    {
      "text": {
        "content": ",",
        "beginOffset": 5
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "P"
      },
      "lemma": ","
    },
    // ...
  ],
  "language": "en"
}

The tokens array contains Token objects representing the detected sentence tokens, which include information such as a token's part of speech and its position in the sentence.

gcloud

Refer to the analyze-syntax command for complete details.

To perform syntax analysis on a file in Google Cloud Storage, use the gcloud command-line tool with the --content-file flag to identify the Cloud Storage path of the file to analyze:

gcloud ml language analyze-syntax --content-file=gs://YOUR_BUCKET_NAME/YOUR_FILE_NAME

If the request is successful, the server returns a response in JSON format:

{
  "sentences": [
    {
      "text": {
        "content": "Hello, world!",
        "beginOffset": 0
      }
    }
  ],
  "tokens": [
    {
      "text": {
        "content": "Hello",
        "beginOffset": 0
      },
      "partOfSpeech": {
        "tag": "X",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "DISCOURSE"
      },
      "lemma": "Hello"
    },
    {
      "text": {
        "content": ",",
        "beginOffset": 5
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        // ...
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "P"
      },
      "lemma": ","
    },
    // ...
  ],
  "language": "en"
}

The tokens array contains Token objects representing the detected sentence tokens, which include information such as a token's part of speech and its position in the sentence.

C#

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

private static void AnalyzeSyntaxFromFile(string gcsUri)
{
    var client = LanguageServiceClient.Create();
    var response = client.AnnotateText(new Document()
    {
        GcsContentUri = gcsUri,
        Type = Document.Types.Type.PlainText
    },
    new Features() { ExtractSyntax = true });
    WriteSentences(response.Sentences, response.Tokens);
}

private static void WriteSentences(IEnumerable<Sentence> sentences,
    RepeatedField<Token> tokens)
{
    Console.WriteLine("Sentences:");
    foreach (var sentence in sentences)
    {
        Console.WriteLine($"\t{sentence.Text.BeginOffset}: {sentence.Text.Content}");
    }
    Console.WriteLine("Tokens:");
    foreach (var token in tokens)
    {
        Console.WriteLine($"\t{token.PartOfSpeech.Tag} "
            + $"{token.Text.Content}");
    }
}

Go

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

func analyzeSyntaxFromGCS(ctx context.Context, client *language.Client, gcsURI string) (*languagepb.AnnotateTextResponse, error) {
	return client.AnnotateText(ctx, &languagepb.AnnotateTextRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_GcsContentUri{
				GcsContentUri: gcsURI,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
		Features: &languagepb.AnnotateTextRequest_Features{
			ExtractSyntax: true,
		},
		EncodingType: languagepb.EncodingType_UTF8,
	})
}

Java

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

// Instantiate the Language client (com.google.cloud.language.v1.LanguageServiceClient)
try (LanguageServiceClient language = LanguageServiceClient.create()) {
  Document doc = Document.newBuilder()
      .setGcsContentUri(gcsUri)
      .setType(Type.PLAIN_TEXT)
      .build();
  AnalyzeSyntaxRequest request = AnalyzeSyntaxRequest.newBuilder()
      .setDocument(doc)
      .setEncodingType(EncodingType.UTF16)
      .build();
  // analyze the syntax in the given text
  AnalyzeSyntaxResponse response = language.analyzeSyntax(request);
  // print the response
  for (Token token : response.getTokensList()) {
    System.out.printf("\tText: %s\n", token.getText().getContent());
    System.out.printf("\tBeginOffset: %d\n", token.getText().getBeginOffset());
    System.out.printf("Lemma: %s\n", token.getLemma());
    System.out.printf("PartOfSpeechTag: %s\n", token.getPartOfSpeech().getTag());
    System.out.printf("\tAspect: %s\n", token.getPartOfSpeech().getAspect());
    System.out.printf("\tCase: %s\n", token.getPartOfSpeech().getCase());
    System.out.printf("\tForm: %s\n", token.getPartOfSpeech().getForm());
    System.out.printf("\tGender: %s\n", token.getPartOfSpeech().getGender());
    System.out.printf("\tMood: %s\n", token.getPartOfSpeech().getMood());
    System.out.printf("\tNumber: %s\n", token.getPartOfSpeech().getNumber());
    System.out.printf("\tPerson: %s\n", token.getPartOfSpeech().getPerson());
    System.out.printf("\tProper: %s\n", token.getPartOfSpeech().getProper());
    System.out.printf("\tReciprocity: %s\n", token.getPartOfSpeech().getReciprocity());
    System.out.printf("\tTense: %s\n", token.getPartOfSpeech().getTense());
    System.out.printf("\tVoice: %s\n", token.getPartOfSpeech().getVoice());
    System.out.println("DependencyEdge");
    System.out.printf("\tHeadTokenIndex: %d\n", token.getDependencyEdge().getHeadTokenIndex());
    System.out.printf("\tLabel: %s\n\n", token.getDependencyEdge().getLabel());
  }

  return response.getTokensList();
}

Node.js

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

// Imports the Google Cloud client library
const language = require('@google-cloud/language');

// Creates a client
const client = new language.LanguageServiceClient();

/**
 * TODO(developer): Uncomment the following lines to run this code
 */
// const bucketName = 'Your bucket name, e.g. my-bucket';
// const fileName = 'Your file name, e.g. my-file.txt';

// Prepares a document, representing a text file in Cloud Storage
const document = {
  gcsContentUri: `gs://${bucketName}/${fileName}`,
  type: 'PLAIN_TEXT',
};

// Detects syntax in the document
client
  .analyzeSyntax({document: document})
  .then(results => {
    const syntax = results[0];

    console.log('Parts of speech:');
    syntax.tokens.forEach(part => {
      console.log(`${part.partOfSpeech.tag}: ${part.text.content}`);
      console.log(`Morphology:`, part.partOfSpeech);
    });
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

PHP

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

namespace Google\Cloud\Samples\Language;

use Google\Cloud\Language\LanguageClient;
use Google\Cloud\Storage\StorageClient;

/**
 * Find the syntax in text stored in a Cloud Storage bucket.
 * ```
 * analyze_syntax_from_file('my-bucket', 'file_with_text.txt');
 * ```
 *
 * @param string $bucketName The Cloud Storage bucket.
 * @param string $objectName The Cloud Storage object with text.
 * @param string $projectId (optional) Your Google Cloud Project ID
 *
 */
function analyze_syntax_from_file($bucketName, $objectName, $projectId = null)
{
    // Create the Cloud Storage object
    $storage = new StorageClient();
    $bucket = $storage->bucket($bucketName);
    $storageObject = $bucket->object($objectName);

    // Create the Natural Language client
    $language = new LanguageClient([
        'projectId' => $projectId,
    ]);

    // Call the analyzeSyntax function
    $annotation = $language->analyzeSyntax($storageObject);

    // Print syntax information. See https://cloud.google.com/natural-language/docs/reference/rest/v1/Token
    // to learn about more information you can extract from Token objects.
    $tokens = $annotation->tokens();
    foreach ($tokens as $token) {
        printf('Token text: %s' . PHP_EOL, $token['text']['content']);
        printf('Token part of speech: %s' . PHP_EOL, $token['partOfSpeech']['tag']);
        printf(PHP_EOL);
    }
}

Python

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types


def syntax_file(gcs_uri):
    """Detects syntax in the file located in Google Cloud Storage."""
    client = language.LanguageServiceClient()

    # Instantiates a plain text document.
    document = types.Document(
        gcs_content_uri=gcs_uri,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects syntax in the document. You can also analyze HTML by
    # constructing the document with type=enums.Document.Type.HTML.
    tokens = client.analyze_syntax(document).tokens

    # part-of-speech tags from enums.PartOfSpeech.Tag
    pos_tag = ('UNKNOWN', 'ADJ', 'ADP', 'ADV', 'CONJ', 'DET', 'NOUN', 'NUM',
               'PRON', 'PRT', 'PUNCT', 'VERB', 'X', 'AFFIX')

    for token in tokens:
        print(u'{}: {}'.format(pos_tag[token.part_of_speech.tag],
                               token.text.content))

Ruby

For more on installing and creating a Natural Language API client, refer to Natural Language API Client Libraries.

# storage_path = "Path to file in Google Cloud Storage, e.g. gs://bucket/file"

require "google/cloud/language"

language = Google::Cloud::Language.new
response = language.analyze_syntax gcs_content_uri: storage_path, type: :PLAIN_TEXT

sentences = response.sentences
tokens    = response.tokens

puts "Sentences: #{sentences.count}"
puts "Tokens: #{tokens.count}"

tokens.each do |token|
  puts "#{token.part_of_speech.tag} #{token.text.content}"
end

Google Cloud Natural Language API Documentation