Classify content of a Cloud Storage file

Analyze a file stored in Google Cloud Storage and return a list of content categories that apply to the text found in the document

Explore further

For detailed documentation that includes this code sample, see the following:

Code sample

Go

To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Go API reference documentation.

To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


func classifyTextFromGCS(ctx context.Context, gcsURI string) (*languagepb.ClassifyTextResponse, error) {
	return client.ClassifyText(ctx, &languagepb.ClassifyTextRequest{
		Document: &languagepb.Document{
			Source: &languagepb.Document_GcsContentUri{
				GcsContentUri: gcsURI,
			},
			Type: languagepb.Document_PLAIN_TEXT,
		},
	})
}

Java

To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Java API reference documentation.

To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Instantiate the Language client com.google.cloud.language.v2.LanguageServiceClient
try (LanguageServiceClient language = LanguageServiceClient.create()) {
  // Set the GCS content URI path
  Document doc =
      Document.newBuilder().setGcsContentUri(gcsUri).setType(Type.PLAIN_TEXT).build();
  ClassifyTextRequest request = ClassifyTextRequest.newBuilder().setDocument(doc).build();
  // Detect categories in the given file
  ClassifyTextResponse response = language.classifyText(request);

  for (ClassificationCategory category : response.getCategoriesList()) {
    System.out.printf(
        "Category name : %s, Confidence : %.3f\n",
        category.getName(), category.getConfidence());
  }
}

Node.js

To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Node.js API reference documentation.

To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Imports the Google Cloud client library.
const language = require('@google-cloud/language');

// Creates a client.
const client = new language.LanguageServiceClient();

/**
 * TODO(developer): Uncomment the following lines to run this code
 */
// const bucketName = 'Your bucket name, e.g. my-bucket';
// const fileName = 'Your file name, e.g. my-file.txt';

// Prepares a document, representing a text file in Cloud Storage
const document = {
  gcsContentUri: `gs://${bucketName}/${fileName}`,
  type: 'PLAIN_TEXT',
};

// Classifies text in the document
const [classification] = await client.classifyText({document});

console.log('Categories:');
classification.categories.forEach(category => {
  console.log(`Name: ${category.name}, Confidence: ${category.confidence}`);
});

PHP

To learn how to install and use the client library for Natural Language, see Natural Language client libraries.

To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

use Google\Cloud\Language\V1\ClassifyTextRequest;
use Google\Cloud\Language\V1\Client\LanguageServiceClient;
use Google\Cloud\Language\V1\Document;
use Google\Cloud\Language\V1\Document\Type;

/**
 * @param string $uri The cloud storage object to analyze (gs://your-bucket-name/your-object-name)
 */
function classify_text_from_file(string $uri): void
{
    $languageServiceClient = new LanguageServiceClient();

    // Create a new Document, pass GCS URI and set type to PLAIN_TEXT
    $document = (new Document())
        ->setGcsContentUri($uri)
        ->setType(Type::PLAIN_TEXT);

    // Call the analyzeSentiment function
    $request = (new ClassifyTextRequest())
        ->setDocument($document);
    $response = $languageServiceClient->classifyText($request);
    $categories = $response->getCategories();
    // Print document information
    foreach ($categories as $category) {
        printf('Category Name: %s' . PHP_EOL, $category->getName());
        printf('Confidence: %s' . PHP_EOL, $category->getConfidence());
        print(PHP_EOL);
    }
}

Python

To learn how to install and use the client library for Natural Language, see Natural Language client libraries. For more information, see the Natural Language Python API reference documentation.

To authenticate to Natural Language, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import language_v1


def sample_classify_text(gcs_content_uri):
    """
    Classifying Content in text file stored in Cloud Storage

    Args:
      gcs_content_uri Google Cloud Storage URI where the file content is located.
      e.g. gs://[Your Bucket]/[Path to File]
      The text file must include at least 20 words.
    """

    client = language_v1.LanguageServiceClient()

    # gcs_content_uri = 'gs://cloud-samples-data/language/classify-entertainment.txt'

    # Available types: PLAIN_TEXT, HTML
    type_ = language_v1.Document.Type.PLAIN_TEXT

    # Optional. If not specified, the language is automatically detected.
    # For list of supported languages:
    # https://cloud.google.com/natural-language/docs/languages
    language = "en"
    document = {
        "gcs_content_uri": gcs_content_uri,
        "type_": type_,
        "language": language,
    }

    response = client.classify_text(request={"document": document})
    # Loop through classified categories returned from the API
    for category in response.categories:
        # Get the name of the category representing the document.
        # See the predefined taxonomy of categories:
        # https://cloud.google.com/natural-language/docs/categories
        print(f"Category name: {category.name}")
        # Get the confidence. Number representing how certain the classifier
        # is that this category represents the provided text.
        print(f"Confidence: {category.confidence}")

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.