Method: projects.locations.models.batchPredict

Performs a batch prediction and returns the ID of a long-running operation. When the operation has completed, call operations.get to retrieve a BatchPredictResult from the response field.

Only available for AutoML Natural Language Entity Extraction.

HTTP request

POST https://automl.googleapis.com/v1beta1/{name}:batchPredict

Path parameters

Parameters
name

string

Name of the model requested to serve the batch prediction.

Authorization requires the following Google IAM permission on the specified resource name:

  • automl.models.predict

Request body

The request body contains data with the following structure:

JSON representation
{
  "inputConfig": {
    object(BatchPredictInputConfig)
  },
  "outputConfig": {
    object(BatchPredictOutputConfig)
  },
  "params": {
    string: string,
    ...
  }
}
Fields
inputConfig

object(BatchPredictInputConfig)

Required. The input configuration for batch prediction.

outputConfig

object(BatchPredictOutputConfig)

Required. The configuration specifying where output predictions should be written.

params

map (key: string, value: string)

Additional domain-specific parameters for the predictions. Any string must be at most 25000 characters long.

See Analyzing entities for more details.
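The request structure above can be assembled programmatically. The following is a minimal sketch in Python, not an official client. The helper name build_batch_predict_request and the sample resource names are hypothetical, and the outputUriPrefix field of GcsDestination is an assumption, because that object is not described on this page. The 25000-character limit is applied to both keys and values of params, which is one possible reading of the description above.

```python
API_ROOT = "https://automl.googleapis.com/v1beta1"
MAX_PARAM_CHARS = 25000  # limit stated for strings in "params"


def build_batch_predict_request(model_name, input_uris, output_uri_prefix, params=None):
    """Return (url, body) for a models.batchPredict call.

    model_name is the full resource name of the model, for example
    "projects/my-project/locations/us-central1/models/my-model" (hypothetical).
    """
    params = dict(params or {})
    for key, value in params.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("params keys and values must be strings")
        if len(key) > MAX_PARAM_CHARS or len(value) > MAX_PARAM_CHARS:
            raise ValueError("params strings must be at most 25000 characters")

    body = {
        "inputConfig": {"gcsSource": {"inputUris": list(input_uris)}},
        # Assumption: GcsDestination takes an "outputUriPrefix" field.
        "outputConfig": {"gcsDestination": {"outputUriPrefix": output_uri_prefix}},
        "params": params,
    }
    url = f"{API_ROOT}/{model_name}:batchPredict"
    return url, body
```

The returned body would be sent as JSON in a POST request that carries an OAuth token for the scope listed under Authorization Scopes.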

Response body

If successful, the response body contains an instance of Operation.
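Because the method returns a long-running Operation rather than the predictions themselves, a caller usually has to inspect the Operation returned by operations.get. The sketch below assumes the Operation has already been fetched and decoded into a Python dict with the standard long-running operation fields (done, error, response). The function name batch_predict_outcome is hypothetical.

```python
def batch_predict_outcome(operation):
    """Classify a decoded Operation dict as pending, failed, or succeeded.

    Returns a (state, payload) tuple. The payload is None while pending,
    the error object on failure, and the response field on success.
    """
    if not operation.get("done", False):
        return ("pending", None)
    if "error" in operation:
        return ("failed", operation["error"])
    # On success the response field holds the BatchPredictResult.
    return ("succeeded", operation.get("response"))
```

A polling loop would call operations.get repeatedly, with a delay between calls, until the state is no longer "pending".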

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

BatchPredictInputConfig

Input configuration for models.batchPredict action.

Only available for AutoML Natural Language Entity Extraction.

See Preparing your training data for more information.

The format of the input depends on the dataset_metadata of the Dataset into which the import is happening. Unless specified otherwise, gcsSource is expected as the input source. If a file with identical content is mentioned multiple times (even under a different GCS_FILE_PATH), then its labels, bounding boxes, etc. are appended. The same file should always be provided with the same ML_USE and GCS_FILE_PATH; if it is not, these values are selected nondeterministically from the ones given.

The formats are represented in EBNF with commas being literal and with non-terminal symbols defined near the end of this comment. The formats are:

One or more CSV files with each line in the format:

ML_USE,GCS_FILE_PATH
  • ML_USE - Identifies the data set that the current row (file) applies to. This value can be one of the following:

    • TRAIN - Rows in this file are used to train the model.
    • TEST - Rows in this file are used to test the model during training.
    • UNASSIGNED - Rows in this file are not categorized. They are automatically divided into train and test data: 80% for training and 20% for testing.
  • GCS_FILE_PATH - Identifies a JSON Lines (.JSONL) file stored in Google Cloud Storage that contains in-line text as documents for model training.

After the training data set has been determined from the TRAIN and UNASSIGNED CSV files, the training data is divided into train and validation data sets: 70% for training and 30% for validation.

For example:

TRAIN,gs://folder/file1.jsonl
VALIDATE,gs://folder/file2.jsonl
TEST,gs://folder/file3.jsonl
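Rows in this ML_USE,GCS_FILE_PATH format can be checked locally before they are uploaded. The sketch below is an illustration only: the function name parse_input_csv is hypothetical, and the check that each path starts with gs:// is an assumption based on the example paths. The ML_USE value is returned without being validated.

```python
import csv
import io


def parse_input_csv(text):
    """Parse CSV text into a list of (ml_use, gcs_file_path) tuples."""
    rows = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        ml_use, path = (cell.strip() for cell in row)
        # Assumption: every file lives in Google Cloud Storage.
        if not path.startswith("gs://"):
            raise ValueError(f"line {line_no}: not a gs:// path: {path!r}")
        rows.append((ml_use, path))
    return rows
```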

For a single call to the models.batchPredict method, you can only use either in-line JSONL files, or JSONL files that reference documents.

In-line JSONL files

In-line .JSONL files contain, per line, a JSON document that wraps a textSnippet field followed by one or more annotations fields, which have displayName and textExtraction fields to describe the entity from the text snippet. Multiple JSON documents can be separated using line breaks (\n).

The supplied text must be annotated exhaustively. For example, if you include the text "horse", but do not label it as "animal", then "horse" is assumed to not be an "animal".

The content of any given text snippet must have 30,000 characters or fewer and must be UTF-8 NFC encoded. ASCII is accepted because it is UTF-8 NFC encoded.

For example:

{
  "textSnippet": {
    "content": "dog car cat"
  },
  "annotations": [
     {
       "displayName": "animal",
       "textExtraction": {
         "textSegment": {"startOffset": 0, "endOffset": 2}
       }
     },
     {
       "displayName": "vehicle",
       "textExtraction": {
         "textSegment": {"startOffset": 4, "endOffset": 6}
       }
     },
     {
       "displayName": "animal",
       "textExtraction": {
         "textSegment": {"startOffset": 8, "endOffset": 10}
       }
     }
  ]
}\n
{
   "textSnippet": {
     "content": "This dog is good."
   },
   "annotations": [
      {
        "displayName": "animal",
        "textExtraction": {
          "textSegment": {"startOffset": 5, "endOffset": 7}
        }
      }
   ]
}
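Lines in this shape can be generated and checked with a short script. The following sketch makes several assumptions: the helper names are hypothetical, "characters" is taken to mean Unicode code points, and endOffset is treated as inclusive because that is what the offsets in the examples above suggest ("dog" at 5 to 7).

```python
import json
import unicodedata

MAX_SNIPPET_CHARS = 30000  # limit stated for text snippet content


def snippet_problems(content):
    """Return the reasons a snippet would violate the stated limits."""
    problems = []
    if len(content) > MAX_SNIPPET_CHARS:
        problems.append("longer than 30,000 characters")
    if not unicodedata.is_normalized("NFC", content):
        problems.append("not NFC normalized")
    return problems


def annotation(content, phrase, display_name):
    """Build one annotation for the first occurrence of phrase in content."""
    start = content.index(phrase)  # raises ValueError if phrase is absent
    end = start + len(phrase) - 1  # assumption: inclusive end offset
    return {
        "displayName": display_name,
        "textExtraction": {"textSegment": {"startOffset": start, "endOffset": end}},
    }


def inline_jsonl_line(content, labelled_phrases):
    """Serialize one snippet and its (phrase, label) pairs as a single line."""
    problems = snippet_problems(content)
    if problems:
        raise ValueError("; ".join(problems))
    document = {
        "textSnippet": {"content": content},
        "annotations": [annotation(content, p, name) for p, name in labelled_phrases],
    }
    return json.dumps(document, ensure_ascii=False)
```

Calling inline_jsonl_line("This dog is good.", [("dog", "animal")]) produces one line whose annotation matches the second example above. Joining several such lines with "\n" gives the file content.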
JSONL files that reference documents

.JSONL files contain, per line, a JSON document that wraps an inputConfig that contains the path to a source document. Multiple JSON documents can be separated using line breaks (\n).

For example:

{
  "document": {
    "inputConfig": {
      "gcsSource": { "inputUris": [ "gs://folder/document1.pdf" ]
      }
    }
  }
}\n
{
  "document": {
    "inputConfig": {
      "gcsSource": { "inputUris": [ "gs://folder/document2.pdf" ]
      }
    }
  }
}
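A file of document references like the one above can be produced from a list of Cloud Storage URIs. This is a minimal sketch. The function name document_reference_lines is hypothetical, and it assumes one URI per document, as in the example.

```python
import json


def document_reference_lines(uris):
    """Return JSONL text with one document reference per line."""
    lines = []
    for uri in uris:
        record = {"document": {"inputConfig": {"gcsSource": {"inputUris": [uri]}}}}
        lines.append(json.dumps(record))
    return "\n".join(lines)
```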

Errors:

If any of the provided CSV files can't be parsed, or if more than a certain percentage of CSV rows cannot be processed, then the operation fails and nothing is imported. Regardless of overall success or failure, the per-row failures, up to a certain count cap, are listed in Operation.metadata.partial_failures.

See Analyzing entities for more information.

JSON representation
{
  "gcsSource": {
    object(GcsSource)
  }
}
Fields
gcsSource

object(GcsSource)

The Google Cloud Storage location for the input content.

BatchPredictOutputConfig

Output configuration for models.batchPredict action.

AutoML Natural Language creates a directory specified in the gcsDestination. The name of the directory is "prediction-<model-display-name>-<timestamp-of-prediction-call>", where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
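When several prediction runs write to the same destination, it can help to split the directory name back into its parts. The sketch below assumes the name follows the pattern above exactly, with a three-digit fractional second. The function name parse_output_directory is hypothetical.

```python
import re
from datetime import datetime, timezone

_DIRECTORY = re.compile(
    r"^prediction-(.+)-(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)$"
)


def parse_output_directory(name):
    """Return (model_display_name, timestamp) or None if name does not match."""
    match = _DIRECTORY.match(name)
    if match is None:
        return None
    display_name, stamp = match.groups()
    # The trailing Z marks UTC; attach the timezone explicitly.
    when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return display_name, when
```

The timestamp is matched at the end of the name, so a model display name that itself contains hyphens is still recovered whole.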

AutoML Natural Language creates files named text_extraction_n.jsonl in the new directory, where "n" is a number from 1 to the number of annotation files.

The contents of each .JSONL file depend on whether the input was in-line text or references to documents.

  • If the input was in-line text, then each .JSONL file contains, per line, a JSON document that wraps the "id" : "" of the text snippet supplied in the request, followed by a list of zero or more annotations, with the entity analysis in the textExtraction field. A single text snippet is listed only once with all of its annotations, and its annotations are never split across files.

  • If the input used documents, then each .JSONL file contains, per line, a JSON representation of a proto that wraps the document proto given in the request, followed by its OCR-ed representation in the form of a text snippet, and finally a list of zero or more AnnotationPayload protos (called annotations), which have the textExtraction detail populated and refer, via their indices, to the OCR-ed text snippet. A single document (and its text snippet) is listed only once with all of its annotations, and its annotations are never split across files.

If prediction for any text snippet failed (partially or completely), then additional errors_1.jsonl, errors_2.jsonl, ..., errors_N.jsonl files are created (N depends on the total number of failed predictions). Each line in these files is a JSON representation of a proto that wraps either the "id" : "" (for in-line input) or the document proto (for document input), followed by exactly one google.rpc.Status containing only code and message.
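Once the output files have been downloaded, they can be sorted into annotation files and error files by name and then parsed line by line. This is a minimal sketch with hypothetical helper names. It does not assume any particular key names inside each JSON line, because those are not spelled out on this page.

```python
import json
import re

_OUTPUT_FILE = re.compile(r"^(text_extraction|errors)_(\d+)\.jsonl$")


def classify_output_file(file_name):
    """Return (kind, index) for a known output file name, else None."""
    match = _OUTPUT_FILE.match(file_name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def read_jsonl(text):
    """Decode every non-blank line of a .jsonl file into a Python object."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Files classified as "errors" hold the failed inputs, so a caller could collect those entries and resubmit them in a later batch.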

JSON representation
{
  "gcsDestination": {
    object(GcsDestination)
  }
}
Fields
gcsDestination

object(GcsDestination)

The Google Cloud Storage location of the directory to which the output is written.
