Method: dataset.listDocuments

Full name: projects.locations.processors.dataset.listDocuments

Returns a list of documents present in the dataset.

HTTP request

POST https://{endpoint}/v1beta3/{dataset}:listDocuments

Where {endpoint} is one of the supported service endpoints.

Path parameters

Parameters
dataset

string

Required. The resource name of the dataset to be listed. Format: projects/{project}/locations/{location}/processors/{processor}/dataset
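As a sketch, the dataset resource name and the full request URL can be assembled from their parts. The helper names dataset_name and list_documents_url are hypothetical, and us-documentai.googleapis.com is used only as an example of a regional endpoint; substitute the endpoint that serves your processor's location.

```python
# Hypothetical helpers: names and the example endpoint are illustrative
# assumptions, not part of any Google client library.
API_VERSION = "v1beta3"


def dataset_name(project: str, location: str, processor: str) -> str:
    """Build the dataset resource name used as the {dataset} path parameter."""
    return f"projects/{project}/locations/{location}/processors/{processor}/dataset"


def list_documents_url(endpoint: str, dataset: str) -> str:
    """Build the POST URL for dataset.listDocuments."""
    # The ':listDocuments' custom-method suffix follows the resource name.
    return f"https://{endpoint}/{API_VERSION}/{dataset}:listDocuments"


name = dataset_name("my-project", "us", "abc123")
print(list_documents_url("us-documentai.googleapis.com", name))
# https://us-documentai.googleapis.com/v1beta3/projects/my-project/locations/us/processors/abc123/dataset:listDocuments
```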

Request body

The request body contains data with the following structure:

JSON representation
{
  "pageSize": integer,
  "pageToken": string,
  "filter": string,
  "returnTotalSize": boolean,
  "skip": integer
}
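A minimal sketch of assembling this request body in Python, assuming the caller wants unset fields left out so that the service defaults apply. build_list_documents_body is a hypothetical helper; the negative-skip check only mirrors the rule stated in the skip field description.

```python
import json
from typing import Any, Dict, Optional


def build_list_documents_body(
    page_size: Optional[int] = None,
    page_token: Optional[str] = None,
    filter_: Optional[str] = None,
    return_total_size: bool = False,
    skip: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a dataset.listDocuments request body, omitting unset fields."""
    if skip is not None and skip < 0:
        # The service rejects negative skip values, so fail fast locally.
        raise ValueError("skip must be a non-negative integer")
    body: Dict[str, Any] = {}
    if page_size is not None:
        body["pageSize"] = page_size
    if page_token:
        body["pageToken"] = page_token
    if filter_:
        body["filter"] = filter_
    if return_total_size:
        # Only send the flag when needed; it may slow the request down.
        body["returnTotalSize"] = True
    if skip:
        body["skip"] = skip
    return body


print(json.dumps(build_list_documents_body(page_size=50, filter_="SplitType=DATASET_SPLIT_TEST")))
# {"pageSize": 50, "filter": "SplitType=DATASET_SPLIT_TEST"}
```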
Fields
pageSize

integer

The maximum number of documents to return. The service may return fewer than this value. If unspecified, at most 20 documents will be returned. The maximum value is 100; values above 100 will be coerced to 100.
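The documented default and cap for pageSize can be modeled client-side, for example to size local buffers. This is an illustrative sketch only: the service remains authoritative and may return fewer documents than this bound, and both treating 0 as unspecified (the proto3 default) and rejecting negative values locally are assumptions.

```python
from typing import Optional

# Values taken from the pageSize description; the service stays authoritative.
DEFAULT_PAGE_SIZE = 20
MAX_PAGE_SIZE = 100


def effective_page_size(requested: Optional[int]) -> int:
    """Upper bound on documents per page for a requested pageSize (hypothetical helper)."""
    if requested is None or requested == 0:
        # Assumption: 0 is the proto3 default and is treated as "unspecified".
        return DEFAULT_PAGE_SIZE
    if requested < 0:
        # Assumption: reject negatives locally; server handling is not documented here.
        raise ValueError("pageSize must not be negative")
    # Values above 100 are coerced to 100.
    return min(requested, MAX_PAGE_SIZE)


for size in (None, 50, 250):
    print(size, "->", effective_page_size(size))
```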

pageToken

string

A page token, received from a previous dataset.listDocuments call. Provide this to retrieve the subsequent page.

When paginating, all other parameters provided to dataset.listDocuments must match the call that provided the page token.
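The paging contract can be sketched as a generator that keeps every other parameter fixed and swaps in each nextPageToken until none is returned. The call argument is a hypothetical transport function (for example, one that POSTs the JSON body over an authorized HTTP session), which keeps the loop independent of any particular client library.

```python
from typing import Any, Callable, Dict, Iterator


def iter_documents(
    call: Callable[[Dict[str, Any]], Dict[str, Any]],
    body: Dict[str, Any],
) -> Iterator[Dict[str, Any]]:
    """Yield every DocumentMetadata object across all pages.

    `call` is a hypothetical function that sends the request body to
    dataset.listDocuments and returns the decoded JSON response.
    """
    request = dict(body)  # leave the caller's dict unchanged
    while True:
        response = call(request)
        # A page may omit documentMetadata entirely.
        yield from response.get("documentMetadata", [])
        token = response.get("nextPageToken")
        if not token:
            return  # no nextPageToken: this was the last page
        # Every other parameter must match; only the token changes.
        request = {**body, "pageToken": token}


# Usage with an in-memory stand-in for the HTTP call:
_fake_pages = {
    None: {"documentMetadata": [{"displayName": "a.pdf"}], "nextPageToken": "t1"},
    "t1": {"documentMetadata": [{"displayName": "b.pdf"}]},
}
names = [d["displayName"] for d in iter_documents(lambda req: _fake_pages[req.get("pageToken")], {"pageSize": 1})]
print(names)  # ['a.pdf', 'b.pdf']
```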

filter

string

Optional. Query to filter the documents, using the syntax described in https://google.aip.dev/160. Currently supported query strings (| separates the accepted values for each key):

  • SplitType=DATASET_SPLIT_TEST|DATASET_SPLIT_TRAIN|DATASET_SPLIT_UNASSIGNED
  • LabelingState=DOCUMENT_LABELED|DOCUMENT_UNLABELED|DOCUMENT_AUTO_LABELED
  • DisplayName="fileName.pdf"
  • EntityType=abc/def
  • TagName="auto-labeling-running"|"sampled"

Note:

  • Only AND, = and != are supported. For example, DisplayName=fileName AND EntityType!=abc is supported.
  • The wildcard * is supported only in the DisplayName filter.
  • Duplicate filter keys are not allowed; for example, EntityType=a AND EntityType=b is not supported.
  • String matching is case-sensitive (for the DisplayName and EntityType filters).
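A sketch of a helper that joins filter conditions with AND while enforcing the restrictions listed for this field: only = and !=, no duplicate keys, and the wildcard only in DisplayName. build_filter is hypothetical; values are passed through verbatim, so the caller supplies any quotes, as in DisplayName="fileName.pdf".

```python
from typing import Iterable, Tuple

# Keys named in the filter documentation.
SUPPORTED_KEYS = {"SplitType", "LabelingState", "DisplayName", "EntityType", "TagName"}


def build_filter(conditions: Iterable[Tuple[str, str, str]]) -> str:
    """Join (key, operator, value) conditions with AND, enforcing documented limits."""
    parts = []
    seen = set()
    for key, op, value in conditions:
        if key not in SUPPORTED_KEYS:
            raise ValueError(f"unsupported filter key: {key}")
        if op not in ("=", "!="):
            raise ValueError(f"only = and != are supported, got {op!r}")
        if key in seen:
            raise ValueError(f"duplicate filter key: {key}")
        if "*" in value and key != "DisplayName":
            raise ValueError("wildcard * is only supported in DisplayName")
        seen.add(key)
        parts.append(f"{key}{op}{value}")
    return " AND ".join(parts)


print(build_filter([("DisplayName", "=", '"invoice*.pdf"'), ("LabelingState", "!=", "DOCUMENT_UNLABELED")]))
# DisplayName="invoice*.pdf" AND LabelingState!=DOCUMENT_UNLABELED
```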

returnTotalSize

boolean

Optional. Whether the response should include the total number of matched documents, returned in the totalSize field of the response (ListDocumentsResponse.total_size).

Enabling this flag may adversely impact performance.

Defaults to false.

skip

integer

Optional. Number of results to skip, counted from the position given by pageToken if one is provided (see https://google.aip.dev/158#skipping-results). Must be a non-negative integer; negative values are rejected. Note that this is a number of results, not a number of pages. If this value moves the cursor past the end of the results, ListDocumentsResponse.document_metadata and ListDocumentsResponse.next_page_token will be empty.
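To illustrate that skip counts individual results rather than pages, the following models offset-based paging over an in-memory list. This is a simplified assumption: real page tokens are opaque, and start merely stands in for the position a token encodes. simulate_page is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple


def simulate_page(
    results: Sequence[str], start: int, skip: int, page_size: int
) -> Tuple[List[str], Optional[int]]:
    """Return (page, next_start) under an offset-based model of skip.

    `start` stands in for the position encoded by pageToken (0 when no
    token is given); this is a model, not the service's implementation.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    begin = start + skip  # skip counts individual results, not pages
    page = list(results[begin:begin + page_size])
    end = begin + len(page)
    # Past the end: empty page and no next position, as documented.
    next_start = end if end < len(results) else None
    return page, next_start


docs = [f"doc{i}" for i in range(10)]
print(simulate_page(docs, 0, 3, 4))   # (['doc3', 'doc4', 'doc5', 'doc6'], 7)
print(simulate_page(docs, 0, 15, 4))  # ([], None)
```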

Response body

If successful, the response body contains data with the following structure:

JSON representation
{
  "documentMetadata": [
    {
      object (DocumentMetadata)
    }
  ],
  "nextPageToken": string,
  "totalSize": integer
}
Fields
documentMetadata[]

object (DocumentMetadata)

Document metadata corresponding to the listed documents.

nextPageToken

string

A token that can be sent as pageToken (ListDocumentsRequest.page_token) to retrieve the next page. If this field is omitted, there are no subsequent pages.

totalSize

integer

Total number of documents that match the query. Populated when returnTotalSize is set to true in the request.
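A sketch of decoding this response body defensively, assuming any field may be omitted from the JSON (nextPageToken, for instance, is absent on the last page). ListDocumentsPage and parse_list_documents_response are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ListDocumentsPage:
    """Local container for one decoded dataset.listDocuments response."""
    documents: List[Dict[str, Any]] = field(default_factory=list)
    next_page_token: Optional[str] = None
    total_size: Optional[int] = None


def parse_list_documents_response(payload: Dict[str, Any]) -> ListDocumentsPage:
    """Convert the JSON response body, tolerating omitted fields."""
    total = payload.get("totalSize")
    return ListDocumentsPage(
        documents=list(payload.get("documentMetadata", [])),
        # An omitted or empty token means there are no further pages.
        next_page_token=payload.get("nextPageToken") or None,
        # Assumed absent unless returnTotalSize was true; kept as None then.
        total_size=int(total) if total is not None else None,
    )


page = parse_list_documents_response({"documentMetadata": [{"displayName": "a.pdf"}], "totalSize": 42})
print(len(page.documents), page.next_page_token, page.total_size)  # 1 None 42
```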

Authorization scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the dataset resource:

  • documentai.datasets.listDocuments

For more information, see the IAM documentation.

DocumentMetadata

Metadata about a document.

JSON representation
{
  "documentId": {
    object (DocumentId)
  },
  "pageCount": integer,
  "datasetType": enum (DatasetSplitType),
  "labelingState": enum (DocumentLabelingState),
  "displayName": string
}
Fields
documentId

object (DocumentId)

Document identifier.

pageCount

integer

Number of pages in the document.

datasetType

enum (DatasetSplitType)

Type of the dataset split to which the document belongs.

labelingState

enum (DocumentLabelingState)

Labeling state of the document.

displayName

string

The display name of the document.
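For client code that prefers typed records, a DocumentMetadata object can be mapped onto a Python dataclass. This is a sketch: DocumentId is kept as raw JSON because its structure is not described in this section, and the fallbacks for omitted fields are assumptions based on proto3 JSON defaults (DOCUMENT_LABELING_STATE_UNSPECIFIED is the documented default for labelingState). The class is a local, hypothetical type.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DocumentMetadata:
    """Typed view of one DocumentMetadata JSON object (hypothetical local class)."""
    document_id: Dict[str, Any]   # DocumentId kept as raw JSON
    page_count: int
    dataset_type: Optional[str]   # DatasetSplitType enum name, if present
    labeling_state: str           # DocumentLabelingState enum name
    display_name: Optional[str]

    @classmethod
    def from_json(cls, obj: Dict[str, Any]) -> "DocumentMetadata":
        return cls(
            document_id=dict(obj.get("documentId", {})),
            # Assumed: an omitted pageCount means 0 (proto3 JSON default).
            page_count=int(obj.get("pageCount", 0)),
            dataset_type=obj.get("datasetType"),
            # An omitted enum falls back to the documented default value.
            labeling_state=obj.get("labelingState", "DOCUMENT_LABELING_STATE_UNSPECIFIED"),
            display_name=obj.get("displayName"),
        )


doc = DocumentMetadata.from_json(
    {"pageCount": 3, "datasetType": "DATASET_SPLIT_TRAIN", "displayName": "invoice.pdf"}
)
print(doc.page_count, doc.labeling_state)  # 3 DOCUMENT_LABELING_STATE_UNSPECIFIED
```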

DocumentLabelingState

Describes the labeling status of a document.

Enums
DOCUMENT_LABELING_STATE_UNSPECIFIED Default value if the enum is not set.
DOCUMENT_LABELED Document has been labeled.
DOCUMENT_UNLABELED Document has not been labeled.
DOCUMENT_AUTO_LABELED Document has been auto-labeled.
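As an illustration, the enum values can be mirrored locally and used to tally the documents in a listing by labeling state. DocumentLabelingState here is a local Python Enum, not an import from a Google library, and count_by_labeling_state is hypothetical; an unrecognized value raises ValueError, which a real client might prefer to handle more leniently.

```python
from collections import Counter
from enum import Enum
from typing import Any, Dict, Iterable


class DocumentLabelingState(str, Enum):
    """Local mirror of the DocumentLabelingState values listed above."""
    DOCUMENT_LABELING_STATE_UNSPECIFIED = "DOCUMENT_LABELING_STATE_UNSPECIFIED"
    DOCUMENT_LABELED = "DOCUMENT_LABELED"
    DOCUMENT_UNLABELED = "DOCUMENT_UNLABELED"
    DOCUMENT_AUTO_LABELED = "DOCUMENT_AUTO_LABELED"


def count_by_labeling_state(documents: Iterable[Dict[str, Any]]) -> Counter:
    """Tally DocumentMetadata JSON objects by labelingState.

    A missing labelingState counts as UNSPECIFIED, the enum's default.
    """
    counts: Counter = Counter()
    for doc in documents:
        raw = doc.get("labelingState", DocumentLabelingState.DOCUMENT_LABELING_STATE_UNSPECIFIED.value)
        counts[DocumentLabelingState(raw)] += 1  # raises ValueError on unknown values
    return counts


listing = [
    {"labelingState": "DOCUMENT_LABELED"},
    {"labelingState": "DOCUMENT_AUTO_LABELED"},
    {"labelingState": "DOCUMENT_LABELED"},
]
counts = count_by_labeling_state(listing)
print(counts[DocumentLabelingState.DOCUMENT_LABELED])  # 2
```

The same value names are accepted by the LabelingState filter, so a query string such as f"LabelingState={DocumentLabelingState.DOCUMENT_UNLABELED.value}" can be built from the enum rather than from a string literal.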