Method: dataset.listDocuments

Full name: projects.locations.processors.dataset.listDocuments

Returns a list of documents present in the dataset.

HTTP request

POST https://{endpoint}/v1beta3/{dataset}:listDocuments

Where {endpoint} is one of the supported service endpoints.

Path parameters

Parameters
`dataset`	`string` Required. The resource name of the dataset to be listed. Format: `projects/{project}/locations/{location}/processors/{processor}/dataset`.
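
As an illustration of how the path parameter slots into the request URL, here is a minimal Python sketch. The helper name `list_documents_url`, the host name, and the project and processor IDs are hypothetical; only the URL template and the resource name format come from this page.

```python
import re

# Pattern derived from the documented format of the `dataset` path parameter.
_DATASET_NAME = re.compile(
    r"^projects/[^/]+/locations/[^/]+/processors/[^/]+/dataset$"
)


def list_documents_url(endpoint: str, dataset: str) -> str:
    """Return the listDocuments URL for a dataset resource name.

    `endpoint` is a host name chosen by the caller; it is not validated here.
    """
    if not _DATASET_NAME.match(dataset):
        raise ValueError(f"unexpected dataset resource name: {dataset!r}")
    return f"https://{endpoint}/v1beta3/{dataset}:listDocuments"


# Hypothetical values, for illustration only.
url = list_documents_url(
    "example-endpoint.invalid",
    "projects/my-project/locations/us/processors/my-processor/dataset",
)
```

Checking the resource name before building the URL catches a malformed name locally instead of in a failed request.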

Request body

The request body contains data with the following structure:

JSON representation
{
  "pageSize": integer,
  "pageToken": string,
  "filter": string,
  "returnTotalSize": boolean,
  "skip": integer
}

Fields
`pageSize`	`integer` The maximum number of documents to return. The service may return fewer than this value. If unspecified, at most 20 documents will be returned. The maximum value is 100; values above 100 will be coerced to 100.
`pageToken`	`string` A page token, received from a previous `dataset.listDocuments` call. Provide this to retrieve the subsequent page. When paginating, all other parameters provided to `dataset.listDocuments` must match the call that provided the page token.
`filter`	`string` Optional. Query to filter the documents, using the syntax described at https://google.aip.dev/160. Filter expressions take forms such as:
- `SplitType=DATASET_SPLIT_TEST|DATASET_SPLIT_TRAIN|DATASET_SPLIT_UNASSIGNED`
- `LabelingState=DOCUMENT_LABELED|DOCUMENT_UNLABELED|DOCUMENT_AUTO_LABELED`
- `DisplayName="fileName.pdf"`
- `EntityType=abc/def`
- `TagName="auto-labeling-running"|"sampled"`
Notes:
- Only `AND`, `=`, and `!=` are supported. For example, `DisplayName=fileName AND EntityType!=abc` is supported.
- The wildcard `*` is supported only in the `DisplayName` filter.
- Duplicate filter keys are not allowed. For example, `EntityType=a AND EntityType=b` is not supported.
- String matching is case-sensitive for the `DisplayName` and `EntityType` filters.
`returnTotalSize`	`boolean` Optional. Whether the response should include the total number of matched documents; see `ListDocumentsResponse.total_size`. Enabling this flag may adversely impact performance. Defaults to false.
`skip`	`integer` Optional. Number of results to skip, counted from the position of `pageToken` if one is provided. See https://google.aip.dev/158#skipping-results. The value must be a non-negative integer; negative values are rejected. Note that this is the number of results to skip, not the number of pages. If the value moves the cursor past the end of the results, `ListDocumentsResponse.document_metadata` and `ListDocumentsResponse.next_page_token` will be empty.
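
The constraints above (the `pageSize` cap of 100, the non-negative `skip`, and the filter rules) can be checked on the client before a request is sent. The following Python sketch is one way to do that. The helper names `build_filter` and `build_request_body` are hypothetical and not part of any client library, and the filter values are inserted exactly as supplied, so any quoting is left to the caller.

```python
from typing import Any, Dict, List, Optional, Tuple

MAX_PAGE_SIZE = 100  # documented upper bound for pageSize


def build_filter(clauses: List[Tuple[str, str, str]]) -> str:
    """Join (key, operator, value) clauses with AND.

    Enforces two documented limits: only `=` and `!=` are accepted,
    and a filter key may appear at most once.
    """
    seen = set()
    parts = []
    for key, op, value in clauses:
        if op not in ("=", "!="):
            raise ValueError(f"unsupported operator: {op!r}")
        if key in seen:
            raise ValueError(f"duplicate filter key: {key!r}")
        seen.add(key)
        parts.append(f"{key}{op}{value}")
    return " AND ".join(parts)


def build_request_body(
    page_size: Optional[int] = None,
    page_token: Optional[str] = None,
    filter_expr: Optional[str] = None,
    return_total_size: bool = False,
    skip: int = 0,
) -> Dict[str, Any]:
    """Assemble the JSON request body, omitting fields left at their defaults."""
    if skip < 0:
        raise ValueError("skip must be non-negative")
    body: Dict[str, Any] = {}
    if page_size is not None:
        # Mirror the documented coercion of values above 100.
        body["pageSize"] = min(page_size, MAX_PAGE_SIZE)
    if page_token:
        body["pageToken"] = page_token
    if filter_expr:
        body["filter"] = filter_expr
    if return_total_size:
        body["returnTotalSize"] = True
    if skip:
        body["skip"] = skip
    return body


body = build_request_body(
    page_size=250,
    filter_expr=build_filter(
        [("SplitType", "=", "DATASET_SPLIT_TRAIN"), ("EntityType", "!=", "abc")]
    ),
    return_total_size=True,
)
```

Here `body` holds `pageSize` 100, the joined filter string, and `returnTotalSize` set to true; `skip` and `pageToken` are left out because they were not supplied.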

Response body

If successful, the response body contains data with the following structure:

JSON representation
{
  "documentMetadata": [
    {
      object (`DocumentMetadata`)
    }
  ],
  "nextPageToken": string,
  "totalSize": integer
}

Fields
`documentMetadata[]`	`object (DocumentMetadata)` Document metadata corresponding to the listed documents.
`nextPageToken`	`string` A token, which can be sent as `ListDocumentsRequest.page_token` to retrieve the next page. If this field is omitted, there are no subsequent pages.
`totalSize`	`integer` Total number of documents that match the query.
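
The pagination contract described by `pageToken` and `nextPageToken` can be sketched as a loop that resends the same request with each new token until none is returned. In the Python sketch below, `iter_documents`, `fetch_page`, and `fake_fetch_page` are hypothetical names: `fetch_page` stands for whatever function sends one request body and returns the decoded response, and the fake version serves canned pages so the loop can run without a network call.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_documents(
    fetch_page: Callable[[Page], Page], base_request: Page
) -> Iterator[Dict[str, Any]]:
    """Yield every documentMetadata entry across all pages.

    All parameters other than pageToken are resent unchanged on each call,
    as the pagination rules require.
    """
    request = dict(base_request)
    while True:
        response = fetch_page(request)
        yield from response.get("documentMetadata", [])
        token = response.get("nextPageToken")
        if not token:
            return  # an omitted token means there are no further pages
        request = {**base_request, "pageToken": token}


# Canned responses keyed by the page token of the incoming request.
_FAKE_PAGES = {
    None: {
        "documentMetadata": [{"displayName": "a.pdf"}, {"displayName": "b.pdf"}],
        "nextPageToken": "t1",
    },
    "t1": {"documentMetadata": [{"displayName": "c.pdf"}]},
}


def fake_fetch_page(request: Page) -> Page:
    return _FAKE_PAGES[request.get("pageToken")]


names = [d["displayName"] for d in iter_documents(fake_fetch_page, {"pageSize": 2})]
```

Passing the transport in as a function keeps the loop independent of any particular HTTP library or authentication setup.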

Authorization scopes

Requires the following OAuth scope:

https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the dataset resource:

documentai.datasets.listDocuments

For more information, see the IAM documentation.

DocumentMetadata

Metadata about a document.

JSON representation
{
  "documentId": {
    object (`DocumentId`)
  },
  "pageCount": integer,
  "datasetType": enum (`DatasetSplitType`),
  "labelingState": enum (`DocumentLabelingState`),
  "displayName": string
}

Fields
`documentId`	`object (DocumentId)` Document identifier.
`pageCount`	`integer` Number of pages in the document.
`datasetType`	`enum (DatasetSplitType)` Type of the dataset split to which the document belongs.
`labelingState`	`enum (DocumentLabelingState)` Labeling state of the document.
`displayName`	`string` The display name of the document.
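
For callers that prefer typed access over raw dictionaries, the fields above might be mapped onto a small record type. This Python sketch is an assumption about how a client could model the object, not an official type: the class and its `from_json` method are hypothetical, the `documentId` value is kept as an uninterpreted dictionary because the structure of `DocumentId` is not described here, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class DocumentMetadata:
    document_id: Optional[Dict[str, Any]]  # DocumentId, kept as a raw dict
    page_count: Optional[int]
    dataset_type: Optional[str]            # DatasetSplitType enum name
    labeling_state: Optional[str]          # DocumentLabelingState enum name
    display_name: Optional[str]

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> "DocumentMetadata":
        """Build a record from one decoded documentMetadata entry.

        Fields missing from the JSON become None.
        """
        return cls(
            document_id=data.get("documentId"),
            page_count=data.get("pageCount"),
            dataset_type=data.get("datasetType"),
            labeling_state=data.get("labelingState"),
            display_name=data.get("displayName"),
        )


# Made-up sample entry; documentId is deliberately left out.
meta = DocumentMetadata.from_json(
    {
        "pageCount": 3,
        "datasetType": "DATASET_SPLIT_TRAIN",
        "labelingState": "DOCUMENT_LABELED",
        "displayName": "invoice.pdf",
    }
)
```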

DocumentLabelingState

Describes the labeling status of a document.

Enums
`DOCUMENT_LABELING_STATE_UNSPECIFIED`	Default value if the enum is not set.
`DOCUMENT_LABELED`	Document has been labeled.
`DOCUMENT_UNLABELED`	Document has not been labeled.
`DOCUMENT_AUTO_LABELED`	Document has been auto-labeled.
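
As a usage illustration, the enum values above can be mirrored in client code to summarize a listing by labeling state. The Python sketch below is hypothetical: the function `count_by_labeling_state` is not part of any library, and treating an absent `labelingState` field as `DOCUMENT_LABELING_STATE_UNSPECIFIED` is an assumption based on that value being described as the default.

```python
from collections import Counter
from enum import Enum
from typing import Any, Dict, Iterable


class DocumentLabelingState(Enum):
    DOCUMENT_LABELING_STATE_UNSPECIFIED = "DOCUMENT_LABELING_STATE_UNSPECIFIED"
    DOCUMENT_LABELED = "DOCUMENT_LABELED"
    DOCUMENT_UNLABELED = "DOCUMENT_UNLABELED"
    DOCUMENT_AUTO_LABELED = "DOCUMENT_AUTO_LABELED"


def count_by_labeling_state(documents: Iterable[Dict[str, Any]]) -> Counter:
    """Tally documentMetadata entries by labeling state.

    Assumption: a missing labelingState is counted as UNSPECIFIED.
    An unrecognized value raises ValueError.
    """
    default = DocumentLabelingState.DOCUMENT_LABELING_STATE_UNSPECIFIED.value
    return Counter(
        DocumentLabelingState(doc.get("labelingState", default))
        for doc in documents
    )


# Made-up entries, for illustration only.
counts = count_by_labeling_state(
    [
        {"labelingState": "DOCUMENT_LABELED"},
        {"labelingState": "DOCUMENT_UNLABELED"},
        {"labelingState": "DOCUMENT_LABELED"},
        {},
    ]
)
```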