Supported features

This page describes the supported features and limitations for Document AI Warehouse.

Key features

• Manage access control: Controls who has access to which resources in Document AI Warehouse and what level of access they have.
• Manage document schemas: A document schema defines the structure of a document type (for example, Invoice or Pay Stub) in Document AI Warehouse. Admins can specify properties of different data types: Text, Numeric, Date, or Enumeration.
• Manage documents: Provides operations to create, fetch, update, and delete documents. Document AI Warehouse uses documents as a data model to organize real-world files, such as PDF or .txt files, together with their associated properties.
• Organize documents in folders: A folder is a container that groups and labels documents. A document can be attached to multiple folders, and a folder can contain multiple documents.
• Search documents:
  • Full-text search (text search): Identifies natural-language documents that satisfy a query and, optionally, sorts them by relevance to the query. Customers specify the query as a string in the search request.
  • Property filtering (customer metadata filtering): Mark a property as filterable if you want to use it to include or exclude a subset of documents in a search. For example, you might make a "Vendor" property filterable because your users want to search for invoices from a specific vendor.
• Advanced search: Document AI Warehouse provides a feature called "Custom Synonyms" that lets customers supply their own synonyms for their specific domains.
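The document model described above (a schema with typed properties, documents that carry property values and support create, fetch, update, and delete, and a many-to-many link between documents and folders) can be sketched as a purely local, in-memory Python model. This is not the Document AI Warehouse API: every class and method name here (PropertyType, DocumentSchema, InMemoryWarehouse, and so on) is hypothetical, and the validation rules are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class PropertyType(Enum):
    # The four property data types listed on this page.
    TEXT = "Text"
    NUMERIC = "Numeric"
    DATE = "Date"
    ENUMERATION = "Enumeration"


@dataclass
class PropertyDefinition:
    type: PropertyType
    allowed_values: tuple = ()  # only consulted for ENUMERATION properties


@dataclass
class DocumentSchema:
    display_name: str  # for example, "Invoice" or "Pay Stub"
    properties: dict   # property name -> PropertyDefinition

    def validate(self, values: dict) -> None:
        """Raise ValueError if a value is undeclared or has the wrong type."""
        for name, value in values.items():
            prop = self.properties.get(name)
            if prop is None:
                raise ValueError(f"unknown property: {name}")
            ok = {
                PropertyType.TEXT: isinstance(value, str),
                PropertyType.NUMERIC: isinstance(value, (int, float))
                and not isinstance(value, bool),
                PropertyType.DATE: isinstance(value, date),
                PropertyType.ENUMERATION: value in prop.allowed_values,
            }[prop.type]
            if not ok:
                raise ValueError(f"bad value for {name}: {value!r}")


@dataclass
class Document:
    doc_id: str
    schema: DocumentSchema
    properties: dict


class InMemoryWarehouse:
    """Hypothetical local stand-in for document and folder operations."""

    def __init__(self) -> None:
        self.documents: dict = {}  # doc_id -> Document
        self.folders: dict = {}    # folder name -> set of doc_ids

    def create_document(self, doc: Document) -> None:
        doc.schema.validate(doc.properties)
        self.documents[doc.doc_id] = doc

    def get_document(self, doc_id: str) -> Optional[Document]:
        return self.documents.get(doc_id)

    def update_document(self, doc_id: str, changes: dict) -> None:
        doc = self.documents[doc_id]
        doc.schema.validate(changes)  # reject changes that break the schema
        doc.properties.update(changes)

    def delete_document(self, doc_id: str) -> None:
        self.documents.pop(doc_id, None)
        for members in self.folders.values():
            members.discard(doc_id)  # unlink the document from every folder

    def add_to_folder(self, doc_id: str, folder: str) -> None:
        if doc_id not in self.documents:
            raise KeyError(doc_id)
        # Many-to-many: a document may sit in several folders at once.
        self.folders.setdefault(folder, set()).add(doc_id)

    def folders_of(self, doc_id: str) -> set:
        return {name for name, members in self.folders.items() if doc_id in members}
```

In this sketch, folders hold only document IDs, so attaching a document to a folder never copies it, and deleting a document removes it from every folder it was linked to.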
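To illustrate how full-text search, property filtering, and custom synonyms fit together, here is a small local Python sketch. A simple term-count score stands in for relevance ranking, a property filter is accepted only for properties marked filterable, and a customer-supplied synonym map expands query terms before matching. This is a simplified stand-in, not how Document AI Warehouse actually ranks results or applies synonyms; the function names, the document dictionary shape, and the scoring rule are all assumptions.

```python
import re
from typing import Optional


def tokenize(text: str) -> list:
    # Lowercase word tokens; a real search engine does far more than this.
    return re.findall(r"\w+", text.lower())


def expand(terms: list, synonyms: dict) -> set:
    """Add customer-supplied synonyms for each query term."""
    expanded = set(terms)
    for term in terms:
        expanded.update(synonyms.get(term, ()))
    return expanded


def search(docs: list, query: str, filters: Optional[dict] = None,
           filterable: frozenset = frozenset(),
           synonyms: Optional[dict] = None) -> list:
    """Return IDs of matching documents, most relevant first.

    docs: dicts with hypothetical "id", "text", and "properties" keys.
    filters: property name -> required value (filterable properties only).
    """
    filters = filters or {}
    for name in filters:
        if name not in filterable:
            raise ValueError(f"property {name!r} is not filterable")
    terms = expand(tokenize(query), synonyms or {})
    scored = []
    for doc in docs:
        if any(doc["properties"].get(k) != v for k, v in filters.items()):
            continue  # excluded by the property filter
        words = tokenize(doc["text"])
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scored.append((score, doc["id"]))
    # Highest score first; ties broken by ID so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]
```

For example, with a synonym map such as {"notebook": ["laptop"]}, a query for "notebook" also matches documents that only mention "laptop", while a filter on a "vendor" property narrows the results to one vendor's invoices.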

Files supported

See the full details of supported formats and MIME types.

Each supported format and the raw_document_file_type or content_category value it uses:

• Joint Photographic Experts Group (jpeg/jpg): CONTENT_CATEGORY_IMAGE
• Tag Image File Format (tif/tiff): RAW_DOCUMENT_FILE_TYPE_TIFF. For manual upload in the UI, files should be uploaded as TIFF files.
• Microsoft Word (doc/docx): RAW_DOCUMENT_FILE_TYPE_DOCX. For manual upload in the UI, files should be uploaded as DOCX files.
• Microsoft Excel (xls/xlsx): RAW_DOCUMENT_FILE_TYPE_XLSX
• Microsoft PowerPoint (ppt/pptx): RAW_DOCUMENT_FILE_TYPE_PPTX
• Portable Document Format (pdf): RAW_DOCUMENT_FILE_TYPE_PDF
• Plain text (txt): RAW_DOCUMENT_FILE_TYPE_TEXT
• Portable Network Graphics (png): CONTENT_CATEGORY_IMAGE
• Bitmap (bmp): CONTENT_CATEGORY_IMAGE
• Graphics Interchange Format (gif): CONTENT_CATEGORY_IMAGE
• HyperText Markup Language (html): RAW_DOCUMENT_FILE_TYPE_TEXT
• Extensible Markup Language (xml): RAW_DOCUMENT_FILE_TYPE_TEXT
• Rich Text Format (rtf): RAW_DOCUMENT_FILE_TYPE_UNSPECIFIED
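The mapping above amounts to a lookup from file extension to a raw_document_file_type or content_category value. A minimal sketch, assuming you choose the value from the file's extension; the helper name file_type_for is hypothetical, the values are returned as plain strings rather than the client library's enum types, and returning None for extensions not listed here is an assumption.

```python
from pathlib import PurePath
from typing import Optional

# Extension -> value, transcribed from the supported-formats list.
FILE_TYPE_BY_EXTENSION = {
    "jpeg": "CONTENT_CATEGORY_IMAGE",
    "jpg": "CONTENT_CATEGORY_IMAGE",
    "png": "CONTENT_CATEGORY_IMAGE",
    "bmp": "CONTENT_CATEGORY_IMAGE",
    "gif": "CONTENT_CATEGORY_IMAGE",
    "tif": "RAW_DOCUMENT_FILE_TYPE_TIFF",
    "tiff": "RAW_DOCUMENT_FILE_TYPE_TIFF",
    "doc": "RAW_DOCUMENT_FILE_TYPE_DOCX",
    "docx": "RAW_DOCUMENT_FILE_TYPE_DOCX",
    "xls": "RAW_DOCUMENT_FILE_TYPE_XLSX",
    "xlsx": "RAW_DOCUMENT_FILE_TYPE_XLSX",
    "ppt": "RAW_DOCUMENT_FILE_TYPE_PPTX",
    "pptx": "RAW_DOCUMENT_FILE_TYPE_PPTX",
    "pdf": "RAW_DOCUMENT_FILE_TYPE_PDF",
    "txt": "RAW_DOCUMENT_FILE_TYPE_TEXT",
    "html": "RAW_DOCUMENT_FILE_TYPE_TEXT",
    "xml": "RAW_DOCUMENT_FILE_TYPE_TEXT",
    "rtf": "RAW_DOCUMENT_FILE_TYPE_UNSPECIFIED",
}


def file_type_for(filename: str) -> Optional[str]:
    """Return the value for a file name, or None if the format is not listed."""
    ext = PurePath(filename).suffix.lower().lstrip(".")  # ".JPG" -> "jpg"
    return FILE_TYPE_BY_EXTENSION.get(ext)
```

Note that image formats map to a content_category value while the other formats map to a raw_document_file_type value, so code that sets these fields needs to check which kind of value it received.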

Provisioning

Feature (Stable / Regular / Rapid)
• UI service
• Google Cloud console

Working with documents

Feature (Stable / Regular / Rapid)
• Upload documents through the UI
• Bulk upload

API client libraries

Document AI Warehouse client libraries help you write custom code that integrates with Google Cloud. All Document AI Warehouse services are accessible through the client libraries.

Library (Stable / Regular / Rapid)
• Java
• Python