Security, CMEK, and audit logs

Security overview

To understand how Document AI keeps your data secure, review the following questions and answers.

How does Google protect and ensure the security of the data I send to Document AI?

Refer to the Google Cloud Security page, which describes the security measures in place for Google Cloud services.

What security horizontals does Document AI support?

Document AI supports the following:

Security compliance

This section answers questions related to compliance.

What compliance does Document AI offer?

Google Cloud undergoes regular independent third-party audits to verify alignment with security, privacy, and compliance controls. Google Cloud has regular audits for standards such as ISO 27001, ISO 27017, ISO 27018, SOC 2, SOC 3, and PCI DSS.

You can read more about Google Cloud compliance on the Compliance resource center.

Is Document AI FedRAMP compliant?

Document AI is FedRAMP Moderate compliant.

Is Document AI HIPAA compliant?

Document AI is HIPAA compliant.

Security data usage

This section answers questions about how your data is used and stored.

Does Google use customer data to improve the model(s)?

No. Google does not use any of your content (such as documents and predictions) for any purpose except to provide you with the Document AI service. See the Document AI data usage policy.

At Google Cloud, we never use customer data to train our Document AI models.

For more information, see this blog post: Sharing our data privacy commitments for the AI era

In the future, will Google share the document I send to Document AI?

We won't make the document that you send available to the public, or share it with anyone else, except as necessary to provide the Document AI service. For example, sometimes we may need to use a third-party vendor to help us provide some aspect of our services, such as storage or transmission of data. Our vendors are under appropriate security and confidentiality contractual obligations. We don't share documents you send with other parties or make them public for any other purpose.

Will documents I send to Document AI, their results, or other information about requests, be stored on Google servers? If so, how long and where, and can I access it?

When you send a document to Document AI via a batch request, we must store that document for a short period of time (encrypted with an ephemeral key, so that no human has access to it) in order to perform the analysis and return the results to you. For batch operations, the stored document is typically deleted immediately after processing, with a failsafe time to live (TTL) of one day. If the batch operation terminates abnormally, the data may persist with a TTL of up to seven days.
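To make the retention windows concrete, the following minimal sketch computes the latest time a batch document could still be stored under the TTLs stated above. The function name and the example timestamp are hypothetical; only the one-day and seven-day values come from the text.

```python
from datetime import datetime, timedelta, timezone

# TTL values taken from the paragraph above.
FAILSAFE_TTL = timedelta(days=1)       # normal batch processing
ABNORMAL_END_TTL = timedelta(days=7)   # batch terminated abnormally


def latest_deletion_time(stored_at: datetime, ended_abnormally: bool = False) -> datetime:
    """Return the latest time a batch document may still be stored (hypothetical helper)."""
    ttl = ABNORMAL_END_TTL if ended_abnormally else FAILSAFE_TTL
    return stored_at + ttl


# Example timestamp chosen purely for illustration.
stored = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(latest_deletion_time(stored).isoformat())
print(latest_deletion_time(stored, ended_abnormally=True).isoformat())
```

In the normal case the document is typically deleted well before this upper bound, because deletion usually happens immediately after processing.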

Synchronous processes

For online (immediate response) operations, the document data (sent in the request) is processed in memory, encrypted in flight, and not persisted to disk. Google also temporarily logs some metadata about your Document AI API requests (such as the time the request was received and the size of the request) to improve our service and combat abuse.

For more information, see:

Does Google claim ownership of the content I send in the request to Document AI?

Google does not claim any ownership in any of the content (including documents and predictions) that you transmit to Document AI. Documents and custom models are considered to be (private) customer data. We never use customer data to improve our models. In the rare circumstance where both parties agree to such an arrangement, an explicit data sharing agreement is crafted.

What is considered Personally Identifiable Information (PII) that needs to be redacted on documents before being shared with Google?

For document sharing purposes, PII is any information defined as personally identifiable data under applicable laws. Customers must redact documents before sharing them with Google, for example when they voluntarily share a document with technical support to reproduce a problem.

Examples of PII include but are not limited to:

  1. Date of birth
  2. Names of individuals
  3. Personal address
  4. Email address of individuals
  5. Telephone number(s) of individuals
  6. Driver's license number
  7. National ID number
  8. Employer identification number
  9. Bank account information - account IDs, routing numbers, SWIFT IDs
  10. Payment card number
  11. Gender
  12. Ethnicity
  13. Usernames and ID numbers of third parties
  14. Passport number
  15. Marital status
  16. Number of allowances or exemptions
  17. Dependent names
  18. Vehicle identifiers (VIN, license plates, etc.)
  19. Any other unique identifying number, characteristic, or code that could identify an individual consumer, family, or device over time or across services.

Can I resell the Document AI API?

No, you are not permitted to resell the Document AI service. You can still integrate Document AI into applications of independent value.

How can customers control Google Cloud support access to their documents or data?

All Document AI parsers support Access Transparency and Access Approval. By default, Google support does not have access to any customer data or applications. If the Google support team requires access, customers can use the Access Approval process to authorize access to data or applications. This process starts with the creation of a ticket in the Google support portal. The customer then receives a notification (usually by email) with an option to authorize or deny access.

Google also offers a service called Access Transparency, which gives customers visibility into all the tasks that Google support performs while they have access to the system.

CMEK overview

By default, Google Cloud automatically encrypts data when it is at rest using encryption keys managed by Google.

If you have specific compliance or regulatory requirements related to the keys that protect your data, you can use customer-managed encryption keys (CMEK) for Document AI. Instead of Google managing the encryption keys that protect your data, your Document AI processor is protected using a key that you control and manage in Cloud Key Management Service (KMS).

This guide describes CMEK for Document AI. For more information about CMEK in general, including when and why to enable it, see the Cloud Key Management Service documentation.

Using CMEK

Encryption settings are available when you create a processor. To use CMEK, select the CMEK option and select a key.

The CMEK key is used for all data associated with the processor and its child resources. All customer-related data that is sent to the processor is automatically encrypted with the provided key before writing to disk.

Once a processor has been created, you cannot change its encryption settings. To use a different key, you must create a new processor.
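The text above describes selecting a key in the console. If you create processors programmatically instead, the request needs the full Cloud KMS key resource name. The following minimal sketch builds that name and a processor creation request body. It assumes the Document AI REST Processor resource accepts a kmsKeyName field alongside displayName and type; the helper names, the processor type, and all identifiers are illustrative, so verify the field names against the API reference before relying on them.

```python
import json


def kms_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Build a Cloud KMS CryptoKey resource name."""
    return f"projects/{project}/locations/{location}/keyRings/{key_ring}/cryptoKeys/{key}"


def create_processor_body(display_name: str, processor_type: str, kms_key: str) -> dict:
    """Build a processor creation body (assumed field names: displayName, type, kmsKeyName)."""
    return {
        "displayName": display_name,
        "type": processor_type,
        # Assumption: the key is set at creation time and cannot be changed later.
        "kmsKeyName": kms_key,
    }


# All identifiers below are placeholders for illustration.
key = kms_key_name("my-project", "us", "my-key-ring", "my-key")
body = create_processor_body("invoices-cmek", "OCR_PROCESSOR", key)
print(json.dumps(body, indent=2))
```

Because encryption settings are fixed once the processor exists, it can help to validate the key name in code before sending the request.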

External keys

You can use Cloud External Key Manager (EKM) to create and manage external keys to encrypt data within Google Cloud.

When you use a Cloud EKM key, Google has no control over the availability of your externally managed key. If you request access to a resource encrypted with an externally managed key, and the key is unavailable, then Document AI will reject the request. There can be a delay of up to 10 minutes before you can access the resource after the key becomes available.

For more considerations when using external keys, see EKM considerations.

CMEK supported resources

When storing any resource to disk, if any customer data is stored as part of the resource, Document AI first encrypts the contents using the CMEK key.

Resource | Material encrypted
Processor | N/A - no user data. However, if you specify a CMEK key during processor creation, it must be valid.
ProcessorVersion | All
Evaluation | All

CMEK supported APIs

The APIs that use the CMEK key for encryption include the following:

Method | Encryption
processDocument | N/A - no data saved to disk.
batchProcessDocuments | Data is temporarily stored on disk and encrypted using an ephemeral key (see CMEK compliance).
reviewDocument | Documents pending review are stored in a Cloud Storage bucket encrypted using the provided KMS/CMEK key.
trainProcessorVersion | Documents used for training are encrypted using the provided KMS/CMEK key.
evaluateProcessorVersion | Evaluations are encrypted using the provided KMS/CMEK key.

API requests that access encrypted resources fail if the key is disabled or unreachable. Examples include the following:

Method | Decryption
getProcessorVersion | Processor versions trained using customer data are encrypted. Access requires decryption.
processDocument | Processing documents using an encrypted processor version requires decryption.
Import Documents | Importing documents with auto-labeling enabled using an encrypted processor version requires decryption.

CMEK and Cloud Storage

APIs, such as batchProcess and reviewDocument, can read from and write to Cloud Storage buckets.

Any data written to Cloud Storage by Document AI is encrypted using the bucket's configured encryption key, which can be different from your processor's CMEK key.

For more information, see the CMEK documentation for Cloud Storage.

Audit logs

This document describes the audit logs created by Document AI as part of Cloud Audit Logs.

Overview

Google Cloud services write audit logs to help you answer the questions, "Who did what, where, and when?" within your Google Cloud resources.

Your Google Cloud projects contain only the audit logs for resources that are directly within the Google Cloud project. Other Google Cloud resources, such as folders, organizations, and billing accounts, contain the audit logs for the entity itself.

For a general overview of Cloud Audit Logs, see Cloud Audit Logs overview. For a deeper understanding of the audit log format, see Understand audit logs.

Available audit logs

The following types of audit logs are available for Document AI:

  • Admin Activity audit logs

    Includes "admin write" operations that write metadata or configuration information.

    You can't disable Admin Activity audit logs.

  • Data Access audit logs

    Includes "admin read" operations that read metadata or configuration information. Also includes "data read" and "data write" operations that read or write user-provided data.

    To receive Data Access audit logs, you must explicitly enable them.

For fuller descriptions of the audit log types, see Types of audit logs.

Audited operations

The following table summarizes which API operations correspond to each audit log type in Document AI:

Audit logs category | Document AI operations
Admin Activity audit logs | humanReviewConfigs.update, operations.cancel, processors.create, processors.delete, processors.disable, processors.enable, processors.setDefaultProcessorVersion, processorVersions.create, processorVersions.delete, processorVersions.deploy, processorVersions.undeploy
Data Access audit logs | humanReviewConfigs.get, humanReviewConfigs.update, processors.batchProcess, processors.get, processors.list, processors.process, processorVersions.batchProcess, processorVersions.get, processorVersions.list, processorVersions.process
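When triaging which log to search for a given call, it can help to encode the table above as a lookup. The following sketch is one way to do that. The operation names are copied from the table (with processorVersions spelled out in full), and the function name is hypothetical.

```python
# Operation names copied from the audited operations table above.
AUDITED_OPERATIONS = {
    "Admin Activity": {
        "humanReviewConfigs.update",
        "operations.cancel",
        "processors.create",
        "processors.delete",
        "processors.disable",
        "processors.enable",
        "processors.setDefaultProcessorVersion",
        "processorVersions.create",
        "processorVersions.delete",
        "processorVersions.deploy",
        "processorVersions.undeploy",
    },
    "Data Access": {
        "humanReviewConfigs.get",
        "humanReviewConfigs.update",
        "processors.batchProcess",
        "processors.get",
        "processors.list",
        "processors.process",
        "processorVersions.batchProcess",
        "processorVersions.get",
        "processorVersions.list",
        "processorVersions.process",
    },
}


def audit_log_types(operation: str) -> list[str]:
    """Return the audit log categories that list the given operation (hypothetical helper)."""
    return sorted(cat for cat, ops in AUDITED_OPERATIONS.items() if operation in ops)


print(audit_log_types("processors.create"))
print(audit_log_types("processors.process"))
```

An operation that appears under Data Access only produces entries if Data Access audit logs have been explicitly enabled.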

Audit log format

Audit log entries include the following objects:

  • The log entry itself, which is an object of type LogEntry. Useful fields include the following:

    • The logName contains the resource ID and audit log type.
    • The resource contains the target of the audited operation.
    • The timestamp contains the time of the audited operation.
    • The protoPayload contains the audited information.
  • The audit logging data, which is an AuditLog object held in the protoPayload field of the log entry.

  • Optional service-specific audit information, which is a service-specific object. For earlier integrations, this object is held in the serviceData field of the AuditLog object; later integrations use the metadata field.

For other fields in these objects, and how to interpret them, review Understand audit logs.
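As a rough illustration of how these objects nest, the sketch below pulls the "who, what, when" fields out of a log entry represented as a Python dictionary (for example, one decoded from JSON output). The sample entry is hand-written for illustration and is not real log output; the helper name is hypothetical. Field names follow the JSON form of LogEntry and AuditLog, where the time field is spelled timestamp.

```python
from urllib.parse import unquote

# Hand-written sample for illustration only; not real log output.
SAMPLE_ENTRY = {
    "logName": "projects/my-project/logs/cloudaudit.googleapis.com%2Factivity",
    "resource": {"type": "audited_resource"},
    "timestamp": "2024-01-01T12:00:00Z",
    "protoPayload": {
        "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
        "serviceName": "documentai.googleapis.com",
        "methodName": "google.cloud.documentai.v1.DocumentProcessorService.CreateProcessor",
        "authenticationInfo": {"principalEmail": "user@example.com"},
    },
}


def summarize_audit_entry(entry: dict) -> dict:
    """Extract commonly used fields from an audit LogEntry dict (hypothetical helper)."""
    payload = entry.get("protoPayload", {})
    # The log ID is URL-encoded in logName; decode it to read the audit log type.
    log_name = unquote(entry.get("logName", ""))
    return {
        "log_type": log_name.rsplit("/", 1)[-1],
        "resource_type": entry.get("resource", {}).get("type"),
        "when": entry.get("timestamp"),
        # Caller identities can be redacted, so this may be None.
        "who": payload.get("authenticationInfo", {}).get("principalEmail"),
        "what": payload.get("methodName"),
        "service": payload.get("serviceName"),
    }


print(summarize_audit_entry(SAMPLE_ENTRY))
```

The helper uses .get throughout so that entries with redacted or missing fields produce None values instead of raising an error.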

Log name

Cloud Audit Logs log names include resource identifiers indicating the Google Cloud project or other Google Cloud entity that owns the audit logs, and whether the log contains Admin Activity, Data Access, Policy Denied, or System Event audit logging data.

The following are the audit log names, including variables for the resource identifiers:

   projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity
   projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fdata_access
   projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fsystem_event
   projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fpolicy

   folders/FOLDER_ID/logs/cloudaudit.googleapis.com%2Factivity
   folders/FOLDER_ID/logs/cloudaudit.googleapis.com%2Fdata_access
   folders/FOLDER_ID/logs/cloudaudit.googleapis.com%2Fsystem_event
   folders/FOLDER_ID/logs/cloudaudit.googleapis.com%2Fpolicy

   billingAccounts/BILLING_ACCOUNT_ID/logs/cloudaudit.googleapis.com%2Factivity
   billingAccounts/BILLING_ACCOUNT_ID/logs/cloudaudit.googleapis.com%2Fdata_access
   billingAccounts/BILLING_ACCOUNT_ID/logs/cloudaudit.googleapis.com%2Fsystem_event
   billingAccounts/BILLING_ACCOUNT_ID/logs/cloudaudit.googleapis.com%2Fpolicy

   organizations/ORGANIZATION_ID/logs/cloudaudit.googleapis.com%2Factivity
   organizations/ORGANIZATION_ID/logs/cloudaudit.googleapis.com%2Fdata_access
   organizations/ORGANIZATION_ID/logs/cloudaudit.googleapis.com%2Fsystem_event
   organizations/ORGANIZATION_ID/logs/cloudaudit.googleapis.com%2Fpolicy
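The log names above all follow one pattern: a parent resource, the literal segment logs, and a URL-encoded log ID in which the slash becomes %2F. A minimal sketch of a helper that assembles them is shown below. The function name and the sample identifiers are hypothetical.

```python
from urllib.parse import quote

# Parent types and audit log types taken from the list above.
PARENT_TYPES = {"projects", "folders", "billingAccounts", "organizations"}
AUDIT_LOG_TYPES = {"activity", "data_access", "system_event", "policy"}


def audit_log_name(parent_type: str, parent_id: str, log_type: str) -> str:
    """Build an audit log name such as projects/ID/logs/cloudaudit.googleapis.com%2Factivity."""
    if parent_type not in PARENT_TYPES:
        raise ValueError(f"unknown parent type: {parent_type}")
    if log_type not in AUDIT_LOG_TYPES:
        raise ValueError(f"unknown audit log type: {log_type}")
    # safe="" forces the slash in the log ID to be encoded as %2F.
    log_id = quote(f"cloudaudit.googleapis.com/{log_type}", safe="")
    return f"{parent_type}/{parent_id}/logs/{log_id}"


print(audit_log_name("projects", "my-project", "activity"))
print(audit_log_name("organizations", "123456789", "data_access"))
```

Validating the parent and log types up front catches typos before they turn into queries that silently match nothing.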

Service name

Document AI audit logs use the service name documentai.googleapis.com.

For a list of all the Cloud Logging API service names and their corresponding monitored resource type, see Map services to resources.

Resource types

Document AI audit logs use the resource type audited_resource for all audit logs.

For a list of all the Cloud Logging monitored resource types and descriptive information, see Monitored resource types.

Caller identities

The IP address of the caller is held in the RequestMetadata.caller_ip field of the AuditLog object. Logging might redact certain caller identities and IP addresses.

For information about what information is redacted in audit logs, see Caller identities in audit logs.

Enable audit logging

Admin Activity audit logs are always enabled; you can't disable them.

Data Access audit logs are disabled by default and aren't written unless explicitly enabled (the exception is Data Access audit logs for BigQuery, which can't be disabled).

For information about enabling some or all of your Data Access audit logs, see Enable Data Access audit logs.
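Data Access audit logs are typically enabled through the auditConfigs section of an IAM policy. The following sketch shows one way to add a Document AI entry to a policy held as a Python dictionary, using the ADMIN_READ, DATA_READ, and DATA_WRITE log types that correspond to the "admin read", "data read", and "data write" operations described earlier. The helper name is hypothetical, and the sketch only edits the dictionary; fetching and writing the policy is left to whatever tooling you already use.

```python
import copy

DATA_ACCESS_LOG_TYPES = ("ADMIN_READ", "DATA_READ", "DATA_WRITE")


def enable_data_access_logs(policy: dict,
                            service: str = "documentai.googleapis.com",
                            log_types: tuple = DATA_ACCESS_LOG_TYPES) -> dict:
    """Return a copy of an IAM policy dict with Data Access logging enabled for a service."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    configs = updated.setdefault("auditConfigs", [])
    entry = next((c for c in configs if c.get("service") == service), None)
    if entry is None:
        entry = {"service": service, "auditLogConfigs": []}
        configs.append(entry)
    existing = {c.get("logType") for c in entry.setdefault("auditLogConfigs", [])}
    for log_type in log_types:
        if log_type not in existing:
            entry["auditLogConfigs"].append({"logType": log_type})
    return updated


# Placeholder policy for illustration.
policy = {"bindings": []}
print(enable_data_access_logs(policy))
```

The function is idempotent: applying it to a policy that already has the entry leaves the policy unchanged, which makes it safer to run in automation.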

Permissions and roles

IAM permissions and roles determine your ability to access audit logs data in Google Cloud resources.

When deciding which Logging-specific permissions and roles apply to your use case, consider the following:

  • The Logs Viewer role (roles/logging.viewer) gives you read-only access to Admin Activity, Policy Denied, and System Event audit logs. If you have just this role, you cannot view Data Access audit logs that are in the _Default bucket.

  • The Private Logs Viewer role (roles/logging.privateLogViewer) includes the permissions contained in roles/logging.viewer, plus the ability to read Data Access audit logs in the _Default bucket.

    Note that if these private logs are stored in user-defined buckets, then any user who has permissions to read logs in those buckets can read the private logs. For more information about log buckets, see Routing and storage overview.

For more information about the IAM permissions and roles that apply to audit logs data, see Access control with IAM.
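The role rules above can be summarized as a small lookup. The sketch below models only the two roles described here and only logs stored in the _Default bucket; it is a simplification for illustration, not a substitute for an actual IAM check, and the function name is hypothetical.

```python
# Readable audit log types per role, as described above (_Default bucket only).
_BASE = {"Admin Activity", "Policy Denied", "System Event"}
ROLE_READABLE_LOGS = {
    "roles/logging.viewer": set(_BASE),
    "roles/logging.privateLogViewer": _BASE | {"Data Access"},
}


def readable_audit_logs(roles: list[str]) -> list[str]:
    """Return the audit log types readable with the given roles (simplified model)."""
    readable = set()
    for role in roles:
        # Roles not described in this section contribute nothing in this model.
        readable |= ROLE_READABLE_LOGS.get(role, set())
    return sorted(readable)


print(readable_audit_logs(["roles/logging.viewer"]))
print(readable_audit_logs(["roles/logging.privateLogViewer"]))
```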

View logs

You can query for all audit logs or you can query for logs by their audit log name. The audit log name includes the resource identifier of the Google Cloud project, folder, billing account, or organization for which you want to view audit logging information. Your queries can specify indexed LogEntry fields, and if you use the Log Analytics page, which supports SQL queries, then you can view your query results as a chart.

For more information about querying your logs, see the following pages:

You can view audit logs in Cloud Logging by using the Google Cloud console, the Google Cloud CLI, or the Logging API.

Console

In the Google Cloud console, you can use the Logs Explorer to retrieve your audit log entries for your Google Cloud project, folder, or organization:

  1. In the navigation panel of the Google Cloud console, select Logging, and then select Logs Explorer:

    Go to Logs Explorer

  2. Select an existing Google Cloud project, folder, or organization.

  3. To display all audit logs, enter either of the following queries into the query-editor field, and then click Run query:

    logName:"cloudaudit.googleapis.com"
    
    protoPayload."@type"="type.googleapis.com/google.cloud.audit.AuditLog"
    
  4. To display the audit logs for a specific resource and audit log type, in the Query builder pane, do the following:

    • In Resource type, select the Google Cloud resource whose audit logs you want to see.

    • In Log name, select the audit log type that you want to see:

      • For Admin Activity audit logs, select activity.
      • For Data Access audit logs, select data_access.
      • For System Event audit logs, select system_event.
      • For Policy Denied audit logs, select policy.
    • Click Run query.

    If you don't see these options, then there aren't any audit logs of that type available in the Google Cloud project, folder, or organization.

    If you're experiencing issues when trying to view logs in the Logs Explorer, see the troubleshooting information.

    For more information about querying by using the Logs Explorer, see Build queries in the Logs Explorer. For information about summarizing log entries in the Logs Explorer by using Gemini, see Summarize log entries with Gemini assistance.

gcloud

The Google Cloud CLI provides a command-line interface to the Logging API. Supply a valid resource identifier in each of the log names. For example, if your query includes a PROJECT_ID, then the project identifier you supply must refer to the currently selected Google Cloud project.

To read your Google Cloud project-level audit log entries, run the following command:

gcloud logging read "logName : projects/PROJECT_ID/logs/cloudaudit.googleapis.com" \
    --project=PROJECT_ID

To read your folder-level audit log entries, run the following command:

gcloud logging read "logName : folders/FOLDER_ID/logs/cloudaudit.googleapis.com" \
    --folder=FOLDER_ID

To read your organization-level audit log entries, run the following command:

gcloud logging read "logName : organizations/ORGANIZATION_ID/logs/cloudaudit.googleapis.com" \
    --organization=ORGANIZATION_ID

To read your Cloud Billing account-level audit log entries, run the following command:

gcloud logging read "logName : billingAccounts/BILLING_ACCOUNT_ID/logs/cloudaudit.googleapis.com" \
    --billing-account=BILLING_ACCOUNT_ID

Add the --freshness flag to your command to read logs that are more than 1 day old.

For more information about using the gcloud CLI, see gcloud logging read.

API

When building your queries, supply a valid resource identifier in each of the log names. For example, if your query includes a PROJECT_ID, then the project identifier you supply must refer to the currently selected Google Cloud project.

For example, to use the Logging API to view your project-level audit log entries, do the following:

  1. Go to the Try this API section in the documentation for the entries.list method.

  2. Put the following into the Request body part of the Try this API form, supplying a valid PROJECT_ID in each of the log names.

    {
      "resourceNames": [
        "projects/PROJECT_ID"
      ],
      "pageSize": 5,
      "filter": "logName : projects/PROJECT_ID/logs/cloudaudit.googleapis.com"
    }
    
  3. Click Execute.
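Outside the Try this API form, the same request body can be generated in code and sent to the entries.list method with the HTTP client of your choice. This minimal sketch only builds and prints the body shown in step 2; the function name and project ID are placeholders.

```python
import json


def entries_list_body(project_id: str, page_size: int = 5) -> dict:
    """Build an entries.list request body for project-level audit logs."""
    return {
        "resourceNames": [f"projects/{project_id}"],
        "pageSize": page_size,
        "filter": f"logName : projects/{project_id}/logs/cloudaudit.googleapis.com",
    }


# Placeholder project ID for illustration.
print(json.dumps(entries_list_body("my-project"), indent=2))
```

Generating the body from a single project_id argument keeps the resource name and the filter in sync, which the form requires you to do by hand.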

For example, to view all the project-level audit logs for Document AI, use the following query, supplying a valid resource identifier in each of the log names:

logName=("projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Factivity"
OR "projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fdata_access"
OR "projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fsystem_event"
OR "projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fpolicy")
protoPayload.serviceName="documentai.googleapis.com"
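The query above can also be assembled programmatically, which is convenient when you want only some of the audit log types. The following sketch reproduces the same filter text; the function name and project ID are hypothetical.

```python
ALL_AUDIT_LOG_TYPES = ("activity", "data_access", "system_event", "policy")


def documentai_audit_filter(project_id: str, log_types=ALL_AUDIT_LOG_TYPES) -> str:
    """Build a Logging query for Document AI audit logs in one project."""
    names = [
        f'"projects/{project_id}/logs/cloudaudit.googleapis.com%2F{log_type}"'
        for log_type in log_types
    ]
    # Log names are OR-ed together; the service name line is implicitly AND-ed.
    log_clause = "logName=(" + "\nOR ".join(names) + ")"
    service_clause = 'protoPayload.serviceName="documentai.googleapis.com"'
    return log_clause + "\n" + service_clause


# Placeholder project ID for illustration.
print(documentai_audit_filter("my-project"))
print(documentai_audit_filter("my-project", log_types=["data_access"]))
```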

Route audit logs

You can route audit logs to supported destinations in the same way that you can route other kinds of logs. Here are some reasons you might want to route your audit logs:

  • To keep audit logs for a longer period of time or to use more powerful search capabilities, you can route copies of your audit logs to Cloud Storage, BigQuery, or Pub/Sub. Using Pub/Sub, you can route to other applications, other repositories, and to third parties.

  • To manage your audit logs across an entire organization, you can create aggregated sinks that can route logs from any or all Google Cloud projects in the organization.

  • If your enabled Data Access audit logs are pushing your Google Cloud projects over your log allotments, you can create sinks that exclude the Data Access audit logs from Logging.

For instructions about routing logs, see Route logs to supported destinations.
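As a rough sketch of what a routing configuration for Document AI audit logs might contain, the code below builds a sink definition with a name, a destination, and a filter. The destination string formats follow the conventions used by Cloud Logging sinks for Cloud Storage, BigQuery, and Pub/Sub. The helper names, sink name, and resource identifiers are placeholders, and the sketch does not create the sink.

```python
import json


def storage_destination(bucket: str) -> str:
    return f"storage.googleapis.com/{bucket}"


def bigquery_destination(project: str, dataset: str) -> str:
    return f"bigquery.googleapis.com/projects/{project}/datasets/{dataset}"


def pubsub_destination(project: str, topic: str) -> str:
    return f"pubsub.googleapis.com/projects/{project}/topics/{topic}"


def documentai_audit_sink(name: str, destination: str) -> dict:
    """Build a sink definition that matches only Document AI audit logs."""
    return {
        "name": name,
        "destination": destination,
        "filter": (
            'logName:"cloudaudit.googleapis.com" '
            'AND protoPayload.serviceName="documentai.googleapis.com"'
        ),
    }


# Placeholder names for illustration.
sink = documentai_audit_sink(
    "documentai-audit-to-bq",
    bigquery_destination("my-project", "audit_logs"),
)
print(json.dumps(sink, indent=2))
```

Restricting the filter to the Document AI service name keeps the routed volume small compared with routing every audit log in the project.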

Pricing

For more information about pricing, see Cloud Logging pricing summary.