Quota management best practices

This page describes best practices for managing Cloud Healthcare API quota. Use this page if your Google Cloud project has, or might have, a large amount of traffic and you need more quota than the Cloud Healthcare API provides by default.

Cloud Healthcare API default quotas

The default Cloud Healthcare API quotas aren't designed for all use cases, particularly if your Google Cloud project has a large amount of traffic. The Cloud Healthcare API doesn't automatically grow quota. You must plan and monitor your quota usage.

Best practices for monitoring and viewing quota

There are several methods for viewing your quota usage. When estimating and viewing quota for the Cloud Healthcare API, we recommend that you use the Service Quota Model. The model lets you accurately assess the available quota you have based on the following criteria:

  • Whether an admin override is present. A principal granted the Quota Administrator role in an organization can apply an admin override to quota in Google Cloud projects within the organization. An admin override supersedes default limits and producer overrides.
  • Whether a producer override is present. A service owner grants a producer override to a consumer of a service. Google Cloud is the service owner of the Cloud Healthcare API service. Any quota override that Google Cloud provides is a producer override.
  • Whether a consumer override is present. Someone who makes requests to the Cloud Healthcare API is a consumer of the Cloud Healthcare API service. You can apply consumer overrides for various situations, such as limiting quotas in your Google Cloud project as a cost-control measure to prevent going over your budget.

If you have any of these overrides in effect, you can compute your consumer quota limit to get an accurate assessment of your available quota.

Best practices for requesting additional quota

Google Cloud has procedures to request higher quota. To learn how quota increase requests are processed, see About quota increase requests.

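Before requesting additional quota, ensure that you've implemented both of the following:

  • Exponential backoff for requests that receive 429 RESOURCE_EXHAUSTED errors.
  • Low-volume transactions sent on a consistent basis. See Favor low-volume transactions on a consistent basis.

These implementations might reduce the amount of quota you require for the following reasons: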

  • Both implementations spread load spikes out across several hours or minutes, rather than seconds.
  • Both implementations make efficient use of quota over a 24-hour period. If requests that significantly exceed the default quota are consistent over a 24-hour period, larger pools of resources can be allocated to the Cloud Healthcare API service. The additional allocation of resources is by request only and is determined on a case-by-case basis.
  • Consistent resource usage makes it simpler for Google Cloud to understand your quota requirements and provide you with the quota you need.

To manage your capacity and quota effectively, you need to know your organization's capacity requirements. If you're planning your capacity requirements and think that you'll need a large quota increase when your Google Cloud project is in production, request an increase from Google Cloud Customer Care. Customer Care can assist you with allocating and increasing quota during the testing and rollout phases of your Google Cloud project.

You don't need to have a paid Customer Care service to request a quota increase. Some quota increase requests are completed within 2-3 business days, but we recommend that you plan for longer. If your quota increase is large, it can take 10 business days or more for the quota increase request to be completed. Part of your planning must involve allocating time to respond to Customer Care to resolve any questions or open issues about the request. If you ensure that your initial quota increase request is sufficiently detailed, you might be able to reduce the time spent waiting for the request to be fulfilled.

Best practices for anticipating quota needs

Before your Google Cloud project goes into production, anticipate and plan for how much quota you will need. Planning your quota requirements prevents unexpected limiting of your resource consumption later.

The following sections explain what to consider when planning for quota.

Anticipate total usage for all data stores and clients

Understand your total usage across all Cloud Healthcare API data stores, and understand the total usage of all clients that make requests to your Google Cloud project.

  • Some Google Cloud projects implement multiple Cloud Healthcare API use cases. For example, your Google Cloud project might use multiple Cloud Healthcare API datasets and data stores for different types of data, thus increasing your total quota usage.
  • Quotas are enforced on a per-project and per-region basis. Ensure that you have accurate measurements of your required quota in each region that you use. If you have multiple Google Cloud projects, measure the required quota for each project separately. For more information on planning for quota per region, see Anticipate per-region usage.
  • The Cloud Healthcare API doesn't load balance quota across clients, datasets, or data stores. The client must determine whether to implement a prioritization scheme to ensure that the most critical traffic doesn't encounter 429 RESOURCE_EXHAUSTED errors.
  • All get, search, create, update, and delete methods share a single quota, which means that user-facing applications can encounter 429 RESOURCE_EXHAUSTED errors if non-user-facing applications exhaust quota. For example, an application in your Google Cloud project that ingests data into the Cloud Healthcare API using create requests might use up a substantial amount of quota and cause 429 RESOURCE_EXHAUSTED errors in a user-facing application where a user is performing search operations.
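
Because the Cloud Healthcare API doesn't prioritize traffic for you, one possible client-side approach is to reserve part of each minute's budget for critical requests. The following is a minimal sketch, not a recommended implementation. The class name PriorityBudget, its parameters, and the fixed 60-second window are assumptions made for illustration. The client's window won't line up exactly with how the service measures each minute, so treat the limit as an approximation.

```python
import time


class PriorityBudget:
    """Hypothetical client-side per-minute budget shared by all request types.

    Non-critical traffic (for example, bulk ingestion) can use at most
    limit_per_minute - reserved_for_critical requests per window, so
    user-facing traffic always has some headroom left.
    """

    def __init__(self, limit_per_minute, reserved_for_critical, clock=time.monotonic):
        self.limit_per_minute = limit_per_minute
        self.reserved_for_critical = reserved_for_critical
        self._clock = clock
        self._window_start = clock()
        self._used = 0

    def try_acquire(self, critical):
        """Return True if the caller may send one request now."""
        now = self._clock()
        if now - self._window_start >= 60:
            # Start a new one-minute window and reset the counter.
            self._window_start = now
            self._used = 0
        ceiling = self.limit_per_minute
        if not critical:
            ceiling -= self.reserved_for_critical
        if self._used < ceiling:
            self._used += 1
            return True
        return False
```

A caller checks try_acquire(critical=False) before each ingestion request and delays the request when the result is False, while a user-facing search calls try_acquire(critical=True).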

Anticipate per-region usage

The Cloud Healthcare API measures quotas on a per-project and per-region basis. Quotas are typically measured per minute, which allows for small spikes of requests per second to balance out on a per-minute scale.

If your Google Cloud project uses multiple regions, you can set per-region quotas.

If your Cloud Healthcare API dataset is in the us multi-regional location, and you want to request additional quota, state in your quota request that the quota is for the "US meta region". The us multi-regional location consists of the following subregions:

  • us-central1
  • us-east1
  • us-west1

If you already have Cloud Healthcare API traffic using quota in any of the us- subregions, ensure that you take the existing traffic in those subregions into account when making a quota increase request for the us multi-region. For example, if you have datasets in us-central1 and us, and you request a quota increase in us, specify in your request that you have datasets in us-central1.

Favor low-volume transactions on a consistent basis

The following scenario explains the importance of sending smaller amounts of traffic on a consistent basis instead of sending high-volume transactions with a longer interval between transactions.

Traffic volume is calculated using the formula request payload * time = traffic volume. A high-volume transaction is one or more requests to the Cloud Healthcare API in a short interval that contain a large payload. A series of requests can also be considered high-volume if there are many requests sent over a short interval, regardless of the payload size.

Suppose that a client collects high-volume transactions and sends the transactions to the Cloud Healthcare API in a burst every five minutes. The following occurs:

  1. The initial burst of traffic consumes quota in the first minute (dependent on minute rollovers) until all quota is exhausted.
  2. Any remaining burst traffic receives 429 RESOURCE_EXHAUSTED errors. If the client is configured to retry, all affected requests enter exponential backoff.
  3. Some percentage of requests that encountered the initial exponential backoff are rescheduled to be tried again in the next minute. Some requests are attempted multiple times in a single minute, and then are retried the next minute.
  4. If the request volume is high enough, retried requests might encounter 429 RESOURCE_EXHAUSTED errors and exponential backoff again. Certain bursts of traffic might encounter exponential backoff at different times, and the attempts to send traffic again might converge on the same minute in the future.
  5. If the request volume is still high, some traffic is retried when the next burst of traffic begins. The issue is exacerbated because more traffic is added to the existing backlog of requests. Your application might have difficulty maintaining the backlog of requests and sending them consistently to the Cloud Healthcare API.
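
The effect of these steps can be illustrated with a simplified simulation. This is a sketch under stated assumptions, not a model of how the Cloud Healthcare API enforces quota: the quota of 1,000 requests per minute and the traffic figures are hypothetical, and every rejected request is assumed to be retried in the following minute instead of following a real exponential backoff schedule.

```python
def simulate(arrivals_per_minute, quota_per_minute):
    """Return (total_rejections, final_backlog) for a list of per-minute arrivals.

    Simplified model: requests that exceed the quota in a minute are
    rejected once (a 429 error) and carried over to the next minute.
    """
    backlog = 0
    rejected = 0
    for arrivals in arrivals_per_minute:
        demand = backlog + arrivals
        served = min(demand, quota_per_minute)
        backlog = demand - served
        rejected += backlog  # each unserved request receives one 429 this minute
    return rejected, backlog


QUOTA = 1000                      # hypothetical per-minute quota
bursty = [5000, 0, 0, 0, 0] * 3   # 5,000 requests in a burst every five minutes
steady = [1000] * 15              # the same total volume, sent evenly

print(simulate(bursty, QUOTA))  # (30000, 0)
print(simulate(steady, QUOTA))  # (0, 0)
```

Both clients send 15,000 requests in 15 minutes. In this model, the bursty client triggers 30,000 rejections and the steady client triggers none.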

This scenario shows the importance of knowing the volume of your traffic on a per-minute basis. Pace your requests and configure backoffs so that traffic stays within your per-minute quota. Doing so prevents network congestion and ensures that your application doesn't encounter many failures that require retries.
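
A retry helper with exponential backoff and random jitter is one way to keep retried requests from converging on the same minute. The following is a minimal sketch. The function name call_with_backoff, the send callable, and the default delays are hypothetical and not part of any Cloud Healthcare API client library. send stands in for whatever function issues your request and returns the HTTP status code.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() until it returns a status other than 429 or attempts run out.

    send is a hypothetical zero-argument callable that returns an HTTP status.
    The wait before retry n is a random value between 0 and
    min(max_delay, base_delay * 2**n) seconds ("full jitter").
    """
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)  # randomize so retries don't arrive together
    return 429  # still rate limited after all attempts
```

The sleep and rng parameters exist only so that the delays can be replaced in tests. The jitter spreads retries from many clients across the backoff interval rather than sending them at the same instant.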

Review DICOM and FHIR quotas

The following sections describe the Cloud Healthcare API quotas associated with FHIR and DICOM stores and operations. See Quotas and limits for more information on Cloud Healthcare API quotas and data limitations.

DICOM quotas

The following table describes the Cloud Healthcare API quotas associated with DICOM stores and DICOM operations.

Metric name: dicomweb_ops
Display name: Number of DICOMweb operations per minute per region
Description: Includes the following methods:
  • All projects.locations.datasets.dicomStores.studies methods in v1beta1 and v1
  • All projects.locations.datasets.dicomStores.studies.series methods in v1beta1 and v1
  • All projects.locations.datasets.dicomStores.studies.series.instances methods in v1beta1 and v1
  • All projects.locations.datasets.dicomStores.studies.series.instances.frames methods in v1beta1 and v1

Metric name: dicom_structured_storage_bytes
Display name: Structured DICOM storage ingress in bytes per minute per region
Description: Structured bytes, in the form of DICOM tags and related metadata, sent to the Cloud Healthcare API while processing dicomweb_ops operations.

Metric name: dicom_store_ops
Display name: Number of DICOM store operations per minute per region
Description: Operations on the DICOM store, not DICOM data. Includes the following methods:

Metric name: dicom_store_lro_ops
Display name: Number of DICOM store long-running operations per minute per region
Description: Operations on the DICOM store, not DICOM data, that return a long-running operation. Includes the following methods:

Metric name: dicom_structured_storage_operations_bytes
Display name: Structured DICOM storage ingress for long-running operations in bytes per minute per region
Description: Structured bytes, in the form of DICOM tags and related metadata, sent to the Cloud Healthcare API while processing dicom_store_lro_ops operations.

FHIR quotas

The following table describes the Cloud Healthcare API quotas associated with FHIR stores and FHIR operations.

Metric name: fhir_ops
Display name: Number of FHIR operations per minute per region
Description: FHIR operations including individual operations in a FHIR bundle. A FHIR bundle containing 500 operations uses the same amount of quota as 500 individual FHIR operations. Includes all projects.locations.datasets.fhirStores.fhir methods in v1beta1 and v1.

Metric name: fhir_storage_bytes
Display name: FHIR storage ingress in bytes per minute per region
Description: Bytes sent to the Cloud Healthcare API while processing fhir_ops operations.

Metric name: fhir_store_ops
Display name: Number of FHIR store operations per minute per region
Description: Operations on the FHIR store, not FHIR data. Includes the following methods:

Metric name: fhir_store_lro_ops
Display name: Number of FHIR store long-running operations per minute per region
Description: Operations on the FHIR store, not FHIR data, that return a long-running operation. Includes the following methods:

Metric name: fhir_storage_operation_bytes
Display name: FHIR storage ingress for long-running operations in bytes per minute per region
Description: Bytes sent to the Cloud Healthcare API while processing fhir_store_lro_ops operations.
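
Because each operation in a FHIR bundle counts toward fhir_ops, it can help to estimate the cost of a payload before sending it. The following is a rough sketch that assumes a batch or transaction bundle costs one operation per entry and any other resource costs one operation. The function names and the quota value in the usage note are hypothetical, and actual accounting is determined by the Cloud Healthcare API.

```python
import math


def estimate_fhir_ops(resource: dict) -> int:
    """Estimate how many fhir_ops a FHIR JSON payload consumes.

    Assumption: a batch or transaction Bundle costs one operation per entry.
    Anything else is treated as a single operation.
    """
    is_bundle = resource.get("resourceType") == "Bundle"
    if is_bundle and resource.get("type") in ("batch", "transaction"):
        return len(resource.get("entry", []))
    return 1


def minutes_to_send(total_ops: int, quota_per_minute: int) -> int:
    """Minimum number of whole minutes needed to stay within the quota."""
    return math.ceil(total_ops / quota_per_minute)
```

For example, with a hypothetical quota of 500 operations per minute, minutes_to_send(1200, 500) returns 3. A client would need to spread 1,200 bundled operations over at least three minutes to avoid 429 RESOURCE_EXHAUSTED errors.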

Quota management resources

For more information about planning and managing quota, see Manage capacity and quota.