This page describes best practices for optimizing data throughput when ingesting data into the Cloud Healthcare API. These recommendations are for technical practitioners with experience in managing data throughput for large-scale systems.
Data throughput
Data throughput is the number of resources, such as FHIR resources or DICOM instances, or the number of bytes that the Cloud Healthcare API ingests each second.
Data throughput constraints
The following list describes reasons why data throughput might be constrained:
- You didn't plan for large volume requests that cause traffic spikes.
- Bandwidth constraints slow the ingestion of large data volumes sent in a short amount of time.
- Multiple concurrent transactions change the same Cloud Healthcare API resource, which causes data contention.
- Too many small requests are being made. For more information, see Avoid small import and export requests.
- Too many long-running operations (LROs) run concurrently and bandwidth is limited.
- Too many LROs are scheduled at the same time, which leads to failures.
Retry failed requests
If a client quickly and repeatedly retries requests after failures, it can exceed Cloud Healthcare API quotas. The following sections describe how to efficiently retry failed requests.
Use exponential backoff with jitter and persistent retry queues
Exponential backoff with jitter is a standard error-handling strategy for network applications. A client periodically retries failed requests with exponentially increasing delays between retries, plus a small random delay (the jitter).
Ensure your exponential backoff implementation is idempotent for each retry, especially if you're using custom logic to bypass failure conditions. See 9.2.2 Idempotent Methods in the HTTP specification for more information.
Most programming languages offer libraries to simplify implementing exponential backoff and similar retry strategies. For long-term or multi-process retries, implement a persistent retry queue. This queue can reset the retry mechanism if you exceed the maximum backoff time.
Use exponential backoff when retrying these requests:
- Operations that modify a FHIR resource or bundle of FHIR resources.
- Synchronous LRO requests. Retry if there's an error when the LRO starts or if the LRO fails.
LROs have unique errors that might require you to implement the following retry strategies:
- Use a separate bundle to store data that failed an import or create operation.
- Use synchronous requests for data that failed to process.
Example exponential backoff algorithm
An exponential backoff algorithm retries requests exponentially, increasing the waiting time between retries up to a maximum backoff time. The following algorithm implements truncated exponential backoff with jitter:
1. Send a request to the Cloud Healthcare API.
2. If the request fails, wait 1 + `random-fraction` seconds, then retry the request.
3. If the request fails, wait 2 + `random-fraction` seconds, then retry the request.
4. If the request fails, wait 4 + `random-fraction` seconds, then retry the request.
5. Continue this pattern, waiting 2^n + `random-fraction` seconds after each retry, up to a `maximum-backoff` time.
6. After `deadline` seconds, stop retrying the request.
Use the following values as you implement the algorithm:
- Before each retry, the wait time is `min((2^n + random-fraction), maximum-backoff)`, with `n` starting at 0 and incremented by 1 for each retry.
- Replace `random-fraction` with a random fractional value less than or equal to 1. Use a different value for each retry. Adding this random value prevents clients from becoming synchronized and sending many retries at the same time.
- Replace `maximum-backoff` with the maximum amount of time, in seconds, to wait between retries. Typical values are 32 or 64 (2^5 or 2^6) seconds. Choose the value that works best for your use case.
- Replace `deadline` with the maximum number of seconds to keep sending retries. Choose a value that reflects your use case.
The client can retry after reaching the `maximum-backoff` time, using the `maximum-backoff` value as the delay between retries. For example, if the `maximum-backoff` time is 64 seconds, retry every 64 seconds. Ensure that the client doesn't retry indefinitely.
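The following Python sketch shows one way to implement this algorithm. The function name, the HTTP client, and the set of retryable status codes are illustrative assumptions, not prescribed by the Cloud Healthcare API.

```python
import random
import time

import requests  # assumed HTTP client; any client that exposes status codes works

MAXIMUM_BACKOFF = 64  # seconds (2^6)
DEADLINE = 300        # seconds; total time to keep retrying
RETRYABLE_STATUS = {429, 502, 503, 504}  # illustrative set of transient errors


def call_with_backoff(url, headers, body):
    """Send a request, retrying with truncated exponential backoff and jitter."""
    start = time.monotonic()
    n = 0
    while True:
        response = requests.post(url, headers=headers, json=body)
        if response.status_code not in RETRYABLE_STATUS:
            return response  # success or a non-retryable error

        if time.monotonic() - start > DEADLINE:
            raise TimeoutError("Deadline exceeded while retrying request")

        # Wait min(2^n + random-fraction, maximum-backoff) seconds before retrying.
        wait = min(2 ** n + random.random(), MAXIMUM_BACKOFF)
        time.sleep(wait)
        n += 1
```

Because the request is resent as-is on each retry, make sure the operation is idempotent, as described earlier.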
Implement client-side rate limiting with traffic shaping
Rate limiting protects large-scale systems by preventing them from being overwhelmed by excessive requests. If client-side rate limiting isn't sufficient, the Cloud Healthcare API quota system might restrict data throughput. For more information, see Best practices for quota management.
If you have additional requirements, like guaranteed delivery across retries, the strategies in Retry failed requests might be insufficient. Traffic shaping is a rate-limiting technique that keeps the rate of client-side requests within bandwidth constraints. This spreads load spikes across hours or minutes, which improves throughput. When quota is constrained, traffic shaping can achieve higher throughput than retries alone because it avoids pushback and tracks worker units.
You can implement traffic shaping for synchronous create, read, update, and delete (CRUD) operations, including `fhir.executeBundle`.
Traffic shaping requirements
To implement traffic shaping, your system must implement the following:
- A storage-backed processing queue with redundancy to avoid disk failure.
- Coordinated workers to pull from the processing queue.
- Overall use detection to adjust the number of workers and their processing speed based on quota limits.
- Disaster recovery for the storage-backed processing queue. If there's a disaster, your system must be able to purge or recover the queue.
- Reduced LROs during peak hours. For more information, see Plan and use quota efficiently and Queue and manage LROs.
Traffic shaping might only be required for a single pipeline stage, for example when you're doing the following:
- Limiting the number of workers pulling from a previous pipeline step.
- Limiting each worker individually.
- Using a worker pool coordinator to adjust the rate at which individual units of work, such as queries per second (QPS) or ingested bytes per second, are processed.
Implement rate limiting in other areas of your system
You can use existing programming languages and frameworks to implement traffic shaping. Consider the following open source projects and prebuilt solutions:
- Client-side throttling in Apache Beam. See Horizontal autoscaling for information on how to control throttling using the `numWorkers` and `maxNumWorkers` flags.
- The Java `RateLimiter` class from the Google Guava set of core Java libraries.
- The Python `ratelimiter` module.
- For flow control, use the high-level Pub/Sub client library.
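If a library isn't a good fit, a minimal client-side rate limiter can be a token bucket. The following Python sketch is illustrative; the class name and rate values are assumptions, and production systems typically also need the coordinated, storage-backed queue described in Traffic shaping requirements.

```python
import threading
import time


class TokenBucket:
    """Minimal client-side rate limiter (token bucket)."""

    def __init__(self, rate_per_second: float, burst: int):
        self.rate = rate_per_second          # steady-state requests per second
        self.capacity = burst                # maximum burst size
        self.tokens = float(burst)
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill tokens for the elapsed time, capped at capacity.
                elapsed = now - self.last_refill
                self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
                self.last_refill = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)


# Example: cap outbound Cloud Healthcare API requests at 10 QPS with a burst of 20.
limiter = TokenBucket(rate_per_second=10, burst=20)
# Call limiter.acquire() before sending each request.
```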
Choose between asynchronous and synchronous processing
A client-side proxy layer that wraps requests to the Cloud Healthcare API, shown in Handle errors at multiple layers, can also control throttling across services that use the Cloud Healthcare API. Depending on the type of traffic shaping required, use one of these options:
- Asynchronous: Use asynchronous processing to queue requests and control workers. A proxy layer writes incoming requests to the queue and returns `200 OK` responses after each request is queued. This works best for write requests, but can be used for read requests in an LRO framework if clients can receive read results.
- Synchronous: Synchronous processing provides a simple feedback mechanism if a unit of work depends on a previous unit finishing. A proxy layer delays outbound requests based on QPS or byte throughput limits, and the client blocks and waits for the proxy layer's response.
The proxy layer can adjust its rate limiting based on the number of instances, or it can coordinate with a controller process that adjusts the rate limit every few seconds. For the proxy layer to track the number of instances and their rate limits, each proxy instance can regularly read a file or make a remote procedure call (RPC) with the rate limits encoded.
Synchronous processing sometimes has the following disadvantages:
- Resources in the client and proxy layers are unavailable while the client blocks and waits for a response. This can lead to errors, timeouts, and lowered data throughput, making it harder to scale.
- If the client and proxy layer disconnect, more work is required to ensure data was modified as requested.
Use Cloud Tasks
Use Cloud Tasks to offload requests to a queue. Cloud Tasks automatically sets and monitors the following Google Cloud quotas:
- Max burst size and max request concurrency using the `RateLimits` object
- Retry limits using the `RetryConfig` object
See Create queues to create queues in Cloud Tasks. The `Queue` resource shows the options you can set on a queue. For example, you can use the `RetryConfig` object to implement exponential backoff.
See Cloud Tasks client libraries for language-specific libraries.
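For example, the following Python sketch creates a queue with `RateLimits` and `RetryConfig` settings using the `google-cloud-tasks` client library. The project, location, queue name, and limit values are placeholders to adjust for your workload.

```python
from google.cloud import tasks_v2
from google.protobuf import duration_pb2

client = tasks_v2.CloudTasksClient()
parent = "projects/PROJECT_ID/locations/LOCATION"

queue = tasks_v2.Queue(
    name=f"{parent}/queues/healthcare-ingest",
    # Cap the dispatch rate and concurrency of the queue.
    rate_limits=tasks_v2.RateLimits(
        max_dispatches_per_second=50,
        max_concurrent_dispatches=20,
    ),
    # Retry failed tasks with exponential backoff between 1 and 64 seconds.
    retry_config=tasks_v2.RetryConfig(
        max_attempts=10,
        min_backoff=duration_pb2.Duration(seconds=1),
        max_backoff=duration_pb2.Duration(seconds=64),
        max_doublings=6,
    ),
)

response = client.create_queue(parent=parent, queue=queue)
```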
When using Cloud Tasks, consider the following:
- Cloud Tasks doesn't guarantee exactly once delivery. In exactly once delivery, any requests containing duplicated data are recognized as duplicates and ignored by the server. For more information, see After Lambda: exactly once processing in Dataflow, Part 1.
- The maximum task size might be much smaller than the maximum FHIR bundle size in the Cloud Healthcare API. For more information, see Cloud Tasks quotas and limits and Cloud Healthcare API quotas and limits.
- Cloud Tasks has issues and limitations.
Combine FHIR bundles with rate limiters
Retrying FHIR bundles with exponential backoff and rate limiters helps to maintain high data throughput and manage load spikes.
A client can send batch and transaction FHIR bundles to Cloud Tasks, which sends the requests in the bundle to the Cloud Healthcare API. If the rate limiter is full or over quota because it reached its maximum queue size or ran out of disk space, the client can use exponential backoff to retry queueing the bundles.
Prevent the rate limiter queue from becoming full by monitoring these resources:
- FHIR operation quotas in the Cloud Healthcare API
- Rate limiter quotas
- Rate limiter errors
If the rate limiter queue becomes full, your system must alert a human and stop the client from sending requests.
Use HTTP persistent (reusable keep-alive) connections
By default, the Cloud Healthcare API opens a new TCP connection for each CRUD request. This requires a TCP handshake, which can cause overhead and degrade performance. To improve performance, use HTTP keep-alive to keep the TCP connection open for multiple requests.
To use HTTP keep-alive in HTTP/1.1, set the `Connection` header to `keep-alive`:

Connection: keep-alive
HTTP/2 uses one TCP connection for sequential and concurrent requests, which avoids the overhead automatically.
The Python `requests` library uses HTTP keep-alive by default when you reuse a `Session` object. If you're using Node.js, set `keepAlive` to `true` when you create an `http.Agent` object, and then pass the object in your request.
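The following Python sketch is a minimal example of reusing one `requests.Session`, and therefore one pooled TCP connection, across several FHIR reads. The store path and access token are placeholders.

```python
import requests

BASE_URL = "https://healthcare.googleapis.com/v1"
FHIR_STORE = (
    "projects/PROJECT_ID/locations/LOCATION/"
    "datasets/DATASET_ID/fhirStores/FHIR_STORE_ID"
)

# A Session pools connections, so requests reuse the same TCP connection
# instead of performing a new handshake for every call.
session = requests.Session()
session.headers.update({"Authorization": "Bearer ACCESS_TOKEN"})

for patient_id in ("patient-1", "patient-2"):
    response = session.get(f"{BASE_URL}/{FHIR_STORE}/fhir/Patient/{patient_id}")
    response.raise_for_status()
```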
Use a testing framework
A testing framework ensures your code works and helps you do the following:
- Prepare for sudden traffic spikes in an application or pipeline.
- Test whether exponential backoff and client-side rate limiting improve performance. Tests can show if these implementations create a backlog of tasks that must be handled separately.
- Separate and control high-priority traffic. For example, if a user is waiting for a response, the workload on background processing tasks can be reduced to ensure the user experience isn't degraded.
- Test synchronous and asynchronous queueing strategies for regulating traffic flow, or test if the proxy layer handles pushback.
- Plan for disaster recovery. This typically requires resetting incoming traffic or using queues to resume traffic after the disaster ends.
Use Cloud Monitoring
Use Cloud Monitoring to monitor your test and production environments. Follow these recommendations:
- Integrate Cloud Tasks with other Google Cloud logging and monitoring services, such as Cloud Audit Logs.
- Create custom metrics with the Cloud Monitoring API to track key metrics like retries, queue sizes, and queue age.
- Create service-level objectives (SLOs) and service-level indicators (SLIs) for your environments. See Introduction to SLIs for recommendations.
- Create alerting policies using Google Cloud Observability. Alerting policies notify you of issues, such as when your system is under stress or requires human intervention.
- Create operational playbooks so that system administrators know what to do if an alerting policy sends a notification.
Use the operational playbooks in a staging environment to respond to the following scenarios:
- Backlogs caused by rate limiting
- Pushback caused by exceeding quota limits
- Incoming traffic spikes
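As an example of the custom metrics recommendation above, the following Python sketch writes one data point for a hypothetical retry queue size metric using the Cloud Monitoring API client library. The metric type and project ID are illustrative.

```python
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/PROJECT_ID"

# One data point for a hypothetical custom metric tracking retry queue size.
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/ingest/retry_queue_size"
series.resource.type = "global"
series.resource.labels["project_id"] = "PROJECT_ID"

now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 10**9)}}
)
point = monitoring_v3.Point({"interval": interval, "value": {"int64_value": 42}})
series.points = [point]

client.create_time_series(name=project_name, time_series=[series])
```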
Prevent `429 Resource Exhausted operation_too_costly` errors
Making thousands of parallel updates each day to a FHIR resource can cause lock contention and latency, and can prevent transactions from completing. Transactions that can't complete can create a backlog of `429 Resource Exhausted operation_too_costly` errors:
HTTP/1.1 429 Too many requests
...

{
  "issue": [
    {
      "code": "too-costly",
      "details": {
        "text": "operation_too_costly"
      },
      "diagnostics": "aborted due to lock contention while executing transactional bundle. Resource type: FHIR_RESOURCE_TYPE",
      "severity": "error"
    }
  ],
  "resourceType": "OperationOutcome"
}
In the error, "cost" refers to resource usage and data throughput, not billing costs.
A `429 Too Many Requests` error doesn't always indicate a quota issue. The error can occur when the Cloud Healthcare API FHIR server detects excessive lock contention on database records. This can happen due to many operations in a FHIR bundle or a combination of CRUD operations.
Consider the following scenario:
1. A FHIR transaction bundle that updates a Patient resource and other FHIR resources locks the Patient resource until the transaction finishes.
2. Multiple FHIR bundles try to update the Patient resource in parallel, and lock contention occurs. Error responses include a `diagnostics` field with the text `Resource type: PATIENT`.
3. You can retry updating the Patient resource with exponential backoff, but long lock contention periods can lead to timeouts, reduced throughput, and increased resource usage.
4. The Cloud Healthcare API FHIR server eventually detects a backlog of transactions and load-sheds by returning `operation_too_costly` errors. This limits traffic and prevents further errors.
5. The `operation_too_costly` errors throttle all FHIR CRUD operations in your Google Cloud project, which affects all applications connected to your project.
Troubleshoot `429 Too Many Requests` errors
To troubleshoot `429 Too Many Requests` errors, search Cloud Logging. Errors containing `operation_too_costly` indicate lock contention.
If the errors are caused by resource exhaustion, check for quota issues.
If throttling occurs, transaction bundles might fail due to high levels of lock contention and produce the following error:
HTTP/1.1 429 Too many requests
...
{
"issue": [
{
"code": "too-costly",
"details": {
"text": "operation_too_costly"
},
"diagnostics": "aborted due to cumulative heavy load or lock contention in this project while executing transactional bundle, please see https://cloud.google.com/healthcare-api/docs/troubleshooting#fhir_transaction_bundle_heavy_load for more information",
"severity": "error"
}
],
"resourceType": "OperationOutcome"
}
To troubleshoot the error, go to the FHIR transactional bundle aborted due to cumulative heavy load link in the `diagnostics` field.
Avoid large bundles
The `429 Too Many Requests` error is more likely with large transaction bundles. Bundles of any size can create throughput bottlenecks. Test different bundles to find the optimal size.
Large bundles with retries can have diminishing performance returns and are more susceptible to having multiple failures. Clients should implement additional logic to manage the subset of FHIR resources that failed in a transaction.
Batch bundles can encounter `429 Too Many Requests` and `413 Request Entity Too Large` errors and throughput bottlenecks if they're large or have high QPS.
Avoid using large bundles with thousands of transactions. Instead, do the following:
- Use smaller transaction bundles that support data consistency. If FHIR resources don't depend on each other, update them separately. For example, a FHIR resource might not depend on the specific version of another resource in the same bundle.
- Use some batching in bundles and avoid individual requests. Batching can improve performance, but large batches can cause errors and degrade data throughput. Similar-sized batch bundles have less contention because they don't hold locks across FHIR resource updates.
Small transaction bundles avoid contention because they only hold a few locks at a time and finish quickly. This helps prevent a backlog of stacked transactions.
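For illustration, the following small transaction bundle updates one Patient resource and creates one Observation that references it, so the transaction holds a single Patient lock only briefly. The resource contents are placeholders.

```json
{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {
      "request": {"method": "PUT", "url": "Patient/example-patient"},
      "resource": {
        "resourceType": "Patient",
        "id": "example-patient",
        "active": true
      }
    },
    {
      "request": {"method": "POST", "url": "Observation"},
      "resource": {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": "heart rate"},
        "subject": {"reference": "Patient/example-patient"}
      }
    }
  ]
}
```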
LRO throughput
See LRO data throughput.
FHIR data storage options
If your FHIR data volume is small to moderate, use `fhir.create` to store data. To store large volumes of FHIR resources, use `fhir.executeBundle` or `fhirStores.import`. For information on each method, see FHIR import options.
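As a sketch of how these methods are called, `fhir.executeBundle` corresponds to an HTTP POST of the bundle JSON to the FHIR store's `fhir` endpoint. The store path and access token below are placeholders.

```python
import requests

FHIR_STORE = (
    "projects/PROJECT_ID/locations/LOCATION/"
    "datasets/DATASET_ID/fhirStores/FHIR_STORE_ID"
)
url = f"https://healthcare.googleapis.com/v1/{FHIR_STORE}/fhir"

# Replace the empty entry list with your batch or transaction entries.
bundle = {"resourceType": "Bundle", "type": "transaction", "entry": []}

response = requests.post(
    url,
    json=bundle,
    headers={
        "Authorization": "Bearer ACCESS_TOKEN",
        "Content-Type": "application/fhir+json; charset=utf-8",
    },
)
response.raise_for_status()
```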
Import FHIR resources
Consider the following when deciding whether to use FHIR import:
- FHIR import doesn't limit the total size of the data it imports. If a FHIR bundle exceeds 50 MB, you can upload the FHIR resources to Cloud Storage and import them. Avoid concurrent high-latency or large imports, or data throughput might be limited.
- FHIR import has less complexity than using FHIR bundles. For example, you don't have to do the following:
  - Partition large bundles into smaller ones
  - Manage schedules
  - Retry transient errors at the resource or bundle level
- FHIR import doesn't enforce referential integrity. For more information, see FHIR referential integrity.
- Don't use FHIR import when data freshness is a high priority. Imports can be fast, but could be delayed for hours or days.
- FHIR imports perform better when there are few LROs in your Google Cloud project.
- FHIR import can achieve high data throughput if your application can handle bulk errors and failures on a subset of resources.
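The following Python sketch starts a FHIR import from Cloud Storage using the discovery-based Google API Python client. The project, dataset, store, and bucket paths are placeholders, and the `contentStructure` value depends on how your files are structured.

```python
from googleapiclient import discovery

client = discovery.build("healthcare", "v1")

fhir_store_name = (
    "projects/PROJECT_ID/locations/LOCATION/"
    "datasets/DATASET_ID/fhirStores/FHIR_STORE_ID"
)
body = {
    # One FHIR resource per line (NDJSON); use "BUNDLE" for files of bundles.
    "contentStructure": "RESOURCE",
    "gcsSource": {"uri": "gs://BUCKET/fhir-export/**.ndjson"},
}

# import_ starts a long-running operation (LRO); poll it to track progress.
operation = (
    client.projects()
    .locations()
    .datasets()
    .fhirStores()
    .import_(name=fhir_store_name, body=body)
    .execute()
)
print(operation["name"])
```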
Use FHIR bundles
Use FHIR bundles instead of FHIR import in the following cases:
- It's too expensive, in either billing costs or network bandwidth, to build a pipeline to store data in Cloud Storage and import it.
- Referential integrity must be enforced.
- FHIR profile validation must be enforced.
- You need to send Pub/Sub notifications when FHIR resources are stored. FHIR import doesn't support Pub/Sub notifications.
- Data freshness is a priority and data must be ingested in seconds or minutes. However, even in a well-architected system, data throughput can be constrained by the following:
  - Upstream delays in processing pipelines. Pipelines might need more time to prepare data before the data can be ingested.
  - Backoffs, retries, and traffic shaping proxy layers.
FHIR bundles have the following limitations:
- Quota and billing are applied to each operation in the bundle as if each operation was executed independently. For example, if a bundle has 10 `POST` operations, 5 `GET` operations, and 1 `DELETE` operation, the quota and billing applied to the bundle is the same as if those operations were executed independently.
- Large transaction bundles are more likely to have transaction conflicts that lead to lock contention. For more information, see Prevent `429 Resource Exhausted operation_too_costly` errors.
- Batch bundles can improve data throughput, but don't have transactional consistency capabilities like referential integrity.
- Large batch bundles can have reduced throughput. For more information, see Avoid large bundles.
DICOM data storage options
You can use the following methods to achieve high data throughput when sending data from a Picture Archiving and Communication System (PACS) to the Cloud Healthcare API:
- The open source Cloud Healthcare API DICOM adapter using the DICOM message service element (DIMSE) protocol
The adapter optimizes data throughput when you sync a PACS with the Cloud Healthcare API. Before syncing, run performance tests and verify that the adapter can sustain peak data throughput.
Use this adapter if you can't upload DICOM files to Cloud Storage using Storage Transfer Service or another transfer option. For example, you might be unable to meet these Storage Transfer Service requirements:
- Mounting a file system to every machine that hosts agents in your agent pool to retrieve source data.
- If you transfer data at a regular interval instead of a one-time batch load, you must measure changes to the data's size over time to determine what's changed.
- Maximizing agent transfer performance.
- Paying for and allocating Cloud Storage storage.
- Validating data transfers to Cloud Storage.
- Removing Cloud Storage resources after you import data to the Cloud Healthcare API and fix any import errors.
- Scheduling batch ingestion intervals based on a clinical system's network and storage capacity.
We recommend you use Storage Transfer Service for a single batch load to populate a DICOM store. Using Storage Transfer Service regularly requires additional work, like a synchronous import pipeline. For more information, see Storage Transfer Service file system transfer details.
- `dicomStores.import`
Use this method to store large volumes of DICOM data.
- DICOMweb Store Transaction
Use this method to store DICOM data programmatically.
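As a brief sketch, the following Python code starts a `dicomStores.import` LRO from Cloud Storage using the discovery-based Google API Python client; the paths are placeholders.

```python
from googleapiclient import discovery

client = discovery.build("healthcare", "v1")

dicom_store_name = (
    "projects/PROJECT_ID/locations/LOCATION/"
    "datasets/DATASET_ID/dicomStores/DICOM_STORE_ID"
)
body = {"gcsSource": {"uri": "gs://BUCKET/dicom/**.dcm"}}

# Returns a long-running operation; poll it before starting more imports.
operation = (
    client.projects()
    .locations()
    .datasets()
    .dicomStores()
    .import_(name=dicom_store_name, body=body)
    .execute()
)
print(operation["name"])
```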
Manage quota to optimize data throughput
The following sections describe how to manage and plan quota to optimize data throughput. For general best practices on quota management, see Quota management best practices.
Plan quota for predictable traffic
Plan your quota requirements by first analyzing your client application's typical daily traffic. Even if traffic is predictable, plan for more quota than you need on average. This helps you avoid errors and provides a safety margin against traffic spikes or occasional increases in daily use.
Requests to the Cloud Healthcare API that are consistent in size and sent in predictable patterns are the easiest to plan for.
Plan quota for large volume requests
Avoid scheduling large batch jobs during peak hours. For more information, see Favor low-volume transactions on a consistent basis.
Even with a predictable traffic pattern, a large-volume batch request sent during a peak traffic period can exceed the available quota. This can cause `429 Resource Exhausted` errors for all requests in your project.
If your system has additional quota for flexibility, small traffic spikes won't cause errors or push predictable peak loads into errors, as long as the spikes are distributed among the many data stores, applications, and other clients producing load within the Google Cloud project.
To prevent a single large batch job from causing traffic spikes, see Avoid large bundles.
Request additional quota
To maintain high data throughput and avoid `429 Resource Exhausted` errors, see the best practices on this page, especially Manage quota to optimize data throughput. These best practices ensure that your client application is robust and can scale with changes in request volume. Requesting additional quota without implementing the best practices is unlikely to prevent errors over the long term.
If you implement the best practices and still need more quota, see Best practices for requesting additional quota.
Data ingestion throughput resources
For more information about data ingestion throughput, see Manage traffic and load for your workloads in Google Cloud.