This guide lists errors that you might encounter when you use the API described in the Model API reference for Generative AI. The errors follow the Google Cloud API error model, which recommends that we provide guidance on causes and solutions; the guidance in this guide is specific to the generative AI models. The table under API errors lists the API error codes and their descriptions.

Avoid spikes in traffic. Spikes are sudden and significant increases in the number of requests within a very short period of time. Spikes can sometimes cause issues for quota enforcement and might increase the chance of server overload.

Be careful about retrying a request. We recommend retrying no more than two times. The minimum delay is one second, with subsequent requests backing off exponentially.

API errors
HTTP error code: 400
Canonical error code: INVALID_ARGUMENT / FAILED_PRECONDITION
Cause: Request fails API validation, or you tried to access a model that requires allowlisting or is disallowed by the organization's policy.
Example: Request exceeds the model's input token limit.
Solution: Refer to the Model API reference for Generative AI for request parameters, token count, and other parameters.

HTTP error code: 403
Canonical error code: PERMISSION_DENIED
Cause: Client doesn't have sufficient permission to call the API.
Example: Service account doesn't have permission to access the Cloud Storage bucket hosting image or video resources.
Solution:
1. Verify that all necessary APIs are enabled, and the service account has the right permission to access the selected Vertex AI service.
2. Vertex AI per-product, per-project service account (P4SA) is granted the necessary permission to access resources referenced in the input.

HTTP error code: 404
Canonical error code: NOT_FOUND
Cause: No valid object is found from the designated URL.
Example: Image file not found in the storage URL.
Solution: Check and fix the file location.

HTTP error code: 429
Canonical error code: RESOURCE_EXHAUSTED
Cause: Depending on the error message, the error could be caused by the following:
1. API quota over the limit.
2. Server overload due to shared server capacity.
3. You've reached the daily limit for requests using logprobs.
Example: Gemini API exceeds request per minute limit.
Solution:
1. Check Vertex AI Generative AI quota limits. If needed, apply for a higher quota.
2. Retry after a few seconds. If the error persists after a prolonged period of time (hours), contact Vertex AI support.
3. Consider purchasing Provisioned Throughput.

HTTP error code: 499
Canonical error code: CANCELLED
Cause: Request is cancelled by the client.

HTTP error code: 500
Canonical error code: UNKNOWN / INTERNAL
Cause: Server error due to overload or dependency failure.
Example: Request is throttled, because the service is temporarily overloaded.
Solution: Retry after a few seconds. If the error persists after a prolonged period of time (hours), contact Vertex AI support.

HTTP error code: 503
Canonical error code: UNAVAILABLE
Cause: Service is temporarily unavailable.
Example: Server isn't responding to the incoming requests.
Solution: The unavailable status might be temporary. However, if the error persists, contact Vertex AI support.

HTTP error code: 504
Canonical error code: DEADLINE_EXCEEDED
Cause: The client sets a deadline shorter than the server's default deadline (10 minutes), and the request didn't finish within the client-provided deadline.
Solution: Consider increasing the client-provided deadline.
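
In client code, the table above can be carried around as a small lookup so that each HTTP status maps to its canonical code and a coarse next step. The following is a minimal sketch, not part of any Google client library: the names `ApiError`, `API_ERRORS`, and `suggested_action` are hypothetical, and the action labels are an interpretation of the Solution column (in particular, treating 503 as retryable is an assumption based on "might be temporary").

```python
from typing import NamedTuple


class ApiError(NamedTuple):
    """One row of the error table (hypothetical helper type)."""
    canonical: str  # canonical error code(s) as listed in the table
    action: str     # coarse next step, interpreted from the Solution column


# Canonical codes are transcribed from the table; the action labels are
# this sketch's own shorthand, not values returned by the API.
API_ERRORS = {
    400: ApiError("INVALID_ARGUMENT / FAILED_PRECONDITION", "fix_request"),
    403: ApiError("PERMISSION_DENIED", "fix_permissions"),
    404: ApiError("NOT_FOUND", "fix_file_location"),
    429: ApiError("RESOURCE_EXHAUSTED", "retry"),
    499: ApiError("CANCELLED", "none"),
    500: ApiError("UNKNOWN / INTERNAL", "retry"),
    503: ApiError("UNAVAILABLE", "retry"),  # assumption: "might be temporary"
    504: ApiError("DEADLINE_EXCEEDED", "increase_deadline"),
}


def suggested_action(http_status: int) -> str:
    """Return the coarse next step for a status, or "unknown" if unlisted."""
    entry = API_ERRORS.get(http_status)
    return entry.action if entry else "unknown"
```

A caller could branch on `suggested_action(status)` to decide whether to retry, fix the request, or surface the error to the user.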
Handle errors
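
This guide's retry guidance (no more than two retries, a minimum delay of one second, and exponentially growing delays after that) can be sketched as a small wrapper around any request function. This is an illustrative sketch under stated assumptions, not an official client: `ApiCallError`, `call_with_retries`, and `RETRYABLE_STATUS` are hypothetical names, and the choice of which statuses to retry (429, 500, 503) is inferred from the Solution column of the error table.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: only these statuses are worth retrying, based on the table's
# "Retry after a few seconds" and "might be temporary" guidance.
RETRYABLE_STATUS = {429, 500, 503}


class ApiCallError(Exception):
    """Hypothetical exception that carries the HTTP status of a failed call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def call_with_retries(
    send: Callable[[], T],
    max_retries: int = 2,      # the guide recommends no more than two retries
    min_delay: float = 1.0,    # the guide's minimum delay, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying retryable failures with exponential backoff."""
    attempt = 0
    while True:
        try:
            return send()
        except ApiCallError as err:
            # Give up on non-retryable errors or once the retry budget is spent.
            if err.status not in RETRYABLE_STATUS or attempt >= max_retries:
                raise
            # Delays double on each retry: min_delay, then 2 * min_delay.
            sleep(min_delay * (2 ** attempt))
            attempt += 1
```

The `sleep` parameter exists only so the wrapper can be exercised without real waiting. In real use, `send` would be a function that issues the request and raises `ApiCallError` (or whatever your client library raises) on failure.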
What's next
Generative AI on Vertex AI inference API errors
Last updated 2025-08-26 UTC.