DeepSeek models on Vertex AI are fully managed, serverless models offered as APIs. To use a DeepSeek model on Vertex AI, send a request directly to the Vertex AI API endpoint. Because DeepSeek models use a managed API, there's no need to provision or manage infrastructure.
You can stream responses to reduce perceived end-user latency. A streamed response uses server-sent events (SSE) to return the response incrementally.
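As a sketch of what a streaming call can look like, the request below sets `"stream": true` in the body and uses `curl -N` so SSE chunks print as they arrive. The endpoint URL shape, project ID, region, and model identifier are assumptions for illustration; verify them against the model card and Call open model APIs before use.

```shell
# Hypothetical values -- substitute your own project and region.
PROJECT_ID="my-project"
REGION="us-west2"

# Assumed OpenAI-compatible chat completions endpoint for managed open
# models on Vertex AI; confirm the exact URL in the current documentation.
curl -N -X POST \
  "https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/endpoints/openapi/chat/completions" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v3.1-maas",
    "stream": true,
    "messages": [{"role": "user", "content": "Hello"}]
  }'
# -N disables curl output buffering so each SSE event is shown immediately.
```

A publisher prefix on the model name may be required; check the model card for the exact identifier.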
Available DeepSeek models
The following models are available from DeepSeek to use in Vertex AI. To access a DeepSeek model, go to its Model Garden model card.
DeepSeek-V3.1
DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in hybrid thinking modes, tool calling, and thinking efficiency.
Go to the DeepSeek-V3.1 model card
DeepSeek R1 (0528)
DeepSeek R1 (0528) is the latest version of the DeepSeek R1 model. Compared to DeepSeek-R1, it has significantly improved depth of reasoning and inference capabilities. DeepSeek R1 (0528) excels in a wide range of tasks, such as creative writing, general question answering, editing, and summarization.
Considerations
- For production-ready safety, integrate DeepSeek R1 (0528) with Model Armor, which screens LLM prompts and responses for various security and safety risks.
Go to the DeepSeek R1 (0528) model card
Use DeepSeek models
You can use curl commands to send requests to the Vertex AI endpoint using the following model names:
- For DeepSeek-V3.1, use `deepseek-v3.1-maas`
- For DeepSeek R1 (0528), use `deepseek-r1-0528-maas`
To learn how to make streaming and non-streaming calls to DeepSeek models, see Call open model APIs.
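As a minimal non-streaming sketch using one of the model names above: the endpoint URL shape and request body below assume the OpenAI-compatible chat completions surface for managed open models, and the project ID and region are placeholders; confirm the details in Call open model APIs.

```shell
# Hypothetical values -- substitute your own project and region.
PROJECT_ID="my-project"
REGION="us-central1"

# Assumed endpoint shape; verify against the current Vertex AI docs.
curl -s -X POST \
  "https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/endpoints/openapi/chat/completions" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-0528-maas",
    "messages": [{"role": "user", "content": "Summarize this paragraph."}]
  }'
```

The request authenticates with an access token from the gcloud CLI; any credential with Vertex AI permissions on the project should work equivalently.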
DeepSeek model region availability and quotas
For DeepSeek models, a quota applies for each region where the model is available. The quota is specified in queries per minute (QPM).
| Model | Region | Quotas | Context length (tokens) |
|---|---|---|---|
| DeepSeek-V3.1 | us-west2 | | 163,840 |
| DeepSeek R1 (0528) | us-central1 | | 163,840 |
If you want to increase any of your quotas for Generative AI on Vertex AI, you can use the Google Cloud console to request a quota increase. To learn more about quotas, see the Cloud Quotas overview.
What's next
- Learn how to Call open model APIs.