Perform a server-side streaming online prediction request for Vertex LLM streaming.
HTTP request
POST https://{service-endpoint}/v1/{endpoint}:serverStreamingPredict
Where {service-endpoint} is one of the supported service endpoints.
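For illustration, a minimal sketch of assembling the request URL from a regional service endpoint and an endpoint resource name. The service endpoint, project, location, and endpoint IDs below are hypothetical placeholders, not values from this document:

```python
# Sketch: build the serverStreamingPredict URL from its parts.
# SERVICE_ENDPOINT and ENDPOINT are hypothetical placeholder values.
SERVICE_ENDPOINT = "us-central1-aiplatform.googleapis.com"
ENDPOINT = "projects/my-project/locations/us-central1/endpoints/1234567890"

def streaming_predict_url(service_endpoint: str, endpoint: str) -> str:
    """Return the full POST URL for the serverStreamingPredict method."""
    return f"https://{service_endpoint}/v1/{endpoint}:serverStreamingPredict"

url = streaming_predict_url(SERVICE_ENDPOINT, ENDPOINT)
print(url)
```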
Path parameters
| Parameters | |
|---|---|
| `endpoint` | Required. The name of the Endpoint requested to serve the prediction. Format: |
Request body
The request body contains data with the following structure:
JSON representation

    {
      "inputs": [
        {
          object (…)
        }
      ],
      "parameters": {
        object (…)
      }
    }
| Fields | |
|---|---|
| `inputs[]` | The prediction input. |
| `parameters` | The parameters that govern the prediction. |
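A sketch of assembling the request body as a Python dict and serializing it to JSON. The two top-level fields, `inputs` and `parameters`, come from the table above; the inner tensor field (`stringVal`) is an illustrative assumption, not the full tensor schema:

```python
import json

# Sketch: build a request body with the two documented fields,
# "inputs" (a list) and "parameters". The inner field name
# "stringVal" is an illustrative assumption, not the full schema.
def build_request_body(prompts, params):
    return {
        "inputs": [{"stringVal": [p]} for p in prompts],  # one tensor per input
        "parameters": params,
    }

body = build_request_body(["Hello"], {"temperature": 0.2})
print(json.dumps(body, indent=2))
```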
Response body
If successful, the response body contains a stream of StreamingPredictResponse
instances.
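Because the response is a stream of StreamingPredictResponse messages rather than a single body, a client consumes chunks as they arrive. A minimal sketch, assuming newline-delimited JSON framing (the actual wire framing depends on the client library and transport, and the `outputs`/`stringVal` shape is a hypothetical example):

```python
import json

# Sketch: consume a stream of newline-delimited JSON response chunks.
# `stream` is any iterable of raw lines (e.g. an HTTP response body);
# one-JSON-object-per-line framing is an assumption for illustration.
def iter_responses(stream):
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Usage with a simulated stream of two response chunks:
simulated = [
    '{"outputs": [{"stringVal": ["Hel"]}]}',
    '{"outputs": [{"stringVal": ["lo"]}]}',
]
chunks = list(iter_responses(simulated))
print(len(chunks))  # number of streamed responses received
```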
Authorization scopes
Requires one of the following OAuth scopes:
https://www.googleapis.com/auth/cloud-platform
https://www.googleapis.com/auth/cloud-platform.read-only
https://www.googleapis.com/auth/cloud-vertex-ai.firstparty.predict
For more information, see the Authentication Overview.
IAM Permissions
Requires the following IAM permission on the endpoint
resource:
aiplatform.endpoints.predict
For more information, see the IAM documentation.