Vertex AI in express mode lets you quickly try out core generative AI features that are available on Vertex AI. This tutorial shows you how to perform the following tasks by using the Vertex AI API in express mode: install and initialize the Google Gen AI SDK, send streaming and non-streaming requests to the Gemini for Google Cloud API, and send a function calling request.

The Google Gen AI SDK lets you use Google generative AI models and features to build AI-powered applications. When using Vertex AI in express mode, install and initialize the google-genai package to authenticate using your generated API key.

To install the Google Gen AI SDK for express mode, run the commands listed under Install. If you're using Colab, ignore any dependency conflicts and restart the runtime after installation.

To initialize the SDK, configure the API key for express mode and environment variables. For details on getting an API key, see Vertex AI in express mode overview.

You can send either streaming or non-streaming requests to the Gemini for Google Cloud API. Streaming requests return the response in chunks as the request is being processed. To a human user, streamed responses reduce the perception of latency. Non-streaming requests return the response in one chunk after the request is processed.

To send a streaming request, call generate_content_stream and print the response in chunks. The non-streaming code sample defines a function that sends a non-streaming request to the gemini-2.5-flash-lite model. It shows you how to configure basic request parameters and safety settings.

The function calling code sample declares a function and passes it as a tool, and then receives a function call part in the response. After you receive the function call part from the model, you can invoke the function and get the response, and then pass the response to the model.

This tutorial does not create any Google Cloud resources, so no cleanup is needed to avoid charges.
Install and initialize the Google Gen AI SDK for express mode
Use the google-genai package to authenticate with your generated API key.
Install
# Developer TODO: If you're using Colab, uncomment the following lines:
# from google.colab import auth
# auth.authenticate_user()
!pip install google-genai
!pip install --force-reinstall -qq "numpy<2.0"
Initialize
from google import genai
from google.genai import types
# Developer TODO: Replace YOUR_API_KEY with your API key.
API_KEY = "YOUR_API_KEY"
client = genai.Client(vertexai=True, api_key=API_KEY)
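The initialization step mentions environment variables as well as an API key. As a minimal sketch, the key could be read from the environment rather than hard-coded. The variable name GOOGLE_API_KEY and the helper express_client_kwargs are assumptions made for illustration and are not part of the tutorial; adjust them to match your setup.

```python
import os


def express_client_kwargs(env=None):
    """Build keyword arguments for genai.Client in express mode.

    Assumption: the key is stored in an environment variable named
    GOOGLE_API_KEY (a choice made for this sketch).
    """
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if not api_key:
        raise ValueError("Set GOOGLE_API_KEY to your express mode API key.")
    return {"vertexai": True, "api_key": api_key}


# Usage (requires the google-genai package and a real key):
# from google import genai
# client = genai.Client(**express_client_kwargs())
```

Failing early with a clear error when the key is missing tends to be easier to debug than an authentication error on the first request.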
Send a request to the Gemini for Google Cloud API
Streaming request
To send a streaming request, call generate_content_stream and print the response in chunks.

from google import genai
from google.genai import types

# Developer TODO: Replace YOUR_API_KEY with your API key.
API_KEY = "YOUR_API_KEY"


def generate():
    client = genai.Client(vertexai=True, api_key=API_KEY)
    config = types.GenerateContentConfig(
        temperature=0,
        top_p=0.95,
        top_k=20,
        candidate_count=1,
        seed=5,
        max_output_tokens=100,
        stop_sequences=["STOP!"],
        presence_penalty=0.0,
        frequency_penalty=0.0,
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HATE_SPEECH",
                threshold="BLOCK_ONLY_HIGH",
            )
        ],
    )
    # Each chunk carries part of the response; print it as soon as it arrives.
    for chunk in client.models.generate_content_stream(
        model="gemini-2.5-flash-lite",
        contents="Explain bubble sort to me",
        config=config,
    ):
        print(chunk.text)


generate()
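Because a streamed response arrives in pieces, it is often useful to show each piece immediately and still keep the full text. The sketch below illustrates one way to do that. The helper collect_stream is hypothetical, and the stand-in chunks only mimic objects that expose a text attribute; the sketch also assumes that some chunks may carry no text and skips them.

```python
from types import SimpleNamespace


def collect_stream(chunks, on_chunk=print):
    """Pass each non-empty chunk text to on_chunk and return the joined text."""
    pieces = []
    for chunk in chunks:
        text = getattr(chunk, "text", None)
        if not text:
            # Assumption: a chunk may have no text; skip it.
            continue
        on_chunk(text)
        pieces.append(text)
    return "".join(pieces)


# Stand-in chunks so the sketch runs without calling the API.
fake_chunks = [
    SimpleNamespace(text="Bubble sort "),
    SimpleNamespace(text=None),
    SimpleNamespace(text="swaps neighbors."),
]
full_text = collect_stream(fake_chunks, on_chunk=lambda piece: None)
```

In real use, the iterable returned by client.models.generate_content_stream would take the place of fake_chunks.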
Non-streaming request
The following code sample defines a function that sends a non-streaming request to the gemini-2.5-flash-lite model. It shows you how to configure basic request parameters and safety settings.

from google import genai
from google.genai import types

# Developer TODO: Replace YOUR_API_KEY with your API key.
API_KEY = "YOUR_API_KEY"


def generate():
    client = genai.Client(vertexai=True, api_key=API_KEY)
    config = types.GenerateContentConfig(
        temperature=0,
        top_p=0.95,
        top_k=20,
        candidate_count=1,
        seed=5,
        max_output_tokens=100,
        stop_sequences=["STOP!"],
        presence_penalty=0.0,
        frequency_penalty=0.0,
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HATE_SPEECH",
                threshold="BLOCK_ONLY_HIGH",
            )
        ],
    )
    # The full response is returned in one piece after processing finishes.
    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents="Explain bubble sort to me",
        config=config,
    )
    print(response.text)


generate()
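The streaming and non-streaming samples share the same request parameters and safety settings. One possible way to avoid repeating them is to keep the defaults in a plain dictionary and override individual fields per request. This is a sketch under assumptions: DEFAULT_CONFIG and build_config are hypothetical names, and passing a dictionary (including dictionary-form safety settings) as config is assumed to work the same way as the dict-based config in the function calling sample.

```python
import copy

# Field names and values mirror the samples in this tutorial.
DEFAULT_CONFIG = {
    "temperature": 0,
    "top_p": 0.95,
    "top_k": 20,
    "candidate_count": 1,
    "seed": 5,
    "max_output_tokens": 100,
    "stop_sequences": ["STOP!"],
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "safety_settings": [
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"}
    ],
}


def build_config(**overrides):
    """Return a copy of the defaults with the given fields replaced."""
    unknown = set(overrides) - set(DEFAULT_CONFIG)
    if unknown:
        raise KeyError(f"Unknown config fields: {sorted(unknown)}")
    config = copy.deepcopy(DEFAULT_CONFIG)
    config.update(overrides)
    return config


# Usage (requires an initialized client):
# response = client.models.generate_content(
#     model="gemini-2.5-flash-lite",
#     contents="Explain bubble sort to me",
#     config=build_config(max_output_tokens=256),
# )
```

Rejecting unknown field names catches typos such as max_tokens before a request is sent.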
Function calling request
# Uses the client created in the Initialize step.

# The result of invoking the function locally, packaged as a response part.
function_response_parts = [
    {
        'function_response': {
            'name': 'get_current_weather',
            'response': {
                'name': 'get_current_weather',
                'content': {'weather': 'super nice'},
            },
        },
    },
]

# The conversation so far: user question, model function call, function result.
manual_function_calling_contents = [
    {'role': 'user', 'parts': [{'text': 'What is the weather in Boston?'}]},
    {
        'role': 'model',
        'parts': [{
            'function_call': {
                'name': 'get_current_weather',
                'args': {'location': 'Boston'},
            }
        }],
    },
    {'role': 'user', 'parts': function_response_parts},
]

# The function declaration that is passed to the model as a tool.
function_declarations = [{
    'name': 'get_current_weather',
    'description': 'Get the current weather in a city',
    'parameters': {
        'type': 'OBJECT',
        'properties': {
            'location': {
                'type': 'STRING',
                'description': 'The location to get the weather for',
            },
            'unit': {
                'type': 'STRING',
                'enum': ['C', 'F'],
            },
        },
    },
}]

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=manual_function_calling_contents,
    config=dict(tools=[{'function_declarations': function_declarations}]),
)
print(response.text)
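The sample above hard-codes the function response. The step in between, invoking the function named in the function call part and packaging its result, could be sketched as follows. The names TOOL_REGISTRY and run_function_call, and the canned get_current_weather implementation, are hypothetical; the returned dictionary mirrors the function_response shape used in the sample.

```python
def get_current_weather(location, unit="C"):
    """Hypothetical local implementation that returns canned data."""
    return {"weather": "super nice", "location": location, "unit": unit}


# Maps declared function names to local Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_function_call(function_call, registry=TOOL_REGISTRY):
    """Invoke the requested function and wrap the result as a response part."""
    name = function_call["name"]
    if name not in registry:
        raise KeyError(f"No local function registered for {name!r}")
    result = registry[name](**function_call.get("args", {}))
    return {
        "function_response": {
            "name": name,
            "response": {"name": name, "content": result},
        }
    }


# The function call part from the sample conversation.
part = run_function_call(
    {"name": "get_current_weather", "args": {"location": "Boston"}}
)
```

The resulting part can be placed in a user turn, as function_response_parts is in the sample, and sent back to the model.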
Clean up
What's next
Tutorial: Vertex AI API in express mode
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2025-08-21 UTC.