You can use the Vertex AI SDK for Python to programmatically build solutions with the Gemini model classes. The SDK lets you load either of the multimodal models, Gemini Pro and Gemini Pro Vision. After you load a model, you can use it to generate text from a text prompt, an image, or a combination of the two. For more details about Gemini models, see Gemini API models.
The Gemini model classes represented in the Vertex AI SDK are in addition to classes that help you create Vertex AI solutions that aren't related to generative AI and language models. For information about how to use the Vertex AI SDK to automate data ingestion, train models, and get predictions on Vertex AI, see Introduction to the Vertex AI SDK for Python.
Install the Vertex AI SDK
To install the Vertex AI SDK for Python, run the following command:
pip install --upgrade google-cloud-aiplatform
For more information, see Install the Vertex AI SDK for Python. To view the language model section in the Vertex AI SDK reference guide, see Package language models.
Authenticate the Vertex AI SDK
After you install the Vertex AI SDK for Python, you need to authenticate. The following topics explain how to authenticate with the Vertex AI SDK, whether you're working locally or in Colaboratory:
If you're developing locally, do the following to set up Application Default Credentials (ADC) in your local environment:
Install the Google Cloud CLI, then initialize it by running the following command:
gcloud init
Create local authentication credentials for your Google Account:
gcloud auth application-default login
A login screen is displayed. After you sign in, your credentials are stored in the local credential file used by ADC. For more information about working with ADC in a local environment, see Local development environment.
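With credentials in place, the SDK typically also needs to be initialized with your Google Cloud project and region before you load a model. The following sketch assumes a placeholder project ID and the us-central1 region; substitute your own values:

```python
import vertexai

# Initialize the Vertex AI SDK before loading any models.
# "your-project-id" is a placeholder; replace it with your project ID
# and choose a region that supports the Gemini models you plan to use.
vertexai.init(project="your-project-id", location="us-central1")
```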
If you're working in Colaboratory, run the following command in a Colab cell to authenticate:
from google.colab import auth
auth.authenticate_user()
This command opens a window where you can complete the authentication.
Load a Gemini model
To use the Vertex AI SDK to reference a multimodal model, import GenerativeModel from vertexai.preview.generative_models, then use GenerativeModel to load the model. The following sample code shows you how to load the Gemini Pro and Gemini Pro Vision models:
from vertexai.preview.generative_models import GenerativeModel
# Load Gemini Pro
gemini_pro_model = GenerativeModel("gemini-1.0-pro")
# Load Gemini Pro Vision
gemini_pro_vision_model = GenerativeModel("gemini-1.0-pro-vision")
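After a model is loaded, you can optionally pass a generation config when generating content to control parameters such as randomness and output length. This is a sketch assuming the GenerationConfig class from the same preview module; the prompt is illustrative:

```python
from vertexai.preview.generative_models import GenerationConfig, GenerativeModel

gemini_pro_model = GenerativeModel("gemini-1.0-pro")

# Lower temperature for more deterministic output; cap the response length.
config = GenerationConfig(temperature=0.2, max_output_tokens=256)

model_response = gemini_pro_model.generate_content(
    "List three uses for a paperclip.",
    generation_config=config,
)
print(model_response.text)
```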
GenerativeModel class code samples
The GenerativeModel class represents a Gemini model. You can use it to load the Gemini Pro or Gemini Pro Vision model. The GenerativeModel class includes methods to help you generate content from text, images, and video. The following code samples demonstrate how to use the GenerativeModel class.
- Generate content using a text prompt
- Generate content using more than one text prompt
- Generate a description of an image
- Generate content from text and an image
- Generate content from a video
Generate content using a text prompt
The following code sample uses the Gemini Pro multimodal model to generate text from a single text prompt:
from vertexai.preview.generative_models import GenerativeModel
gemini_pro_model = GenerativeModel("gemini-1.0-pro")
model_response = gemini_pro_model.generate_content("Why do cars have four wheels?")
print("model_response\n", model_response)
The response to this sample code might be similar to the following. The returned text is truncated for brevity.
candidates {
content {
parts {
text: "1. **Stability:** Four wheels provide a wider base of support,
which increases the vehicle\'s stability. This is especially important
when cornering or driving on uneven surfaces.\n\n2...."
}
}
}
usage_metadata {
prompt_token_count: 7
candidates_token_count: 323
total_token_count: 330
}
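If you only need the generated text rather than the full response object, the response in this SDK exposes convenience accessors. The following sketch assumes the text property and the usage_metadata field shown in the sample output above:

```python
from vertexai.preview.generative_models import GenerativeModel

gemini_pro_model = GenerativeModel("gemini-1.0-pro")
model_response = gemini_pro_model.generate_content("Why do cars have four wheels?")

# Print only the generated text, then the total token count
# reported in the response's usage metadata.
print(model_response.text)
print(model_response.usage_metadata.total_token_count)
```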
Generate content using more than one text prompt
The following code sample uses the Gemini Pro multimodal model to generate text using more than one text prompt:
from vertexai.preview.generative_models import GenerativeModel
gemini_pro_model = GenerativeModel("gemini-1.0-pro")
model_response = gemini_pro_model.generate_content(["What is x multiplied by 2?", "x = 42"])
print("model_response\n", model_response)
The response to this sample code might be similar to the following:
candidates {
content {
parts {
text: "84"
}
}
}
usage_metadata {
prompt_token_count: 13
candidates_token_count: 2
total_token_count: 15
}
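To estimate the size of a prompt before sending a request, the GenerativeModel class also provides a count_tokens method. This sketch assumes the same model and prompts as the sample above:

```python
from vertexai.preview.generative_models import GenerativeModel

gemini_pro_model = GenerativeModel("gemini-1.0-pro")

# Count the tokens in the prompt without generating a response.
token_info = gemini_pro_model.count_tokens(["What is x multiplied by 2?", "x = 42"])
print(token_info.total_tokens)
```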
Generate a description of an image
The following code sample uses the Gemini Pro Vision multimodal model to generate text that describes a flower:
from vertexai.preview.generative_models import GenerativeModel
from vertexai.preview.generative_models import Part
gemini_pro_vision_model = GenerativeModel("gemini-1.0-pro-vision")
image = Part.from_uri("gs://cloud-samples-data/ai-platform/flowers/daisy/10559679065_50d2b16f6d.jpg", mime_type="image/jpeg")
model_response = gemini_pro_vision_model.generate_content(["what is this image?", image])
print("model_response\n", model_response)
The response to this sample code might be similar to the following:
candidates {
content {
role: "model"
parts {
text: " The image is a photograph of a daisy flower in a field of fallen leaves."
}
}
finish_reason: STOP
safety_ratings {
category: HARM_CATEGORY_HARASSMENT
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_HATE_SPEECH
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_SEXUALLY_EXPLICIT
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_DANGEROUS_CONTENT
probability: NEGLIGIBLE
}
}
usage_metadata {
prompt_token_count: 263
candidates_token_count: 16
total_token_count: 279
}
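The samples above load images from Cloud Storage URIs. If your image is a local file, you can wrap its raw bytes with Part.from_data instead. This is a sketch assuming a local file named daisy.jpg; the path is a placeholder:

```python
from vertexai.preview.generative_models import GenerativeModel, Part

gemini_pro_vision_model = GenerativeModel("gemini-1.0-pro-vision")

# Read a local image file and wrap its bytes in a Part.
# "daisy.jpg" is a placeholder path; use your own image file.
with open("daisy.jpg", "rb") as f:
    image = Part.from_data(data=f.read(), mime_type="image/jpeg")

model_response = gemini_pro_vision_model.generate_content(["What is this image?", image])
print(model_response.text)
```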
Generate content from text and an image
The following code sample uses the Gemini Pro Vision multimodal model to generate content using a text prompt and a picture of a flower:
from vertexai import generative_models
from vertexai.generative_models import GenerativeModel
image = generative_models.Part.from_uri("gs://cloud-samples-data/ai-platform/flowers/daisy/10559679065_50d2b16f6d.jpg", mime_type="image/jpeg")
gemini_pro_vision_model = GenerativeModel("gemini-1.0-pro-vision")
model_response = gemini_pro_vision_model.generate_content(["What is shown in this image?", image])
print("model_response\n", model_response)
The response to this sample code might be similar to the following:
candidates {
content {
role: "model"
parts {
text: " This is an image of a white daisy growing in a pile of brown, dead leaves."
}
}
finish_reason: STOP
safety_ratings {
category: HARM_CATEGORY_HARASSMENT
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_HATE_SPEECH
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_SEXUALLY_EXPLICIT
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_DANGEROUS_CONTENT
probability: NEGLIGIBLE
}
}
usage_metadata {
prompt_token_count: 265
candidates_token_count: 18
total_token_count: 283
}
Generate content from a video
You can stream the response when you generate content, which returns chunks of the response as they're generated instead of waiting for the entire response. The following code sample uses the Gemini Pro Vision multimodal model to generate content from a text prompt and a video of an advertisement for a movie.
from vertexai import generative_models
from vertexai.generative_models import GenerativeModel
gemini_pro_vision_model = GenerativeModel("gemini-1.0-pro-vision")
response = gemini_pro_vision_model.generate_content([
"What is in the video? ",
generative_models.Part.from_uri("gs://cloud-samples-data/video/animals.mp4", mime_type="video/mp4"),
], stream=True)
for chunk in response:
print(chunk.text)
The response to this sample code might be similar to the following:
The video is an advertisement for the movie Zootopia. It features a tiger, an
otter, and a sloth. The tiger is shown looking at the camera , while the otter
and the sloth are shown swimming in a pool. The video is set to the song "Try
Everything" by Shakira.
What's next
- Learn about foundation model classes and the Vertex AI SDK.
- Learn how to use text model classes and the Vertex AI SDK.
- Learn how to use code model classes and the Vertex AI SDK.