I modelli Gemini sono accessibili utilizzando le librerie OpenAI (Python e TypeScript/JavaScript) insieme all'API REST. È supportata solo l'autenticazione Google Cloud utilizzando la libreria OpenAI in Vertex AI. Se non utilizzi già le librerie OpenAI, ti consigliamo di chiamare direttamente l'API Gemini.
Python
import openai
from google.auth import default
import google.auth.transport.requests
# TODO(developer): Update and un-comment below lines
#project_id = "PROJECT_ID"
location = "us-central1"
# # Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())
# OpenAI Client
client = openai.OpenAI(
base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
api_key=credentials.token
)
response = client.chat.completions.create(
model="google/gemini-2.0-flash-001",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain to me how AI works"}
]
)
print(response.choices[0].message)
Che cosa è cambiato?
api_key=credentials.token
: per utilizzare l'autenticazione Google Cloud , ottieni un token di autenticazioneGoogle Cloud utilizzando il codice campione.base_url
: indica alla libreria OpenAI di inviare le richieste a Google Cloud anziché all'URL predefinito.model="google/gemini-2.0-flash-001"
: scegli un modello Gemini compatibile tra quelli ospitati da Vertex.
Pensiero
I modelli Gemini 2.5 sono addestrati a ragionare su problemi complessi, il che porta a un ragionamento notevolmente migliorato. L'API Gemini include un parametro "budget di pensiero" che consente di controllare in modo granulare la quantità di pensiero del modello.
A differenza dell'API Gemini, l'API OpenAI offre tre livelli di controllo del pensiero: "basso", "medio" e "alto", che vengono mappati dietro le quinte ai budget dei token di pensiero di 1000, 8000 e 24.000.
Per disattivare la funzionalità di pensiero, imposta lo sforzo di ragionamento su None
.
Python
import openai
from google.auth import default
import google.auth.transport.requests
# TODO(developer): Update and un-comment below lines
#project_id = PROJECT_ID
location = "us-central1"
# # Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())
# OpenAI Client
client = openai.OpenAI(
base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
api_key=credentials.token
)
response = client.chat.completions.create(
model="google/gemini-2.5-flash-preview-04-17",
reasoning_effort="low",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{
"role": "user",
"content": "Explain to me how AI works"
}
]
)
print(response.choices[0].message)
Streaming
L'API Gemini supporta le risposte in streaming.
Python
import openai
from google.auth import default
import google.auth.transport.requests
# TODO(developer): Update and un-comment below lines
#project_id = PROJECT_ID
location = "us-central1"
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())
client = openai.OpenAI(
base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
api_key=credentials.token
)
response = client.chat.completions.create(
model="google/gemini-2.0-flash",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
stream=True
)
for chunk in response:
print(chunk.choices[0].delta)
usa la chiamata di funzione
La chiamata di funzioni semplifica l'ottenimento di output di dati strutturati da modelli generativi ed è supportata nell'API Gemini.
Python
import openai
from google.auth import default
import google.auth.transport.requests
# TODO(developer): Update and un-comment below lines
#project_id = PROJECT_ID
location = "us-central1"
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())
client = openai.OpenAI(
base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
api_key=credentials.token
)
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. Chicago, IL",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
}
}
]
messages = [{"role": "user", "content": "What's the weather like in Chicago today?"}]
response = client.chat.completions.create(
model="google/gemini-2.0-flash",
messages=messages,
tools=tools,
tool_choice="auto"
)
print(response)
Comprensione delle immagini
I modelli Gemini sono multimodali in modo nativo e offrono prestazioni ottimali in molte attività di visione comuni.
Python
from google.auth import default
import google.auth.transport.requests
import base64
from openai import OpenAI
# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
location = "us-central1"
# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())
# OpenAI Client
client = openai.OpenAI(
base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
api_key=credentials.token,
)
# Function to encode the image
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
# Getting the base64 string
#base64_image = encode_image("Path/to/image.jpeg")
response = client.chat.completions.create(
model="google/gemini-2.0-flash",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is in this image?",
},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}"
},
},
],
}
],
)
print(response.choices[0])
Genera un'immagine
Python
from google.auth import default
import google.auth.transport.requests
import base64
from openai import OpenAI
# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
location = "us-central1"
# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())
# OpenAI Client
client = openai.OpenAI(
base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
api_key=credentials.token,
)
# Function to encode the image
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
# Getting the base64 string
#base64_image = encode_image("Path/to/image.jpeg")
base64_image = encode_image("/content/wayfairsofa.jpg")
response = client.chat.completions.create(
model="google/gemini-2.0-flash",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is in this image?",
},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}"
},
},
],
}
],
)
print(response.choices[0])
Comprensione dell'audio
Analizza l'input audio:
Python
from google.auth import default
import google.auth.transport.requests
import base64
from openai import OpenAI
# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
location = "us-central1"
# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())
# OpenAI Client
client = openai.OpenAI(
base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
api_key=credentials.token,
)
with open("/path/to/your/audio/file.wav", "rb") as audio_file:
base64_audio = base64.b64encode(audio_file.read()).decode('utf-8')
response = client.chat.completions.create(
model="gemini-2.0-flash",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Transcribe this audio",
},
{
"type": "input_audio",
"input_audio": {
"data": base64_audio,
"format": "wav"
}
}
],
}
],
)
print(response.choices[0].message.content)
Output strutturato
I modelli Gemini possono generare oggetti JSON in qualsiasi struttura definita.
Python
from google.auth import default
import google.auth.transport.requests
from pydantic import BaseModel
from openai import OpenAI
# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
location = "us-central1"
# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())
# OpenAI Client
client = openai.OpenAI(
base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
api_key=credentials.token,
)
class CalendarEvent(BaseModel):
name: str
date: str
participants: list[str]
completion = client.beta.chat.completions.parse(
model="google/gemini-2.0-flash",
messages=[
{"role": "system", "content": "Extract the event information."},
{"role": "user", "content": "John and Susan are going to an AI conference on Friday."},
],
response_format=CalendarEvent,
)
print(completion.choices[0].message.parsed)
Limitazioni correnti
Per impostazione predefinita, i token di accesso hanno una durata di 1 ora. Dopo la scadenza, devono essere aggiornati. Per ulteriori informazioni, consulta questo esempio di codice.
Il supporto delle librerie OpenAI è ancora in anteprima mentre estendiamo il supporto delle funzionalità. Per domande o problemi, pubblica un post nella Google Cloud community.
Passaggi successivi
Sfrutta il potenziale di Gemini utilizzando le librerie Google Gen AI.
Visualizza altri esempi utilizzando l'API Chat Completions con la sintassi compatibile con OpenAI.
Consulta la pagina Panoramica per scoprire quali modelli e parametri Gemini sono supportati.