Compatibilidad con OpenAI

Se puede acceder a los modelos de Gemini con las bibliotecas de OpenAI (Python y TypeScript/JavaScript), además de la API de REST. Solo se admite Google Cloud Auth con la biblioteca de OpenAI en Vertex AI. Si todavía no usas las bibliotecas de OpenAI, te recomendamos que llames a la API de Gemini directamente.

Python

import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
#project_id = "PROJECT_ID"
location = "us-central1"

# # Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
  base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
  api_key=credentials.token
)

response = client.chat.completions.create(
  model="google/gemini-2.0-flash-001",
  messages=[
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain to me how AI works"}
  ]
)

print(response.choices[0].message)

¿Qué cambió?

  • api_key=credentials.token: Para usar la autenticación de Google Cloud , obtén un token de autenticación deGoogle Cloud con el código de muestra.

  • base_url: Indica a la biblioteca de OpenAI que envíe solicitudes a Google Clouden lugar de la URL predeterminada.

  • model="google/gemini-2.0-flash-001": Elige un modelo de Gemini compatible entre los que aloja Vertex.

Razonamiento

Los modelos de Gemini 2.5 están entrenados para analizar problemas complejos, lo que mejora significativamente el razonamiento. La API de Gemini incluye un parámetro de"presupuesto de pensamiento" que brinda un control detallado sobre cuánto pensará el modelo.

A diferencia de la API de Gemini, la API de OpenAI ofrece tres niveles de control de pensamiento: "bajo", "medio" y "alto", que se asignan en segundo plano a presupuestos de tokens de pensamiento de 1 K, 8 K y 24 K.

Para inhabilitar el pensamiento, establece el esfuerzo de razonamiento en None.

Python

import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
#project_id = PROJECT_ID
location = "us-central1"

# # Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
  base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
  api_key=credentials.token
)

response = client.chat.completions.create(
  model="google/gemini-2.5-flash-preview-04-17",
  reasoning_effort="low",
  messages=[
      {"role": "system", "content": "You are a helpful assistant."},
      {
          "role": "user",
          "content": "Explain to me how AI works"
      }
  ]
)
print(response.choices[0].message)

Transmisión

La API de Gemini admite respuestas de transmisión.

Python

import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
#project_id = PROJECT_ID
location = "us-central1"

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

client = openai.OpenAI(
  base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
  api_key=credentials.token
)
response = client.chat.completions.create(
model="google/gemini-2.0-flash",
messages=[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Hello!"}
],
stream=True
)

for chunk in response:
  print(chunk.choices[0].delta)

Llamada a función

La llamada a función facilita la obtención de resultados de datos estructurados de los modelos generativos y es compatible con la API de Gemini.

Python

import openai
from google.auth import default
import google.auth.transport.requests

# TODO(developer): Update and un-comment below lines
#project_id = PROJECT_ID
location = "us-central1"

credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

client = openai.OpenAI(
  base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
  api_key=credentials.token
)

tools = [
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get the weather in a given location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The city and state, e.g. Chicago, IL",
        },
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
      },
      "required": ["location"],
    },
  }
}
]

messages = [{"role": "user", "content": "What's the weather like in Chicago today?"}]
response = client.chat.completions.create(
model="google/gemini-2.0-flash",
messages=messages,
tools=tools,
tool_choice="auto"
)

print(response)

Comprensión de imágenes

Los modelos de Gemini son multimodales de forma nativa y ofrecen el mejor rendimiento de su clase en muchas tareas de visión comunes.

Python

from google.auth import default
import google.auth.transport.requests

import base64
from openai import OpenAI

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
  base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
  api_key=credentials.token,
)

# Function to encode the image
def encode_image(image_path):
with open(image_path, "rb") as image_file:
  return base64.b64encode(image_file.read()).decode('utf-8')

# Getting the base64 string
#base64_image = encode_image("Path/to/image.jpeg")

response = client.chat.completions.create(
model="google/gemini-2.0-flash",
messages=[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "What is in this image?",
      },
      {
        "type": "image_url",
        "image_url": {
          "url":  f"data:image/jpeg;base64,{base64_image}"
        },
      },
    ],
  }
],
)

print(response.choices[0])

Generar una imagen

Python

from google.auth import default
import google.auth.transport.requests

import base64
from openai import OpenAI

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
  base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
  api_key=credentials.token,
)

# Function to encode the image
def encode_image(image_path):
with open(image_path, "rb") as image_file:
  return base64.b64encode(image_file.read()).decode('utf-8')

# Getting the base64 string
#base64_image = encode_image("Path/to/image.jpeg")
base64_image = encode_image("/content/wayfairsofa.jpg")

response = client.chat.completions.create(
model="google/gemini-2.0-flash",
messages=[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "What is in this image?",
      },
      {
        "type": "image_url",
        "image_url": {
          "url":  f"data:image/jpeg;base64,{base64_image}"
        },
      },
    ],
  }
],
)

print(response.choices[0])

Comprensión de audio

Analizar la entrada de audio:

Python

from google.auth import default
import google.auth.transport.requests

import base64
from openai import OpenAI

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
  base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
  api_key=credentials.token,
)

with open("/path/to/your/audio/file.wav", "rb") as audio_file:
base64_audio = base64.b64encode(audio_file.read()).decode('utf-8')

response = client.chat.completions.create(
  model="gemini-2.0-flash",
  messages=[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "Transcribe this audio",
      },
      {
            "type": "input_audio",
            "input_audio": {
              "data": base64_audio,
              "format": "wav"
        }
      }
    ],
  }
],
)

print(response.choices[0].message.content)

Resultados estructurados

Los modelos de Gemini pueden generar objetos JSON en cualquier estructura que definas.

Python

from google.auth import default
import google.auth.transport.requests

from pydantic import BaseModel
from openai import OpenAI

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"
location = "us-central1"

# Programmatically get an access token
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())

# OpenAI Client
client = openai.OpenAI(
  base_url=f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
  api_key=credentials.token,
)

class CalendarEvent(BaseModel):
  name: str
  date: str
  participants: list[str]

completion = client.beta.chat.completions.parse(
  model="google/gemini-2.0-flash",
  messages=[
      {"role": "system", "content": "Extract the event information."},
      {"role": "user", "content": "John and Susan are going to an AI conference on Friday."},
  ],
  response_format=CalendarEvent,
)

print(completion.choices[0].message.parsed)

Limitaciones actuales

  • De forma predeterminada, los tokens de acceso tienen una duración de 1 hora. Después del vencimiento, se deben actualizar. Consulta este ejemplo de código para obtener más información.

  • La compatibilidad con las bibliotecas de OpenAI aún está en versión preliminar mientras ampliamos la compatibilidad con funciones. Si tienes preguntas o problemas, publica tus consultas en la Google Cloud Comunidad.

¿Qué sigue?