Available models in Generative AI Studio

Vertex AI features a growing list of foundation models that you can test, deploy, and customize for use in your applications. Each foundation model is fine-tuned for specific use cases and is offered at different price points. This page summarizes the models that are available and gives you guidance on which models to use.

To learn more about all AI models and APIs on Vertex AI, see Explore AI models and APIs.

Model naming scheme

Foundation model names have three components: use case, model size, and version number. The naming convention is in the format <use case>-<model size>@<version number>. For example, text-bison@001 represents the Bison text model, version 001.
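
As an illustration of the convention, the small helper below (purely hypothetical, not part of any SDK) splits a model name into its components:

def parse_model_name(model_name):
    """Split a foundation model name of the form <use case>-<model size>@<version number>."""
    rest, version = model_name.split("@")
    use_case, model_size = rest.split("-", 1)
    return use_case, model_size, version

print(parse_model_name("text-bison@001"))  # ('text', 'bison', '001')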

The model sizes are as follows:

  • Bison: The best value in terms of capability and cost.
  • Gecko: The smallest and lowest cost model for simple tasks.

Foundation models

The following is an overview of the foundation models that are available in Vertex AI.

Model name: text-bison@001
Description: Fine-tuned to follow natural language instructions and suitable for a variety of language tasks, such as:
  • Classification
  • Sentiment analysis
  • Entity extraction
  • Extractive question answering
  • Summarization
  • Rewriting text in a different style
  • Ad copy generation
  • Concept ideation
Model properties:
  • Max input tokens: 8,192
  • Max output tokens: 1,024
  • Training data: Up to Feb 2023

Model name: textembedding-gecko@001 (model tuning not supported)
Description: Returns model embeddings for text inputs.
Model properties:
  • Max input tokens: 3,072
  • Output: 768-dimensional vector embeddings

Model name: chat-bison@001 (model tuning not supported)
Description: Fine-tuned for multi-turn conversation use cases.
Model properties:
  • Max input tokens: 4,096
  • Max output tokens: 1,024
  • Training data: Up to Feb 2023
  • Max turns: 2,500
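
The code samples later on this page cover the text and chat models; for completeness, the following is a minimal sketch of calling textembedding-gecko@001 with the Vertex AI SDK for Python. It assumes the SDK is installed and that your project and region are already configured (for example with vertexai.init), and it uses the same preview SDK module as the other samples on this page:

from vertexai.preview.language_models import TextEmbeddingModel

# Load the embedding model (model tuning is not supported for this model).
embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

# get_embeddings accepts a list of texts and returns one embedding per text.
embeddings = embedding_model.get_embeddings(["Hello"])

for embedding in embeddings:
    # Each embedding is a 768-dimensional vector of floats.
    print(len(embedding.values))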

Language support

PaLM models currently only support English.

Parameter definitions

Requests to the Vertex AI PaLM API require different parameter configurations based on the model type.

Text model parameters

Parameter Description Acceptable values

prompt

Text input to generate a model response. Prompts can include a preamble, questions, suggestions, instructions, or examples.

Text

temperature

The temperature is used for sampling during the response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 is deterministic: the highest probability response is always selected. For most use cases, try starting with a temperature of 0.2.

0.0–1.0

Default: 0

maxOutputTokens

Maximum number of tokens that can be generated in the response. Specify a lower value for shorter responses and a higher value for longer responses.

A token may be smaller than a word. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.

1–1024

Default: 128

topK

Top-k changes how the model selects tokens for output. A top-k of 1 means the selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-k of 3 means that the next token is selected from among the 3 most probable tokens (using temperature).

For each token selection step, the top K tokens with the highest probabilities are sampled. Then tokens are further filtered based on topP with the final token selected using temperature sampling.

Specify a lower value for less random responses and a higher value for more random responses.

1–40

Default: 40

topP

Top-p changes how the model selects tokens for output. Tokens are selected from the most probable to the least probable (within the top K tokens; see the topK parameter) until the sum of their probabilities equals the top-p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-p value is 0.5, then the model selects either A or B as the next token (using temperature) and doesn't consider C. The default top-p value is 0.95.

Specify a lower value for less random responses and a higher value for more random responses.

0.0–1.0

Default: 0.95
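
To make the interaction between these parameters concrete, the following is an illustrative Python sketch of one token-selection step as described above: keep the top K most probable tokens, filter them further with top-p, then sample from what remains using temperature. It is a simplified model for intuition only, not the service's actual implementation.

import random

def select_next_token(token_probs, top_k=40, top_p=0.95, temperature=0.2):
    """Illustrative token-selection step: top-k, then top-p, then temperature sampling."""
    # Keep the top K most probable candidate tokens.
    candidates = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Filter further with top-p: keep tokens until their cumulative probability
    # reaches the top-p value; less probable tokens after that are not considered.
    kept, cumulative = [], 0.0
    for token, prob in candidates:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # A temperature of 0 is deterministic: always pick the most probable token.
    if temperature == 0:
        return kept[0][0]

    # Otherwise re-weight the remaining tokens by temperature and sample one.
    weights = [prob ** (1.0 / temperature) for _, prob in kept]
    tokens = [token for token, _ in kept]
    return random.choices(tokens, weights=weights, k=1)[0]

# The example from the topP description: with top-p = 0.5, only A and B are
# candidates (0.3 + 0.2 reaches 0.5), and C is never considered.
print(select_next_token({"A": 0.3, "B": 0.2, "C": 0.1}, top_k=3, top_p=0.5, temperature=0.2))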

Sample code for text

REST

MODEL_ID="text-bison"
PROJECT_ID=PROJECT_ID

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/${MODEL_ID}:predict -d \
$'{
  "instances": [
    { "prompt": "Hello"}
  ],
  "parameters": {
    "temperature": 0.2,
    "maxOutputTokens": 256,
    "topK": 40,
    "topP": 0.95
  }
}'

Python

from vertexai.preview.language_models import TextGenerationModel

def text_generation_example(temperature=0.2):
    """Text generation example with a large language model."""
    model = TextGenerationModel.from_pretrained("text-bison")
    response = model.predict(
        "Hello",
        temperature=temperature,
        top_k=40,
        top_p=0.95,
        max_output_tokens=256,
    )
    print(f"Response from Model: {response.text}")

Chat model parameters

For chat API calls, the context, examples, and messages combine to form the prompt.

Parameter Description Acceptable values

context

(optional)

Context shapes how the model responds throughout the conversation. For example, you can use context to specify words the model can or cannot use, topics to focus on or avoid, or the response format or style.

Text

examples

(optional)

A list of structured messages that show the model how to respond in the conversation.

List[Structured Message]
{
    "input": {"content": "provide content"},
    "output": {"content": "provide content"}
}

messages

(required)

Conversation history provided to the model in a structured alternate-author form. Messages appear in chronological order: oldest first, newest last. When the history of messages causes the input to exceed the maximum length, the oldest messages are removed until the entire prompt is within the allowed limit (see the sketch after this table).

List[Structured Message]
{
    "author": "user",
    "content": "user message"
}

temperature

The temperature is used for sampling during the response generation, which occurs when topP and topK are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 is deterministic: the highest probability response is always selected. For most use cases, try starting with a temperature of 0.2.

0.0–1.0

Default: 0

maxOutputTokens

Maximum number of tokens that can be generated in the response. Specify a lower value for shorter responses and a higher value for longer responses.

A token may be smaller than a word. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words.

1–1024

Default: 128

topK

Top-k changes how the model selects tokens for output. A top-k of 1 means the selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-k of 3 means that the next token is selected from among the 3 most probable tokens (using temperature).

For each token selection step, the top K tokens with the highest probabilities are sampled. Then tokens are further filtered based on topP with the final token selected using temperature sampling.

Specify a lower value for less random responses and a higher value for more random responses.

1–40

Default: 40

topP

Top-p changes how the model selects tokens for output. Tokens are selected from the most probable to the least probable (within the top K tokens; see the topK parameter) until the sum of their probabilities equals the top-p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-p value is 0.5, then the model selects either A or B as the next token (using temperature) and doesn't consider C. The default top-p value is 0.95.

Specify a lower value for less random responses and a higher value for more random responses.

0.0–1.0

Default: 0.95
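
The messages parameter above notes that when the conversation history makes the input too long, the oldest messages are dropped first. The sketch below illustrates that behavior with a hypothetical token-counting helper; it is not the service's actual implementation.

def truncate_history(messages, max_input_tokens, count_tokens):
    """Illustrative only: drop the oldest messages until the prompt fits."""
    history = list(messages)  # oldest first, newest last
    while history and count_tokens(history) > max_input_tokens:
        history.pop(0)  # remove the oldest message first
    return history

# Toy example: pretend each message costs 1,000 tokens against the 4,096-token limit.
msgs = [{"author": "user", "content": f"message {i}"} for i in range(6)]
kept = truncate_history(msgs, 4096, count_tokens=lambda h: 1000 * len(h))
print([m["content"] for m in kept])  # only the 4 most recent messages remain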

Sample code for chat

REST

MODEL_ID="chat-bison"
PROJECT_ID=PROJECT_ID

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/${MODEL_ID}:predict -d \
'{
  "instances": [{
      "context":  "My name is Ned. You are my personal assistant. My favorite movies are Lord of the Rings and Hobbit.",
      "examples": [{
          "input": {"content": "Who do you work for?"},
          "output": {"content": "I work for Ned."}
      },
      {
          "input": {"content": "What do I like?"},
          "output": {"content": "Ned likes watching movies."}
      }],
      "messages": [
      {
          "author": "user",
          "content": "Are my favorite movies based on a book series?",
      },
      {
          "author": "bot",
          "content": "Yes, your favorite movies, The Lord of the Rings and The Hobbit, are based on book series by J.R.R. Tolkien.",
      },
      {
          "author": "user",
          "content": "When where these books published?",
      }],
   }],
  "parameters": {
    "temperature": 0.3,
    "maxDecodeSteps": 200,
    "topP": 0.8,
    "topK": 40
  }
}'

Python

from vertexai.preview.language_models import ChatModel, InputOutputTextPair

chat_model = ChatModel.from_pretrained("chat-bison")

chat = chat_model.start_chat(
    # Optional:
    context="My name is Ned. You are my personal assistant. My favorite movies are Lord of the Rings and Hobbit.",
    examples=[
        InputOutputTextPair(
            input_text="Who do you work for?",
            output_text="I work for Ned.",
        ),
        InputOutputTextPair(
            input_text="What do I like?",
            output_text="Ned likes watching movies.",
        ),
    ],
)

print(chat.send_message("Are my favorite movies based on a book series?"))

print(chat.send_message("When where these books published?"))

What's next