Use checkpoints in supervised fine-tuning for Gemini models

This guide shows you how to use checkpoints in supervised fine-tuning for Gemini models.

A checkpoint is a snapshot of a model's state at a specific point in the fine-tuning process. By enabling checkpoints in Gemini model supervised fine-tuning, you can do the following:

  • Save tuning progress: Avoid losing work if the tuning job is interrupted.
  • Compare performance: Evaluate intermediate checkpoints to understand how the model improves over time.
  • Prevent overfitting: Select the best-performing checkpoint as the default for deployment, rather than automatically using the final version, which might be overfit to the training data.

For tuning jobs with fewer than 10 epochs, one checkpoint is saved approximately after each epoch. For jobs with 10 or more epochs, about 10 checkpoints are saved at evenly distributed intervals. The final checkpoint is always saved immediately after training completes.
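
The schedule described above can be sketched as a small helper. This is only an illustration of the stated rule, not the service's actual scheduling logic: the function name `approximate_checkpoint_epochs` and the `max_checkpoints` parameter are hypothetical, and the evenly spaced placement is an assumption, because the service only describes checkpoint timing as approximate.

```python
def approximate_checkpoint_epochs(
    epoch_count: int, max_checkpoints: int = 10
) -> list[float]:
    """Return the approximate epochs at which checkpoints would be saved.

    Hypothetical helper: mirrors the documented rule only, not the
    service's real scheduler.
    """
    if epoch_count < 1:
        raise ValueError("epoch_count must be at least 1")
    if epoch_count < max_checkpoints:
        # Fewer than 10 epochs: roughly one checkpoint after each epoch.
        return [float(epoch) for epoch in range(1, epoch_count + 1)]
    # 10 or more epochs: about 10 checkpoints at evenly spaced intervals,
    # with the last one at the end of training.
    interval = epoch_count / max_checkpoints
    return [round(interval * i, 2) for i in range(1, max_checkpoints + 1)]
```

For example, under these assumptions a 3-epoch job yields checkpoints near epochs 1, 2, and 3, and a 20-epoch job yields checkpoints near every second epoch.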

As tuning progresses, each intermediate checkpoint is deployed to a new endpoint. The main endpoint for the tuned model always points to the default checkpoint.

Supported models

The following Gemini models support checkpoints:

  • gemini-2.0-flash-001
  • gemini-2.0-flash-lite-001
  • gemini-2.5-flash
  • gemini-2.5-flash-lite
  • gemini-2.5-pro

For detailed information about Gemini model versions, see Google models and Model versions and lifecycle.

Create a tuning job that exports checkpoints

You can create a supervised fine-tuning job that exports checkpoints by using the Google Gen AI SDK or the Google Cloud console.

Console

To create a tuning job that exports checkpoints, use the Tuning tab on the Vertex AI Studio page. For instructions, see Tune a model.

Google Gen AI SDK

import time

from google import genai
from google.genai.types import (
    CreateTuningJobConfig,
    EvaluationConfig,
    GcsDestination,
    HttpOptions,
    Metric,
    OutputConfig,
    TuningDataset,
)

# TODO(developer): Update and uncomment the line below.
# output_gcs_uri = "gs://your-bucket/your-prefix"

client = genai.Client(http_options=HttpOptions(api_version="v1beta1"))

training_dataset = TuningDataset(
    gcs_uri="gs://cloud-samples-data/ai-platform/generative_ai/gemini/text/sft_train_data.jsonl",
)
validation_dataset = TuningDataset(
    gcs_uri="gs://cloud-samples-data/ai-platform/generative_ai/gemini/text/sft_validation_data.jsonl",
)

evaluation_config = EvaluationConfig(
    metrics=[
        Metric(
            name="FLUENCY",
            prompt_template="""Evaluate this {response}"""
        )
    ],
    output_config=OutputConfig(
        gcs_destination=GcsDestination(
            output_uri_prefix=output_gcs_uri,
        )
    ),
)

tuning_job = client.tunings.tune(
    base_model="gemini-2.5-flash",
    training_dataset=training_dataset,
    config=CreateTuningJobConfig(
        tuned_model_display_name="Example tuning job",
        # Set to True to export only the final checkpoint and skip intermediate checkpoints. Default is False.
        export_last_checkpoint_only=False,
        validation_dataset=validation_dataset,
        evaluation_config=evaluation_config,
    ),
)

running_states = {
    "JOB_STATE_PENDING",
    "JOB_STATE_RUNNING",
}

while tuning_job.state in running_states:
    print(tuning_job.state)
    tuning_job = client.tunings.get(name=tuning_job.name)
    time.sleep(60)

print(tuning_job.tuned_model.model)
print(tuning_job.tuned_model.endpoint)
print(tuning_job.experiment)
# Example response:
# projects/123456789012/locations/us-central1/models/1234567890@1
# projects/123456789012/locations/us-central1/endpoints/123456789012345
# projects/123456789012/locations/us-central1/metadataStores/default/contexts/tuning-experiment-2025010112345678

if tuning_job.tuned_model.checkpoints:
    for i, checkpoint in enumerate(tuning_job.tuned_model.checkpoints):
        print(f"Checkpoint {i + 1}: ", checkpoint)
    # Example response:
    # Checkpoint 1:  checkpoint_id='1' epoch=1 step=10 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789000000'
    # Checkpoint 2:  checkpoint_id='2' epoch=2 step=20 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789012345'
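
The polling loop in the preceding sample waits indefinitely. If you prefer a bounded wait, the following sketch shows one way to add a timeout. The helper `wait_for_tuning_job` and its parameters are hypothetical and not part of the Google Gen AI SDK. It takes a `fetch_job` callable so that the polling logic stays separate from the client call, and the default timeout is an arbitrary illustrative value.

```python
import time
from typing import Any, Callable

RUNNING_STATES = {"JOB_STATE_PENDING", "JOB_STATE_RUNNING"}


def wait_for_tuning_job(
    fetch_job: Callable[[], Any],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll fetch_job() until the job leaves the running states.

    Hypothetical helper: raises TimeoutError if the job is still
    pending or running when the timeout elapses.
    """
    deadline = time.monotonic() + timeout_seconds
    job = fetch_job()
    while job.state in RUNNING_STATES:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job is still in state {job.state} after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)
        job = fetch_job()
    return job
```

With the client from the preceding sample, you might store `job_name = tuning_job.name` and then call `wait_for_tuning_job(lambda: client.tunings.get(name=job_name))`.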

List the checkpoints for a tuning job

You can view the checkpoints for your completed tuning job in the Google Cloud console or list them by using the Google Gen AI SDK. If intermediate checkpoints are disabled, only the final checkpoint is displayed or returned.

Console

  1. In the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Tuning tab, find your model and click Monitor.

    The page displays the tuning metrics and checkpoints for your model. The metrics graphs display checkpoint numbers as annotations:

    • Step number: The exact step when a checkpoint was saved.
    • Epoch number: An estimated epoch number that the checkpoint belongs to. For the final checkpoint of a completed job, this is the exact epoch number.

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job.
# E.g., tuning_job_name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=tuning_job_name)

if tuning_job.tuned_model.checkpoints:
    for i, checkpoint in enumerate(tuning_job.tuned_model.checkpoints):
        print(f"Checkpoint {i + 1}: ", checkpoint)
# Example response:
# Checkpoint 1:  checkpoint_id='1' epoch=1 step=10 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789000000'
# Checkpoint 2:  checkpoint_id='2' epoch=2 step=20 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789012345'
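
If you need to look up a checkpoint's endpoint by its ID, you can build a mapping from the listed checkpoints. The following is a minimal sketch: the helper name `checkpoint_endpoints` is hypothetical, it relies only on the `checkpoint_id` and `endpoint` fields shown in the example response, and skipping checkpoints without an endpoint is an assumption about checkpoints that are not deployed.

```python
def checkpoint_endpoints(checkpoints) -> dict[str, str]:
    """Map each checkpoint ID to its endpoint resource name.

    Hypothetical helper: checkpoints with an empty or missing endpoint
    are skipped.
    """
    return {
        checkpoint.checkpoint_id: checkpoint.endpoint
        for checkpoint in (checkpoints or [])
        if getattr(checkpoint, "endpoint", None)
    }
```

For example, `checkpoint_endpoints(tuning_job.tuned_model.checkpoints)` returns a dictionary keyed by checkpoint ID, or an empty dictionary when no checkpoints are returned.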

View model details and checkpoints

You can view your tuned model in the Google Cloud console or use the Google Gen AI SDK to get model details, including its associated endpoints and checkpoints.

The Endpoint field for the tuned model behaves as follows:

  • It always points to the endpoint created for the default checkpoint.
  • The value is empty if the default checkpoint is not deployed, which can happen if tuning is still in progress or if deployment failed.
  • The value is also empty if the tuning job fails to create a model.
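
Because the Endpoint field can be empty, it can help to check it before sending prediction requests. The following sketch shows one possible guard. The function name `require_default_endpoint` is hypothetical, and the sketch assumes that the object passed in exposes an `endpoint` attribute, as `tuning_job.tuned_model` does in the samples on this page.

```python
def require_default_endpoint(tuned_model) -> str:
    """Return the tuned model's endpoint, or raise if it is empty.

    Hypothetical helper: an empty value can mean that tuning is still in
    progress, that deployment failed, or that no model was created.
    """
    endpoint = getattr(tuned_model, "endpoint", None)
    if not endpoint:
        raise RuntimeError(
            "The tuned model has no endpoint. The default checkpoint might "
            "not be deployed yet, or the tuning job might have failed."
        )
    return endpoint
```

Passing `None` (for example, when the tuning job did not create a model) also raises the error instead of failing later during prediction.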

Console

You can view your tuned model in the Vertex AI Model Registry and on the Online prediction Endpoints page.

  1. In the Google Cloud console, go to the Model Registry page.

    Go to the Model Registry page

  2. Click the name of your model to see the default version.

  3. Click the Version details tab to see information about your model version.

    Note that the Objective is Large model, the Model type is Foundation, and the Source is Vertex AI Studio tuning.

  4. Click the Deploy & test tab to see the endpoint where the model is deployed.

  5. Click the endpoint name to open the Endpoint page, which lists the checkpoints that are deployed to the endpoint. For each checkpoint, the model version ID and checkpoint ID are displayed.

Alternatively, you can view checkpoints on the Tuning Job Details page. To see this page, go to the Tuning page and click a tuning job.

Go to the Tuning page

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job and the tuned model.
# E.g., tuning_job_name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=tuning_job_name)
tuned_model = client.models.get(model=tuning_job.tuned_model.model)
print(tuned_model)
# Example response:
# Model(name='projects/123456789012/locations/us-central1/models/1234567890@1', ...)

print(f"Default checkpoint: {tuned_model.default_checkpoint_id}")
# Example response:
# Default checkpoint: 2

if tuned_model.checkpoints:
    for checkpoint in tuned_model.checkpoints:
        print(f"Checkpoint {checkpoint.checkpoint_id}: ", checkpoint)
# Example response:
# Checkpoint 1:  checkpoint_id='1' epoch=1 step=10
# Checkpoint 2:  checkpoint_id='2' epoch=2 step=20

Test the checkpoints

You can test each checkpoint in the Google Cloud console or by using the Google Gen AI SDK.

Console

  1. In the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Tuning tab, find your model and click Monitor.

  3. In the checkpoint table on the Monitor pane, find the checkpoint that you want to test and click Test.

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job.
# E.g., tuning_job_name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=tuning_job_name)

contents = "Why is the sky blue?"

# Predicts with the default checkpoint.
response = client.models.generate_content(
    model=tuning_job.tuned_model.endpoint,
    contents=contents,
)
print(response.text)
# Example response:
# The sky is blue because ...

# Predicts with Checkpoint 1.
checkpoint1_response = client.models.generate_content(
    model=tuning_job.tuned_model.checkpoints[0].endpoint,
    contents=contents,
)
print(checkpoint1_response.text)
# Example response:
# The sky is blue because ...

# Predicts with Checkpoint 2.
checkpoint2_response = client.models.generate_content(
    model=tuning_job.tuned_model.checkpoints[1].endpoint,
    contents=contents,
)
print(checkpoint2_response.text)
# Example response:
# The sky is blue because ...
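
Instead of indexing each checkpoint by hand, you can loop over all of them. The following sketch collects one response per checkpoint for the same prompt. The helper `compare_checkpoints` is hypothetical. It accepts a `generate` callable so that the loop does not depend on a specific client, and it assumes that checkpoints without an endpoint should be skipped.

```python
from typing import Callable


def compare_checkpoints(
    checkpoints,
    prompt: str,
    generate: Callable[[str, str], str],
) -> dict[str, str]:
    """Return a mapping of checkpoint ID to the response for the prompt.

    Hypothetical helper: generate(endpoint, prompt) must return the
    response text for the given checkpoint endpoint.
    """
    responses = {}
    for checkpoint in checkpoints or []:
        if not checkpoint.endpoint:
            # Skip checkpoints that have no endpoint to send requests to.
            continue
        responses[checkpoint.checkpoint_id] = generate(checkpoint.endpoint, prompt)
    return responses
```

With the client from the preceding sample, `generate` could be `lambda endpoint, prompt: client.models.generate_content(model=endpoint, contents=prompt).text`.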

Select a new default checkpoint

By default, the final checkpoint of a tuning job is the default checkpoint. After testing, you can set the best-performing checkpoint as the new default. The default checkpoint affects the following operations:

  • Deployment: When you deploy a tuned model, the default checkpoint is the version that gets deployed. If you update the default checkpoint, the model's endpoint is updated to point to the new version, which you can use for prediction.
  • Copying models: When you copy a model with checkpoints, all checkpoints are copied, and the default checkpoint setting is preserved in the new model. You can then select a different default checkpoint for the copied model.
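
If you score each checkpoint with your own evaluation, choosing the best-performing one is a small lookup. The following sketch assumes a dictionary of per-checkpoint scores that you computed yourself. The helper name `best_checkpoint_id`, the `higher_is_better` flag, and the score format are hypothetical and aren't produced by the tuning service.

```python
def best_checkpoint_id(scores: dict[str, float], higher_is_better: bool = True) -> str:
    """Return the ID of the checkpoint with the best score.

    Hypothetical helper: scores maps checkpoint IDs to a metric that you
    computed. Set higher_is_better=False for loss-like metrics.
    """
    if not scores:
        raise ValueError("scores must contain at least one checkpoint")
    choose = max if higher_is_better else min
    return choose(scores, key=scores.get)
```

You can pass the returned ID as `default_checkpoint_id` when you update the model, as the SDK sample in this section shows.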

Console

  1. In the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Tuning tab, find your model and click Monitor.

  3. In the checkpoint table on the Monitor pane, find the checkpoint that you want to set as the default, click Actions, and then select Set as default.

  4. Click Confirm.

    The console updates the metrics graphs and checkpoint table to show the new default checkpoint. The endpoint on the Tuning Job Details page updates to the endpoint of the new default checkpoint.

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions, UpdateModelConfig

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job and the tuned model.
# E.g., tuning_job_name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=tuning_job_name)
tuned_model = client.models.get(model=tuning_job.tuned_model.model)

print(f"Default checkpoint: {tuned_model.default_checkpoint_id}")
print(f"Tuned model endpoint: {tuning_job.tuned_model.endpoint}")
# Example response:
# Default checkpoint: 2
# Tuned model endpoint: projects/123456789012/locations/us-central1/endpoints/123456789012345

# Set a new default checkpoint.
# Eg. checkpoint_id = "1"
tuned_model = client.models.update(
    model=tuned_model.name,
    config=UpdateModelConfig(default_checkpoint_id=checkpoint_id),
)

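# Get the tuning job again so that the printed endpoint is not stale.
tuning_job = client.tunings.get(name=tuning_job_name)

print(f"Default checkpoint: {tuned_model.default_checkpoint_id}")
print(f"Tuned model endpoint: {tuning_job.tuned_model.endpoint}")
# Example response:
# Default checkpoint: 1
# Tuned model endpoint: projects/123456789012/locations/us-central1/endpoints/123456789000000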
