Using Checkpoints for Large Models

By default, Cloud Fleet Routing supports models that take less than 30 minutes to solve. To solve more complex problems, users can enable checkpoint mode, which is available only in the batchOptimizeTours API. With checkpoint mode enabled, users can send complex model requests that take up to 120 minutes to solve. During optimization, intermediate checkpoint files are generated and saved to the user's Cloud Storage.

Checkpoints

A checkpoint is a snapshot of the intermediate optimization result. From the user's perspective, a checkpoint has exactly the same format as the final output. The only difference is that a checkpoint usually contains a worse solution than the final output. A checkpoint file is generated every 20 to 30 minutes, depending on the complexity of the model request. Checkpoint files are saved in the /checkpoints folder, which is generated automatically in the same folder as the final output. Users can inspect these checkpoints to analyze how the optimization of their model request progresses.
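As a rough illustration of where to look for checkpoint files, the sketch below derives the checkpoints folder from an output URI. It assumes the output URI has the form gs://bucket/folder/file and that the folder is named "checkpoints", as described above. The helper name checkpoint_prefix and the bucket and file names are hypothetical, and the sketch does not call any Cloud Storage API.

```python
def checkpoint_prefix(output_uri: str) -> str:
    """Return the assumed checkpoints folder for a given output URI.

    Assumption: checkpoints live in a "checkpoints" folder next to the
    final output object, e.g. gs://bucket/runs/checkpoints/.
    """
    scheme = "gs://"
    if not output_uri.startswith(scheme):
        raise ValueError("expected a Cloud Storage URI starting with gs://")

    # Split "bucket/path/to/object" into the bucket and the object name.
    bucket, _, object_name = output_uri[len(scheme):].partition("/")
    if not bucket or not object_name:
        raise ValueError("expected a URI of the form gs://bucket/object")

    # Drop the final path component (the output file itself).
    folder, _, _ = object_name.rpartition("/")

    parts = [bucket]
    if folder:
        parts.append(folder)
    parts.append("checkpoints")
    return scheme + "/".join(parts) + "/"


# Hypothetical output location, used only for illustration.
prefix = checkpoint_prefix("gs://my-bucket/runs/solution.json")
print(prefix)
```

The resulting prefix could then be passed to whichever Cloud Storage listing tool is already in use to enumerate the checkpoint files.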

Without checkpointing, users need to run multiple short optimizations for a single complex model request. With checkpointing, users can trigger one optimization that runs for up to 120 minutes and let the API handle the process.

Example Request

This section shows how to enable checkpoint mode. To enable it, set the enable_checkpoints field to true. Checkpoint mode is enabled per model_config. As the example below shows, a single batch API request can contain multiple models. Some models can enable checkpointing while others do not, and the data_format can differ between models.

{
  "parent": "projects/${YOUR_GCP_PROJECT_ID}",
  "model_configs": [
    {
      "input_config": {
        "gcs_source": {
          "uri": "${REQUEST_MODEL_GCS_PATH_0}"
        },
        "data_format": "STRING"
      },
      "output_config": {
        "gcs_destination": {
          "uri": "${MODEL_SOLUTION_GCS_PATH_0}"
        },
        "data_format": "STRING"
      }
    },
    {
      "input_config": {
        "gcs_source": {
          "uri": "${REQUEST_MODEL_GCS_PATH_1}"
        },
        "data_format": "JSON"
      },
      "output_config": {
        "gcs_destination": {
          "uri": "${MODEL_SOLUTION_GCS_PATH_1}"
        },
        "data_format": "JSON"
      },
      "enable_checkpoints": true
    }
  ]
}

Replace the ${YOUR_GCP_PROJECT_ID} placeholder with your project ID, and replace ${REQUEST_MODEL_GCS_PATH_0}, ${MODEL_SOLUTION_GCS_PATH_0}, ${REQUEST_MODEL_GCS_PATH_1} and ${MODEL_SOLUTION_GCS_PATH_1} with your Cloud Storage URIs. In this example, only the second model enables checkpointing.
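The request body above can also be assembled programmatically. The following is a minimal sketch that uses only the field names shown in the example. The helper names build_model_config and build_batch_request, the project ID, and the bucket paths are hypothetical, and the sketch only builds and prints the JSON body; it does not send it to the API.

```python
import json


def build_model_config(input_uri, output_uri, data_format, enable_checkpoints=False):
    """Build one model_config entry using the fields from the example request."""
    config = {
        "input_config": {
            "gcs_source": {"uri": input_uri},
            "data_format": data_format,
        },
        "output_config": {
            "gcs_destination": {"uri": output_uri},
            "data_format": data_format,
        },
    }
    # Checkpoint mode is set per model_config, so only add the field when asked.
    if enable_checkpoints:
        config["enable_checkpoints"] = True
    return config


def build_batch_request(project_id, model_configs):
    """Wrap the model configs in the top-level batch request body."""
    return {
        "parent": f"projects/{project_id}",
        "model_configs": list(model_configs),
    }


# Hypothetical project ID and Cloud Storage paths, used only for illustration.
request = build_batch_request(
    "my-project",
    [
        build_model_config(
            "gs://my-bucket/model_0.txt", "gs://my-bucket/solution_0.txt", "STRING"
        ),
        build_model_config(
            "gs://my-bucket/model_1.json",
            "gs://my-bucket/solution_1.json",
            "JSON",
            enable_checkpoints=True,
        ),
    ],
)
print(json.dumps(request, indent=2))
```

Keeping enable_checkpoints as a per-model argument mirrors the request structure, where each model in the batch opts in independently.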